Purge the junk from Linguee please!

One extremely useful tool that translators have begun using in recent years is Linguee. In bygone years, to find out how a term had been translated on a bilingual website we would usually have to open the page in one language, then find the same page in our target language (which on some websites was not easy) and try to find the same sentence.

Linguee made this process easier by searching through bilingual websites and displaying sentences containing our term in one column, and the translations of those sentences in another column (see image below).

Understandably, many mediocre translations end up in the database, and we can forgive Linguee for search results that include, for instance, poor translations from a European Union webpage, since Linguee will just add all European Union texts.

However, it would be nice if Linguee could start purging some of the websites whose translations are consistently poor, if not nonsensical. What is particularly frustrating for translators is that some of these websites seem to be favoured by the search results.

Two such websites are jordipujol.cat (a site launched by now-disgraced former Catalan president Jordi Pujol) and solicitormarbella.com.


The above results were all from a single search for “urbanizable”. Clearly the English texts are machine translations. Not only that, but the result from jordipujol.cat contains a machine translation of a machine translation. Nobody is discussing developing property on the sun! Rather, “sol urbanizable” is a mistranslation of the Catalan “sòl urbanitzable”, probably as a result of the Catalan author writing “sol” (meaning sun, also “sol” in Spanish) instead of “sòl” (meaning land, “suelo” in Spanish).

I have just sent a message to Linguee asking for the two inactive websites to be removed from Linguee. It will be interesting to see whether they are removed.


What’s the offside rule?

The offside rule is explained in the Wikipedia article about the rule.

The purpose of this article is to explain why you should ignore Google’s advice on the TV advert recommending that you type “What’s the offside rule” into Google.

Let’s start by considering what happens if we place the search in quotation marks. When I perform this search, Google tells me there are 23,700 results (your results may vary), but there are actually only four pages of results, so there are fewer than 40 hits. The first result is a Yahoo Answers page – hardly a reliable source! Since we are searching for sites with the question “What’s the offside rule?”, it’s hardly surprising that many of the results are forums where a user has asked the question and other users have answered.

The TV advert, however, shows the search without quotation marks. When we search in Google without quotation marks, we are looking for each individual word anywhere on the page. So, “offside” and “rule” do not necessarily appear next to each other. This means that you could have an article in which the phrase “the offside wheel” appears in one part of the text and “the insurance company will rule on the matter” in another. Consequently, some of the results may be talking about something completely different.

Of course, this doesn’t really matter. All the hits on my first page of results discuss the offside rule in association football (aka soccer).

But I still think it is bad advice, because it is a complete waste of time to include the words “what’s the” in the search. For starters, we don’t need a page where somebody asks the question; we just want a page where it is explained. And second, since we are not using quotation marks, the words may appear anywhere on the page. If we were adding a word like “rugby”, then this would make sense, since if the word “rugby” appears anywhere on the page, it increases the likelihood of that page being about one of the rugby codes. However, by adding the word “what’s” we would merely be eliminating some pages that may be of interest to us, since you can easily write an explanation of the offside rule without using the word “what’s”. What actually happens is that Google will ignore the word altogether, since they know that it adds nothing to a search.

Meanwhile, the word “the” is never useful in a Google search unless it appears as part of a phrase in quotation marks, because any text written in English will almost always contain the word “the” somewhere. As with “what’s”, Google will actually just ignore the word “the” in your search.

So what’s the best search strategy?

I believe the ideal search would be to write “offside rule” in quotation marks, followed either by “association football” in quotation marks or “football” or “soccer” without quotation marks.

However, adding the quotation marks takes time, especially on a mobile phone. As a translator it is often important for me to refine my searches, but very often a quick-and-dirty solution is good enough.

In this instance, the quick-and-dirty solution would be simply to search for the words “offside” and “rule” without any quotation marks. Since the results that come up will be about association football by default, you only need to add the name of the sport if you want to know another sport’s offside rule.


When searching in Google, stick to the keywords, and don’t bother including articles (the, a, an), conjunctions and prepositions (and, but, with) or question words (what, why, how). You only ever need to include these words if you use quotation marks. So, you might want to search for “Dennis the Menace” in quotation marks, but if you’re not going to bother with quotation marks, you may as well leave out the word “the”.
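
For readers who like to see a rule of thumb written out, the advice above can be sketched in a few lines of Python. This is only an illustration of the advice, not a description of how Google actually processes a query. The function name `strip_filler_words` and the word list are hypothetical; the list simply collects the examples mentioned in this post.

```python
import re

# Hypothetical filler-word list, taken from the examples in this post.
FILLER_WORDS = {"the", "a", "an", "and", "with", "but", "what", "why", "how"}


def strip_filler_words(query: str) -> str:
    """Drop filler words from a query, leaving quoted phrases untouched."""
    # A token is either a complete "quoted phrase" or a run of non-space characters.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    kept = []
    for token in tokens:
        if token.startswith('"'):
            # Inside quotation marks every word matters, so keep the phrase whole.
            kept.append(token)
            continue
        # Compare on the bare word: lower-case, no surrounding punctuation,
        # and only the part before an apostrophe ("what's" -> "what").
        bare = re.split(r"['’]", token.lower().strip("?!.,"))[0]
        if bare not in FILLER_WORDS:
            kept.append(token)
    return " ".join(kept)


print(strip_filler_words("What's the offside rule"))        # offside rule
print(strip_filler_words('"Dennis the Menace" the comic'))  # "Dennis the Menace" comic
```

The quoted phrase keeps its “the” because the words are searched for together, while the stray “the” outside the quotation marks is dropped.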

You’d have thought Google would know better than to recommend including useless words in a Google search!