Posts

The Google Code

Last week, the Internet lit up with the news that a “secret code” had been discovered in Google Translate. But was it really a secret message, or just another bad translation?

Much of the time, Google Translate will provide an imperfect but serviceable translation. However, sometimes it comes up with automatically generated translations that are so bad, they seem uncanny.

The story of the “secret code” was originally published on the Krebs On Security blog. A few months back, researchers from a couple of different security firms approached computer security reporter Brian Krebs with an intriguing discovery: putting the traditional “Lorem Ipsum” placeholder text into Google Translate yielded some very strange, politically tinged results. For example, Google translated “lorem ipsum” without capital letters as “China.” “Lorem Ipsum”, capitalized, produced “NATO.” Check out his blog post for the entire list of seemingly not-quite-random translations.

The researchers wondered if, perhaps, they had stumbled upon a secret code. Was it used by spies? Activists? Hackers? Perhaps it was meant to be a tunnel through China’s “Great Firewall.”  The truth is out there…but it will be a lot more difficult to uncover it now that Google has fixed the translations, which it did almost immediately after being notified of the issue.

Unfortunately, the most likely explanation is also the most mundane: it’s simply a bad machine translation caused by inadequate, poor-quality data.

As ZDNet explained, because lorem ipsum is used as a placeholder,

“[T]here are millions of examples but very few actual translations of them; instead, the placeholder text will get matched up with documents that just look similar to the algorithm but aren’t actually connected. That would explain why you got different translations if you capitalised the words differently or duplicated them, resulting in translations like China, the Internet, NATO, the Company, China’s Internet, Business on the Internet, Home Business, Russia might be suffering, he is a smart consumer, the main focus of China, department and exam. Those are all common phrases – and you might recognise some of them from spammy web sites promising thousands of dollars for working from home or offering you answers to exam questions.”

Additionally, the standard lorem ipsum text is only one step above gibberish, anyway.
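
The mechanism ZDNet describes can be pictured with a toy example. The sketch below is not how Google Translate actually works; it is a minimal, case-sensitive frequency table built from invented page pairings (the `ALIGNED_PAGES` data and the function names are hypothetical, with the output phrases borrowed from the examples above). It only shows how unrelated pairings plus a change in capitalization could produce different "translations."

```python
from collections import Counter, defaultdict

# Hypothetical page pairings: placeholder text that an algorithm has wrongly
# matched with unrelated target-language pages. The data is invented.
ALIGNED_PAGES = [
    ("lorem ipsum", "China"),
    ("lorem ipsum", "China"),
    ("lorem ipsum", "the Internet"),
    ("Lorem Ipsum", "NATO"),
    ("Lorem Ipsum", "NATO"),
    ("Lorem Ipsum", "the Company"),
]


def build_table(pairs):
    """Count how often each source phrase was paired with each target phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1  # keys are case-sensitive on purpose
    return table


def translate(table, phrase):
    """Return the most frequently paired target, or the phrase itself if unseen."""
    candidates = table.get(phrase)
    if not candidates:
        return phrase
    return candidates.most_common(1)[0][0]


table = build_table(ALIGNED_PAGES)
print(translate(table, "lorem ipsum"))  # lower-case lookup
print(translate(table, "Lorem Ipsum"))  # capitalized lookup hits a different entry
```

Because the table treats differently capitalized strings as different keys, each variant picks up whatever text it happened to be paired with most often, which is the kind of accident the quote describes.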

For her part, Kraeh3n, the researcher who discovered the “code,” told Krebs that she doesn’t believe it’s random:

“Translate [is] designed to be able to evolve and to learn from crowd-sourced input to reflect adaptations in language use over time,” Kraeh3n said. “Someone out there learned to game that ability and use an obscure piece of text no one in their right mind would ever type in to create totally random alternate meanings that could, potentially, be used to transmit messages covertly.”

Meanwhile, TechCrunch is reporting that the odd translations were part of 1o57’s Defcon Badge puzzle.

What do you think?

Photo Credit: Attribution Some rights reserved by pkwahme

Newspaper Discovers Limits of Google Translate

In the United States, Spanish-speaking Latinos are a rapidly growing demographic. Naturally, some news organizations cater to them with Spanish-language editions, especially online.

However, according to Fox News, when the Hartford Courant decided to follow suit, they did not hire a translator, choosing instead to run all of their articles through Google Translate.

The results were about what you’d expect: embarrassing.

Former Hartford Courant columnist Bessy Reyna collected some of the most ridiculous examples of poor translation on her blog. Here are a couple of the juiciest nuggets of failure on display:

  • “El hombre florero Over Head Smashed novia, policía dice” Literal translation: “The man flower vase Over Head Smashed Girlfriend, police said”
  • “Este mujer Hartford acusado de apuñalar con el hombrepelador de patatas” which literally reads: “This woman Hartford Accused of stabbing the man with potato peeler.”

To address the criticism, the paper issued the following disclaimer:

“However, readers should be aware that due to limitations in the Google software some of the translations of the English headlines and articles don’t always translate accurately word-for-word into Spanish.”

Duh. On one level, it’s understandable that a local paper might not have the resources to devote to hiring a full-time Spanish translator. However, simply plugging all of their content into Google Translate appears to be counterproductive. According to Bessy Reyna, Latinos perceived the error-ridden translations as insulting, even offensive:

“Their reactions ranged from “This isn’t even Spanglish” to “Did you see the one today about Norwich? It’s to laugh and cry at the same time.” Others thought it was simply lack of respect and yet another way to humiliate the Latino community.”

The truth is, no matter what business you’re in, if you’re trying to communicate with customers in another language, there’s no substitute for a translator who knows both languages inside and out. It’s impossible to put your best foot forward using Google Translate, or any other machine translation program for that matter!

Do you think newspapers should rely on Google Translate?

A Translation Showdown: Man vs Machine Translation

Computer scientists began trying to solve the problem of machine translation in the 1950s.  Since then, both the availability and quality of machine translation have improved tremendously. But in the battle of human translation vs machine translation, are humans now expendable?

Some scientists working on machine translation claim that with recent improvements, algorithms are almost as good at translation as humans.  And when the subject of “jobs that will soon be taken over by robots” comes up, futurists almost always put “translation” in the crosshairs.

But what happens when machines take on human translators? Earlier this month, Sejong Cyber University and the International Interpretation and Translation Association of Korea decided to find out. Three machine translation programs went up against a group of human translators. It was a translation showdown: human translation vs machine translation.

Man versus machine: the translation industry’s version of the famous contest between John Henry and the steam-powered hammer. Guess who won?

Cheeseburgery Hamburgers

On the FT’s blogs today Tony Barber wrote this article about his recent experience of machine translation.

After hearing a recommendation that bloggers should use computerised translation to provide foreign language replicas of their own blogs, he decided to put Google Translate to the test.

I won’t spoil it for you (the full article is here), but there is one section that made us laugh for a good fifteen minutes. Taken from a Polish newspaper (Gazeta Wyborcza) and translated into English using Google Translate, it says:

“A sign of the collapse of the French culture of the restaurant is visible on the streets of Paris rash of quick-service bar, offering generally pogardzane a few years ago and cheeseburgery hamburgers.”

Cheeseburgery Hamburgers – brilliant.

BTW – I’ve been talking about how poor machine translation is for a long time (actually wrote a paper on it at uni).

Translation Fails

Magazine Illustrates Language Expert’s Article With Bungled Translations

Adam Wooten, a translation expert with Globalization Group, was pleased when a local magazine published an article he wrote about the importance of obtaining accurate, professional translations for companies doing business overseas.

He became much less pleased, however, when he received a copy of the magazine and skimmed over the article. Someone at the magazine had decided to “enhance” the article by translating the title, “Lost Into Translation”, into several different languages. In the Deseret News, Wooten writes:

“I became concerned when I saw large, bright, red text splashed across both pages in six languages. Where did these multilingual phrases originate? I knew Globalization Group, the translation company where I work, had not provided any translations…something about them did not look right.”

Google Translate

Can’t I just use Google Translate?

I was asked this question today.

It wasn’t the first time. If I’m honest, it annoyed me that I should have to answer it at all. But I guess if you don’t work in the language industry, you might perceive Google as a trustworthy company that can do no wrong, so you could be forgiven for thinking that its machine translation would be equally reliable. I’m answering it here on the language blog, to share with anyone who may be guilty of having the same thoughts.

It’s surprising (to me, at least) how many times I hear things like:

  • So basically you do the same as Google Translate?
  • Why should I pay you anything when I can get Google Translate to do it for free?
  • Do you use Google Translate for all your translation?
  • Do you just have one big computer who does all the translation?

(The answer is NO to all of the above.)

Google Translate Now Translates Latin Ad Libitum

Last week, Google added another language to its popular Google Translate service, and Latin students everywhere breathed a sigh of relief. Yes, Google Translate now decodes Latin. The announcement came via a blog post written entirely in Latin by engineer Jakob Uszkoreit. Show-offs!

Google expects the Latin version of Google Translate to be quite popular with students who are studying the language, as well as with people studying philosophical and other texts originally written in Latin.

The fact that Latin is a dead language should make Google’s machine translation more accurate, as the company explained in its blog (Latin translation from the Telegraph):

“Unlike any of the other languages Google Translate supports, Latin offers a unique advantage: most of the text that will ever be written in Latin has already been written, and a comparatively large part of it has been translated in to other languages. We use these translations, found in books and on the web, to train our system.”

Google Translate: Now in Esperanto

Google Translate now comes in 64 flavors. The latest addition to the family is Esperanto. Google announced the news in a blog post last week.

Of course, the obvious question inspired by the announcement is, “Why Esperanto?” After all, it’s not the official language of any country, very few children grow up speaking it, and nobody speaks it exclusively.

If you’re unfamiliar with the language, here’s some background. Esperanto is a constructed language developed in the late 19th century by L. L. Zamenhof. It was designed to be easy to learn, combining and incorporating different aspects of various Indo-European languages.

According to Zamenhof’s personal letters, the creation of Esperanto was a dream that he had nurtured since he was a child:

“The place where I was born and spent my childhood gave direction to all my future struggles. In Bialystok the inhabitants were divided into four distinct elements: Russians, Poles, Germans and Jews; each of these spoke their own language and looked on all the others as enemies. In such a town a sensitive nature feels more acutely than elsewhere the misery caused by language division and sees at every step that the diversity of languages is the first, or at least the most influential, basis for the separation of the human family into groups of enemies.”

The desire to unite people around the world across language barriers was what inspired him to create Esperanto, and is also what inspired Google to add Esperanto to its machine translation repertoire.

Interestingly, the same characteristics that make Esperanto easy for humans to learn also made it easy for Google Translate to pick up. The Google Translate team explained on their blog:

“As we know from many experiments, more training data (which in our case means more existing translations) tends to yield better translations. For Esperanto, the number of existing translations is comparatively small. German or Spanish, for example, have more than 100 times the data; other languages on which we focus our research efforts have similar amounts of data as Esperanto but don’t achieve comparable quality yet.”

Practically speaking, though, nobody is sure exactly how many people actually speak Esperanto. Per Wikipedia, estimates range from 10,000 to 2,000,000. The underlying idea behind Esperanto is commendable, but it’s still a relatively small linguistic niche.

If you’re trying to reach customers on a global basis, other languages would probably be a better choice to focus on at first. And remember: a skilled human translator will get you much better results than Google Translate’s admittedly less-than-high-quality translations!

Image Source: Attribution Some rights reserved by eliazar

11 Google Translate Facts You Should Know

Google Translate turned 10 years old last week. With the power of the Google empire behind it, it’s the world’s most popular machine translation tool. At K International, we can’t help but see how Google Translate has helped people communicate when professional translation is unavailable. However, we are also familiar with the consequences of relying on it too heavily. To celebrate our decade-long love/hate relationship with this service, here are 11 Google Translate facts you should know.

1. More than 500 million people use Google Translate.

According to the Google Translate blog, the service has more than 500 million users. That’s close to the entire population of the European Union, which has 508 million inhabitants. When Google released Google Translate in 2006, the number of users was measured in the hundreds.

2. Google Translate translates more than 100 billion words per day.

That’s roughly equivalent to a stack of 128,000 Bibles, every single day.

3. Google Translate now supports 103 languages.

When it was launched 10 years ago, it only supported two: English and Arabic.
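
The Bible comparison in fact 2 can be sanity-checked with a quick back-of-the-envelope calculation. The sketch below assumes about 783,000 words per Bible, a commonly cited approximate word count for the King James Version; the post does not say which edition or word count it used, so that constant is an assumption rather than the author's figure.

```python
# "More than 100 billion words per day", taken from the post.
WORDS_TRANSLATED_PER_DAY = 100_000_000_000

# Assumption: approximate word count of the King James Version.
ASSUMED_WORDS_PER_BIBLE = 783_000


def bibles_per_day(words_per_day: int, words_per_bible: int) -> float:
    """Return how many Bible-length texts the daily word volume amounts to."""
    return words_per_day / words_per_bible


estimate = bibles_per_day(WORDS_TRANSLATED_PER_DAY, ASSUMED_WORDS_PER_BIBLE)
print(f"{estimate:,.0f} Bibles per day")
```

Under that assumption the estimate comes out just under 128,000, which matches the post's figure once rounded to the nearest thousand.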

Poetry is what gets lost in translation

Google Translate to Tackle Poetry

Robert Frost once said, “Poetry is what gets lost in translation.” However, according to NPR, that hasn’t stopped Google from attempting to translate poetry using their Google Translate machine translation service.

Google research scientist Dmitriy Genzel told NPR that he considers effectively translating poetry to be the ultimate challenge, saying the attempt is “what we call AI complete. Which means it’s as difficult as anything we can attempt in artificial intelligence.”

What makes it so difficult? According to Carl Sandburg, “Poetry is an echo, asking a shadow to dance.” How do you translate that? It’s a challenge even for knowledgeable human translators to create a translation that captures both the rhythm of a poem and the layers of meaning it contains.