You know how on your Facebook feed, there is a “Memories” feature, where it reminds you of posts from previous years? A couple of things from last year rolled up on my feed today:
And yesterday, I came across a news story from Popular Science about Facebook’s new language translation system. The breakthrough, per this article, is this:
“The standard approach so far has been to use recurrent neural networks to translate text, which look at one word at a time and then predict what the output word in the new language should be... But the Facebook researchers tapped a different technique… which looks at words in groups instead of one at a time.”
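To make the contrast in that quote a little more concrete, here is a toy sketch in Python. It is not Facebook's system and it is not a neural network at all; it only shows the difference between handing a model one word at a time and handing it overlapping groups of neighboring words. The function names and the group size of 3 are assumptions made up for illustration.

```python
def one_at_a_time(words):
    """Yield each word on its own, as in the word-by-word approach."""
    for word in words:
        yield (word,)


def in_groups(words, size=3):
    """Yield overlapping groups of neighboring words (a sliding window).

    The window size is an arbitrary choice for this illustration.
    """
    for start in range(len(words) - size + 1):
        yield tuple(words[start:start + size])


sentence = "I am pulling your leg".split()

print(list(one_at_a_time(sentence)))  # five single-word pieces
print(list(in_groups(sentence)))      # three overlapping three-word pieces
```

In the grouped version, "pulling", "your", and "leg" arrive together, which is the kind of neighborhood an idiom needs in order to be recognized as one unit.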
I never loved mathematics as a kid, but I will say this: learning about algebraic functions changed every aspect of my life. Looking at one thing in the context of another shook me personally to my core and I found it to be applicable everywhere, whether I was looking at feminist literature, conversational Spanish, organic chemistry, the history of music, Greek philosophy, or even vocal performance in a choir. Heck, it can even be applied to conflict resolution. Isn't this the same as changing your perspective?
The core of it all? Taking one set of data, changing your center and applying some THING, and seeing the result in a new way.
When considering neural networks as the “thing” (the algorithm, the filter, the network to filter through the data), our world can become intelligent. We can look at one kind of data and see it as a function of another: voice recognition (sound to text, as in transcribing voice mails to text), language translation (say, Spanish to English), vision (image to data to act upon, as is required for, say, A^2*)… we are beginning to really look at things in terms of other things.
(Is the logical conclusion of this line of thinking a kind of singularity? If you keep multiplying everything in terms of everything else, do we become mystics? I suppose that’s a question for a different kind of blog, one that goes more into the philosophy realm. Maybe religion. It’s a question too big for me, in any case.)
What I do want to point out, though, is that there is still an ineffable art to translation, whether you’re thinking literally or metaphorically. The news story of Facebook applying convolutional neural networks (CNNs) to its translation feature is really about being able to look at content in a more holistic way, instead of word by word. At least that is a start.
My father has always loved Russian literature, and he gets more excited than anyone I know when a new translation of Anna Karenina or War and Peace comes out. He excitedly picks it up, reads it voraciously, and when he’s finished, he can put it down and feel as though he has read an entirely new book. He has also been known to quote this line from Heraclitus, “No man ever steps in the same river twice, for it's not the same river and he's not the same man.” (Yes, my dad taught philosophy before becoming a Unitarian Universalist minister. Hmm, maybe I should ask him about mysticism.) He believes the same principle applies to translation, and of course it makes sense that one person's translation is different from another's: each translator brings their own context to the original text, so the resulting translation is bound to be different.
In the same vein, if I were to ask a cross-section of any group of people to describe a painting, there may be some similarities in every description (“yellow”, “abstract”, “big”, “flower”, “rain”), but, unsurprisingly, there would also be vast differences in every person’s translation (“rushed”, “impressionistic”, “sad”, “giving”). It’s the same painting. Each person would bring their own context and interpretation to the description of the image.
There’s a very interesting paper published by Google, however, that shows the strides being made in this regard. It shows the following image:
(The paper that describes their methods is here: Show and Tell: A Neural Image Caption Generator.)
Amazing! Think of the implications for the visually impaired! Think of what it could mean for your thousands of photos of the first year of your child’s life that you keep meaning to go through!
What this recognition description doesn’t show, however, is the context: the artistry of the photograph, the cultures represented. “A group of people shopping at an outdoor market” may look entirely different in my mind's eye than it does to someone in Bangladesh or Canada or Puerto Rico.
We have such a long way to go.
Translated literally, word for word, the Spanish phrase Te tomo el pelo comes out in English as: You I take the hair. Translated into a sentence that makes sense in English using the implied grammar, it becomes I’m taking your hair. But this phrase has nothing to do with hair, just as the phrase I’m pulling your leg has nothing to do with pulling or legs. And yet I’m pulling your leg is probably the closest translation for the phrase.
Now suppose we know that ¡Te tomo el pelo! and I’m pulling your leg! both mean that I’m joking with you, and that the translator can recognize that Ik maak maar een grapje! means the same thing in Dutch. Depending on context, it will know to triangulate the information and translate the phrase as something about taking hair or pulling legs or telling jokes, or as whatever other phrase a given language has that means KIDDING. The next time the translator is asked to translate one of these phrases, it will recognize, without even needing to know each part of it, that the phrase means, as we said in the 80s, NOT!
And how does it decide which one to pick? KIDDING or NOT or I am joking or I’m pulling your leg? Which is “correct”?
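The triangulation idea can be sketched as a lookup that goes from an idiom, to a language-neutral concept, and back out to an idiom in the target language. This is a minimal sketch under stated assumptions, not how any real translation system works: the phrase table, the concept label, and the function names are all hypothetical, and the idioms are stored in their dictionary forms ("tomar el pelo", "pull someone's leg", "een grapje maken"). It also dodges the "which one to pick" question by storing exactly one phrase per language.

```python
# Hypothetical phrase table: (language, idiom in dictionary form) -> shared concept.
IDIOM_TO_CONCEPT = {
    ("es", "tomar el pelo"): "KIDDING",
    ("en", "pull someone's leg"): "KIDDING",
    ("nl", "een grapje maken"): "KIDDING",
}


def build_reverse_table(table):
    """Map (concept, language) back to the one idiom stored for that language."""
    return {(concept, lang): phrase for (lang, phrase), concept in table.items()}


CONCEPT_TO_IDIOM = build_reverse_table(IDIOM_TO_CONCEPT)


def translate_idiom(phrase, source, target):
    """Translate through the shared concept; return None if anything is unknown."""
    concept = IDIOM_TO_CONCEPT.get((source, phrase.strip().lower()))
    if concept is None:
        return None
    return CONCEPT_TO_IDIOM.get((concept, target))


print(translate_idiom("tomar el pelo", "es", "en"))
print(translate_idiom("Een grapje maken", "nl", "es"))
print(translate_idiom("tomar el pelo", "es", "fr"))  # no French entry in the table
```

Notice what the sketch cannot do: it has no sense of context, tone, or culture, so it would give the same answer for a teasing friend and a formal letter. That gap is exactly the hard part.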
(And for even more fun, take a look at this graph! And THIS is why it takes me a while to write a post. So many interesting things to read up on.)
I guess it all comes down to the question I posed the other day, about AI having “intuition”. So much more data is processed by our human brains than we can even fathom; we work on hunches and gut feelings sometimes more than we do on facts we can point to. Add the element of culture? It’s nigh on impossible to believe that true, perfect translation will ever be possible.
Language is a system based on so much more than the flat-out meaning of each word. For the geeks among us, “Darmok” is a perfect demonstration of this. (For the non-geeks, do a quick search on “Darmok and Jalad at Tanagra”, and you’ll see what I mean. Yes, I really am referring to the Star Trek episode.) This episode points out that the fancy “universal translators” of the Star Trek universe are useless without the context of the culture, so even if awlekmorr means cat, it may not really mean cat.
It may mean that time the god of war was turned into a small furry animal and he liked that form so much that he remained in cat form and so when someone is faced with adversity, it could be actually a blessing in disguise, so look on the bright side already.
It could also mean, you know, cat. Without context, you may never know.
P.S. For more information about Cadence products relating to neural networks, check out our Tensilica products!
* A^2 = autonomous automobiles