The Skype Translator, announced in May by Microsoft and Skype, may truly be the dawn of a new age of human history – it's literally a facet of science fiction. For a recent reminder, the dream of a device that automatically translates live speech is alive and well in the Mass Effect series.
Of course, translation is already a finicky business, as you can tell after 15 seconds with Google Translate. And now we're going to try translating voice? How would a machine pick up on tone or inflection? Think about the difference, when voiced, between “You're picking up the kids?” and “You're picking up the kids!” What about all the vocal nonsense words – the “um”s and “ah”s and “mm”s?
Science: it's amazing
At Microsoft Research, the big breakthrough grew out of this problem. First, they came up with a learning system that divides language into short word sequences known as n-grams, n being the number of words in each sequence. The approach is known as phrasal statistical machine translation (phrasal SMT): phrases that carry the same meaning are grouped together, and Microsoft pushed the system until it could map between the phrases of different languages with great accuracy.
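To make the idea of an n-gram concrete, here is a minimal Python sketch using the common definition (every run of n consecutive words in a sentence). It is only an illustration, not Microsoft's system, and the function name `ngrams` is made up for this example.

```python
def ngrams(text, n):
    """Return every run of n consecutive words in text (hypothetical helper)."""
    words = text.lower().split()
    # Slide a window of n words across the sentence, one word at a time.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Two-word sequences ("bigrams") from a short sentence.
for gram in ngrams("you are picking up the kids", 2):
    print(" ".join(gram))
```

A translation system would then learn which sequences in one language tend to line up with which sequences in another; that learning step is the hard part and is not shown here.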
Then they found that the system couldn't account for style, a complaint often made about machine translation. Here's where social media comes in: in an attempt to overturn that axiom, the researchers came up with a social media translator that normalises each platform's stylistic differences into the same kind of data the system could already process. It's a good test bed because every social platform has its own style, and a machine that can learn each platform's quirks is a small step towards understanding the quirks of human speech.
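As a rough picture of what normalising social media text might involve, the Python sketch below lowercases a message, collapses stretched-out letters, drops punctuation and swaps a few informal spellings for standard ones. Everything in it is an assumption made for illustration: the `SLANG` table, the `normalise` function and the rules themselves are hypothetical, and the researchers' system presumably learns such mappings from data rather than from a hand-written list.

```python
import re

# Hypothetical lookup table of informal spellings (illustration only).
SLANG = {"u": "you", "r": "are", "gr8": "great", "2moro": "tomorrow"}


def normalise(text):
    """Turn an informal message into plainer text (a toy rule-based sketch)."""
    text = text.lower()
    # Collapse any character repeated three or more times: "sooooo" -> "so".
    text = re.sub(r"(\w)\1{2,}", r"\1", text)
    # Keep only word-like tokens, which drops punctuation such as "!!!".
    words = re.findall(r"[a-z0-9']+", text)
    # Replace known informal spellings and leave every other word unchanged.
    return " ".join(SLANG.get(word, word) for word in words)


print(normalise("U r sooooo gr8!!!"))
```

Once messages from different platforms are reduced to a common form like this, the same downstream translation step can be applied to all of them.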
For now, researchers say normalisation improved translation by six per cent, but there's still much work to do as they adapt their findings from textual social media to the original goal of voice translation. Readers may be able to judge its effectiveness for themselves later this year when Skype Translator goes into beta testing.