One of the hardest parts of building artificial intelligence that actually appears intelligent is getting it to read, hear, understand and process human language. Human language is deeply ambiguous: a sentence that seems perfectly clear to us, because we can apply context and lived experience, may have tens of thousands of possible syntactic structures, each of which drastically alters its meaning. That makes things much harder for a computer system.
For example, in the sentence “Alice drove down the street in her car” we can quickly establish what is meant, but a machine might not find it so simple, and could instead interpret the street as being located inside Alice’s car. That reading seems silly to us, but it isn’t grammatically impossible, and a machine without our lived experience would not be so quick to discard it as a possible interpretation.
To help AI overcome this problem and provide a solid foundation for Natural Language Understanding systems, Google has developed SyntaxNet, an open-source neural network framework that helps AI figure out all the different ways a sentence could be understood before scoring them and establishing the most likely option.
The most important part of SyntaxNet is its parser, Parsey McParseface (yes, really), which has been built on powerful machine learning algorithms that learn to analyze the linguistic structure of language, and can explain the functional role of each word in a given sentence. Google explained the process in a recent blog post:
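To make “the functional role of each word” concrete, a dependency parse links every word to the word it modifies, together with a grammatical-function label. The structure below is a hand-written illustration for a fragment of the earlier example sentence, using Stanford-style label names (nsubj for nominal subject, pobj for object of a preposition, and so on); it is not actual Parsey output.

```python
# Hypothetical dependency parse for "Alice drove down the street".
# Each entry is (word, head word, grammatical-function label).
# Hand-written for illustration; not real SyntaxNet output.
parse = [
    ("Alice",  "drove",  "nsubj"),  # Alice is the subject of "drove"
    ("drove",  "ROOT",   "root"),   # "drove" is the main verb of the sentence
    ("down",   "drove",  "prep"),   # the preposition modifies the verb
    ("the",    "street", "det"),    # determiner attached to "street"
    ("street", "down",   "pobj"),   # "street" is the object of "down"
]

# The head links form a tree rooted at the main verb, which is what
# lets downstream systems ask "who did what to whom?"
heads = {word: head for word, head, label in parse}
```

The ambiguity in the full sentence comes down to one of these links: attaching “in her car” to “drove” gives the intended reading, while attaching it to “street” gives the reading where the street is inside the car.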
“Instead of simply taking the first-best decision at each point, multiple partial hypotheses are kept at each step, with hypotheses only being discarded when there are several other higher-ranked hypotheses under consideration.”
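What Google is describing here is a beam search over the parser’s step-by-step decisions. The sketch below is an illustration only; the function name, scoring interface and beam width are invented for this example, not SyntaxNet’s API.

```python
# Minimal beam-search sketch (illustrative, not SyntaxNet code).
# At each step, every surviving partial hypothesis is extended with every
# possible action; only the beam_width highest-scoring hypotheses are kept.
def beam_search(score_step, actions, num_steps, beam_width=8):
    """score_step(history, action) returns the incremental score for
    taking `action` after the partial decision sequence `history`."""
    beam = [((), 0.0)]  # (partial decision sequence, cumulative score)
    for _ in range(num_steps):
        candidates = [
            (history + (action,), score + score_step(history, action))
            for history, score in beam
            for action in actions
        ]
        # A hypothesis is only discarded once enough higher-ranked
        # hypotheses exist to fill the beam.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:beam_width]
    return beam[0]  # highest-scoring complete hypothesis
```

In SyntaxNet, the role of `score_step` is played by a neural network trained so that the correct hypothesis ranks highly enough to survive on the beam rather than being pruned after one early mistake.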
According to Google, Parsey is the most accurate model of its kind in the world, boasting 94% accuracy on a standard benchmark, astoundingly close to the 96-97% accuracy of linguists trained for the same task. That figure applies only to well-formed text, but even on sentences randomly drawn from the web, Parsey still managed 90% accuracy.
The main problem Google is still working to overcome is ambiguity that requires real-world knowledge and contextual reasoning to rule out, though the team has made real progress in the area and is ambitiously aiming to “enable equal understanding of natural language across all languages and contexts.” That said, at its current level of accuracy SyntaxNet is already extremely useful for a variety of applications and could greatly improve systems like Google Now.
To read more about the science behind the project and the development of Parsey, you can find Google’s research paper here, or you can visit here to download Parsey and SyntaxNet and incorporate them into your own projects.
Image via Flickr © Niharb