Artificial intelligence models developed by Microsoft and Alibaba have, for the first time, outperformed humans in a reading comprehension challenge.
The Stanford Question Answering Dataset (SQuAD) consists of a series of questions whose answers can be found within more than 500 Wikipedia articles.
Alibaba’s deep neural network model scored 82.440 on the ‘exact match’ part of the test, beating the human score of 82.304. Microsoft’s similar model scored slightly higher, at 82.650.
The scoreboard is a who’s who of corporates carrying out artificial intelligence research, featuring the likes of Google, IBM Research, Facebook AI Research, Salesforce Research, Tencent and Samsung.
Alibaba and Microsoft have been placed joint first in the ranking, although each company claims to have reached the better-than-human milestone first.
While Microsoft is listed as having registered its score on 3rd January and Alibaba two days later, Alibaba said those dates were when the companies submitted their models, not when test results were registered.
“It is our great honour to witness the milestone where machines surpass humans in reading comprehension,” said Luo Si, chief scientist for natural language processing at Alibaba’s Institute of Data Science and Technologies (iDST) in a statement. “We are thrilled to see NLP research has achieved significant progress over the year. We look forward to sharing our model-building methodology with the wider community and exporting the technology to our clients in the near future.”
Ming Zhou, assistant managing director of Microsoft Research Asia, said that despite the milestone, people are still, overall, much better than machines at comprehending the complexity and nuance of language.
“Natural language processing is still an area with lots of challenges that we all need to keep investing in and pushing forward,” he said. “This milestone is just a start.”
The big AI players are investing heavily in reading comprehension and response models.
Alibaba said it had been using the underlying technology to answer customer inquiries during its ‘Global Shopping Festival’ for a number of years.
Microsoft said it was applying earlier versions of the model to its Bing search engine.
“These tools also could let doctors, lawyers and other experts more quickly get through the drudgery of things like reading through large documents for specific medical findings or rarified legal precedent. The technology would augment their work and leave them with more time to apply the knowledge to focus on treating patients or formulating legal opinions,” the company wrote in a blog post.
Microsoft is also working on models that can handle follow-up questions.
“For example, let’s say you asked a system, ‘What year was the prime minister of Germany born?’ You might want it to also understand you were still talking about the same thing when you asked the follow-up question, ‘What city was she born in?’
“It’s also looking at ways that computers can generate natural answers when that requires information from several sentences. For example, if the computer is asked, ‘Is John Smith a US citizen?’ that information may be based on a paragraph such as, ‘John Smith was born in Hawaii. That state is in the US,’” Microsoft explained.