Google’s huge ‘Wikilinks’ corpus could help computers understand human language


Google has released an extremely large dataset that could help developers build software that accurately interprets human language. Known as the Wikilinks Corpus, the collection comprises more than 40 million individual links from web pages to Wikipedia articles, known as "mentions." Analyzing the context of each mention alongside the content of the destination article should allow engineers to more accurately determine the meanings of ambiguous words.
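The core idea can be sketched as a toy context-overlap scorer: score each candidate sense of an ambiguous word by how many words its description shares with the text surrounding the mention. The sense labels, word bags, and function names below are illustrative assumptions, not the corpus's actual format or Google's method.

```python
from collections import Counter

# Hypothetical stand-ins for Wikipedia article text: each candidate sense
# of an ambiguous word is reduced to a bag of words.
SENSES = {
    "Dodge (automobile brand)": Counter(
        "car truck vehicle engine brand american automaker".split()
    ),
    "dodge (verb)": Counter(
        "avoid evade sidestep escape move quickly".split()
    ),
}

def disambiguate(mention_context: str) -> str:
    """Return the sense whose word bag overlaps most with the mention's context."""
    context = Counter(mention_context.lower().split())
    # Counter & Counter keeps the minimum count of words present in both bags.
    return max(SENSES, key=lambda s: sum((context & SENSES[s]).values()))

print(disambiguate("the new Dodge truck has a powerful engine"))
# → Dodge (automobile brand)
```

A corpus of 40 million real mentions would supply the context/article pairs that make a scorer like this (or a statistical model trained on the same signal) far more accurate than a handful of hand-picked word bags.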

Humans are “amazingly good” at disambiguation

As a post on Google’s Research Blog points out, humans are “amazingly good” at distinguishing between meanings — for instance, “Dodge” the car brand and the verb “to dodge.” This is at least partly attributable to the massive banks of…

The Verge – All Posts