Tuesday, 26 August 2014

Google's fact-checking bots build vast knowledge bank



Automation replaces human effort: Google builds the world's largest knowledge base

Author | Published 26 August 2014 | Categories: Google, Internet
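Search giant Google is building an automated knowledge base, Knowledge Vault, that uses algorithms to crawl information across the web and machine learning to consolidate it into knowledge. Combining breadth with accuracy, it lets computers and smartphones understand human questions, and could eventually change how Google answers them: from a list of search results to a single, clear answer.
Knowledge Vault's predecessor is the Knowledge Graph, which Google launched in 2012: a database of structured information that relied on crowdsourcing to grow. Its content is drawn from Wikipedia, the CIA's World Factbook and the collaborative knowledge base Freebase, and covers everything from celebrities to events, totalling 500 million entries and 3.5 billion facts. Google eventually found that human effort has its limits and that growth in the knowledge base had stalled, so it changed course and decided to replace manual work with automated collection.
So far Knowledge Vault has gathered 1.6 billion facts, of which 271 million are rated as confident facts: by cross-referencing new facts against what it already knows, Google's model gives them a more than 90 per cent chance of being true. Knowledge Vault does not yet match the Knowledge Graph in size, but because it expands automatically it is expected to overtake it soon and become the richest knowledge base in the world.
Besides analysing the text of web pages for facts to feed its database, Google can also reach data that is not normally visible on the surface, such as Amazon's product sales figures or the number of people who have viewed a product. Tom Austin, a technology analyst at Gartner, says that several of the world's largest technology companies, including Google, Microsoft, Facebook, Amazon and IBM, are building similar knowledge bases and tackling enormously large and complex problems. A machine with access to all of human knowledge, he says, will be far more intelligent than today's voice assistants; in the near future we will see a priority inbox that finds the 10 most important emails and then handles the rest automatically, without human help.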
(Lead image: Robert Scoble, CC BY 2.0)

Google's fact-checking bots build vast knowledge bank

The search giant is automatically building Knowledge Vault, a massive database that could give us unprecedented access to the world's facts
GOOGLE is building the largest store of knowledge in human history – and it's doing so without any human help.
Instead, Knowledge Vault autonomously gathers and merges information from across the web into a single base of facts about the world, and the people and objects in it.
The breadth and accuracy of this gathered knowledge is already becoming the foundation of systems that allow robots and smartphones to understand what people ask them. It promises to let Google answer questions like an oracle rather than a search engine, and even to turn a new lens on human history.
Knowledge Vault is a type of "knowledge base" – a system that stores information so that machines as well as people can read it. Where a database deals with numbers, a knowledge base deals with facts. When you type "Where was Madonna born" into Google, for example, the place given is pulled from Google's existing knowledge base.
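To make the distinction concrete, here is a minimal sketch of a knowledge base that stores facts as subject-predicate-object triples. It is an illustration only, not Google's design: the class `TinyKnowledgeBase`, its methods and the predicate names `born_in` and `occupation` are all hypothetical.

```python
from collections import defaultdict


class TinyKnowledgeBase:
    """Toy fact store: each fact is a (subject, predicate, object) triple."""

    def __init__(self):
        # Maps (subject, predicate) -> set of objects.
        self._facts = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Record one fact."""
        self._facts[(subject, predicate)].add(obj)

    def query(self, subject, predicate):
        """Return all known objects for a subject and predicate, sorted."""
        return sorted(self._facts.get((subject, predicate), set()))


kb = TinyKnowledgeBase()
kb.add("Madonna", "born_in", "Bay City, Michigan")
kb.add("Madonna", "occupation", "singer")

# A question such as "Where was Madonna born" becomes a lookup on a fact.
print(kb.query("Madonna", "born_in"))
```

A query for a fact the store has never seen simply returns an empty list, which is where an automated system would need to go looking for new sources.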
This existing base, called Knowledge Graph, relies on crowdsourcing to expand its information. But the firm noticed that growth was stalling; humans could only take it so far.
So Google decided it needed to automate the process. It started building the Vault by using an algorithm to automatically pull in information from all over the web, using machine learning to turn the raw data into usable pieces of knowledge.
Knowledge Vault has pulled in 1.6 billion facts to date. Of these, 271 million are rated as "confident facts", to which Google's model ascribes a more than 90 per cent chance of being true. It does this by cross-referencing new facts with what it already knows.
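The idea of a "confident fact" can be sketched as a simple threshold on a probability score. The article does not describe how Google's model computes that score, so the probabilities below are made-up illustrative values, and the names `CANDIDATE_FACTS` and `confident_facts` are hypothetical.

```python
# Each candidate fact carries a model-assigned probability of being true.
# The probabilities here are invented for illustration, not real output.
CANDIDATE_FACTS = [
    ("Elvis Presley", "born_in", "Tupelo, Mississippi", 0.97),
    ("Elvis Presley", "born_in", "Memphis, Tennessee", 0.20),
    ("Madonna", "born_in", "Bay City, Michigan", 0.93),
]


def confident_facts(candidates, threshold=0.9):
    """Keep only the triples whose probability exceeds the threshold."""
    return [
        (subject, predicate, obj)
        for subject, predicate, obj, probability in candidates
        if probability > threshold
    ]


for fact in confident_facts(CANDIDATE_FACTS):
    print(fact)
```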
"It's a hugely impressive thing that they are pulling off," says Fabian Suchanek, a data scientist at Télécom ParisTech in France.
Google's Knowledge Graph is currently bigger than the Knowledge Vault, but it only includes manually integrated sources such as the CIA Factbook.
Knowledge Vault offers Google fast, automatic expansion of its knowledge – and it's only going to get bigger. As well as the ability to analyse text on a webpage for facts to feed its knowledge base, Google can also peer under the surface of the web, hunting for hidden sources of data such as the figures that feed Amazon product pages.
Tom Austin, a technology analyst at Gartner in Boston, says that the world's biggest technology companies are racing to build similar vaults. "Google, Microsoft, Facebook, Amazon and IBM are all building them, and they're tackling these enormous problems that we would never even have thought of trying 10 years ago," he says.
The potential of a machine system that has the whole of human knowledge at its fingertips is huge. One of the first applications will be virtual personal assistants that go way beyond what Siri and Google Now are capable of, says Austin.
"Before this decade is out, we will have a smart priority inbox that will find for us the 10 most important emails we've received and handle the rest without us having to touch them," Austin says. Our virtual assistant will be able to decide what matters and what doesn't.
Other agents will carry out the same process to watch over and guide our health, sorting through a knowledge base of medical symptoms to find correlations with data in each person's health records. IBM's Watson is already doing this for cancer at Memorial Sloan Kettering Hospital in New York.
Knowledge Vault promises to supercharge our interactions with machines, but it also comes with an increased privacy risk. The Vault doesn't care if you are a person or a mountain – it is voraciously gathering every piece of information it can find.
"Behind the scenes, Google doesn't only have public data," says Suchanek. It can also pull in information from Gmail, Google+ and YouTube. "You and I are stored in the Knowledge Vault in the same way as Elvis Presley," Suchanek says.
Google researcher Kevin Murphy and his colleagues will present a paper on Knowledge Vault at the Conference on Knowledge Discovery and Data Mining in New York on 25 August.
As well as improving our interactions with computers, large stores of knowledge will be the fuel for augmented reality, too. Once machines get the ability to recognise objects, Knowledge Vault could be the foundation of a system that can provide anyone wearing a heads-up display with information about the landmarks, buildings and businesses they are looking at in the real world. "Knowledge Vault adds local entities – politicians, businesses. This is just the tip of the iceberg," Suchanek says.

Knowledge vault

Richer vaults of knowledge will also change the way we study human society. "This is the most visionary thing," says Suchanek. "The Knowledge Vault can model history and society."
Google already has a way to track mentions of names over time using historical texts, measuring the popularity of Albert Einstein vs Charles Darwin, for instance. By adding knowledge bases – which know the gender, age and place of birth of myriad people – historians would be able to track more in-depth questions, such as the popularity of female singers over time.
Suchanek has already carried out a version of this kind of data-driven history. By combining a knowledge base called YAGO with data from French newspaper Le Monde, he was able to show how the gender gap in French politics changed over time. This was only possible because YAGO knows the gender of every French politician, and can apply that knowledge to names mentioned in Le Monde. He will present the work at the Very Large Databases Conference in Hangzhou, China, in September.
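The general approach can be sketched as a join between a knowledge base and dated name mentions. This is not Suchanek's actual method or data: the names, years and counts below are placeholders, and `KB_GENDER`, `MENTIONS` and `female_share_by_year` are hypothetical.

```python
from collections import Counter, defaultdict

# Placeholder knowledge-base entries: person name -> gender.
KB_GENDER = {
    "Politician A": "female",
    "Politician B": "male",
    "Politician C": "female",
}

# Placeholder (year, name) mentions, standing in for a newspaper archive.
MENTIONS = [
    (1990, "Politician B"), (1990, "Politician B"), (1990, "Politician A"),
    (2010, "Politician A"), (2010, "Politician C"), (2010, "Politician B"),
    (2010, "Unknown Person"),
]


def female_share_by_year(mentions, kb_gender):
    """Return, per year, the share of resolved mentions that refer to women."""
    counts = defaultdict(Counter)
    for year, name in mentions:
        gender = kb_gender.get(name)
        if gender is None:
            continue  # Names missing from the knowledge base are skipped.
        counts[year][gender] += 1
    return {
        year: by_gender["female"] / sum(by_gender.values())
        for year, by_gender in sorted(counts.items())
    }


print(female_share_by_year(MENTIONS, KB_GENDER))
```

The text alone only supplies names and dates; the gender column exists only because the knowledge base fills it in, which is why the analysis depends on the knowledge base covering every name of interest.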
It might even be possible to use a knowledge base as detailed and broad as Google's to start making accurate predictions about the future based on analysis and forward projection of the past, says Suchanek.
"This is an entirely new generation of technology that's going to result in massive changes – improvement in how people live and have fun, and how they make war," says Austin. "This is a quantum leap."
This article appeared in print under the headline "Welcome to the oracle"
