Filed under: Search | Tags: Deep Peep, Deep Web, Google, Prof. Juliana Freire
Prof. Juliana Freire at the University of Utah is working on an ambitious project called DeepPeep (www.deeppeep.org) that eventually aims to crawl and index every database on the public Web. Extracting the contents of so many far-flung data sets requires a sophisticated kind of computational guessing game.
“The naïve way would be to query all the words in the dictionary,” Ms. Freire said. Instead, DeepPeep starts by posing a small number of sample queries, “so we can then use that to build up our understanding of the databases and choose which words to search.”
Based on that analysis, the program then fires off automated queries in an effort to dislodge as much data as possible. Ms. Freire claims that her approach retrieves more than 90 percent of the content stored in any given database. Her work has recently attracted overtures from one of the major search engine companies.
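To make the idea of starting from a few sample queries and then choosing further search words more concrete, here is a minimal sketch in Python. It is not DeepPeep's actual algorithm, which the article does not describe in detail. The toy database, the `search` function standing in for a site's search form, and the greedy "most frequent word seen so far" heuristic are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stand-in for a database hidden behind a search form.
TOY_DATABASE = {
    1: "used honda civic sedan low mileage",
    2: "used toyota corolla sedan one owner",
    3: "new honda accord hybrid sedan",
    4: "toyota tacoma pickup truck four wheel drive",
    5: "ford ranger pickup truck low mileage",
    6: "vintage ford mustang convertible",
}


def search(term):
    """Simulate a keyword search form: return the records containing `term`."""
    return {rid: text for rid, text in TOY_DATABASE.items()
            if term in text.split()}


def adaptive_crawl(seed_terms, max_queries=10):
    """Issue seed queries, then pick later queries from words in the results.

    Returns the retrieved records and the list of terms issued, in order.
    """
    retrieved = {}            # record id -> text collected so far
    term_counts = Counter()   # word frequencies across retrieved records
    issued = []               # terms already sent to the search form
    pending = list(seed_terms)

    while len(issued) < max_queries:
        if pending:
            term = pending.pop(0)
        else:
            unused = [t for t in term_counts if t not in issued]
            if not unused:
                break  # no unseen vocabulary left to try
            # Assumed heuristic: most frequent word first, ties alphabetical.
            term = min(unused, key=lambda t: (-term_counts[t], t))
        issued.append(term)

        # Learn new vocabulary only from records not seen before.
        for rid, text in search(term).items():
            if rid not in retrieved:
                retrieved[rid] = text
                term_counts.update(text.split())

    return retrieved, issued


if __name__ == "__main__":
    records, terms = adaptive_crawl(["sedan"], max_queries=50)
    print(f"retrieved {len(records)} of {len(TOY_DATABASE)} records")
    print("queries issued:", terms)
```

The point of the sketch is that the crawler never needs a dictionary: every query after the seed is a word it discovered in earlier results. A real crawler would not know the database's size, would have to parse result pages, and would likely use a better selection rule than raw frequency.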
As the major search engines start to experiment with incorporating Deep Web content into their search results, they must figure out how to present different kinds of data without overcomplicating their pages. This poses a particular quandary for Google, which has long resisted the temptation to make significant changes to its tried-and-true search results format.