(U-WIRE) LINCOLN, Neb. Like McDonald's with its secret sauce, Mehran
Sahami must be careful how much he tells people about how his employer manages
to successfully serve millions every day.
Sahami, a senior research scientist for Google.com, spoke to about 100 computer
science and engineering students at the University of Nebraska on Thursday about
the history of Web search engines and how they have been modified to better serve
an ever-growing number of Web users.
The lecture, titled "The Past, Present and Future of Web Information Retrieval,"
was sponsored by the Department of Computer Science and Engineering Colloquium
Series at the University of Nebraska-Lincoln.
Between 1999 and 2002, the number of Web users grew from 140 million to
320 million; the number of Web pages from 500 million to about 6 billion; and
the number of queries per day from 100 million to 500 million.
"Obviously, we need to build a system that will surpass those needs," Sahami
said.
The main objective for researchers at Google.com is not only to create a search
engine that makes finding the right information easier and faster for users, but
also to make it harder to deceive a search engine with spam, he said.
Older search engines ranked Web documents by how often a search term appeared in
the document, which encouraged spammers to create bogus Web pages stuffed with
certain words just to achieve a higher ranking in a search, he said.
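
The weakness Sahami described can be illustrated with a few lines of Python. This is a minimal sketch of ranking by raw term count, not any real search engine's code; the function name rank_by_term_frequency and the two sample pages are made up for illustration.

```python
from collections import Counter


def rank_by_term_frequency(documents, term):
    """Rank document names by how many times `term` appears in each one."""
    term = term.lower()
    counts = {
        name: Counter(text.lower().split())[term]
        for name, text in documents.items()
    }
    # Highest count first; ties are broken alphabetically for a stable order.
    return sorted(counts, key=lambda name: (-counts[name], name))


# Hypothetical pages: one genuine, one stuffed with the search term.
documents = {
    "honest_page": "a guide to growing tomatoes in a small garden",
    "spam_page": "tomatoes tomatoes tomatoes buy cheap tomatoes now",
}
ranking = rank_by_term_frequency(documents, "tomatoes")
```

Under this scheme the stuffed page lands first simply because it repeats the word more often, which is the incentive for spam that the talk pointed to.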
The first search engines also assumed Web pages were generally coherent, queries
were long and specific, and words were correctly spelled, Sahami said.
In fact, many Web pages are created by people who shouldn't be creating Web
pages, including 5-year-olds, he said. Queries, which used to be between 10
and 20 words long for other search engines, are down to about two. And
misspelling is rampant on the Internet.
"For 'Britney Spears' alone, there are more than 800 misspellings," he said.
In addition, a search engine must be able to complete a search quickly, he said.
"How often would you want to wait 10 minutes for a better ranking?" he asked.
"We can think of better (ways) to retrieve information from the Web, but if
it can't run it in half a second, we can't use it."
To run quickly and give people the information they want, Google.com uses a
sophisticated mix of ranking technologies that weigh the words in a search
and the sites that could match it, he said.
For example, the validity of a Web site can in part be determined by the quality
and number of links to it on the Internet, he said.
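
One way to picture link-based scoring is the short Python sketch below. It is an assumption-laden illustration, not the method Sahami presented: the function link_scores, the damping value of 0.85, the iteration count and the four-page link graph are all hypothetical. Each page repeatedly passes a share of its own score to the pages it links to, so a page gains both from having many links and from being linked to by high-scoring pages.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by the number and quality of links pointing at them.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical link graph: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_scores(links)
```

In this toy graph, page c scores highest because three pages link to it, and page a outranks b and d because its single incoming link comes from the high-scoring c.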
Google.com was founded in 1998 by Larry Page and Sergey Brin, who met when
they were graduate students at Stanford University in 1995.
The company has 12 offices worldwide and employs more than 1,000 people, including
32 roller hockey players and two massage therapists. In 1999, the company had
equity valued at $25 million, according to the company's Web site.
Leen-Kiat Soh, an assistant professor of computer science at UNL who is teaching
a course this semester on Web information retrieval, said he hoped the talk
would inspire his students to get excited about computer science research.
"I don't want them to think we're just making them suffer for the degree just
so they can earn money," Soh said. "They should see that research is used to
solve real problems and make money, too."
Rachael Seravalli