Lehigh University
Web spam's worst enemy

Brian Davison, Ph.D., assistant professor of computer science and engineering

When Brian Davison was a graduate student back in the late 1990s, he and his colleagues worked to build a search engine. Whenever it ran a search, however, the good results were interspersed with links to advertisements or links that redirected the user to web sites unrelated to the search.

Today, this problem is known as web spam, and with the backing of Microsoft’s Live Labs, Davison, assistant professor of computer science and engineering, is web spam’s greatest enemy.

Baoning Wu, a Ph.D. candidate from Nanjing, China, works closely with Davison on cutting-edge research to reduce and eventually eliminate web spam. “Our goal is to remove as much web spam as possible from what the average user sees,” says Davison. “We would eventually like to either make web spamming of no value or make the effort to get that value so high that it's not worth it anymore.

"Baoning’s dissertation work focuses exactly on the problem of web spam—how to detect and eliminate search engine spam. We have been working together on various aspects of this problem for the last two years.”

Doctoral candidate Baoning Wu has worked with Davison in developing methods to eliminate web spam.

One type of web spam is called “cloaking.” A web site that uses cloaking presents two different versions of itself: one to the search engine, and a different one to the user. "A cloaked web site may look good in a search engine, but when you click on it, it's on an entirely different topic. It may be even inappropriate or offensive," says Davison.

Methods for effectively weeding out cloaked web sites were nonexistent until Davison and Wu devised an innovative algorithm to detect cloaking. The process identifies a group of sites that are likely to be using cloaking, and then applies a more rigorous analysis to that subset to make the final determinations. This prevents search engines from having to analyze every single web site for cloaking.
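The two-stage idea described above can be illustrated with a minimal sketch. This is not Davison and Wu's published algorithm; it only shows the general shape of a cheap filter followed by a stricter check. It assumes each page has already been fetched several times under a crawler identity and a browser identity, and the names (Snapshots, is_candidate, is_cloaked), the word-overlap measure, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Snapshots:
    """Hypothetical container: copies of one URL fetched under two identities."""
    url: str
    crawler_copies: List[str]  # text served to a crawler user agent
    browser_copies: List[str]  # text served to a browser user agent


def terms(text: str) -> Set[str]:
    """Reduce a page to its set of lowercase words."""
    return set(text.lower().split())


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """0.0 for identical word sets, 1.0 for word sets with nothing in common."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def is_candidate(page: Snapshots, threshold: float = 0.3) -> bool:
    """Stage 1 (cheap): compare only the first copy from each identity."""
    distance = jaccard_distance(
        terms(page.crawler_copies[0]), terms(page.browser_copies[0])
    )
    return distance > threshold


def is_cloaked(page: Snapshots, margin: float = 0.2) -> bool:
    """Stage 2 (costlier): use every copy, so that ordinary dynamic content
    such as rotating headlines is not mistaken for cloaking."""
    crawler = [terms(t) for t in page.crawler_copies]
    browser = [terms(t) for t in page.browser_copies]
    # Baseline: how much the page changes between two browser visits.
    within = max(
        (jaccard_distance(x, y)
         for i, x in enumerate(browser) for y in browser[i + 1:]),
        default=0.0,
    )
    # Smallest difference between any crawler copy and any browser copy.
    across = min(jaccard_distance(c, b) for c in crawler for b in browser)
    return across > within + margin


def detect_cloaking(pages: List[Snapshots]) -> List[str]:
    """Run the rigorous check only on pages that pass the cheap filter."""
    candidates = [p for p in pages if is_candidate(p)]
    return [p.url for p in candidates if is_cloaked(p)]
```

In this sketch, a page whose content simply changes on every visit passes the first stage but is cleared by the second, because the crawler and browser copies differ no more than two browser copies differ from each other.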

Davison and Wu had two papers published at the 15th International World Wide Web Conference, one of the most prestigious conferences on developments in internet technologies. One paper was on their new algorithm. “When we submitted the paper, we were optimistic, but it was hard to tell whether it would be accepted,” says Wu.

Only 11 percent of the papers were accepted, so to have a pair published (the other one was titled "Topical TrustRank: Using Topicality to Combat Web Spam") was quite an accomplishment. Davison and Wu traveled to Edinburgh, Scotland, to make presentations in late May. Representatives from many internet companies, including major search engines, attended the conference. "This is definitely of interest to major search engine companies. They want to know when a site is basically lying to them," said Davison.

In select company

Davison was recently awarded a grant and special data access through Microsoft's Live Labs program. The goal of Live Labs is to promote innovation in internet technologies. Davison was one of only 12 researchers to receive this award. For a complete list of award winners, go online.

As one of the 12 winners, Davison will now have access to large search engine logs from Microsoft to aid his research. According to Davison, the data set is “orders of magnitude larger than anything else available to academia.”

The Live Labs Manifesto states, “Something intangible—a better algorithm—can massively increase global utility and welfare.” Davison said, “I think that's absolutely true. The internet is becoming more and more incorporated into people's everyday lives. If search engines can use our research, then in a sense we are improving lives.”

Davison also works to improve the lives of students by holding extra office hours for his students in the Sigma Phi Epsilon fraternity. Davison said that it has been an enjoyable experience and that he plans to continue it in the fall semester.

Wu worked as an intern at Google during the summer of 2005, and at Microsoft Research during the summer of 2006. After he earns his Ph.D., he hopes to get a job at a major search engine company such as Google, Microsoft, or Yahoo.

Web spam is only one of Davison's research areas. Among others, he has a research project funded by the National Science Foundation titled "Understanding and Enhancing Queries." The project explores ways of using query logs to learn what users search for and to provide suggestions for future searchers. This ongoing work has already resulted in query suggestion methods and web page classification improvements, and has led to new methods for topical calculation of authority.
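As a rough illustration of how a query log can yield suggestions, the sketch below counts which queries appear together in the same search session and proposes the most frequent companions. It is not the method developed in the project; the log format of (session id, query) pairs and the function names are assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_suggestions(log):
    """Count how often two queries occur in the same session.

    `log` is an iterable of (session_id, query) pairs, a hypothetical format.
    """
    sessions = defaultdict(set)
    for session_id, query in log:
        sessions[session_id].add(query.strip().lower())

    related = defaultdict(Counter)
    for queries in sessions.values():
        # Every ordered pair of distinct queries in a session counts once.
        for a, b in permutations(queries, 2):
            related[a][b] += 1
    return related


def suggest(related, query, k=3):
    """Return up to k queries most often seen alongside `query`."""
    counts = related.get(query.strip().lower(), Counter())
    # Highest count first; ties are broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [q for q, _ in ranked[:k]]
```

A real system would work from far larger logs and would need to handle noise, privacy, and spam in the log itself; the sketch only shows the basic counting step.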

In addition, Davison was the organizer of the first international workshop on Adversarial Information Retrieval on the Web (AIRWeb), and is organizing the second, which will be held in August.

--Gabriel West

Posted on Tuesday, June 20, 2006

