A domain-independent architecture for efficient information retrieval on the World Wide Web

by Marc Langheinrich

Advisors

Professor Ipke Wachsmuth
University of Bielefeld, Faculty of Technology

Professor Oren Etzioni
University of Washington, Computer Science & Engineering

Abstract

The World Wide Web's rapid growth in recent years has provided a wealth of online information. With the Web already comprising more than a hundred million documents, finding a particular page has become a daunting task of battling the Web's "information overload". The most popular methods of finding information on the Web are either notoriously imprecise or often incomplete: some searches easily return hundreds or thousands of irrelevant pages, while others fail to include even a single relevant one.

This thesis presents a novel architecture called "Dynamic Reference Sifting", which attempts to combine the comprehensiveness of Web indices, such as AltaVista or Hotbot, with the accuracy of Web directories, such as Yahoo. Dynamic Reference Sifting takes the output of general-purpose search services and combines it with additional, orthogonal information sources, domain-specific heuristics, and a flexible categorization scheme to filter out all but the single correct page. Our experiments show that for certain types of pages, this approach can provide nearly twice the accuracy and at least the same coverage as any existing service.

We have implemented a prototype called "Ahoy! The Homepage Finder", which demonstrates the feasibility of our approach. Ahoy! is publicly accessible on the Web and has served more than 500,000 queries since it was fielded in May 1996. To demonstrate the domain independence and generality of our architecture, we also present two simple prototypes that apply Dynamic Reference Sifting to the domains of academic papers and jokes. Both systems were developed and implemented in less than ten days, yet proved highly successful in our initial experiments.
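
The filtering idea summarized in the abstract can be illustrated with a minimal sketch. It is not the Ahoy! implementation: the names `Candidate`, `sift`, `name_in_title`, `surname_in_url`, and `known_hosts`, the scoring weights, and the bucket thresholds are all assumptions made for illustration. The sketch scores candidate pages from a general-purpose search service with simple domain-specific heuristics, adds a bonus when an independent (orthogonal) source confirms the page's host, and returns only the best non-empty category.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Set
from urllib.parse import urlparse


@dataclass(frozen=True)
class Candidate:
    """One reference returned by a general-purpose search service (hypothetical type)."""
    url: str
    title: str


# A heuristic maps a candidate and the query to a score; higher is better.
Heuristic = Callable[[Candidate, str], float]


def name_in_title(c: Candidate, query: str) -> float:
    """Hypothetical heuristic: every query word appears in the page title."""
    words = query.lower().split()
    return 1.0 if words and all(w in c.title.lower() for w in words) else 0.0


def surname_in_url(c: Candidate, query: str) -> float:
    """Hypothetical heuristic: the last query word appears in the URL."""
    words = query.lower().split()
    return 1.0 if words and words[-1] in c.url.lower() else 0.0


def sift(
    query: str,
    candidates: Iterable[Candidate],
    known_hosts: Set[str],
    heuristics: Sequence[Heuristic],
    thresholds: Sequence[float] = (2.0, 1.0),
) -> List[Candidate]:
    """Return the candidates in the best non-empty category.

    known_hosts stands in for an orthogonal information source, for example
    hosts that an independent directory associates with the query (assumed).
    thresholds must be given in descending order, best category first.
    """
    scored = []
    for c in candidates:
        score = sum(h(c, query) for h in heuristics)
        if urlparse(c.url).netloc in known_hosts:
            score += 1.0  # independent confirmation of the host
        scored.append((score, c))

    # Categorize: try the strictest category first and fall back to weaker ones.
    for threshold in thresholds:
        hits = [c for score, c in scored if score >= threshold]
        if hits:
            return hits
    return []


if __name__ == "__main__":
    pages = [
        Candidate("http://www.cs.example.edu/~smith/", "Jane Smith's Home Page"),
        Candidate("http://news.example.com/article42", "Interview with Jane Smith"),
        Candidate("http://www.example.org/cats", "Cat pictures"),
    ]
    best = sift("Jane Smith", pages, {"www.cs.example.edu"},
                [name_in_title, surname_in_url])
    print([c.url for c in best])  # prints ['http://www.cs.example.edu/~smith/']
```

Returning only the strictest non-empty category is what lets such a filter answer with a single page when the evidence is strong, while still falling back to weaker matches instead of returning nothing.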