Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books
Hengzhi Wu (1), Gabriella Kazai (2) and Michael Taylor (2)
(1) Department of Computer Science, Queen Mary, University of London, UK
(2) Microsoft Research, Cambridge, UK
Abstract
Through mass-digitization projects and with the use of OCR technologies, digitized books are becoming available on the Web
and in digital libraries. The unprecedented scale of these efforts, the unique characteristics of the digitized material as
well as the unexplored possibilities of user interactions make full-text book search an exciting area of information retrieval
(IR) research. Emerging research questions include: How appropriate and effective are traditional IR models when applied to
books? What book-specific features (e.g., back-of-book index) should receive special attention during the indexing and retrieval
processes? How can we tackle scalability? In order to answer such questions, we developed an experimental platform to facilitate
rapid prototyping of a book search system as well as to support large-scale tests. Using this system, we performed experiments
on a collection of 10 000 books, evaluating the efficiency of a novel multi-field inverted index and the effectiveness of
the BM25F retrieval model adapted to books, using book-specific fields.
Keywords: Book search, multi-field indexing, BM25F, efficiency, effectiveness
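For readers unfamiliar with BM25F, the following is a minimal Python sketch of the general idea of scoring a document that is split into weighted fields. It is not the authors' implementation: the abstract does not specify which book fields, weights, or parameter values were used, so the field names ("title", "body"), the weights, the shared k1 and b parameters, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25f_score(query_terms, doc_fields, field_weights, avg_field_len,
                doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one document against a query with a simple BM25F variant.

    doc_fields    : dict mapping field name -> list of tokens (hypothetical fields)
    field_weights : dict mapping field name -> boost (assumed values, not from the paper)
    avg_field_len : dict mapping field name -> average field length in the collection
    doc_freq      : dict mapping term -> number of documents containing the term
    """
    # Count term occurrences separately in each field.
    field_tf = {name: Counter(tokens) for name, tokens in doc_fields.items()}

    score = 0.0
    for term in query_terms:
        # Combine per-field frequencies into one weighted, length-normalised
        # pseudo-frequency before applying saturation.
        pseudo_tf = 0.0
        for name, tokens in doc_fields.items():
            tf = field_tf[name][term]
            if tf == 0:
                continue
            length_norm = 1.0 - b + b * len(tokens) / avg_field_len[name]
            pseudo_tf += field_weights.get(name, 1.0) * tf / length_norm
        if pseudo_tf == 0.0:
            continue

        # Smoothed IDF that stays non-negative (an assumption of this sketch).
        df = doc_freq.get(term, 0)
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))

        # Saturation is applied once, to the combined pseudo-frequency.
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score


# Toy usage: the same query term appears in the title of one book
# and only in the body of another.
book_a = {"title": ["whale", "hunting"], "body": ["sea", "ship", "storm", "crew"]}
book_b = {"title": ["sea", "voyage"], "body": ["whale", "ship", "storm", "crew"]}
weights = {"title": 3.0, "body": 1.0}
avg_len = {"title": 2.0, "body": 4.0}
df = {"whale": 2}

score_a = bm25f_score(["whale"], book_a, weights, avg_len, df, num_docs=10)
score_b = bm25f_score(["whale"], book_b, weights, avg_len, df, num_docs=10)
```

With a higher weight on the title field, the book matching in its title scores above the book matching only in its body. A book-specific setup would presumably add further fields, such as the back-of-book index mentioned in the abstract, each with its own weight.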