A Search Engine for Indian Languages

Ashwani Mujoo, Manoj Kumar Malviya, Rajat Moona and T V Prabhakar

Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India
Abstract
There is a great need for search engines for web documents written in languages other than English. In this paper, we describe the design issues of a search engine for Indian languages. We also describe the implementation of two such search engines, one for documents in ISCII and the other for documents in Unicode. The software allows full-text indexing and searching of a database of documents written in any Brahmi-based Indian language. The search engine gathers HTML documents from the web, indexes and compresses them, and then searches them for the given keywords. The main features of the search engines are phonetic tolerance, morphological analysis, compression and indexing, leading and trailing substring matches for keywords, and search through compressed documents. The implementation includes a search server architecture that can be accessed from a WYSIWYG front end implemented as a Java Swing applet. Performance results show that the search engine achieves a compression of almost 80 percent and has appreciable precision and recall.
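As a rough illustration of the leading and trailing substring matching mentioned in the abstract, the following minimal Python sketch builds a small in-memory inverted index over Unicode text. It is not the authors' implementation: the class name `TinyIndex`, the `mode` parameter, and the whitespace tokenizer are hypothetical choices made for this example, and compression, phonetic tolerance, and morphological analysis are left out entirely.

```python
import unicodedata
from collections import defaultdict


def tokenize(text):
    """Normalize to NFC and split on whitespace (a simplifying assumption)."""
    return unicodedata.normalize("NFC", text).split()


class TinyIndex:
    """Hypothetical in-memory inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Record every whitespace-separated term of the document.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, keyword, mode="exact"):
        """Return ids of documents with a term matching the keyword.

        mode="exact"    : the term equals the keyword
        mode="leading"  : the term starts with the keyword
        mode="trailing" : the term ends with the keyword
        """
        keyword = unicodedata.normalize("NFC", keyword)
        if mode == "exact":
            return set(self.postings.get(keyword, set()))
        if mode == "leading":
            matches = lambda term: term.startswith(keyword)
        elif mode == "trailing":
            matches = lambda term: term.endswith(keyword)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        # Linear scan over the vocabulary; fine for a sketch, not for scale.
        hits = set()
        for term, doc_ids in self.postings.items():
            if matches(term):
                hits |= doc_ids
        return hits


if __name__ == "__main__":
    index = TinyIndex()
    index.add(1, "भारत एक देश है")
    index.add(2, "भारतीय भाषा")
    print(index.search("भारत"))                   # {1}
    print(index.search("भारत", mode="leading"))   # {1, 2}
    print(index.search("तीय", mode="trailing"))   # {2}
```

The sketch scans the whole vocabulary for substring modes, which keeps it short; a production system would more plausibly use a sorted term dictionary or a trie (and a reversed-term dictionary for trailing matches).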

Contact Information

Ashwani Mujoo
Email: mujoo@cse.iitk.ac.in

Manoj Kumar Malviya
Email: manojkm@cse.iitk.ac.in

Rajat Moona
Email: moona@cse.iitk.ac.in

T V Prabhakar
Email: tvp@cse.iitk.ac.in