Filtering of Large Numbers of Unstructured Text Documents by the Developed Tool TEA

Jan Žižka and Aleš Bourek

(3)  Department of Biophysics, Faculty of Medicine, Masaryk University in Brno, Joštova 10, 662 43 Brno, Czech Republic
(4)  Department of Information Technologies, Faculty of Informatics, Masaryk University in Brno, Botanická 68a, 602 00 Brno, Czech Republic
Abstract
This paper describes TEA (TExt Analyzer), a text-document-filtering software tool originally developed to help physicians select relevant documents from large numbers of unstructured medical text documents obtained from available Internet services. TEA learns which documents are interesting and relevant for an individual user, primarily by means of the naïve Bayes algorithm. In addition, TEA provides a number of functions that can improve its classification accuracy, allow more specific document selection for individual users, and enable users to work with dictionaries generated from the analyzed documents. The learning process is based on a set of positive and negative example documents, labeled by users who are interested in certain, usually very specific, topics.
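
The learning scheme summarized in the abstract can be illustrated with a minimal sketch of a naïve Bayes text filter trained on positive and negative example documents. This is not TEA's implementation: the class name NaiveBayesFilter, the tokenizer, the "pos"/"neg" labels, the add-one smoothing, and the toy documents are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Hypothetical two-class multinomial naive Bayes filter (not TEA itself)."""

    LABELS = ("pos", "neg")

    def __init__(self):
        # Per-class word frequencies and per-class document counts.
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Add one user-labeled example document to the model."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        """Return log P(label) + sum of log P(word | label), add-one smoothed."""
        vocabulary = set(self.word_counts["pos"]) | set(self.word_counts["neg"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Smoothed class prior, so a class without examples does not give log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))
        for word in tokenize(text):
            count = self.word_counts[label][word]
            # Assumes at least one training document, so the denominator is > 0.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        return score

    def classify(self, text):
        """Return the label with the higher log score."""
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


# Toy, made-up training documents labeled by an imaginary user.
tea_like = NaiveBayesFilter()
tea_like.train("ivf embryo transfer outcome study", "pos")
tea_like.train("embryo implantation rates after ivf", "pos")
tea_like.train("football league results weekend", "neg")
tea_like.train("stock market results and football news", "neg")

print(tea_like.classify("ivf embryo outcome"))   # pos
print(tea_like.classify("football results"))     # neg
```

Scores are summed in log space to avoid numerical underflow on long documents, and add-one smoothing keeps a word unseen in one class from forcing that class's probability to zero.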

Jan Žižka
Email: zizka@informatics.muni.cz

Aleš Bourek
Email: bourek@med.muni.cz