A significant part of medical data remains stored as unstructured text. Semantic search over such text requires the introduction of markup tags.
Experts use their background knowledge to categorize new documents, and knowing the category of a document helps to disambiguate words
and acronyms. A model of document similarity that incorporates a priori knowledge and captures the intuition of an expert is introduced. It has only a few parameters, which may be estimated using linear
programming techniques. Applied to the categorization of medical discharge summaries, this approach yielded a simpler and much more
accurate model than alternative text categorization approaches.