distances
are computed in a multi-dimensional space. The axes of this space in principle relate to the features inherent in the input
data. Usually, such features are chosen by neural network developers, thereby introducing a possible bias. A method of automatically
generating feature sets is discussed, with specific reference to the categorisation of streams of free-text news items. The
feature sets were generated by a procedure that automatically selects a group of keywords based on a lexico-semantic analysis.
Three different types of text streams – headlines only, news summaries and full news items including the body of the text
– have been categorised using Self-Organising Feature Maps (SOFM). A method for assessing the discrimination ability of a SOFM,
based on Fisher’s Linear Discriminant Rule, suggests that maps trained on vectors related to summaries alone provide
fairly accurate clusters when compared with maps trained on vectors related to full text. The use of summaries as document surrogates for document
categorisation is suggested.
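
As an illustration of the kind of separation measure that Fisher's rule provides, the sketch below computes a one-dimensional Fisher criterion for two groups of values: the squared gap between group means divided by the sum of the within-group variances. This is a minimal sketch under stated assumptions, not the assessment procedure used in the paper. The function name `fisher_ratio` and the sample values are hypothetical, and reducing each category to a single coordinate (for example, one map axis) is an assumption made for brevity.

```python
from statistics import mean, pvariance


def fisher_ratio(group_a, group_b):
    """Return the 1-D Fisher criterion for two samples.

    Larger values mean the two groups are better separated
    relative to their internal spread.
    """
    # Within-group scatter: sum of the two population variances.
    within = pvariance(group_a) + pvariance(group_b)
    if within == 0:
        raise ValueError("both groups have zero variance; ratio is undefined")
    # Between-group scatter: squared distance between the group means.
    between = (mean(group_a) - mean(group_b)) ** 2
    return between / within


# Hypothetical coordinates of two news categories along one map axis.
well_separated = fisher_ratio([1, 2, 3], [7, 8, 9])
overlapping = fisher_ratio([1, 2, 3], [2, 3, 4])
print(well_separated > overlapping)
```

A higher ratio for one map than for another would indicate, under these assumptions, that the first map separates the two categories more cleanly.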
Keywords: Automatic classification; Kohonen map; Linear discriminant rule; SOFM; Text classification; Training NN; Weirdness
coefficient
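
The weirdness coefficient listed among the keywords is commonly defined as the ratio of a word's relative frequency in a specialist corpus to its relative frequency in a general-language corpus. The sketch below shows how such a ratio could be used to pick keywords automatically. It is an illustrative sketch only: the function names `weirdness` and `top_keywords`, the add-one smoothing for words absent from the general corpus, and the toy token lists are all assumptions, not details taken from the paper.

```python
from collections import Counter


def weirdness(special_tokens, general_tokens, smoothing=1.0):
    """Score each word of the specialist corpus by its weirdness ratio.

    The smoothing term (an assumption) avoids division by zero for
    words that never occur in the general corpus.
    """
    special_counts = Counter(special_tokens)
    general_counts = Counter(general_tokens)
    n_special = len(special_tokens)
    n_general = len(general_tokens)

    scores = {}
    for word, freq in special_counts.items():
        rel_special = freq / n_special
        rel_general = (general_counts[word] + smoothing) / (n_general + smoothing)
        scores[word] = rel_special / rel_general
    return scores


def top_keywords(scores, k):
    """Return the k highest-scoring words, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Toy corpora, purely hypothetical.
news = "bond yield bond market the the".split()
general = "the cat sat on the mat the market".split()
print(top_keywords(weirdness(news, general), 2))
```

Words that are frequent in the news stream but rare in general language rise to the top, which is what makes them candidate axes for the feature vectors.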