Attempts to infer the characteristics of offenders from their criminal behaviour (‘offender profiling’) have been only partially
successful because they have relied on subjective judgments based on limited data. Words and structured data used in crime descriptions
recorded by the police relate to behavioural features. Language modelling was therefore applied to an existing police archive to
link behavioural features with significant characteristics of offenders. Both multinomial and multiple Bernoulli models were
used. Although the categories selected here are gender and age group, in principle the approach can be applied to any recorded characteristic.
Results indicate that, for certain types of crime, statistically significant relationships exist between behavioural features and both the age and the sex of offenders. Both
types of language model perform with similar effectiveness. It is also possible to automatically identify specific terms which,
taken together, give insight into the style of offending associated with a particular group.
Keywords Text Data Mining - Language Models - Crime Data - Investigative Psychology - Offender Profiling
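
As an illustration of the two model types named above, the following is a minimal sketch and not the implementation used in the study. It assumes a tiny invented corpus (TRAIN), equal class priors, and Laplace smoothing; all function names and the data are hypothetical. The multinomial model scores a description by the term occurrences it contains, whereas the multiple Bernoulli model scores the presence or absence of every vocabulary term. A simple log-ratio ranking stands in for the identification of terms characteristic of a group.

```python
import math
from collections import Counter

# Toy, invented crime descriptions labelled by a hypothetical offender group.
TRAIN = [
    ("male", "forced rear window stole cash"),
    ("male", "forced door stole tools"),
    ("female", "stole purse from shop counter"),
    ("female", "stole perfume from shop"),
]

def train(docs, bernoulli):
    """Estimate Laplace-smoothed per-class term probabilities."""
    vocab = {w for _, text in docs for w in text.split()}
    counts, n_tokens, n_docs = {}, Counter(), Counter()
    for label, text in docs:
        words = text.split()
        # Bernoulli counts documents containing a term; multinomial counts occurrences.
        counts.setdefault(label, Counter()).update(set(words) if bernoulli else words)
        n_docs[label] += 1
        n_tokens[label] += len(words)
    model = {}
    for label, c in counts.items():
        if bernoulli:
            model[label] = {w: (c[w] + 1) / (n_docs[label] + 2) for w in vocab}
        else:
            model[label] = {w: (c[w] + 1) / (n_tokens[label] + len(vocab)) for w in vocab}
    return model, vocab

def log_score(model, vocab, label, text, bernoulli):
    """Log-likelihood of a description under one class (equal priors assumed)."""
    probs = model[label]
    words = [w for w in text.split() if w in vocab]  # ignore unseen terms
    if bernoulli:
        present = set(words)
        return sum(math.log(probs[w] if w in present else 1.0 - probs[w])
                   for w in vocab)
    return sum(math.log(probs[w]) for w in words)

def classify(model, vocab, text, bernoulli):
    """Return the class with the highest log-likelihood."""
    return max(model, key=lambda lab: log_score(model, vocab, lab, text, bernoulli))

def distinctive_terms(model, vocab, label, other, k):
    """Rank terms by log-ratio of their probability in `label` versus `other`."""
    ratio = lambda w: math.log(model[label][w] / model[other][w])
    return sorted(sorted(vocab), key=ratio, reverse=True)[:k]

multinomial, vocab = train(TRAIN, bernoulli=False)
bern, _ = train(TRAIN, bernoulli=True)
query = "forced window stole cash"
print(classify(multinomial, vocab, query, bernoulli=False))
print(classify(bern, vocab, query, bernoulli=True))
print(distinctive_terms(multinomial, vocab, "male", "female", 3))
```

On real police data the class priors, smoothing method, and vocabulary would need to be chosen and validated against held-out records; the sketch only shows how the two likelihoods differ in form.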