Spoken Information Extraction from Italian Broadcast News
Vanessa Sandrini and Marcello Federico
ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, 38050 Povo, Trento, Italy
Abstract
Current research on information extraction from spoken documents focuses mainly on the recognition of named entities, such
as names of organizations, locations, and persons, within transcripts automatically generated by a speech recognizer. In this
work we present research carried out at ITC-irst on named entity recognition in Italian broadcast news. In particular, we
describe an original statistical named entity tagger that can be trained with relatively few language resources: a seed
list of named entities and a large untagged text corpus. The paper also presents and discusses named entity recognition
experiments on case-sensitive automatic transcripts generated by the ITC-irst speech recognizer, with the named entity
model trained on seed lists of different sizes.