Spoken Information Extraction from Italian Broadcast News

Vanessa Sandrini and Marcello Federico

ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, 38050 Povo, Trento, Italy
Abstract
Current research on information extraction from spoken documents focuses mainly on recognizing named entities, such as the names of organizations, locations, and persons, in transcripts generated automatically by a speech recognizer. This work presents research carried out at ITC-irst on named entity recognition in Italian broadcast news. In particular, it describes an original statistical named entity tagger that can be trained with relatively few language resources: a seed list of named entities and a large untagged text corpus. The paper also presents and discusses named entity recognition experiments on case-sensitive automatic transcripts generated by the ITC-irst speech recognizer, in which the named entity model is trained with seed lists of different sizes.

Contact Information Vanessa Sandrini
Email: sandrini@itc.it
URL: http://munst.itc.it

Contact Information Marcello Federico
Email: federico@itc.it