Lecture Notes in Computer Science, 2010, Volume 6008/2010, 212-223, DOI: 10.1007/978-3-642-12116-6_18

A Named Entity Extraction using Word Information Repeatedly Collected from Unlabeled Data

Tomoya Iwakura

Abstract

This paper proposes a method for Named Entity (NE) extraction that uses NE-related labels of words repeatedly collected from unlabeled data. The NE-related labels of a word include its candidate NE classes, the NE classes of words that co-occur with it, and so on. To collect these labels, we first extract NEs from unlabeled data with an NE extractor and then gather the NE-related labels of words from the extraction results. We then create a new NE extractor that uses the NE-related labels of each word as new features, and this new extractor is in turn used to collect new NE-related labels of words. Experimental results on the IREX data set for Japanese NE extraction show that our method improves accuracy.
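
The collect-and-retrain loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `collect_word_labels` and `repeated_collection`, the `cand:`/`ctx:` label prefixes, the one-word context window, and the `train` callback are all assumptions made for illustration, and the abstract does not specify the learner or the exact label inventory.

```python
from collections import defaultdict


def collect_word_labels(tagged_sentences):
    """Gather NE-related labels per word from automatically tagged text.

    tagged_sentences: list of sentences, each a list of (word, ne_class)
    pairs, where ne_class is "O" for words outside any NE.
    Returns a dict mapping each labeled word to a set of label strings.
    """
    labels = defaultdict(set)
    for sentence in tagged_sentences:
        for i, (word, ne_class) in enumerate(sentence):
            # Candidate NE class of the word itself.
            if ne_class != "O":
                labels[word].add("cand:" + ne_class)
            # NE classes of co-occurring words; a window of one word on
            # each side is an assumption of this sketch.
            for j in (i - 1, i + 1):
                if 0 <= j < len(sentence) and sentence[j][1] != "O":
                    labels[word].add("ctx:" + sentence[j][1])
    return dict(labels)


def repeated_collection(train, unlabeled, rounds=2):
    """Alternate between extraction and label collection.

    train: hypothetical callback, train(word_labels) -> extractor, where
           extractor(sentence) returns a list of (word, ne_class) pairs.
    unlabeled: list of sentences, each a list of words.
    """
    word_labels = {}
    extractor = train(word_labels)  # initial extractor, no collected labels
    for _ in range(rounds):
        tagged = [extractor(sentence) for sentence in unlabeled]
        word_labels = collect_word_labels(tagged)
        extractor = train(word_labels)  # new extractor with new features
    return extractor, word_labels
```

In this sketch `train` stands in for whatever supervised learner builds the extractor from labeled data plus the collected word labels; each round replaces the label table with one collected from the latest extractor's output.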
