Lecture Notes in Computer Science, 1998, Volume 1533/1998, 41-49, DOI: 10.1007/3-540-49381-6_6

Maximizing Agreement with a Classification by Bounded or Unbounded number of Associated Words

Hiroki Arimura and Shinichi Shimozono

View Related Documents

Abstract

We study the efficient discovery of word-association patterns, defined by a sequence of strings and a proximity gap, from a collection of texts with binary labels. We present an algorithm that finds all d strings and k proximity word-association patterns that maximizes agreement with the labels. It runs in expected time complexity O(k d-1 n log d+1 n) and O(k d-1 n) space with the total length n of texts, if texts are uniformly random strings. We also show that the problem to find a best word-association pattern with arbitrarily many strings is MAX SNP-hard.

Fulltext Preview

Image of the first page of the fulltext document