Lecture Notes in Computer Science, 2002, Volume 2423/2002, 291-294, DOI: 10.1007/3-540-45869-7_34

A Theoretical Foundation and a Method for Document Table Structure Extraction and Decomposition

Howard Wasserman, Keitaro Yukawa, Bon Sy, Kui-Lam Kwok and Ihsin Tsaiyun Phillips

Abstract

The algorithm described in this paper is designed to detect potential table regions in a document, to decide whether a potential table region is, in fact, a table, and, when it is, to analyze the table structure. The decision and analysis phases of the algorithm and the resulting system are based primarily on a precise definition of a table, and it is this definition that is discussed in this paper. An adequate definition need not be complete in the sense of encompassing all possible structures that might be deemed to be tables, but it should encompass most such structures, include the essential features of tables, and exclude features that tables never or very rarely possess.
