Information on the web is often placed in a structure with a particular alignment and order. For example, web pages produced
by web search engines, CGI scripts, etc. generally contain multiple records of information, with each record representing one
unit of information and all records sharing a distinct visual pattern. This pattern may appear in the structure of the documents
or in the repetitive nature of their content. For effective information extraction, it is essential to identify the boundaries
of these records and to apply extraction rules to individual record elements. In this paper I present REBIEX, a system that
automatically identifies and extracts the repeated patterns formed by data records in a fuzzy way, tolerating slight
inconsistencies. It uses the structural elements of web documents as well as the content and categories of their text elements,
and it requires no training data or human intervention. Unlike current techniques, REBIEX exploits the fact that not only the
HTML structure of a document repeats but also its content, which repeats consistently. The system also employs a novel algorithm
that mines repeating patterns in a fuzzy way with high accuracy.