The Lixto project is an ongoing research effort in the area of Web data extraction. The project originally set out to develop
a logic-based extraction language and a tool for visually defining extraction programs from sample Web pages, but its scope
has broadened over time. Current work investigates learning algorithms for defining extraction programs, automatic extraction
of data from Web pages with a table-centric visual appearance, and extraction from alternative document formats such as PDF.
This work is funded in part by the Austrian Federal Ministry for Transport, Innovation and Technology under the FIT-IT Semantic
Systems program.