Extracting Objects and Their Attributes from Tables in Text Documents.
Authors
Astrakhantsev N.
Abstract
Extracting information from tables is an important and rather complex part of information retrieval.
For the task of extracting objects from HTML tables, we introduce methods for determining table orientation and for processing aggregating objects (such as Total) and scattered headers (super row labels, subheaders).
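The abstract does not describe how these methods work, so the following is only an illustrative sketch and not the paper's algorithm. It assumes a table is already parsed into a list of rows of strings, guesses orientation by comparing how type-consistent the columns are versus the rows, and drops rows whose first cell is an aggregate label. The names `guess_orientation`, `drop_aggregates`, and `AGGREGATE_LABELS`, as well as the label list itself, are hypothetical.

```python
import re

# Assumption: a cell is "numeric" if it looks like a plain number or percentage.
NUMERIC = re.compile(r"^-?\d+(?:[.,]\d+)*%?$")

# Assumption: a small hand-picked list of aggregate labels (not from the paper).
AGGREGATE_LABELS = {"total", "sum", "average", "overall"}


def cell_type(cell):
    """Coarse cell type: 'num' for numeric-looking text, 'text' otherwise."""
    return "num" if NUMERIC.match(cell.strip()) else "text"


def homogeneity(lines):
    """Mean share of the dominant cell type per line, skipping the first cell,
    which is treated as a possible header."""
    scores = []
    for line in lines:
        types = [cell_type(c) for c in line[1:]]
        if types:
            dominant = max(types.count("num"), types.count("text"))
            scores.append(dominant / len(types))
    return sum(scores) / len(scores) if scores else 0.0


def guess_orientation(table):
    """Return 'rows' if each row looks like one object, else 'columns'.

    Heuristic: an attribute tends to hold values of one type, so if columns
    are at least as type-consistent as rows, attributes run down the columns
    and objects lie in rows.
    """
    columns = [list(col) for col in zip(*table)]
    return "rows" if homogeneity(columns) >= homogeneity(table) else "columns"


def drop_aggregates(table):
    """Remove rows whose first cell is an aggregate label such as 'Total'.
    Intended for tables whose objects lie in rows."""
    return [
        row for row in table
        if not (row and row[0].strip().lower() in AGGREGATE_LABELS)
    ]
```

For a column-oriented table, one option under the same assumptions is to transpose it with `zip(*table)` before calling `drop_aggregates`. Scattered headers (super row labels, subheaders) are not covered by this sketch.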
Keywords
Information extraction; information retrieval; natural language processing; table processing; table extraction; semi-structured information extraction; html; wiki markup
Edition
Proceedings of SYRCoDIS'11: The Seventh Spring Researchers Colloquium on Databases and Information Systems, 2011, pp. 34-47.
