Extracting Objects and Their Attributes from Tables in Text Documents

Authors

Astrakhantsev N.

Abstract

Extracting information from tables is an important and rather complex part of information retrieval.

For the task of extracting objects from HTML tables, we introduce methods for determining table orientation and for processing aggregating objects (such as Total) and scattered headers (super row labels, subheaders).
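To make the terms in the abstract concrete, the following is a minimal sketch and not the method from the paper. It assumes the table has already been parsed into a list of rows of strings, that aggregate rows can be recognized by a small keyword list (`AGGREGATE_LABELS`, a hypothetical choice), and that a super row label is a row in which only the first cell is filled. Orientation is not detected here; it is passed in through the hypothetical flag `objects_in_columns`. All function names and the sample table are illustrative.

```python
AGGREGATE_LABELS = {"total", "sum", "average", "overall"}  # assumed keyword list


def is_aggregate_row(row):
    """Return True if the row's first cell looks like an aggregate label."""
    return row[0].strip().lower() in AGGREGATE_LABELS


def is_super_row_label(row):
    """Return True if only the first cell is filled (a group label row)."""
    return bool(row[0].strip()) and all(not cell.strip() for cell in row[1:])


def extract_objects(table, objects_in_columns=False):
    """Turn a table (list of rows of strings) into a list of attribute dicts."""
    if objects_in_columns:
        # Transpose so that each object occupies one row.
        table = [list(column) for column in zip(*table)]
    header, body = table[0], table[1:]
    objects, group = [], None
    for row in body:
        if is_super_row_label(row):
            group = row[0].strip()  # remember the label for the rows below it
        elif is_aggregate_row(row):
            continue  # a "Total" row summarizes objects; it is not one itself
        else:
            obj = dict(zip(header, (cell.strip() for cell in row)))
            obj["group"] = group
            objects.append(obj)
    return objects


# Toy table with two super row labels and one aggregate row.
table = [
    ["Item", "Count"],
    ["Fruit", ""],
    ["Apple", "3"],
    ["Pear", "5"],
    ["Vegetables", ""],
    ["Carrot", "2"],
    ["Total", "10"],
]
objects = extract_objects(table)
```

In this sketch the super row labels "Fruit" and "Vegetables" become a `group` attribute on the rows beneath them, and the "Total" row is skipped rather than extracted as an object.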


Keywords

Information extraction; information retrieval; natural language processing; table processing; table extraction; semi-structured information extraction; html; wiki markup

Edition

Proceedings of SYRCoDIS'11: The Seventh Spring Researchers Colloquium on Databases and Information Systems, 2011, pp. 34-47.

Research Group

Information Systems

