Proceedings of ISP RAS


Using prefix trees for searching text strings with disk-based storage.

Taranov I.

Abstract

Searching and storing large amounts of text data is a common problem in computer science, though it have been discussed for a long time. We introduce data structure for searching a set of string values and storing it on disk efficiently which is used for indexing XML data in Sedna XML DBMS. This paper describes algorithms for insertion, deletion and searching of variable-length strings in disk-resident trie structures. We also compare our implementation with existent B-tree implementation and show that proposed data structure in some cases occupies several times less disk space than B-tree does with the same search efficiency.

Keywords

dictionary data structures; tries; B-trees; DBMS indicies

Edition

Proceedings of the Institute for System Programming, vol. 20, 2011, pp. 283-296.

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

Full text of the paper in pdf (in Russian) Back to the contents of the volume