An Optimized Byte Pair Encoding Algorithm for String Matching on Compressed Text

Abstract / Excerpt:

The compressed pattern matching problem is a heavily studied branch of string pattern matching, one of the core operations of string processing. One major component of the compressed pattern matching problem is the compression algorithm. In this paper, we discuss the optimization of one such compression algorithm: the Byte Pair Encoding algorithm.
A study by Takeda et al. already identifies Byte Pair Encoding as a text compression scheme that accelerates pattern matching. Byte Pair Encoding has perceived drawbacks, however, such as an uncompetitive compression ratio and compression time compared with well-known compression algorithms.
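
For readers unfamiliar with the scheme, the following is a minimal Python sketch of the classic byte pair encoding idea: repeatedly replace the most frequent adjacent byte pair with a byte value that does not occur in the input. It is not the optimized algorithm this paper proposes, nor code from Takeda et al.; the names `bpe_compress`, `bpe_decompress`, and `max_rules` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def bpe_compress(data: bytes, max_rules: int = 256):
    """Classic BPE sketch: returns (compressed bytes, substitution rules)."""
    seq = list(data)
    present = set(data)
    # Byte values absent from the input can serve as new symbols.
    unused = [b for b in range(256) if b not in present]
    rules = {}  # new symbol -> (left, right) pair it stands for

    while unused and len(rules) < max_rules:
        # Count adjacent pairs (overlapping occurrences are counted too).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so nothing more to gain
        sym = unused.pop()
        rules[sym] = pair

        # Replace non-overlapping occurrences of the pair, left to right.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out

    return bytes(seq), rules


def bpe_decompress(comp: bytes, rules) -> bytes:
    """Expand symbols back into pairs until only original bytes remain."""
    out = []
    stack = list(reversed(comp))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)  # pushed first so that left is popped first
            stack.append(left)
        else:
            out.append(s)
    return bytes(out)


if __name__ == "__main__":
    text = b"abababab abab"
    comp, rules = bpe_compress(text)
    print(len(text), len(comp), bpe_decompress(comp, rules) == text)
```

Because each compressed byte stands for a fixed string, a pattern matcher can in principle work on the compressed bytes directly, which is the property that makes this family of schemes attractive for compressed pattern matching. The full recount of pairs on every pass in this sketch also illustrates why compression time is a known weakness of the naive approach.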

The proponents of this paper pursue this study not only to further improve pattern matching speed but also to make this compression algorithm practical to use, by improving its compression ratio and compression time.

Info
Source Institution: Ateneo de Davao University
Unit: Computer Studies
Authors: Olivar, Ravilo Ven; Pasion, Nico Archelaus; Yap, Jestoni Mark
Page Count: 8
Place of Publication: Davao City
Original Publication Date: September 1, 2010
Tags: Byte Pair Encoding Algorithm, String Matching