| Abstract |
In this thesis we present parallel algorithms and implementations of a bzip2-like lossless data
compression scheme for GPU architectures. Our approach parallelizes three main stages in the
bzip2 compression/decompression pipeline: Burrows-Wheeler transform (BWT), move-to-front
transform (MTF), and Huffman coding. In particular, we utilize a two-level hierarchical sort
for the forward-BWT, invent a novel parallel reverse-BWT algorithm, design a novel scan-based
parallel MTF and reverse-MTF algorithm, implement parallel Huffman encoding/decoding, and
implement a parallel reduction scheme to build the Huffman tree. For each algorithm, we perform
detailed performance analysis, discuss its strengths and weaknesses, and suggest future
directions for improvements. Overall, our GPU implementation of the compression pipeline is
dominated by BWT performance and is 2.78x slower than bzip2, with BWT and MTF+Huffman
respectively 2.89x and 1.34x slower on average. Our overall GPU decompression runtime
is distributed more evenly and is 1.2x faster, with reverse-BWT 2.45x faster and reverse-
MTF+Huffman 1.83x slower than bzip2.
|