What is the most suitable compression format for large log files?



  • Use a compression format that supports splitting, like bzip2 (although bzip2 is fairly slow), or one that can be indexed to support splitting, like LZO.
  • Use a SequenceFile, which supports compression and splitting.
  • Use an Avro data file, which supports compression and splitting, just like a SequenceFile, but has the added advantage of being readable and writable from many languages, not just Java.
  • For large files, do not apply a non-splittable compression format to the whole file: a single map task would then have to read every block, so you lose data locality and MapReduce applications become very inefficient.
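
To make the last point concrete, here is a rough sketch of how splittability affects the number of map tasks that can work on one file. It is an illustration only, not Hadoop code: the function name `estimated_map_tasks`, the codec labels, and the 128 MB block size are assumptions chosen for the example, and real input formats compute splits with more nuance than one split per block. gzip is included as a common example of a codec that cannot be split.

```python
# Hypothetical codec table for illustration. LZO counts as splittable
# only after an index has been built for the file.
SPLITTABLE = {
    "none": True,
    "bzip2": True,
    "lzo-indexed": True,
    "lzo": False,
    "gzip": False,
}

# Assumed block size for the example (128 MB); check your cluster's setting.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


def estimated_map_tasks(file_size_bytes, codec, block_size_bytes=DEFAULT_BLOCK_SIZE):
    """Roughly estimate how many map tasks can process one file in parallel."""
    if codec not in SPLITTABLE:
        raise ValueError("unknown codec: %s" % codec)
    if file_size_bytes <= 0:
        return 0
    if not SPLITTABLE[codec]:
        # The whole file goes to a single mapper, which has to fetch
        # most blocks over the network instead of reading them locally.
        return 1
    # Ceiling division: about one split (and one map task) per block.
    return -(-file_size_bytes // block_size_bytes)


if __name__ == "__main__":
    one_gib = 1024 * 1024 * 1024
    for codec in ("gzip", "bzip2", "lzo", "lzo-indexed"):
        print(codec, estimated_map_tasks(one_gib, codec))
```

Under these assumptions, a 1 GiB log file compressed as a single gzip stream is handled by one mapper, while the same file in bzip2 or indexed LZO can be spread over about eight.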