- Use a compression format that supports splitting, like bzip2 (although bzip2 is fairly slow), or one that can be indexed to support splitting, like LZO.
- Use Sequence File, which supports compression and splitting.
- Use an Avro data file, which supports compression and splitting, just like Sequence File, but has the added advantage of being readable and writable from many languages, not just Java.
- For large files, you should not use a compression format that does not support splitting on the whole file, since you lose locality and make MapReduce applications very inefficient.
What is best suitable compression format large sized Log files?
-
Main driver class which provides job configuration parameters. Mapper class which must extend org.apache.hadoop.mapredu...
-
This will be used to extract various date formats. The available date formats as follows. Syntax: to_char ( date , fo...