Compression in Hadoop
File compression brings two major benefits: it reduces the space needed to store files, and it speeds up data transfer across the network, or to or from disk. When dealing with large volumes of data, both of these savings can be significant, so it pays to carefully consider how to use compression in Hadoop. Some of the compression formats used in Hadoop Compression Format Tool Algorithm Filename Extension Splittable DEFLATE NA DEFLATE .deflate No gzip gzip DEFLATE .gz No bzip2 bzip2 bzip2 .bz2 Yes LZO lzop LZO .lzo No Snappy NA Snappy .snappy No Codecs A codec is the implementation of a compression-decompression algorithm and in Hadoop, it is represented by an implementation of the CompressionCodec interface. Compression Format Hadoop CompressionCodec DEFLATE org.apac