HDFS is better suited to a large amount of data stored in a single file than
to the same amount of data spread across many small files. The reason is the
NameNode: it typically runs on expensive, high-performance hardware and keeps
the metadata for every file and block in memory, so it is not prudent to fill
that space with the extra metadata that many small files generate. When the
same data sits in a single large file, the NameNode has far less metadata to
hold. For optimized performance, HDFS is therefore designed around large data
sets rather than many small files.
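
To make the difference concrete, here is a rough back-of-the-envelope sketch. It assumes that each file and each block costs the NameNode about 150 bytes of memory (a commonly quoted rule of thumb, not an exact figure) and that the block size is 128 MiB. The function names are hypothetical and exist only for this illustration.

```python
import math

# Assumption: rough rule-of-thumb memory cost per file or block object.
BYTES_PER_OBJECT = 150
# Assumption: HDFS block size of 128 MiB.
BLOCK_SIZE = 128 * 1024 * 1024

MIB = 1024 ** 2
GIB = 1024 ** 3


def namenode_objects(file_sizes, block_size=BLOCK_SIZE):
    """Count the file and block objects tracked for the given file sizes."""
    files = len(file_sizes)
    # Every file needs at least one block per started block_size chunk.
    blocks = sum(math.ceil(size / block_size) for size in file_sizes)
    return files + blocks


def estimated_metadata_bytes(file_sizes, block_size=BLOCK_SIZE):
    """Estimate NameNode memory use under the per-object assumption above."""
    return namenode_objects(file_sizes, block_size) * BYTES_PER_OBJECT


one_large_file = [1 * GIB]         # 1 GiB in a single file
many_small_files = [1 * MIB] * 1024  # the same 1 GiB as 1024 files of 1 MiB

print(estimated_metadata_bytes(one_large_file))    # 1 file + 8 blocks
print(estimated_metadata_bytes(many_small_files))  # 1024 files + 1024 blocks
```

Under these assumptions the single file needs 9 objects, while the 1024 small files need 2048 objects for exactly the same amount of data. The absolute byte counts are only estimates, but the ratio shows why many small files put pressure on the NameNode.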