Throughput
Throughput is the amount of work done per unit time. In a storage system it describes how fast data can be read from or written to the system, and it is a common measure of the system's performance. In HDFS, when a task is submitted, the work is divided and distributed across the nodes of the cluster, and each node executes its assigned portion independently and in parallel. Because the nodes work concurrently, the overall job finishes in far less time, which is how HDFS delivers high throughput. Reading data in parallel in particular cuts the actual read time dramatically, since the blocks of a file can be fetched from several DataNodes at once.
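To make the parallel-read idea concrete, here is a minimal Java sketch using Hadoop's `FileSystem` API. It reads each block-sized range of a file on a separate thread via positioned reads, which are safe to issue concurrently. The file path `/data/sample.txt`, the thread count, and the buffer size are illustrative assumptions, not tuned values, and a reachable HDFS cluster is assumed in the client configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBlockRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/sample.txt"); // hypothetical path

        FileStatus status = fs.getFileStatus(file);
        long blockSize = status.getBlockSize();
        long fileLen = status.getLen();

        ExecutorService pool = Executors.newFixedThreadPool(4); // illustrative size
        List<Future<Long>> results = new ArrayList<>();

        // One task per block range; each task opens its own stream and uses
        // a positioned read, so the ranges are fetched concurrently.
        for (long offset = 0; offset < fileLen; offset += blockSize) {
            final long start = offset;
            final long len = Math.min(blockSize, fileLen - start);
            results.add(pool.submit((Callable<Long>) () -> {
                byte[] buf = new byte[(int) Math.min(len, 8 * 1024 * 1024)];
                long read = 0;
                try (FSDataInputStream in = fs.open(file)) {
                    while (read < len) {
                        int n = in.read(start + read, buf, 0,
                                (int) Math.min(buf.length, len - read));
                        if (n < 0) break; // end of file reached early
                        read += n;
                    }
                }
                return read;
            }));
        }

        long total = 0;
        for (Future<Long> f : results) total += f.get();
        pool.shutdown();
        System.out.println("Bytes read: " + total);
    }
}
```

In practice MapReduce and similar frameworks do this implicitly, scheduling one task per block near the DataNode that stores it; the sketch above only illustrates why concurrent block reads shorten total read time.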