A file
can be larger than any single disk in the network. There’s nothing that
requires the blocks from a file to be stored on the same disk, so they can take
advantage of any of the disks in the cluster. Making the unit of abstraction a
block rather than a file simplifies the storage subsystem. Blocks provide fault
tolerance and availability. To insure against corrupted blocks and disk and
machine failure, each block is replicated to a small number of physically
separate machines (typically three). If a block becomes unavailable, a copy can
be read from another location in a way that is transparent to the client.