Every day, a large amount of unstructured data is dumped into our machines. The
major challenge is not storing these large data sets but retrieving and
analyzing the big data within organizations, especially when the data is spread
across different machines at different locations. This is where Hadoop comes
in. Hadoop can analyze data residing on different machines at different
locations quickly and cost-effectively. It uses the MapReduce model, which
splits a query into small parts and processes them in parallel, a form of
parallel computing.
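
To make the MapReduce idea concrete, here is a minimal single-machine sketch of its three phases (map, shuffle, reduce) applied to a word-count query. This is an illustration of the model only, not Hadoop's actual API; the input chunks stand in for data blocks that would live on different machines in a real cluster.

```python
from collections import defaultdict

# Hypothetical input: text chunks that, on a real cluster, would be
# stored as blocks on different machines.
chunks = [
    "big data is big",
    "hadoop processes big data",
]

# Map phase: each chunk is processed independently (in parallel on a
# cluster), emitting (key, value) pairs -- here, (word, 1).
def map_phase(chunk):
    return [(word, 1) for word in chunk.split()]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]

# Shuffle phase: group all values by their key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: combine the grouped values for each key.
word_counts = {word: sum(counts) for word, counts in groups.items()}

print(word_counts)
```

Because each chunk is mapped independently and each key is reduced independently, both phases can run on many machines at once; that independence is what lets Hadoop divide a query into small parts and process them in parallel.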