
Is data locality optimization possible in the reduce phase?



No. Reduce tasks cannot be scheduled on the nodes that hold their input, because a reducer's input is not stored on any single node: it is spread across the outputs of many map tasks, which sit on many different nodes of the cluster. Reduce tasks are also usually far fewer than map tasks, and sometimes a single reducer has to process the output of every map task.

So map outputs have to be transferred over the network (the shuffle) to the nodes on which the reduce tasks run.
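
To make this concrete, here is a minimal Python sketch of the idea; it is not Hadoop code. The node names, the sample records, and the `partition` helper are all hypothetical, and the byte-sum partitioner is only a deterministic stand-in for a real hash partitioner. It reports which nodes each reducer would have to fetch map output from.

```python
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    # Hypothetical stand-in for a hash partitioner:
    # sum of the key's byte values modulo the number of reducers.
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_sources(map_outputs, num_reducers):
    """Return, for each reducer, the set of nodes whose map output it must fetch."""
    sources = defaultdict(set)
    for node, records in map_outputs.items():
        for key, _value in records:
            sources[partition(key, num_reducers)].add(node)
    return dict(sources)


# Hypothetical cluster: three nodes, each holding the output of one map task.
MAP_OUTPUTS = {
    "node1": [("apple", 1), ("cherry", 1)],
    "node2": [("banana", 1), ("apple", 1)],
    "node3": [("cherry", 1), ("banana", 1)],
}

if __name__ == "__main__":
    for reducer, nodes in sorted(shuffle_sources(MAP_OUTPUTS, 2).items()):
        print(f"reducer {reducer} fetches from {sorted(nodes)}")
```

With a single reducer, every node is a source, so wherever that reducer is placed, most of its input still has to cross the network. Adding reducers splits the keys between them, but each reducer still typically pulls from several nodes.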