What is the copy phase in reduce tasks?

In the MapReduce framework, map tasks may finish at different times, but a reduce task does not wait for all of them to finish. It starts copying each map task's output as soon as that map task completes. This is known as the copy phase of the reduce task.
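To make the timing concrete, here is a minimal Python sketch. It is a toy model, not Hadoop's actual shuffle code. It assumes hypothetical map completion times and shows that the reducer begins copying each output at the moment its map task finishes, in completion order, instead of waiting for the slowest map task. The function name `copy_order` and the timing values are illustrative only.

```python
import heapq


def copy_order(map_finish_times):
    """Return (map_id, copy_start_time) pairs in the order a reducer starts copying.

    Toy model (hypothetical): map task i finishes at map_finish_times[i], and the
    reducer starts copying that task's output as soon as it finishes, rather than
    waiting until every map task is done.
    """
    # Min-heap of (finish_time, map_id) completion events; ties break by map_id.
    events = [(t, i) for i, t in enumerate(map_finish_times)]
    heapq.heapify(events)
    order = []
    while events:
        finish_time, map_id = heapq.heappop(events)
        # In this model, copying this map's output begins at its finish time.
        order.append((map_id, finish_time))
    return order


# Hypothetical map completion times, in seconds.
finish_times = [12.0, 3.5, 7.0, 20.0]
print(copy_order(finish_times))
# → [(1, 3.5), (2, 7.0), (0, 12.0), (3, 20.0)]
```

In this example the first copy starts at 3.5 s, while the last map task only finishes at 20.0 s. Overlapping copying with the remaining map work in this way is the point of the copy phase.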
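By default, the reduce task uses five copier threads so that it can fetch map outputs in parallel. This number can be changed by setting the mapred.reduce.parallel.copies property (in Hadoop 2 and later the property is named mapreduce.reduce.shuffle.parallelcopies, and the old name is deprecated).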
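The effect of the copier-thread limit can be illustrated with a bounded thread pool. The sketch below is a simulation in Python, not Hadoop's Java implementation. The pool size `parallel_copies` stands in for the parallel-copies setting (default 5), and `fetch` is a hypothetical stand-in for a network transfer of one map output. The function also records the peak number of fetches running at once, which can never exceed the pool size.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Mirrors the default of five copier threads described above.
DEFAULT_PARALLEL_COPIES = 5


def fetch_map_outputs(map_ids, parallel_copies=DEFAULT_PARALLEL_COPIES, fetch_seconds=0.01):
    """Simulate a reducer fetching map outputs with a bounded pool of copier threads.

    Returns (fetched, peak): fetched maps each map_id to a fake payload, and peak is
    the largest number of fetches observed running at the same time.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def fetch(map_id):
        # Hypothetical fetch: sleeps to stand in for copying one map output.
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            time.sleep(fetch_seconds)
            return map_id, f"output-of-map-{map_id}"
        finally:
            with lock:
                active -= 1

    # At most parallel_copies fetches run concurrently, like the copier threads.
    with ThreadPoolExecutor(max_workers=parallel_copies) as pool:
        fetched = dict(pool.map(fetch, map_ids))
    return fetched, peak


fetched, peak = fetch_map_outputs(range(20))
print(len(fetched), peak)  # peak is at most 5
```

Raising `parallel_copies` lets more outputs be fetched at once. In a real cluster, that trades faster copying for more simultaneous network connections and memory pressure on the reducer.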