If reducers do not start before all the mappers have completed, why does the progress of a MapReduce job show something like Map(80%) Reduce(20%)?



As said above, reducers start copying intermediate output from map tasks as soon as it becomes available, and the task progress calculation counts this copying as well. Hadoop reports a reduce task's progress across three phases (copy, sort, reduce), each contributing about one third, so the copy phase alone can move the displayed reduce progress to 10% or 20% while mappers are still running. The actual reduce() method, however, starts processing only after the map phase has reached 100%.
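The arithmetic behind a display such as Map(80%) Reduce(20%) can be sketched as follows. This is only an illustration, assuming that reduce progress is the equally weighted average of the copy, sort, and reduce phases; the function names are hypothetical and are not part of the Hadoop API.

```python
def reduce_task_progress(copied, sorted_=0.0, reduced=0.0):
    """Return overall reduce progress in [0, 1].

    Assumption: the copy, sort and reduce phases each contribute
    one third of the reported progress.
    Each argument is the completed fraction of that phase (0.0 to 1.0).
    """
    for value in (copied, sorted_, reduced):
        if not 0.0 <= value <= 1.0:
            raise ValueError("phase fractions must be between 0 and 1")
    return (copied + sorted_ + reduced) / 3.0


def format_progress(map_progress, reduce_progress):
    """Format progress fractions the way the question quotes them."""
    return f"Map({round(map_progress * 100)}%) Reduce({round(reduce_progress * 100)}%)"


# Mappers are 80% done; reducers have already fetched 60% of all map output.
# reduce() has not been called yet, so the sort and reduce phases are at 0.
print(format_progress(0.8, reduce_task_progress(copied=0.6)))
```

With these inputs the reduce figure comes entirely from the copy phase, which is why it can be non-zero before any call to reduce().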