Let us
take a scenario where we want to count the population of two cities. I have a
data set, a census list of different cities, and I want to count the population
of two cities using a single MapReduce job. Let us assume that one is Bangalore
and the other is Noida. I need to treat the key for Bangalore the same way as the
key for Noida, so that the population data of both cities reaches one
reducer. The idea is that I somehow have to instruct the MapReduce
program: whenever you find a city named ‘Bangalore’ or a city
named ‘Noida’, use an alias that is common to these
two cities, so that both are handled as a common key and get
passed to the same reducer. For this, we have to write a custom partitioner.
In MapReduce, when you aggregate by city, ‘city’ is the key. Whenever the framework comes across a different city, it treats it as a different key, and by default the reducer for a key is chosen from that key's hash code. Hence, we need a custom partitioner. MapReduce lets you write your own partitioner and state that if the city is Bangalore or Noida, the same hash code (and therefore the same partition) is used. Pig gives you far less direct control here. Pig is a high-level data-flow language whose scripts are compiled into MapReduce jobs, so from Pig Latin itself we cannot direct the execution engine's partitioning logic; at best, newer Pig releases let you plug in a partitioner class written in Java through the PARTITION BY clause. In such scenarios, MapReduce works better than Pig.
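The partitioning rule described above can be sketched roughly as follows. This is a minimal plain-Java illustration, not a complete Hadoop job: the class name `CityPartitioner`, the alias `BANGALORE_NOIDA`, and the helper `partitionKey` are hypothetical names chosen for this example. In a real job, the same logic would go into a class extending `org.apache.hadoop.mapreduce.Partitioner` and be registered with `job.setPartitionerClass(...)`; the hash-and-modulo step mirrors what Hadoop's default `HashPartitioner` does.

```java
import java.util.Locale;

// Plain-Java sketch of the rule "Bangalore and Noida go to the same reducer".
// Assumption: city names arrive as plain strings; a Hadoop version would
// receive a Text key and call key.toString() first.
final class CityPartitioner {

    // Hypothetical alias shared by the two cities.
    static final String ALIAS = "BANGALORE_NOIDA";

    // Returns the string whose hash code decides the partition.
    // Bangalore and Noida map to the shared alias; other cities are unchanged.
    static String partitionKey(String city) {
        String normalized = city.trim().toLowerCase(Locale.ROOT);
        if (normalized.equals("bangalore") || normalized.equals("noida")) {
            return ALIAS;
        }
        return city;
    }

    // Same shape as Partitioner.getPartition: returns a reducer index in
    // the range [0, numReduceTasks). Masking with Integer.MAX_VALUE keeps
    // the hash non-negative before the modulo.
    static int getPartition(String city, int numReduceTasks) {
        return (partitionKey(city).hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

With this rule, `CityPartitioner.getPartition("Bangalore", 4)` and `CityPartitioner.getPartition("Noida", 4)` return the same reducer index. Note that a partitioner only decides which reducer receives a key. The reducer still sees Bangalore and Noida as two keys within the same reduce task, so to get one combined total it has to add them up itself, or the mapper has to emit the alias as the key.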