What is side data distribution in the MapReduce framework?



The extra read-only data that a MapReduce job needs in order to process the main data set is called side data.
There are two ways to make side data available to all the map or reduce tasks:
    • Job Configuration
    • Distributed cache
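
To make the idea concrete, here is a minimal sketch of how a task typically consumes side data: it loads the small read-only table once, then consults it for every record of the main data set. This is plain Java with no Hadoop dependency, so the class SideDataSketch, its methods, and the sample country-code data are all hypothetical stand-ins rather than Hadoop API. In a real job the side data would probably arrive through one of the two mechanisms above, for example a value set with Configuration.set and read back via context.getConfiguration().get, or a file registered with Job.addCacheFile.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Hadoop Mapper; no Hadoop classes are used here.
class SideDataSketch {
    // Side data: a small read-only lookup table (country code -> country name).
    private final Map<String, String> countryNames = new HashMap<>();

    // Plays the role of Mapper.setup(): parse the side data once per task.
    void setup(List<String> sideDataLines) {
        for (String line : sideDataLines) {
            String[] parts = line.split(",", 2);
            if (parts.length == 2) {
                countryNames.put(parts[0].trim(), parts[1].trim());
            }
        }
    }

    // Plays the role of Mapper.map(): called once per record of the main data set.
    String map(String record) {
        String[] fields = record.split(",", 2);
        String name = countryNames.getOrDefault(fields[0].trim(), "UNKNOWN");
        String rest = fields.length > 1 ? fields[1].trim() : "";
        return name + "\t" + rest;
    }

    // Runs setup once, then maps every record, as a single task would.
    static List<String> run(List<String> sideDataLines, List<String> records) {
        SideDataSketch task = new SideDataSketch();
        task.setup(sideDataLines);
        List<String> output = new ArrayList<>();
        for (String record : records) {
            output.add(task.map(record));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> sideData = Arrays.asList("IN,India", "US,United States");
        List<String> mainData = Arrays.asList("IN,42", "US,17", "FR,7");
        for (String line : run(sideData, mainData)) {
            System.out.println(line);
        }
    }
}
```

The lookup table is filled only in setup and never modified in map, which mirrors the read-only nature of side data. As a rough rule of thumb, the job configuration suits a few small values, while the distributed cache suits whole files.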