When can we use side data distribution by job configuration, and when should we avoid it?



Side data distribution by job configuration is useful only when we need to pass a small piece of metadata to the map/reduce tasks.

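We shouldn’t use this mechanism to transfer more than a few kilobytes of data, because it puts pressure on memory usage, particularly in a system running hundreds of jobs. The job configuration is read by the framework daemons and by every task JVM, and each time it is read all of its entries are loaded into memory, whether or not they are used.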
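The size rule above can be sketched as a small guard that runs before a value is placed in the job configuration. This is only an illustration under stated assumptions: the class `SideDataGuard`, the 4 KB limit, and the key `myjob.threshold` are hypothetical, and a plain `Map<String, String>` stands in for Hadoop's `Configuration` so the snippet runs without Hadoop on the classpath. In a real job the value would typically be stored with `Configuration.set(key, value)` in the driver and read back in the task with `context.getConfiguration().get(key)`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: refuses side data that is too
// large to travel comfortably in the job configuration.
class SideDataGuard {
    // Assumed limit. The guidance is "a few KB"; 4 KB is an illustrative choice.
    static final int MAX_SIDE_DATA_BYTES = 4 * 1024;

    // Returns true when the UTF-8 encoding of the value is within the limit.
    static boolean fitsInJobConf(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length <= MAX_SIDE_DATA_BYTES;
    }

    // Stores the value in the stand-in configuration map,
    // or throws if the value exceeds the limit.
    static void putSideData(Map<String, String> conf, String key, String value) {
        if (!fitsInJobConf(value)) {
            throw new IllegalArgumentException(
                "Side data for '" + key + "' is too large for the job configuration");
        }
        conf.put(key, value);
    }

    public static void main(String[] args) {
        // Stand-in for a Hadoop Configuration object (assumption).
        Map<String, String> conf = new HashMap<>();

        // A small piece of metadata is fine.
        putSideData(conf, "myjob.threshold", "42");
        System.out.println(conf.get("myjob.threshold"));

        // A 10,000-character payload exceeds the assumed 4 KB limit.
        char[] big = new char[10_000];
        Arrays.fill(big, 'x');
        System.out.println(fitsInJobConf(new String(big)));
    }
}
```

A check like this keeps the decision explicit in the driver code: anything that fails the guard is a candidate for a different side data mechanism rather than the job configuration.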