Spark Core performs an array of critical functions, including memory management, job monitoring, fault tolerance, job scheduling, and interaction with storage systems.
It is the foundation of the overall project, providing distributed task dispatching, scheduling, and basic input/output functionality. The Resilient Distributed Dataset (RDD) abstraction in Spark Core is what makes it fault tolerant: an RDD records the lineage of transformations used to build it, so lost partitions can be recomputed rather than replicated.
An RDD is a collection of items distributed across many nodes that can be manipulated in parallel. Spark Core provides many APIs for building and manipulating these collections.
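As a minimal sketch of how these APIs fit together, the following Scala snippet builds an RDD from a local collection and manipulates it in parallel; the application name and the local[*] master URL are illustrative choices for running on a single machine, not part of the original text.

    import org.apache.spark.{SparkConf, SparkContext}

    object RddExample {
      def main(args: Array[String]): Unit = {
        // App name and local master are illustrative assumptions for this sketch.
        val conf = new SparkConf().setAppName("RddExample").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Build an RDD by distributing a local collection across the nodes.
        val numbers = sc.parallelize(1 to 10)

        // Transformations such as filter and map are lazy: they only record lineage.
        val squaresOfEvens = numbers.filter(_ % 2 == 0).map(n => n * n)

        // An action such as collect triggers parallel execution across partitions.
        println(squaresOfEvens.collect().mkString(", ")) // prints: 4, 16, 36, 64, 100

        sc.stop()
      }
    }

Transformations stay lazy until an action runs; the recorded chain of transformations is exactly the lineage information Spark Core uses to recompute lost partitions.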