Define RDD.

RDD stands for Resilient Distributed Datasets: a fault-tolerant collection of elements that can be operated on in parallel.
The data in an RDD is immutable and partitioned across the nodes of a cluster. There are primarily two types of RDD, distinguished by how they are created:
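  • Parallelized collections: created by distributing an existing collection in the driver program (for example with SparkContext.parallelize) so that its elements can be processed in parallel.
  • Hadoop datasets: created from files in HDFS or any other storage system supported by Hadoop; a function can then be applied to each record of the file.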