Hadoop DP Notes
What is “RDD”?
RDD stands for Resilient Distributed Dataset: in Apache Spark, a fault-tolerant collection of elements that can be operated on in parallel. The data in an RDD is split into partitions that are immutable and distributed across the nodes of a cluster.
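The three properties named above (partitioned, immutable, fault-tolerant) can be illustrated with a small pure-Python model. This is only a sketch and not Spark's API: the `ToyRDD` class and its methods `parallelize`, `map`, `collect`, and `recompute_partition` are hypothetical names chosen for illustration, and threads on one machine stand in for the nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's implementation)."""

    def __init__(self, partitions, parent=None, fn=None):
        # Partitions are stored as tuples, so they cannot be changed in place.
        self._partitions = tuple(tuple(p) for p in partitions)
        self._parent = parent  # lineage: the dataset this one was derived from
        self._fn = fn          # lineage: the function applied to the parent

    @classmethod
    def parallelize(cls, data, num_partitions):
        # Split the input into contiguous chunks, one chunk per partition.
        data = list(data)
        size = max(1, (len(data) + num_partitions - 1) // num_partitions)
        return cls([data[i:i + size] for i in range(0, len(data), size)])

    def map(self, fn):
        # Each partition is transformed independently, here on a thread pool.
        # The result is a new ToyRDD; the original is left untouched.
        with ThreadPoolExecutor() as pool:
            new_parts = list(
                pool.map(lambda part: [fn(x) for x in part], self._partitions)
            )
        return ToyRDD(new_parts, parent=self, fn=fn)

    def recompute_partition(self, index):
        # Fault tolerance via lineage: rebuild one partition from the parent
        # instead of keeping a replicated copy of the data.
        if self._parent is None:
            raise ValueError("base dataset has no lineage to recompute from")
        return tuple(self._fn(x) for x in self._parent._partitions[index])

    def collect(self):
        # Gather all partitions back into a single list.
        return [x for part in self._partitions for x in part]


numbers = ToyRDD.parallelize(range(1, 7), num_partitions=3)
squares = numbers.map(lambda x: x * x)

print(numbers.collect())               # [1, 2, 3, 4, 5, 6]
print(squares.collect())               # [1, 4, 9, 16, 25, 36]
print(squares.recompute_partition(1))  # (9, 16)
```

Note that `map` never modifies `numbers`; it returns a new dataset that remembers its parent and the function applied to it. That recorded lineage is what allows a lost partition to be rebuilt on demand.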