Search

Problem Scenario 1 : You have been given MySQL DB with following details. user=root password=root database=retail_db table=retail_db.categories jdbc URL = jdbc:mysql://quickstart:3306/retail_db Please accomplish following activities.


1. Connect MySQL DB and check the content of the tables.
2. Copy "retail_db.categories" table to hdfs, without specifying directory name.
3. Copy "retail_db.categories" table to hdfs, in a directory name "categories_target".
4. Copy "retail_db.categories" table to hdfs, in a warehouse directory name "categories_warehouse".


Solution : 
Step 1 : Connecting to existing MySQL Database
mysql --user=root --password=root retail_db

Step 2 : Show all the available tables
show tables;

Step 3 : View/Count data from a table in MySQL
select count(1) from categories;

Step 4 : Check the currently available data in HDFS directory
hdfs dfs -ls

Step 5 : Import Single table (Without specifying directory).
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=root --password=root -table=categories 
Note : Please check you dont have space between before or after '=' sign.

Sqoop uses the MapReduce framework to copy data from RDBMS to hdfs

Step 6 : Read the data from one of the partition, created using above command.
hdfs dfs -cat categories/part-m-00000

Step 7 : Specifying target directory in import command (We are using number of mappers =1, you can change accordingly)
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=root --password=root --table=categories --target-dir=categories_target --m 1

Step 8 : Check the content in one of the partition file.
hdfs dfs -cat categories_target/part-m-00000

Step 9 : Specifying parent directory so that you can copy more than one table in a specified target directory. Command to specify warehouse directory
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=root --password=root --table=categories --warehouse-dir=categories_warehouse --m 1

Step 10 : See the content in one of the file (partition)
hdfs dfs -cat categories_warehouse/categories/part-m-00000