Search

Problem scenario 2 : You have been given below information Avro schema as below



{
  "type" : "record",
  "name" : "orders_partition4",
  "doc" : "Hive partitioned table schema ",
  "fields" : [ {
    "name" : "order_id",
    "type" : [ "int", "null" ]
  }, {
    "name" : "order_date",
    "type" : [ "long", "null" ]
  }, {
    "name" : "order_customer_id",
    "type" : [ "int", "null" ]
  }, {
    "name" : "order_status",
    "type" : [ "string", "null" ]
  } ],
  "tableName" : "orders_partition4"
}
1. Create a managed hive table based on above schema and two additional columns as below
order_value with default value as -9999 (This column can not be null)
order_description with default value as "Not Defined" (This column can store null)
2. While creating table use the data from below location.
hdfs:///user/hive/warehouse/retail_stage.db/orders
Solution :
Step 1 : Create a schema file named order.avsc locally and modiy it for two additional column as below also add the dfault value for these new columns.
{
  "type" : "record",
  "name" : "orders_partition1",
  "doc" : "Hive partitioned table schema ",
  "fields" : [ {
    "name" : "order_id",
    "type" : [ "int", "null" ]
  }, {
    "name" : "order_date",
    "type" : [ "long", "null" ]
  }, {
    "name" : "order_customer_id",
    "type" : [ "int", "null" ]
  }, {
    "name" : "order_status",
    "type" : [ "string", "null" ]
  }, {
    "name" : "order_value",
    "type" : "int",
    "default" : -9999
  } , {
    "name" : "order_description",
    "type" : [ "string", "null" ],
    "default" : "Not Defined"
  }  ],
  "tableName" : "orders_partition2"
}
Step 2 : Create an avsc file in hdfs at below location (You can use either hue or create file locally and upload to hdfs)
hdfs://quickstart.cloudera/user/cloudera/retail_stage1/sqoop_import_orders2.avsc
Step 3 : 
Create a managed table
CREATE TABLE orders_partition4 (
order_id int,
order_date bigint,
order_customer_id int,
order_status string,
order_value int,
order_description string
)STORED AS AVRO
LOCATION 'hdfs:////user/hive/warehouse/retail_stage.db/orders'
TBLPROPERTIES ('avro.schema.url'='hdfs://quickstart.cloudera/user/cloudera/retail_stage1/sqoop_import_orders2.avsc');
Step 4 : Check the tables;
select * from orders_partition4;
describe orders_partition4;
-------------------------------------------
{
  "type" : "record",
  "name" : "orders_partition4",
  "doc" : "Hive partitioned table schema ",
  "fields" : [ {
    "name" : "order_id",
    "type" : [ "int", "null" ]
  }, {
    "name" : "order_date",
    "type" : [ "long", "null" ]
  }, {
    "name" : "order_customer_id",
    "type" : [ "int", "null" ]
  }, {
    "name" : "order_status",
    "type" : [ "string", "null" ]
  } ],
  "tableName" : "orders_partition4"
}