One Star

How to push data from HDFS to Hive Table

Hi All,

I am not able to push data from HDFS to Hive.
Objective:
Create tables and load data into Hive.

Platform:
CDH 4.4
TOS for Big Data 5.4.0
Ubuntu OS

Architecture:
Both CDH and TOS are on the same machine (CDH single-host implementation).
Components being used:
tHiveConnection
tHiveCreateTable
tHiveLoad
More info:
Screenshot attached.

One Star

Re: How to push data from HDFS to Hive Table

Error Log:
Starting job Test at 06:53 14/11/2013.

connecting to socket on port 4047
connected
: org.apache.hadoop.hive.conf.HiveConf - hive-site.xml not found on CLASSPATH
: org.apache.hadoop.conf.Configuration - mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
: org.apache.hadoop.conf.Configuration - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
: org.apache.hadoop.conf.Configuration - mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
: org.apache.hadoop.conf.Configuration - mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
: org.apache.hadoop.conf.Configuration - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
: org.apache.hadoop.conf.Configuration - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
: org.apache.hadoop.conf.Configuration - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@2b03be0:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
: org.apache.hadoop.conf.Configuration - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@2b03be0:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
: org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
: org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
: org.apache.hadoop.hive.metastore.ObjectStore - ObjectStore, initialize called
: DataNucleus.Persistence - Property datanucleus.cache.level2 unknown - will be ignored
: org.apache.hadoop.conf.Configuration - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@bf47ae8:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
: org.apache.hadoop.conf.Configuration - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@bf47ae8:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
: org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
: org.apache.hadoop.hive.metastore.ObjectStore - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
: org.apache.hadoop.hive.metastore.ObjectStore - Initialized ObjectStore
Hive history file=/tmp/root/hive_job_log_8aaa235f-3f7d-4acc-a297-a13212b2ef24_1181626818.txt
: hive.ql.exec.HiveHistory - Hive history file=/tmp/root/hive_job_log_8aaa235f-3f7d-4acc-a297-a13212b2ef24_1181626818.txt
: org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
: org.apache.hadoop.conf.Configuration - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@4eda1515:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
: org.apache.hadoop.conf.Configuration - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@4eda1515:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
: org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
: org.apache.hadoop.hive.service.HiveServer - Putting temp output to file /tmp/root/8aaa235f-3f7d-4acc-a297-a13212b2ef245572030930163956377.pipeout
: org.apache.hadoop.hive.service.HiveServer - Running the query: set hive.fetch.output.serde = org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
: org.apache.hadoop.hive.service.HiveServer - Putting temp output to file /tmp/root/8aaa235f-3f7d-4acc-a297-a13212b2ef245572030930163956377.pipeout
: org.apache.hadoop.hive.service.HiveServer - Running the query: SET mapreduce.framework.name=yarn
: org.apache.hadoop.hive.service.HiveServer - Putting temp output to file /tmp/root/8aaa235f-3f7d-4acc-a297-a13212b2ef245572030930163956377.pipeout
: org.apache.hadoop.hive.service.HiveServer - Running the query: SET yarn.resourcemanager.address=mlbis.local:8032
: org.apache.hadoop.hive.service.HiveServer - Putting temp output to file /tmp/root/8aaa235f-3f7d-4acc-a297-a13212b2ef245572030930163956377.pipeout
: org.apache.hadoop.hive.service.HiveServer - Running the query: CREATE TABLE IF NOT EXISTS tkt_details(DeviceName STRING,TicketCount STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '
' STORED AS TEXTFILE
: hive.ql.parse.ParseDriver - Parsing command: CREATE TABLE IF NOT EXISTS tkt_details(DeviceName STRING,TicketCount STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '
' STORED AS TEXTFILE
: hive.ql.parse.ParseDriver - Parse Completed
: org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - Starting Semantic Analysis
: org.apache.hadoop.hive.ql.parse.SemanticAnalyzer - Creating table tkt_details position=28
: hive.metastore - Trying to connect to metastore with URI thrift://mlbis.local:9083
: hive.metastore - Waiting 1 seconds before next connection attempt.
OK
: hive.metastore - Connected to metastore.
: org.apache.hadoop.hive.ql.Driver - Semantic Analysis Completed
: org.apache.hadoop.hive.ql.Driver - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
: org.apache.hadoop.conf.Configuration - mapred.job.name is deprecated. Instead, use mapreduce.job.name
: org.apache.hadoop.hive.ql.Driver - Starting command: CREATE TABLE IF NOT EXISTS tkt_details(DeviceName STRING,TicketCount STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '
' STORED AS TEXTFILE
: org.apache.hadoop.hive.ql.Driver - OK
: org.apache.hadoop.hive.service.HiveServer - Returning schema: Schema(fieldSchemas:null, properties:null)
: org.apache.hadoop.hive.service.HiveServer - Running the query: LOAD DATA INPATH '/tmp/logs/mssql.txt' OVERWRITE INTO TABLE tkt_details
: hive.ql.parse.ParseDriver - Parsing command: LOAD DATA INPATH '/tmp/logs/mssql.txt' OVERWRITE INTO TABLE tkt_details
: hive.ql.parse.ParseDriver - Parse Completed
Loading data to table default.tkt_details
: org.apache.hadoop.hive.ql.Driver - Semantic Analysis Completed
: org.apache.hadoop.hive.ql.Driver - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
: org.apache.hadoop.hive.ql.Driver - Starting command: LOAD DATA INPATH '/tmp/logs/mssql.txt' OVERWRITE INTO TABLE tkt_details
: org.apache.hadoop.hive.ql.exec.Task - Loading data to table default.tkt_details from hdfs://mlbis.local:8020/tmp/logs/mssql.txt
Failed with exception Unable to move sourcehdfs://mlbis.local:8020/tmp/logs/mssql.txt to destination /user/hive/warehouse/tkt_details/mssql.txt
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
: org.apache.hadoop.hive.ql.exec.Task - Failed with exception Unable to move sourcehdfs://mlbis.local:8020/tmp/logs/mssql.txt to destination /user/hive/warehouse/tkt_details/mssql.txt
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move sourcehdfs://mlbis.local:8020/tmp/logs/mssql.txt to destination /user/hive/warehouse/tkt_details/mssql.txt
at org.apache.hadoop.hive.ql.metadata.Hive.renameFile(Hive.java:2031)
at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2175)
at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:627)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1301)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:234)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:192)
at org.apache.hadoop.hive.jdbc.HiveStatement.execute(HiveStatement.java:132)
at demobigdata.test_0_1.Test.tHiveLoad_1Process(Test.java:724)
at demobigdata.test_0_1.Test.tHiveCreateTable_2Process(Test.java:624)
at demobigdata.test_0_1.Test.tHiveConnection_1Process(Test.java:450)
at demobigdata.test_0_1.Test.runJobInTOS(Test.java:970)
at demobigdata.test_0_1.Test.main(Test.java:835)
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied by sticky bit setting: user=root, inode="/tmp/logs/mssql.txt":admin:supergroup:-rw-r--r--
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkStickyBit(FSPermissionChecker.java:245)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:146)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4705)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4687)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkParentAccess(FSNamesystem.java:4655)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:2695)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:2663)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:2642)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:610)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44964)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1427)
at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:339)
at org.apache.hadoop.hive.ql.metadata.Hive.renameFile(Hive.java:2027)
... 18 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied by sticky bit setting: user=root, inode="/tmp/logs/mssql.txt":admin:supergroup:-rw-r--r--
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkStickyBit(FSPermissionChecker.java:245)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:146)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4705)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4687)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkParentAccess(FSNamesystem.java:4655)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:2695)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:2663)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:2642)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:610)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44964)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745)
at org.apache.hadoop.ipc.Client.call(Client.java:1237)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy14.rename(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy14.rename(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:355)
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1425)
... 20 more
: org.apache.hadoop.hive.ql.Driver - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
disconnected
Job Test ended at 06:53 14/11/2013.
Employee

Re: How to push data from HDFS to Hive Table

You just have a permission issue within HDFS: LOAD DATA INPATH moves the source file, and the sticky bit on its parent directory only lets the file's owner (admin here) move it, while your job runs as root.
Please fix that using hadoop fs -chmod and hadoop fs -chown, for example as sketched below.
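For example, a minimal sketch assuming the paths and users shown in the log above (the staging directory is hypothetical; chown must be run as an HDFS superuser, e.g. the hdfs user on CDH):

    # Inspect ownership: the log shows /tmp/logs/mssql.txt owned by admin:supergroup,
    # and the sticky bit on the parent directory blocks the rename attempted by root
    hadoop fs -ls /tmp/logs
    # Option 1: make root (the user running the Talend job) the owner of the file
    sudo -u hdfs hadoop fs -chown root /tmp/logs/mssql.txt
    # Option 2: stage the file in a directory root owns, outside the sticky /tmp tree
    sudo -u hdfs hadoop fs -mkdir -p /user/root/staging
    sudo -u hdfs hadoop fs -chown -R root /user/root/staging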
One Star

Re: How to push data from HDFS to Hive Table

Hi Remy,
Can you please suggest which components I should use to write data directly from HDFS to a Hive table? I tried tHiveLoad, but it uploads data from a local file rather than from an HDFS file. Can you please guide me? Currently I write the data to a file on the Linux box and then load that into the Hive table using tHiveLoad.
Employee

Re: How to push data from HDFS to Hive Table

Hi,
If your data is already on HDFS, then I think you can use tHDFSCopy to move the data from its current location to the HDFS location of your Hive table.
Does that make sense?
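For reference, a shell sketch of what tHDFSCopy would do here, using the paths reported in the log above (assuming a plain managed text table, so Hive picks the file up from its warehouse directory):

    # Copy the HDFS file into the table's warehouse directory
    hadoop fs -cp /tmp/logs/mssql.txt /user/hive/warehouse/tkt_details/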
One Star

Re: How to push data from HDFS to Hive Table

Hi Remy,
Makes sense. Thanks a lot. Can you do me another favor and suggest a BI engine that fits well with Talend Open Studio for Big Data? Currently I am trying SpagoBI and find it pretty confusing and complex. Thanks.
One Star

Re: How to push data from HDFS to Hive Table

Hi,
I think the problem is in your tHiveLoad component: the "Local" option has been checked. Uncheck it and give the HDFS file path; it will then load data into your Hive table directly.
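For reference, the two kinds of statement tHiveLoad generates (a sketch via the Hive CLI; the local path in the first form is hypothetical):

    # "Local" checked: reads from the local filesystem of the machine running the job
    hive -e "LOAD DATA LOCAL INPATH '/home/user/mssql.txt' INTO TABLE tkt_details;"
    # "Local" unchecked: reads from HDFS and moves the file, as in the log above
    hive -e "LOAD DATA INPATH '/tmp/logs/mssql.txt' OVERWRITE INTO TABLE tkt_details;"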