Demo2_UseCase_HiveELT - MapR Sandbox error

One Star

I'm trying to work through the HiveELT demo on the MapR sandbox. The load-data and create-tables demos ran fine. However, whenever the client needs to execute a MapReduce job as part of a Hive job, it fails.
Regular MapReduce demos also work fine, and I can even run the generated HiveQL statement in Hue, so this seems to affect only Hive-based MapReduce run through the client. The error portion of the log is below; any help would be appreciated!
java.io.IOException: cannot find dir = maprfs://maprdemo:7222/user/talend/data/usecase1/in/orders/orders.txt in pathToPartitionInfo:
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:260)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:104)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:409)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1060)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1052)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:173)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:934)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:885)
: org.apache.hadoop.hive.ql.exec.Task - Job Submission failed with exception 'java.io.IOException(cannot find dir = maprfs://maprdemo:7222/user/talend/data/usecase1/in/orders/orders.txt in pathToPartitionInfo: )'

Four Stars

Your error message seems to indicate that your load file is missing from HDFS. Can you either browse HDFS in the web UI to confirm that the file is there, or verify it from the console using PuTTY?
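If you go the console route, a minimal Python sketch like the one below could do the check. It assumes the `hadoop` command is on the PATH inside the sandbox and relies on `hadoop fs -test -e`, which exits with status 0 when the path exists. The helper names `build_exists_command` and `path_exists` are made up for illustration; the path is the one from your error message.

```python
import subprocess

# Path copied from the error message in the log.
ORDERS_FILE = "/user/talend/data/usecase1/in/orders/orders.txt"


def build_exists_command(path):
    """Return the argument list for an existence check on the cluster file system."""
    return ["hadoop", "fs", "-test", "-e", path]


def path_exists(path, runner=subprocess.run):
    """Run the check and report True when the command exits with status 0.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be tried out without a cluster.
    """
    result = runner(build_exists_command(path))
    return result.returncode == 0
```

Calling `path_exists(ORDERS_FILE)` from a session on the sandbox should return True if the file is where the job expects it.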
One Star

I verified that the files exist, both through Hue and from the console. The HiveQL generated by the HiveMap task also runs successfully in Hue, which confirms the data is there. But the job still fails on the last task. I've made no changes to the VM, and I ran the prerequisite tasks to load the data and create the metadata.
Employee

Can you post the query and the create table statement?  And a screenshot of the process?
One Star

I'm not sure where to find the create table statement, since a prerequisite job created the table without problems. I've attached a screenshot of the job that loads it, showing where it fails.
The query that works in Hue is below (customers and orders correspond to the files referenced in the error):
SELECT
  customers.customernumber, customers.customername, customers.streetaddress,
  customers.city, customers.zip, customers.state,
  SUM(orders.amount), COUNT(orders.amount), MIN(orders.amount),
  MAX(orders.amount), AVG(orders.amount)
FROM
  customers JOIN orders ON (orders.customernumber = customers.customernumber)
GROUP BY
  customers.customernumber, customers.customername, customers.streetaddress,
  customers.city, customers.zip, customers.state
Employee

OK, so this may be a simple path problem. If you look at the job that created the tables and check where the orders.txt file is being created, I think that path is different from the one in your error message.
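To illustrate why the exact path matters: judging by the stack trace, the lookup in `getPartitionDescFromPathRecursively` tries the file's path and then its parent directories against the keys of `pathToPartitionInfo`. The Python sketch below is a simplified model of that idea, not Hive's actual code. The function name `find_partition_key` and the `maprfs:///` table location are hypothetical, chosen only to show how a file that exists can still fail to match when the URI is written differently.

```python
import posixpath
from urllib.parse import urlparse

# File path copied from the error message.
FILE_IN_ERROR = "maprfs://maprdemo:7222/user/talend/data/usecase1/in/orders/orders.txt"


def find_partition_key(file_uri, known_locations, ignore_scheme=False):
    """Walk from file_uri up through its parents and return the first
    registered location that matches, or None if nothing matches.

    With ignore_scheme=True only the path part of each URI is compared,
    so differences in scheme or host:port no longer matter.
    """
    def normalize(uri):
        text = urlparse(uri).path if ignore_scheme else uri
        return text.rstrip("/")

    # Map each normalized location back to the original string.
    keys = {normalize(loc): loc for loc in known_locations}

    parsed = urlparse(file_uri)
    prefix = "" if ignore_scheme else f"{parsed.scheme}://{parsed.netloc}"
    path = parsed.path.rstrip("/")

    while path:
        candidate = prefix + path
        if candidate in keys:
            return keys[candidate]
        parent = posixpath.dirname(path)
        if parent == path:  # reached the root without a match
            break
        path = parent
    return None
```

With a hypothetical table location of `maprfs:///user/talend/data/usecase1/in/orders`, `find_partition_key(FILE_IN_ERROR, [...])` returns None in the strict mode even though the file sits in that directory, while passing `ignore_scheme=True` finds the location. Comparing the table's registered location with the path in the error, character for character, is the quickest way to rule this in or out.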
One Star

Everything checks out. I can browse and query the orders and customers tables in Hue, and I can run the query above. The error only occurs in a Hive task that executes a MapReduce job. Creating and dropping tables works fine (so I can connect to Hive).
I'm not sure if it's a reference issue, but I downloaded this VM from Talend and made no changes before executing these tasks.