
Problem with tPigLoad loading from S3

I'm using HDP 2.3 with Talend 6.0.1 (the latest release) and attempting to load a data file from S3 with tPigLoad. I can load and dump some of the data fine from the Pig console, but the Talend job fails with the error below. Any suggestions? Thanks!
Error during parsing. Failed to create DataStorage
Caused by: 
Failed to parse: Failed to create DataStorage
: org.apache.pig.PigServer - exception during parsing: Error during parsing. Failed to create DataStorage
Failed to parse: Failed to create DataStorage
at org.apache.pig.parser.QueryParserDriver.parse(
at org.apache.pig.PigServer$Graph.parseQuery(
at org.apache.pig.PigServer$Graph.registerQuery(
at org.apache.pig.PigServer.registerQuery(
at org.apache.pig.PigServer.registerQuery(
at myproject.local_other_process_data_0_1.local_other_process_data.tPigLoad_2Process(
at myproject.local_other_process_data_0_1.local_other_process_data.runJobInTOS(
at myproject.local_other_process_data_0_1.local_other_process_data.main(
Caused by: java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(
at org.apache.pig.builtin.JsonMetadata.findMetaFile(
at org.apache.pig.builtin.JsonMetadata.getSchema(
at org.apache.pig.builtin.PigStorage.getSchema(
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(
at org.apache.pig.newplan.logical.relational.LOLoad.<init>(
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(
at org.apache.pig.parser.LogicalPlanGenerator.statement(
at org.apache.pig.parser.LogicalPlanGenerator.query(
at org.apache.pig.parser.QueryParserDriver.parse(
... 7 more
Caused by: No FileSystem for scheme: s3n
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(
at org.apache.hadoop.fs.FileSystem.createFileSystem(
at org.apache.hadoop.fs.FileSystem.access$200(
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(
at org.apache.hadoop.fs.FileSystem$Cache.get(
at org.apache.hadoop.fs.FileSystem.get(
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(
... 20 more
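The top-level message ("Failed to create DataStorage") is only a wrapper. The actual failure is the innermost "Caused by:" entry, "No FileSystem for scheme: s3n", which means Hadoop could not find a FileSystem implementation for the s3n:// scheme. The Java sketch below shows how that innermost cause can be pulled out programmatically when code catches such an exception. The class name RootCause and the exception types used to rebuild the chain are illustrative stand-ins, not Pig's or Talend's actual code.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * Hypothetical helper: walks a Throwable's cause chain to its innermost cause,
 * i.e. the entry printed as the last "Caused by:" line of a stack trace.
 */
public class RootCause {

    /** Returns the deepest cause of t, or t itself if it has no cause. */
    static Throwable find(Throwable t) {
        // Identity-based set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Rebuild a chain shaped like the trace above (messages only; the
        // exception types are stand-ins, not necessarily the ones Pig throws).
        IOException fs = new IOException("No FileSystem for scheme: s3n");
        RuntimeException storage = new RuntimeException("Failed to create DataStorage", fs);
        Exception parse = new Exception("Error during parsing. Failed to create DataStorage", storage);

        // prints: No FileSystem for scheme: s3n
        System.out.println(find(parse).getMessage());
    }
}
```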

Re: Problem with tPigLoad loading from S3

Figured out the problem: you have to add the required JARs to the tPigLoad component. It is a manual step, and figuring out which JARs to add takes quite long because you have to add them one by one. When the tool provides an "S3 path" checkbox, I would hope it includes the required JARs by itself rather than leaving that to the end user. :-) Glad it worked eventually!
- Zeeshan
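As a quick way to check whether the classes behind s3n are visible to a job, here is a small Java sketch that tries to load a list of class names without initializing them. The candidate names are an assumption, not the list of JARs Zeeshan added: in Hadoop 2.6 and later the s3n implementation, org.apache.hadoop.fs.s3native.NativeS3FileSystem, is packaged in the separate hadoop-aws JAR and relies on the JetS3t library. The class name S3ClasspathCheck is hypothetical.

```java
/**
 * Hypothetical diagnostic: reports which of a list of class names can be
 * loaded from the current classpath.
 */
public class S3ClasspathCheck {

    /** True if the class can be located; false if it or a dependency is missing. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only check presence, don't run static initializers.
            Class.forName(className, false, S3ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ASSUMPTION: candidate classes for s3n support, not a confirmed JAR list.
        String[] candidates = {
            "org.apache.hadoop.fs.s3native.NativeS3FileSystem", // expected in hadoop-aws
            "org.jets3t.service.S3Service",                      // expected in jets3t
        };
        for (String name : candidates) {
            System.out.println((isLoadable(name) ? "found   " : "MISSING ") + name);
        }
    }
}
```

Run it with the same classpath as the Talend job; any line marked MISSING points at a JAR that may still need to be added to tPigLoad.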

Re: Problem with tPigLoad loading from S3

Can you please give us a tip on which libraries you added?