One Star

tPigLoad from S3 in problem

I'm using Talend 6.0.1 (the latest release) with HDP 2.3 and trying to load a data file from S3 with tPigLoad. I can load and dump some of the data fine from the Pig console, but the Talend job fails with the error below. Any suggestions? Thanks!
Error during parsing. Failed to create DataStorage
Caused by: 
Failed to parse: Failed to create DataStorage
: org.apache.pig.PigServer - exception during parsing: Error during parsing. Failed to create DataStorage
Failed to parse: Failed to create DataStorage
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:201)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1735)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1682)
at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
at org.apache.pig.PigServer.registerQuery(PigServer.java:636)
at myproject.local_other_process_data_0_1.local_other_process_data.tPigLoad_2Process(local_other_process_data.java:1561)
at myproject.local_other_process_data_0_1.local_other_process_data.runJobInTOS(local_other_process_data.java:2152)
at myproject.local_other_process_data_0_1.local_other_process_data.main(local_other_process_data.java:2009)
Caused by: java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:53)
at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:109)
at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:189)
at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:538)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:901)
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
... 7 more
Caused by: java.io.IOException: No FileSystem for scheme: s3n
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70)
... 20 more
2 REPLIES
One Star

Re: tPigLoad from S3 in problem

Figured out the problem. You have to add the required jars in the tPigLoad component. It's a manual step: you have to work out which jars are needed and add them one by one, which takes quite a while. I would hope that when the tool provides an "s3 path" checkbox, it would include the required jars itself rather than leaving that to the end user. :-) Glad it worked eventually!
- Zeeshan
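
For anyone hitting the same "No FileSystem for scheme: s3n" error: it usually means the class that implements s3n is not on the job's classpath. A rough way to find out which jar is missing is a small check like the one below. This is only a sketch, not the exact list used in the fix above. The class-to-jar mapping is an assumption based on stock Hadoop 2.6+ packaging (where the s3n filesystem lives in the hadoop-aws module and relies on JetS3t), and the class name S3nClasspathCheck is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports which s3n-related classes the current classpath can see.
class S3nClasspathCheck {

    // Returns true if the named class can be located without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, S3nClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed mapping of class -> jar that normally provides it (Hadoop 2.6+ layout).
        Map<String, String> candidates = new LinkedHashMap<>();
        candidates.put("org.apache.hadoop.fs.FileSystem", "hadoop-common");
        candidates.put("org.apache.hadoop.fs.s3native.NativeS3FileSystem", "hadoop-aws");
        candidates.put("org.jets3t.service.S3Service", "jets3t");

        for (Map.Entry<String, String> entry : candidates.entrySet()) {
            String status = isOnClasspath(entry.getKey()) ? "found" : "MISSING";
            System.out.println(status + "  " + entry.getKey()
                    + "  (usually in " + entry.getValue() + ")");
        }
    }
}
```

Run it with the same jars the Talend job uses. Anything reported as MISSING is a candidate to add in tPigLoad, in a version that matches the cluster's Hadoop release. The jars may pull in further dependencies, so treat the output as a starting point.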
One Star

Re: tPigLoad from S3 in problem

Can you please tell us which libraries need to be added?