I am using Talend Big Data Community Edition, and one of the requirements is to extract data from an EBCDIC data file that contains packed decimal fields. I set up the tFileInputEBCDIC component and used the copybook to generate the .xc2j file with cobol2j-1.5.4 as well as the Talend XML schema file. I then used the .xc2j file and the .xml schema file in tFileInputEBCDIC. The copybook is correct, since the same copybook works when tested in another ETL tool. When I run the job in Talend after the above steps, the error message below appears after 25 records are loaded. The 25 loaded records are also incorrect and contain a lot of junk characters.

I have the following questions:
1. Can we read an EBCDIC packed decimal file in the Community Edition?
2. Are the steps I have described above correct?
3. What could be the reason for the error and the junk data in the target?
4. Is there any other option or alternative to try?

Kindly provide feedback, as this is currently a roadblock. Thanks in advance.

Starting job New_EBCDIC_BCMaster at 21:42 24/05/2013.
connecting to socket on port 3633
connected
Exception in component tFileInputEBCDIC_1
net.sf.cobol2j.RecordParseException: Unexpected EOF while reading record nr: 24.
    at net.sf.cobol2j.RecordSet.next(RecordSet.java:107)
    at talend_try1.new_ebcdic_TESTFF_0_1.New_EBCDIC_TESTFF.tFileInputEBCDIC_1Process(New_EBCDIC_TESTFF.java:13175)
    at talend_try1.new_ebcdic_TESTFF_0_1.New_EBCDIC_TESTFF.runJobInTOS(New_EBCDIC_TESTFF.java:17581)
    at talend_try1.new_ebcdic_TESTFF_0_1.New_EBCDIC_TESTFF.main(New_EBCDIC_TESTFF.java:17449)
Caused by: net.sf.cobol2j.FieldParseException:
RecordSet ERROR Cannot parse field: PIN-PVV. Data: '@@@', Picture: X(4), Type: X, Size: 4
RecordSet ERROR Total bytes processed before error: 69997
disconnected
    at net.sf.cobol2j.RecordSet.readText(RecordSet.java:282)
    at net.sf.cobol2j.RecordSet.getFieldsValues(RecordSet.java:156)
    at net.sf.cobol2j.RecordSet.next(RecordSet.java:89)
    ... 3 more
Caused by: net.sf.cobol2j.FieldParseException:
    at net.sf.cobol2j.RecordSet.readText(RecordSet.java:269)
    at net.sf.cobol2j.RecordSet.readText(RecordSet.java:280)
    ... 5 more
Job New_EBCDIC_TESTFF ended at 21:42 24/05/2013.
Your approach is correct, and yes, the EBCDIC component can read packed decimals. Your problem looks as if the description of your EBCDIC file is not correct (in fact, it is too long). Unfortunately I only know the DI Enterprise version, which ships with a copybook wizard. Since version 5.1.3 of Talend Studio, this wizard works without problems. I would suggest creating a very simple EBCDIC file with 2 records and one field (packed decimal) and trying to read it. Problems in the xc2j configuration often raise an exception only later in the file, because a small wrong offset accumulates from record to record. Did you check the content of your first 25 records? I would bet there are errors.
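A minimal sketch of how such a test file could be produced, written in Python rather than Talend. It assumes a hypothetical one-field copybook such as `05 AMOUNT PIC S9(5) COMP-3.`; the function names and the file name are made up for illustration and are not part of Talend or cobol2j.

```python
def pack_comp3(value: int, digits: int) -> bytes:
    """Encode an integer as COBOL COMP-3 (packed decimal) with `digits` digits."""
    sign = 0x0D if value < 0 else 0x0C  # D = negative, C = positive
    text = str(abs(value)).rjust(digits, "0")
    if len(text) > digits:
        raise ValueError("value does not fit into the declared number of digits")
    # One nibble per digit plus one sign nibble; pad with a leading zero
    # nibble when the digit count is even so the total fills whole bytes.
    if digits % 2 == 0:
        text = "0" + text
    nibbles = [int(c) for c in text] + [sign]
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))


def unpack_comp3(raw: bytes) -> int:
    """Decode a COMP-3 field back into an integer (no implied decimals)."""
    nibbles = []
    for b in raw:
        nibbles.append(b >> 4)
        nibbles.append(b & 0x0F)
    sign = nibbles.pop()  # last nibble carries the sign
    value = int("".join(str(n) for n in nibbles))
    return -value if sign == 0x0D else value


def write_sample(path: str) -> None:
    """Write two fixed-length records, each holding one S9(5) COMP-3 field."""
    with open(path, "wb") as out:
        for amount in (12345, -42):
            out.write(pack_comp3(amount, 5))
```

Calling `write_sample("sample_comp3.bin")` would give a 6-byte file (two 3-byte records) that can be pointed at from tFileInputEBCDIC together with a matching one-field xc2j file. If even this file fails to parse, the problem is more likely in the setup than in the real copybook.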
I have worked on many EBCDIC files with Talend, and this issue is almost certainly caused by a wrong COBOL copybook mapping. I wonder how your other ETL tool is able to parse the file using the same copybook. Please check that the record length defined in your copybook matches the record length of the file. The errors

Unexpected EOF while reading record nr: 24.
ERROR Cannot parse field: PIN-PVV. Data: '@@@', Picture: X(4), Type: X, Size: 4

show that the declared length is exceeded for some of the columns, so the parser is not able to read the column with picture X; it should be parsed using S9(4) COMP-3. If you show me the actual COBOL mapping and the EBCDIC file in a hex view, I could give you a better suggestion.
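The length check described above can be sketched roughly as follows. This is an illustration in Python, not something Talend or cobol2j provides. It assumes fixed-length records, handles only simple pictures such as X(n), 9(n), S9(n) and S9(n)V9(m), and ignores clauses like OCCURS, REDEFINES and SIGN SEPARATE. The helper names are hypothetical.

```python
import re


def field_bytes(picture: str, usage: str = "DISPLAY") -> int:
    """Storage size in bytes of a simple PIC clause."""
    pic = picture.upper().replace(" ", "")
    # Expand repeat counts: X(4) -> XXXX, 9(3) -> 999
    expanded = re.sub(r"([X9])\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), pic)
    chars = expanded.count("X")
    digits = expanded.count("9")
    if usage.upper() in ("COMP-3", "PACKED-DECIMAL"):
        # Two digits per byte, plus one nibble for the sign.
        return digits // 2 + 1
    # DISPLAY usage: one byte per character or digit; S and V take no storage.
    return chars + digits


def record_length(fields) -> int:
    """Sum the sizes of (picture, usage) pairs taken from the copybook."""
    return sum(field_bytes(pic, usage) for pic, usage in fields)


def leftover_bytes(file_size: int, rec_len: int):
    """Return (full_records, leftover); leftover should be 0 for a matching layout."""
    return divmod(file_size, rec_len)
```

For example, `field_bytes("X(4)")` gives 4 while `field_bytes("S9(4)", "COMP-3")` gives 3, so declaring a packed field as X(4) shifts every following field by one byte. If `leftover_bytes(os.path.getsize(path), record_length(layout))` reports a non-zero remainder, the copybook length and the file do not agree.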
Thanks for your replies. The copybook contains 3 schemas, and I used them to generate the xc2j and XML schema files. I then specified the distinguish field value to select the schema to be read and ran the job. The error message is below; the "C1" in the message is the value of the first field. I am not sure why it appears. Due to restrictions I am unable to share the data file and the copybook. Are there any suggestions to fix the issue? Kindly reply, and thanks in advance.
connecting to socket on port 3456
connected
Exception in component tFileInputEBCDIC_1
net.sf.cobol2j.FileFormatException: No such record format : C1
    at net.sf.cobol2j.RecordSet.next(RecordSet.java:85)
    at training.test_0_1.test.tFileInputEBCDIC_1Process(test.java:8508)
    at training.test_0_1.test.runJobInTOS(test.java:18850)
    at training.test_0_1.test.main(test.java:18718)
disconnected
Job test ended at 20:05 14/06/2012.
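When the data file cannot be shared, a small hex view of the leading bytes of a record can still help to compare what is really in the file with the distinguish value configured in the xc2j file. The sketch below is an assumption-laden illustration in Python: it decodes with code page cp037, which may not be the code page of the actual file, and `describe_bytes` is a hypothetical helper name.

```python
def describe_bytes(raw: bytes) -> str:
    """Show bytes as hex and as EBCDIC (cp037) text side by side."""
    hex_part = raw.hex(" ").upper()
    text = raw.decode("cp037")
    # Replace non-printable characters so the output stays on one line.
    printable = "".join(c if c.isprintable() else "." for c in text)
    return f"{hex_part}  |{printable}|"


def head_of_file(path: str, count: int = 16) -> str:
    """Describe the first `count` bytes of a file."""
    with open(path, "rb") as handle:
        return describe_bytes(handle.read(count))
```

Two things are worth checking with it. The text "C1" is stored as the bytes C3 F1 in EBCDIC, whereas the single byte C1 is the letter "A", so it matters whether the distinguish value is compared as text or as a raw byte. Also, the byte 40 is a space in EBCDIC but "@" in ASCII, which suggests that the '@@@' in the first error is a run of EBCDIC spaces printed without conversion.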