Four Stars

Import dirty delimited input - (SAP spool data)

Dear community,
I am relatively new to Talend (=already built some succesful jobs). Can you help me how to elegantly tackle following:
* I have a regular download a background job in SAP print spool.
* It is a dirty input which in fact is both a positional and delimited file (a positionaltable separated by "|") - excerpt below
* Due to being a print spool, every 65 lines I have 4-5 lines with inconsistent stuructre vs the other lines (header for a new printed page) - one line has column separators, the other not, the third is a column header with correct spearators, etc.
This confuses the tFileimportDelimited component, as these pageheader lines contain different nr of columns and no line break character, so the columns start to shift, then run into an error when the schema does not fit anymore.
What shall I use to flawlessly import such a file. - My current Talend job is uploaded too.
Bonus question: it has dates of 00.00.0000 and timestamps of 00:00:00 which cause an error in SQL - probably not null content is handed over from Talend, however in Excel I can see the content becomes -1 (could be an issue of Excel only, as it is able to deal with 1BC or 1 AD, no null)
| 0001|44033745 |AR01|AR12|PPPG23AR1 |            1.180 |PC |PC |9910685721|          |0000000000|03.11.2014|5007554593|            |          |101|15.04.2015|302944    |          |02.11.2017          |15.04.2015|00:00:00|
| 0005|44063917 |AR01|AR11|EY4014    |            5.760 |PC |PC |5450002789|          |0000000000|29.01.2015|5007554603|            |          |101|15.04.2015|CH8000    |          |31.12.2017          |00.00.0000|00:00:00|
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
16.12.2016                                                                                    SQL - movements 101,102 for batch age                                                                                        2
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Item|Material |Plnt|SLoc|Batch     |          Quantity|BUn|BUn|PO        |Cost Ctr  | TO Number|Manuf. Dte|Mat. Doc. |Order       |Sales Ord.|MvT|Pstng Date|Vendor    |Customer  |Shelf life exp. date|Doc. Date |GI Time |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0001|44063917 |AR01|AR11|EY4515    |           18.888 |PC |PC |5450002789|          |0000000000|29.01.2015|5007554603|            |          |101|15.04.2015|CH8000    |          |31.12.2017          |27.02.2015|00:00:00|
| 0003|44063917 |AR01|AR11|EY4516    |           17.640 |PC |PC |5450002789|          |0000000000|29.01.2015|5007554603|            |          |101|15.04.2015|CH8000    |          |31.12.2017          |27.02.2015|00:00:00|
5 REPLIES
Moderator

Re: Import dirty delimited input - (SAP spool data)

Hi,
What does your expected result look like? Have you tried to use tfileinputpositional component to set custom pattern to see if it is Ok with you?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Four Stars

Re: Import dirty delimited input - (SAP spool data)

Expected output is an MSSQLUpload, cleaned up, trimmed, bad rows ommitted, 0 date fixed.
Rows between batches EY4014 and EY4515 left out (these repeat every 65 rows while full import is 100.000s rows)
Field names are in the header
Yes, the first component is a positional import.
That works better, but is painful to set up, and the field width seems also to vary depending the longest value in the extract, so at repeating extraction, would fail. SO I would prefer a delimited import with "|" as separator, but it seems the text file needs first a cleaning from these odd rows before starting the delimited import. Or shall I use Multi-scheme or something?
Thanks
Moderator

Re: Import dirty delimited input - (SAP spool data)

Hi,
Here is "Trim"option in advanced setting of tfileinputdelimited component which trims the leading and trailing white space.
Have you tried to check out this option to see if it works? 
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Moderator

Re: Import dirty delimited input - (SAP spool data)

Hi,
Is this solution Ok with you? Any update for your issue?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Four Stars

Re: Import dirty delimited input - (SAP spool data)

I still have a problem. Delimited gets confused as soon as it hits the first line with only ------- in it, and no field separator.
It takes 3 rows as Material field, and from there the fields start to shift.
What I mean is e.g. Material 44063917 becomes the plant nr, etc. - actually the conversion fails from here, I only get rejected lines from tfileinputdelimited.
Need a test data to try yourself?