[resolved] CSV Input Problem

One Star

Hello,
I have a problem with my CSV files: I can't use tFileInputDelimited to read data from a CSV file without errors.
The CSV file has ~1 million rows and 80 columns.
Some rows are comments that are just one big string, and some columns also contain comments.
When using tFileInputDelimited I always get the error: "For input string: "(whole comment)"".
I want these comment rows to be ignored. Is there any way to do that?
Thanks for your help.
Moderator

Re: [resolved] CSV Input Problem

Hi AWeller,
Some rows are comments that are just one big string, and some columns also contain comments.

Could you show us a sample of your input source, please?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: [resolved] CSV Input Problem

Hi Sabrina,
I can't post the real data directly, so I'll make up two example rows that look like the CSV rows.
The columns are separated by a pipe (|).
My problem is that the first column should be a number, but some rows contain strings instead of numbers. I want to know if there is any way to filter these rows out of the input data so that I won't get an error like "For input string: ..."
row 1:
ijfioejfiwehfiwoefewifewifewifwhrtherhezthtzrehjztrjtzrjtzrjztrjzjtzjztjtzjztfjftjtfrjrjrtjtrjtrjtrjtrjttjtjtzjtzjttzjztjtzjztjtzjztjtzjj||0|N|4|5|30||||1|43242523|Z|K|54350435345|gireginegioerjnhgiergjhrie|16.11.10.|fjf_fe|FA|mfioehfoiuwehfeowifjhie
row 2:
190226546503|1|14.05.11|1|1|1|1|61454500|613854545500545427801|1|0|0|4197156446|||9756407|234634010|0|6451|7641|266432,56|||||||0|||51|71|2112,56||||||32||242,56|||||||0|||23442,56||1||||332||trtrtrtrtr
Thanks,
Arno
Seventeen Stars

Re: [resolved] CSV Input Problem

hi,
could you read all fields as String, store them somewhere, and filter your data later?
It's often easier and more reliable to extract raw data as String only.
If you don't have to do calculations on your data, there's no harm in reading a number as a string.
regards
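The advice above can be sketched in plain Java: read every row as a raw string, then keep only rows whose first field is numeric. In Talend this corresponds to an all-String schema on tFileInputDelimited followed by a tFilterRow; the class and method names here are illustrative, not Talend API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: treat every field as a String and drop "comment rows",
// i.e. rows whose first pipe-delimited field is not a number.
public class CommentRowFilter {
    static boolean isDataRow(String line) {
        String firstField = line.split("\\|", -1)[0];
        // Data rows start with a numeric id; comment rows start with free text.
        return firstField.matches("\\d+");
    }

    static List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (isDataRow(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "some free-text comment||0|N|4",  // comment row: dropped
                "190226546503|1|14.05.11|1|1");   // data row: kept
        System.out.println(filter(rows).size()); // prints 1
    }
}
```

Because everything stays a String, no number parsing happens and the "For input string: ..." exception cannot occur.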
One Star

Re: [resolved] CSV Input Problem

Thanks! I'm already trying this and it seems to work.
Now I have another problem. I want to join those two CSV files with tMap and output one combined CSV file, but it seems to be too much data: it takes extraordinarily long, only processes ~1462 rows/s, and stops after 150 seconds with the error
"Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded"
Any idea on that?
Regards
Seventeen Stars

Re: [resolved] CSV Input Problem

I wanna connect those 2 CSV Files with tMap and output one combined CSV File

What kind of 'combination'? Are you doing some mapping, a join, a filter?
Do you need all the data to be in the tMap?
The first thing is to handle only the data you need => filter and extract the data to keep only the right rows.
You can also increase the memory allocated to your JVM.
"GC overhead limit exceeded" means the garbage collector is working too hard, too often.
regards
laurent
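"Handle only the data you need" can mean column pruning as well as row filtering: before the join, cut each row down to just the columns the output uses, so far less data has to sit in memory. This is a plain-Java stand-in for a tFilterColumns step; the column indices are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of tFilterColumns-style pruning: keep only the join key and the
// payload columns the output needs. Column indices (0 and 2) are assumed.
public class ColumnPruner {
    static final int[] NEEDED = {0, 2}; // col 0 = join key, col 2 = payload (assumed layout)

    static List<String> prune(List<String> rows) {
        List<String> out = new ArrayList<>(rows.size());
        for (String row : rows) {
            String[] fields = row.split("\\|", -1);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < NEEDED.length; i++) {
                if (i > 0) sb.append('|');
                sb.append(fields[NEEDED[i]]);
            }
            out.add(sb.toString());
        }
        return out;
    }
}
```

With 80 columns in the source, keeping only two or three of them shrinks the in-memory lookup dramatically before the tMap ever sees it.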
One Star

Re: [resolved] CSV Input Problem

What kind of 'combination'? Are you doing some mapping, a join, a filter?
Do you need all the data to be in the tMap?
The first thing is to handle only the data you need => filter and extract the data to keep only the right rows.

A join filter.
No, I'm filtering data in the tMap, which seems to be a bad idea.
What components are good for filtering? I'm trying tFilterColumns and tFilterRow.
Thanks for all the help!
Seventeen Stars

Re: [resolved] CSV Input Problem

I'm filtering data in the tMap, which seems to be a bad idea.

It depends on several things...
But if you're doing a join, it's better to filter before the tMap.
Be aware that all data from the lookup flow is stored in memory by default; that could be the reason for your "out of memory".

Are you constrained in the memory you can allocate to your JVM (based on the production environment)?
A better way, if it's not a "real-time" application, could be to filter the data in one job and store it in another file or in tables (I/O with the MySQL ISAM engine can be a good solution), then read the filtered data and join it in a separate job.
When you run into memory problems, try to split the processing into several jobs.
Storing the raw data in tables and filtering with a WHERE clause can also be a solution.
When you don't have a lot of memory allocated to your JVM, storing data in a database can help; for example, by using ELT components to join data, you work with the DB engine's resources => less memory needed for your JVM.
Keep in mind that you should optimize your Talend job before thinking about increasing JVM memory.
https://community.talend.com/t5/Migration-Configuration-and/OutOfMemory-Exception/ta-p/21669?content...
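The memory model described above (the lookup flow held in memory, the main flow streamed) can be sketched in plain Java; this is an illustration of what tMap's default lookup behaviour amounts to, not Talend code, and the column layout is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a join where only the (already filtered) lookup side lives in
// memory, while the main flow is processed one row at a time. This mirrors
// why keeping the lookup small avoids OutOfMemoryError in a tMap join.
public class StreamingJoin {
    static List<String> join(List<String> mainRows, List<String> lookupRows) {
        // Build key -> payload map from the small, pre-filtered lookup flow.
        Map<String, String> lookup = new HashMap<>();
        for (String row : lookupRows) {
            String[] f = row.split("\\|", -1);
            lookup.put(f[0], f[1]); // f[0] = join key, f[1] = payload (assumed layout)
        }
        List<String> out = new ArrayList<>();
        for (String row : mainRows) { // main flow: streamed, never fully in memory
            String key = row.split("\\|", -1)[0];
            String match = lookup.get(key);
            if (match != null) {
                out.add(row + "|" + match); // inner join: emit only matched rows
            }
        }
        return out;
    }
}
```

Only the lookup map's size matters for memory, which is why filtering and pruning the lookup in a prior job (or pushing the join into a database with ELT components) is the first lever to pull before raising -Xmx.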
One Star

Re: [resolved] CSV Input Problem

Well, everything worked fine with tFilterColumns before the tMap. I even combined three CSVs with 6 million, 3 million, and 2 million rows into a new CSV with 3 million rows with tMap in 600 seconds.
I only got memory warnings like "Warning: to avoid a Memory heap space error the buffer of the flow has been limited to a size of 2000000 , try to reduce the advanced parameter "Max buffer size" (~100000 or at least less than 2000000), then if needed try to increase the JVM Xmx parameter."
I made tMap store its temp data in an extra folder, so this worked fine for me.
Thanks!
Seventeen Stars

Re: [resolved] CSV Input Problem

Great. Please mark the post as resolved if that's the case.