I have to manage CSV files with more than 2000 columns. I don't need all of them, but I can't filter them, because the tFileInputDelimited component that I have put before tFilterColumns gives me the "Exceeding the Java 65535 bytes limit" error.
I don't know how I can select a subset of columns in advance; in the metadata tab I cannot select only the first n columns.
Thanks in advance
Could you please have a look at this article: https://community.talend.com/t5/Design-and-Development/Exceeding-the-Java-65535-bytes-limit/ta-p/180... ?
Let us know if it helps.
Thanks for your answer. I had already read that post, but it doesn't help me.
My job is very simple: I have to read some columns from a CSV file that has 3000 columns.
So my job contains only a tFileInputDelimited followed by a tFilterColumns; I cannot simplify the job any further.
I could divide the job into subjobs, but first I have to choose the columns, so I still have to deal with these CSV files.
If you just need to select some of the columns, can you define the schema so that it groups some of the other columns together? For example, if you have columns A through Z, and you need D, J, and P, you could group A through C as a single column, E through I as another, and so on. I don't know if that will solve the problem, since I've never encountered this error, but if it truly is the number of columns (as opposed to the total number of characters they contain), it might work.
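To make the idea concrete, here is a rough sketch in plain Java (hypothetical class and method names; it assumes a simple delimiter that never appears inside quoted fields) of how runs of adjacent source columns could be collapsed into a handful of schema fields:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class GroupColumns {
    // Collapse runs of adjacent source columns into single schema fields,
    // so the parser only has to handle a few fields instead of thousands.
    // groupSizes lists how many source columns each schema field spans,
    // e.g. {3, 1, 5, 1} for A-C, D, E-I, J.
    public static String[] group(String row, String delim, int[] groupSizes) {
        // -1 keeps trailing empty fields
        String[] all = row.split(Pattern.quote(delim), -1);
        String[] out = new String[groupSizes.length];
        int i = 0;
        for (int g = 0; g < groupSizes.length; g++) {
            int end = Math.min(i + groupSizes[g], all.length);
            out[g] = String.join(delim, Arrays.copyOfRange(all, i, end));
            i = end;
        }
        return out;
    }
}
```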
It's a JVM constraint: the bytecode of a single generated Java method cannot exceed 65535 bytes.
Is it necessary for you to read 3000 columns from your input source?
It is not necessary for me to read all 3000 columns.
Let's suppose that I need only 3 columns: the 10th, the 100th, and the 1000th.
How can I do this?
I would have to create a tFileInputDelimited and then select only the columns of interest, but I cannot define the tFileInputDelimited because of the Java error.
Do you know a way to preselect only the three columns that I need?
Thanks in advance
Thanks for your answer. Your solution could work, but I don't know how to group the columns together.
I need an automatic way, because I'm reading more than 100,000 identical CSV files: they contain sensor data, and there is one CSV for each minute the machine is running.
Each CSV contains 3000 columns, but I have to import only a small subset.
If all the files have the same fields, can you truncate the CSV file so the last column it contains is the last column you need? For example, if you need columns 4, 5, and 100, you would truncate it so that it only had 100 columns. I'm not sure of the best way to do this, but you might try Googling "Talend truncate CSV file".
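As a sketch of that preprocessing step (hypothetical class name; it assumes a simple comma-delimited file with no quoted fields), each line could be cut down to its first N columns before Talend ever sees it:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class TruncateCsv {
    // Keep only the first `keep` columns of each row.
    // Assumes a plain comma-delimited file with no quoted fields.
    public static void truncate(Path in, Path out, int keep) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(in);
             BufferedWriter w = Files.newBufferedWriter(out)) {
            String line;
            while ((line = r.readLine()) != null) {
                // Split with limit keep+1: the last element holds the whole
                // remainder of the row, which we simply drop.
                String[] fields = line.split(",", keep + 1);
                int n = Math.min(keep, fields.length);
                w.write(String.join(",", Arrays.copyOfRange(fields, 0, n)));
                w.newLine();
            }
        }
    }
}
```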
I know how to truncate the file, but it is not enough, because I need some columns that are in the middle!
It depends on the task; it varies from time to time.
Of course I won't analyze more than 100 columns together.
Does anybody have a solution to this problem, given that I don't need all 3000 columns my files have, and that I have to choose different columns each time?
Maybe you can use a tFileInputFullRow to read the file and then, with a piece of Java code (a regex or a split), extract the desired columns to generate a data flow limited to those columns.
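As a rough sketch of that idea (hypothetical class and method names; it assumes the delimiter never appears inside quoted fields), the extraction logic you would put after the tFileInputFullRow could look like this, using the 1-based positions from the example above (10th, 100th, 1000th):

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ColumnPicker {
    // Extract the fields at the given 1-based positions from one full row.
    // Positions beyond the end of the row yield empty strings.
    // Assumes a simple delimiter with no quoted fields containing it.
    public static String pick(String row, String delimiter, int... positions) {
        // -1 keeps trailing empty fields
        String[] fields = row.split(Pattern.quote(delimiter), -1);
        return Arrays.stream(positions)
                .mapToObj(p -> p <= fields.length ? fields[p - 1] : "")
                .collect(Collectors.joining(delimiter));
    }
}
```

Since only a handful of fields survive, the schema that Talend has to generate code for stays tiny, which should keep you well under the 65535-byte limit.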