One Star

Using TfileInputDelimiter; mark n-th row in a csv file

Hi,
I'm looking to build a Talend job that reads information from a CSV file and stores it in a database (MySQL 5.5).
Some of this information refers to files on the file system, which the job should then copy to a different folder.
So far, my job reads each row of the CSV file, takes the information and, if the file specified in the row exists, moves it to a different folder.
What I need to add is a way to recover the job if something goes wrong (loss of power, etc.).
So we decided to mark every processed row (with a word, for example "DONE"), so that when the job starts again it can skip the rows already done. But I don't know how to write to ("mark") the n-th line.
I'm not sure whether I should use a tJavaRow and write the file manually, or use a tFileOutputDelimited and mark the row (but how do I indicate which row I want to mark?).
Below you can see the job as it currently stands:

Thanks to anyone who can and wants to help me.
Bye
5 REPLIES
One Star

Hi,
I'm not sure I understood exactly what you want, but it seems you need to iterate to do this for each row (otherwise you'll write the file only once everything has been processed). This might not be very performant if you have many rows; in that case you might have to enhance your DB schema to deal with retries. Plus, you'll need to be extra careful about data consistency when using a file.
One Star

Hi,
"I'm not sure I understood exactly what you want, but it seems you need to iterate to do this for each row (otherwise you'll write the file only once everything has been processed)."
@ERIC2D
Yes, what I want to do is go through the CSV file line by line, but so far I still haven't figured out how to do it.
So my first question is: how can I read a CSV file row by row?
I also need to write to a CSV file, not by appending a row at the end of the file but by changing a field in a row (or changing the whole row, but just one particular row).
So my second question is: how can I modify a particular row in a CSV file without rewriting the entire file?
In the meantime I developed my job using a new CSV file (csvTemp) to keep track of the processing, and I analyze that file (csvTemp) to recover the flow in case of a crash (example: I have just written to the temp file but not yet inserted the data in the DB, etc.).
This choice has forced me to add many checks involving csvTemp, the file system and the DB.
I'm running some tests and it seems to work, but I think there could be a better, more agile way. Any ideas?
"This might not be very performant if you have many rows; in that case you might have to enhance your DB schema to deal with retries."
- Yeah, true. My initial file can have at most 2000 rows.
"Plus, you'll need to be extra careful about data consistency when using a file."
Again, true. In fact I would like to avoid having another file and find a way to go directly to the last row done.
Thank you, and sorry for my English.
Seven Stars

To do "something" based on each row of the file, you should pass the flow from tFileInputDelimited to tFlowToIterate.
You cannot modify a CSV file without rewriting the whole file.
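As a rough illustration of what "rewriting the whole file" means in practice, here is a minimal plain-Java sketch (not a Talend component; it could be adapted for a tJava or a custom routine). The class name CsvRowMarker, the ";" delimiter and the "DONE" marker are assumptions made for the example. It reads every line, appends the marker to the chosen row, writes everything to a temporary file and then replaces the original.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Talend.
class CsvRowMarker {

    /** Returns a copy of the lines in which the row at rowIndex (0-based) ends with the marker. */
    static List<String> markRow(List<String> lines, int rowIndex, String marker, String delimiter) {
        List<String> result = new ArrayList<>(lines);
        String row = result.get(rowIndex);
        String suffix = delimiter + marker;
        // Leave the row alone if it was already marked on a previous run.
        if (!row.endsWith(suffix)) {
            result.set(rowIndex, row + suffix);
        }
        return result;
    }

    /** Marks one row by rewriting the whole file, as the reply above describes. */
    static void markRowInFile(Path csv, int rowIndex, String marker, String delimiter) {
        try {
            List<String> lines = Files.readAllLines(csv, StandardCharsets.UTF_8);
            List<String> updated = markRow(lines, rowIndex, marker, delimiter);
            // Write to a temporary file in the same folder, then swap it in,
            // so the original is only replaced once the new content is complete.
            Path dir = csv.toAbsolutePath().getParent();
            Path tmp = Files.createTempFile(dir, "rows", ".tmp");
            Files.write(tmp, updated, StandardCharsets.UTF_8);
            Files.move(tmp, csv, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a file of at most 2000 rows this is feasible, but marking after every row means one full rewrite per row, which is why keeping a separate record of the last row done is usually the lighter option.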
The only way to go direct to the last row done is to keep a record of that row number somewhere. Then when the job is started, you can read that row number and use it as the "Header" setting of the tFileInputDelimited.
One Star

Thank you Alevy for your help!
So I can't modify a CSV file during the execution of my Talend job. That's useful to know.
Meanwhile, I've decided to solve the problem starting from the MP3 files, using a tFileList to iterate over the MP3 folder and perform the required operation for each file. As you suggested, for each file I save the key, writing this information to a different CSV file.
See the attached image.
Beyond that, you said to save the row number and use it as the "Header" for the tFileInputDelimited to recover the job. But how can I do this? How can I dynamically tell my tFileInputDelimited component to ignore the first x-1 rows?
Seven Stars

You need to count the rows processed, using e.g. the Numeric.sequence routine. Then have a post-job to store that value into a file.
The next time you run the job, read the file and store the value to a globalMap or context variable and then use that variable as the "Header" setting in tFileInputDelimited.
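The counting and resuming logic described above can be sketched in plain Java as follows. This is only an illustration of the idea, not Talend-generated code: the class name RowCheckpoint and the checkpoint file are assumptions. In a Talend job, the value returned by load would be put into globalMap or a context variable and referenced in the "Header" field (adding 1 if the CSV has its own header line). One deliberate difference: the sketch saves the counter after every row instead of only in a post-job step, because a post-job step cannot run after a power loss.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of Talend.
class RowCheckpoint {

    /** Returns the number of rows already processed, or 0 if no checkpoint exists yet. */
    static int load(Path checkpoint) {
        try {
            if (!Files.exists(checkpoint)) {
                return 0;
            }
            String text = new String(Files.readAllBytes(checkpoint), StandardCharsets.UTF_8).trim();
            return text.isEmpty() ? 0 : Integer.parseInt(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stores the number of rows processed so far, overwriting any previous value. */
    static void save(Path checkpoint, int rowsDone) {
        try {
            Files.write(checkpoint, String.valueOf(rowsDone).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Skips the rows counted in the checkpoint, handles the rest, and returns the rows done. */
    static int resume(List<String> rows, Path checkpoint, Consumer<String> handler) {
        int done = load(checkpoint);
        for (int i = done; i < rows.size(); i++) {
            handler.accept(rows.get(i));
            done = i + 1;
            // Persist after each row so a crash loses at most the row in progress.
            save(checkpoint, done);
        }
        return done;
    }
}
```

If the job crashes between handling a row and saving the counter, that one row is handled again on the next run, so the per-row work (DB insert, file move) should tolerate being repeated.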