Five Stars

Point me towards the correct components?

Talend comes with soooo many components, which is great, but as a noobie I don't know where to start looking for the tools that might help me accomplish the following:

 

I've successfully built a job in TOS that pulls data from a folder's worth of CSV files and copies them to a MySQL database using tFileList, tFileInputDelimited, etc.

 

What I'd like to do now is log the name of each file that gets copied and the datetime it was copied. Then, when the job is run in the future, I want to consult that log and limit my export to the files which either haven't been copied or have been edited since the last time they were copied. So, a couple questions:

  1. At the end of each file iteration, how can I generate a single row containing the filename and current time to insert into a db?
  2. Within each iteration, how can I compare the current file name and the file's last-modified date to the data from the transaction log and then short-circuit the iteration if appropriate?
7 REPLIES
Ten Stars

Re: Point me towards the correct components?

After your iteration, use an OnSubjobOk link to a tRowGenerator
with 1 row and one String column, getting (String)globalMap.get("tFilelist1_FILE……")
Then you know how to insert into the db.

For the second point:
After the file iteration, add a tMap with a lookup on tMysqlInput.

good luck

Francois Denis

Don't forget to tag when it's "solved"!

Five Stars

Re: Point me towards the correct components?

Not sure I understand what you're saying for my second point. The only output from tFileList is an iterate link, and that is not a valid input for a tMap, so I don't understand what you mean when you say "after file iteration add tmap".

Five Stars

Re: Point me towards the correct components?

In regards to your first point. I have added a tRowGenerator element but I don't see any way of calling globalMap.get("tFilelist1_FILE……").

functionsDropdown.png

Nine Stars

Re: Point me towards the correct components?

I suggest reading the manual, specifically:
- OnComponentOk vs OnSubjobOk
- tLogCatcher, tStatCatcher, tFlowMeter
In the case of files:
- tFileProperties, which is able to generate an MD5 hash.

For tracing what happened and for job restarts always ... yes, always ... create 2 or 3 additional columns:
- SRC_LOAD_DT: fill it with TalendDate.getCurrentDate()
- JOB_PID: fill it with pid (which is the process identifier)
- MD5_FILENAME: contains the hash from tFileProperties
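
To make the three columns concrete, here is a minimal plain-Java sketch of the values they would hold. It is not Talend-generated code: the class AuditColumns and its methods are hypothetical names, new Date() stands in for TalendDate.getCurrentDate(), the pid is passed in as an ordinary string, and hashing the file content with MD5 is an assumption about what the tFileProperties hash represents.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Date;

// Hypothetical holder for the three audit columns suggested above.
class AuditColumns {
    final Date srcLoadDt;  // SRC_LOAD_DT
    final String jobPid;   // JOB_PID
    final String md5;      // MD5_FILENAME

    AuditColumns(Date srcLoadDt, String jobPid, String md5) {
        this.srcLoadDt = srcLoadDt;
        this.jobPid = jobPid;
        this.md5 = md5;
    }

    // Hex-encoded MD5 of the given bytes.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Assumption: the hash is taken over the file content.
    static AuditColumns forFile(Path file, String jobPid) throws IOException {
        return new AuditColumns(new Date(), jobPid, md5Hex(Files.readAllBytes(file)));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sample", ".csv");
        Files.write(file, "id,name\n1,foo\n".getBytes(StandardCharsets.UTF_8));
        AuditColumns cols = forFile(file, "demo01"); // "demo01" is a made-up pid
        System.out.println(cols.srcLoadDt + " " + cols.jobPid + " " + cols.md5);
        Files.delete(file);
    }
}
```

Storing a hash next to the load date means a later run can detect a changed file even when its modification time is unreliable.
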
Ten Stars

Re: Point me towards the correct components?

Select "…" and, as the value, enter (String)globalMap.get("tFileList_1_CURRENT_FILE")
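
In case it helps to see why the cast is needed: in the Java code a job runs as, globalMap maps String keys to Object values. The sketch below imitates that with a plain HashMap and builds the single log row (file name plus current time). The class LoadLogRow and the method fromGlobalMap are hypothetical, the file name is made up, and new Date() stands in for TalendDate.getCurrentDate().

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the single log row: file name + copy time.
class LoadLogRow {
    final String fileName;
    final Date copiedAt;

    LoadLogRow(String fileName, Date copiedAt) {
        this.fileName = fileName;
        this.copiedAt = copiedAt;
    }

    // Reads the current file name with the same expression used in tRowGenerator.
    static LoadLogRow fromGlobalMap(Map<String, Object> globalMap) {
        String name = (String) globalMap.get("tFileList_1_CURRENT_FILE");
        return new LoadLogRow(name, new Date());
    }

    public static void main(String[] args) {
        // Stand-in for the map that tFileList fills on each iteration.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_1_CURRENT_FILE", "orders.csv");
        LoadLogRow row = fromGlobalMap(globalMap);
        System.out.println(row.fileName + " copied at " + row.copiedAt);
    }
}
```

The key must match the component's unique name exactly, so a tFileList renamed to tFileList_2 would need "tFileList_2_CURRENT_FILE".
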

Francois Denis


Ten Stars

Re: Point me towards the correct components?

You have a folder with your CSV files, so I guess you use tFileList - iterate - tFileInput.
On this tFileInput, add an OnSubjobOk link to add the file name to your db.

Later, when you want to add only new files, you have to insert a tMap (used to filter) on the row link between tFileInput and tMysqlOutput.

Francois Denis


Ten Stars

Re: Point me towards the correct components?

The best way to do that is to link tFileList directly to a tIterateToFlow. Use tMap and tMysqlInput to filter the files to upload, then link this file list to a tFlowToIterate to load the data and add the file name to the db.
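
The filter itself comes down to one comparison per file. Here is a minimal plain-Java sketch of that decision, assuming the log table has been read into a map from file name to last copy time (which is roughly what the tMysqlInput lookup provides). IncrementalFilter, needsCopy and the sample values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class IncrementalFilter {
    // True when the file was never logged, or was modified after its last logged copy.
    static boolean needsCopy(String fileName, long lastModifiedMillis,
                             Map<String, Long> copiedAtByFile) {
        Long copiedAt = copiedAtByFile.get(fileName);
        return copiedAt == null || lastModifiedMillis > copiedAt;
    }

    public static void main(String[] args) {
        // Stand-in for the rows of the log table: file name -> copy time (epoch millis).
        Map<String, Long> log = new HashMap<>();
        log.put("old.csv", 2_000L);
        log.put("edited.csv", 2_000L);

        System.out.println(needsCopy("new.csv", 1_000L, log));    // never copied
        System.out.println(needsCopy("old.csv", 1_500L, log));    // unchanged since copy
        System.out.println(needsCopy("edited.csv", 3_000L, log)); // edited after copy
    }
}
```

In a tMap, the same test could be written as a filter expression on the output, with the lookup column being null when the file has no log entry.
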

Francois Denis
