Reload Lookup for each tFileList iteration

One Star

Hi,
I have 20 files (a large data set split into parts), each containing an account and its date of creation. The same account may appear in different files with different dates, but I only want to load an account into the target table the first time I see it, i.e. when it is not already present in my lookup.
I am iterating through the files using tFileList and loading with bulk exec into MySQL. However, for each iteration I need to reload my lookup, since I only want to load an account the first time I see it.
I am using TOS 3.2 and I have seen that I can reload at each row, but with 200 million source rows and a 10 million row lookup, that would take forever.
Is there another way to do this? I tried to use the iterate link from the DB input components, but I cannot seem to connect it anywhere.
Any help would be really appreciated.
Regards
Yann
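
To make the "first time I see it" rule above concrete, here is a minimal sketch outside Talend. It assumes the files are semicolon-delimited `account;created` rows and that the accounts already in the target table fit in an in-memory set; the function name `first_seen_rows` and the sample data are hypothetical, not part of the original job.

```python
import csv
import io


def first_seen_rows(files, known_accounts):
    """Yield (account, created) rows whose account has not been seen before.

    `files` is an iterable of open text files with `account;created` lines
    (an assumed layout). `known_accounts` is a set preloaded from the target
    table; it is updated in place, so later files see the accounts that were
    accepted from earlier files.
    """
    for handle in files:
        for account, created in csv.reader(handle, delimiter=";"):
            if account not in known_accounts:
                known_accounts.add(account)
                yield account, created


# Two tiny in-memory "files" standing in for the 20 split files.
file_a = io.StringIO("A1;2009-01-01\nA2;2009-01-02\n")
file_b = io.StringIO("A2;2009-02-01\nA3;2009-02-02\n")

known = {"A1"}  # pretend A1 is already present in the target table
loaded = list(first_seen_rows([file_a, file_b], known))
```

Here `A1` is skipped because it is already known, and the second `A2` is skipped because the first file already supplied it. Whether a set of this kind fits in memory at the volumes described is something to check against the actual data.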
Community Manager

Re: Reload Lookup for each tFileList iteration

Hello
You can iterate over all the files and use a tUnite component to merge all the records before the join, so that the job inserts all the records that are not present in the lookup.
e.g.:
tFileList--iterate-->tFileInputDelimited--tUnite--tMap--->tMysqlOutput
                                                   |
                                                 lookup
                                                   |
                                              tMysqlInput
Best regards
shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: Reload Lookup for each tFileList iteration

Hi,
Thanks for the reply, but the volume is too high and I will run into heap space issues. It also will not fix my problem: an account I load from one file may be present in the next file, hence I need to reload my lookup for each iteration.
Any more suggestions?
I tried to use the filepath property of tFileList with a tForEach, but because it is set on the lookup, the filepath of tFileList is not known before the lookup is loaded.
Regards
Yann
Community Manager

Re: Reload Lookup for each tFileList iteration

Hello
"I tried to use the filepath property of tFileList with a tForEach, but because it is set on the lookup, the filepath of tFileList is not known before the lookup is loaded."

Move your join action to a child job, and pass the file path from the parent job to the child job. See my screenshots.
Best regards
shong
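
The parent/child split suggested above can be pictured with a rough sketch in plain Python, not Talend itself. It assumes an in-memory SQLite table standing in for the MySQL target, and lists of rows standing in for the files whose paths the parent would pass down; `parent_job`, `child_job` and the `target` table are hypothetical names. The point is only that the lookup is reloaded once per file, not once per row.

```python
import sqlite3


def child_job(conn, rows):
    """Child job: reload the lookup from the target, then insert unseen accounts."""
    # Reload the lookup once per call, i.e. once per file.
    lookup = {account for (account,) in conn.execute("SELECT account FROM target")}
    inserted = 0
    for account, created in rows:
        if account not in lookup:
            conn.execute(
                "INSERT INTO target (account, created) VALUES (?, ?)",
                (account, created),
            )
            lookup.add(account)  # also guards against duplicates inside one file
            inserted += 1
    conn.commit()
    return inserted


def parent_job(conn, files):
    """Parent job: iterate over the files and call the child once per file."""
    return [child_job(conn, rows) for rows in files]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (account TEXT PRIMARY KEY, created TEXT)")

# Each inner list stands in for one file on disk.
files = [
    [("A1", "2009-01-01"), ("A2", "2009-01-02")],
    [("A2", "2009-02-01"), ("A3", "2009-02-02"), ("A3", "2009-02-03")],
]
counts = parent_job(conn, files)
```

Because the second call reloads the lookup after the first file has been committed, `A2` from the second file is rejected, which is the behaviour asked for in the thread.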
One Star

Re: Reload Lookup for each tFileList iteration

Hi,
I tried what you suggested, and it worked.
However, now none of my jobs with tFileList work with lookups, and I am absolutely stumped by this.
If I take the lookup off, it iterates fine, but when I put it back it fails after one iteration and gives an error on the lookup. The data is fine; I have tested it for integrity.
All the other jobs with lookups seem to fail as well, though they never failed before.
This has been happening since I used the tFileList in a context.
Can you help me, please? My project is getting quite urgent now; I have developed so much, and I am stuck because my jobs with tFileList and lookups do not work anymore.
I even tried to set up a new workspace and import my jobs, but no luck.
I am using TOS 3.2.
Some screenshots attached.
Regards
Yann
One Star

Re: Reload Lookup for each tFileList iteration

pics
Community Manager

Re: Reload Lookup for each tFileList iteration

Hello
"I used the tFileList in a context."

What do you mean here?
Please create a simple job with a lookup to see if the problem still exists. If so, you can send it to me via email and I will debug it.
Best regards

shong
One Star

Re: Reload Lookup for each tFileList iteration

It looks like I mixed a few things up, and that is what gave me these errors.
A simple job was OK.
Thanks
Yann