Six Stars

How to iterate on tFileInputFullRow rows?

Hi,

 

I have a tFileInputFullRow followed by a tHashOutput component. 

If they are linked by a row link, the tHashOutput component makes its hash available only once all the rows have been read.

I need to iterate over the rows so that the tHashOutput component makes the data available row by row.

 

I have tried this:

tFileInputFullRow ---iterate---> tIterateToFlow ---row---> tHashOutput 

but I can't configure the tIterateToFlow, because I don't know which global variable holds the current row coming from the tFileInputFullRow component.

 

Can you help, please?

 

Regards,

Lorenzo

 

 

20 REPLIES
Twelve Stars

Re: How to iterate on tFileInputFullRow rows?

This component outputs your rows with one column containing the full row. The simple way of achieving this is to connect your tFileInputFullRow to a tFlowToIterate (via a row link) and then use the "iterate" link from there. The globalMap variable will have a key of the form:

 

globalMap.get("{row}.{column}")
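As a rough illustration of that key pattern (a plain-Java sketch, not code from the original job): the globalMap is a map from String keys to Object values, so a cast is needed when reading. The link name "row1" and the column name "line" are assumptions here; substitute the names from your own job.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for the "{row}.{column}" lookup described above.
// Assumptions: the row link is named "row1" and its single column is named "line".
class GlobalMapKeySketch {

    // Builds the "{row}.{column}" key.
    static String key(String rowName, String columnName) {
        return rowName + "." + columnName;
    }

    // Mimics the per-row publishing step: store the current value under its key.
    static void publish(Map<String, Object> globalMap, String rowName,
                        String columnName, Object value) {
        globalMap.put(key(rowName, columnName), value);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        publish(globalMap, "row1", "line", "2024-01-01;alice;42");

        // The cast is required because the map stores values as Object.
        String currentLine = (String) globalMap.get("row1.line");
        System.out.println(currentLine);
    }
}
```

Each new row overwrites the previous value under the same key, so the lookup always returns the row currently being iterated.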
Rilhia Solutions
Six Stars

Re: How to iterate on tFileInputFullRow rows?

OK, but my next step has to be a tHashOutput, which accepts only a row input link.
Accessing the value via the globalMap is not useful for the tHashOutput.
Should I put a tIterateToFlow after the tFlowToIterate, to be able to connect to the tHashOutput?
That sounds weird.
Twelve Stars

Re: How to iterate on tFileInputFullRow rows?

Yes, use a tIterateToFlow to do this.

 

However, this raises the question as to why you are iterating at all. Why not just use a normal row link from the file to the tHashOutput? What does iterating gain you? 

 

(I'll take another look at the original post once I have sent this..... @talend not being able to see all of the posts while responding is a bit of a pain).

Rilhia Solutions
Twelve Stars

Re: How to iterate on tFileInputFullRow rows?

OK, I have just looked at the original post. I am still struggling to see what iterating will gain you here. Why do you need access to the tHashOutput data in between rows arriving? What is your requirement? There is possibly a better way of achieving this.

Rilhia Solutions
Six Stars

Re: How to iterate on tFileInputFullRow rows?

I have to parse the rows with regexes, row by row, process the parsed result and insert it, row by row.
My Job is, indeed, quite complex, and I'll look for a better and simpler design later (not now).
I parse each row with several regexes in parallel (though not truly parallel) using a tReplicate. I extract some fields from the row: some of them I have to insert into lookup tables, get back the generated sequence IDs (putting them into further tHashOutput components) and then insert the main row (collecting the IDs via tHashInput) together with other fields from the original row.
The "parallel" solution with tReplicate was my first idea for processing each field separately.
If I instead put several tRegexExtractFields components (name quoted from memory) in a cascade, with an INSERT at the end of the chain, it becomes complicated to handle each field in a different way (i.e. inserting into lookup tables, getting the ID and carrying on along the regex chain).
Twelve Stars

Re: How to iterate on tFileInputFullRow rows?

OK, I am not claiming to fully understand what you want from that description (and it does sound quite complex), but I *may* have a solution that will simplify it (but not necessarily make it any quicker). All of the complicated parallel logic you have described could be relocated to a child job. If you passed the row into the child job via a context variable, you could then carry out all of the logic you have described in that child job one row at a time. The child job could be connected to your tFile component with either the row link or via a tFlowToIterate component using the iterate link. 

 

Rilhia Solutions
Six Stars

Re: How to iterate on tFileInputFullRow rows?

Hi @rhall_2_0

 

If possible, I would prefer not to use a child job for now.

I tried the solution you suggested (tFlowToIterate/tIterateToFlow) but it does not work:

 

tFileInputFullRow --row1--> tFlowToIterate --iterate--> tIterateToFlow --row2--> tHashOutput --OnComponentOK--> ...

 

I have configured the tIterateToFlow to map (String)globalMap.get("row1.<key>"), BUT the tHashOutput component still waits for all the rows before making its data available.

Is that because I mapped "row1", that is, the output of a component (tFileInputFullRow) designed to supply all the rows at once?

Should I somehow use the "iterate" output link of tFileInputFullRow?

It seems so strange to me that I can't simply process row by row, really iterating over the rows, instead of getting them all together.

 

Hope you can help, without child-job.

 

 

 

Twelve Stars

Re: How to iterate on tFileInputFullRow rows?

The row link does process "row by row" but the iterate method allows you to complete whole subjobs per row. That is the key difference.

 

You are suffering from a timing issue here. I *think* you are relying on parallel processing within a subjob, which you simply won't get without hacking a solution together. It sounds like it would be much better to break this problem down into several subjobs, storing your intermediary steps in tHash components. However, I am not fully aware of the requirement, so maybe there is a good reason for approaching it like this.


However, to approach this problem and get free access to your data without a complete load (the issue you have with the tHash components), you might be able to get round it with a tJavaFlex and a HashMap and/or ArrayList. If you load your data into one of these data types and save it to the globalMap on every row, you will have access to that data anywhere else in the same job immediately (allowing for timing constraints).
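A minimal plain-Java sketch of that idea, under stated assumptions: the three methods stand in for what would go into the start code, the main code and a later reader, and the key "parsedRows" is a hypothetical name, not something Talend defines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a shared in-memory store kept in the globalMap.
class SharedRowStoreSketch {

    // Hypothetical key; any unique String would do.
    static final String STORE_KEY = "parsedRows";

    // Would run once, before the first row: create the list and register it.
    static List<String> start(Map<String, Object> globalMap) {
        List<String> rows = new ArrayList<>();
        globalMap.put(STORE_KEY, rows);
        return rows;
    }

    // Would run once per incoming row: the stored list grows immediately.
    static void onRow(List<String> rows, String line) {
        rows.add(line);
    }

    // Any later code in the same job can fetch the list by its key.
    @SuppressWarnings("unchecked")
    static List<String> read(Map<String, Object> globalMap) {
        return (List<String>) globalMap.get(STORE_KEY);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        List<String> rows = start(globalMap);

        onRow(rows, "first line");
        System.out.println(read(globalMap).size()); // visible after one row

        onRow(rows, "second line");
        System.out.println(read(globalMap).size()); // and after two
    }
}
```

Because the map holds a reference to the list, each added row is visible to readers straight away. A plain ArrayList is not thread-safe, so this only holds as long as reader and writer do not run concurrently.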

 

Rilhia Solutions
Six Stars

Re: How to iterate on tFileInputFullRow rows?

So, before turning to tJavaFlex (which I don't know yet), let me understand better and explain my job better.

You said that the row link does process "row by row" but the iterate method allows whole subjobs to be completed per row.

That sounds to me like row and iterate links should be enough for me.

Maybe the "problem" is that tFileInputFullRow does not let the subsequent components process row by row?

 

All I need is just two subjobs: the first to load the file row by row and the second to process the data. And the connection between them should be tHashOutput --OnComponentOK--> tHashInput.

This is because I need to set up an "OnSubjobOK" output link from the second subjob (to a third one) so that I can complete the processing and insert the results into the DB.

Well, the only way I have found to connect two subjobs (file reading / a bunch of row-by-row processing steps) and pass data from the first to the second is tHashOutput / tHashInput.

But I'm not able to make the tHashOutput expose its output row by row.

Do I really need, then, a tJavaFlex and a HashMap to send row-by-row data to a tHashOutput (and have it make its output available row by row)? Aren't the "row" link and the "iterate" link somehow enough?

 

Let me investigate tJavaFlex.

Six Stars

Re: How to iterate on tFileInputFullRow rows?

Well, 

I took a look at tJavaFlex, and as far as I understand you suggest that I put the rows (all together) in the globalMap and then loop over them afterwards, wherever I want.

Is that right?

Please pardon me if I insist, but I still don't believe that what I'm looking for is so hard. Let's simplify, and say:

tFileInputFullRow --- .... ---> tPostgresqlOutput

with the (mandatory) constraint that the two components above must be in two different subjobs.

Question: what do I have to put in place of "..." to insert into the DB per row, I mean, row by row?

Twelve Stars

Re: How to iterate on tFileInputFullRow rows?


Lorenzo wrote:


Please pardon me if I insist, but I still don't believe that what I'm looking for is so hard. Let's simplify, and say:

tFileInputFullRow --- .... ---> tPostgresqlOutput

with the (mandatory) constraint that the two components above must be in two different subjobs.

Question: what do I have to put in place of "..." to insert into the DB per row, I mean, row by row?


But you say it is not that simple. I feel like the requirements keep changing. Can you give a data flow example showing exactly why the data from tFileInputFullRow must be iterated through instead of being processed on a row link? My assumption was that it is due to a complex timing requirement (hence my convoluted suggestions), but it appears you think it should be simpler.

What exactly do you need to do to the data one row at a time in-between reading it and writing?

Rilhia Solutions
Six Stars

Re: How to iterate on tFileInputFullRow rows?

My Job needs to:

- read rows from a text file

- parse each row with regexes, extracting fields: some of them need to be inserted separately into specific tables, getting back the ID of the inserted row; others need to be processed (all the logic is already in place) to generate different fields

- collect the IDs and newly generated fields (from the previous step), join them (I already have a unique key that comes from the original row and lets me join all the data later) and insert the resulting row.

All of the above needs to be done row by row, for lookup reasons, and also because I want the flow to be as flexible as possible, so that I can add (in future) other steps that may require a row-by-row flow.

 

So, my design is:

 

read file --...--> tHashOutput (it should be row-by-row) --OnComponentOK--> tHashInput_1 (row-by-row) ----> a lot of processing / lookup inserts ----> tHashOutput (one for each ID or generated field)

and, from tHashInput_1 --OnSubjobOk--> here I collect all the tHashInputs for the individual IDs and fields, join them, and insert into the DB

So, yes, if you can help me get real row-by-row data passing between two different subjobs, it would be great (and enough, I think).

Thank you for your help.

Six Stars

Re: How to iterate on tFileInputFullRow rows?

Here is an extract of my Job.

As you can see, the "ok" on OnComponentOK (the output link of tHashOutput) only appears after all the rows (2 in this case) have been read.

That means tHashInput receives ALL the rows at the same time, which does not allow me to process "per row".

 

image.png

 

So the question is simple, I think:

How do I pass rows one by one to a tHashOutput component from a tFileInputFullRow?

Of course, tFlowToIterate and tIterateToFlow are already "data-linked" through a global variable defined in the first one and mapped to the output by the second one.

 

 

 

Twelve Stars

Re: How to iterate on tFileInputFullRow rows?

OK, the tHash components will not work in that way. They must be completely read from or written to before the "OnComponentOK" link will fire. That is the same for all components using the OnComponentOK link. You *could* use a tFlowToIterate after the tHashOutput (the data will flow through this) and then you can iterate over the next subjob using a tFixedFlowInput.
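To make the timing difference concrete, here is a small plain-Java simulation (a sketch of the behaviour described above, not Talend code; the class, method and event names are made up for illustration). The first method models a store that is filled completely before a single trigger fires; the second models an iterate link that runs the next step once per row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulates the order of events for the two wirings discussed in this thread.
class TriggerTimingSketch {

    // Row link into a store, then one trigger after the store is complete.
    static List<String> storeThenTrigger(List<String> rows) {
        List<String> events = new ArrayList<>();
        List<String> store = new ArrayList<>();
        for (String row : rows) {
            store.add(row);
            events.add("stored " + row);
        }
        // Fires once, only after every row has been written.
        events.add("trigger: next subjob sees " + store.size() + " rows");
        return events;
    }

    // Iterate link: the next subjob runs once for each row.
    static List<String> iteratePerRow(List<String> rows) {
        List<String> events = new ArrayList<>();
        for (String row : rows) {
            events.add("iterate: next subjob sees " + row);
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b");
        storeThenTrigger(rows).forEach(System.out::println);
        iteratePerRow(rows).forEach(System.out::println);
    }
}
```

In the first wiring the downstream step never sees fewer than all rows; in the second it sees exactly one row per execution.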

Rilhia Solutions
Six Stars

Re: How to iterate on tFileInputFullRow rows?

It does not work.

The tHashInput reads the hash filled by tHashOutput twice (in the case of two records), which is fine, but the first time it finds the first record and the second time it finds the first and second records together.

I could try always reading only the last record, but... how? And when we are handling thousands of records, wouldn't that process become too expensive?

Twelve Stars

Re: How to iterate on tFileInputFullRow rows?

You shouldn't need the tFlowToIterate or tIterateToFlow at the beginning. You also shouldn't need the tSleeps (unless you want to slow it down for some reason).

 

Do not read from the tHashInput (in the second subjob). Read from the tFixedFlowInput (after setting the column values using the globalMap variables your tFlowToIterate will create). I told you to leave the tHashOutput in the first subjob so that you could keep an in-memory store of your initial data, but it will have nothing to do with the second subjob at all. You have been trying to use the tHash components in a way that they are not meant to work. They will not let you read from them while you are writing to them in the way that you want.

 

 

Rilhia Solutions
Six Stars

Re: How to iterate on tFileInputFullRow rows?

Thanks, but this is rather confusing.

I don't think I have been able to follow your "attempts" so far.

If you can replicate the job yourself, make sure it works, and send a screenshot or a clear explanation of which components to use, where, and how to link them, it would be appreciated. Otherwise it is just a waste of time.

(Of course, the tSleep components are only there so that I can watch the statistics and see whether rows are handled one by one or not.)

 

Regards,

Lorenzo

Ten Stars

Re: How to iterate on tFileInputFullRow rows?

Can you read in the file, parse out the values you need to insert into separate tables, then take a second pass on the file and pull in the new IDs as lookup values from the database?

The tasks you're describing are abstract, but don't sound like they require Talend gymnastics. You may not be able to do everything for each row one at a time, but you can probably do everything you need to do for the entire file.

Six Stars

Re: How to iterate on tFileInputFullRow rows?

I already have a stored procedure that inserts a new value (if needed, because sometimes the values already exist, since they are lookups) and gives back the ID.
So there is no need to make two passes over the file.

I'm not asking for a full design of my job. I would just like your help in finding a way to get a 'by row' execution of a second subjob for rows read from a file, with a separation (two or more subtasks) between reading the file and processing the data. I cannot believe it is not possible.
Twelve Stars

Re: How to iterate on tFileInputFullRow rows?

Here is a simple example of what you need to do to iterate from a flow as you have described.....

 

Iterate.jpg

The data is read from the same component as you are using.

The tJavaFlex simply uses a basic splitter to split the data into columns.

The tFlowToIterate converts each row into a set of globalMap variables.

The tFixedFlowInput is triggered by the Iterate link. You can see the number of rows shown on the links: 4, 4, 4 and then 1. That tells you that all 4 rows were sent from the file to the tJavaFlex as a normal flow. The same happened between the next two components. The Iterate link then fired 4 times. After that, we see that 1 row has fired. This indicates that 1 row has fired 4 times.

The tFixedFlowInput makes use of the globalMap variables that are created by the tFlowToIterate.
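The same chain can be sketched in plain Java to show the data flow (an illustration only, under stated assumptions: the ";" delimiter, the link name "row2" and the column names "id" and "name" are all hypothetical, since the screenshot's actual settings are not given in the text).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Stand-ins for: splitter -> flow-to-iterate -> fixed-flow input, run once per row.
class FlowToIterateSketch {

    // Hypothetical link and column names.
    static final String ROW = "row2";
    static final String[] COLUMNS = {"id", "name"};

    // Splitter step: break the full line into columns (delimiter is an assumption).
    static String[] split(String line) {
        return line.split(Pattern.quote(";"), -1);
    }

    // Flow-to-iterate step: publish each column of the current row to the globalMap.
    static void publish(Map<String, Object> globalMap, String[] values) {
        for (int i = 0; i < COLUMNS.length; i++) {
            Object value = i < values.length ? values[i] : null;
            globalMap.put(ROW + "." + COLUMNS[i], value);
        }
    }

    // Fixed-flow step: build exactly one output row from the globalMap variables.
    static String fixedFlowRow(Map<String, Object> globalMap) {
        String id = (String) globalMap.get(ROW + ".id");
        String name = (String) globalMap.get(ROW + ".name");
        return id + "|" + name;
    }

    // Runs the iterated part once per input line and collects what it produced.
    static List<String> run(List<String> lines) {
        Map<String, Object> globalMap = new HashMap<>();
        List<String> produced = new ArrayList<>();
        for (String line : lines) {
            publish(globalMap, split(line));
            produced.add(fixedFlowRow(globalMap)); // one row per iteration
        }
        return produced;
    }

    public static void main(String[] args) {
        run(Arrays.asList("1;alice", "2;bob")).forEach(System.out::println);
    }
}
```

Each pass through the loop corresponds to one firing of the Iterate link, and the fixed-flow step emits a single row each time, which matches the "1 row fired 4 times" reading of the statistics.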

Rilhia Solutions