One Star

Multiple issues with Talend 6.1.1 if amount of rows increases

I upgraded Talend 5.1.2 (r90681) to 6.1.1.20151214_1327 because I needed to convert UTC to GMT and vice versa, which was not possible in 5.1.2.
But in this version I have multiple issues, all depending on how many records have to be processed.
In general all jobs work perfectly until the throughput rises above several hundred or several thousand rows. Then I get driver errors or, in this case, wrong sorting and garbage in the .csv output.
Source files:
http://  www . filedropper . com / statussenentijdennaarchainware02
If component cwReturnJobLogs1 has this where clause:
WHERE ID > 161670059
  and ID < 161760492
All goes smoothly and I get the result I want, in the right order. (Result_1.csv)
But if the where clause is changed to:
WHERE ID > 161640059
  and ID < 161763492
The result isn't in the correct order. (Result_2.csv)
If you look in the .csv files, you can search for 25050690. In the correct order, the second line says: LO START and later in the file LO GEREED. But in the second result file, you first see LO GEREED and then LO START.
The WHERE clause is normally dynamic. The job starts every 15 minutes, retrieves the MAX_ID of the last session, processes all IDs since then and saves the new MAX_ID in the database. So the number of rows processed can be anywhere between 0 and several thousand.
In 5.1.2 we never had these issues.
If something is not clear, ask!
15 REPLIES
One Star

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

How can I resolve this??
One Star

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

Anybody ?
Fifteen Stars

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

This is a very difficult post to answer since there is no job example to look at. You say that this is affecting a lot of your jobs once the number of rows goes over a few hundred. Could you step through an example with screenshots of your job? That may give us a clue. Also, have you checked that your environment is supported for the version of Talend you are using?
I have to be honest and say that this is likely a problem with your environment or with your jobs, since v6 is being used by a lot of people for jobs that process hundreds of thousands of rows. There have been some installation issues with v6, but they have usually precluded people from doing anything with it. 
Rilhia Solutions
One Star

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

I posted a link in my message that points to a zip file (I had to add spaces to it, because I was not allowed to post live URLs).
This zip file contains the source data, the output data and the project itself.
If it's an environment issue, where do I start looking to find the problem?
Windows 7, Talend 6.1.1
Fifteen Stars

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

I can't get anything from that link I am afraid. It just takes me to an upload page. Can you use the "Upload" button here to share your project?
About the environment, first check that your setup (including Java, DB, etc.) is covered in the system requirements found here (https://help.talend.com//pages/viewpage.action?pageId=264282428). Click "Next" and see how your environment compares.
Rilhia Solutions
One Star

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

As far as I can see, my environment is in sync with the requirements.
Are there requirements regarding SQL Server? (SQL server 2008R2)
Fifteen Stars

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

I'm afraid I don't know the requirements for SQL Server off the top of my head. I suspect that 2008 would be OK though...even if technically not supported.
I will take a look at your job and get back to you. But I won't be able to do it immediately. 
.......that doesn't stop anyone else from jumping in if you think you may have seen these symptoms before :-)
Rilhia Solutions
Fifteen Stars

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

I've just had a quick look and have spotted something that is not going to work as I suspect you expect. I don't have SQL Server on my current machine, so I could not actually run the job. But the area which I have put a red box around is a problem.....

I assume that you are using the Iterate to call the next query once per LopendID. The query it is firing is below....
"SELECT ID,
TripID,
JobID,
JobReference,
LogTime,
LogTimeZone,
LogType,
LogCode,
TemplateMessageID,
EAIRecordId,
LogStatus,
Latitude,
Longitude,
PositionTime
FROM CarrierWeb.dbo.cwReturnJobLogs
WHERE ID=" + context.LopendID

The context.LopendID is set in the tJavaRow, which is connected by a flow. You cannot guarantee the timing of this. Iterates and Flows work very differently. What you need to do here is use a tFlowToIterate and then pass the globalMap variable that the tFlowToIterate generates to the next query.
I suspect that your issue is related to either this logic, or a similar type of issue.
Rilhia Solutions
One Star

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

I just realized that I uploaded the work-around version (with an Iterate, so a .csv file is generated for every row).
"I suspect that your issue is related to either this logic, or a similar type of issue."

Maybe in the original version, but not in this work-around version.
The output is complete and in the right order.

This is the faulty version, where one .csv file is generated with all rows. The output is not always complete and in a random order if the amount of rows is high.
StatussenEnTijdenNaarChainware2.zip.zip
One Star

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

Although I didn't see anything unusual, I do respect your feedback. Do I use the tFlowToIterate in this way?
Fifteen Stars

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

OK, there are a couple of issues with this job that I have spotted. I will list them below....
1) You are trying to write out to the same file context.LogDir+"temp.csv" using multiple tFileOutputDelimited components. This is not safe, but may have worked due to luck. Try opening two versions of the same .txt file and add text to both. Then save one. Then save the other. Open the file and you will not have all of the text you have written. That is a simplified example of what I expect is happening. 
2) You are relying on a combination of your SQL query data order AND (maybe you do not realise this) the order of preference for the tMap outputs (seen below)
This is not ideal at all and seems to tally with what you are experiencing when you say as the rows go up, the ordering starts to get thrown out.
What I suggest is that you rebuild the job to output to tHashOutput components instead of your file. Link the tHashOutputs so they are saving to the same location. Then use a tHashInput followed by a tSortRow component to order your data as required (this may need a bit of massaging to get the order you wish). Then write the file once you have controlled the ordering.
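In plain Java terms, the rebuild suggested above amounts to roughly the following. This is only a sketch of the idea, not what Talend generates: the classes `LogRow` and `SortedSingleWriter`, the field `logText`, the `;` separator and the choice of `ID` as the sort key are all assumptions made for illustration. The shared list plays the role of the linked tHashOutput components, the sort stands in for tSortRow, and the file is written exactly once at the end.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical row type; the real job carries many more columns.
final class LogRow {
    final long id;
    final String jobReference;
    final String logText;

    LogRow(long id, String jobReference, String logText) {
        this.id = id;
        this.jobReference = jobReference;
        this.logText = logText;
    }

    String toCsvLine() {
        return id + ";" + jobReference + ";" + logText;
    }
}

final class SortedSingleWriter {
    // Shared buffer: every branch appends here instead of writing to the file.
    private final List<LogRow> buffer = new ArrayList<LogRow>();

    void add(LogRow row) {
        buffer.add(row);
    }

    // Returns the buffered rows as CSV lines, ordered by ID (assumed sort key).
    List<String> sortedLines() {
        List<LogRow> copy = new ArrayList<LogRow>(buffer);
        Collections.sort(copy, new Comparator<LogRow>() {
            @Override
            public int compare(LogRow a, LogRow b) {
                return Long.compare(a.id, b.id);
            }
        });
        List<String> lines = new ArrayList<String>();
        for (LogRow row : copy) {
            lines.add(row.toCsvLine());
        }
        return lines;
    }

    // Writes the file once, after the order has been fixed.
    void writeTo(Path target) throws IOException {
        Files.write(target, sortedLines(), StandardCharsets.UTF_8);
    }
}
```

Because the order is decided in one place just before the single write, it no longer matters in which order the branches delivered their rows.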
Rilhia Solutions
Fifteen Stars

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

The tFlowToIterate should go straight after the SQL component. If the row connector between the SQL component and the tFlowToIterate is called "row1" and you have a column that you want to use called "ID", then it will create a globalMap variable called "row1.ID". You then access this using the following code....
((Integer)globalMap.get("row1.ID"))

It needs to be cast to an Integer (assuming the type of ID is Integer) before it is used.
You do not need the tJavaRow for this.
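Put together, the query expression for the second SQL component could look roughly like the sketch below. It is written as a standalone Java class so it can be run outside Talend: the `HashMap` only stands in for the `globalMap` that Talend provides inside a job, the class and method names are made up, and the column list is shortened. It assumes the connector is named `row1` and that `ID` is an Integer, as described above.

```java
import java.util.HashMap;
import java.util.Map;

final class IterateQueryExample {
    // Builds the per-iteration query from the value tFlowToIterate stored
    // under "row1.ID" (connector name + "." + column name).
    static String buildQuery(Map<String, Object> globalMap) {
        Integer id = (Integer) globalMap.get("row1.ID");
        return "SELECT ID, TripID, JobID, JobReference, LogTime "
             + "FROM CarrierWeb.dbo.cwReturnJobLogs "
             + "WHERE ID=" + id;
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap; in a real job it already exists.
        Map<String, Object> globalMap = new HashMap<String, Object>();
        globalMap.put("row1.ID", Integer.valueOf(161670060));
        System.out.println(buildQuery(globalMap));
    }
}
```

Inside the job itself, only the string expression ending in `"WHERE ID=" + ((Integer)globalMap.get("row1.ID"))` would go into the query field of the SQL component.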
Rilhia Solutions
One Star

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

OK, there are a couple of issues with this job that I have spotted. I will list them below....
1) You are trying to write out to the same file context.LogDir+"temp.csv" using multiple tFileOutputDelimited components. This is not safe, but may have worked due to luck. Try opening two versions of the same .txt file and add text to both. Then save one. Then save the other. Open the file and you will not have all of the text you have written. That is a simplified example of what I expect is happening.

Okay, I understand this. Although I wonder what has changed in Talend 6, because in Talend 5 we never experienced this in years.
So either something has changed in Talend 6, or we had tons of luck with Talend 5 :-)
The job itself was built around 6 years ago. I changed it a few weeks ago to include the tJavaRow_1 component, so I could convert UTC time to GMT time. This was not possible in Talend 5, which is why I started using Talend 6.
(Don't fix things if they ain't broke. Well, this change to Talend 6 broke some things.)

2) You are relying on a combination of your SQL query data order AND (maybe you do not realise this) the order of preference for the tMAP outputs (seen below)
This is not ideal at all and seems to tally with what you are experiencing when you say as the rows go up, the ordering starts to get thrown out.

The tMap order is deliberately set in this order, so yes I did realize this.

What I suggest is that you rebuild the job to output to tHashOutput components instead of your file. Link the tHashOutputs so they are saving to the same location. Then use a tHashInput followed by a tSortRow component to order your data as required (this may need a bit of massaging to get the order you wish). Then write the file once you have controlled the ordering.

I know how to use the tHashOutput component, I use this component in several other jobs.
Fifteen Stars

Re: Multiple issues with Talend 6.1.1 if amount of rows increases

I have to say that since I wasn't able to actually run it, I had to make an educated guess as to what was likely to be causing your ordering issue. There may be another cause in there that I did not spot (....I have a day job so was only able to look at it briefly :-) ). With regard to the tMap ordering, lots of people are not aware of this and it is just left as the order in which the output connections were made. I didn't spend a great deal of time on the internals of the tMap, but did notice that the filtering there seemed to be driven by values computed by the tMap variables, using the values in the data. Are you sure that the data is correct?
I suspect that if you take control of your data ordering (as suggested), you will get rid of this issue. There will have been changes between v5 and v6. There are also quite substantial changes between Java 7 and 8. Although the data was ordered by your input query, I find that it is not a good idea to depend on that order being maintained throughout jobs. So if data order is important, you should always take steps to guarantee it.
Good luck with getting this sorted :-)
Rilhia Solutions