Append using tAdvancedOutputXML

One Star

We are getting incredibly poor performance when using the append option on the tAdvancedOutputXML component. The first XML is created from a CSV source and has one loop element called Race; it looks like the sample below. The second source is a CSV that has the ID field and another loop element. (We can't use the MSXML component because it appends all loops to the bottom, whereas we need Race in the middle.)
In 4.0.3 and 4.2.3 the append actually adds the new element at the top, under id, instead of at the bottom; in 4.2.2 it works correctly. The problem is that the append runs at about 0.01 rows per second, and we really need it to run much faster. Any ideas? We have tried temp directories, increasing the buffer, etc. Right now we are running XP, 32-bit, with 4 GB of memory, but we will soon move to 64-bit with 8 GB.
Also, we need to open a bug on 4.2.3, where the append doesn't appear to work: it appends to the top of Student instead of the bottom as 4.2.2 does.
<Student id="SID_71346">
<StudentUniqueStateId>71346</StudentUniqueStateId>
<StudentId>764750431</StudentId>
<LocalId>71346</LocalId>
<Name>
<FirstName>Marguerite</FirstName>
<LastSurname>Lindline</LastSurname>
</Name>
<Race>
<RacialCategory>White</RacialCategory>
<RacialCategory>Hispanic</RacialCategory>
</Race>
<BirthData>
<BirthDate>2004-03-15</BirthDate>
<BirthCity>Trenton</BirthCity>
<BirthCountry>United States</BirthCountry>
<BirthState>TX</BirthState>
</BirthData>
<LimitedEnglishProficiency>Limited</LimitedEnglishProficiency>


The second XML to append looks like:

<Student id="SID_71346">

<Languages>
<Language>Spanish</Language>
<Language>English</Language>
</Languages>

</Student>
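
For comparison, the result expected from the append (the children of the second tree added at the bottom of the matching Student) can be sketched in plain Java outside Talend. This is only a minimal sketch under assumptions: the class XmlAppendById and the Students wrapper element are hypothetical, not part of Talend, and both documents are assumed small enough to hold in memory. It parses each document once and finds the matching record through a hash map keyed on the id attribute.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not a Talend component or routine.
final class XmlAppendById {
    private XmlAppendById() {}

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Indexes every recordName element of doc by its id attribute. */
    static Map<String, Element> indexById(Document doc, String recordName) {
        Map<String, Element> index = new HashMap<String, Element>();
        NodeList records = doc.getElementsByTagName(recordName);
        for (int i = 0; i < records.getLength(); i++) {
            Element e = (Element) records.item(i);
            index.put(e.getAttribute("id"), e);
        }
        return index;
    }

    /**
     * Appends the child elements of each source record to the bottom of the
     * target record with the same id. Returns the number of records matched.
     */
    static int appendAll(Document target, Document source, String recordName) {
        Map<String, Element> index = indexById(target, recordName);
        NodeList records = source.getElementsByTagName(recordName);
        int merged = 0;
        for (int i = 0; i < records.getLength(); i++) {
            Element src = (Element) records.item(i);
            Element dst = index.get(src.getAttribute("id"));
            if (dst == null) {
                continue; // no record with this id in the target
            }
            for (Node c = src.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE) {
                    // importNode copies the node into the target document
                    dst.appendChild(target.importNode(c, true));
                }
            }
            merged++;
        }
        return merged;
    }
}
```

Calling appendAll(target, source, "Student") on the two samples above would add Languages as the last child of the Student with id SID_71346. Files too large for memory would need a streaming parser instead of DOM.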
Community Manager

Re: Append using tAdvancedOutputXML

Hi
Thanks for your first post on the forum!
0.01 rows per second is an unacceptable rate. I think there must be a design problem in the job. Can you upload some screenshots of the job so that we can see more details?
Best regards
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: Append using tAdvancedOutputXML

The smaller XML tree is appended to the larger one.
Community Manager

Re: Append using tAdvancedOutputXML

Hi
It is a big job, with many components and many columns in the schema, and you have added the following expression on each column, which is likely to reduce performance:
(Disability_CSV_In.DISABILITY == null || Disability_CSV_In.DISABILITY.equals("")) ? null : Disability_CSV_In.DISABILITY
Do you really need the expression on each column?
If you have enough memory available, go to Window --> Preferences --> Talend --> Run/Debug and allocate more memory to the job execution.
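
If the check does turn out to be needed, the per-column expression quoted above could be moved into a small user routine so that each tMap column calls one method instead of repeating the ternary. A minimal sketch, assuming a hypothetical routine class named XmlFieldUtil (it is not a built-in Talend routine); it only shortens the expressions and is not claimed to change throughput:

```java
// Hypothetical helper, not part of Talend: maps empty strings to null
// so that a null-skipping XML output does not emit an empty element.
final class XmlFieldUtil {
    private XmlFieldUtil() {}

    /** Returns null for null or empty input, otherwise the input unchanged. */
    static String emptyToNull(String value) {
        return (value == null || "".equals(value)) ? null : value;
    }
}
```

A column expression would then read XmlFieldUtil.emptyToNull(Disability_CSV_In.DISABILITY).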
Best regards
Shong
One Star

Re: Append using tAdvancedOutputXML

From this image you can see that we do not want to create an empty element. The way that check box appears to work in Talend (or Java) is that if the field is null, the element is not written; if the field is an empty string, however, the unwanted tag is written, so you have to account for that in the tMap. Also, the expression logic is actually in the first, non-append XML creation, which runs fast. Only the append stream runs slowly, and it has just one loop element tied to the student group and no expressions in the tMap.
It sounds like memory is the issue, although I have tried changing that without success: the PC has 4 GB, but increasing the run-time memory causes Talend to crash. Our workaround for now is to go back to the merge (xmlms..), even though it puts all loop elements at the bottom, so we have developed an "xml re-arranger" routine to move the loop elements to the correct location. The other possible solution is to move to a 64-bit architecture with 8 GB of memory. Let me know if anyone has other possible solutions.
Thanks,
Jay
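
The "xml re-arranger" idea mentioned above can be illustrated with a short DOM routine. This is only a sketch of how such a routine might look, not the routine actually used in this job: the class name XmlRearranger, its method names, and the choice of anchoring the loop element directly after a named sibling are all assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the routine referred to in the post.
final class XmlRearranger {
    private XmlRearranger() {}

    /** Returns the first direct child element with the given name, or null. */
    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Moves child loopName so it directly follows child anchorName, if both exist. */
    static void moveAfter(Element parent, String loopName, String anchorName) {
        Element loop = firstChild(parent, loopName);
        Element anchor = firstChild(parent, anchorName);
        if (loop == null || anchor == null || loop == anchor) {
            return;
        }
        Node ref = anchor.getNextSibling();
        if (ref == loop) {
            return; // already in place
        }
        // insertBefore detaches loop from its old position; a null ref appends.
        parent.insertBefore(loop, ref);
    }

    /** Applies moveAfter to every recordName element and returns the new XML. */
    static String rearrange(String xml, String recordName, String loopName, String anchorName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList records = doc.getElementsByTagName(recordName);
            for (int i = 0; i < records.getLength(); i++) {
                moveAfter((Element) records.item(i), loopName, anchorName);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rearrange XML", e);
        }
    }
}
```

For the Student layout in this thread, rearrange(xml, "Student", "Race", "Name") would move a Race element that was written at the bottom back to its place after Name. Like any DOM approach, it loads the whole file into memory.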
One Star

Re: Append using tAdvancedOutputXML

I have a similar issue when writing close to 200K records to the XML file. I have three loops:
1. Order --> multiple shipping addresses
2. Order --> multiple billing addresses
3. Order --> multiple lines in an order
I am using toutputMSXML to write the data to the final output, with three Netezza source components reading the data. The Talend job is taking days to complete: it starts out processing 300 rows/s, slows down after a few hours, and ends up at 1 row/s. Does anyone know how to solve this problem? I am thinking of generating three separate files with the tAdvancedOutputXML component and then somehow merging them. Any suggestions?
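
If the slowdown comes from the whole document being built up in memory (an assumption, not something confirmed in this thread), one alternative to picture is a streaming writer that emits each order as soon as its three loops are known. The sketch below uses the standard Java StAX API; the class OrderStreamWriter and every element name in it are hypothetical, and it assumes the rows have already been grouped by order.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch, not a Talend component. All element names are assumed.
final class OrderStreamWriter {
    private OrderStreamWriter() {}

    /** One order with its three child lists already grouped by order id. */
    static final class Order {
        final String id;
        final List<String> shipping;
        final List<String> billing;
        final List<String> lines;

        Order(String id, List<String> shipping, List<String> billing, List<String> lines) {
            this.id = id;
            this.shipping = shipping;
            this.billing = billing;
            this.lines = lines;
        }
    }

    /** Writes one loop: a wrapper element with one item element per value. */
    static void writeLoop(XMLStreamWriter w, String wrapper, String item, List<String> values)
            throws XMLStreamException {
        w.writeStartElement(wrapper);
        for (String v : values) {
            w.writeStartElement(item);
            w.writeCharacters(v);
            w.writeEndElement();
        }
        w.writeEndElement();
    }

    /** Streams the orders to out without building a document tree. */
    static void writeOrders(Iterable<Order> orders, Writer out) throws XMLStreamException {
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("Orders");
        for (Order o : orders) {
            w.writeStartElement("Order");
            w.writeAttribute("id", o.id);
            writeLoop(w, "ShippingAddresses", "ShippingAddress", o.shipping);
            writeLoop(w, "BillingAddresses", "BillingAddress", o.billing);
            writeLoop(w, "Lines", "Line", o.lines);
            w.writeEndElement();
        }
        w.writeEndElement();
        w.writeEndDocument();
        w.flush();
        w.close();
    }

    /** Convenience wrapper for small inputs: returns the XML as a string. */
    static String toXml(List<Order> orders) {
        try {
            StringWriter out = new StringWriter();
            writeOrders(orders, out);
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write orders", e);
        }
    }
}
```

With this shape the writer's memory use does not depend on the number of orders already written; the grouping of the three sources by order id still has to happen upstream, for example by reading each source sorted on the order key.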
Community Manager

Re: Append using tAdvancedOutputXML

Hi kumard3
Try the 'append the source xml file' option on tAdvancedFileOutputXML to append the loop elements one by one to an existing XML file. You can find an example job in this topic:
http://www.talendforge.org/forum/viewtopic.php?id=5975
Shong
One Star

Re: Append using tAdvancedOutputXML

Do I need to define the complete structure of the XML when loading the data the first time with the first loop, or does the structure need to be changed after each loop? I can't add multiple loops in one tAdvancedFileOutputXML component.
One Star

Re: Append using tAdvancedOutputXML

See the attached snapshot for example:
Community Manager

Re: Append using tAdvancedOutputXML

Hi kumard3
It is not possible to define multiple loop elements on tAdvancedFileOutputXML. You just need to define the structure for the one loop element that is being appended each time.
Shong
One Star

Re: Append using tAdvancedOutputXML

We tried this approach, and the job takes 200 seconds for 1,000 rows and freezes at 10,000 rows. Is there any other way in Talend to merge three separate files into one file?
Community Manager

Re: Append using tAdvancedOutputXML

Hi
You can try the tXMLMap component, which allows you to define a Document and set multiple loop elements. Link it to a tFileOutputXML to generate an XML file.
Shong
One Star

Re: Append using tAdvancedOutputXML

We are using Talend version 5.0.3, and tXMLMap does not allow multiple loops. We need to have three loops.
Any suggestions?
One Star

Re: Append using tAdvancedOutputXML

Shong,
Do you know why tXMLMap does not allow multiple loops? None of the approaches is working right now, and this issue is production critical. Can you suggest some other way to fix it?