Big XML

Hi!
I have a problem:

An 800 MB XML file with about 500 variables in it.

I created the XML metadata for it, but I can't enter more than roughly 175-200 variables.
I could copy some of the fields over, run the job three times to cover all fields, and join the three results.
But when I load the file the first time (I haven't even tried to merge/join yet), the job takes half an hour to run.
And it produces nothing.
I have a tFileInputXML with the metadata (selecting the variables I want via XPath, which works when I define the variables/XPath) connected to a tMap and a tMSSqlOutput.

What could possibly be the problem?
The job starts and finishes, but outputs no rows.

Re: Big XML

Hi,
The tFileInputXML component has three generation modes in the Advanced settings view: Slow and memory-consuming / Memory-consuming / Fast with low memory consumption.
Given the uncommon size of your XML file, I would suggest setting the generation mode to Fast with low memory consumption.
For XML of that size, and for hierarchical file mapping and parsing in general, the Talend Platform products (Data Management, Big Data, Data Services, etc.) provide the Talend Data Mapper along with the tHMap component, which is strongly focused on large, complex XML or JSON datasets, as well as EDI and EBCDIC copybook mainframe files.
If this XML requirement recurs in your organization and is critical for your project, it could be a good idea to get in touch with our Sales team for a quick tour of our Platform offering and the Talend Data Mapper.
Best regards,

Re: Big XML

Thanks for the answer.
I'm already working with that generation mode and still get no results.
I'll tell it to my sales manager.
Thank you

Re: Big XML

Okay,
Did some more investigation and tests:
Filesize is not the issue.
The issue is the number of variables for the metadata.
I can only enter up to 211 variables in the metadata, and even then the job won't use them: I get no output.
If I build the metadata for the same file with, say, only 30 variables, it runs.
Does anybody have ideas for workarounds?

Re: Big XML

FrederikvdgTalend - are you trying to read the big XML or write to it? Let me know... I'm using Data Mapper for a similar complex XML file at the moment and will soon share how to set up and use it...
Short of using Data Mapper, the way I've gotten around this issue in the past is to leverage the database's XML functions to generate the XML and pass that to Talend for stitching. Here's an example in Oracle:
SELECT
  KeyField1, KeyField2,
  XMLAGG (
    XMLFOREST (
      field1,
      field2,
      ....
      ....
    )
  ).GETCLOBVAL() AS DATAROW_XML
FROM
  TableName
GROUP BY
  KeyField1, KeyField2
What I'm doing above is pushing the generation of a huge list of fields back to the database and returning just one field (DATAROW_XML) that holds a neatly formatted XML fragment per key. I then pass this to, say, a tFileOutputMSXML to build the final XML. This is how I've typically gotten around the "is exceeding the 65535 bytes limit" error (the 64 KB limit on a Java method's bytecode).
Let me know if this helps... Else, share more details...

Re: Big XML

Do you mean 211 fields defined in the schema? Perhaps it has something to do with the size of the struct or method being generated, as Will mentioned. An experiment: what happens if you use the shortest field names possible? Try shortening all the field names in the schema to fewer than 8 characters.

Re: Big XML

I'm trying to read in the XML file.
I have 539 fields to add to the schema with XPath expressions, so I can't shorten the XPath field names.
The schema output names/field names are not the issue.

Thanks, still looking for a solution :-)

Re: Big XML

539? Yes, I can see how that might be a problem. You might try the old tSmooksInput component, if you can find it and get it working with the current version. You would still have a lot of XPaths, but you would offload the XML processing to Smooks, and you can use separate schemas if you are looping. You could also see whether Smooks or Twig can split the file for you. If you need ALL of the fields in one schema, that's going to be tricky; I suppose you could make multiple passes (ugly, but workable).
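
The multiple-pass idea can also be sketched outside Talend with a streaming parser. Below is a minimal, hypothetical Java (StAX) example, not the Talend-generated code: it assumes the big file repeats a `row` element (the class name, element names, and `extractPass` helper are all made up for illustration). Each pass pulls only a subset of fields, and the passes can be joined afterwards on a key field:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.*;

public class StreamingFieldPass {

    // One "pass": stream the XML and keep only the fields named in `wanted`.
    // StAX never holds the whole document in memory, so a very large file is
    // fine; for a real file, swap the StringReader for a FileReader.
    static List<Map<String, String>> extractPass(String xml, Set<String> wanted) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<Map<String, String>> rows = new ArrayList<>();
            Map<String, String> current = null; // fields of the row being read
            String field = null;                // wanted field we are inside, if any
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        String name = r.getLocalName();
                        if (name.equals("row")) current = new LinkedHashMap<>();
                        else if (current != null && wanted.contains(name)) field = name;
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // merge() because long text can arrive in several events
                        if (field != null) current.merge(field, r.getText(), String::concat);
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (r.getLocalName().equals("row")) { rows.add(current); current = null; }
                        field = null;
                        break;
                }
            }
            return rows;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<data><row><a>1</a><b>2</b><c>3</c></row>"
                   + "<row><a>4</a><b>5</b><c>6</c></row></data>";
        // Pass 1 takes fields a and b; a second pass could take c, and the
        // passes can then be joined on a key field.
        System.out.println(extractPass(xml, new HashSet<>(Arrays.asList("a", "b"))));
        // prints [{a=1, b=2}, {a=4, b=5}]
    }
}
```

Because each pass only names the fields it needs, no single generated schema (or Java method) has to carry all 539 fields at once.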

Re: Big XML

That is indeed what I'm trying.
Too bad it takes so much time, and I'm not even sure it will work.

Re: Big XML

Take a look at something I wrote on parsing multi-schema XMLs: http://kindleconsulting.com/blog/entry/parsing-nested-multi-schema-xml-files-talend-tfileinputXML-tX...
Otherwise, as Cantoine mentioned, I'd look at Data Mapper. I'm applying it at the moment and will share the use case when done.