Hi Seiif - Before I suggest alternatives (below), have you tried changing the XML parser in tFileInput to SAX and increasing the heap size for the job? The DOM parser is very memory intensive because it builds the whole document in memory, whereas SAX reads the file as a stream.
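To illustrate why SAX is lighter on memory, here is a minimal sketch of event-based parsing, written in Python rather than as Talend configuration. The element name "record" and the names RecordHandler and stream_records are made up for the example; the point is only that each element is handed off as soon as it closes, so the full tree is never held in memory.

```python
import io
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Passes the text of each <record> element to a callback as it closes."""

    def __init__(self, tag, on_record):
        super().__init__()
        self.tag = tag              # hypothetical element name
        self.on_record = on_record  # called once per completed element
        self._buf = None            # None while outside the element

    def startElement(self, name, attrs):
        if name == self.tag:
            self._buf = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate them.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == self.tag and self._buf is not None:
            self.on_record("".join(self._buf).strip())
            self._buf = None


def stream_records(source, on_record, tag="record"):
    """Parse a filename or file-like object without building a DOM tree."""
    xml.sax.parse(source, RecordHandler(tag, on_record))


if __name__ == "__main__":
    sample = io.BytesIO(b"<root><record>a</record><record>b</record></root>")
    stream_records(sample, print)
```

In a real job the callback would write the row out or pass it downstream instead of printing it.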
Like jholman, I've done this using the sed utility in a shell script (.sh) on the filesystem, called from a tSystem component. Using sed, I looked for a particular tag (the opening tag of the XML element) and, wherever I found it, extracted the text between the tags.
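I don't have the original sed script to hand, so this is not it. It is a rough Python sketch of the same idea: scan the file line by line, and whenever the opening tag turns up, collect everything up to the matching closing tag. The tag names and the function name extract_between are assumptions for the example, and it assumes the opening tag carries no attributes and the elements are not nested.

```python
def extract_between(lines, open_tag="<record>", close_tag="</record>"):
    """Yield the text found between open_tag and close_tag, line by line."""
    buf = None  # None means we are currently outside a block
    for line in lines:
        pos = 0
        while True:
            if buf is None:
                start = line.find(open_tag, pos)
                if start == -1:
                    break  # no further opening tag on this line
                pos = start + len(open_tag)
                buf = []
            else:
                end = line.find(close_tag, pos)
                if end == -1:
                    buf.append(line[pos:])  # block continues on the next line
                    break
                buf.append(line[pos:end])
                yield "".join(buf)
                buf = None
                pos = end + len(close_tag)


if __name__ == "__main__":
    sample = ["<root>\n", "<record>a</record><record>b\n", "c</record>\n", "</root>\n"]
    for text in extract_between(sample):
        print(repr(text))
```

Because it only ever holds one block at a time, it works on an open file object just as well as on a list, which is what makes it usable on very large files.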
Another, cruder method I used recently was to read the file as plain text (tFullRow), look for these markers in the XML, number them with an incrementing counter (sequence), and then split the file using tMap. This was for queue data that needed to be processed 'row' by 'row'.
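The counter approach can be sketched roughly like this, again in Python rather than as actual Talend components. The marker string and the function names are hypothetical: every line gets the current sequence number, the number goes up each time the marker appears, and the lines are then grouped by that number.

```python
def tag_with_sequence(lines, marker="<record>"):
    """Pair each line with a counter that increments at every marker line."""
    seq = 0
    for line in lines:
        if marker in line:
            seq += 1
        yield seq, line


def split_by_sequence(lines, marker="<record>"):
    """Group lines by sequence number, dropping anything before the first marker."""
    groups = {}
    for seq, line in tag_with_sequence(lines, marker):
        if seq == 0:
            continue  # header lines that precede the first marker
        groups.setdefault(seq, []).append(line)
    return groups


if __name__ == "__main__":
    sample = ["<root>", "<record>", "a", "</record>", "<record>", "b", "</record>", "</root>"]
    for seq, chunk in split_by_sequence(sample).items():
        print(seq, chunk)
```

Note that anything after the last marker, such as the closing root tag, ends up in the last group, so it is crude in the same way the original method was and may need trimming afterwards.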