One Star

[resolved] Excel to XML: dynamically define columns to process

Hi TALEND-community,
I've started using TOS only last week and so far I'm impressed by the functional range of the tool.
Still I encountered a problem I cannot see a simple solution for and seems like it has been discussed in similar ways before here in this forum. The task I want to accomplish is reading data from a XLS file and write the data to XML. The XLS source files are varying a little form one file to another and have different number of columns. Additionally not all columns existing in an XLS file need to be written to XML.
I will have a kind of job configuration file which includes the list of columns I want to import for each XLS file, e.g. or . What I already know (or at least I think I know) is that a scheme for a tFileInputExcel-component cannot be dynamic, which IMHO means a) I will have to read all columns which might occur (e.g. A-Z or even more) and b) the reduction of columns will have to take place after the data extraction based on the rules read from the configuration file.
The good thing is that the target scheme of the XML file is kind of generic, which means there are no specific target fields for individual source fields:
<measure id="COLUMN_TITLE"> < octetmeasure value="VALUE"/> </measure>
If I use my two samples above the result should be either
<measures >
<measure id="A"> < octetmeasure value="value1"/> </measure>
<measure id="B "> < octetmeasure value="value2"/> </measure>
<measure id="D "> < octetmeasure value="value3"/> </measure>
<measure id="G "> < octetmeasure value="value4"/> </measure>
<measure id="H "> < octetmeasure value="value5"/> </measure>
<measure id="K "> < octetmeasure value="value6"/> </measure>
</measures >
or it should be
<measures >
<measure id="A"> < octetmeasure value="value1"/> </measure>
<measure id="B "> < octetmeasure value="value2"/> </measure>
<measure id="C "> < octetmeasure value="value3"/> </measure>
<measure id="D "> < octetmeasure value="value4"/> </measure>
<measure id="E "> < octetmeasure value="value5"/> </measure>
<measure id="F "> < octetmeasure value="value6"/> </measure>
<measure id="K "> < octetmeasure value="value7"/> </measure>
</measures >

Ignoring single rows based on filter criteria is straight-forward, but how can I achieve something like this for columns? Maybe a solution with individual java code? Any idea is welcome!
Thank you!
Michael
1 ACCEPTED SOLUTION

Accepted Solutions
One Star

Re: [resolved] Excel to XML: dynamically define columns to process

Thanks!
I put up a blog post on this with screen shots http://bekwam.blogspot.com/2011/11/dynamically-filtering-columns-for-xml.html
The post works with Java Reflection so it avoids coding the column names with expressions like "input_row.shelf".
3 REPLIES
One Star

Re: [resolved] Excel to XML: dynamically define columns to process

Hi,
Start by defining an Excel schema that includes all columns. Drag this on the canvas as an input component.
Add a tJavaRow and a tFileOutputMSXML. Synchronize all columns. In the tFileOutputMSXML, uncheck the "Create empty element if needed" checkbox.
In the tJavaRow, filter the row values based on the column logic. For example, if I have a context variable "OUTPUT_COLUMNS" that's set to a list of column names "brand,type,shelf", I can use logic like this filter. Note the ternary operator which will swap a null for an empty string of an unfiltered column. For a more general case, you might fill up a hashmap from a file or database record containing a list of columns. Then, the tJavaRow would iterate over the column names rather than have them hardcoded.
if( context.OUTPUT_COLUMNS.contains("shelf") ) {
output_row.shelf = (input_row.shelf!=null)?input_row.shelf:"";
}
The result is an XML document that doesn't have elements for the filtered columns, but will return an empty element for empty unfiltered columns.
One Star

Re: [resolved] Excel to XML: dynamically define columns to process

Dear walkerca,
thank you for your detailed suggestion. I will let you know if I succeded.
Btw: your blog @ http://bekwam.blogspot.com has been a very useful source of information in the last days. Please keep up the good work, it's highly appreciated!
Michael
One Star

Re: [resolved] Excel to XML: dynamically define columns to process

Thanks!
I put up a blog post on this with screen shots http://bekwam.blogspot.com/2011/11/dynamically-filtering-columns-for-xml.html
The post works with Java Reflection so it avoids coding the column names with expressions like "input_row.shelf".