Five Stars

Job design unstructured excel to csv

Hello,

 

I have a job i am to create where by  i have an unstructured excel file roughly in the form of a table. my job i to take this and to output it as a table. I am having difficulty as the tExcelInput component is designed to handle files that already come in a fairly tructured manner. 

 

in the file attached we have a cross tab. i want  the out put to be such that we have it in the following form

 

+---------------+---------+-------------+--------------------------------+------+
| Geography | Date    | Category | Subcategory                   | Data |
+---------------+---------+-------------+--------------------------------+------+
| UK              | Jun-07 | RETAIL    | Shops-Prime                   | 4      |
| UK              | Jun-07 | RETAIL    | Shops-Good Secondary | 4.75 |
| UK              | Jun-09 | RETAIL    | Shops-Prime                   | 6      |
| UK              | Jun-09 | RETAIL    | Shops-Prime                   | 7.75 |
| UK              | Jun-07 | RETAIL    | Shops Centers-Prime      | 4.75 |
| etc              | etc       | etc           | etc                                  | etc   |
+---------------+---------+-------------+--------------------------------+------+

 

 

the subcategory is a join of the subcategory in the file and the sub-sub-category under it.

also to be noted is the fact the columns where date is like jun07-jun09 is ignored so the only data from columns B,C and E is needed.

 

The key point is that the original file must not be cleansed or reformatted such that Talend must be able to find the relevant data from the unstructured file as is. is there any way to do it or is it even possible?

 

let me know if i can provide any more information

 

7 REPLIES
Ten Stars

Re: Job design unstructured excel to csv

Does the data arrive with headers? If so, you can probably do something along the lines of this https://www.rilhia.com/tutorials/dynamic-column-order

 

Rilhia Solutions
Five Stars

Re: Job design unstructured excel to csv

there are header but it the fact it has header of two orders, 1 on top and 1 one the first column, as well as that each header is meant to be a drill down , if you look at the attached image...

Ten Stars

Re: Job design unstructured excel to csv

This shouldn't make it much harder. You have to know some rules about the file. You need to work out what rules you know of, then work out how you can use those rules to derive a solution to this. The lack of fixed structure can be handled using functionality similar to what I demonstrated in the link. But you need to know what definites you are working with first

Rilhia Solutions
Five Stars

Re: Job design unstructured excel to csv

can suggest any ideas on how you might go about deconstructing the file in the example to generate a table??

Ten Stars

Re: Job design unstructured excel to csv

OK. This is probably easier for me to do than explain. You will need Java. I will give you a description of what I would do and you will need to see if you can implement it given the tutorials I will recommend. None of them will tell you how to solve this, but will contain methods which you will find useful in achieving this.

 

The way I would attempt this would be to read your data into a tHashOutput component. You say you are not aware of the file structure, I assume that means you are not sure how many "Equivalent Yield" columns there are. Once you have the data in your tHashOutput, check the values of the first couple of rows. You need to be looking for the dates above the cells that hold "Equivalent Yield". You can create a Java collection (ArayList, HashMap, etc, maybe using a tJavaFlex) of the dates. A nice way of doing this would be to use a HashMap and have the column letter as the key or value and the date as the other.

 

Next you will need to use that Collection as a lookup with a tMap. This tutorial shows you how to multiply rows (https://www.rilhia.com/quicktips/quick-tip-row-multiplication). Using your Collection data in the lookup section of what this tutorial describes, you can multiply out the rows so that if you have 4,5 or 6(etc) columns with "Equivalent Yields" values, you can use the tMap to create a row for each. For example.....

 

Retail                        Jan 2009            Jan 2014

Prime                        4.6                      5.2
Goods Secondary     1.2                      6.8

Secondary                 3.2                     4.2

 

........becomes.....

 

Good Secondary     Jan 2009      1.2

Good Secondary     Jan 2014       6.8

Prime                       Jan 2009      4.6

Prime                       Jan 2014      5.2

Secondary               Jan 2009      3.2

Secondary               Jan 2014      4.2

 

 

Once you have the data in that structure, you can hold it in another tHashOutput or carry out further calculations from there. However I think by this point it should be easier to work with

Rilhia Solutions
Five Stars

Re: Job design unstructured excel to csv

 "You need to be looking for the dates above the cells that hold "Equivalent Yield""

 

im struggling on this bit. from the thashMap how can i look at rows? and secondly how can i find rows above those that have  "Equivalent Yield?  also is there a way to get the metadata for example if i found a date jun-09 in c3, how can i poke that job so that c3 is returned instead of the actaul value of the data??

Ten Stars

Re: Job design unstructured excel to csv

Do you have any Java knowledge? If yes, this will be easier.

 

In regard to the question about how you can look back, this tutorial will help (https://www.rilhia.com/quicktips/quick-tip-compare-row-value-against-value-previous-row).

Once you have identified the columns which hold the "Equivalent Yield" headings and identified their corresponding dates, that is when you can create your Java collection and make use of the other tutorials I sent you. 

 

Unfortunately this is kind of a tricky thing to work on with you remotely. If I was sitting with you this would be a lot easier. The reason for this is there is no component which will do all of this for you. It is entirely possible and not that complicated once you know what you are doing, but if you are new to Talend this is a bit of a tough way to start.

Rilhia Solutions