I need some help and hope you will manage to do it. I Have the following data get after a pivotToColumnsDelimited. I need to replace the value in each column A, B,C, etc... where the id is set by the corresponding value in the column Value. I tried with a tReplace, a tMap but i do not know how to do it. I precised that the columns A,B,etc... are created by the pivot done in the last step of my current process. If you have some idea. Thanks
Solved! Go to Solution.
I'm afraid the description does not give enough information. Maybe if you show a "before" and "after" table, it might fill in the blanks
At the beginning, I have this table:
After the pivot, i get this table:
and now, I want to obtain this one:
There are a couple of ways in which you could do this, but since you have started with the pivot table way, I'll go with that. I am making the following assumptions....
1) You know that there will always be codes A to E (or at least be aware of a maximum number of codes).
2) The rest of your data is pretty static (as shown in the examples)
3) That you can change your pivot table calculations (you are currently using "id" where I think you should be using "value")
If you can get a pivot table like below....
....you can use a tAggregatedRow, group by "id" and output the rest of the columns using the "First" function while ticking "Ignore null values". This should return 1 row .....but you might need to tweak what I have described with a little experimentation
I took your advises in account for the pivot table and it's OK but I have still an issue with the tAggregatedRow.
For the point 2) & 3) of your assumptions, you are right but for the point 1), it is not the case. I have provided a short list (A to E) for the example but this list is really longer and all contained values could be not present in all run.
Probably is because I'm a newbie in Talend and do not know how to well managed it but with the tAggregatedRow, I get only columns of my first table(id, code, value, day, time, utc). I saw that I can add manually the others (A to E) by editing the schema but I'm not sure is the correct way due to the previous explanation. I can export my project if it can help you to understand my issue.
Sure, the maximum of columns is 1500.
I'm currently looking about dynamical schema but it's look only within the enterprise version of Talend with subscription.
Wow! OK, I think I am misunderstanding your requirement now. You surely cannot be wanting to produce a single row with up to 1500 columns in it?
Yes I want. it's the reason why i try to do this transformation. I want one line with all value by code for one id at one moment (day,time,utc).
OK. We can do this, but I will need to know more. How does it need to be represented when it is done? Is it in a CSV file or is it for a databse. If it is for a database, my next question is do you have a table already configured to receive that data?
No it's only csv file but I manage to create an tFileInputDeliminted by using metadata menu. Like that i managed to create a dynamic schema and know i will retest your first idea
OK, I have a better idea that will be far more efficient.
1) You start with this table.....
Read from that in the same way you are doing it now. You should have 5 columns (id, code, value, day, time, utc). I am assuming you are OK with the data types. But since you are just intending to output these as Strings (all CSV fields are essentially Stinrg) it makes sense to just read them all as Strings.
2) Now for this part it is important that you understand that the assumption is that id, day, time and utc will not change in the one dataset.
Connect to a tJavaFlex.id = ""
In the Start Code section, use the following code.....
String id = ""; String values = ""; String day = ""; String time = ""; String utc = "";
In the Main Code section, use the following code....
id = row1.id; day = row1.day; time = row1.time; utc = row1.utc; values = values+","+row1.value;
In the End Code section, use the following code....
globalMap.put("id", id); globalMap.put("day", day); globalMap.put("time", time); globalMap.put("utc", utc); globalMap.put("values", values);
3) Now create a new SubJob starting with a tFixedFlowInput.
Create a column for each of the variables above (id, day, time, utc, values).
In the corresponding column, add the following code (change the values to suit the column)....
4) Connect that to a tMap. Create 1 output column called "content".
In that column concatenate all of the columns shown above (I will assume the row is "row4")....
Notice in the above I have put a "," between each of the columns but not between the row4.utc and row4.values. That is because the row4.values column will always start with a ",".
5) Now connect to a tFileOutputRaw component. This will output to your file. You are manually handling the CSV formatting and just outputting that to a single column.
If you look at your file and assuming that you have done everything correctly (and I have not made any significant errors....this is without having tried it), you should see what you want
Watch the recorded webinar!
Create systems and workflow to manage clean data ingestion and data transformation.
Introduction to Talend Open Studio for Data Integration.
Test drive Talend's enterprise products.