Five Stars

group records and create file for each record

Hi,

I currently have a macro-enabled Excel workbook that reads the data in a file and groups it by certain criteria, for example name and timesheet data. It creates an individual workbook for each group and names the workbook from the name and the timesheet value column. It sorts the data first and inserts a new line after each group, then copies the first group's data, pastes it into another workbook, and saves and closes that workbook. It then reads the next group in the original file, copies it into a new workbook, and so on until it reaches the end of the data and closes the application.

I have been asked to rewrite this job in Talend because we no longer receive the timesheet data in Excel. The timesheet data now comes in a pipe-delimited text file. I have been able to create an Excel file from the pipe-delimited text file.

However, the problem I am having now is that I need to read the output file produced from the pipe-delimited text file, group this data, and create an individual file for each group. I have searched and cannot find anything that meets my requirements. Please see the screenshots. I will be ignoring rows where Column 0 = "I". Luckily the data is already sorted for me this time, but it might not be next time.

Now I want to group the data by name and date and produce an individual output (Excel file) for each group, naming each file based on the values of certain columns.

I am using Talend 6.5, so I do not have tMatchGroup, and I cannot use tXLMAP as this requires joins. I just want the job to extract the data from the pipe file, convert the file, then group the data and produce individual files using one single output component, as I do not know how many rows the text file will have. I thought I had found a way, but the final output file is blank.

Any help would be appreciated.

 

Accepted Solutions
Employee

Re: group records and create file for each record

Hi,

 

The attached job will help you parse the data. You will have to change the date format to suit your needs.

image.png

 

The job identifies all the distinct values in the file and stores them in a hash.

 

Then, based on these control values from the hash input, the file is read multiple times to fetch the matching data.

 

Once the data is separated, the rows are written to multiple directories based on column 3, and the file name is the concatenation of column 2 and column 15 (screenshots below). You can change the file and directory structure to suit your needs.
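For readers who cannot open the attached job, the flow described above can be sketched in plain Java. This is only an illustration under assumptions, not the Talend job itself: the class and method names are hypothetical, the input is assumed to be pipe-delimited text with 17 columns as in the sample posted in this thread, and the `_` separator in the key is a guess. Pass 1 plays the role of the distinct values kept in the hash; pass 2 rereads the input once per distinct value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TwoPassSplit {
    // Hypothetical key layout: "<column 3>/<column 2>_<column 15>"
    // (directory from column 3, file name from columns 2 and 15).
    static String key(String[] c) {
        return c[3].trim() + "/" + c[2].trim() + "_" + c[15].trim();
    }

    // True for detail rows: at least 16 columns and column 0 is not "I".
    static boolean isDetail(String[] c) {
        return c.length > 15 && !"I".equals(c[0].trim());
    }

    // Pass 1: collect the distinct keys in first-seen order.
    static Set<String> distinctKeys(List<String> lines) {
        Set<String> keys = new LinkedHashSet<>();
        for (String line : lines) {
            String[] c = line.split("\\|", -1); // -1 keeps trailing empty columns
            if (isDetail(c)) {
                keys.add(key(c));
            }
        }
        return keys;
    }

    // Pass 2: rescan the input and keep only the rows that match one key.
    static List<String> rowsFor(String wanted, List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] c = line.split("\\|", -1);
            if (isDetail(c) && key(c).equals(wanted)) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample rows modelled on the data posted in this thread.
        List<String> lines = Arrays.asList(
            "I|14001141|Testing||PO BOX|||||Manchester||GB11111111|Certified Tax Invoice|5109341818|2019||18.06.2018",
            "T|9827861|Blog|A||11.06.2018|111111|13|7.5|7.5|Hours||tester|A|18.06.2018|16.06.2018|35",
            "T|9827861|Blog|A||12.06.2018|111111|13|7.5|7.5|Hours||tester|A|18.06.2018|16.06.2018|35",
            "T|9828186|Blog|B||11.06.2018|111111|1|7|1|Days||tester|A|18.06.2018|16.06.2018|4");
        for (String k : distinctKeys(lines)) {
            System.out.println(k + " -> " + rowsFor(k, lines).size() + " row(s)");
        }
    }
}
```

A header row, if the file has one, would be treated as a detail row by this sketch and would need to be skipped separately.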

      

image.png

 

image.png

 

image.png

 

The output of file B is as shown below.

image.png

 

 

Note: 1) Because the same input file is read multiple times, test the performance if you are planning for large data volumes, and make any necessary changes.

2) If you want to process multiple input files, add a tFileList at the beginning of the first subjob.
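If the repeated reads mentioned in note 1 turn out to be too slow, one possible alternative is to group the rows in memory in a single pass and then write one file per group. The sketch below is plain Java rather than a Talend job, and it is not what the attached job does. The class name, the `.txt` extension, the `_` separator and the 17-column pipe-delimited layout are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SinglePassSplit {
    // Groups detail rows by "<column 3>/<column 2>_<column 15>" in one pass.
    // Rows whose column 0 is "I" and rows with too few columns are skipped.
    static Map<String, List<String>> groupRows(List<String> lines) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            String[] c = line.split("\\|", -1);
            if (c.length <= 15 || "I".equals(c[0].trim())) {
                continue;
            }
            String key = c[3].trim() + "/" + c[2].trim() + "_" + c[15].trim();
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
        }
        return groups;
    }

    // Writes each group to <outDir>/<column 3>/<column 2>_<column 15>.txt.
    static void writeGroups(Map<String, List<String>> groups, Path outDir) throws IOException {
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            Path target = outDir.resolve(e.getKey() + ".txt");
            Files.createDirectories(target.getParent());
            Files.write(target, e.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample rows modelled on the data posted in this thread.
        List<String> lines = Arrays.asList(
            "T|9827861|Blog|A||11.06.2018|111111|13|7.5|7.5|Hours||tester|A|18.06.2018|16.06.2018|35",
            "T|9828186|Blog|B||11.06.2018|111111|1|7|1|Days||tester|A|18.06.2018|16.06.2018|4");
        Path outDir = Files.createTempDirectory("timesheets");
        writeGroups(groupRows(lines), outDir);
        System.out.println("Wrote files under " + outDir);
    }
}
```

The trade-off is memory: every row is held in the map until the files are written, so this suits inputs that fit comfortably in memory. Key values that contain characters not allowed in file names would also need cleaning first.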

 

If the idea has helped you, could you please mark the topic as solution provided? It will help enrich the Talend community.

 

Warm Regards,

 

Nikhil Thampi

21 REPLIES
Employee

Re: group records and create file for each record

Hi,

 

     Could you please share some sample input data for analysis and the expected output file format?

 

There are a lot of possible solutions, but let us try to build one based on the current data structure.

 

Warm Regards,

 

Nikhil Thampi

Five Stars

Re: group records and create file for each record

@nikhilthampi thank you for your quick response, much appreciated. I did add a screenshot, but it seems it did not attach.

 

Once the pipe text file is converted to Excel, the data should look like this. It is currently already grouped. When this data is read, rows where Column 0 = "I" need to be ignored.

Column0 | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 | Column9 | Column10 | Column11 | Column12 | Column13 | Column14 | Column15 | Column16
I | 14001141 | Testing |  | PO BOX |  |  |  |  | Manchester |  | GB11111111 | Certified Tax Invoice | 5109341818 | 2019 |  | 18.06.2018
T | 9827861 | Blog | A |  | 11.06.2018 | 111111 | 13 | 7.5 | 7.5 | Hours |  | tester | A | 18.06.2018 | 16.06.2018 | 35
T | 9827861 | Blog | A |  | 12.06.2018 | 111111 | 13 | 7.5 | 7.5 | Hours |  | tester | A | 18.06.2018 | 16.06.2018 | 35
T | 9827861 | Blog | A |  | 13.06.2018 | 111111 | 13 | 7.5 | 7.5 | Hours |  | tester | A | 18.06.2018 | 16.06.2018 | 35
T | 9827861 | Blog | A |  | 14.06.2018 | 111111 | 13 | 7.5 | 7.5 | Hours |  | tester | A | 18.06.2018 | 16.06.2018 | 35
T | 9827861 | Blog | A |  | 15.06.2018 | 111111 | 13 | 5 | 5 | Hours |  | tester | A | 18.06.2018 | 16.06.2018 | 35
I | 14001141 | Testing |  | PO BOX |  |  |  |  | Manchester |  | GB11111111 | Certified Tax Invoice | 5109341818 | 2019 |  | 18.06.2018
T | 9828186 | Blog | B |  | 11.06.2018 | 111111 | 1 | 7 | 1 | Days |  | tester | A | 18.06.2018 | 16.06.2018 | 4
T | 9828186 | Blog | B |  | 13.06.2018 | 111111 | 1 | 7 | 1 | Days |  | tester | A | 18.06.2018 | 16.06.2018 | 4
T | 9828186 | Blog | B |  | 14.06.2018 | 111111 | 1 | 7 | 1 | Days |  | tester | A | 18.06.2018 | 16.06.2018 | 4
T | 9828186 | Blog | B |  | 15.06.2018 | 111111 | 1 | 7 | 1 | Days |  | tester | A | 18.06.2018 | 16.06.2018 | 4
T | 9828186 | Blog | B |  | 12.06.2018 | 111111 | 1 | 7 | 0 | Hours | Z003 | tester | A | 18.06.2018 | 16.06.2018 | 4

I need Talend to group the above as shown below and, based on this group, produce an output for this first set of records. I need the job to loop so that it produces a new file for each group.

Column0 | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 | Column9 | Column10 | Column11 | Column12 | Column13 | Column14 | Column15 | Column16
T | 9827861 | Blog | A |  | 11.06.2018 | 111111 | 13 | 7.5 | 7.5 | Hours |  | tester | A | 18.06.2018 | 16.06.2018 | 35
T | 9827861 | Blog | A |  | 12.06.2018 | 111111 | 13 | 7.5 | 7.5 | Hours |  | tester | A | 18.06.2018 | 16.06.2018 | 35
T | 9827861 | Blog | A |  | 13.06.2018 | 111111 | 13 | 7.5 | 7.5 | Hours |  | tester | A | 18.06.2018 | 16.06.2018 | 35
T | 9827861 | Blog | A |  | 14.06.2018 | 111111 | 13 | 7.5 | 7.5 | Hours |  | tester | A | 18.06.2018 | 16.06.2018 | 35
T | 9827861 | Blog | A |  | 15.06.2018 | 111111 | 13 | 5 | 5 | Hours |  | tester | A | 18.06.2018 | 16.06.2018 | 35

The Excel output file should then be saved using a naming convention based on column 2 and column 15.

Five Stars

Re: group records and create file for each record

@nikhilthampi Much appreciated. I will have a look at the attached job and let you know how I get on. Once again, thank you so much.

Five Stars

Re: group records and create file for each record

I am unable to import the project due to version compatibility. I am working on Talend 6.5 and can generally import jobs, but 6.5 will not import this one because it was created in a later version.
Employee

Re: group records and create file for each record

Hi,

 

I created the job in Talend version 7. Could you please download TOS version 7 to import the job?

 

Warm Regards,

 

Nikhil Thampi

Five Stars

Re: group records and create file for each record

@nikhilthampi

I have used Talend Open Studio for Big Data 7. The job runs with a few small changes; however, nothing is written to the output file for each person, and it creates a file for each line rather than grouping the data in one sheet. So instead of, say, 10 files, I have 314 files. The last tMap is showing 0 rows. Also, the file name is not built from the correct columns: it shows the first name, the date and the rate rather than the first name, surname and date.

 

Employee

Re: group records and create file for each record

Hi,

 

Between the first subjob and the second subjob, you are using On Component Ok instead of the preferred trigger, On Subjob Ok.

 

Also, it seems your aggregation component is not configured correctly. Please refer to the columns I have used for aggregation (I have used the user name, type and date for the grouping).

 

image.png

 

 

In my job, the input data was reduced to 2 groups after the aggregation layer, but it seems your grouping has resulted in almost the same number of rows as the input.
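To see why the choice of grouping columns changes the number of output files, here is a small plain-Java sketch. The class and method names are hypothetical, and the column indexes are assumptions taken from the sample data in this thread (2 and 3 for the name, 15 for the period date, 5 for the daily date), not necessarily the columns selected in either job. It counts the distinct key combinations for a given set of columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GroupCount {
    // Counts the distinct combinations of the chosen columns.
    // Every line is assumed to have enough pipe-delimited columns.
    static int countGroups(List<String> lines, int... keyColumns) {
        Set<List<String>> keys = new HashSet<>();
        for (String line : lines) {
            String[] c = line.split("\\|", -1);
            List<String> key = new ArrayList<>();
            for (int col : keyColumns) {
                key.add(c[col].trim());
            }
            keys.add(key);
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Four detail rows modelled on the data posted in this thread.
        List<String> lines = Arrays.asList(
            "T|9827861|Blog|A||11.06.2018|111111|13|7.5|7.5|Hours||tester|A|18.06.2018|16.06.2018|35",
            "T|9827861|Blog|A||12.06.2018|111111|13|7.5|7.5|Hours||tester|A|18.06.2018|16.06.2018|35",
            "T|9828186|Blog|B||11.06.2018|111111|1|7|1|Days||tester|A|18.06.2018|16.06.2018|4",
            "T|9828186|Blog|B||13.06.2018|111111|1|7|1|Days||tester|A|18.06.2018|16.06.2018|4");
        System.out.println(countGroups(lines, 2, 3, 15)); // prints 2: one group per person
        System.out.println(countGroups(lines, 2, 3, 5));  // prints 4: one group per row
    }
}
```

A key that includes a column which differs on every row, such as the daily date, produces roughly one group per input row, which would match the symptom of getting hundreds of files instead of a handful.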

 

A good way to debug the job is to check the output with a tLogRow after adding each component, to make sure you are getting the expected values.

 

Please use my job as a reference point, but always make the necessary changes for your exact project requirements.

 

Warm Regards,

 

Nikhil Thampi

Five Stars

Re: group records and create file for each record

I figured out the incorrect naming convention: in the aggregation component (input outcome) I had different columns selected.
Employee

Re: group records and create file for each record

Fantastic job :-)

 

It is a good idea to always use unique, meaningful names for your columns rather than Column1, Column2, etc. This makes debugging easier.

 

Could you please mark the topic as solution provided as the issue has been resolved now?

 

It will help the Talend community for future reference.

 

Warm Regards,

 

Nikhil Thampi

Five Stars

Re: group records and create file for each record

@nikhilthampi

 

On Subjob Ok was not available to me from the file input, so I have now used "On Subjob Ok" from tFileList.
With regard to the aggregation component, I have renamed the fields and used the same columns as your job to create the grouping, but the grouping is not working and no data is output to the text file.

 

Employee

Re: group records and create file for each record

Hi,

 

On Subjob Ok is available only for the first component in a subjob.

 

The data fetch works correctly in the sample job I gave you, so the issue could be a minor mistake made when you recreated it at your end. Please use tLogRow to check the results after connecting each component, to verify the effect of your changes.

 

Warm Regards,

 

Nikhil Thampi

 

      

Five Stars

Re: group records and create file for each record

I think the issue may be the input file which I am using for the mapping for the end result. I have used a variable for the file path, assuming it will go back to tFileList_1, but I don't think it works that way. Is that correct?

Employee

Re: group records and create file for each record

Hi,

 

I have attached the Excel file which I used, based on the data you shared in the post. Could you please check that all the files you have are in the same format?

 

Once you connect a tFileList -> tFileInputExcel, you will have to use the variable ((String)globalMap.get("tFileList_1_CURRENT_FILE")) to get the file name from the file list.

 

Note: Change the number after tFileList in the variable to match the actual component name.
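The lookup above is an ordinary Java map read: the key is the component name followed by the variable name. The sketch below imitates it outside Talend with a plain `HashMap` standing in for the `globalMap` that the generated job code provides. The class, the helper methods and the file name are hypothetical and exist only to show how the key changes with the component name.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalMapLookup {
    // Builds a key such as "tFileList_1_CURRENT_FILE" from its two parts.
    static String key(String componentName, String variable) {
        return componentName + "_" + variable;
    }

    // Reads the current file name published by the given tFileList component.
    // Returns null when that component has not put a value in the map.
    static String currentFile(Map<String, Object> globalMap, String componentName) {
        return (String) globalMap.get(key(componentName, "CURRENT_FILE"));
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap; inside a job it already exists.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tFileList_2_CURRENT_FILE", "timesheet.xlsx"); // hypothetical file name

        System.out.println(currentFile(globalMap, "tFileList_2")); // prints timesheet.xlsx
        System.out.println(currentFile(globalMap, "tFileList_1")); // prints null: wrong component name
    }
}
```

tFileList also publishes a CURRENT_FILEPATH variable holding the full path, so it is worth checking which of the two the file name field of the input component actually needs.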

 

Warm Regards,

 

Nikhil Thampi

 

 

Five Stars

Re: group records and create file for each record

I convert the pipe text file to Excel, then use this file to read the data and produce Excel files based on the aggregation. I do not need to exclude "I" as the extraction ignores this column due to the data type.

Five Stars

Re: group records and create file for each record

Please see my reply below, where I have attached my project and text file. Thank you.

Employee

Re: group records and create file for each record

Hi,

 

      Once you read the distinct values, you need to pick them from all of your input files. So you will need another tfilelist in your second subjob.

 

Also, it is good practice to arrange the components in a straight line rather than in a zig-zag, for better readability.

 

I have added tFileList_2 intentionally in a U shape so that it is easily noticeable. Once you understand the flow, please make the flow a straight line.

 

image.png

 

Since you have got the solution for your original query, could you please mark the topic as solution provided? It will help enrich the Talend community for later reference.

 

Warm Regards,

 

Nikhil Thampi 

 

 

Five Stars

Re: group records and create file for each record

Apologies, I had some other components which were deactivated. The job flow has been straightened for readability. I have also marked the original query as "Accept as Solution".
I have used the project you created and just reset the fields to the "alias", and that does not work in grouping the data. The issue is with the grouping.
Five Stars

Re: group records and create file for each record

Thank you so much for all your help with this. I have finally managed to get it working.
Employee

Re: group records and create file for each record

Very good Shaf.

 

Enjoy programming in Talend

 

Warm Regards,

 

Nikhil Thampi