Multiple Record Type

Four Stars

Hi,
I have a file in the following format:

G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G205abcd1234idpe20090805
G206abcd1234leyc20090805
G203wxyz5678efgh20090805
G203jsdf92342urfj20090805
I need to combine each G203 line with the G204, G205 and G206 lines that follow it into one line.
The record types G204, G205 and G206 are not mandatory.
Thanks and regards,
Amirths
Community Manager

Re: Multiple Record Type

Hello Amirths,
Please show us the expected output file format.
Best regards
shong
Four Stars

Re: Multiple Record Type

Hi shong,
Input file
=======
G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G205abcd1234idpe20090805
G206abcd1234leyc20090805
G203wxyz5678efgh20090805
G203jsdf92342urfj20090805
G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G205abcd1234idpe20090805
G206abcd1234leyc20090805
G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G205abcd1234idpe20090805
Output file
========
G203abcd1234efgh20090805|G204abcd1234jhdf20090805|G205abcd1234idpe20090805|G206abcd1234leyc20090805
G203wxyz5678efgh20090805
G203jsdf92342urfj20090805
G203abcd1234efgh20090805|G204abcd1234jhdf20090805|G205abcd1234idpe20090805|G206abcd1234leyc20090805
G203abcd1234efgh20090805|G204abcd1234jhdf20090805
G203abcd1234efgh20090805|G204abcd1234jhdf20090805|G205abcd1234idpe20090805
Think of this as a set of transactions for an account.
The transaction details continue in G204, G205 and G206 if the transaction length exceeds the per-line limit.
I don't have a unique key to merge the lines on, because an account can have more than one set of transactions.
Basically, I need to scan the records in a loop after each G203 until I reach a G206 or another G203.
Thanks and regards,
Amirths
Six Stars

Re: Multiple Record Type

I think the fastest route (if the records contain no information that correlates them) is to generate a synthetic key.
For your job, I think the quickest design is a flow like this:
input file ---> tJavaRow ---> tDenormalize on full line ---> file output
To create a key that correlates the records, put code like this in the tJavaRow:
----------------------------------
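// Read the current key from globalMap; start at 0 on the first row.
Integer stored = (Integer) globalMap.get("MYSYNTHKEY");
int mykey = (stored == null) ? 0 : stored;
// A G203 record opens a new group, so generate a new key and store it.
if ("G203".equals(input_row.Column0)) {
    mykey++;
    globalMap.put("MYSYNTHKEY", mykey);
}
// Column0 holds the 4-character record type, Column1 the rest of the line.
output_row.FULLROW = input_row.Column0 + input_row.Column1;
output_row.SYNTHKEY = mykey;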
----------------
The output then contains each full input line plus a key that correlates all the related Gxxx records, so you can easily produce your output file with tDenormalize on the FULLROW column.
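For anyone who wants to check the idea outside Talend, here is a minimal plain-Java sketch of the same two steps. It is only an illustration under assumptions: the class and method names (SynthKeyDemo, assignKeys, denormalize) are made up for this post and are not Talend APIs, and the "|" delimiter is taken from the expected output posted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone illustration; none of these names are Talend APIs.
class SynthKeyDemo {

    // Mirrors the tJavaRow step: the key grows only when a G203 record starts a new group.
    static List<Integer> assignKeys(List<String> lines) {
        List<Integer> keys = new ArrayList<>();
        int key = 0;
        for (String line : lines) {
            if (line.startsWith("G203")) {
                key++;
            }
            keys.add(key);
        }
        return keys;
    }

    // Mirrors the denormalize step: join all lines that share a key, keeping input order.
    static List<String> denormalize(List<String> lines, List<Integer> keys, String delimiter) {
        Map<Integer, StringBuilder> groups = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            StringBuilder group = groups.get(keys.get(i));
            if (group == null) {
                groups.put(keys.get(i), new StringBuilder(lines.get(i)));
            } else {
                group.append(delimiter).append(lines.get(i));
            }
        }
        List<String> merged = new ArrayList<>();
        for (StringBuilder group : groups.values()) {
            merged.add(group.toString());
        }
        return merged;
    }

    public static void main(String[] args) {
        // First six lines of the sample input from this thread.
        List<String> lines = Arrays.asList(
                "G203abcd1234efgh20090805",
                "G204abcd1234jhdf20090805",
                "G205abcd1234idpe20090805",
                "G206abcd1234leyc20090805",
                "G203wxyz5678efgh20090805",
                "G203jsdf92342urfj20090805");
        List<Integer> keys = assignKeys(lines);
        for (String row : denormalize(lines, keys, "|")) {
            System.out.println(row);
        }
    }
}
```

Because the key never appears in the merged rows, it does not shift any positions in the output; it is only used for grouping.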
Four Stars

Re: Multiple Record Type

Hi,
Thanks for the reply.
But the problem in my case is that the input file is a positional file.
The key is a generated sequence number, so the position of the key varies for each set of transactions, and I cannot handle that dynamically.
Thanks and regards,
Amirths
Six Stars

Re: Multiple Record Type

Amirths,
I assure you that the design I suggested produces the output file you described and works with positional input.
It is correct; try it.
The generated key does not increase on every row; it increases only when a G203 record appears.
For example:
Input, read positionally with the first 4 characters as the first column and the remainder as the second:
-------
G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G205abcd1234idpe20090805
G206abcd1234leyc20090805
G203wxyz5678efgh20090805
G203jsdf92342urfj20090805
G203abcd1234efgh20090805
G204abcd1234jhdf20090805
G205abcd1234idpe20090805
G206abcd1234leyc20090805
-----
Synthetic key added with tJavaRow (as another field):
-----
1G203abcd1234efgh20090805
1G204abcd1234jhdf20090805
1G205abcd1234idpe20090805
1G206abcd1234leyc20090805
2G203wxyz5678efgh20090805
3G203jsdf92342urfj20090805
4G203abcd1234efgh20090805
4G204abcd1234jhdf20090805
4G205abcd1234idpe20090805
4G206abcd1234leyc20090805

Then tDenormalize on the full row, excluding the key:
------
G203abcd1234efgh20090805G204abcd1234jhdf20090805G205abcd1234idpe20090805G206abcd1234leyc20090805
G203wxyz5678efgh20090805
G203jsdf92342urfj20090805
G203abcd1234efgh20090805G204abcd1234jhdf20090805G205abcd1234idpe20090805G206abcd1234leyc20090805
-----

You can handle the synthetic key dynamically with the tDenormalize component.

Hope that helps.
Bye
Four Stars

Re: Multiple Record Type

Hi max,
Thanks for your time.
But the issue is that the feed may contain approximately 10 million records.
Thanks and regards,
Amirths
Six Stars

Re: Multiple Record Type

No problem.
To optimize:
- you can process only the changed data, if the job needs to run on a schedule
- or process the data in chunks
- or process everything in one shot (in any case, I think that for >10M records you should use the 64-bit version of Talend)
Bye
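
As a rough illustration of the one-shot option on a large feed, the grouping can also be done in a single streaming pass that holds only the current G203 group in memory. This is a plain-Java sketch and not a Talend component: StreamingMerge and merge are hypothetical names, and it assumes the input is ordered so that continuation records directly follow their G203 line, as in the sample above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-alone illustration; not part of Talend.
class StreamingMerge {

    // Reads lines one at a time and writes one merged row per G203 group.
    // Returns the number of rows written.
    static long merge(Reader in, Writer out, String delimiter) {
        try {
            BufferedReader reader = new BufferedReader(in);
            BufferedWriter writer = new BufferedWriter(out);
            StringBuilder current = new StringBuilder();
            long rows = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // A G203 record closes the previous group, if there is one.
                if (line.startsWith("G203") && current.length() > 0) {
                    writer.write(current.toString());
                    writer.write('\n');
                    rows++;
                    current.setLength(0);
                }
                if (current.length() > 0) {
                    current.append(delimiter);
                }
                current.append(line);
            }
            // Flush the last group at end of input.
            if (current.length() > 0) {
                writer.write(current.toString());
                writer.write('\n');
                rows++;
            }
            writer.flush();
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "G203abcd1234efgh20090805\n"
                + "G204abcd1234jhdf20090805\n"
                + "G203wxyz5678efgh20090805\n";
        StringWriter out = new StringWriter();
        long rows = merge(new StringReader(sample), out, "|");
        System.out.print(out);
        System.out.println(rows + " rows written");
    }
}
```

For a real file, the StringReader and StringWriter would be swapped for a FileReader and FileWriter; memory use then depends on the longest group rather than on the total number of records.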