One Star

CSV File Schema Column order

I have bunch of .csv files. I need to make sure that the columns are in the correct order when they load in the database. Sometime the vendor might switch columns that are similar. (I am using tfileinputDelimited for the. csv file)
for example.
correct:
firstname, lastname
ryan, smith
but if out of order:
lastname, firstname
smith, ryan
I need this last one to be stopped and an error telling me that the columns are out of order.
How can I do this with a .csv file?
Thanks,
Ryan
(Using Java)
5 REPLIES
Seven Stars

Re: CSV File Schema Column order

I think the only way you could do it would be to read the header row of the file as one string (tFileInputFullRow), tNormalize it so each field name becomes a record, assign a sequence number and compare it to a file or table that contains the expected schema sequence (tMap).
Getting fancy, you could even read the data from the file into a dummy table in your database (made up of all strings with field names Col1, Col2 etc) and then using the results of your comparison above, construct a SQL statement in tJavaRow that will allow you to retrieve the fields from that dummy table in the correct expected order (tInput) so that the rest of your job will work correctly.

Re: CSV File Schema Column order

have your expected column order in a file like
firstname
surname
load it into a variable object as a ArrayList- vExpCol
load file header - load into another varible obj as above - vCols
then in a tJavaRowFlex,
(create on top part) ArrayList - vRows
middle part
load all columns into a local
vRows
compare the two ArrayLists for the row columns
if different then load like:
outrow.firstname = vRows.get( vCols.lastIndexOf( "firstname" ) ) ;
use Excel to make this code.
this is not complete - i am travelling using the mobi.
One Star

Re: CSV File Schema Column order

I think this post might be what you need....
http://www.talendforge.org/forum/viewtopic.php?pid=30540
One Star

Re: CSV File Schema Column order

I think the only way you could do it would be to read the header row of the file as one string (tFileInputFullRow), tNormalize it so each field name becomes a record, assign a sequence number and compare it to a file or table that contains the expected schema sequence (tMap).

I have attached a screenshot of what you described. Is this what you had in mind? Here's the output from my job.
Starting job columncheck at 08:05 03/07/2010.
connecting to socket on port 3785
connected
.-------+---------.
|matched col order|
|=------+--------=|
|line |newColumn|
|=------+--------=|
|GroupID|1 |
|Deleted|7 |
'-------+---------'
.-----------------+---------.
| unmatched col order |
|=----------------+--------=|
|line |newColumn|
|=----------------+--------=|
|XAP_CreateDate |2 |
|GroupName |3 |
|GroupCreationDate|4 |
|GroupCreator |5 |
|AdvisoryGroup |6 |
'-----------------+---------'
disconnected
Job columncheck ended at 08:05 03/07/2010.
Seven Stars

Re: CSV File Schema Column order

Pretty much although it might be more useful if tMap_2 left-joined only on the column name and the match output had a filter expression t.newColumn==row3.sequence. Then the nomatch output could include row3.sequence for comparison.
Note that in your example you only need the purple arrow selected for inner-join rejects but in my suggestion you need only the orange arrow selected for filter rejects.