One Star

Comparing two .csv files and displaying the differences.

I am looking for a way to compare two CSV files.
I have used tFileCompare, but all I receive is a message saying that the files differ.
What I want is not only that message but also a display showing the differences between the files.
Can someone give me an idea of how to do this?
Thanks
9 REPLIES
One Star

Re: Comparing two .csv files and displaying the differences.

Hi,
This post might help: http://www.talendforge.org/forum/viewtopic.php?id=12930
Regards,
Rick
One Star

Re: Comparing two .csv files and displaying the differences.

It seems to me that this post is a little different from what I am doing.
I already have both CSV files set up in tFileCompare: a reference file and the file I want to compare against it.
The schema set up for this component is the following: file, file_ref, moment, job, component, differ and message.
When I attempted to use a tMap, there was no schema other than the one above, so how would I set the schema to pull the differences within the CSV files?
Thanks
One Star

Re: Comparing two .csv files and displaying the differences.

Hi,
The tFileCompare only seems to provide a statement of whether the files are the same or different. It seems to be that simple, which is why the schema is fixed.
Can you be a bit more specific about what differences you are looking for, e.g. inserts, updates, deletes or maybe row numbers?
Regards,
Rick
One Star

Re: Comparing two .csv files and displaying the differences.

What I have is two CSV files with column names such as
Source_System Standard Transaction_ID File_Name File_Type File_Extension Product Issue_Type Export_Codes Applicable_CVNs Inline_Applicable_CVNs
Underneath these titles is the data for each column in each CSV.
Both CSV files will have the same column names. What I am trying to do is validate that, for a given transaction, both CSV files match one another.
If there are any differences I need to see them in the output.

The big issue is that there are thousands of rows of data to go through, and I want to pick out just the information that differs between the CSV files and then output it to either a tLogRow or an Excel output.
I hope this explains what I am trying to do.
If more information is needed please let me know.
Thanks
Seventeen Stars

Re: Comparing two .csv files and displaying the differences.

hi,
not sure I understand all your needs ...
Do you need to validate the schema between the 2 CSV files? (If so, which one is the reference?)
Do you need to compare the 2 files line by line?
Or both?
I'm not sure what you need to compare
regards
laurent
One Star

Re: Comparing two .csv files and displaying the differences.

What I need to check is that the data in the reference file matches the data in the test file.
If it doesn't, I need to receive a report which will tell me the differences. Sometimes a defect may be present.
One of these CSV files comes from my Unix box, and the other is output that I am getting from my database.
I need to validate these two CSV files against each other.
Six Stars

Re: Comparing two .csv files and displaying the differences.

Use a tMap with a join link on the field you want to cross-check, and then use the rejects link.
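
For anyone who wants to see the idea outside of Talend, here is a minimal Python sketch of "join on a field, keep the rejects". It is not how tMap is implemented; the function name inner_join_rejects and the sample data are made up for illustration, and Transaction_ID is only assumed to be the match field.

```python
import csv
import io

def inner_join_rejects(main_text, lookup_text, key):
    """Return the rows of main_text whose `key` value has no match in lookup_text."""
    # Collect every key value that exists in the lookup (reference) file.
    lookup_keys = {row[key] for row in csv.DictReader(io.StringIO(lookup_text))}
    # Keep only the main rows that the inner join would reject.
    return [row for row in csv.DictReader(io.StringIO(main_text))
            if row[key] not in lookup_keys]

# Hypothetical sample data; real files would be read from disk instead.
reference = "Transaction_ID,Product\n1,A\n2,B\n"
test = "Transaction_ID,Product\n1,A\n3,C\n"

rejects = inner_join_rejects(test, reference, "Transaction_ID")
print(rejects)
```

Note that this only finds rows in the test file that are missing from the reference; to catch the other direction, call it again with the two arguments swapped.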
One Star

Re: Comparing two .csv files and displaying the differences.

Can you please give me the job design?
One Star

Re: Comparing two .csv files and displaying the differences.

If you need to compare a single field, the flow would be
                    tFileInput(test)
                          |
                          V
tFileInput(new) --> tMap (inner join on match field)
                          |------- catch inner join rejects - differences file.
If the entire row needs to be exactly the same, add an additional step between each input and the tMap
to consolidate all fields in the row into a single field. (This could be done a few ways, for example in a tMap through concatenation.)
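
As a rough illustration of the whole-row comparison described above, here is a small Python sketch (plain Python, not a Talend job). It assumes both files share the same header and that Transaction_ID identifies a row; the function names, the "<row>" marker and the sample data are all hypothetical. Each row is concatenated into a single string to test for equality, and only rows that differ are broken down column by column.

```python
import csv
import io

def load_by_key(text, key):
    """Index the CSV rows by the value of the `key` column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

def row_signature(row):
    """Concatenate all fields of a row into one string (unit separator as delimiter)."""
    return "\x1f".join(str(value) for value in row.values())

def diff_csv(reference_text, test_text, key):
    """Return (key value, column, reference value, test value) for every difference."""
    ref = load_by_key(reference_text, key)
    new = load_by_key(test_text, key)
    differences = []
    for k in sorted(ref.keys() | new.keys()):
        ref_row, new_row = ref.get(k), new.get(k)
        if ref_row is None or new_row is None:
            # The row exists in only one of the two files.
            differences.append((k, "<row>",
                                "present" if ref_row is not None else "missing",
                                "present" if new_row is not None else "missing"))
            continue
        if row_signature(ref_row) == row_signature(new_row):
            continue  # the concatenated rows match, nothing to report
        for column in ref_row:
            if ref_row[column] != new_row.get(column):
                differences.append((k, column, ref_row[column], new_row.get(column)))
    return differences

# Hypothetical sample data using a few of the column names from the thread.
reference = "Transaction_ID,Product,File_Type\n1,A,xml\n2,B,csv\n3,C,txt\n"
test = "Transaction_ID,Product,File_Type\n1,A,xml\n2,B,pdf\n4,D,txt\n"

differences = diff_csv(reference, test, "Transaction_ID")
for difference in differences:
    print(difference)
```

The resulting list could then be written to a log or a spreadsheet. If the key is not unique within a file, later rows overwrite earlier ones in this sketch, so it would need adjusting for that case.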