One Star

using talend to identify updates(like CRC) in a schema

I am running a monthly import of a schema into a DB via talend, and I want to be able to compare what has changed in the old schema (as in import 2 has different values than import 1) in the old DB since the last import. I want to update any values in the new schema that have been modified in the old schema (since the last import), however i want to keep everything else in the new schema i.e. anything that hasn't changed in the old schema I don't want to update in the new schema. Is it possible to use talend to add an extra column (e.g. in the DB of the new schema)or something to identify what rows for example have changed since the last import compared to the latest import?!
Tags (1)
15 REPLIES
One Star

Re: using talend to identify updates(like CRC) in a schema

Hi, I have done this on several projects. There is a tAddCRCRow in the Data Quality menu on the palette. Just add this to your job and put the appropriate column in your tables. Basically what you want is this:
1. A current_whatever table that will hold your most current import
2. A previous_whatever table that will hold your previous import
3. A history_whatever table that will hold the rows that changed
The basic job is this:
1. Truncate previous_whatever table
2. Copy current_whatever table to previous_table
3. Run import into current_whatever table and set the action to clear the table when it runs
4. Do a join on the CRC column and write the rejects to the history_whatever table
Hope this makes sense.
One Star

Re: using talend to identify updates(like CRC) in a schema

Thanks for the response.
I have the tAddCRCCRow that outputs the value for each row into a log file along with the columns that I want. However I am generating XML files that are inputted to a DB using a third party product ( it has to be done this way ). Also I want to be able to check the value of each column wihtin a given row, because I ONLY want to update the values within a row that have changed from the last import ( this is because values might have changed on the current schema and i want to keep these changes ) and leave the other values(in the current schema) unchanged. Finally I want a process that can do that filtering and then output the XML ( xml file) that I currently use to import into the DB.
The options that you gave below would be perfect if i could directly connect to the DB but I cant, I need to do it all in files instead.....
One Star

Re: using talend to identify updates(like CRC) in a schema

You could still use my method, just use your xml files instead and compare them directly. The beauty of Talend is that all inputs and outputs are treated the same way so you can join and sort on them using the same methods.
One Star

Re: using talend to identify updates(like CRC) in a schema

Ok fair enough. But what about then inspecting every column within each row to see which ones have changed?
One Star

Re: using talend to identify updates(like CRC) in a schema

For that I'm not sure. If anything, you could probably do a tMap and join on the history and current files. Then you could compare each current column to the history column.
One Star

Re: using talend to identify updates(like CRC) in a schema

thanks very much for the response.
anyone know what the problem with this is as an xpath query:
/root/row/FullName
<?xml version="1.0" encoding="ISO-8859-15"?>
<root>
<row>
<FullName>
causing:
Exception in thread "main" java.lang.Error: Unresolved compilation problems:
NameSpaceTool_tFileInputXML_3 cannot be resolved to a type
NameSpaceTool_tFileInputXML_3 cannot be resolved to a type
Syntax error on token "/", delete this token
root cannot be resolved
row cannot be resolved
One Star

Re: using talend to identify updates(like CRC) in a schema

Try this xpath:
//root//row//FullName
One Star

Re: using talend to identify updates(like CRC) in a schema

i was originally doing it right, but unfortunately in the docs, it has ' ' whereas it should " ".
does that make sense?
One Star

Re: using talend to identify updates(like CRC) in a schema

so i want it to loop on the row element, as there are multiple rows in the file:
Loop XPath query "/root/row"
then in the mapping table i have:
column XPath query
fullName FullName
the thing is i am reading in the results to a Tmap and outputting them to an xml file but nothing is appearing in the xml file?!
One Star

Re: using talend to identify updates(like CRC) in a schema

Sometimes in the docs they use a perl job which requires a ' ' If you're using java it's always " "
One Star

Re: using talend to identify updates(like CRC) in a schema

Ya I learnt that the hard way!!! Smiley Happy
How can you tell if it is definitely reading in the values from the file though? Cos I dont think it is reading in the values....
One Star

Re: using talend to identify updates(like CRC) in a schema

Add a tLogRow to your job and direct the output from your xml reader to the tLogRow. When you run the job, it will show the results on the screen.
One Star

Re: using talend to identify updates(like CRC) in a schema

i tried changing the name of FullName to FullNames
and the job still completes even though no element named fullnames exist in the input file......
Thanks for all the help by the way Smiley Happy
One Star

Re: using talend to identify updates(like CRC) in a schema

sorry just checked the tLogRow and it is reading in the correct values. So the problem is somewhere else Smiley Happy
Thanks again. Smiley Happy
One Star

Re: using talend to identify updates(like CRC) in a schema

Forgot to set the Loop element on the output file. d'oh
It appears to not loop correctly on the element though, as it only seems to read in the first element of the first row and output that one for all the rows.......
can these threads be merged into one?