Four Stars

SQL server SQL code traping

Total rookie to Talend DBA being told Talend doesn't support trapping SQL statements like those doing inserts?  Additionally how should a developer insert error rows into a established error table?

5 REPLIES
Thirteen Stars

Re: SQL server SQL code traping

You can see all SQL statements that will be fired at databases, but you may need to be able to read Java to get hold of Insert and Update statements. Click on the code tab to search for those. This is because Talend handles those for you automatically. You *can* write your own using the t{database}Row components if you wish. Your Select statements you can prepare whenever you want and therefore write them to the System.Out quite easily.

 

The error rows question depends entirely on what sort of errors you are dealing with. You will need to go into more detail regarding this, but I can't think of an error you couldn't put into an error table of your choosing.

Rilhia Solutions
Four Stars

Re: SQL server SQL code traping

Thanks for taking time to address this.  I need the insert statement have someone who may read Java and is using Die on Error but when I ask about the failing insert statement and data row values it doesn't quite click.

Anything like a procedure for getting the SQL at failure time or even capturing all SQL (so we can figure it out from there) would be very helpful.     On the insert data capture side of the question our target table has RI and unique indexes that can bounce insert attempts so every table has an _Error table shadow with oversized character fields so the values failing the insert can be captured.  These _error tables have the same attribute names as the base table.  Is there a way to grab the failed insert SQL alter the table to be the same name only with _error on the end and do another insert?

Thirteen Stars

Re: SQL server SQL code traping

Unfortunately the the t{database}Output components do not output the SQL they use. As I said, you will have to search the code for that. As an example, it will look something like below...

resourceMap.put("conn_tMysqlOutput_1", conn_tMysqlOutput_1);
				conn_tMysqlOutput_1.setAutoCommit(false);
				int commitEvery_tMysqlOutput_1 = 10000;
				int commitCounter_tMysqlOutput_1 = 0;

				int count_tMysqlOutput_1 = 0;

				String insert_tMysqlOutput_1 = "INSERT INTO `" + "new_table"
						+ "` (`id`,`name`,`age`) VALUES (?,?,?)";

				java.sql.PreparedStatement pstmt_tMysqlOutput_1 = conn_tMysqlOutput_1
						.prepareStatement(insert_tMysqlOutput_1);

The important bit, as far as you are concerned, is knowing why a row was rejected. This can also be a bit fiddly. Different database components have different ways of enabling this functionality. There are 2 steps to enabling this.....

1) In ALL database components, switch off "Die on error" on the "Basic settings" tab.

2) This is the bit that can change per type of database. For MySQL, turn off the "Extend Insert" option on the "Advanced Settings". For SQL Server and Oracle, turn off "Use Batch Size" on the "Advance Settings" tab. Have a play around for other DB types.

 

Once the above is done it will reveal a "Rejects" output next to "Main" when you check the possible outputs of the component. The rejects output will give the row details of the row that failed and why the error occurred. For example, here is a sample reject output that I have quickly knocked up....

91|Martin|23|23000|Duplicate entry '91' for key 'PRIMARY' - Line: 91
33|Warren|56|23000|Duplicate entry '33' for key 'PRIMARY' - Line: 92
11|Benjamin|2|23000|Duplicate entry '11' for key 'PRIMARY' - Line: 93
82|Calvin|24|23000|Duplicate entry '82' for key 'PRIMARY' - Line: 94
14|Richard|60|23000|Duplicate entry '14' for key 'PRIMARY' - Line: 95
26|Gerald|32|23000|Duplicate entry '26' for key 'PRIMARY' - Line: 96

This shows an id, a name an age and the cause of the error. The SQL being generated above, is the SQL that failed to fire correctly.

 

So, not necessarily the easiest way of identifying the issue (and certainly not a way that will produce a terribly efficient process), but a way in which you can get the data you require.

 

Regarding your flow of data, I have always found that a far easier way of logging rows that haven't completed correctly is to keep track of the data as I go. For example, my first "staging" table will be a table of nothing but varchars. Every column sized to allow anything. I make it VERY difficult for data not to be loaded. I then go through a series of tables tailored to each process (data validity, lookups, transformation, integrity, business rules, etc, etc). The ids are generated for the first table and then carried throughout the rest of the tables. Then, if a row does not make it to a table, I already have a log of what did not make it by simply looking for missing records from the previous table. This suits my current project since I am looking for a high level of data lineage, but I can understand that this might not be ideal in a situation where you are looking for high efficiency. 

 

Anyway, I hope this helps :-)

 

Rilhia Solutions
Four Stars

Re: SQL server SQL code traping

Thanks again helpful stuff!   I think that resolves the first question the second question solution you offered looks like a great way to enforce master data management rules but what I need is a one pass solution due to high record counts.    I too use oversized fields in my table_of_the_same_name_only_ERROR tables.   I just want the failed inserts to go to that without passing through the data a second time.  Any ideas?

Thirteen Stars

Re: SQL server SQL code traping

You can always just log the records from your rejects data into your error table. You do not need to include the error. The rejects row will contain all of the data you tried to insert, but failed to. 

 

Rilhia Solutions