Hello members! I have a quick question and it doesn't necessarily pertain to the subject line although that is the current issue. I am loading 12 different flat files into staging and currently have one job that iterates through a control table and then loads the file and associated table using variables setup in the looping process and dynamic schema. I would like to modify this job to use a bulk db component for performance reasons and want to change the flow to load the flat file to another flat file (I add columns to the dynamic schema in a tMap) and then bulk load it. I would like to capture the number of rejects for auditing purposes and that is not possible with a bulk load component. My thought was to run the flat file through the tSchemaComplianceCheck before writing to the bulk load file, but that component does not support dynamic schemas. Can anyone tell me a design where I can get both performance gain (like bulk loading) but also get the necessary metadata about the load (insert count, reject count, etc.)? Thanks, Scott
The problem with the schema check is, you could have a couple of different schemas and the current component does not have a list of different possible schemas. One idea could be to setup a schema with all possible column names and check the dynamic schema by the column name. The next problem: CSV files produce in Talend only schemas with the data type String. The bulk loaders of the different database vendors provides in many different ways the information about the result of the import. Often the bulk loads only response with only a ok-message. I suggest to check the target table after the import to get the numbers.
Thanks for feedback. I do like the idea of doing a count on the target table, especially since this is staging. But, you're right, in order to get the file ready for bulk, it would also need the specific file "schema" to match the db schema in order to insure data integrity. I didn't think about that, but that leads me back down the road of non-dynamic and separate flows. The current design is maintenance friendly, a fairly straight-forward approach and provides the necessary metadata for auditing. I cringe at the idea of creating several subjobs or child jobs (per respective file/table) using conditional flows during the loop iteration in order to gain performance. I want the cake and be able to eat it too... Luckily the performance is well within the acceptable range, so I have some time to stew on it before making changes leading to a major release. I appreciate your input!