Six Stars

How to Handle Validating Excel Input where input format/schema changes during iteration

I'm having trouble gracefully handling changes in my Excel input during job execution. I have numerous monthly reports to ingest but the columns included in the reports is not static (they change between months). To cope with this I have used tSchemaComplianceCheck to validate the main flow after tFileInputExcel with 1 compliance check for each schema type I encounter in testing. My problem is the tFileInputExcel also has an input schema associated with itself so some rows are being rejected at this stage before they get into my schema checking stage. 

 

What is best practice when dealing with changeable inputs during a job? and is it possible to stop Excel performing any checks so that at least all my rows pass through my schema compliance check 

1 ACCEPTED SOLUTION

Accepted Solutions
Six Stars

Re: How to Handle Validating Excel Input where input format/schema changes during iteration

Hi,

 

How many schemas have you got to compare?

What is the maximum number of columns? What is the max data length possible in any column?

Consider  you have 10 columns max and 50 is the max length

 

create a metadata with 10 columns with 50 character length and string as a data type.

 

you may load that into a thashoutput  and use thashinput  everytime to compare in each schema type

 

Cheers!

Gatha

 

10 REPLIES
Six Stars

Re: How to Handle Validating Excel Input where input format/schema changes during iteration

Hi,

I had a similiar situation. 

I have created reject row output on tFileInputExcel and pointed to reject file once again.

Because some of the data cannot be read by Talend

like floating number data and the data type is Integer, this will get rejected at

This will get rejected at tFileInputExcel itself.  

 

I hope it helps

 

Six Stars

Re: How to Handle Validating Excel Input where input format/schema changes during iteration

At the moment my process looks like this. 

I read the Excel file with schema 1 applied then compare to the same schema. If there is errors (row40) I can filter by error code at attempt to process the file using a different schema. My problem is not all rows are submitted through the schemaCompliance component as rows get rejected (row1).

 

I'd love to have only 1 reject row where I can capture all ingested rows that didn't fit schema 1 and then compare those against schema 2 and if correct them process the file against schema 2 instead. How does my setup compare to yours @gatha_vdm

tExcel-To-SchemaValidation.png

Six Stars

Re: How to Handle Validating Excel Input where input format/schema changes during iteration

Talend does a basic schema check at tFileInputExcel.

Because it tries to store the data in the respective data type.

 

Option1

Create a metadata with all the column as a string without any not null constraint. 

In the next stage, you may convert them to the respective data type(The data might fail in this step itself.). 

 

Option 2

Pls use the reject output of tFileInputExcel similar way you want to use the tSchemaCompilationCheck

 

Cheers! 

Six Stars

Re: How to Handle Validating Excel Input where input format/schema changes during iteration

Talend does a basic schema check at tFileInputExcel.

Because it tries to store the data in the respective data type.

 

Option1

Create a metadata with all the column as a string without any not null constraint. 

In the next stage, you may convert them to the respective data type(The data might fail in this step itself.). 

 

Option 2

Pls use the reject output of tFileInputExcel similar way you want to use the tSchemaCompilationCheck

 

Cheers! 

Tags (1)
Six Stars

Re: How to Handle Validating Excel Input where input format/schema changes during iteration

Should "enabling" the "Die on Error" for the tFileInputExcel mean all my rows from the excel file will get sent to the next componment (tSchema) as there isn't a "Reject" flow to handle?

Six Stars

Re: How to Handle Validating Excel Input where input format/schema changes during iteration

Hi,

 

if there is a possible record can go into reject flow then the program might fail if you choose to die on error.

 

Cheers!

Gatha

 

Six Stars

Re: How to Handle Validating Excel Input where input format/schema changes during iteration

Pity, the solution I have appears to need to iterate through the entire file even when a read-error is detected in tFileInputExcel before the step where I compare it to a different schema can start.
Six Stars

Re: How to Handle Validating Excel Input where input format/schema changes during iteration

Hi,

 

How many schemas have you got to compare?

What is the maximum number of columns? What is the max data length possible in any column?

Consider  you have 10 columns max and 50 is the max length

 

create a metadata with 10 columns with 50 character length and string as a data type.

 

you may load that into a thashoutput  and use thashinput  everytime to compare in each schema type

 

Cheers!

Gatha

 

Six Stars

Re: How to Handle Validating Excel Input where input format/schema changes during iteration

I don't have that many (2 or 3 per file type) but not sure its scalable if the number of schema increases. Would your suggestion not cause problems when a columns contents are shorter than max? i.e. could 2 columns be detected as 1 ? 

Six Stars

Re: How to Handle Validating Excel Input where input format/schema changes during iteration

I just gave some examples. It doesn't need to be that many. Even if it is many if you specify the parameters and criteria right it shouldn't give a problem. 

 

"i.e. could 2 columns be detected as 1? "

 

This will never happen as Excel is storing each and every column in separate cells.