Six Stars

tSchemaComplianceCheck tripping due to misread of Excel input

I have a number of Excel files (which I can't share) that I check with a tSchemaComplianceCheck before connecting to a tMap. The rejected rows are saved to an Excel file, and the error message states:

ColumnName:exceed max length

When I check those columns, they do exceed the max length, but I believe that is because the contents of the preceding column (ColumnNameMinus1) are now in ColumnName.

Both columns are strings.

When I check the source data, ColumnNameMinus1 contains the correct value, but that value shows up in ColumnName when I ingest the file.

What could be causing the column contents to get mixed up when reading from Excel?
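The thread later traces this to files with different column counts. A minimal sketch of how that can produce exactly this symptom, assuming the input component assigns cells to schema columns by position (left to right) rather than by header name: if a file has one extra cell before ColumnNameMinus1, every later value moves one column to the right, so ColumnName receives ColumnNameMinus1's value and a length check on ColumnName fails. The class and method names, the three-column schema, the max length of 5 and the sample values are hypothetical; the message text is copied from the error above, and this is plain Java, not Talend's own implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a fixed schema applied to a row's cells by position.
public class PositionalMappingDemo {

    // Assigns the i-th cell to the i-th schema column; cells beyond the schema
    // are dropped and missing trailing cells become null.
    static Map<String, String> mapByPosition(List<String> schema, List<String> cells) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < schema.size(); i++) {
            row.put(schema.get(i), i < cells.size() ? cells.get(i) : null);
        }
        return row;
    }

    // Returns one "Column:exceed max length" message per value longer than its limit.
    static List<String> checkMaxLength(Map<String, String> row, Map<String, Integer> maxLength) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Integer> limit : maxLength.entrySet()) {
            String value = row.get(limit.getKey());
            if (value != null && value.length() > limit.getValue()) {
                errors.add(limit.getKey() + ":exceed max length");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("Id", "ColumnNameMinus1", "ColumnName");
        Map<String, Integer> maxLength = Collections.singletonMap("ColumnName", 5);

        // Layout the schema expects.
        List<String> expected = Arrays.asList("1", "A long description", "ABC");
        // Same row with one extra cell before ColumnNameMinus1.
        List<String> shifted = Arrays.asList("1", "extra", "A long description", "ABC");

        System.out.println(checkMaxLength(mapByPosition(schema, expected), maxLength)); // []
        System.out.println(checkMaxLength(mapByPosition(schema, shifted), maxLength));
        // [ColumnName:exceed max length]
    }
}
```

In the shifted row, ColumnName ends up holding "A long description", which is the value that belongs in ColumnNameMinus1.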

Moderator

Re: tSchemaValidation tripping due to apparent misread of Excel input

Hi,

Which version of Talend are you using? Can you send me an example Excel file for testing?

Best regards

Sabrina

Six Stars

I'm using Talend 6.3.1, but unfortunately I can't share the input data.

I thought I had added the following details to the thread:

I checked some of the files that were tripping the tSchemaComplianceCheck, and it turns out that my files have different schemas (some have 11 columns, others 14). I thought applying the schema to the tFileInputExcel component would raise an error at that stage.

What is the best way to handle the situation where I have multiple schemas for the same type of file?
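Since the 11-column schema on tFileInputExcel did not stop the 14-column files at the input stage, the column count itself can serve as a signal. A hypothetical sketch of choosing a schema per file from the width of its header row: the schema names are invented, reading the header cells from an .xls file is left out, and a width with no registered schema fails loudly instead of being read misaligned. It is plain Java for illustration, not a Talend component.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pick the schema for a file from the width of its header row.
public class SchemaByWidth {

    // Returns the schema registered for the header's column count; an unknown
    // width throws, so the file fails loudly instead of being read misaligned.
    static String chooseSchema(Map<Integer, String> schemasByWidth, List<String> headerCells) {
        String schema = schemasByWidth.get(headerCells.size());
        if (schema == null) {
            throw new IllegalArgumentException(
                    "No schema registered for " + headerCells.size() + " columns");
        }
        return schema;
    }

    public static void main(String[] args) {
        // Hypothetical schema names keyed by column count.
        Map<Integer, String> schemasByWidth = new LinkedHashMap<>();
        schemasByWidth.put(11, "Schema11Columns");
        schemasByWidth.put(14, "Schema14Columns");

        // Stand-in header rows; reading real headers from the files is not shown.
        System.out.println(chooseSchema(schemasByWidth, Collections.nCopies(11, "header")));
        System.out.println(chooseSchema(schemasByWidth, Collections.nCopies(14, "header")));
    }
}
```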

Moderator

Hello,

Are you referring to multiple sheets in one Excel file? If so, why not check the 'All sheets' box on tFileInputExcel to read data from all sheets? Or do you have multiple sheets with different columns?

Could you give us some example data? It would help us understand your situation.

Best regards

Sabrina

Six Stars

I have multiple Excel files with one sheet each (File1.xls, File2.xls ... File20.xls). The issue, I think, is that the schema is consistent for "most" of those files (11 columns), but for a handful the input increases to 14 columns.

I've imported the most frequent schema into the repository, but I can't seem to associate multiple schemas (for the delta schema) with the same Excel file, so it looks like I need to have multiple files.

Is the best way to handle this to take the reject flow from the tSchemaComplianceCheck and check it against the next most frequent schema (14 columns), by replacing the tMap in the image below with a second tSchemaComplianceCheck? Or is there a different way?

[Image: tSchemaValidate.png]
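The reject-flow chaining described above can be sketched in plain Java: rows rejected by the first check are passed to the second, and only rows rejected by both end up as final rejects. The width-based checks are stand-ins (a real schema check also tests types and lengths), and all names are hypothetical. One thing worth verifying in the actual job: the reject flow carries rows already shaped by the 11-column input schema, so cells beyond the eleventh may no longer be present when the second check runs; the sketch sidesteps this by keeping each row's raw cells.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: chain two schema checks through the reject flow.
// Widths stand in for full schema checks (types, lengths, nullability).
public class CascadingCheck {

    // Returns the rows that pass `check`; rows that fail are appended to `rejects`.
    static List<List<String>> accept(List<List<String>> rows,
                                     Predicate<List<String>> check,
                                     List<List<String>> rejects) {
        List<List<String>> accepted = new ArrayList<>();
        for (List<String> row : rows) {
            if (check.test(row)) {
                accepted.add(row);
            } else {
                rejects.add(row);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        Predicate<List<String>> mostFrequent = row -> row.size() == 11;
        Predicate<List<String>> nextMostFrequent = row -> row.size() == 14;

        List<List<String>> rows = Arrays.asList(
                Collections.nCopies(11, "a"),
                Collections.nCopies(14, "b"),
                Collections.nCopies(12, "c"));

        List<List<String>> firstRejects = new ArrayList<>();
        List<List<String>> okFirst = accept(rows, mostFrequent, firstRejects);

        // Only the first check's rejects reach the second check.
        List<List<String>> finalRejects = new ArrayList<>();
        List<List<String>> okSecond = accept(firstRejects, nextMostFrequent, finalRejects);

        // prints "schema 1: 1, schema 2: 1, rejected by both: 1"
        System.out.println("schema 1: " + okFirst.size() + ", schema 2: " + okSecond.size()
                + ", rejected by both: " + finalRejects.size());
    }
}
```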

Six Stars

I've tried the following to validate the input against multiple schema designs so that the data is processed correctly, but I've run into problems. In the job below, the top line reads an Excel file and checks it against the schema Dec2016. I want to configure the error handling so that, when that check fails, the next line is triggered to check the same file against another schema (13Columns).

[Image: Validating against multiple schema designs]

  1. If I connect an OnComponentError or OnSubjobError trigger from the first tFileInputExcel to the lower line's tFileInputExcel, the lower line never executes.
  2. If I use OnComponentError from the tSchemaComplianceCheck instead, I get the build error "OnComponentError cannot be resolved to a variable".
  3. If I use a "Run if" trigger, errorCode is not listed as a parameter I can test against.
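A likely reason for items 1 and 3: reading a file with extra columns is not an error for the input component, and a row that fails tSchemaComplianceCheck is routed down the reject flow as data rather than raised as a component error; errorCode is a column of those reject rows, not a global variable. One workaround, sketched here as a plain-Java simulation, is to count rejected rows into globalMap from custom code on the reject flow (for example a tJavaRow) and branch with a Run if condition that tests the count. A Run if condition is a Java boolean expression and globalMap is a Map<String, Object>; the key rejectCount and the method names are hypothetical, and the sketch assumes the condition is evaluated only after the first line has finished.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java simulation of "count rejects, then Run if".
// In a job, countReject's body would sit in custom code on the reject flow
// (e.g. a tJavaRow) and shouldRunSecondCheck's expression in the Run if trigger.
public class RunIfOnRejects {

    // Hypothetical globalMap key chosen for this sketch.
    static final String REJECT_KEY = "rejectCount";

    // Increments the reject counter stored in globalMap.
    static void countReject(Map<String, Object> globalMap) {
        Integer current = (Integer) globalMap.get(REJECT_KEY);
        globalMap.put(REJECT_KEY, current == null ? 1 : current + 1);
    }

    // Mirrors a Run if condition such as:
    // globalMap.get("rejectCount") != null && (Integer) globalMap.get("rejectCount") > 0
    static boolean shouldRunSecondCheck(Map<String, Object> globalMap) {
        Integer count = (Integer) globalMap.get(REJECT_KEY);
        return count != null && count > 0;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        System.out.println(shouldRunSecondCheck(globalMap)); // false: nothing rejected yet
        countReject(globalMap); // e.g. two rows arrive on the reject flow
        countReject(globalMap);
        System.out.println(shouldRunSecondCheck(globalMap)); // true
        System.out.println(globalMap.get(REJECT_KEY));       // 2
    }
}
```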
Six Stars

Would a solution be to replicate the tFileInputExcel output with a tReplicate and test each flow against a different schema?

    tFileInputExcel --> tReplicate --> CheckSchema-Dec2016 --OnComponentOk--> tMap
                                   \
                                    --> CheckSchema-13Columns --OnComponentOk--> tMap
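The tReplicate layout above can be sketched the same way: every row is sent to every check, and each check keeps the rows that match its schema. Unlike chaining through the reject flow, a row can be accepted by more than one branch, and a row that matches no schema is reported by no branch unless it is collected separately, as the "(unmatched)" entry does here. The width-based checks (treating Dec2016 as the 11-column layout is an assumption) and all names are hypothetical. Note that tReplicate passes on the rows produced by the single tFileInputExcel, so every branch sees that one component's schema; it is worth confirming that the wider branch still receives all of its columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the tReplicate layout: every row goes to every check.
public class ReplicateAndCheck {

    // Returns, per check name, the rows that check accepts, plus an
    // "(unmatched)" entry for rows that no check accepted.
    static Map<String, List<List<String>>> fanOut(List<List<String>> rows,
                                                  Map<String, Predicate<List<String>>> checks) {
        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        boolean[] matched = new boolean[rows.size()];
        for (Map.Entry<String, Predicate<List<String>>> check : checks.entrySet()) {
            List<List<String>> kept = new ArrayList<>();
            for (int i = 0; i < rows.size(); i++) {
                if (check.getValue().test(rows.get(i))) {
                    kept.add(rows.get(i));
                    matched[i] = true;
                }
            }
            result.put(check.getKey(), kept);
        }
        List<List<String>> unmatched = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (!matched[i]) {
                unmatched.add(rows.get(i));
            }
        }
        result.put("(unmatched)", unmatched);
        return result;
    }

    public static void main(String[] args) {
        // Width-based stand-ins; Dec2016 as 11 columns is an assumption.
        Map<String, Predicate<List<String>>> checks = new LinkedHashMap<>();
        checks.put("CheckSchema-Dec2016", row -> row.size() == 11);
        checks.put("CheckSchema-13Columns", row -> row.size() == 13);

        List<List<String>> rows = Arrays.asList(
                Collections.nCopies(11, "a"),
                Collections.nCopies(13, "b"),
                Collections.nCopies(14, "c"));

        for (Map.Entry<String, List<List<String>>> e : fanOut(rows, checks).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " row(s)");
        }
    }
}
```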