Hi everybody, I am reading from a delimited file and writing the output to a table. The source file has this schema: a:varchar b:integer c:date c:varchar. Some of the rows have more fields than expected and should be rejected. What I need to do is save these rejected rows somewhere. Can I achieve that without knowing beforehand how many fields the row has? Thank you all
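One way to avoid knowing the field count in advance is to treat each line as a single string, count its delimiter-separated fields, and route the whole line to an accepted or rejected list. The sketch below assumes a plain delimiter with no quoting or escaping. The class and method names and the sample lines are hypothetical. In a Talend job, this check might sit in a tJavaRow or a tMap expression fed by a reader that emits each line as one column (for example tFileInputFullRow). That wiring is an assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FieldCountSplitter {
    // Counts delimiter-separated fields in a raw line.
    // Pattern.quote makes the delimiter literal; limit -1 keeps trailing empty fields.
    static int fieldCount(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1).length;
    }

    // Sends each raw line to "accepted" or "rejected" by comparing its field count
    // with the schema's. The whole line is kept, so no field of a bad row is lost.
    // Lines with too few fields are rejected too; use ">" instead of "!=" to reject only extra fields.
    static void partition(List<String> lines, String delimiter, int expectedFields,
                          List<String> accepted, List<String> rejected) {
        for (String line : lines) {
            if (fieldCount(line, delimiter) != expectedFields) {
                rejected.add(line);
            } else {
                accepted.add(line);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical ';'-delimited sample lines for the 4-field schema a, b, c, c.
        List<String> lines = Arrays.asList(
                "x;1;2020-01-01;y",
                "x;2;2020-01-02;y;extra1;extra2");
        List<String> accepted = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        partition(lines, ";", 4, accepted, rejected);
        System.out.println("accepted: " + accepted);
        System.out.println("rejected: " + rejected);
    }
}
```

Because the rejected list holds the original line rather than parsed columns, the extra fields survive no matter how many there are.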
Could you please elaborate on your case with an example of input and expected output values? Are you looking for tSchemaComplianceCheck, which helps ensure the data quality of any source data against a reference data source? Best regards Sabrina
Hi Sabrina, as you can see in the screenshot, my file has 2 fields: region_id: integer and region_name: varchar. The last 2 rows of the file have 3 fields, and what I want is to save the rejected rows in all cases. tSchemaComplianceCheck checks the type, nullability and length of each declared column against reference values, but what if my file has a row whose first 2 fields are correct but which also has additional fields? tSchemaComplianceCheck will not detect this row. I want the third row of my file (3,b,c) to be rejected and saved somehow... is that possible? Thank you
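A check on the declared columns cannot see a third field, so the extra fields have to be detected on the raw line. The field count itself has two easy-to-miss traps in Java. First, `String.split` treats its argument as a regular expression, so a delimiter such as `|` must be quoted or the line is split in the wrong places. Second, `split` without a limit drops trailing empty strings, so `3,b,` would look like a valid 2-field row. The helper below is a hypothetical sketch that handles both traps. It assumes fields never contain the delimiter inside quotes.

```java
import java.util.regex.Pattern;

public class FieldCounter {
    // Counts delimiter-separated fields in a raw line.
    // Pattern.quote treats the delimiter literally ("|" and "." are regex metacharacters),
    // and limit -1 keeps trailing empty fields, so "3,b," counts as 3 fields, not 2.
    static int fieldCount(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1).length;
    }

    // True when the line has more fields than the schema declares,
    // even if the first fields would pass type, length and nullability checks.
    static boolean hasExtraFields(String line, String delimiter, int expectedFields) {
        return fieldCount(line, delimiter) > expectedFields;
    }

    public static void main(String[] args) {
        System.out.println(fieldCount("3,b,c", ","));      // 3
        System.out.println("3,b,".split(",").length);       // 2: default split drops trailing empties
        System.out.println(fieldCount("3,b,", ","));       // 3
        System.out.println(fieldCount("3|b|c", "|"));      // 3
        System.out.println(hasExtraFields("1,a", ",", 2)); // false
    }
}
```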
Thank you Sabrina for your reply. Actually, this only partly solves my problem: I still need to save the whole rejected row in my output, including any additional fields, and your solution shows only the first 2 fields because the source file schema has only 2 fields. For example, the row (3,b,c) is correctly rejected, but in the rejected output I only have:

|=---+----+-------------------------|
|code|name|errorMessage             |
|=---+----+-------------------------|
|3   |b   |name.length() > 1 failed |

so I lose the third field (c)... Thanks again
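A reject flow built from a 2-column schema can only carry those 2 columns, so anything past the second field is already gone when the row is rejected. A possible workaround is to run the check on the raw line before it is mapped to the schema, and to carry that line as an extra column in the reject output. The sketch below assumes this approach. `RejectRow`, its `rawLine` column and the message text are hypothetical names, not Talend's reject schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RejectWithRawLine {
    // Hypothetical reject-flow row: the first two parsed columns plus the untouched source line.
    static final class RejectRow {
        final String code;
        final String name;
        final String errorMessage;
        final String rawLine;

        RejectRow(String code, String name, String errorMessage, String rawLine) {
            this.code = code;
            this.name = name;
            this.errorMessage = errorMessage;
            this.rawLine = rawLine;
        }

        @Override
        public String toString() {
            return "|" + code + "|" + name + "|" + errorMessage + "|" + rawLine + "|";
        }
    }

    static final int EXPECTED_FIELDS = 2; // code, name

    // Returns a RejectRow for a line with the wrong field count, or null if the line is valid.
    static RejectRow check(String line, String delimiter) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        if (fields.length == EXPECTED_FIELDS) {
            return null;
        }
        // split always returns at least one element, so fields[0] exists.
        String name = fields.length > 1 ? fields[1] : "";
        return new RejectRow(fields[0], name,
                "expected " + EXPECTED_FIELDS + " fields, found " + fields.length, line);
    }

    public static void main(String[] args) {
        List<RejectRow> rejects = new ArrayList<>();
        for (String line : Arrays.asList("1,a", "2,b", "3,b,c")) {
            RejectRow r = check(line, ",");
            if (r != null) {
                rejects.add(r);
            }
        }
        rejects.forEach(System.out::println);
    }
}
```

Keeping `rawLine` next to the parsed columns means the rejected output still shows `3,b,c` in full, while `code` and `name` stay available for other checks.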
Hi Sabrina, actually no... what I want to achieve is the following: I define the file schema as 2 fields, Field1: int and Field2: String. For some reason, the source file has some wrong rows that do not respect this schema, and these rows have more fields (more separators). I want my job to reject these rows and to show me each rejected row in full, not only part of it! I can't modify the schema definition of the source file, because it should have only 2 fields. I want the job to reject any rows that have more fields, and in my output I want to see something like this: The row: "3,b,c" was rejected because it has more fields than the expected! Thank you again
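The 2-field schema can stay unchanged if the field-count check runs on each raw line first, and only lines that pass are later read as Field1 (int) and Field2 (String). Below is a minimal sketch under that assumption. It builds the exact message requested above for lines with extra fields. The "fewer fields" and "not an integer" branches are hypothetical additions so that short or non-numeric lines cannot slip through; they were not asked for in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TwoFieldRouter {
    // Declared source schema: Field1 (int), Field2 (String).
    static final int EXPECTED_FIELDS = 2;

    // Returns the reject message for a raw line, or null if the line fits the schema.
    static String rejectMessage(String line, String delimiter) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        if (fields.length > EXPECTED_FIELDS) {
            return "The row: \"" + line + "\" was rejected because it has more fields than the expected!";
        }
        // Hypothetical extra checks: too few fields, or a Field1 that is not an int.
        if (fields.length < EXPECTED_FIELDS) {
            return "The row: \"" + line + "\" was rejected because it has fewer fields than the expected!";
        }
        try {
            Integer.parseInt(fields[0].trim());
        } catch (NumberFormatException e) {
            return "The row: \"" + line + "\" was rejected because Field1 is not an integer!";
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> accepted = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (String line : Arrays.asList("1,a", "2,b", "3,b,c")) {
            String message = rejectMessage(line, ",");
            if (message == null) {
                accepted.add(line); // safe to parse into Field1/Field2 later
            } else {
                rejected.add(message);
            }
        }
        System.out.println("accepted: " + accepted);
        rejected.forEach(System.out::println);
    }
}
```

Since the message embeds the original line, the rejected output always contains the whole row, however many separators it has.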