Dynamic Schema tFileInputExcel to tOracleOutput throws ORA-01723

Highlighted
Four Stars

Dynamic Schema tFileInputExcel to tOracleOutput throws ORA-01723

I'm building a job to read blobs from table A, these blobs are excel files of unknown or half-known schema.

the blob is tMapped into ((java.sql.Blob)row4.FILE_DATA).getBinaryStream();  and then read as file from stream ((java.io.InputStream)globalMap.get("dyn_xlsx.content")) which works great.

 

I am attempting to Drop + Create a DB Table with a given name (provided in the input) then INSERT the data from the file.

  • Q: Are there any constraints on the column headers in the source file? (illegal characters, oracle keywords etc. or is this taken care of in the component?)

Each file is opened via tFileInputExcel which is using the "Dynamic" type (as described in various guides).

  • Q: When I examine the Dynamic column metadata it has all been mapped to "id_String", what is the recommended method to improve the quality of this typing to catch say: Date, Integer or Real

When I run the job I'm getting an ORA-01723 error in tOracleOutput which I assume is related to the CREATE TABLE statement.

  • Q: I'm not sure how the statement is built (Is there anyway to log /output the failed statement from Talend?)

 

[statistics] connecting to socket on port 3436
[statistics] connected
Country/RA/INV: Argentina; id_String->VARCHAR len:100
From_EudraVigilance: NO; id_String->VARCHAR len:100
Total_Cases: 19; id_String->VARCHAR len:100
Prompt: 19; id_String->VARCHAR len:100
Late: ; id_String->VARCHAR len:100
%_On_Time: 1; id_String->VARCHAR len:100
Exception in component tDBOutput_2 (kpi_load_blob)
[FATAL]: ariel_project.kpi_load_blob_0_1.kpi_load_blob - tDBOutput_2 ORA-01723:

 

 

 

  • Q: Optionally, if mandate explicit schema definitions (and put them in db or similar), how would I dynamically assign these schema to my components in runtime?

Thanks

Community Manager

Re: Dynamic Schema tFileInputExcel to tOracleOutput throws ORA-01723

Hi @torh,

 

You raise some interesting questions there. I will try to answer the ones I can.

 

1) I have not tried to create a table with a dynamic schema in Oracle (and don't have Oracle available to me now), but I have done it in MySQL. I've just retried it now and am able to create column names with a variety of headers from my Excel. Underscores are used to replace spaces. Are you struggling with particular headers?

2) I just tested this and found the same thing as you. Even though I set my Excel column formats to specific number types, the column types were still identified as String. I don't believe you will be able to solve this without a change in the Dynamic Schema code. I have raised a Jira for this here: https://jira.talendforge.org/browse/TDI-42184

3) The ORA-01723 error looks like it is caused by trying to set the size of a column (in a create statement) to 0. This could be because one of your columns in your Excel file is empty. I cannot test this at the moment. If you can test this and see that is the problem, I suggest raising a Jira about the Dynamic Schema used to create tables. I should point out I did not see this behaviour when creating a table with an empty column in MySQL.

4) I'm not sure I understand your last question. Can you explain in a bit more detail please?

Four Stars

Re: Dynamic Schema tFileInputExcel to tOracleOutput throws ORA-01723

Thanks @rhall_2_0 

1) I was just trying to rule out some potential problems. I can't find anything in the docs.

2) Thanks for that.

3) I've got null values, but not headers. It would really help if there was a way to see the SQL throwing the error. The exception would normally come from specifying a varchar type but not giving the length.

4) Sure. It seems I may have to abandon the generic route and require/define a schema for every file to be processed. My question then was, I will have to manage ~100 schema, I don't want to build 100 identical jobs (schema and destination table aside). How would I go about making the schema a parameter, can I use context for that?

Community Manager

Re: Dynamic Schema tFileInputExcel to tOracleOutput throws ORA-01723

In response to point 3, I think that since you have nulls and the type of the column (given point 2) will be varchar, that is probably what is happening. You should be able to see what column sizes are being approximated using this bit of code...

System.out.println(DynamicUtils.getCreateTableSQL(row2.DynamicColumn,"oracle_id"));

I looked at the code generated and have extracted this to identify column sizes. Simply add this to a tJava component that is fired after the dynamic row is first generated. In my case it was after "row2" had fired. I supply the row number and the dynamic column. The "oracle_id" section is my approximation of what it should be on your job. For mine it was "mysql_id". Hopefully this will give you more clues.

 

In answer to 4, there is a way around this. But it will not be nice and you will need some "knowns" from your source data. Given what we know about the Dynamic Schema with Excel, we can get everything we need apart from the correct column type. I have spoken to R&D and it looks like with Excel the Dynamic Schema functionality is slightly limited due to type information from Excel. With a database, the type information is easily acquired, not so with Excel. So we can get the number of columns, column names and approximate sizes of columns (although this may need looking into). Given this, if you can get metadata on column names/types per Excel sheet, you can cross-reference this info and dynamically build table create statements and inserts in a tJavaFlex. As I said, it is not idea, but it is a workaround. If you look at the code below...

 

Dynamic columns = row1.newColumn;

for (int i = 0; i < columns.getColumnCount(); i++) {  
    DynamicMetadata columnMetadata = columns.getColumnMetadata(i);  
    System.out.println(columnMetadata.getName() + ": " + 
columns.getColumnValue(i) + "; " + columnMetadata.getType());
}

 ....you can use this example "for loop" over the columns in a dynamic schema row, to build your insert statements. You could also use it to build your create statements (with a bit of cross-referencing with your metadata).

 

I know this is not ideal, but it would be quicker than building hundreds of jobs to get around this.

 

I will continue speaking to R&D about how these problems highlighted by you can be overcome with a change to the Dynamic Schema functionality.

 

What’s New for Talend Spring ’19

Watch the recorded webinar!

Watch Now

Definitive Guide to Data Quality

Create systems and workflow to manage clean data ingestion and data transformation.

Download

Tutorial

Introduction to Talend Open Studio for Data Integration.

Watch

Downloads and Trials

Test drive Talend's enterprise products.

Downloads