One Star

Load Multiple CSV files with different Structure

Hi all,
We have a project requirement where we are required to load 100+ CSV files into the oracle database. All 100 CSV files have different structure. Before loading the files into the database we need to perform basic validations like, 
- for Date column check if date is in correct format
- trim the white spaces for strings
- check if numeric data types columns actually consists of numbers
Now I have 2 queries:
- I am thinking of storing the name and schema of the CSV files in database. Read that meta data information of the file and then depending upon the meta data, load the data into the oracle tables. This way tomorrow is 100+ new CSV files are added for loading, I do not have to create a new job everytime.
Please suggest how it can be achieved. 
13 REPLIES
Moderator

Re: Load Multiple CSV files with different Structure

Hi,
Have you already checke talend dynamic schema feature which allows you to design schema with an unknown column structure (unknown name and number of columns).
Here is the article about:TalendHelpCenter:How to process changing data structure?
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Load Multiple CSV files with different Structure

I am aware of it but this does not allow me to perform any data quality checks on the Data. This will be a simple load from file to the database or vice versa.
Thanks,
Sankalp
Five Stars

Load CSV file with different Structure to multiple Table

Hello,

         How can I import data from excel or csv file and dump data in two tables  State Name, State Code, City Name, City Code in State_City table and Sr No, Name, Address, Department, Designation, Date of Joining in Emp table? Here is the attached file.

Employee

Re: Load CSV file with different Structure to multiple Table

Hi Dhara,

  

        You will require two logical operations in this context.

 

      After reading the file (while reading, specify limit value as 5 so that you will read only header part), you need to push the data to a tjavarow or tmap. Since State Name is in column A and value is in column B, you can do a comparison operation. ie, if input_row.columnA is equal to "State Name:" then output_row.statname=input_row.columnB. Similar operation can help you to parse other header details.

 

     For reading the detailed content, read the data using texcelinput component again but this time specify the header as 5. In this way, you will be skipping the header components. 

 

     Hope this can resolve your issue. If it is working, please mark as accepted solution so that it will help Talend community.

 

Warm regards,

Nikhil Thampi

    

Tags (1)
Five Stars

Re: Load CSV file with different Structure to multiple Table

How can I write if else condition in tJavaRow component?

if (input_row.A=='State Name:')
{
output_row.statename = input_row.B;
System.out.println(output_row.statename);
}
else if (input_row.A=='City Name:')
{
output_row.cityname = input_row.B;
System.out.println(output_row.cityname);
}

I wrote this code but getting error 'Invalid Character constant'.

See the screenshot

2.PNG

Employee

Re: Load CSV file with different Structure to multiple Table

Hi,

 

You have assigned the value as constant and that is the reason for error. You will have to try like below in tjavarow.

 

if (!Relational.ISNULL(input_row.A) && input_row.A.equals("State Name:"))
{
output_row.statename = input_row.B;
}
else if (!Relational.ISNULL(input_row.A) && input_row.A.equals("City Name:"))
{
output_row.cityname = input_row.B;
}

 

If you think the suggestion has worked, please mark the topic as acceptable solution.

 

Warm Regards,

 

Nikhil Thampi

Five Stars

Re: Load CSV file with different Structure to multiple Table

I am getting duplicate rows when I dump data into Oracle table and I am not getting state code value and city code value.

This is my code.

if (!Relational.ISNULL(input_row.A) && input_row.A.equals("State Name:"))
{
output_row.statename = input_row.B;
}
else if (!Relational.ISNULL(input_row.A) && input_row.A.equals("City Name:"))
{
output_row.cityname = input_row.B;
}
else if (!Relational.ISNULL(input_row.C) && input_row.C.equals("State Code:"))
{
output_row.statecode = input_row.D;
}
else if (!Relational.ISNULL(input_row.C) && input_row.C.equals("City Code:"))
{
output_row.citycode = input_row.D;
}

Five Stars

Load Excel file with different Structure

I am getting duplicate rows when I dump data into Oracle table and I am not getting state code value and city code value.

This is my code.

if (!Relational.ISNULL(input_row.A) && input_row.A.equals("State Name:"))
{
output_row.statename = input_row.B;
}
else if (!Relational.ISNULL(input_row.A) && input_row.A.equals("City Name:"))
{
output_row.cityname = input_row.B;
}
else if (!Relational.ISNULL(input_row.C) && input_row.C.equals("State Code:"))
{
output_row.statecode = input_row.D;
}
else if (!Relational.ISNULL(input_row.C) && input_row.C.equals("City Code:"))
{
output_row.citycode = input_row.D;
}

Can you help me to resolve this?

 

 

 


E1.PNG

Output:

StateName  CityName StateCode CityCode

Gujarat        null             123           null

Gujarat        abc            123           234

Gujarat        abc            123           234

 

Thanks,

Employee

Re: Load Excel file with different Structure

Hi Dhara3010,

 

        You will have to denormalize the dataset to get a single row of data. I have used the same excel sheet provided by you in your previous chat and I got the result set in single line. I have attached the sample job for your reference as zip file. Please make necessary changes according to your need.

 

         image.png

And the result I got is as shown below (you may have to reformat the date column according to your choice).

 

 image.png

The code used in tjavarow is as shown below.

 

//Code generated according to input schema and output schema
output_row.merge_data="Yes";
output_row.statename = "";
output_row.cityname = "";
output_row.statecode = "";
output_row.citycode = "";
output_row.fromdate = "";
output_row.todate = "";

 

if (!Relational.ISNULL(input_row.A) && input_row.A.equals("State Name:"))
{
output_row.statename = input_row.B;
}
else if (!Relational.ISNULL(input_row.A) && input_row.A.equals("City Name:"))
{
output_row.cityname = input_row.B;
}
else if (!Relational.ISNULL(input_row.A) && input_row.A.equals("From Date:"))
{
output_row.fromdate = input_row.B;
}

if (!Relational.ISNULL(input_row.C) && input_row.C.equals("State Code:"))
{
output_row.statecode = input_row.D;
}
else if (!Relational.ISNULL(input_row.C) && input_row.C.equals("City Code:"))
{
output_row.citycode = input_row.D;
}
else if (!Relational.ISNULL(input_row.C) && input_row.C.equals("To Date:"))
{
output_row.todate = input_row.D;
}

 

 

Warm Regards,

 

Nikhil Thampi

 

Five Stars

Re: Load Excel file with different Structure

Hi Nikhil Thampi,

Thanks a lot for this solution. It really solved my problem. It is the best solution.

 

Thanks,

Dhara

Employee

Re: Load Excel file with different Structure

Hi,

 

     Happy to help :-)

 

Warm Regards,

 

Nikhil Thampi

Five Stars

Re: Load Excel file with different Structure

Hi,

    What if State Name is State Name1(Column name mismatch) in file or there is no data in file or value is not found or format is mismatch or full file is blank or record not found or improper file. How can I handle such file and data? i.e. Error handling

 

Thanks,

Dhara

Employee

Re: Load Excel file with different Structure

Hi,

 

     You will have to invoke the subsequent error flows based on the different business rules you are planning to create.

 

For example, you can check the number of records from incoming excel data file by checking the value of pseudo variable ((Integer)globalMap.get("tFileInputExcel_1_NB_LINE")). If the value of this variable is zero, then you can go for the error flow. Similarly you will have to define each error flow in your mind and catch the errors using Talend error handling components.

 

Warm Regards,

 

Nikhil Thampi