Column data mismatch in hive

Dear folks,

I am currently working with Talend Big Data 6.4.1.

I am ingesting data from Oracle into Hive using the tHdfsOutput and tHiveLoad components.

I created the Hive table with the tHiveCreateTable component, using a comma (,) as the field delimiter.

After the ingestion completed, I checked the records in Hive and found that some values had shifted from one column into another.

I then changed the field separator to "\t" and to "|", but some column values are still mismatched.

Could you suggest which field separator I should use to avoid this issue?

 

 

Regards,

Rupesh.M

Re: Column data mismatch in hive

Hi Rupesh,

It sounds like your source data contains a lot of open text fields (such as comments on a record). These often contain commas, tabs or pipes, for example because end users type them by accident while entering data, and each such character is then read as a field separator.

 

It is better to sanitise the input than to try to work around it. Applying the tReplace component to the columns that can receive open text lets you replace special characters with a space or an empty string. You could use a regex that names the acceptable characters and matches everything else (for example [^a-zA-Z0-9]+ to match anything that is not alphanumeric), so that the rest get replaced with a space.
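
As a rough illustration of the idea in plain Java (the language Talend jobs compile to), the sketch below replaces every run of non-alphanumeric characters with a single space and shows how that keeps the column count stable. The class name FieldSanitizer, the method sanitize and the sample row are hypothetical, and the allowed character set is an assumption you would adjust to your data. This is not the tReplace component itself, only the same regex logic, which could also be written as an expression in a tMap.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Talend: shows the regex-based clean-up only.
class FieldSanitizer {
    // Matches any run of characters that are NOT letters or digits.
    // The allowed set is an assumption; widen it if you need to keep e.g. dots.
    private static final Pattern DISALLOWED = Pattern.compile("[^a-zA-Z0-9]+");

    // Replaces each disallowed run with one space and trims the ends.
    static String sanitize(String value) {
        if (value == null) {
            return null; // leave nulls untouched
        }
        return DISALLOWED.matcher(value).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Made-up free-text value containing a comma, a pipe and a tab.
        String comment = "late, call back | urgent\tnote";

        // Three logical columns joined with a comma delimiter.
        String rawRow = String.join(",", "42", comment, "OPEN");
        String cleanRow = String.join(",", "42", sanitize(comment), "OPEN");

        // The raw row splits into one field too many; the clean row does not.
        System.out.println("raw fields:   " + rawRow.split(",").length);
        System.out.println("clean fields: " + cleanRow.split(",").length);
        System.out.println("clean value:  " + sanitize(comment));
    }
}
```

Because spaces are also outside the allowed set here, consecutive spaces and punctuation collapse into one space. Apply this only to free-text columns, since it would also strip characters such as decimal points or hyphens from numeric or date values.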
