Talend Bigdata -POC-Use case Help

One Star

Hello Experts,
We are planning to run a POC covering around 15 use cases with the Talend Big Data open source edition and, if successful, to replace our existing commercial ETL tool with the Talend Enterprise Big Data edition.
Could someone please help me implement the use case below in Talend?
One of our SQL Server source tables is updated frequently (roughly every 2 hours) with transactional data through a front-end application, and we need to load that data into a Hadoop HDFS file (our big data environment).
We need to load the data from the SQL Server table into the HDFS file every 2 hours, and on each run we should extract only new or modified rows rather than reloading the whole table (to avoid wasting space).
There is a 'Load_date_Time' column in the source table, but we can't trust it. Hence, on each extraction we need to compare the data with what was loaded in the previous cycle and load only new or changed rows into the target HDFS file. Also, we have no control over the source tables beyond extracting the data.
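To make that compare-and-extract step concrete, here is a minimal sketch using plain file comparison between two consecutive full extracts. All file names and sample rows are made up for illustration; a real job would dump the table to a delimited file first (e.g. with tDBInput + tFileOutputDelimited):

```shell
# Two consecutive full extracts of the table (hypothetical sample data):
# in the current cycle, id2's row changed and id3 is brand new.
printf 'id1,alice,100\nid2,bob,200\n' > previous_extract.csv
printf 'id1,alice,100\nid2,bob,250\nid3,carol,300\n' > current_extract.csv

# comm requires sorted input
sort previous_extract.csv -o previous_sorted.csv
sort current_extract.csv -o current_sorted.csv

# comm -23 keeps lines unique to the first file: new or modified rows only
comm -23 current_sorted.csv previous_sorted.csv > delta.csv

# delta.csv now holds only the id2 (changed) and id3 (new) rows;
# keep the current extract around for the next cycle's comparison
cp current_sorted.csv previous_extract.csv
```

Note that this whole-row comparison surfaces new and changed rows but not deletions (the old version of a changed row could be listed with `comm -13`), and it requires extracting the full table each cycle even though only the delta is loaded to HDFS.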
The Talend job should also be automated to run every 2 hours.
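For the scheduling part: with the open source edition, jobs exported via "Build Job" come with a launcher script that can be scheduled with cron (the subscription editions provide a scheduler in the Talend Administration Center instead). A hypothetical crontab entry, with made-up paths:

```shell
# Run the exported Talend job launcher every 2 hours, logging output.
# The job name and paths below are placeholders, not from the original post.
0 */2 * * * /opt/talend/jobs/sqlserver_to_hdfs/sqlserver_to_hdfs_run.sh >> /var/log/talend/sqlserver_to_hdfs.log 2>&1
```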
How do we achieve these two scenarios? Any help would be appreciated!
Thanks in advance!
Abhi
Moderator

Re: Talend Bigdata -POC-Use case Help

Hi Abhikriti,
Thanks for posting your job requirement here.
We have forwarded your requirement to our big data experts and will come back to you as soon as we can.
Best regards
Sabrina
Employee

Re: Talend Bigdata -POC-Use case Help

Abhikriti,
Which database are you using? MS SQL Server? What does the data look like?
We can offload the data from the database to HDFS using the tSqoop component and then do some post-processing with Hive, MapReduce, or Spark. (Please note that the MapReduce and Spark components are only available in Talend Enterprise, which you can download and try.)
The tELTHiveXXX components can be used for the post-processing, creating Hive tables and result sets.
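For reference, a tSqoopImport call corresponds roughly to a hand-run Sqoop command like the sketch below. The connection string, credentials, table, key column, and paths are all placeholders, not details from the original post:

```shell
# Incremental import from SQL Server into HDFS via Sqoop (illustrative only).
sqoop import \
  --connect "jdbc:sqlserver://sqlhost:1433;databaseName=salesdb" \
  --username etl_user \
  --password-file /user/etl/.sqoop_pw \
  --table transactions \
  --target-dir /data/raw/transactions \
  --incremental lastmodified \
  --check-column Load_date_Time \
  --merge-key transaction_id \
  --last-value "2019-01-01 00:00:00"
```

One caveat: `--incremental lastmodified` trusts the check column to reflect every change, so if the Load_date_Time column is unreliable, a full extract plus file-level comparison may still be required to identify the true delta.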
Best Regards,
