From my analysis, there are two ways to import data from an RDBMS into HDFS: the tSqoop components or the tMSSql components.
That is, tSqoop can import the data directly into HDFS, whereas with the SQL components we have to fetch the data, store it in a file, and then put that file on HDFS.
What I want to know is whether there are any other pros and cons to these two approaches.
For example, with the SQL approach I can manipulate the data before putting it on HDFS. Can we do this with Sqoop as well?
Could someone please briefly elaborate on this?
Thanks in advance!
To take advantage of MapReduce, you can use the Sqoop component to load data from the RDBMS directly into HDFS.
Please have a look at Apache's documentation about Sqoop.
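On the question of manipulating data before it lands on HDFS: as far as I know, Sqoop 1's import tool accepts a free-form SQL statement through `--query`, so filtering and simple transformations can be pushed into the SELECT. The query has to contain the literal `$CONDITIONS` token, `--target-dir` is required, and `--split-by` is needed when more than one mapper is used. Below is a minimal Python sketch that only assembles such a command line. The host, database, table, column names and paths are made-up placeholders, and the helper `build_sqoop_import` is hypothetical, not part of Talend or Sqoop.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, query,
                       split_by, target_dir, num_mappers=4):
    """Assemble the argv for a Sqoop 1 free-form query import.

    Hypothetical helper for illustration only; it does not run Sqoop.
    """
    # Sqoop replaces $CONDITIONS with a per-mapper range predicate,
    # so a free-form query without it is rejected here up front.
    if "$CONDITIONS" not in query:
        raise ValueError("free-form query must contain the $CONDITIONS token")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--query", query,
        "--split-by", split_by,      # column used to partition work across mappers
        "--target-dir", target_dir,  # HDFS output directory
        "--num-mappers", str(num_mappers),
    ]


# Placeholder values (assumptions, not taken from the original post).
cmd = build_sqoop_import(
    jdbc_url="jdbc:sqlserver://dbhost:1433;databaseName=sales",
    username="etl_user",
    password_file="/user/etl/.sqoop_pw",
    query="SELECT id, UPPER(name) AS name FROM customers "
          "WHERE active = 1 AND $CONDITIONS",
    split_by="id",
    target_dir="/data/raw/customers",
)

# Print a shell-quoted version that could be pasted into a terminal.
print(shlex.join(cmd))
```

Anything beyond what a SQL statement can express would still need the fetch-then-put approach or a processing step after the import.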