From my analysis, there are two ways to import data from an RDBMS into HDFS: tSqoop, or the SQL components such as tMSSql.
tSqoop can import the data directly into HDFS, whereas with the SQL components we have to fetch the data, store it in a file, and then put that file on HDFS.
Are there any other pros and cons to these two approaches?
For example, with the SQL approach I can manipulate the data before putting it on HDFS. Can we do this with Sqoop as well?
Could someone please elaborate briefly?
Thanks in advance!
To take advantage of MapReduce, you can use the Sqoop component to load data from the RDBMS into HDFS directly.
Please have a look at the Apache Sqoop documentation.
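On the question of manipulating data before it lands on HDFS: Sqoop's import tool accepts a free-form SQL statement through its `--query` option, so filtering, joins, and column expressions can be pushed into the source database. The sketch below only assembles such a command line in Python; it is an illustration, not how tSqoop builds its call internally. The host, database, table, column names, and paths are hypothetical, and the helper `build_sqoop_import` is invented for this example.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, query,
                       target_dir, split_by=None, num_mappers=4):
    """Assemble the argument list for a `sqoop import` with a free-form query.

    Hypothetical helper for illustration; it only builds the command,
    it does not run Sqoop.
    """
    # Sqoop requires the literal token $CONDITIONS in a free-form query so
    # that each mapper can substitute its own range predicate.
    if "$CONDITIONS" not in query:
        raise ValueError("free-form query must contain $CONDITIONS")

    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--query", query,
        "--target-dir", target_dir,  # mandatory when --query is used
    ]
    if split_by is None:
        # Without a split column the import must run with a single mapper.
        args += ["--num-mappers", "1"]
    else:
        args += ["--split-by", split_by, "--num-mappers", str(num_mappers)]
    return args


if __name__ == "__main__":
    # All connection details below are made-up placeholders.
    cmd = build_sqoop_import(
        jdbc_url="jdbc:sqlserver://dbhost:1433;databaseName=sales",
        username="etl_user",
        password_file="/user/etl/.sqlserver.pwd",
        query=("SELECT id, UPPER(name) AS name, amount "
               "FROM orders WHERE amount > 0 AND $CONDITIONS"),
        target_dir="/data/raw/orders",
        split_by="id",
    )
    print(shlex.join(cmd))
```

Anything the database can express in SQL can be done this way. Row-by-row logic that needs Talend components such as tMap would still call for the SQL-component route or a follow-up job on the imported files.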