From my analysis, data can be imported from an RDBMS to HDFS using either tSqoop or tMSSql.
That is, tSqoop can import the data directly into HDFS, whereas with the SQL components we have to fetch the data, store it in a file, and then put that file on HDFS.
Now I would like to know: are there any other pros and cons of these two approaches?
For example, with the SQL approach I can manipulate the data before putting it on HDFS. Can we do this with Sqoop as well?
Could someone please briefly elaborate on this?
Thanks in advance!
To take advantage of MapReduce, you can use the Sqoop component to load data from the RDBMS into HDFS directly.
Please have a look at the Apache documentation for Sqoop.
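
On the question of manipulating data with Sqoop: the Sqoop command line accepts a free-form SQL statement through its --query option, so SQL-level transformations (column selection, filtering, functions, joins) can run as part of the import. Here is a minimal sketch in Python that only assembles such a command. The host, database, table, and column names are hypothetical placeholders, and whether the Talend tSqoop component exposes the same option in your version is something to check in its settings. Anything beyond what SQL can express would still need the SQL-component route or a later processing step on HDFS.

```python
import shlex
from typing import List


def build_sqoop_import(jdbc_url: str, username: str, query: str,
                       split_by: str, target_dir: str,
                       num_mappers: int = 4) -> List[str]:
    """Assemble the argument list for a `sqoop import` with a free-form query."""
    # Sqoop requires the literal token $CONDITIONS in the WHERE clause of a
    # free-form query; each mapper replaces it with its own range predicate.
    if "$CONDITIONS" not in query:
        raise ValueError("free-form query must contain $CONDITIONS")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                      # prompt for the password on the console
        "--query", query,          # the transformation happens in this SQL
        "--split-by", split_by,    # column used to divide work among mappers
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]


if __name__ == "__main__":
    # All values below are made-up examples, not taken from a real system.
    cmd = build_sqoop_import(
        jdbc_url="jdbc:sqlserver://dbhost:1433;databaseName=sales",
        username="etl_user",
        query="SELECT id, UPPER(name) AS name FROM customers WHERE $CONDITIONS",
        split_by="id",
        target_dir="/data/customers",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

The printed command can be pasted into a shell on a machine where Sqoop is installed. The single-quoting done by shlex.join keeps the shell from expanding $CONDITIONS before Sqoop sees it.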