One Star

tHBaseInput - Incremental data read from Hbase

Problem Statement 1: We are working on a project where we need to read only the incremental data from an HBase table (Cloudera 4.6). A background process runs 24x7 and continuously loads data into the HBase table.
Approach: We are using a tHBaseInput component to read the data from the HBase table, but we cannot find any filter that accepts a timestamp value, so that the component reads only the data loaded after the last run.
I am not sure whether I am missing something in the component or whether this is a limitation of Talend tHBaseInput. I am using Talend Big Data 5.4.2.
Problem Statement 2: By default, tHBaseInput uses a Scan to fetch the data from the HBase table, and the caching size of this Scan is set to 1, which means the map task calls back to the region server for every record processed. Because of this, tHBaseInput takes a long time to read from the HBase table (30 minutes for 1 lakh, i.e. 100,000, records). We tried doing the same read in Java by creating a new Scan object with the caching size set to 1000, and we were able to read 1 lakh records in just 2 minutes.
Is there any property in tHBaseInput that lets us increase the default caching size for the Scan?
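To make the caching effect in Problem Statement 2 concrete, here is a minimal back-of-the-envelope sketch in plain Java. It only counts how many fetches are needed for a given caching size, under the simplifying assumption that each fetch returns up to `caching` rows; it ignores the extra call that detects the end of the scan and any region boundaries, and it does not predict wall-clock time. The class and method names are made up for illustration. In the HBase Java client the corresponding setting is `Scan.setCaching(int)`; whether tHBaseInput exposes it is exactly the open question here.

```java
final class ScanRoundTrips {

    // Approximate number of fetches needed to stream `rows` rows when each
    // fetch returns at most `caching` rows (ceiling division).
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching must be > 0");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 100_000L; // 1 lakh records, as in the post above
        System.out.println("caching=1    -> " + roundTrips(rows, 1) + " fetches");
        System.out.println("caching=1000 -> " + roundTrips(rows, 1000) + " fetches");
    }
}
```

Under this simplified model, a caching size of 1 needs 100,000 fetches for 1 lakh rows, while a caching size of 1000 needs only 100, which is consistent with the large difference in run time reported above.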
4 REPLIES
Community Manager

Re: tHBaseInput - Incremental data read from Hbase

Hi
1.) Is there a timestamp field in your table?
2.) Try adding the related property in the Advanced settings panel of the tHBaseInput component.
Best regards
Shong
One Star

Re: tHBaseInput - Incremental data read from Hbase

Hi
hemant056, did you find a solution for this?
I am facing the same problem. My input data stores its time values as epoch time. I need help urgently.
One Star

Re: tHBaseInput - Incremental data read from Hbase

Hi Shong,
Can you please help me with this? I have a timestamp field in my input database, stored as epoch time in a byte[].
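For an epoch value held in a byte[], one way to select only the rows newer than the last run is to decode the bytes and compare against a time window. The sketch below is plain Java and rests on an assumption the thread does not confirm: that the field was written as an 8-byte big-endian long (the layout HBase's `Bytes.toBytes(long)` produces). If the value was stored as a string of digits, the decoding step would differ. The class and method names are hypothetical. The window is half-open, `[fromInclusive, toExclusive)`, so that consecutive runs neither skip nor repeat a row. If the HBase cell timestamp itself is sufficient, the Java client's `Scan.setTimeRange(minStamp, maxStamp)` applies the same kind of half-open range on the server side.

```java
import java.nio.ByteBuffer;

final class EpochWindow {

    // Decodes an 8-byte big-endian value into a long epoch timestamp.
    // ByteBuffer reads in big-endian order by default.
    static long decodeEpoch(byte[] raw) {
        if (raw == null || raw.length != Long.BYTES) {
            throw new IllegalArgumentException("expected exactly 8 bytes");
        }
        return ByteBuffer.wrap(raw).getLong();
    }

    // True when the decoded timestamp lies in [fromInclusive, toExclusive).
    static boolean isInWindow(byte[] raw, long fromInclusive, long toExclusive) {
        long ts = decodeEpoch(raw);
        return ts >= fromInclusive && ts < toExclusive;
    }

    public static void main(String[] args) {
        long lastRun = 1_400_000_000_000L;  // hypothetical stored watermark (ms)
        long thisRun = lastRun + 60_000L;   // hypothetical current run time (ms)

        // Sample value written one second after the last run.
        byte[] sample = ByteBuffer.allocate(Long.BYTES).putLong(lastRun + 1_000L).array();

        System.out.println("decoded = " + decodeEpoch(sample));
        System.out.println("new since last run = " + isInWindow(sample, lastRun, thisRun));
    }
}
```

After each successful run, the upper bound of the window would be saved and reused as the lower bound of the next run.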
Four Stars

Re: tHBaseInput - Incremental data read from Hbase

Hi
I have a requirement similar to the one above. Can I read the data from an HBase table based on regions?