Batch processing large set of MySQL Data

Five Stars


Hi,

I have a billion records in a MySQL database. I want to read the records in batches (say, 1,000 records at a time), map each batch to the desired columns, and convert it to an Excel sheet or an XML file. Each file should contain 1 lakh (100,000) rows. I found this link
https://help.talend.com/reader/Dx4Xf8ykIvnjCCGyFrnUWw/aqGkWGJGV3o7u_MYna4K4A which is similar to what I want, so I followed it and created the job shown below:

[Screenshot: TalendJobWithTitle.JPG]

but it is taking a long time even for a small number of records. For example, a run of 2 lakh (200,000) records took almost 1 hour 45 minutes. Please suggest another way to achieve this, or any alterations I need to make.

 

Thanks and regards.

Employee

Re: Batch processing large set of MySQL Data

Hi,

 

     You can use the tParallelize component to run the extraction subjobs in parallel and speed up the data extraction.

 

      The first step will be to identify the total number of records and split them based on some specific criterion (say, a data-load time range). The first subjob after tParallelize should process the first 25%, the second subjob the next 25%, and so on.

 

      Also, increase the batch extraction size from MySQL to something higher than the current 1,000 records, since you want to store 1 lakh (100,000) records per file. This will also increase your throughput.
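The 25% split described above can be sketched in plain Java (the language Talend jobs compile to). `splitRanges` is an illustrative helper, not a Talend component; each returned [offset, count] pair would drive one parallel subjob:

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitter {

    /** Splits [0, total) into `parts` near-equal [offset, count] ranges,
     *  one per parallel subjob. */
    public static List<long[]> splitRanges(long total, int parts) {
        List<long[]> ranges = new ArrayList<>();
        long base = total / parts;        // minimum rows per subjob
        long remainder = total % parts;   // first `remainder` subjobs get one extra row
        long offset = 0;
        for (int i = 0; i < parts; i++) {
            long count = base + (i < remainder ? 1 : 0);
            ranges.add(new long[] { offset, count });
            offset += count;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // e.g. 1,000,000 rows split across 4 parallel subjobs
        for (long[] r : splitRanges(1_000_000L, 4)) {
            System.out.println("offset=" + r[0] + " count=" + r[1]);
        }
    }
}
```

Each range then becomes a WHERE-clause filter (or LIMIT offset, count) in one subjob's query.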

 

Warm Regards,

 

Nikhil Thampi

Five Stars

Re: Batch processing large set of MySQL Data

Hi Nikhil,
I'm using TOS_DI-V6.1.0, so I couldn't find the component you mentioned.
Can you please suggest an alternative?

Thanks and Regards.
Employee

Re: Batch processing large set of MySQL Data

Hi,

 

     No worries. We can still do a lot of fine tuning.

 

a) The first step is to avoid reading the data multiple times. You are reading 1,000 records from MySQL into a delimited file and then copying them from there into an Excel file. Instead, read the data from MySQL and push it to the target Excel file directly. You can use the "Append existing file" option to add data to an existing file (use tFileTouch to create an empty file at the beginning). This avoids the multiple-read issue in your current job.

b) Change the query to increase the fetch size to 20,000 records, then raise it in 20,000-record steps while checking performance. If fetching 100,000 records in one go works fine, you can avoid looping while writing to the same target Excel file.

c) In the Run tab of the job, go to Advanced settings and increase the -Xms and -Xmx parameters to give the job more memory. But please do not set astronomically high values; use your server's capacity as a reference point.

d) Another pain point could be the data-fetch query itself. Check whether the query performs well; if its joins are badly written, the fetch will take longer. Look at the explain plan and do the necessary tuning there as well.
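As a note on point (b): with MySQL Connector/J the driver buffers the entire result set in memory by default, so a larger fetch size alone may not change anything. Enabling `useCursorFetch=true` on the JDBC URL makes `setFetchSize()` request server-side chunks instead. A sketch, assuming Connector/J; the table and column names are placeholders:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CursorFetch {

    /** Appends useCursorFetch=true so that setFetchSize() is honoured by
     *  MySQL Connector/J (by default the driver buffers the whole result set). */
    public static String withCursorFetch(String jdbcUrl) {
        String sep = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + sep + "useCursorFetch=true";
    }

    /** Streams rows in server-side chunks instead of loading everything at once. */
    public static void streamRows(Connection conn, int jobId, int fetchSize) throws SQLException {
        String sql = "SELECT id, col1, col2 FROM table1 WHERE jobId = ?";
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(fetchSize);   // e.g. 20_000, tuned upward as suggested
            ps.setInt(1, jobId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // map and write the row to the target file here
                }
            }
        }
    }
}
```

In a Talend job the same effect comes from adding useCursorFetch=true in the connection's additional JDBC parameters and setting the input component's cursor/fetch size.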

 

If these details have helped you find the path ahead, please mark the topic as solved to help the Talend community.

 

Warm Regards,

 

Nikhil Thampi

Five Stars

Re: Batch processing large set of MySQL Data

Hi Nikhil,

Thanks for your quick reply. 

  •  First, I'm reading records from MySQL and mapping them with tMap to get the desired columns (say, 50 out of 200). Based on some conditions I need to generate either a single Excel file or multiple Excel files, so I read the MySQL records into a delimited file and then push them into the target Excel file.
  • Second, I tried increasing the fetch size as you suggested, but it didn't work.
  • The other settings you mentioned I have already applied.
  • "SELECT * FROM table1 WHERE table1.jobId = 52 LIMIT 1,1000" is a sample query I'm using in the tMysqlRow component.
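A side note on that sample query: in MySQL, `LIMIT 1,1000` skips the first row (the offset is 0-based), and deep offsets get slower with each batch because the server must scan and discard every skipped row. Keyset pagination on an indexed key avoids both issues; a sketch, assuming the table has an indexed `id` column (that column and the helper name are assumptions):

```java
public class KeysetPager {

    /** Builds the next batch query. Unlike "LIMIT offset,count" (whose offset is
     *  0-based, so "LIMIT 1,1000" actually skips the first row), keyset paging
     *  seeks past the last id already read, so deep batches stay fast when id
     *  is indexed. */
    public static String nextBatchQuery(long lastSeenId, int batchSize, int jobId) {
        return "SELECT * FROM table1 WHERE table1.jobId = " + jobId
             + " AND id > " + lastSeenId
             + " ORDER BY id LIMIT " + batchSize;
    }
}
```

After writing each batch, the job would remember the highest id it read and pass it to the next iteration.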

 

Please help me out on this.

 

Thanks and Regards

Employee

Re: Batch processing large set of MySQL Data

Hi,

 

      If you are using only 50 of the 200 columns in the MySQL table, you should not fetch the other columns from the DB at all. This will make the data fetch at the DB layer more efficient. Fetch a column from the DB only if you really need it.

 

Warm Regards,

 

Nikhil Thampi

Five Stars

Re: Batch processing large set of MySQL Data

Then how am I supposed to do that? Kindly suggest the flow.

 

Warm Regards

Employee

Re: Batch processing large set of MySQL Data

Hi,

 

    Assume there is an employee table with columns such as id, employee_name, salary, DoB, joining_date, and many more.

    If you need only id and employee_name from that table, your SELECT query in tDBInput should contain only those columns, in both the query and the schema.
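In concrete terms, the tDBInput query for the example above projects only the two needed columns; a minimal sketch using the hypothetical employee table just described:

```java
public class NarrowQuery {

    // tDBInput query: fetch only the columns the job actually maps.
    // The component's schema must list the same two columns, in the same order.
    public static final String QUERY =
        "SELECT id, employee_name FROM employee";
}
```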

 

Warm Regards,

 

Nikhil Thampi

 

     

Five Stars

Re: Batch processing large set of MySQL Data

I have implemented this in the tMap component. As you have mentioned doing it in the query itself, shall I remove the tMap component and select the columns in the query component only?
Kindly oblige.

Five Stars

Re: Batch processing large set of MySQL Data

And we need to load all the data, since we are writing it into Excel.
Employee

Re: Batch processing large set of MySQL Data

Hi,

 

    In your previous post, you mentioned that you were planning to use only selected columns.

 

"First i'm reading records from MySQL and mapping using tMap to get desired columns (say i'm considering 50 columns out of 200)"

 

     If you do not need all the columns, please do not fetch them from the DB in the first place. That will automatically reduce the size of the result set, and thereby improve performance as well.

 

Warm Regards,

 

Nikhil Thampi

Five Stars

Re: Batch processing large set of MySQL Data

Hi,
Can you please briefly explain the tParallelize component and how I can apply it in my scenario?

Thanks and Regards
Employee

Re: Batch processing large set of MySQL Data

Hi,

 

     You can implement parallelism in several ways. For example, you can create a job that takes the conditions for the SQL WHERE clause as parameters and writes the result set to an output file.

 

     Once you have created a job like this, you can call multiple instances of the same job from a parent job using the tParallelize component.
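What tParallelize does can be sketched in plain Java: one task per instance of the parameterized child job, each with its own WHERE-clause bounds. All names here are illustrative, not Talend APIs:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelExtract {

    // Counts finished partitions (used only to observe completion here).
    static final AtomicInteger DONE = new AtomicInteger();

    /** Runs one "child job" per inclusive id range, the way tParallelize
     *  launches multiple instances of the same parameterized job. */
    public static void runPartitions(List<long[]> ranges) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(ranges.size());
        for (long[] r : ranges) {
            final long from = r[0], to = r[1];
            pool.submit(() -> extract(from, to));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    /** Stand-in for the child job: it would run
     *  "... WHERE id BETWEEN from AND to" and write this partition's file. */
    static void extract(long from, long to) {
        System.out.println("extracting ids " + from + " to " + to);
        DONE.incrementAndGet();
    }
}
```

In Talend, the two context parameters (here `from` and `to`) would be passed to each tRunJob instance sitting under the tParallelize component.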

 

Warm Regards,

 

Nikhil Thampi

Five Stars

Re: Batch processing large set of MySQL Data

Hi,
Thanks for your reply, but I'm unable to understand the solution you mentioned. Could you please elaborate on it for my scenario? It would be a great help.

Thanks and Regards.