DataPrep Row limit 10.000 - Impact of changing dataset.records.limit

Good morning community,

 

A customer asked why there is a row limit of 10.000 rows for a dataset in Talend Data Preparation (6.3.1). I found the setting dataset.records.limit in application.properties, but I'm wondering what the impact is of increasing that limit to 100.000 rows. How much memory should we expect DataPrep to need for 100.000 rows?

 

Thanks in advance!

 

Best regards,

MK

1 ACCEPTED SOLUTION

Accepted Solutions
Community Manager

Re: DataPrep Row limit 10.000 - Impact of changing dataset.records.limit

Hi Mirco,

 

The sample size is set to 10.000 rows by default so that the UI stays fairly responsive in all situations. The more you increase the value, the less responsive the UI becomes. I can hardly come up with accurate guidelines on what a "good" value would be: it depends on many factors, such as:

  • The network bandwidth and latency
  • The browser used (IE and Edge are notoriously slower than Firefox, which is itself slightly slower than Chrome)
  • The dataset content itself. It is not only a matter of rows; it is also a matter of the number of columns. The number of columns actually has a stronger impact on performance than the number of rows, since we profile data per column.

Back to your question on the expected amount of memory required: as described above, it depends on the number of columns as well as the number of rows in the sample.
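To make that rows-times-columns relationship concrete, here is a rough back-of-envelope sketch. The per-cell byte count and overhead factor are illustrative assumptions only, not Talend internals; actual usage depends on the data and the JVM.

```python
# Rough estimate of in-memory sample size for a DataPrep-style sample.
# avg_cell_bytes and overhead_factor are assumed constants for illustration.
def estimate_sample_bytes(rows, cols, avg_cell_bytes=20, overhead_factor=3):
    """Cell payload (rows x cols x avg bytes) times an object-overhead factor."""
    return rows * cols * avg_cell_bytes * overhead_factor

# Compare the default 10.000-row sample with a 100.000-row sample (20 columns):
default_mb = estimate_sample_bytes(10_000, 20) / 1_000_000
larger_mb = estimate_sample_bytes(100_000, 20) / 1_000_000
print(f"default sample ~{default_mb:.0f} MB, 100.000-row sample ~{larger_mb:.0f} MB")
```

Under these assumptions, raising the limit tenfold raises the sample's memory footprint roughly tenfold as well; a wide dataset (more columns) scales it further still.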

 

So the simplest approach is to try it with 100.000 rows and see how the product behaves.

 

Additional points to consider:

  • The setting only applies to new datasets; it is not retroactive.
  • The setting applies to all datasets; you cannot configure the sample size per dataset (yet).
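For reference, the change itself is a one-line edit. Assuming the property lives in Data Preparation's application.properties as mentioned in the question, it might look like this (the file location varies by installation):

```properties
# application.properties — sample size for new datasets (default: 10000)
dataset.records.limit=100000
```

A restart of the Data Preparation service is typically needed for a properties change to take effect, and existing datasets keep their original sample.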

 

Regards,

 

Gwendal

2 REPLIES

Six Stars

Re: DataPrep Row limit 10.000 - Impact of changing dataset.records.limit

Hi Gwendal,

 

Thanks for the clear and fast response! We will give it a try with 100.000 rows!

 

Best regards,

MK