We get this question a lot: why does the tool limit loading to the top 30,000 rows, or simply, 30,000 is not enough.
Data Prep Free Desktop loads the entire dataset in memory. 30K is not a hard limit, just a safeguard to keep response times acceptable on average hardware. Since higher-end hardware can handle more rows, and since 30K may be too little for one file and too much for another, an upcoming upgrade will add a UI control that lets you raise this limit as you see fit.
In the meantime you can experiment by changing this limit in a config file, located on Windows at \config\application.properties under the installation folder. Just edit the number in your favorite text editor. Sorry, Apple users (including yours truly): the equivalent file on OS X is not as easily editable.
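As an illustration, the edit looks something like the fragment below. The property name shown here is hypothetical; open your own application.properties, find the entry whose value is 30000, and change only the number:

```properties
# Hypothetical key name - in your copy of application.properties,
# locate the entry set to 30000 and raise the value, e.g.:
dataset.records.limit=100000
```

Restart Data Prep afterwards so the new value is picked up.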
The commercial add-on due in June will feature more sophisticated techniques and scale with large files.
Downloaded and installed Data Prep; the install went well. My first Excel file was 77,000 rows, but only 30,000 loaded. I found this message and changed the limit to 300,000, then to 3,000,000, yet Data Prep still shows 30,000/30,000 in the top corner. Using Data Prep version 1.3.0. I've tried closing all browsers, deleting the loaded files and reloading, and deleting the preparation. At a loss as to what to try next.
Restarted the computer and the issue was resolved. It appears a background process reads the config file once per computer start.
Hi Kyle, I am also stuck on the same issue. We have configured a job in Talend Studio that passes a CSV file holding more than 70,000 records from a tFileInputDelimited component to a tDataPrep component, but it goes into an infinite loop even with a blank recipe, and nothing comes out as output. Is there any workaround to divide the CSV file into multiple files?
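One way to split a large CSV outside of Talend is a small script. Here is a minimal sketch in Python (not an official Talend utility) that splits a file into chunks of a given row count, repeating the header in each chunk so every piece remains a valid CSV:

```python
import csv
import os

def split_csv(src_path, rows_per_chunk, out_dir):
    """Split src_path into numbered chunk files, repeating the
    header row at the top of every chunk. Returns the chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)          # keep the header for every chunk
        chunk, index = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                written.append(_write_chunk(out_dir, index, header, chunk))
                chunk, index = [], index + 1
        if chunk:                      # leftover rows form the last chunk
            written.append(_write_chunk(out_dir, index, header, chunk))
    return written

def _write_chunk(out_dir, index, header, rows):
    path = os.path.join(out_dir, f"chunk_{index:03d}.csv")
    with open(path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

With a 70,000-record file you could call `split_csv("input.csv", 25000, "chunks")` to get three files that each stay under the 30K limit.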
What is the cost of the commercial version of the Data Preparation and Data Quality tools? How do I get the commercial version?
A pricing and feature matrix for all your tools would also help.
Could you please send an email to firstname.lastname@example.org with your requirements? Our colleagues from the sales team will assist you in finding the right product and pricing.
Feel free to let us know if that works for you.
I'm trying to use data preparation free version 2.1.1.
I'd like to raise the upper limit on dataset records by modifying application.properties.
But the upper limit did not change, even after rebooting Windows 7.
If you have any other way to change it, please let me know.
This setting is the only way to change the limit. It does require a Data Prep restart to be taken into account, and it applies only to new datasets added after the limit has been changed; it is not retroactive.
Thank you for reply.
I understand the app needs to be restarted after modifying the settings file.
I already did it and I also restarted PC because it might run as background task.
Of course, I uploaded the dataset with around 50K records after reboot.
But nothing changed...
That is odd; no one has reported this issue before.
Can you confirm that:
And just in case you installed in the default folder: you need administrator rights to update the configuration file. Can you re-open the file and check that you see 50000 and not 30000?
Apologies if the questions seem basic/dumb, but I'd rather rule out any obvious cause before looking at something more fishy.
As mentioned in the first post, the file name is application.properties. It is located in the config folder in the Data Preparation installation folder. If you used the default installation path, then the path is C:\Program Files (x86)\Talend\Talend Data Preparation Free Desktop 2.1\Talend-DataPreparation-Free-Desktop-windows-2.1.1\config\application.properties
I ran into some of the issues discussed above.
Had to restart the service using:
C:\Program Files (x86)\Talend\Talend Data Preparation Free Desktop 2.5\Talend-DataPreparation-Free-Desktop-windows-2.5.1\stop.bat
Running stop.bat as an administrator did the trick.
I set my limit to 5,000,000 and was unable to load a file with 3.2M records.
It then failed again with a 250K-record file.
It either crashes or reloads the Datasets screen without my file in it.
Has anyone hit a similar limit or run into these issues?
I must say, as a (currently) non-user, I'm still not clear on a few things:
How does one process a file that is, for example, 500,000 rows? I take it this simply is not possible at all with the Data Prep free version?
I also don't understand how the paid version works: presumably the 10,000-row limit is a 'working' limit, but the whole file is still processed at the end? Or not?
And if not, how exactly is one supposed to process large files?
Thanks in advance,
Processing more than 30,000 rows is not possible with the Data Prep free version without changing the parameter defined above.
With the paid version, this is how it works:
Hope it helps