One Star

Slow tFilelist?

Hello,
I have a job that needs the latest file from a directory containing a lot of files (over 56,000, of which about 700 match the filemask I am searching for).
The file I need is searchable and contains a date-time stamp in the file (but not always from today or yesterday).
On a local disk it runs adequately (it finds the file in about 2 seconds), but when I run it against the Windows share that holds the files, it takes over 40 minutes. What is wrong with it?
The filemask I am searching for is: "test." + context.customernumber + "*.txt"
The tFileList is set to sort by date descending and then iterates to a tJava, which sets a global variable if it is unset and otherwise does nothing (this way I get the latest file).
I have also tried sorting by date ascending and keeping the last iteration, but the run time stays the same.
The setup:
Client (which runs Talend): Windows 7
Server (which has the files on a Samba share): Windows 2003 Server
I am almost desperate enough to create a subjob that gets the complete file listing unsorted and then sorts it in the subjob. But I don't think this is the correct way to go.
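The "set it only if it is unset" step described above can be sketched outside Talend as follows. This is only an illustration under assumptions: Talend's globalMap is emulated with a plain HashMap, the key name "latestFile" and the file names are hypothetical, and in a real job the current path would come from the tFileList iteration rather than from a hard-coded list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class KeepFirstIteration {
    // Hypothetical key name; any otherwise unused key would do.
    static final String KEY = "latestFile";

    // Called once per iteration: stores the path only if nothing is stored yet.
    static void onIteration(Map<String, Object> globalMap, String currentFilePath) {
        if (globalMap.get(KEY) == null) {
            globalMap.put(KEY, currentFilePath);
        }
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap (assumption for this sketch).
        Map<String, Object> globalMap = new HashMap<>();
        // Hypothetical paths in the order a "date descending" sort would deliver them.
        for (String path : Arrays.asList("test.42.c.txt", "test.42.b.txt", "test.42.a.txt")) {
            onIteration(globalMap, path);
        }
        // Only the first (newest) path was kept.
        System.out.println(globalMap.get(KEY));
    }
}
```

With this pattern every remaining iteration still runs, it just does nothing, so the listing and sorting cost is paid in full even though only the first file is used.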
8 REPLIES
One Star

Re: Slow tFilelist?

Hi
Welcome to Talend Community!
Could you explain your job logic in detail?
I need to know what the job does once it has found the latest file. Does it copy the file or move it?
Sometimes, when a Talend job tries to handle a file (e.g. Excel) that another user has open, the job waits until the file is no longer in use.
Or am I missing some detail?
Regards,
Pedro
One Star

Re: Slow tFilelist?

The job logic is pretty simple (in the test job I created to track down the problem):
Some values come from a context.
tJava_1:
If the context property is null, the filemask is different from the one used when the context property is filled.
tFileList_1:
Searches for all files matching the filemask set in tJava_1 (this takes 30 minutes in this example).
tJava_2:
Prints the last record found.
tFileExist_1:
The start of the actual job, if there is a last file.
In this example I was searching without a context property, so the filemask was: class.*
The file specs are:
Total files: 16655
class.* files: 486
I don't see where the 30 minutes go in this job. There is no opening or closing of files involved, and all the files on the server are closed.
One Star

Re: Slow tFilelist?

OK, some further investigation revealed that tFileList with the sort options set is terribly slow.
It is about 100x faster to build tFileList (without sorting) -> tfileinfo -> tSortRow than to use the sorting possibilities in the tFileList settings.
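For readers who want to try the same "list unsorted, collect file info, then sort" idea in plain Java (for example inside a tJava), a minimal sketch could look like this. It is not the code Talend generates; the class and method names are made up for illustration, and the glob "class.*" is simply the filemask mentioned earlier in the thread.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ListThenSort {
    // One entry per file: its path plus last-modified time in epoch milliseconds.
    static final class Entry {
        final Path path;
        final long modifiedMillis;

        Entry(Path path, long modifiedMillis) {
            this.path = path;
            this.modifiedMillis = modifiedMillis;
        }
    }

    // Step 1: list matching files unsorted. Step 2: read each file's attributes once.
    // Step 3: sort the collected entries in memory, newest first.
    static List<Entry> listNewestFirst(Path dir, String glob) throws IOException {
        List<Entry> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
                if (attrs.isRegularFile()) {
                    entries.add(new Entry(p, attrs.lastModifiedTime().toMillis()));
                }
            }
        }
        entries.sort(Comparator.comparingLong((Entry e) -> e.modifiedMillis).reversed());
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory to scan (defaults to the current directory).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Entry> entries = listNewestFirst(dir, "class.*");
        if (!entries.isEmpty()) {
            System.out.println("Latest file: " + entries.get(0).path);
        }
    }
}
```

The point of the design is that each file's attributes are read exactly once and the sort then works on values already held in memory.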
One Star nc

Re: Slow tFilelist?

tFileList seems to have some problem when working with network paths... I have a directory containing about 2k files and tFileList freezes despite the very low latency of the connection... I suppose it is a bug?
One Star

Re: Slow tFilelist?

@nc: What kind of network connection are you using? I was using Windows UNC paths (so I guess it uses the SMB components).
If you are using FTP or some other kind of network connection, the problem may be somewhere else...
One Star nc

Re: Slow tFilelist?

I'm using a standard Windows UNC path such as "\\serverName.domainName.local\sharedDirectory". When I open the UNC path in Windows Explorer I see the list of files in a flash, and I can browse into each directory without any delay... Despite that, when I try to print the directory listing with a simple job such as "tFileList->tLogRow", I have to wait many minutes...
One Star nc

Re: Slow tFilelist?

Just one note: the "order by" and "order action" settings are left at their default values.
I did not describe the simple test job correctly: it is "tFileList->tIterateToFlow->tLogRow".
Thanks,
N.
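One possible way to narrow down where the time goes on such a share is to time a names-only listing separately from a listing that also asks for each file's last-modified time. The sketch below is only a diagnostic idea, not a description of what tFileList does internally; the class and method names are hypothetical, and the directory (for example the UNC path) is passed as the first argument.

```java
import java.io.File;

class ListingTimer {
    // Fetches only the entry names; returns how many entries were found.
    static int countNames(File dir) {
        String[] names = dir.list();
        return names == null ? 0 : names.length;
    }

    // Fetches the entries and asks for each one's last-modified time;
    // returns the newest timestamp seen (0 if the directory is empty or unreadable).
    static long newestModified(File dir) {
        File[] files = dir.listFiles();
        long newest = 0L;
        if (files != null) {
            for (File f : files) {
                newest = Math.max(newest, f.lastModified());
            }
        }
        return newest;
    }

    public static void main(String[] args) {
        // args[0]: directory to inspect (defaults to the current directory).
        File dir = new File(args.length > 0 ? args[0] : ".");
        long t0 = System.nanoTime();
        int count = countNames(dir);
        long t1 = System.nanoTime();
        long newest = newestModified(dir);
        long t2 = System.nanoTime();
        System.out.println(count + " entries, names only: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("with last-modified lookups: " + (t2 - t1) / 1_000_000
                + " ms (newest=" + newest + ")");
    }
}
```

Because the second measurement may benefit from caching done during the first, it is worth running the program more than once and comparing the numbers before drawing conclusions.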
Six Stars dgm

Re: Slow tFilelist?

It's faster to list files using a system command than using tFileList.
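As a rough illustration of the system-command approach, the Java sketch below runs a command and collects its output lines. The class and method names are hypothetical, and the command shown in main is an assumption on my part since the post does not say which one was used: on Windows, "dir /b /a-d /o-d" prints bare file names, skips directories, and orders them newest first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SystemListing {
    // Runs the given command and returns its output, one list element per line.
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so nothing blocks unread
        Process process = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("command exited with " + exitCode + ": " + lines);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: SystemListing <directory-and-mask>");
            return;
        }
        // Windows only: args[0] is the directory plus filemask to list.
        List<String> files =
            run(Arrays.asList("cmd", "/c", "dir", "/b", "/a-d", "/o-d", args[0]));
        if (!files.isEmpty()) {
            System.out.println("Latest file: " + files.get(0));
        }
    }
}
```

Note that "dir" exits with a non-zero code when nothing matches the mask, which this sketch reports as an IOException.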