I'm currently using Talend Open Studio for Data Integration 5.5.1. I've created a job that first opens an S3 connection with tS3Connection, then uses tS3List to list all of the objects in a specific bucket. The tS3List component returns exactly 43,000 objects, but based on the data that comes back, I believe it's cutting off the results: the objects arrive in alphabetical order, and the list stops partway through objects starting with a W. Although I can't get an exact count, I believe this bucket holds between 43,000 and 44,000 objects.

I know that Amazon's S3 API returns objects in pages of 1,000, and that tS3List is able to page through each set of results. However, I think what's happening is that it's not returning the last page, which contains fewer than 1,000 objects. Is this a known bug, or is there a configuration option I'm not setting? Thanks!
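For anyone following along, here's what a correct listing loop should look like. This is a minimal Python sketch, not the actual tS3List code (which is Java inside Talend); the page size, `IsTruncated` flag, and marker behavior mirror the S3 ListObjects contract. The point is that a correct client keeps requesting pages until the service says the result is no longer truncated, so the final partial page (fewer than 1,000 keys) is still collected. A loop that drops that last page would produce exactly the symptom above: a count that lands on a clean multiple of 1,000.

```python
# Simulated S3 ListObjects paging (illustrative sketch only; the names
# list_page/list_all are hypothetical, not Talend or AWS SDK functions).

PAGE_SIZE = 1000  # S3 returns at most 1,000 keys per request

def list_page(all_keys, marker=None, max_keys=PAGE_SIZE):
    """Simulate one ListObjects call: keys sorted, strictly after `marker`."""
    keys = sorted(all_keys)
    if marker is not None:
        keys = [k for k in keys if k > marker]
    page = keys[:max_keys]
    truncated = len(keys) > max_keys  # more keys remain after this page
    return {"Contents": page, "IsTruncated": truncated}

def list_all(all_keys):
    """Correct client loop: page until IsTruncated is False, so the
    final partial page (< 1,000 keys) is included in the results."""
    results, marker = [], None
    while True:
        resp = list_page(all_keys, marker)
        results.extend(resp["Contents"])
        if not resp["IsTruncated"]:
            break
        marker = resp["Contents"][-1]  # resume after the last key seen
    return results

# A bucket with 43 full pages plus a 357-key partial final page:
bucket = [f"obj{i:06d}" for i in range(43_357)]
assert len(list_all(bucket)) == 43_357  # partial page is not dropped
```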
This job is just a test job to see how these S3-related components work; I had previously been using some components from Talend Exchange, so I wasn't too concerned about tRowGenerator vs. tIterateToFlow at this point. However, here's a screenshot of the same job using tIterateToFlow, and I have the same issue: tS3List returns exactly 43,000 objects. I can also say that we've added files to this bucket in the last few days, yet tS3List has consistently returned exactly 43,000 rows. So, back to my original question: is tS3List failing to return the last page of results when it contains fewer than 1,000 objects?
Has anyone resolved this? It appears that Amazon S3 limits a single "GET Bucket (List Objects)" request to 1,000 keys. It's unclear (at least from the link below) whether the "max-keys" parameter can be set to a value greater than 1,000. Another option would be to set a "marker", which starts the next request at the 1,001st object in alphabetical order. Amazon List Size Limit
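For what it's worth, S3 caps max-keys at 1,000 server-side (asking for more still returns at most 1,000), so the marker is the supported way to continue. The semantics are that keys come back in lexicographic order and the marker means "start strictly after this key", so passing the 1,000th key as the marker resumes at the 1,001st. A small self-contained sketch of that behavior (the `get_page` helper here is hypothetical, simulating the service, not an AWS SDK call):

```python
# Simulate marker-based continuation against a sorted key space.
# S3 returns keys lexicographically; `marker` means "start after this key".
keys = sorted(f"file-{i:05d}" for i in range(2500))

def get_page(marker=None, max_keys=1000):
    """Hypothetical stand-in for one GET Bucket (List Objects) request."""
    remaining = [k for k in keys if marker is None or k > marker]
    return remaining[:max_keys]

first = get_page()                   # keys 1..1000
second = get_page(marker=first[-1])  # resumes at the 1,001st key
third = get_page(marker=second[-1])  # final partial page of 500 keys
assert second[0] == "file-01000"
assert len(third) == 500
```

So even if tS3List won't do it, the full listing is always reachable by chaining requests this way, feeding each page's last key in as the next marker until a page comes back with fewer than 1,000 keys.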