Four Stars

Google Cloud Storage File Streaming


I have a bunch of files in Google Cloud Storage that I'd like to stream into my job.  Is it possible to stream in the byte array data for each file?


Currently I'm using tGSGet to download the files to my local machine first and then looping through them to get byte array data. Ideally, I'd be able to loop through the files without downloading them first.
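One way to avoid the local download would be to read each object's bytes in a tJavaRow/tJavaFlex using the Google Cloud Storage Java client (opening the blob as a stream rather than saving it to disk). A minimal sketch of the stream-to-byte-array part, with the GCS wiring shown only as a hypothetical comment since it needs the google-cloud-storage library and credentials:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamBytes {

    // Drain any InputStream into a byte[] without writing to local disk.
    static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical GCS wiring (requires the google-cloud-storage client
        // on the job's classpath; bucket/key names are illustrative):
        //   Storage storage = StorageOptions.getDefaultInstance().getService();
        //   Blob blob = storage.get(BlobId.of("mybucket", "myfolder/mydocument"));
        //   InputStream in = Channels.newInputStream(blob.reader());
        //   byte[] data = toByteArray(in);

        // Self-contained demo of the helper itself:
        InputStream demo = new ByteArrayInputStream("hello".getBytes());
        byte[] data = toByteArray(demo);
        System.out.println(data.length);
    }
}
```

The helper is just the generic "read a stream fully" loop; only the commented portion is GCS-specific.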


Re: Google Cloud Storage File Streaming


Currently, you have to download the files to your local machine when using the tGSGet component. It retrieves objects matching the specified criteria from Google Cloud Storage and outputs them to a local directory.

A new feature for Google Cloud Storage in Apache Spark Batch/Streaming jobs is available as of v6.4.

Please have a look at the online documentation: TalendHelpCenter: tGSConfiguration properties for Apache Spark Streaming

Best regards




Re: Google Cloud Storage File Streaming

I am using a BigQuery query to get a list of bucket/key values. I want to iterate on those to pull files.

tBigQueryInput --> tFlowToIterate --> tGSGet


If I hard-code Bucket name: "mybucket" and Key prefix: "myfolder/mydocument", it works as expected and downloads the file to my local machine.


However, tFlowToIterate first saves my bucket and key to global variables. When I use those global variables in tGSGet, I get a null error.


How can I get this to work?
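For what it's worth, tFlowToIterate stores each column in Talend's globalMap under the key "<connectionName>.<columnName>", and the value has to be cast back when referenced in a component field, e.g. ((String)globalMap.get("row1.bucket")). A null error usually means the key string doesn't match the actual connection or column name. A minimal sketch of the lookup pattern (the connection name "row1" and column names "bucket"/"key" are assumptions about this job):

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapDemo {
    public static void main(String[] args) {
        // globalMap is a Map<String, Object>; tFlowToIterate fills it with
        // one entry per column, keyed "<connectionName>.<columnName>".
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("row1.bucket", "mybucket");         // assumed names
        globalMap.put("row1.key", "myfolder/mydocument");

        // What would go in tGSGet's Bucket name / Key prefix fields:
        String bucket = (String) globalMap.get("row1.bucket");
        String keyPrefix = (String) globalMap.get("row1.key");

        // A mismatched key (e.g. "row2.bucket" when the connection is named
        // row1) returns null -- the usual cause of the null error.
        Object wrong = globalMap.get("row2.bucket");

        System.out.println(bucket);
        System.out.println(keyPrefix);
        System.out.println(wrong);
    }
}
```

Checking the exact connection name on the tBigQueryInput --> tFlowToIterate link against the key used in tGSGet is the first thing to verify.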