Four Stars

Google Cloud Storage File Streaming

Hello,

I have a bunch of files in Google Cloud Storage that I'd like to stream into my job.  Is it possible to stream in the byte array data for each file?

 

Currently I'm using tGSGet to download the files to my local machine first and then looping through them to get byte array data. Ideally, I'd be able to loop through the files without downloading them first.
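To make it concrete, this is roughly the kind of direct read I have in mind, e.g. in a tJava component using the google-cloud-storage Java client (the bucket and object names below are just placeholders):

// import statements for the component:
// import com.google.cloud.storage.BlobId;
// import com.google.cloud.storage.Storage;
// import com.google.cloud.storage.StorageOptions;

// Sketch only: read an object's bytes straight from GCS, no local download.
Storage storage = StorageOptions.getDefaultInstance().getService();
byte[] content = storage.readAllBytes(BlobId.of("mybucket", "myfolder/mydocument"));
System.out.println("Read " + content.length + " bytes");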

2 REPLIES
Moderator

Re: Google Cloud Storage File Streaming

Hello,

Currently, you have to download the files to your local machine when using the tGSGet component. It retrieves objects that match the specified criteria from Google Cloud Storage and outputs them to a local directory.

Google Cloud Storage support in Apache Spark Batch and Spark Streaming Jobs is available as a new feature in v6.4.

Please have a look at the online documentation: TalendHelpCenter: tGSConfiguration properties for Apache Spark Streaming

Best regards

Sabrina

 

--
Don't forget to give kudos when a reply is helpful and to click Accept as Solution when you're good with it.
Four Stars

Re: Google Cloud Storage File Streaming

I am using a BigQuery query to get a list of bucket/key values. I want to iterate on those to pull files.

tBigQueryInput --> tFlowToIterate --> tGSGet

 

If I hard-code the Bucket name ("mybucket") and Key prefix ("myfolder/mydocument"), it works as expected and downloads the file to my local machine.

 

However, tFlowToIterate saves my bucket and key to global variables first. When I use those global variables in tGSGet, it gives me a null error.
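For reference, I'm setting the tGSGet fields with expressions roughly like this (assuming the row coming out of tBigQueryInput is named row1 and the columns are named bucket and key; those names are just my setup):

Bucket name: (String) globalMap.get("row1.bucket")
Key prefix:  (String) globalMap.get("row1.key")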

 

How can I get this to work?