I have a bunch of files in Google Cloud Storage that I'd like to stream into my job. Is it possible to stream in the byte array data for each file?
Currently I'm using tGSGet to download the files to my local machine first and then looping through them to get byte array data. Ideally, I'd be able to loop through the files without downloading them first.
Currently, you have to download the files to your local machine when using the tGSGet component. It retrieves objects matching the specified criteria from Google Cloud Storage and writes them to a local directory.
Native Google Cloud Storage support for Apache Spark Batch/Streaming jobs was added in v6.4.
Please have a look at the online documentation: TalendHelpCenter: tGSConfiguration properties for Apache Spark Streaming.
I am using a BigQuery query to get a list of bucket/key values. I want to iterate on those to pull files.
tBigQueryInput --> tFlowToIterate --> tGSGet
If I hard-code Bucket name: "mybucket" and Key prefix: "myfolder/mydocument", it works as expected and downloads the file to my local machine.
However, tFlowToIterate saves my bucket and key to global variables first, and when I use those global variables in tGSGet I get a null error.
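A common cause of this null is a mismatch between the key name tFlowToIterate stores in globalMap and the key name used in the tGSGet field. The sketch below simulates Talend's globalMap with a plain HashMap to show the expression pattern and how a wrong key silently yields null; the key names "bucket" and "key" are assumptions and must match whatever you defined in the tFlowToIterate mapping.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapDemo {
    public static void main(String[] args) {
        // Simulates Talend's globalMap, which tFlowToIterate populates on each iteration.
        Map<String, Object> globalMap = new HashMap<>();
        // Hypothetical key names: these must match the variable names you set in tFlowToIterate.
        globalMap.put("bucket", "mybucket");
        globalMap.put("key", "myfolder/mydocument");

        // In the tGSGet Bucket name / Key prefix fields you would write Java
        // expressions of this shape (with quotes around the whole expression removed):
        String bucket = (String) globalMap.get("bucket");
        String keyPrefix = (String) globalMap.get("key");
        System.out.println(bucket + "/" + keyPrefix);

        // A key name that does not match what tFlowToIterate stored returns null,
        // which surfaces downstream as a NullPointerException in the generated job code.
        Object wrong = globalMap.get("Bucket"); // note the capital B: not the same key
        System.out.println(wrong);
    }
}
```

So the first thing to check is that the expression in tGSGet, e.g. `(String)globalMap.get("bucket")`, uses exactly the variable name defined in tFlowToIterate (names are case-sensitive), and that the tFlowToIterate component actually runs before tGSGet in the iterate flow.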
How can I get this to work?