NOTE: Talend Streams has been replaced by Pipeline Designer. For more information about this new Talend application, see the Pipeline Designer introduction.
I am trying to create a connection in Data Streams, but I cannot get a successful connection.
How should I fill in the Service Account File and Temp Location fields?
I tried a link to my p12 file and a GS bucket for the temp location, but I always get "Internal Error" when I check the connection.
Thanks for using Talend Data Streams and welcome to our community!
To use our Google BigQuery connector, you will have to connect to the EC2 instance using SSH (https://help.talend.com/reader/U~NvWT4juBbI~~2xEeN56g/uATvEWVxSYcttokqpc1MsA) and upload your service_account.json (https://cloud.google.com/compute/docs/access/service-accounts) into /opt/data-streams/data/extras.
Once the service_account.json file has been uploaded to the AMI, you need to point to /opt/data-streams/data/extras/service_account.json in the Google BigQuery connector parameters.
The temporary storage should be a path to one of your GS buckets that will be used for temporary files when submitting the jobs.
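As a rough illustration of the two fields described above, here is a small Python sketch that checks the values before you paste them into the connector form. It is not part of Data Streams: the function name check_bigquery_settings and the bucket name my-bucket are made up for this example, and the checks only reflect what is said in this thread (a JSON service account file on the instance, and a gs:// path for temporary storage).

```python
def check_bigquery_settings(service_account_path, temp_location):
    """Return a list of likely problems with the two connector fields.

    Hypothetical helper for illustration only; it does not contact
    Google Cloud or Data Streams.
    """
    problems = []
    # The thread describes a service_account.json file, not a .p12 key.
    if not service_account_path.endswith(".json"):
        problems.append("Service Account File should point to a JSON key file")
    # Temporary storage is a Google Cloud Storage path, not a local folder.
    if not temp_location.startswith("gs://"):
        problems.append("Temp Location should be a gs:// bucket path")
    return problems


# Values from the answer above; the bucket name is a placeholder.
good = check_bigquery_settings(
    "/opt/data-streams/data/extras/service_account.json",
    "gs://my-bucket/tmp",
)
# Values similar to the first attempt in the question.
bad = check_bigquery_settings("https://example.com/key.p12", "/tmp/")

print(good)
print(bad)
```

The first call reports no problems, while the second reports both of them, which matches the two mistakes discussed in this thread.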
It works. It wasn't clear that the temp location had to be in cloud storage; before, I had tried /tmp/ on the EC2 instance.
It's better, I have an Avro file now.
Preview doesn't work, and it's complicated to manipulate the structure.
I save my data to a file in S3. Why can't I choose the destination file's name? The path in the dataset seems to be a directory.
At this point, preview in the pipeline works after a few tries.
I have another issue: the data preview is not propagated.
If I add a column in Python, I can't manipulate it afterwards (in an Aggregate, for example).
Can you explain more about "I recommend getting the sample explicitly from the dataset form"? Can I define the dataset structure explicitly?
Is Python code the only way to modify data? Do you support only the Apache Beam Python SDK, not Java?
I'll go test the other connectors. If necessary, I can help you with Google Cloud.
Thanks for the detail in your response! It helps quite a bit in debugging.
It looks like you have your PythonRow set to FLATMAP, when you really want MAP. Try changing it and seeing if it fills out your preview better! If you use FLATMAP without adding records to outputList in your user-defined code, it will just filter all of the inputs.
As a reminder: for FLATMAP, set the python variable outputList to an array containing 0..n output records (aka python object) for each input.
For MAP, set the python variable output in your user-defined code, and it should be exactly 1 output record per input.
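To make the difference concrete, here is a plain-Python emulation of the two modes. It is only a sketch of the behaviour described above, not how PythonRow is actually implemented: run_map and run_flatmap are hypothetical helpers, and the name input for the incoming record is an assumption (only output and outputList are named in this thread).

```python
def run_map(user_code, records):
    """Emulate MAP: the user code sets `output`, one record out per input."""
    results = []
    for record in records:
        scope = {"input": record, "output": None}
        exec(user_code, scope)
        results.append(scope["output"])
    return results


def run_flatmap(user_code, records):
    """Emulate FLATMAP: the user code fills `outputList` with 0..n records."""
    results = []
    for record in records:
        scope = {"input": record, "outputList": []}
        exec(user_code, scope)
        results.extend(scope["outputList"])
    return results


records = [{"name": "a", "n": 1}, {"name": "b", "n": 2}]

# User code written for MAP: it only sets `output`.
map_code = "output = dict(input, doubled=input['n'] * 2)"
# User code written for FLATMAP: it emits each input twice.
flatmap_code = "outputList = [input, input]"

mapped = run_map(map_code, records)
# MAP-style code run in FLATMAP mode never touches outputList,
# so every input is filtered out.
filtered = run_flatmap(map_code, records)
duplicated = run_flatmap(flatmap_code, records)

print(len(mapped), len(filtered), len(duplicated))
```

In this emulation, the MAP-style code produces one record per input in MAP mode and nothing at all in FLATMAP mode, which is the empty-preview symptom described above.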
For your questions, we're using the Java SDK and the PythonRow is implemented with Jython. Upcoming features in Beam should help mix languages in the same pipeline, but it's a long-term feature not yet available in Beam.
"I recommend getting the sample explicitly from the dataset form" --> My apologies, I just meant clicking the Get Sample button manually after a dataset change to make sure that the sample has been correctly retrieved. You can't set a schema in the dataset, except for those datasets that specify their query.
Finally, when you create a new column in PythonRow, it won't (yet) show up in the autocomplete box in the next Aggregate. You can still use it, but you have to enter it manually!
Please be assured that we're paying attention to questions and feedback -- the schema autocomplete should be fixed in the next release, and we've already opened a discussion for improving the user experience of PythonRow. I haven't had a chance to look at your previous question about S3.