Four Stars

Avro file generation from schema file and data file

Hi Team,

I am new to Talend. I have a requirement to generate an Avro file where the schema and the data are in two separate files. Is there a way to do that in Talend?

Regards

Vinay G

3 REPLIES
Eight Stars

Re: Avro file generation from schema file and data file

@gudivinay
Not out of the box, at least not in the community edition; somebody from Talend might want to answer this. Talend Data Streams (a brand new solution) has it, and Parquet as well.
I would strongly suggest keeping Avro schema management outside of Talend, as something you manage independently. Do you plan to use multiple schema versions too?

The Apache Avro project provides the jar files you need to import to get its functionality.

Keep me posted, because I'm not sure yet whether I will incorporate it into my ETL jobs in Talend (community edition): it's hard to find proper schema management tools with an intuitive GUI... I definitely will when we roll out a data-streaming bus...

Any experience with this that you'd love to share?
Four Stars

Re: Avro file generation from schema file and data file

@Dijke

Thanks for your response. I took another route: I created a Java class that converts my data file to JSON, used tLibraryLoad to load the Avro jar, and called the JSON-to-Avro conversion method from the jar. It's a lengthy process, but that is how I did it.
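
For anyone following the same route, here is a minimal sketch of the first step only (data file line to JSON), using nothing but the JDK. It is not the class from this post: the class name DelimitedToJson, the method names, the comma delimiter and the sample fields are all made up for illustration, and it assumes every field in the Avro schema is of type string. The resulting JSON lines would then be handed to the Avro jar for the JSON-to-Avro step, which is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: turns one delimited data line into one JSON object,
// using field names taken from the schema. All values are emitted as JSON
// strings, so this only fits schemas whose fields are all Avro "string".
class DelimitedToJson {

    // Escape the characters that must be escaped inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters as 4-digit hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Build {"field1":"value1",...} from one line of the data file.
    static String toJsonLine(List<String> fieldNames, String line, String delimiter) {
        // Limit -1 keeps trailing empty values instead of dropping them.
        String[] values = line.split(Pattern.quote(delimiter), -1);
        if (values.length != fieldNames.size()) {
            throw new IllegalArgumentException(
                "Expected " + fieldNames.size() + " values but got " + values.length);
        }
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append('"').append(escape(fieldNames.get(i))).append("\":\"")
              .append(escape(values[i])).append('"');
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "name"); // sample field names
        System.out.println(toJsonLine(fields, "1,Alice", ","));
        // prints {"id":"1","name":"Alice"}
    }
}
```

Quoted fields containing the delimiter are not handled; a real data file may need a proper CSV parser instead of split.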

Eight Stars

Re: Avro file generation from schema file and data file

@gudivinay Thanks for the update! Are you happy you managed it?
Did you also try the command-line tools from Avro? No programming is needed to get your Avro files: JSON in, Avro out. You could call them from a tSystem component.
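
To make the command-line route concrete, here is a rough sketch of the call, written in Java so it can be tried outside Talend as well. It assumes the avro-tools jar from the Apache Avro project with its fromjson subcommand and --schema-file option; the class name AvroToolsCommand, the method names and all file names are placeholders. fromjson writes the Avro data to standard output, so the sketch redirects that to a file. In a tSystem component the same command would go in as a single command line.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the avro-tools jar ("JSON in, Avro out").
class AvroToolsCommand {

    // Equivalent shell form:
    //   java -jar <toolsJar> fromjson --schema-file <schemaFile> <jsonFile> > out.avro
    static List<String> command(String toolsJar, String schemaFile, String jsonFile) {
        return Arrays.asList(
            "java", "-jar", toolsJar,
            "fromjson", "--schema-file", schemaFile, jsonFile);
    }

    // Run the command and send its standard output (the Avro bytes) to avroOut.
    static int run(String toolsJar, String schemaFile, String jsonFile, String avroOut)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(toolsJar, schemaFile, jsonFile));
        pb.redirectOutput(new File(avroOut));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show tool errors on the console
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.err.println("usage: AvroToolsCommand <avro-tools.jar> <schema.avsc> <data.json> <out.avro>");
            return;
        }
        System.exit(run(args[0], args[1], args[2], args[3]));
    }
}
```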

Could you elaborate a little more on your use case for Avro? Which architecture/platforms depend on the data going into, coming out of, or processed by Talend?

I was facing something different, but it is related to schema/metadata JSON: I've set up a Node.js server which contains my schemas in JSON and is able to output JSON, String, XML and bytes. Depending on the job I retrieve all the info I need, and everybody can access this information just by the URL.

Node setup: jsonpath, jsonxml, express, glob

Next is to have the Node.js server work with Avro conversion.

I only want to implement this feature when I need serialization and/or multiple schema versions.