Avro file generation from schema file and data file

Four Stars


Hi Team,

 

I am new to Talend. I have a requirement to generate an Avro file where the schema and the data are in two separate files. Is it feasible to do that in Talend?

 

Regards

Vinay G

Ten Stars

Re: Avro file generation from schema file and data file

@gudivinay
Out of the box, this is not available in the Community Edition; someone from Talend might want to confirm. Talend Data Streams (a brand-new solution) supports it, and Parquet as well.
I would strongly suggest keeping Avro schema management outside of Talend, as something you manage independently. Do you plan to use multiple schema versions too?

The Apache Avro project provides the JAR files you need to import to get its functionality.

Keep me posted, because I'm not sure whether I will incorporate it into my ETL jobs in Talend (Community Edition): it's hard to find proper schema management tools with an intuitive GUI. I definitely will when we roll out a data-streaming bus.

Do you have any experience with this that you would like to share?
Four Stars

Re: Avro file generation from schema file and data file

@Dijke

Thanks for your response. I took another route: I created a Java class that converts my data file to JSON, used tLibraryLoad to load the Avro JAR, and called the JSON-to-Avro conversion method from that JAR. It's a lengthy process, but that is how I approached it.
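
For readers following along, here is a minimal sketch of what such a JSON-to-Avro step might look like with the Apache Avro Java library on the classpath (for example, loaded through tLibraryLoad). The class name `JsonToAvroFile` and its method are hypothetical, not the poster's actual code; it assumes each input string is one JSON record that matches the schema.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

// Hypothetical helper: name and signature are illustrative only.
final class JsonToAvroFile {

    /** Reads the schema from schemaFile and writes one Avro record per JSON string. */
    static void write(File schemaFile, List<String> jsonRecords, File avroFile) throws IOException {
        Schema schema = new Schema.Parser().parse(schemaFile);
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);

        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            // Starts an Avro container file with the schema embedded in its header.
            writer.create(schema, avroFile);
            for (String json : jsonRecords) {
                // Decode the JSON text against the schema, then append the record.
                Decoder decoder = DecoderFactory.get().jsonDecoder(schema, json);
                writer.append(reader.read(null, decoder));
            }
        }
    }
}
```

The JSON must follow Avro's JSON encoding, so field types have to match the schema (an `int` field needs a JSON number, not a quoted string).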

Ten Stars

Re: Avro file generation from schema file and data file

@gudivinay Thanks for the update! Are you happy you managed it?
Did you also try the command-line tools from Avro? No programming is needed to get your Avro files: JSON in, Avro out. You can call them from a tSystem component.
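
As a rough illustration of the command-line route, the sketch below builds and runs the `fromjson` command of the avro-tools JAR from Java, which is roughly what a tSystem call would do. The class name `AvroToolsFromJson` and all file names are assumptions; the JAR file name depends on the version you download.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical wrapper around the avro-tools command line.
final class AvroToolsFromJson {

    /** Builds: java -jar <toolsJar> fromjson --schema-file <schemaFile> <jsonFile> */
    static List<String> command(String toolsJar, String schemaFile, String jsonFile) {
        return List.of("java", "-jar", toolsJar,
                       "fromjson", "--schema-file", schemaFile, jsonFile);
    }

    /** Runs the tool; fromjson writes the Avro file to stdout, so stdout is redirected. */
    static int run(String toolsJar, String schemaFile, String jsonFile, String avroFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(toolsJar, schemaFile, jsonFile));
        pb.redirectOutput(new File(avroFile));             // Avro bytes go to the target file
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // tool errors stay visible
        return pb.start().waitFor();                       // exit code 0 means success
    }
}
```

In a tSystem component the same thing would be a single command line with the output redirected to the target file.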

Could you elaborate a little more on your use case for Avro? Which architectures/platforms depend on Talend for input, output, or processing?

 

I was facing something different, though it is also related to JSON schemas/metadata: I've set up a Node.js server that holds my schemas in JSON and can output JSON, string, XML, and bytes. Depending on the job, I retrieve all the info I need, and everybody can access this information just by URL.

Node setup: jsonpath, jsonxml, express, glob

The next step is to have the Node.js server handle Avro conversion.

I only want to implement this feature when I need serialization and/or multiple schema versions.

 

 

Four Stars

Re: Avro file generation from schema file and data file

@Dijke, sorry for the delay in responding.

 

Yes, I am also using the command-line tool to do that. Here is what I have done, at a high level:

 

Created a Java class to convert my data file to JSON.

Invoked a shell script to generate the Avro file from the data (JSON file) and the schema file.
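
The first step could be sketched roughly as below. The format of the original data file is not stated in this thread, so this assumes a delimited text file with a header row and treats every value as a string; the class name `DelimitedToJson` is hypothetical. It uses only the JDK and does no quote-aware CSV parsing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical converter: header row supplies field names, all values become JSON strings.
final class DelimitedToJson {

    /** Wraps a value in double quotes and escapes characters JSON does not allow raw. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Turns header + data lines into one JSON object per data line. */
    static List<String> toJsonLines(List<String> lines, String delimiterRegex) {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) return out;
        String[] names = lines.get(0).split(delimiterRegex, -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) continue;                 // skip blank lines
            String[] values = line.split(delimiterRegex, -1);
            StringBuilder sb = new StringBuilder("{");
            for (int i = 0; i < names.length; i++) {
                if (i > 0) sb.append(',');
                String v = i < values.length ? values[i] : ""; // pad short rows
                sb.append(quote(names[i])).append(':').append(quote(v));
            }
            out.add(sb.append('}').toString());
        }
        return out;
    }
}
```

Because every value is emitted as a JSON string, this only lines up with an Avro schema whose fields are all of type `string`; numeric or nullable fields would need type-aware output. The resulting lines can be written to a file and passed, together with the schema file, to the shell script in the second step.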

 

Let me know if any clarifications needed.

Six Stars

Re: Avro file generation from schema file and data file

We've done something similar. We use the Confluent Schema Registry to keep our schemas. At the beginning of each job, a tJava component retrieves the schema from the registry, parses it, and stuffs it into a context variable. We then have a routine that takes a parsed schema and a JSON string and returns a byte array. We have a second routine that takes a parsed schema and a byte array and returns a JSON string.
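
To give an idea of the retrieval step, here is a minimal sketch that asks a Confluent Schema Registry for the latest schema version of a subject through its REST API, using only the JDK (Java 11+). The class name `SchemaRegistryFetch`, the registry URL, and the subject name are assumptions for illustration; this is not the poster's actual tJava code.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for fetching a schema at the start of a job.
final class SchemaRegistryFetch {

    /** Builds GET <registryUrl>/subjects/<subject>/versions/latest */
    static HttpRequest latestSchemaRequest(String registryUrl, String subject) {
        // Form-style encoding; adequate for typical subject names such as "orders-value".
        String encoded = URLEncoder.encode(subject, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(
                    URI.create(registryUrl + "/subjects/" + encoded + "/versions/latest"))
                .header("Accept", "application/vnd.schemaregistry.v1+json")
                .GET()
                .build();
    }

    /** Sends the request and returns the raw JSON response body. */
    static String fetchLatest(String registryUrl, String subject)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(latestSchemaRequest(registryUrl, subject),
                      HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Schema Registry returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

The response body is a JSON envelope whose `schema` field holds the Avro schema as an escaped string. The job would still need to extract that field and parse it with Avro's `Schema.Parser` before storing the result in a context variable.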