Fasta file in Talend

Dear Talend, I am having trouble reading a FASTA file in Talend.
I am still new to Talend Open Studio for Big Data.


The FASTA file is as follows:

>FAM138A ENST00000417324 1:35138-35736(-)
atgctgctgactatagagacaaagtctcactatgttgctcaggctggtcttgaactcctggcctcaagcgatcctcccac
ctcagcctcccaaagtgttgggattatagacatgagccactgcacctggccgaccttgggcaagttcttaaacccttcaa
agcctcatttttctccaatcacaaaagggaaagatggtaatattttccccaccaaattcttgtcggatgccctcacagaa
ttgagattatgtacgtaa
>ENSG00000197490 ENST00000359752 1:37397-54936(+)
atgttgctcaccttatgggcagggtctcactatgttgctgaggctggtctcaaactcctgacctcaagcaatctgtctgc
ttcagcctcccaagtagctgagaatacagggacaagccattgcacctga

 

I have tried several input components, such as tFileInputDelimited and tFileInputMSDelimited, but I don't know of a standard way to read a FASTA file in Talend.
I have also tried some processing components, such as tMap, tJavaRow and tJavaFlex, but I could not get the output I want.

 

My objective is to extract each piece of information from the FASTA file and store it in a CSV file.

Can someone help me? I have been stuck on this for more than two weeks.


The output should look like this:

FAM138A; ENST00000417324 1:35138-35736(-); atgctgctgactatagagacaaagtctcactatgttgctcaggctggtcttgaactcctggcctcaagcgatcctcccacctcagcctcccaaagtgttgggattatagacatgagccactgcacctggccgaccttgggcaagttcttaaacccttcaaagcctcatttttctccaatcacaaaagggaaagatggtaatattttccccaccaaattcttgtcggatgccctcacagaattgagattatgtacgtaa

 

FAM138A;ENST00000417324;1:35138-35736(-);atgctgctgactatagagacaaagtctcactatgttgctcaggctggtcttgaactcctggcctcaagcgatcctcccacctcagcctcccaaagtgttgggattatagacatgagccactgcacctggccgaccttgggcaagttcttaaacccttcaaagcctcatttttctccaatcacaaaagggaaagatggtaatattttccccaccaaattcttgtcggatgccctcacagaattgagattatgtacgtaa

Moderator

Re: Fasta file in Talend

Hello,

How did you set the row separator and field separator in the input component?

Based on your requirement, you can create an input schema with the row separator and field separator set to match your FASTA file, and then use a tMap component to pick the desired output columns.
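
To illustrate the idea, here is a minimal sketch in plain Java of what such a split could look like. It assumes that ">" acts as the row separator, that whitespace separates the header fields, and that ">" never appears inside a header or a sequence. The class name FastaToCsv and the method toCsvRows are made up for this example and are not part of Talend; the sample sequences are shortened for readability. Similar logic could probably be adapted into a tJavaRow or tJavaFlex component, but that part is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class FastaToCsv {

    // Hypothetical helper: turns FASTA text into "field1;field2;...;sequence" rows.
    public static List<String> toCsvRows(String fasta) {
        List<String> rows = new ArrayList<>();
        // Assumption: ">" is the row separator, so each chunk is one record.
        for (String record : fasta.split(">")) {
            if (record.trim().isEmpty()) {
                continue; // skip the empty chunk before the first ">"
            }
            // The first line of a record is the header; the rest is the sequence.
            String[] lines = record.split("\\R");
            // Assumption: header fields are separated by whitespace.
            String[] header = lines[0].trim().split("\\s+");
            // Join the wrapped sequence lines back into a single string.
            StringBuilder sequence = new StringBuilder();
            for (int i = 1; i < lines.length; i++) {
                sequence.append(lines[i].trim());
            }
            rows.add(String.join(";", header) + ";" + sequence);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Shortened sample in the same layout as the file in the question.
        String fasta = ">FAM138A ENST00000417324 1:35138-35736(-)\n"
                + "atgctgctg\n"
                + "ttgagatta\n"
                + ">ENSG00000197490 ENST00000359752 1:37397-54936(+)\n"
                + "atgttgctc\n";
        for (String row : toCsvRows(fasta)) {
            System.out.println(row);
        }
    }
}
```

Each record becomes one semicolon-separated row with the header fields first and the joined sequence last, which matches the four-column layout requested in the question.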

Best regards

Sabrina
