Fasta file in Talend

Four Stars

Dear Talend, I am having trouble reading a FASTA file in Talend.
I am still new to Talend Open Studio for Big Data.


The FASTA file is as follows:

>FAM138A ENST00000417324 1:35138-35736(-)
atgctgctgactatagagacaaagtctcactatgttgctcaggctggtcttgaactcctggcctcaagcgatcctcccac
ctcagcctcccaaagtgttgggattatagacatgagccactgcacctggccgaccttgggcaagttcttaaacccttcaa
agcctcatttttctccaatcacaaaagggaaagatggtaatattttccccaccaaattcttgtcggatgccctcacagaa
ttgagattatgtacgtaa
>ENSG00000197490 ENST00000359752 1:37397-54936(+)
atgttgctcaccttatgggcagggtctcactatgttgctgaggctggtctcaaactcctgacctcaagcaatctgtctgc
ttcagcctcccaagtagctgagaatacagggacaagccattgcacctga

I have tried several input components, such as tFileInputDelimited and tFileInputMSDelimited, but I don't know of a standard way to read a FASTA file in Talend.
I have also tried some processing components, such as tMap, tJavaRow and tJavaFlex, but I could not get the output I want.

My objective is to extract each piece of information from the FASTA file and store it in a CSV file.

Can someone help me? I have been stuck on this for more than two weeks.


The output should be as follows:

FAM138A; ENST00000417324 1:35138-35736(-); atgctgctgactatagagacaaagtctcactatgttgctcaggctggtcttgaactcctggcctcaagcgatcctcccacctcagcctcccaaagtgttgggattatagacatgagccactgcacctggccgaccttgggcaagttcttaaacccttcaaagcctcatttttctccaatcacaaaagggaaagatggtaatattttccccaccaaattcttgtcggatgccctcacagaattgagattatgtacgtaa

FAM138A;ENST00000417324;1:35138-35736(-);atgctgctgactatagagacaaagtctcactatgttgctcaggctggtcttgaactcctggcctcaagcgatcctcccacctcagcctcccaaagtgttgggattatagacatgagccactgcacctggccgaccttgggcaagttcttaaacccttcaaagcctcatttttctccaatcacaaaagggaaagatggtaatattttccccaccaaattcttgtcggatgccctcacagaattgagattatgtacgtaa

Moderator

Re: Fasta file in Talend

Hello,

How did you set the row separator and field separator in the input component?

Based on your requirement, you can create an input schema in which the row separator and field separator are set to match your FASTA file, and then use a tMap component to pick the desired output columns.
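
To make the separator idea concrete, here is a minimal plain-Java sketch of the same logic, not a Talend component configuration. It treats ">" as the row separator, splits the header line on whitespace, and joins the remaining lines into one sequence. The class name FastaToCsv and the method toCsvLines are hypothetical names chosen for illustration. The sketch assumes that ">" appears only at the start of a record and that header fields contain no spaces. Whether this is pasted into a custom routine or adapted to a tJavaFlex is left open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Talend.
class FastaToCsv {

    // Converts FASTA text into one semicolon-separated line per record.
    // Assumes ">" occurs only at the start of each record.
    static List<String> toCsvLines(String fasta) {
        List<String> rows = new ArrayList<>();
        // Treat ">" as the record (row) separator.
        for (String record : fasta.split(">")) {
            if (record.trim().isEmpty()) {
                continue; // skip the empty chunk before the first ">"
            }
            // First line is the header; the rest are sequence lines.
            String[] lines = record.trim().split("\\R");
            // Header fields are separated by whitespace.
            String[] header = lines[0].trim().split("\\s+");
            StringBuilder sequence = new StringBuilder();
            for (int i = 1; i < lines.length; i++) {
                sequence.append(lines[i].trim());
            }
            rows.add(String.join(";", header) + ";" + sequence);
        }
        return rows;
    }

    public static void main(String[] args) {
        String fasta = ">ENSG00000197490 ENST00000359752 1:37397-54936(+)\n"
                + "atgttgctcaccttatgg\n"
                + "gcagggtctcac\n";
        for (String row : toCsvLines(fasta)) {
            System.out.println(row);
        }
    }
}
```

Each returned string is one CSV row with the header fields first and the concatenated sequence last, which matches the fully separated output format requested above.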

Best regards

Sabrina
