Converting RTF to plain text using Talend 4.0.0

One Star


Hi,
I need to convert data stored in RTF format to plain text using Talend 4.0.0. The source may be either SQL Server 2005 or Oracle 10g; the target database is SQL Server 2005. The source column is of the text datatype and holds values like the following:
{\rtf1\ansi\deff0\deftab720{\fonttbl{\f0\fswiss MS Sans Serif;}{\f1\froman\fcharset2 Symbol;}{\f2\fswiss MS Sans Serif;}} {\colortbl\red0\green0\blue0;} \deflang4105\pard\plain\f2\fs17 Sections 1.22 and 17 \par The Tenant is to pay a fixed rate for CAM, initially to be $10.40 psf per annum for the calendar year 2001, to be increased each subsequent year by 5% per annum. \par \par Section 17 \par CAM costs shall include the cost of equipping, policing, protecting, lighting, cleaning, maintenance, management, operation, repair and replacement of the Common Areas. \par \par Section 49 \par Tenant has the right to object within one year, otherwise any statement, invoice or billing shall be presumed correst and Tenant must pay. \par \par }
I need to extract only the text from it, so that the output is:
Sections 1.22 and 17
The Tenant is to pay a fixed rate for CAM, initially to be $10.40 psf per annum for the calendar year 2001, to be increased each subsequent year by 5% per annum.
Section 17
CAM costs shall include the cost of equipping, policing, protecting, lighting, cleaning, maintenance, management, operation, repair and replacement of the Common Areas.
Section 49
Tenant has the right to object within one year, otherwise any statement, invoice or billing shall be presumed correst and Tenant must pay.
Is there a component I can use, or a piece of Java code that works with Talend 4.0.0? Please help!
Community Manager

Re: Converting RTF to plain text using Talend 4.0.0

Hello
Read the source records as strings, then use tExtractRegexField to extract the fields with a regex, e.g.:
tMssqlInput--row1--tExtractRegexField--tMssqlOutput
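For the sample record in the question, stripping the markup is probably easier with a few regex replacements than with capture groups. The following is only a minimal sketch under that assumption: it uses plain String.replaceAll calls, not tExtractRegexField, and the class and method names (RtfRegexStripper, stripRtf) are made up for illustration. It handles the font table, colour table, \par and simple control words seen in the sample; it does not handle escaped characters such as \'e9, \{ or \\, Unicode escapes, or other nested groups, so it is not a general RTF parser.

```java
public class RtfRegexStripper {

    // Hypothetical helper; the name is an assumption, not a Talend API.
    public static String stripRtf(String rtf) {
        if (rtf == null) {
            return null;
        }
        String s = rtf;
        // Drop the font and colour tables; their contents are not document text.
        s = s.replaceAll("\\{\\\\fonttbl(?:\\{[^{}]*\\})*\\}", "");
        s = s.replaceAll("\\{\\\\colortbl[^{}]*\\}", "");
        // Turn paragraph marks into line breaks (\pard is left for the next step).
        s = s.replaceAll("\\\\par(?![a-z]) ?", "\n");
        // Remove the remaining control words, e.g. \rtf1, \pard, \fs17.
        s = s.replaceAll("\\\\[a-z]+-?\\d* ?", "");
        // Remove group braces.
        s = s.replaceAll("[{}]", "");
        // Trim trailing spaces on each line and collapse blank lines.
        s = s.replaceAll("[ \\t]*\\n\\s*", "\n");
        return s.trim();
    }

    public static void main(String[] args) {
        // Shortened version of the record posted in the question.
        String rtf = "{\\rtf1\\ansi\\deff0\\deftab720{\\fonttbl{\\f0\\fswiss MS Sans Serif;}}"
                + " {\\colortbl\\red0\\green0\\blue0;} \\deflang4105\\pard\\plain\\f0\\fs17"
                + " Sections 1.22 and 17 \\par Section 17 \\par \\par Section 49 \\par }";
        System.out.println(stripRtf(rtf));
    }
}
```

In a job this could sit in a user routine and be called from a tJavaRow or tMap expression between the input and output components, for example output_row.notes = RtfRegexStripper.stripRtf(input_row.notes); where the column name notes is a placeholder.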
Best regards
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: Converting RTF to plain text using Talend 4.0.0

Hi, I am stuck on the same problem. Can you please explain how to use the tExtractRegexField component? What should the regex pattern string be?
One Star

Re: Converting RTF to plain text using Talend 4.0.0

Hi, please update: what should the regex string be? I tried a few strings from the web, but they are not foolproof and fail for some input RTF strings.
(reference: https://tinyurl.com/rtf-txt)
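Since hand-written regexes tend to break on some inputs, one alternative worth trying is to let an RTF parser do the work. The sketch below uses javax.swing.text.rtf.RTFEditorKit, which ships with the JDK, to load the RTF into a document and read back its text. The class and method names (RtfToText, toPlainText) are made up for illustration, and the check for a leading {\rtf is an assumption that non-RTF values should pass through unchanged. Whether the JDK's parser copes with every RTF variant in your data would need to be checked against real records.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfToText {

    // Hypothetical helper; the name is an assumption, not a Talend API.
    public static String toPlainText(String rtf) {
        if (rtf == null) {
            return null;
        }
        // Assumption: values that do not look like RTF are returned unchanged.
        if (!rtf.trim().startsWith("{\\rtf")) {
            return rtf;
        }
        try {
            RTFEditorKit kit = new RTFEditorKit();
            Document doc = kit.createDefaultDocument();
            // Parse the RTF markup into the styled document.
            kit.read(new StringReader(rtf), doc, 0);
            // Read back only the text content, without formatting.
            return doc.getText(0, doc.getLength()).trim();
        } catch (IOException e) {
            throw new IllegalArgumentException("Could not read RTF value", e);
        } catch (BadLocationException e) {
            throw new IllegalArgumentException("Could not read RTF value", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the record posted in the question.
        String rtf = "{\\rtf1\\ansi\\deff0\\deftab720{\\fonttbl{\\f0\\fswiss MS Sans Serif;}}"
                + " {\\colortbl\\red0\\green0\\blue0;} \\deflang4105\\pard\\plain\\f0\\fs17"
                + " Sections 1.22 and 17 \\par Section 17 \\par \\par Section 49 \\par }";
        System.out.println(toPlainText(rtf));
    }
}
```

As with any custom Java in a job, this could be saved as a user routine and called from a tJavaRow or tMap expression. The parser keeps paragraph breaks as line breaks, so blank lines in the source may remain in the output and might need an extra clean-up step.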
Community Manager

Re: Converting RTF to plain text using Talend 4.0.0

Hi Shree
First of all, I suggest you learn the basic usage of the tExtractRegexField component. If you have trouble extracting the value from the input string with a regex, please give us an example of your data and your expected output.
Shong
One Star

Re: Converting RTF to plain text using Talend 4.0.0

Hi all,
I am also looking for a regex to convert an RTF string to a plain-text string with tExtractRegexField. Did you find it?
Thank you,