Problem with tExtractRegexFields component

Hi,
I use TOS 5.0.1.
I added a tExtractRegexFields component to a job, and for one input line it produces several output rows instead of one row with several fields.
My regex: "\"(+?)\"\\s?|(+)\\s?|\\s"
Data to split (3 lines):
08/01/2012:23:59:59 +0100 192.168.90.17 "" "GET /templates/toto&typeUnivers=R HTTP/1.1" 200 344 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB7.2; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.30729; .NET CLR 3.0.30618; .NET4.0C)]" "http://www.xxx.com/MAG000036.swf" external www.xxx.com "192.168.57.107" 27 "234B2789A03896013D3A12A0652D0DF9" PRODUCTION_EXTERNAL_ORI 192.168.2.230:8080 "." "vide"
09/01/2012:00:00:00 +0100 192.168.250.202 "" "GET /templates/onglets-home-univers.png HTTP/1.1" 200 12102 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; fr-fr) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4]" "https://www.xxx.com/creationCompteClient" external secure.xxx.com "192.168.250.202" 9 "750BCE11C09241CDADC7C6893E0CD5E9" PRODUCTION_EXTERNAL_SECURE 192.168.242.230:8080 "." "vide"
09/01/2012:00:00:00 +0100 192.168.235.141 "" "GET /repository/Parapharmacie_4.png/image_w397_h320 HTTP/1.1" 200 133367 "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.2; Trident/4.0; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707)]" "http://www.xxx.com/infospratiques" external www.xxx.com "192.168.17.80" 127 "96C3D22C6B031DD66F09D16A8AA529F1" PRODUCTION_EXTERNAL_ORI 192.168.242.230:8080 "." "vide"
The regex splits each line into 19 fields, but I get 19 output rows per input line instead of 19 fields in one row.
Regards,
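
A minimal Java sketch of why this can happen, under the assumption (not confirmed in this thread) that tExtractRegexFields emits one output row per regex match and maps capture groups to schema columns. A pattern that describes a single token matches many times on one line, while a pattern that describes the whole line matches once and exposes one group per field. The class name, the `TOKEN` pattern and the shortened sample line are illustrative only; they are not the regex from the post.

```java
import java.util.Collections;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVersusGroups {
    // One token: a double-quoted string (quotes kept) or a run of non-whitespace characters.
    static final String TOKEN = "(\"[^\"]*\"|\\S+)";

    // Counts the separate matches found when the pattern describes a single token.
    static int countMatches(String line) {
        Matcher m = Pattern.compile(TOKEN).matcher(line);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Builds one pattern for the whole line, with one capture group per expected field.
    static Pattern wholeLine(int fieldCount) {
        return Pattern.compile(String.join("\\s+", Collections.nCopies(fieldCount, TOKEN)));
    }

    public static void main(String[] args) {
        // Shortened excerpt of the first sample line (7 tokens).
        String line = "08/01/2012:23:59:59 +0100 192.168.90.17 \"\" "
                + "\"GET /templates/toto&typeUnivers=R HTTP/1.1\" 200 344";

        System.out.println("matches of the single-token pattern: " + countMatches(line));

        Matcher m = wholeLine(7).matcher(line);
        if (m.matches()) {
            System.out.println("groups in the single whole-line match: " + m.groupCount());
        }
    }
}
```

If the assumption holds, a regex built from alternatives such as "quoted token OR unquoted token OR whitespace" would give one row per token, whereas a regex with one group per column would give one row per line.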

Re: Problem with tExtractRegexFields component

Hi
I have reproduced your job and I get 27 rows, as shown in the following image.
Please give me more details about what you mean by 'split in 19 fields'.
Regards,
Pedro

Re: Problem with tExtractRegexFields component

Hi,
Thanks for your answer.
My regex splits each line into 19 fields, e.g.:
line = 08/01/2012:23:59:59 +0100 192.168.90.17 "" "GET /templates/toto&typeUnivers=R HTTP/1.1" 200 344 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB7.2; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.30729; .NET CLR 3.0.30618; .NET4.0C)]" "http://www.xxx.com/MAG000036.swf" external www.xxx.com "192.168.57.107" 27 "234B2789A03896013D3A12A0652D0DF9" PRODUCTION_EXTERNAL_ORI 192.168.2.230:8080 "." "vide"
After the split, I should have:
field1=
field2=08/01/2012:23:59:59
field3=+0100
field4=""
field5="GET /templates/toto&typeUnivers=R HTTP/1.1"
...
field19="vide"
Each line is in a CSV-like format, with whitespace as the delimiter and strings optionally enclosed in double quotes (").
Regards,
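
The format described above (whitespace-delimited, with optional double-quote enclosure) can be sketched in plain Java as follows. This is only an illustration of the splitting rule, not the Talend component's implementation; the class name `LogLineTokenizer`, the `tokenize` helper and the pattern are hypothetical, and the sketch assumes quoted strings never contain an escaped double quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineTokenizer {
    // Group 1: content of a double-quoted string (quotes dropped).
    // Group 2: a run of non-whitespace characters (unquoted field).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    // Splits one line into fields, treating whitespace inside quotes as part of the field.
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            // An empty quoted string "" yields an empty field rather than being skipped.
            fields.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened excerpt of the first sample line.
        String line = "08/01/2012:23:59:59 +0100 192.168.90.17 \"\" "
                + "\"GET /templates/toto&typeUnivers=R HTTP/1.1\" 200 344";
        List<String> fields = tokenize(line);
        for (int i = 0; i < fields.size(); i++) {
            System.out.println("field" + (i + 1) + "=" + fields.get(i));
        }
    }
}
```

Inside a Talend job, logic like this would typically live in a routine or a tJavaRow rather than in tExtractRegexFields; that placement is a suggestion, not something confirmed in this thread.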
