[resolved] tFilter - test for numeric from tFileInputExcel

One Star

[resolved] tFilter - test for numeric from tFileInputExcel

I would like to filter rows comprised of digits in a column. Screenshots are attached.
The job looks like this:
tFileInputExcel -- tFilterRow -- 2 tLogRows (one MAIN and one REJECTS)
The sample excel input file follows. There are headings between each line of data. (LOAN_ID is defined as a STRING in schemas throughout the job)
LOAN_ID
32498572983
LOAN_ID
2934703202
I'm attempting to duplicate the instructions found here: http://www.talendforge.org/forum/viewtopic.php?id=5158. In this post, there are 2 approaches discussed. One uses CONDITIONS and one uses ADVANCED.
In the first approach (as per the POST), I set up CONDITIONS as:
input column : Loan_ID
function: MATCH
operator: EQUALS
value: "\\d" (as TRIM is enabled in the tFileInputExcel component, I think this could be "^\\d$", but that doesn't work either)
There are no rows accepted to MAIN -- all are rejected, but there are no errors.
In the second approach, I removed the CONDITIONS column/regex and enabled ADVANCED. I tried 2 different statements. The first (from the POST -- screenshot UsingADVANCED1)
input_row.Loan_ID.matches("\\d")
No rows are accepted -- all are rejected and I get a Null pointer exception.
connected
Exception in component tFilterRow_8
java.lang.NullPointerException
at bank_loan.testfilter_0_1.testFilter.tFileInputExcel_3Process(testFilter.java:2622)
at bank_loan.testfilter_0_1.testFilter.runJobInTOS(testFilter.java:5772)
at bank_loan.testfilter_0_1.testFilter.main(testFilter.java:5624)
3000076855||||WOOLLEY||||||||||||||||||||
Loan ID||||Name||||||||||||||||||||
3000076141||||EWASKO||||||||||||||||||||
disconnected
With this statement (screenshot UsingADVANCED2):
java.util.regex.Pattern.matches("^\\d $", input_row.Loan_ID)
I get 2 rows, but also get the Null pointer exception..
connected
Exception in component tFilterRow_8
java.lang.NullPointerException
at java.util.regex.Matcher.getTextLength(Unknown Source)
at java.util.regex.Matcher.reset(Unknown Source)
at java.util.regex.Matcher.<init>(Unknown Source)
at java.util.regex.Pattern.matcher(Unknown Source)
at java.util.regex.Pattern.matches(Unknown Source)
at bank_loan.testfilter_0_1.testFilter.tFileInputExcel_3Process(testFilter.java:2620)
at bank_loan.testfilter_0_1.testFilter.runJobInTOS(testFilter.java:5771)
at bank_loan.testfilter_0_1.testFilter.main(testFilter.java:5623)
Loan ID||||Name||||||||||||||||||||
disconnected
1. I'm missing something from the POST.. what might that be?
2. How can I learn from these null pointer exception messages? Do I then need to step thru the Java code (how?) to find the offending statement or statement that was offended.?
3. What syntax is best for using regexes in TIS?
Thanks
Mark
One Star

Re: [resolved] tFilter - test for numeric from tFileInputExcel

I was able to filter the rows using tJavaRow by changing the datatype of LOAN_ID to long and applying this:
if (input_row.Loan_ID > 0) {
output_row.Loan_ID = input_row.Loan_ID;
} else {
continue;
}

There are 2 issues with this approach:
1) Most of the posts related to using tJavaRow as a filter indicate not to do this.. not much explanation why..
2) It doesn't seem to be possible to have a rejects connection off of tJavaRow.. which is desirable. Again, not sure why.
Really like to understand how to use tFilterRow, so I'm back concentrating on that..
Mark
One Star

Re: [resolved] tFilter - test for numeric from tFileInputExcel

OK, partially resolved.
Using CONDITIONS section of tFilterRow -- my regex really didn't match. Allowing for one or more (+) solved this issue.
Using AVANCED section with either one of these statements: (single, not at the same time)
STMT1: java.util.regex.Pattern.matches("^\\d+$", input_row.Loan_ID)
STMT2: input_row.Loan_ID.matches("^\\d+$")
..still receiving null pointer exception.
Any help in learning now to decipher the error message would be greatly appreciated...
STMT1 ERROR
Exception in component tFilterRow_8
java.lang.NullPointerException
at java.util.regex.Matcher.getTextLength(Unknown Source)
at java.util.regex.Matcher.reset(Unknown Source)
at java.util.regex.Matcher.<init>(Unknown Source)
at java.util.regex.Pattern.matcher(Unknown Source)
at java.util.regex.Pattern.matches(Unknown Source)
at bank_loan.testfilter_0_1.testFilter.tFileInputExcel_3Process(testFilter.java:2860)
at bank_loan.testfilter_0_1.testFilter.runJobInTOS(testFilter.java:6011)
at bank_loan.testfilter_0_1.testFilter.main(testFilter.java:5863)
STMT2 ERROR
Exception in component tFilterRow_8
java.lang.NullPointerException
at bank_loan.testfilter_0_1.testFilter.tFileInputExcel_3Process(testFilter.java:2862)
at bank_loan.testfilter_0_1.testFilter.runJobInTOS(testFilter.java:6012)
at bank_loan.testfilter_0_1.testFilter.main(testFilter.java:5864)
Seven Stars

Re: [resolved] tFilter - test for numeric from tFileInputExcel

It's likely that you are reading an empty cell from the Excel worksheet. An NPE will be thrown if you apply the .matches method to a null value. The best way to verify where exactly the error is occurring is by switching to the generated code (click the Code tab at the bottom left of the job design) and go to the first line of your job referenced in the stack-trace (2860 when you use STMT1 and 2862 when you use STMT2).
If it's appropriate you can use the tFileInputExcel advanced setting "Stop to read on empty row". However, if you need to allow for nulls, the easiest way is to go back to using simple mode of tFilterRow:
input column : Loan_ID
function: MATCH
operator: EQUALS
value: "\\d+"
In advanced mode you would use: input_row.Loan_ID!=null && input_row.Loan_ID.matches("\\d+")
(Note that the ^ and $ boundary markers are not necessary.)
One Star

Re: [resolved] tFilter - test for numeric from tFileInputExcel

Hi Alevy,
Really appreciate the response. Couple of follow-ups:
1) Do I need to "close" this post.. if so, how do I do that?
2) the regex boundary markers are not necessary... is that because there's an auto trim taking place or?
3) which section of the forum (or elsewhere.. ) would be appropriate to advertise for a TIS tutor for hire..? I need a resource that I can ping daily/weekly with a wide range of questions (TIS/TAC/ETL/Best practices..)
Thanks again,
Mark
Seven Stars

Re: [resolved] tFilter - test for numeric from tFileInputExcel

Glad to help.
1) At the bottom right of your initial post in this topic is a "Set this topic as resolved" link.
2) The value to which .matches is applied must match the value within the brackets in its entirety. The start/end of string anchors are effectively redundant. They have more relevance when you use regex find-and-replace i.e. the .replaceAll method.
3) Non technical discussions seems to be the only appropriate place but I suggest you just keep posting questions on the forum. Usually you'll get the help you need if you explain your problem thoroughly Smiley Wink.
One Star

Re: [resolved] tFilter - test for numeric from tFileInputExcel

Hi ,
I am trying to filter rows with no data in certain columns. My input files are csv files which i am mapping to output csv.
But my input file has two columns with missing data. i have been trying to filter those ones without success. When i try to use tfilter it is doing nothing and when i use also "" to identify null values it just puts "" around data in the output file. can someone please help me in filtering.
Seven Stars

Re: [resolved] tFilter - test for numeric from tFileInputExcel

Please start a new topic for a new question.