1. Pass the input row to tJavaRow with the following code to remove all non-space non-alpha characters and convert to lower-case:
output_row.ColumnName = input_row.ColumnName.replaceAll("[^a-zA-Z\\s]","").toLowerCase();
2. Then use tNormalize to convert it to one row for each word.
3. Then use tAggregateRow to group by and count the words.
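For anyone who wants to check the logic outside a job first, the three steps above can be sketched in plain Java. This is only an illustration: the class and method names (WordCountSketch, clean, countWords) are made up for the example and are not part of Talend, and the regex assumes "alpha" means ASCII letters only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone sketch of the tJavaRow -> tNormalize -> tAggregateRow idea.
public class WordCountSketch {

    // Step 1: remove every character that is not an ASCII letter or whitespace,
    // then convert to lower case (assumption: non-ASCII letters may be dropped too).
    static String clean(String text) {
        return text.replaceAll("[^a-zA-Z\\s]", "").toLowerCase();
    }

    // Steps 2 and 3: split the cleaned text into one entry per word,
    // then group by word and count occurrences.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : clean(text).trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty input yields a single empty token
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {the=3, cat=1, dog=1, birds=1}
        System.out.println(countWords("The cat, the dog & the 2 birds!"));
    }
}
```

The LinkedHashMap just keeps words in the order they first appear; any Map would do if order does not matter.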
tAggregateRow doesn't like counting strings
Would you have any tips or ideas for an equally fantastic solution to filter out words I've compiled into a DB table?
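One way to think about the filtering is as a set lookup: load the unwanted words once, then drop every word found in that set. The plain-Java sketch below only shows that idea. The names (StopWordFilterSketch, removeStopWords) are hypothetical, and the set is hard-coded here, whereas in a real job it would presumably be read from the DB table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: filter a word list against a set of excluded words.
public class StopWordFilterSketch {

    // Keeps only the words that are not in the exclusion set.
    // Assumption: the exclusion set is stored in lower case.
    static List<String> removeStopWords(List<String> words, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (!stopWords.contains(word.toLowerCase())) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for the words compiled into the DB table.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "and", "a"));
        List<String> words = Arrays.asList("the", "cat", "and", "the", "dog");
        // prints [cat, dog]
        System.out.println(removeStopWords(words, stopWords));
    }
}
```

Running the filter before the count keeps the excluded words out of the aggregated result altogether.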