1. Pass the input row to tJavaRow with the following code, which removes every character that is not a letter or a space and converts the result to lower case:
output_row.ColumnName = input_row.ColumnName.replaceAll("[^a-zA-Z ]", "").toLowerCase();
2. Then use tNormalize to convert it to one row for each word.
3. Then use tAggregateRow to group by and count the words.
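
For anyone who wants to check the logic outside Talend first, here is a rough plain-Java sketch of the same three steps. The class and method names (`WordCountSketch`, `clean`, `countWords`) are made up for illustration and are not Talend components, and splitting on whitespace is an assumption about how tNormalize is configured with a space separator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, only meant to mirror the three job steps.
class WordCountSketch {

    // Step 1 (tJavaRow): drop everything except letters and whitespace, then lower-case.
    static String clean(String text) {
        return text.replaceAll("[^a-zA-Z\\s]", "").toLowerCase();
    }

    // Step 2 (tNormalize): split into one entry per word.
    // Step 3 (tAggregateRow): group by word and count occurrences.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : clean(text).trim().split("\\s+")) {
            if (!word.isEmpty()) {               // skip the empty token produced by blank input
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {the=2, cat=1, hat=1, cats=1}
        System.out.println(countWords("The cat, the hat & 2 cats!"));
    }
}
```

The map keeps words in the order they first appear, which makes the output easy to compare against the rows coming out of the job.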
tAggregateRow doesn't like counting strings.
Would you have any tips or ideas for an equally fantastic solution to filter out words that I've compiled into a DB table?