how to filter out columns with certain strings in the column name?

One Star

how to filter out columns with certain strings in the column name?

I've set up a File delimited Metadata, and I want to be able to automatically filter out any columns that have certain strings in the name, specifically "schedule" or "hist." And I cannot, for the life of me, figure out how to do this.
Its likely that the columns in this metadata will change, and I don't want to have to manually go in and remove these columns any time the metadata is updated.
I tried to use tFilterColumns to do this (since tFilterRow works for row filtering) but it just seems like a simpler version of tMap to me.

One Star

Re:how to filter out columns with certain strings in the column name?

Hello,
Could you please be more precise
Maybe I did not understand your problem well but... try this :
You can use a tFilterRow with advanced condition.
The condition would be :

MyField.indexOf("MyString") != -1


If this condition is true, your string contains searched text
Please ask if you want a more detailed solution

One Star

Re:how to filter out columns with certain strings in the column name?

ok, the delimited file I'm importing has a bunch of columns in it, some of these columns contain a lot of schedule or historical data and we don't want to import those columns daily. so I wanted to know if there was a way to filter out columns, by setting a filter on their names, like you can filter out rows with tFilterRow (pic #3)
Ok, so you can see from the attached picture (pic #1) that the file has a lot of columns (thats only a couple of the several dozen columns). Some of these columns have column names that include the words schedule or hist (like the two circled in red). I want to be able to set a filter that says to not import those columns, with those words in the column name, into the database
And I want to be able to set up a filter rule to do this, like you can set up rules to filter out rows in tFilterRow, but for columns

One Star

Re:how to filter out columns with certain strings in the column name?

You can use a tFilterRow with advanced condition.
The condition would be :
MyField.indexOf("MyString") != -1

If this condition is true, your string contains searched text


I don't think thats quite going to work... because its not the rows I want to filter out but some of the actual columns

One Star

Re:how to filter out columns with certain strings in the column name?

Have you tried the tFilterColumns ?
It matches and input flow against a destination and only copies over columns having the same name.

One Star

Re: how to filter out columns with certain strings in the column name?

Yea I looked at tFilterColumns, but thats still manual. And this file will be changing.

Basically, to explain the file, we request a list of columns/information from Bloomberg nightly.
This list of columns is subject to change based on any requirements changes of other departments in our company, if they need different/more information. Because these column changes come from outside our group, we cannot predict how often we will be asked to change the column list.
Also, the program that sends out the requests to Bloomberg wouldn't just append the new column to the end of the list of columns already being requested. The new column will go into the list, of requested columns, based on whatever sorting mechanism that program uses.

So, not only will the columns be subject to change but their order will be as well - which would make it very inconvenient to have to manually filter out the columns we don't want every time the file is changed.
And, to compound this problem, we have 10 request files (each with different lists of columns) we send/get to/from Bloomberg every night. So, if I was only going to have to manually remove these columns for one file, that wouldn't be fun - but having to do it for 10 files is just not productive.

One Star

Re: how to filter out columns with certain strings in the column name?

From the job flow you posted above - your tFilterRow is conducting a basic validity check (CUSIP?).
Then that output will either go to rejects or the good DB.
Will the schema of the destination table (tMSSqlOutput2) always be the same?
Or will it vary based upon the incoming text file?

One Star

Re: how to filter out columns with certain strings in the column name?

based upon the incoming text file

One Star

Re: how to filter out columns with certain strings in the column name?

Sorry if I'm being dense here - but Talend cannot handle dynamic / run-time schema changes in the Database Input/Output components.
So the only way this job you posted MIGHT work would be if the destination table in your tMSSqlOutput2 always had the exact same schema.
In other words, if you have x input files (let's say 4) with a common set of columns among them and that common set of columns would be output to ONE MSSQL table. Something like...
INCOMING FILE 1
============
CUSIP | HIGH | LOW | CLOSE | VOL | HIST_PRICE | HIST_VOL
INCOMING FILE 2
============
CUSIP | TICKER | HIGH | LOW | CLOSE | VOL |
INCOMING FILE 3
============
CUSIP | TICKER | HIGH | LOW | CLOSE | VOL | SCHEDULE_G
INCOMING FILE 4
===========
CUSIP | TICKER | HIGH | LOW | CLOSE | VOL | COMPANY_NAME | SCHEDULE_G | HIST_VOL

OUTPUT DATABASE TABLE
=================
CUSIP | HIGH | LOW | CLOSE | VOL |

In such a case, you could define a tFilterColumns with a schema matching that of the Output Database Table and only those columns having matching names would be imported into the Database table.

One Star

Re:how to filter out columns with certain strings in the column name?

Talend cannot handle dynamic / run-time schema changes in the Database Input/Output components.


I guess that answers my ?

So the only way this job you posted MIGHT work would be if the destination table in your tMSSqlOutput2 always had the exact same schema.
In other words, if you have x input files (let's say 4) with a common set of columns among them and that common set of columns would be output to ONE MSSQL table. Something like...
INCOMING FILE 1
============
CUSIP | HIGH | LOW | CLOSE | VOL | HIST_PRICE | HIST_VOL
INCOMING FILE 2
============
CUSIP | TICKER | HIGH | LOW | CLOSE | VOL |
INCOMING FILE 3
============
CUSIP | TICKER | HIGH | LOW | CLOSE | VOL | SCHEDULE_G
INCOMING FILE 4
===========
CUSIP | TICKER | HIGH | LOW | CLOSE | VOL | COMPANY_NAME | SCHEDULE_G | HIST_VOL

OUTPUT DATABASE TABLE
=================
CUSIP | HIGH | LOW | CLOSE | VOL |

In such a case, you could define a tFilterColumns with a schema matching that of the Output Database Table and only those columns having matching names would be imported into the Database table.


I guess that answers my ?
No I want those 10 different files to go to 10 different tables, and I want each of those tables to be in the same format as the file it is based off of (just without any columns that have HIST or SCHEDULE in the name, without having to manually remove those columns)

One Star

Re: how to filter out columns with certain strings in the column name?

I surely don't know of any way to accomplish your task in the present environment. But, then again, I'm no "guru" when it comes to Talend. I've only been using it for a couple months. Perhaps there is a way - but it is beyond my means at present. Maybe one of the folks from Talend could offer some sort of solution. Sorry.

One Star

Re:] how to filter out columns with certain strings in the column name?

I am currently working on an ETL solution using Talend
The aim of this solution is to load a custom csv file (with randomed columns) into a Talend job with SQL database output (always the same table).
SMaz is right, Talend cannot load dynamic metadata while processing the job.
To go round this problem, I found this solution :
- First, you have to make sure that Talend is processing a file with a fixed metadata schema.
In your case, it could be all possible columns
- A custom application (web UI with ASP .NET in my case) load the input file and allow the user to manually map the input fields with the fixed output fields. At the end of this operation, a csv file is generated. This file contains always the same metadata schema with mapped data. This file is goeing to be treated by Talend job
In your case, this custom app could analyse the input file headers and map the input columns functions of your input columns name without any user intervention
Maybe it doesn't fit to you, but with this solution I can allow a user to load a document with randow columns into a efficient Talend job doeing many things.
I don't know if there is a more simple way to do this...

One Star

Re:how to filter out columns with certain strings in the column name?

thats a possibility, but the # of possible columns is kind of enormous, 11648 at this point in time, and is subject to change at the data provider's whim
Smiley Sad

One Star

Re:how to filter out columns with certain strings in the column name?

Smiley Surprised
If you can get a database containing all possible columns list, you may write a program to generate the file and to enhance it when database is growing...
But it'll be a quite complex solution... hope for you that TOS's team will find something to help better ^^

One Star

Re:how to filter out columns with certain strings in the column name?

I have a similar question. Is there any way to obtain a list or other enumeration of the names of the columns that are being provided in the result set?
For example, take the simple tRowGenerator component. I use it to generate 30 rows with a simple schema of 5 column. The rows coming from it are called row_1 and are piped into a tJavaFlex component that reads them. The tJavaFlex component uses hard-code references such as row_1.columnA and row_1.columnB. What I need is run-time access to the schema of row_1. Using a runtime argument, I need to determine whether or not to send the column values to the log.
Any suggestions?

Community Manager

Re:how to filter out columns with certain strings in the column name?

Hi mquinz
This topic is rather old, can you please report a new topic for your question and give us an example to explain your request.
Thank you!
Shong

----------------------------------------------------------
Talend | Data Agility for Modern Business