Six Stars

know the component

Hi All,


I have a question,

I need to extract data from address table and dump into another similar table wherein address_1 field value has to be split into street_number, house_number and box_number whereas other column values in the table needs to be dumped same.

can you give please share the best way to do it.


  • Data Integration
Six Stars

Re: know the component

The solution depends on the source data. 

tExtractDelimitedFields could do this, but if your data is not delimited (by a space for example) you have to look at the tMap and make some complex mapping to match your needs.


Please read the tExtractDelimitedFields documentation and make some tests as the component is a bit complex to handle.

Six Stars

Re: know the component

Yes the data is not consistent in the column
for example
row1 column data - "Lindenhofstraat 2"
row2 column data - "RUE ADOLPHE RECTEM, 42/01"
row3 column data - "Adolf Saxplein 3-bus 23"
Ten Stars

Re: know the component

This is a BIG problem in data integration and if you can find an answer that suits every scenario, you are a better person that I. However, there are some things to think about that will help you approach this....



First of all (and arguably the biggest thing to consider) do you have a fixed data structure that you are working with or is a human responsible for deciding the structure when it is entered into the system you are working with? Will the house number always precede the street name, for example? Consider the address 1 Liverpool Road, Islington, London, N1 1NL. This could be represented as shown or maybe like this....
1 Liverpool Road, Islington, N1 1NL .....or
1 Liverpool Road, London, N1 1NL .....or

1 Liverpool Road, Islington, Greater London, N1 1NL ......or

1 Liverpool Rd,.......
......see where I am heading? Although to a human eye this is pretty simple to understand, when you try and apply a fixed rule for splitting this up it can potentially produce different results where different address categories get put into the same field. I could have given far more complicated examples if I had used flats, commercial premises, places in other countries, etc.

Above I have assumed some logical way of separating the address components. However (as @saukema suggested) this may not be the case. How do you write a rule to know when a Street name ends and a Town name begins? How do you know whether a Town is included?


Of course, your data may be strictly structured and therefore will be easy to split (I've rarely seen this in address data that is delivered in one String though), but if it is not can I suggest using something like the Google's Geocoding API ( It isn't terribly complicated to use and will give you a trusted structure you can make use of. It will also return the cleaned address in an XML structure, making it very easy for you to retrieve different sections. There are still issues with this approach since sometimes addresses are so varied by location, that the structure that is returned does not elegantly match your database schema (in fact you will find this is a big issue unless some serious thought is put into your db design for this.....if you get to design it).


Having reread this I notice it doesn't necessarily answer the question asked, but it is certainly something that will become a consideration and might actually help solve your original question while taking into account further complications.

Rilhia Solutions
Six Stars

Re: know the component

Six Stars

Re: know the component

Understod and agreed, but as per the source data i dont think the format is specific and mutually agreed upon while storing the data, better to keep the data in one column as you rightly pointed out that until the format of the stored data is known/specified its very difficult to solve.
Thanks much