One Star

Loading Excel dates into Oracle

I needed to load Excel data into my Oracle table and had some date fields in the source and target data. I wanted to be able to load the dates correctly into Oracle. After some struggle I was able to do so and now want to share some tips and tricks with the community. I use perl but the approach should be similar in Java. I also don't think it's the best method - please let me know if you know how to do it better!
There were 2 main problems:
1) Dates come from excel as some weird numbers (it's aclually number of days since Jan 1, 1900) -
I found a couple of pages that explain how dates are stored in Excel:
http://www.cpearson.com/excel/datetime.htm
I also found a ready-to-use perl sub to convert excel dates into the human-readable format here: http://www.goldfisch.at/knowledge/355 , and used it with the little change in my tMap component (InputDate is the variable that stores the date in Excel format and comes from the Excel spreadsheet). Here is the code that you can copy and paste to your tMap component's output Expression field, it will convert Excel data to 'YYYY-MM-DD' format:
sub {
sub excel2date
{
use DateTime;
my $dd=shift;
my $d = DateTime->new(year=>1900,month=>1,day=>1);
$d->add(days=>$dd-2);
return sprintf("%04u",$d->year).'-'.sprintf("%02u",$d->month).'-'.sprintf("%02u",$d->day);
}
return excel2date($row1);
}->()

at the output I got the date in 'YYYY-MM-DD' format
Now to the second problem:
2) My Oracle uses the different format of the date stamp ('DD-MON-YY'), for example to load my data I need to convert my dates to something like: '01-JAN-01' - and I don't like this nonsense.
I also found a good page that explains how to change date format for your Oracle session here: http://www.dba-oracle.com/t_nls_date_format_sysdate.htm
Unfortunately, I could not find a way in Talend to alter my session (maybe somebody can help me with this?). So I used the following trick:
1) Created a tOracleConnection
2) Opened the perl source code and found a line: our $dbh_tOracleConnection_1 = DBI->connect ... As you can see Talend uses the variable $dbh_tOracleConnection_1 for my connection. So I thought if Talend can use it - I also can Smiley Happy
3) I added the tPerl component with the following line to it:
$dbh_tOracleConnection_1->do("ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'");

4) Connected tOracleConnection_1 and tPerl_1 with OnSubjobOk trigger and connected the tPerl_1 with the next component the same way.
5) Now I can use tOracleOutput component to load data into Oracle table in 'YYYY-MM-DD' format. Don't forget to switch to "use an existing connection" in tOracleOutput and choose the tOracleConnection_1 from the list.
That's it - simple but not very straight-forward. So please let me know if there is a better way to do it.
Tags (1)
1 REPLY
Employee

Re: Loading Excel dates into Oracle

That's it - simple but not very straight-forward. So please let me know if there is a better way to do it.

Thanks for the time you have taken to show your method, we appreciate this kind of contribution :-)
Here comes my "better way to do it" :
1) concerning the conversion of Excel raw dates, I advise to use the Perl module DateTime::Format::Excel. Load it in a tLibraryLoad ("Custom Code" family) and in a tMap, apply the following Perl expression:
DateTime::Format::Excel->parse_datetime($row1)->ymd('-')

Note : tFileInputExcel will give you the raw date only when "with format" property is unchecked (Advanced Settings), else it will output strings like "8-25-08" (for 2008-25-08). By default, "with format" is checked.
2) When writing to Oracle, no need to play with NLS_DATE_FORMAT variable (it is really useful only when reading). You should use instead the "Additional columns" properties (Advanced settings) and use the SQL expression:
"to_date(?, 'YYYY-MM-DD')"

Related topics :
- 1959
- 516