One Star

tFileInputExcel ignoring values after 000

Hello all,
I have isolated this case to tFileInputExcel component that just refuses to read values after "000".
For example: the following value reads fine: 9.001
but the following 9.0001 reads in as 9
I have tried quiet a few things to get this to work without any luck.
I'm attaching the screenshots and a link to Talend job and sample excel file.
Feel free to download entire thing from here:
http://db.tt/p7LzArAu
I'm on Ubuntu 10.10 64 Bit.
I'm using TOS-All-r63143-V4.2.2
14 REPLIES
One Star

Re: tFileInputExcel ignoring values after 000

I have to assume this is a problem with your Open Office. Real Microsoft Excel generated from Windows does work. I will attach the file but below you can see the results coming from my Windows VM:

tarting job test at 14:36 19/11/2011.
connecting to socket on port 3633
connected
Debug: format string is 0.0000
.-------------------.
| #1. tLogRow_1 |
+----------+--------+
| key | value |
+----------+--------+
| Quantity | 1 |
| Price | 1.0735 |
+----------+--------+
.-------------------.
| #2. tLogRow_1 |
+----------+--------+
| key | value |
+----------+--------+
| Quantity | 2 |
| Price | 9.0001 |
+----------+--------+
.------------------.
| #3. tLogRow_1 |
+----------+-------+
| key | value |
+----------+-------+
| Quantity | 3 |
| Price | 9.001 |
+----------+-------+
.-------------------.
| #4. tLogRow_1 |
+----------+--------+
| key | value |
+----------+--------+
| Quantity | 4 |
| Price | 1.0001 |
+----------+--------+
disconnected
Job test ended at 14:36 19/11/2011.

Best,
-Nestor
One Star

Re: tFileInputExcel ignoring values after 000

See the file that works for me here: http://db.tt/3vTiWBrt
One Star

Re: tFileInputExcel ignoring values after 000

And just to eliminate any doubts below is the result of running the Talend exported job in OSX (which is BSD alike so it should worn in Ubuntu as well)

nestor-krfs:test nestor$ ./test_run.sh
Debug: format string is 0.0000
.-------------------.
| #1. tLogRow_1 |
+----------+--------+
| key | value |
+----------+--------+
| Quantity | 1 |
| Price | 1.0735 |
+----------+--------+
.-------------------.
| #2. tLogRow_1 |
+----------+--------+
| key | value |
+----------+--------+
| Quantity | 2 |
| Price | 9.0001 |
+----------+--------+
.------------------.
| #3. tLogRow_1 |
+----------+-------+
| key | value |
+----------+-------+
| Quantity | 3 |
| Price | 9.001 |
+----------+-------+
.-------------------.
| #4. tLogRow_1 |
+----------+--------+
| key | value |
+----------+--------+
| Quantity | 4 |
| Price | 1.0001 |
+----------+--------+
One Star

Re: tFileInputExcel ignoring values after 000

I just saw the excel inside the zip file. The cells are not formatted correctly. They are type General instead of Number. Change it to Number, 4 decimal places and it will work.
One Star

Re: tFileInputExcel ignoring values after 000

=nestor.urquiza@gmail.com]I have to assume this is a problem with your Open Office. Real Microsoft Excel generated from Windows does work. I will attach the file but below you can see the results coming from my Windows VM:

There is nothing wrong wtih OO-generated file.
In my file, cell formating is Number:General. Once, I change it to Number: 4 digit rounding it works. Even with OO.
I need to read the file as is from Talend.
It's a work around, but it is not a very good one.
I need to read vales, how they are stored. Visually they also look proper.
In my Talend job, I just try to read them in as Strings.
One Star

Re: tFileInputExcel ignoring values after 000

I have seen this to fail with several API's and even from SSIS before. That is why I went the extra mile debugging this problem and I found out there is a bug in the JXL API. I have reported so in the Yahoo group http://tech.groups.yahoo.com/group/JExcelApi/messages. My message is still not showing up ther eprobably because is going through moderation at the moment.
Definitely it would be great if Talend/JXL can accept the General format as text without losing any part of the content of the cell. Talend uses the JXL library. You realize that if you switch to debug mode, put a break point on the component and examine the code there.
Can you try converting with POI directly? Look into here for a quick try https://github.com/anjlab/excel2csv-java. If the resulting CSV does come correct then you can convert to CSV before feeding the file into Talend. That way you will have a temporary solution while you wait for the bug to be fixed. Below is an approximate copy of what I posted to JXL Yahoo user group:
There is a problem with precision in the jxl api. This is a continuation of this post: http://www.talendforge.org/forum/viewtopic.php?pid=69610
1. Compile source code from http://www.java-tips.org/other-api-tips/jexcel/converting-excel-documents-to-csv-files.html for example:
javac -cp jxl-2.6.12.jar ConvertCSV.java
2. Put the attached input.xls in the same directory where the compiled class is and run the program, for example:
java -cp jxl-2.6.12.jar:. ConvertCSV
3. Look at the resulting file input.csv. It should be:
1,1.0735
2,9.0001
3,9.0010
4,1.0001
But instead you get:
1,1.074
2,9
3,9.001
4,1
BTW I tested this with latest version of JXL http://mvnrepository.com/artifact/net.sourceforge.jexcelapi/jxl/2.6.12 and also with the one Talend 4.2.2 ships with (c:\TOS-Win32-r63143-V4.2.2\lib\java\jxl.jar) just in case I had the opportunity to patch Talend but they both failed.
One Star

Re: tFileInputExcel ignoring values after 000

One Star

Re: tFileInputExcel ignoring values after 000

The post has been responded and indeed there is a bug in Talend. Anybody from the Talend team reading this? I think we need to open a bug ticket. Here is the content of my post:
I will get to the Talend team now as clearly there are two things missing in their component:
1. A way to specify precision.
2. A correction so they stop to use getValue() instead of getContents()
In the generated code I can see:
if (true && jxl.CellType.NUMBER == cell_tFileInputExcel_2
.getType()) {
temp_row_tFileInputExcel_2 = String
.valueOf(((jxl.NumberCell) cell_tFileInputExcel_2)
.getValue());
String content = cell_tFileInputExcel_2
.getContents();
if (content != null
&& content.indexOf(separatorChar) == -1
&& (temp_row_tFileInputExcel_2
.indexOf("E") == -1)) {
temp_row_tFileInputExcel_2 = content;
} else {
String literal = temp_row_tFileInputExcel_2;
if (literal != null) {
literal = df
.format(((jxl.NumberCell) cell_tFileInputExcel_2)
.getValue());
}
temp_row_tFileInputExcel_2 = literal;
}
} else {
temp_row_tFileInputExcel_2 = cell_tFileInputExcel_2
.getContents();
}
So clearly "content" is picking a dirty value there and that is propagated later on in the code resulting in wrong parsing.
One Star

Re: tFileInputExcel ignoring values after 000

I meant to say "they start to use"
One Star

Re: tFileInputExcel ignoring values after 000

Talend team can you please advise if we should open a bug?
Four Stars

Re: tFileInputExcel ignoring values after 000

Bump
One Star

Re: tFileInputExcel ignoring values after 000

Is there a solution to this issue yet?
I have a similar problem in TOS 5.2.1, with the 'Read real values for numbers' checked, but TOS seems to round up values if the forth digit after the decimal point is >= 5. That is 0.9995 and above is rounded up to 1.0, although i have the length set to 12 and precision set to 6.
I have tried using float (which is really what i want), bigDecimal and double - but same issue.
Seventeen Stars

Re: tFileInputExcel ignoring values after 000

The JXL library is not the best. I recommend using Apache POI. This library is bullet proof and work well with both: xls and xlsx.
In the component suite tFileExcelWorkbookOpen, tFileExcelSheetInput and some more, the latest Apache POI library is used. Both components are available in Talend Exchange. I suggest trying them.
By the way, reading numbers should not depend on a display format, the mentioned components read the real value and does not use the formatting pattern for reading - if you use in your schema a Double or BigDecimal.
One Star mtm
One Star

Re: tFileInputExcel ignoring values after 000

I submitted a bug report for this issue. I'm apparently not able to post links here, but the issue number is TDI-31901.