I'm using Talend studio for big data 7.1 (open source) version. Facing problem related to read excel file which is having cell value as more than 15 digits. Actually, it is well known issue of excel, if cell format is Number and the value exceeds more than 15 digits, it converts the extra digits to 'Zero'.
In the image, excel file value is 17 digits. A, B and C columns are in cell format Text, General and Number respectively.
We are processing Number format value through Talend Studio.
In Talend, using Excel File Connector, it is showing the preview of the above image as:
It is showing the data discrepancy in Number Formatted column of excel file and Talend preview. Both values i.e Number format in excel and Number format in preview are different.
Though the actual value changes in excel itself, but why Talend studio does not process the same value as excel file?
Please someone help me with this.
Excel will format the numbers during display but if you open the same file with a simple text editor like notepad++, it will show the actual values present in the file.
Talend always read the files like a normal text editor (means always read the actual values stored in file) instead of data shown in Excel in auto-formatted manner. So I would say Talend is working as expected.Please also make sure that you are allocating right data types in Talend which will hold the values of the size mentioned in picture.
If the answer has helped you, could you please mark the topic as resolved? Kudos are also welcome :-)
Thanks for the reply @nikhilthampi
In excel file, Number value up to 15 digits is fine. But value more than 15 digits showing mismatch value in excel itself and in Talend also.
As i mentioned in image, 16th and 17th digits are converted to zero and while processing through Talend, Talend assuming some random value for converted zeros i.e 16th and 17th digit, which is exactly different from the actual value's 16th and 17th digits.
Please refer the both images and you can verify with counting the digits. It is where the actual problem i'm facing.
Here are my quick queries?
a) What is the data type you are using to read the data using Talend?
b) Could you please paste a screenshot of the same data (which you have shown earlier in Excel and Talend) using Notepad++?
My answers are as below,
a) For column Actual value, General value and Number value datatypes are String, Long, Long respectively.
Screenshot for Talend preview for same values.
Could you please read them as String itself from source and do a conversion use tConverttype later in your job flow and let us know the results?
I've tried the solution as you said, but it didn't work. It is still showing different values.
If you are also using Talend big data studio, could you please try this type of problem by your side? It will not take too much time.
If it is resolved by your side, it'll definitely very helpful for all.
Thanks and regards,
Join us at the Community Lounge.
Talend named a Leader.
Kickstart your first data integration and ETL projects.
Watch the recorded webinar!
Learn how to make your data more available, reduce costs and cut your build time
Read about OTTO's experiences with Big Data and Personalized Experiences
Take a look at this video about Talend Integration with Databricks