Five Stars

Inconsistent time part in DATE field: bug or what?

I was debugging a Talend Data Integration jobs for days searching why sometimes my job faces some FK violations in the database. Today I realized that some Date fields are being stored with the time part of the date as "01:00:00" instead of "00:00:00".

 

I produced a very small example to illustrate what I think it is a bug:

tMSSQLInput ---> tJavaRow

 

tMSSQLInput (connected to any database, it doesn't really matter):

 

select convert(DATETIME, '2015-10-18') mydate
union all
select convert(DATETIME, '2015-10-19')
union all
select convert(DATETIME, '2015-10-20')

 

tJavaRow

System.out.println(input_row.mydate);

When I run this small job, I got the following result:

[statistics] connecting to socket on port 4031
[statistics] connected
Sun Oct 18 01:00:00 BRST 2015
Mon Oct 19 00:00:00 BRST 2015
Tue Oct 20 00:00:00 BRST 2015
[statistics] disconnected

Note that the 2015-10-18 is with the time part as "01:00:00". Particularly, I can only reproduce this behavior with this magic date. Any other date, except Oct 18, 2015, set the time part as "00:00:00".

 

Do someone know why this happens? Can someone confirm this weird behavior is really a bug, or am I missing something?

  • Data Integration
18 REPLIES
Six Stars

Re: Inconsistent time part in DATE field: bug or what?

Hi Diego,

 

on Sunday 2015-10-08 at 00:00:00 daylight saving time startet in Brazil and clocks were turned forward one hour to 01:00:00.

 

Best regards,

 

Thomas

 

 

Ten Stars

Re: Inconsistent time part in DATE field: bug or what?

I believe this is Java *thing* related to Brazilian Summer Time. Is it causing you a big issue?

 

Edit: I've just noticed that @ThWabi got here first and explained it better than I did.

Rilhia Solutions
Five Stars

Re: Inconsistent time part in DATE field: bug or what?

You are right guys. I improved my example with:

 

select convert(DATETIME, '2015-10-16') mydate
union all
select convert(DATETIME, '2015-10-17')
union all
select convert(DATETIME, '2015-10-18')
union all
select convert(DATETIME, '2015-10-19')
union all
select convert(DATETIME, '2015-10-20')

and the result was: 

[statistics] connecting to socket on port 3708
[statistics] connected
Fri Oct 16 00:00:00 BRT 2015
Sat Oct 17 00:00:00 BRT 2015
Sun Oct 18 01:00:00 BRST 2015
Mon Oct 19 00:00:00 BRST 2015
Tue Oct 20 00:00:00 BRST 2015
[statistics] disconnected

That is, Java seems to shift the date to follow my system timezone.

 

But despite of knowing this, I believe we are dealing with a bug anyway. Let me explain.

If we keep in mind that a Date/Calendar object in Java represents a moment in time (not a date), then "2015-10-18 00:00 BRT" and "2015-10-18 01:00 BRST" belongs to the same moment, which means their internal representation are the same. Java knows that "2015-10-18 00:00 BRT" doesn't really exists, so they prefer to display the Date/Calendar object using the second representation. So far, so good.

 

But, if the internal representation of them are the same, why they insert different content to the database?

 

My guess is that Talend component is converting the Date/Calendar object to its string representation (or any other conversion that offset the date according to the timezone), which is causing this. Since my database is not using time zones at all, I got an inconsistent behavior.

 

Do you agree with me? Or am I missing something again?

 

Is it causing you a big issue?

Yeah, it is driving me nuts actually. All Date fields of my database has FK pointing to a master Calendar table, which tells which dates are allowed in the database (this kind of consistency check is needed because of a design issue). Since there are no correspondent key in the Calendar table, I get an error when I try to insert a record with a valid date, but different time. Of course I already solved the problem before posting here, using String/VARCHAR fields for Date instead of the Date itself, but this seems to be an odd solution for me. I prefer to check what else I can do, and maybe file a bug to Talend if needed.

Six Stars

Re: Inconsistent time part in DATE field: bug or what?


rhall_2_0 wrote:

Edit: I've just noticed that @ThWabi got here first and explained it better than I did.


Yeah, rhall_2_0, I have BRSST (Brazilian Super Summer Time), which is four minutes ahead of you. Smiley Wink

 

Thomas

 

Five Stars

Re: Inconsistent time part in DATE field: bug or what?

What you guys think about my concerns? Should I file a bug?

 

Is there another way to solve the problem without having to deal with Date as String?

Ten Stars

Re: Inconsistent time part in DATE field: bug or what?

This isn't a bug with Talend. This is a Java feature to handle the time shift. Midnight on the 18th technically never exists as the clocks are shifted forward to 1am.

 

https://www.timeanddate.com/time/change/brazil/sao-paulo?year=2015

Rilhia Solutions
Five Stars

Re: Inconsistent time part in DATE field: bug or what?

It seems you did not take my concerns in consideration.

Despite being a Java thing or not, Talend is striping time zone information when it shouldn't. This is leading to an error.

Talend doesn't explicit a time zone (and don't give an option to do so). Since it doesn't explicit a time zone, it is up to Java to decide what to use to represent the date, so it follows system settings.

Note that, because of this, I can't really use DateTime type to deal with dates without time zones, because the results will always be inconsistent. In a job where both sides don't deal with time zones at all, data will flow through Talend and get messed when they hit the start of Daylight Saving Time. Better yet, this may happen or not depending of the settings of the system where Talend is running! This is why I think this is a bug.

Ten Stars

Re: Inconsistent time part in DATE field: bug or what?

OK, I see what you are saying. I do not believe that Talend is converting to or from a String type (unless you are implicitly converting using a tMap, for example). It would seem silly to do this when all Talend does is generate Java code. However, it doesn't make it easy for you to play around with timezones, I agree. What you can do to mitigate this is to use some Java with a tJavaRow to manipulate your Dates with specific timezone requirements.

 

While this is not ideal for you requirements, I still don't believe this is a bug. But maybe it is a good a feature request. 

Rilhia Solutions
Five Stars

Re: Inconsistent time part in DATE field: bug or what?

Sorry, but I am still not convinced.

 

In my job, I have a simple tMSSQLInput connected to tMSSQLOutput. No tMap, no code, nothing special.

I am describing a situation where the data read from a DATETIME column differs from the written to another DATETIME column. And we detected that this happens because of system settings, and there is no way to avoid this without writing custom code. Do you really believe this is OK and it is not a bug?

 

Also, note that I am not trying to find a workaround. I already solved this problem in particular.

Ten Stars

Re: Inconsistent time part in DATE field: bug or what?

I'm afraid I have a bit of an issue with this sentence " Since my database is not using time zones at all, I got an inconsistent behavior".  First of all, databases do use timezones. Most will internally store dates as UTC but they will also either have configurable timezones or will just use the system default.

 

Secondly, the date you are talking about is impossible in your locale. So if your database is storing it as "2015-10-18 00:00:00", it is configured incorrectly and you are using UTC and adjusting it to BRST. However, I suspect from looking at your example that you are simply expecting the date to be "2015-10-18 00:00:00" in your database because all of your other dates specified without explicitly setting a time, show midnight. Is this case. I notice you are using SQL Server. Try this code and you will see what timezone is default for your SQL Server.....

DECLARE @TimeZone VARCHAR(50)
EXEC MASTER.dbo.xp_regread 'HKEY_LOCAL_MACHINE',
'SYSTEM\CurrentControlSet\Control\TimeZoneInformation',
'TimeZoneKeyName',@TimeZone OUT
SELECT @TimeZone

 

If it is Brazil then it should never represent that date as  "2015-10-18 00:00:00".

 

To show that this behaviour is not a Talend bug and is the same for all Java applications, I have quickly knocked up the following code. This can be run in a tJava within a Job or in any Java IDE. It will do exactly the same.

 

//Setting the default timezone
java.util.TimeZone utcTimeZone = java.util.TimeZone.getTimeZone("America/Sao_Paulo");
java.util.TimeZone.setDefault(utcTimeZone);
System.out.println(java.util.TimeZone.getDefault().getDisplayName());


//Date set using only yyyy-MM-dd. Normally defaults to 00:00:00
SimpleDateFormat noTimeFormat = new SimpleDateFormat("yyyy-MM-dd");
Date date1 = noTimeFormat.parse("2015-10-18");

//Date set using only yyyy-MM-dd'T'HH:mm:ss. Explicitly setting time to 00:00:00
SimpleDateFormat withTimeFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
Date date2 = withTimeFormat.parse("2015-10-18T00:00:00"); 

//Show Java time in milliseconds since Jan 1 1970
System.out.println("Time in milliseconds after date set using only yyyy-MM-dd = " + date1.getTime());
System.out.println("Time in milliseconds after date set using only yyyy-MM-dd'T'HH:mm:ss = " + date2.getTime());

//Show date and time formatted to yyyy-MM-dd'T'HH:mm:ss
System.out.println("Date shown with yyyy-MM-dd'T'HH:mm:ss for date1 = "+withTimeFormat.format(date1));
System.out.println("Date shown with yyyy-MM-dd'T'HH:mm:ss for date2 = "+withTimeFormat.format(date2));

//Set date using milliseconds
Date dateFromMilli1 = new Date(date1.getTime());
//Set date using milliseconds minus 1 millisecond
Date dateFromMilli2 = new Date(date1.getTime()-1);

System.out.println("Date when set by date1's time in milliseconds = "+withTimeFormat.format(dateFromMilli1));
System.out.println("Date when set by date1's time in milliseconds minus 1 millisecond = "+withTimeFormat.format(dateFromMilli2));

System.out.println("Not a Talend bug");

If either of your databases is representing the date with "00:00:00", then it is an issue with your configuration. If you are mixing your timezones, you will get issues like this.

As I said before, easy timezone functionality within Talend might be a nice feature request, but this is not a bug. 

Rilhia Solutions
Five Stars

Re: Inconsistent time part in DATE field: bug or what?

You are right, but you are also wrong.

Obviously my database has a timezone. Every system has one, if this is your point.
But I am using a field that do not care to them. We are dealing with data, not with systems. In SQL Server we have DATETIME field (date without timezone) and DATETIMEOFFSET (date with timezone).

 

So, since the field is not taking the timezone in consideration, "2015-10-18 00:00:00" is a valid value, no matter where you are, or what's your system time zone. How you will use it, it is up to you and your system. So sorry if I was not clear before, but I meant that I was using for dates only datatypes without timezone, not that my system is nowhere in the space.

 

You made a cool demonstration to show me how Java handles time zones. But databases sometimes don't use them, because of a very common design decision. Talend is enforcing a timezone when it uses Date Class to store the value without considering what Java will do with it. This is wrong.

Ten Stars

Re: Inconsistent time part in DATE field: bug or what?

OK, but Talend is not enforcing a timezone, that is Java. Talend is entirely relying on Java's implementation of Date.

 

I think it would be nice to be able to select a timezone for a job without having to either run the job on a machine with the correctly set timezone, or set it in code. It would be a nice feature request. 

Rilhia Solutions
Five Stars

Re: Inconsistent time part in DATE field: bug or what?

Great. Now I think we are in the same line of reasoning.

I 100% agree that Talend is relying on Java's implementation of date. This is the big deal.

Date Class is incompatible with dates without time zone. This kind of date require special treatment in Java, and Talend does not provide any. Talend is doing bad when it allows us to use Date to represent this kind of data, as they were perfectly compatible.

At least a warning would take place is this situation, warning the user that this may lead to errors. This is not a new feature, is something that already exists, but it is broken and useless (at least, it is if you mind that your job will act consistently, no matter where it is executed).
Ten Stars

Re: Inconsistent time part in DATE field: bug or what?

There is no such thing as a date without a timezone though. The closest to what you are describing is UTC. I would argue that your scenario is actually not a common scenario. While I agree that an option might be nice to allow users to tailor this with a setting, there are hundreds of similar quirks of Java that many would argue would be nice to be able to have an easy to set Talend feature. In fact, Talend do provide a mechanism to deal with this. If you go to the "Run" tab and select "Advanced settings" you will see that you can specify JVM arguments. I suspect that the JVM argument you will want will be .....

-Duser.timezone=UTC

Using UTC time is arguably the standard that all systems should use to avoid confusion. This is very simple to set, but maybe requires a little Googling to find.

My point is that you can handle this with what Talend provides. As an example, recursion in SQL Server can be handled, but not without a greater than normal understanding of how to use the product. That is the same here. It can be handled, but you need to know how to handle it. But it is not a bug. 

Rilhia Solutions
Five Stars

Re: Inconsistent time part in DATE field: bug or what?

 

There is no such thing as a date without a timezone though.

Again, you are right and wrong at the same time. I think your sentence may be rewritten as: "Java does not have a class to handle dates without time zones." If it is the case, I agree with you.

 

I expect you understand the distinction between an "instant in time" from a "position in the calendar" (date), a "position in the clock" (time), and a "position in space" (time zone). You need all these three elements to describe an "instant in time".

 

The Date Class represent an instant in time, it is not dependent of a timezone at all. But when we print a instant in time on the screen, we like it to be presented as a position in calendar, and a position in the clock, relative to our time zone. This is fine, and it is exactly what Java do. In Java, as in most systems, dates follow system locale when not explicit set. 

 

But my data source is not giving to Talend an "instant in time". It is providing a position in calendar, and a position in the clock. This is my raw data, nothing else. But when Talend stores this information in the Date Class, it is 'converting' my raw data into an "instant in time", which is a representation that my data can't afford.

 

Particularly, what we found here is that, when the time zone is not explicitly set, Java does not have a compromise of using the input time zone as the output (in the case described here, my input is read as BRT, but the output is BRST, because Java noted that BRT doesn't exists at that specific date/time: who asked for this sanity check?). But Talend is dealing with a DATETIME field that doesn't have a time zone who needs to flow to another database that also doesn't. The information is not being presented to the user, it is not changed or anything. Don't you think it should be consistent?

 

You are trying to convince me that this is OK, because I am presenting a very specific scenario and that my environment is not set correctly, or anything similar. Bugs usually happens on specific scenarios, don't they? My point is that, in Java, a Date class is NOT suitable to represent a DATETIME field without time zone (at least, not without handling the lack of this information). It is a misconception of who mapped the fields of Talend that the Date Class fits this kind of field.

 

 

And again you are pointing hacks I can do. In your last message, you are telling me: "fake your time zone to UTC, because Talend does not care what Java is doing with the dates". If you search Google, you will find several sysadmins that prefer to set their servers to UTC to avoid these problems. I know. This is fine as a project decision, but you only have to do this because your software is bad designed. When this happens, you fix it or you give up?

 

I know there are thousands of workarounds, this is not what I am looking for and my specific problem is already solved. Now I am trying to find what we can do to fix this behavior, because it is indeed inconsistent.

 

 

I can imagine several situations where the results would be inconsistent. What should I do if I have a job that deals with data with and without timezone at the same time? How about if each database has a different time zone? Talend is relying on Java's implementation, great. But how about the adjacent systems? Are they fully compatible with Java's implementation? The solution is to always write custom code to assure the data is not messed, or is Talend able to anticipate this and handle everything nicely?

Ten Stars

Re: Inconsistent time part in DATE field: bug or what?


I can imagine several situations where the results would be inconsistent. What should I do if I have a job that deals with data with and without timezone at the same time? How about if each database has a different time zone? Talend is relying on Java's implementation, great. But how about the adjacent systems? Are they fully compatible with Java's implementation? The solution is to always write custom code to assure the data is not messed, or is Talend able to anticipate this and handle everything nicely?


This is exactly why UTC is used to store dates across systems which may be used in different time zones. This is how Facebook works, how Google works.....you name a major international organisation and this is how it would work. It is not a hack, it is the right way to do it. The data is stored in UTC and then translated to where it is needed. Dates are never independent of time zones in reality. You are using a flawed philosophy when you use an example where you expect to be able to represent a period of time which is impossible in some time zones, in those time zones. To alleviate this you have UTC which allows you to store those times irrespective of regional fluctuations and allows you to cater to them with regional code.

 

Your database (whether you know it or not) is using a time zone. I have no idea which, but it will be using one. You cannot escape a time zone when it comes to dates because of the issues that regional differences cause and the need to translate between them. We could have a long philosophical debate on whether a time in space requires a timezone when compared to GMT or whether time really exists, but we are getting away from the point.

 

Talend does not have a bug. It is a development tool. A tool that sits on top of Java (a Java code generator). The flaw you are trying to highlight is actually a massive advantage if you know how to make use of it. No programming language makes everything easy for every permutation of requirements. They tend to be engineered towards the most common requirements. Most people use dates with timezones. In the real world dates are dependent on a zone when used by people, when being stored by a computer they are generally stored independent of fluctuating time zone (daylight savings time, etc) and that is a key use of UTC. If you want to have a date that is not hampered by regional fluctuations (which would be advised in most systems storing data), then you use UTC.

 

https://en.wikipedia.org/wiki/Coordinated_Universal_Time

 

This is not a Talend issue, your issue here is with Java. But this isn't an "issue" with Java, it is a feature that is very useful. What you have basically criticised are fundamental standards for dealing with dates. I have shown you how to force your Talend job to use UTC in a couple of ways (which is what you want), but going back to our earlier discussion I agree that adding this in a more user friendly way might be a nice feature.  

Rilhia Solutions
Five Stars

Re: Inconsistent time part in DATE field: bug or what?

 

... The data is stored in UTC and then translated to where it is needed... 

From the Java Docs:

  • ... Although the Date class is intended to reflect coordinated universal time (UTC), it may not do so exactly, depending on the host environment of the Java Virtual Machine.

How this can be? Aren't they stored in UTC? How it depends of the environment? Oh lord. My head is blowing!

 

... You cannot escape a time zone when it comes to dates because of the issues that regional differences cause and the need to translate between them...

How about this, this and this?

  • Sometimes you really just want to represent a "local" (timezone unaware) date and time rather than an instant in time. To be honest it's more often useful to represent just a time - e.g. "wake me up at 8am, regardless of timezone" - but date and time could be useful too.

Never mind. Obviously they don't know what they are talking about. It seems to be a "long philosophical debate".

 

Smiley Frustrated

 

You won. Forget about it.

Ten Stars

Re: Inconsistent time part in DATE field: bug or what?

This wasn't about "winning" or "losing", it was about your incorrect assertion that the behaviour you experienced was a Talend bug. As I have said from the beginning, it is not a bug. We then moved over to your assertion that Java is seriously flawed in the way that it handles Dates, and that makes it a "Talend bug" because Talend doesn't automatically understand what you want to do and provide a drop down "fix all" to cover every permutation of Date usage. That is not a bug, it is not even a flaw in functionality. Talend is a development tool and while many Talend sales guys may claim their "grandmother can do it", you can only get the most out of a tool like this if you have more than a basic understanding of computer science....and it REALLY helps if you have a working understanding of Java. 

The functionality that you are looking for can quite easily be built within Talend, but it is not standard and as such you will not find easy drag and drop solutions to it. I did suggest a relatively easy fix to this (using UTC and setting your JVM argument to ensure that this would be the case), but you have managed to find a flaw in this through Googling.....which will not actually affect you and I think you know that.

 

By the way, the examples you have given to counter my last post do not really work. You really need to read the whole piece about UTC in the Java documentation and understand it. Regarding the  "Sometimes you really just want to represent a "local" (timezone unaware) date and time rather than an instant in time. To be honest it's more often useful to represent just a time - e.g. 'wake me up at 8am, regardless of timezone' - but date and time could be useful too".... comment, you do realise that to enable that functionality on a phone, for example, you would need to have a universal time recorded and maintained (UTC) and an awareness of wherever you are in the world in order to calculate when to "wake me up at 8am". What that is talking about is not fixing a timezone for your application, but being able to get notifications to wake you at 8am wherever in the world you are. Precisely why timezones are important    

 

 

 

Rilhia Solutions