Tcl Source Code

Ticket UUID: 1464890
Title: after command hangs when system time changes
Type: Bug
Version: None
Submitter: jflarvoire
Created on: 2006-04-05 12:37:52
Subsystem: 03. Timer Events
Assigned To: kennykb
Priority: 5 Medium
Severity:
Status: Open
Last Modified: 2007-03-15 16:36:30
Resolution: None
Closed By:
Closed on:
Description:
Hello,

I wish to reopen bug #432038, and its duplicate bug
#1052859:

Changing the system clock hangs all Tcl event handlers
that are scheduled with the after command.
This makes it impossible to write reliable periodic
handlers in pure Tcl: they may die unexpectedly
any time somebody runs the date/time command.

I've just been badly hurt by this very problem on a
large cluster we manage: We had system monitoring
scripts written in Tcl that kept dying mysteriously on
some of the nodes. And the illness was spreading with
time...

Contrary to what was written in a comment of bug
#432038, using NTP does not prevent the problem:
We saw the problem on machines with hardware clocks
that drift slowly. After each reboot, NTP takes some
time to resynchronize the system time, hanging all our
Tcl scripts' timers when it eventually does.

Also I completely disagree with the closing comment in
bug #1052859 about the lack of monotonically increasing
timers in both Unix and Windows.
The Unix sleep command, used in tons of shell scripts,
is completely immune to such system date/time changes.
This is easy to verify. Let's have Tcl's after command
use the same timing function that the Unix sleep
command does.
Likewise, Windows' waitable timers are immune too. I
just verified it with the sample code for
CreateWaitableTimer in the MSDN library.
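
To make the difference concrete, here is a minimal sketch in Python (not Tcl's actual implementation) of a 5-second timer whose deadline is stored once against the wall clock and once against a monotonic clock. The clock readings are simulated values chosen for illustration, and the helper name `remaining` is hypothetical. In real code the monotonic reading could come from `time.monotonic()`.

```python
def remaining(deadline, now):
    """Seconds left until the deadline, never negative."""
    return max(0.0, deadline - now)

# Simulated clock readings at the moment the timer is armed.
wall_at_arm = 1_000_000.0   # wall-clock seconds (can be set by an operator or NTP)
mono_at_arm = 50.0          # monotonic seconds (only ever moves forward)

# Arm a 5-second timer against each clock.
wall_deadline = wall_at_arm + 5.0
mono_deadline = mono_at_arm + 5.0

# Two seconds of real time pass, then the system clock is set back one hour.
wall_now = wall_at_arm + 2.0 - 3600.0
mono_now = mono_at_arm + 2.0

wall_wait = remaining(wall_deadline, wall_now)  # the timer appears to hang
mono_wait = remaining(mono_deadline, mono_now)  # unaffected by the clock change

print(f"wall-clock deadline: {wall_wait:.0f} s left")
print(f"monotonic deadline:  {mono_wait:.0f} s left")
```

With the wall-clock deadline the handler would not run for about an hour, which matches the hang described above. With the monotonic deadline the remaining wait is still three seconds.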

Jean-François
User Comments:
jflarvoire added on 2007-03-15 16:36:30:

What about having a surveyor thread, querying the system time every 500ms,
and initiating a corrective action if it detects any abnormal change?
- Any change that goes backwards in time is proof of a system clock update.
- Any change going forward more than a given threshold is a likely update too.
  (May also be a delay due to high system load. Would have to make the surveyor
   thread a high priority thread to make it resilient to such overloads)

Note that this is how we eventually fixed our problem.
Instead of a thread, we used a dedicated external program.
Obviously an internal thread would be much more efficient.
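
The two detection rules above can be sketched as follows. This is a hedged illustration in Python, not the external program mentioned in the comment: the names `classify_step` and `survey`, the 5-second forward threshold, and the callback shape are all assumptions. Thread priority, which the comment raises as a concern under load, is not handled here.

```python
import threading
import time

def classify_step(previous, current, interval=0.5, forward_threshold=5.0):
    """Classify the change between two consecutive wall-clock samples."""
    delta = current - previous
    if delta < 0:
        return "backward"   # time went backwards: the clock was certainly updated
    if delta - interval > forward_threshold:
        return "forward"    # far more time than expected: likely an update
    return "ok"

def survey(on_jump, stop, interval=0.5, forward_threshold=5.0, clock=time.time):
    """Poll the clock every `interval` seconds until `stop` is set.

    Calls on_jump(kind, delta) whenever a sample looks like a clock update.
    """
    previous = clock()
    while not stop.wait(interval):      # wait() returns True once stop is set
        current = clock()
        kind = classify_step(previous, current, interval, forward_threshold)
        if kind != "ok":
            on_jump(kind, current - previous)
        previous = current

if __name__ == "__main__":
    stop = threading.Event()
    watcher = threading.Thread(
        target=survey,
        args=(lambda kind, delta: print(f"clock jump: {kind} ({delta:+.1f} s)"), stop),
        daemon=True,
    )
    watcher.start()
    time.sleep(2)   # change the system time during this window to see a report
    stop.set()
    watcher.join()
```

The corrective action itself (for example, rescheduling pending timers) is left to the `on_jump` callback, since the comment does not specify it.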

kennykb added on 2007-03-12 23:27:51:

The trouble with what you're demanding is that it works only if there is only one [after] active, anywhere in the program.  As soon as there is more than one, you get back into the same situation.  Certainly, you can use sleep() or a waitable timer to delay until it's time for the *first* after to wake up.  But in the situation:

     after 10000 {do something}
     after 5000 {do something else}

once you've awakened on the first [after], you have to figure out what time the *second* one should fire.  If the clock has backed up, how do you do that and get the behaviour you demand?  Spinning a thread per [after] so that they can all use blocking timers would probably be infeasible - I know of applications that have thousands of [after]s scheduled.
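
One possible answer to the question of when the second [after] should fire, sketched here in Python under the assumption that the platform offers a monotonic clock: keep every deadline in monotonic time in a single queue, and have one waiter sleep until the earliest deadline. The `TimerQueue` class and its method names are hypothetical and do not describe Tcl's notifier. A fake clock is injected so the example runs instantly.

```python
import heapq
import itertools
import time

class TimerQueue:
    """Many timers, one waiter: deadlines are stored in monotonic seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._heap = []
        self._order = itertools.count()  # tie-breaker for equal deadlines

    def after(self, ms, callback):
        """Schedule callback to run ms milliseconds from now."""
        deadline = self._clock() + ms / 1000.0
        heapq.heappush(self._heap, (deadline, next(self._order), callback))

    def time_to_next(self):
        """Seconds the single waiter should sleep, or None if nothing is queued."""
        if not self._heap:
            return None
        return max(0.0, self._heap[0][0] - self._clock())

    def run_due(self):
        """Run every callback whose deadline has passed; return how many ran."""
        now = self._clock()
        fired = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, callback = heapq.heappop(self._heap)
            callback()
            fired += 1
        return fired

# The two timers from the example above, driven by a simulated monotonic clock.
fake_now = [0.0]
log = []
queue = TimerQueue(clock=lambda: fake_now[0])
queue.after(10000, lambda: log.append("do something"))
queue.after(5000, lambda: log.append("do something else"))

fake_now[0] = 5.0            # five seconds pass; wall-clock changes are irrelevant
queue.run_due()
next_sleep = queue.time_to_next()
print(log, next_sleep)
```

Because the second deadline was recorded against the same monotonic clock, the time left after the first wake-up is a plain subtraction and does not depend on the system date. Whether a suitable monotonic source exists on every platform Tcl supports is a separate question that this sketch does not settle.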