Tcl Source Code: View Ticket

Ticket UUID:	1464890
Title:	after command hangs when system time changes
Type:	Bug	Version:	None
Submitter:	jflarvoire	Created on:	2006-04-05 12:37:52
Subsystem:	03. Timer Events	Assigned To:	kennykb
Priority:	5 Medium	Severity:
Status:	Open	Last Modified:	2007-03-15 16:36:30
Resolution:	None	Closed By:
		Closed on:
Description:	Hello,

I wish to reopen bug #432038 and its duplicate, bug #1052859: changing the system clock hangs all Tcl event handlers that are scheduled with the after command. This makes it impossible to have reliable periodic handlers written in pure Tcl. (They may die unexpectedly any time somebody runs the date/time command.)

I've just been badly hurt by this very problem on a large cluster we manage: we had system monitoring scripts written in Tcl that kept dying mysteriously on some of the nodes. And the illness was spreading with time...

Contrary to what was written in a comment on bug #432038, using NTP does not prevent the problem: we saw it on machines with slowly drifting hardware clocks. After each reboot, NTP takes some time to resynchronize the system time... hanging all our Tcl scripts' timers when it eventually does.

Also, I completely disagree with the closing comment in bug #1052859 about the lack of monotonically increasing timers in both Unix and Windows. The Unix sleep command, used in tons of shell scripts, is completely immune to such system date/time changes. This is easy to verify. Let's use in Tcl's after command the same timing function that Unix's sleep does. Likewise, Windows' waitable timers are immune too. I just verified it with the sample code for CreateWaitableTimer in the MSDN library.

Jean-François
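To illustrate the failure mode described above, here is a minimal sketch in Python (not Tcl's actual implementation). It compares a deadline computed from a wall clock with one computed from a monotonic clock when the system time is set back by one hour. `Deadline`, `FakeClock`, and `demo` are hypothetical names introduced for the illustration; the fake clocks stand in for `time.time()` and `time.monotonic()` so that the clock change can be simulated.

```python
import time


class FakeClock:
    """Hypothetical test clock: it only moves when the caller changes `now`."""

    def __init__(self, start):
        self.now = start

    def __call__(self):
        return self.now


class Deadline:
    """One-shot deadline measured against whichever clock it is given."""

    def __init__(self, delay_s, clock):
        self._clock = clock
        self._due = clock() + delay_s

    def remaining(self):
        # Seconds left before the deadline, never negative.
        return max(0.0, self._due - self._clock())


def demo():
    wall = FakeClock(1_000_000.0)  # stands in for time.time()
    mono = FakeClock(50.0)         # stands in for time.monotonic()
    wall_timer = Deadline(10.0, wall)
    mono_timer = Deadline(10.0, mono)

    # Four seconds of real time pass on both clocks.
    wall.now += 4.0
    mono.now += 4.0

    # The system time is set back by one hour; only the wall clock moves.
    wall.now -= 3600.0

    return wall_timer.remaining(), mono_timer.remaining()


if __name__ == "__main__":
    print(demo())
    # The same class works with the real monotonic clock:
    print(Deadline(0.05, time.monotonic).remaining())
```

In this sketch the wall-clock deadline ends up more than an hour away, which is what a "hung" timer looks like from the script's point of view, while the monotonic deadline still has the expected six seconds left.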
User Comments:	jflarvoire added on 2007-03-15 16:36:30:
Logged In: YES user_id=617204 Originator: YES

What about having a surveyor thread that queries the system time every 500 ms and initiates a corrective action if it detects any abnormal change?
- Any change that goes backwards in time is proof of a system clock update.
- Any change going forward by more than a given threshold is a likely update too. (It may also be a delay due to high system load. The surveyor thread would have to be a high-priority thread to make it resilient to such overloads.)

Note that this is how we eventually fixed our problem. Instead of a thread, we used an outside program dedicated to that. Obviously an internal thread would be much more efficient.

kennykb added on 2007-03-12 23:27:51:
Logged In: YES user_id=99768 Originator: NO

The trouble with what you're demanding is that it works only if there is only one [after] active, anywhere in the program. As soon as there is more than one, you get back into the same situation. Certainly, you can use sleep() or a waitable timer to delay until it's time for the first after to wake up. But in the situation:

    after 10000 {do something}
    after 5000 {do something else}

once you've awakened on the first [after], you have to figure out what time the second one should fire. If the clock has backed up, how do you do that and get the behaviour you demand?

Spinning a thread per [after] so that they can all use blocking timers would probably be infeasible - I know of applications that have thousands of [after]s scheduled.
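The surveyor idea and the two-timer scenario from these comments can be combined in a small simulation. The following is a sketch in Python under stated assumptions, not a description of how Tcl works or was fixed: `TimerQueue`, `FakeClock`, `survey`, and `simulate` are hypothetical names, the 5-second forward threshold is an arbitrary choice, and the size of the jump is estimated by assuming that exactly one poll interval elapsed between two checks.

```python
import heapq


class FakeClock:
    """Hypothetical test clock standing in for the system (wall) time."""

    def __init__(self, start):
        self.now = start

    def __call__(self):
        return self.now


class TimerQueue:
    """Wall-clock timer queue with a surveyor check that shifts deadlines."""

    def __init__(self, clock, poll_s=0.5, max_forward_s=5.0):
        self._clock = clock
        self._poll_s = poll_s                # expected time between surveys
        self._max_forward_s = max_forward_s  # assumed threshold for forward jumps
        self._last = clock()
        self._heap = []                      # entries are (due, seq, name)
        self._seq = 0

    def after(self, delay_s, name):
        heapq.heappush(self._heap, (self._clock() + delay_s, self._seq, name))
        self._seq += 1

    def survey(self):
        """Call roughly every poll_s seconds; returns the corrected jump."""
        now = self._clock()
        delta = now - self._last
        self._last = now
        if 0.0 <= delta <= self._max_forward_s:
            return 0.0  # looks like normal progress
        # Backward move, or forward move beyond the threshold: treat it as a
        # clock update and estimate its size from the expected poll interval.
        jump = delta - self._poll_s
        self._heap = [(due + jump, seq, name) for due, seq, name in self._heap]
        heapq.heapify(self._heap)
        return jump

    def run_due(self):
        """Pop and return the names of all timers that are due now."""
        now = self._clock()
        fired = []
        while self._heap and self._heap[0][0] <= now:
            fired.append(heapq.heappop(self._heap)[2])
        return fired


def simulate():
    clock = FakeClock(1000.0)
    queue = TimerQueue(clock)
    queue.after(10.0, "do something")
    queue.after(5.0, "do something else")
    log = []
    for tick in range(1, 25):      # 24 polls of 0.5 s of real time each
        clock.now += 0.5
        if tick == 4:
            clock.now -= 3600.0    # system time set back one hour
        queue.survey()
        for name in queue.run_due():
            log.append((tick * 0.5, name))
    return log


if __name__ == "__main__":
    print(simulate())
```

Because every pending deadline is shifted by the same estimated jump, a single surveyor covers any number of scheduled timers without one thread per timer. The limits are the ones already noted in the comments: a forward change smaller than the threshold goes undetected, and a delayed poll under heavy load is indistinguishable from a clock change.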