Author: Frédéric Bonnet <[email protected]>
State: Final
Type: Project
Vote: Done
Created: 24-May-2018
Post-History:
Keywords: Tcl,threads
Tcl-Version: 8.7
Vote-Results: 8/0/1 accepted
Votes-For: DKF, KBK, JN, JD, DGP, FV, SL, AK
Votes-Against: none
Votes-Present: BG
Tcl-Branch: tip-509
Abstract
This TIP proposes to improve the Tcl_Mutex
API by enforcing a consistent
behavior on all core-supported platforms regarding reentrancy.
Context
This TIP is inspired by a request from FlightAware to fix deadlock issues with TclX signal handling. A specific issue has been opened to discuss the proposed implementation here.
Rationale
As of Tcl 8.6, the man page for thread support Thread.3
states that:
The result of locking a mutex twice from the same thread is undefined. On some platforms it will result in a deadlock.
On Windows platforms, mutexes are implemented using Win32 critical sections,
which are reentrant. Tcl_Mutex
are just plain CRITICAL_SECTION *
.
On Unix platforms (this includes MacOS X), Tcl relies on the pthread library
for multithreading and synchronization primitives such as mutexes. Tcl_Mutex
are just plain pthread_mutex_t *
. pthread mutexes are not reentrant by
default, though the PTHREAD_MUTEX_RECURSIVE
attribute can be used at creation
time to make them so, but this possibility is not available on older systems.
The Tcl philosophy has always been to erase platform-specific peculiarities in
favor of overall multi-platform consistency, sometimes to the point of
implementing or emulating commonly available features on less capable platforms.
Therefore it feels natural to pursue this goal by making Tcl_Mutex
reentrant
on all platforms and achieving a consistent behavior on both Windows and Unix.
Specifications
This TIP proposes to replace the following text from the Thread.3
man page:
The result of locking a mutex twice from the same thread is undefined. On some platforms it will result in a deadlock.
by the following text:
Mutexes are reentrant: they can be locked several times from the same thread. However there must be exactly one call to Tcl_MutexUnlock for each call to Tcl_MutexLock in order for a thread to release a mutex completely.
Portability issues
Windows
Mutexes are naturally reentrant on Windows systems, so no special work is required.
Unix
On pthread-based Unix systems that support the PTHREAD_MUTEX_RECURSIVE
attribute, all pthread_mutex_t
made available as Tcl_Mutex
will be created
using this attribute. This includes all but the oldest variants of Unix.
On pthread-based Unix systems that do not support the
PTHREAD_MUTEX_RECURSIVE
attribute, reentrancy will be achieved by combining a
regular, non-reentrant pthread_mutex_t
, with a thread-specific lock counter
accessible through a pthread_key_t
data key. This counter keeps track of the
number of calls to Tcl_MutexLock
minus the number of calls to
Tcl_Mutex_Unlock
. Tcl_MutexLock
increments the counter, but only calls
pthread_mutex_lock
when the initial value is zero. Tcl_Mutex_Unlock
behaves
symmetrically: it decrements the counter, and only calls pthread_mutex_unlock
when it reaches zero. This ensures that a thread never calls
pthread_mutex_lock
twice on the same mutex, and only calls
pthread_mutex_unlock
when the thread no longer holds it.
Detection of PTHREAD_MUTEX_RECURSIVE
availability is done at configure time
thanks to the AC_CHECK_DECLS
autoconf macro in tcl.m4
.
Potential incompatibilities
Although this TIP introduces a major change to Tcl_Mutex
behavior on Unix, it
is very unlikely that this will break any existing code:
- Windows-only code will see no change in behavior.
- Unix-only code with reentrant mutexes is fatally flawed in the current state of the Tcl core, since this results in a deadlock. At best, this TIP will fix such hard-to-reproduce situations (case in point: TclX signals), and the code will transition from the nonworking state to the working state (which, according to Dr. John Ousterhout, is the best performance improvement). At worst, this change will trigger a new class of reentrancy-related bugs on already broken code.
- Multiplatform code in its current form behaves either inconsistently or consistently, depending on whether it uses reentrant mutexes or not. Consistent code will remain consistent, inconsistent code will become consistent as the Unix version aligns with the Windows version.
Related Bugs
Bug #f4f44174 demonstrates the
deadlock issue with a script based on TclX. The root cause is the asynchronous
event handler's Tcl_Mutex
being locked twice from the same thread when a
signal handler interrupts a thread in the middle of a mutex-protected section,
which on Unix platforms results in a deadlock. The proposed implementation fixes
this issue.
Implementation
The proposed implementation is available on branch tip-509 in the Tcl Fossil repository.
Copyright
This document has been placed in the public domain.