Author: Christian Werner <[email protected]>
State: Final
Type: Project
Vote: Done
Created: 14-June-2018
Post-History:
Keywords: Tcl,threads
Tcl-Version: 8.7
Tcl-Branch: tip-511
Vote-Results: 4/0/5 accepted
Votes-For: DKF, BG, KBK, JN, JD, SL
Votes-Against: none
Votes-Present: DGP, FV, AK
Abstract
This TIP proposes to add a Tcl API for marking Tcl_AsyncHandlers
ready for
processing from POSIX signal contexts.
Context
This TIP is inspired by a request from FlightAware to fix threading issues in combination with TclX signal handling.
Rationale
As of Tcl 8.6, the man page for Tcl_AsyncMark
et.al. states that:
"These procedures provide a safe mechanism for dealing with asynchronous events such as signals..."
For the Tcl_AsyncMark()
function, this claim is only true, when the Tcl
core is built without threading support. Otherwise, the function needs
to lock various mutexes to carry out its operation. But locking mutexes
in a POSIX signal context is plain verboten. And even worse, many signals
in POSIX have process context, and delivery to threads is random without
thread-specific masks.
Specification
A new API Tcl_AsyncMarkFromSignal()
is introduced with the signature
Tcl_AsyncMarkFromSignal(Tcl_AsyncHandler async, int sigNumber)
where the sigNumber argument is the POSIX signal number. This function
shall be called from POSIX signal contexts. For non-POSIX systems it
shall be equivalent to calling Tcl_AsyncMark()
. When called from a
non-signal context, its behaviour is undefined.
In case of the Tcl 8.6 select()
-based notifier thread, this or a
subfunction shall test if it runs in the notifier thread. If this is
not the case, it shall resend the signal number to the notifier thread.
If run in the notifier thread the function shall do whatever is necessary
to perform a Tcl_AsyncMark()
on the respective Tcl_AsyncHandler
. In the
current implementation of the notifier thread this is a write()
of a single byte to the trigger pipe of the notifier thread.
In order to avoid race conditions in the notifier thread it shall be
started with all POSIX signals blocked, unblock all signals only when
going into its select()
based wait state, and block all signals afterwards.
In case of epoll and kqueue notifiers, this or a subfunction shall test if it
runs in the target thread of the Tcl_AsyncHandler
. If this is not the
case, it shall resend the signal number to this target thread.
If run in the target thread the function shall do whatever is necessary
to perform a Tcl_AsyncMark()
on the respective Tcl_AsyncHandler
. In the
current implementations of the epoll and kqueue notifiers this is a
write()
of a single byte to an event_fd
or a pipe, respectively.
Independent of the implementation of the notifier, this approach must
not make further assumptions regarding the runtime environment and its
disposition of signals. However, as for the select()
-based notifier
thread it is allowed for all Tcl related threads to use their own
thread-specific signal mask as required and rely on proper signal
delivery by the OS and Tcl_AsyncMarkFromSignal()
.
And independent of signal dispositions this approach shall ensure
that thread-specific Tcl_AsyncHandlers
are directed to interrupt the
owning target thread of the Tcl_AsyncHandler
.
Related Bugs
Bug #f4f44174 demonstrates a deadlock issue with a script based on TclX
observed with the Tcl 8.6 select()
-based notifier. It is caused by the
pthread_mutex_*()
functions not supporting reentrant locking by default and
not being async-signal-safe.
Implementation
Currently, there's a fork/proof of concept available in
https://www.androwish.org/index.html/info/40790af1e8e4ec9f based
on the Tcl 8.6 select()
notifier.
Copyright
This document has been placed in the public domain.