TIP 511: Implement Tcl_AsyncMarkFromSignal()

Author:         Christian Werner <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        14-June-2018
Post-History:   
Keywords:       Tcl,threads
Tcl-Version:    8.7
Tcl-Branch:     tip-511
Vote-Results:   6/0/3 accepted
Votes-For:      DKF, BG, KBK, JN, JD, SL
Votes-Against:  none
Votes-Present:  DGP, FV, AK

Abstract

This TIP proposes to add a Tcl API for marking Tcl_AsyncHandlers ready for processing from POSIX signal contexts.

Context

This TIP is inspired by a request from FlightAware to fix threading issues in combination with TclX signal handling.

Rationale

As of Tcl 8.6, the man page for Tcl_AsyncMark et al. states that:

"These procedures provide a safe mechanism for dealing with asynchronous events such as signals..."

For the Tcl_AsyncMark() function, this claim is true only when the Tcl core is built without threading support. Otherwise, the function needs to lock various mutexes to carry out its operation, and locking mutexes in a POSIX signal context is forbidden because the mutex functions are not async-signal-safe. Even worse, many POSIX signals are directed at the process as a whole, and without thread-specific signal masks they may be delivered to an arbitrary thread.

Specification

A new API Tcl_AsyncMarkFromSignal() is introduced with the signature

Tcl_AsyncMarkFromSignal(Tcl_AsyncHandler async, int sigNumber)

where the sigNumber argument is the POSIX signal number. This function shall be called from POSIX signal contexts. For non-POSIX systems it shall be equivalent to calling Tcl_AsyncMark(). When called from a non-signal context, its behaviour is undefined.

For the Tcl 8.6 select()-based notifier thread, this function or a helper it calls shall test whether it runs in the notifier thread. If it does not, it shall resend the signal number to the notifier thread. If it does run in the notifier thread, the function shall do whatever is necessary to perform a Tcl_AsyncMark() on the respective Tcl_AsyncHandler. In the current implementation of the notifier thread, this is a write() of a single byte to the notifier thread's trigger pipe. To avoid race conditions, the notifier thread shall be started with all POSIX signals blocked, shall unblock all signals only when entering its select()-based wait state, and shall block all signals again afterwards.

For the epoll and kqueue notifiers, this function or a helper it calls shall test whether it runs in the target thread of the Tcl_AsyncHandler. If it does not, it shall resend the signal number to that target thread. If it does run in the target thread, the function shall do whatever is necessary to perform a Tcl_AsyncMark() on the respective Tcl_AsyncHandler. In the current implementations of the epoll and kqueue notifiers, this is a write() of a single byte to an eventfd or a pipe, respectively.

Independent of the implementation of the notifier, this approach must not make further assumptions regarding the runtime environment and its disposition of signals. However, as with the select()-based notifier thread, all Tcl-related threads may use their own thread-specific signal masks as required and rely on proper signal delivery by the OS and Tcl_AsyncMarkFromSignal().

Independent of signal dispositions, this approach shall also ensure that a thread-specific Tcl_AsyncHandler interrupts the target thread that owns it.

Related Bugs

Bug #f4f44174 demonstrates a deadlock issue with a script based on TclX observed with the Tcl 8.6 select()-based notifier. It is caused by the pthread_mutex_*() functions not supporting reentrant locking by default and not being async-signal-safe.

Implementation

Currently, a fork/proof of concept based on the Tcl 8.6 select() notifier is available at https://www.androwish.org/index.html/info/40790af1e8e4ec9f.

Copyright

This document has been placed in the public domain.