511.md at [49aa10dfb7]

Login

File tip/511.md artifact 7f95be7aba part of check-in 49aa10dfb7


# TIP 511: Implement Tcl_AsyncMarkFromSignal()
	Author:         Christian Werner <[email protected]>
	State:          Final
	Type:           Project
	Vote:           Done
	Created:        14-June-2018
	Post-History:   
	Keywords:       Tcl,threads
	Tcl-Version:	8.7
	Tcl-Branch:     tip-511
	Vote-Results:   4/0/5 accepted
	Votes-For:      DKF, BG, KBK, JN, JD, SL
	Votes-Against:  none
	Votes-Present:  DGP, FV, AK
-----

# Abstract

This TIP proposes to add a Tcl API for marking `Tcl_AsyncHandlers` ready for
processing from POSIX signal contexts.

# Context

This TIP is inspired by a request from FlightAware to fix threading issues
in combination with TclX signal handling.

# Rationale

As of Tcl 8.6, the man page for `Tcl_AsyncMark` et.al. states that:

> "These procedures provide a safe mechanism for dealing with asynchronous
> events such as signals..."

For the `Tcl_AsyncMark()` function, this claim is only true, when the Tcl
core is built without threading support. Otherwise, the function needs
to lock various mutexes to carry out its operation. But locking mutexes
in a POSIX signal context is plain _verboten_. And even worse, many signals
in POSIX have process context, and delivery to threads is random without
thread-specific masks.

# Specification

A new API `Tcl_AsyncMarkFromSignal()` is introduced with the signature

    Tcl_AsyncMarkFromSignal(Tcl_AsyncHandler async, int sigNumber)

where the sigNumber argument is the POSIX signal number. This function
shall be called from POSIX signal contexts. For non-POSIX systems it
shall be equivalent to calling `Tcl_AsyncMark()`. When called from a
non-signal context, its behaviour is undefined.

In case of the Tcl 8.6 `select()`-based notifier thread, this or a
subfunction shall test if it runs in the notifier thread. If this is
not the case, it shall resend the signal number to the notifier thread.
If run in the notifier thread the function shall do whatever is necessary
to perform a `Tcl_AsyncMark()` on the respective `Tcl_AsyncHandler`. In the
current implementation of the notifier thread this is a `write()`
of a single byte to the trigger pipe of the notifier thread.
In order to avoid race conditions in the notifier thread it shall be
started with all POSIX signals blocked, unblock all signals only when
going into its `select()` based wait state, and block all signals afterwards.

In case of epoll and kqueue notifiers, this or a subfunction shall test if it
runs in the target thread of the `Tcl_AsyncHandler`. If this is not the
case, it shall resend the signal number to this target thread.
If run in the target thread the function shall do whatever is necessary
to perform a `Tcl_AsyncMark()` on the respective `Tcl_AsyncHandler`. In the
current implementations of the epoll and kqueue notifiers this is a
`write()` of a single byte to an `event_fd` or a pipe, respectively.

Independent of the implementation of the notifier, this approach must
not make further assumptions regarding the runtime environment and its
disposition of signals. However, as for the `select()`-based notifier
thread it is allowed for all Tcl related threads to use their own
thread-specific signal mask as required and rely on proper signal
delivery by the OS and `Tcl_AsyncMarkFromSignal()`.

And independent of signal dispositions this approach shall ensure
that thread-specific `Tcl_AsyncHandlers` are directed to interrupt the
owning target thread of the `Tcl_AsyncHandler`.

# Related Bugs

Bug #f4f44174 demonstrates a deadlock issue with a script based on TclX
observed with the Tcl 8.6 `select()`-based notifier. It is caused by the
`pthread_mutex_*()` functions not supporting reentrant locking by default and
not being async-signal-safe.

# Implementation

Currently, there's a fork/proof of concept available in
https://www.androwish.org/index.html/info/40790af1e8e4ec9f based
on the Tcl 8.6 `select()` notifier.

# Copyright

This document has been placed in the public domain.