Using Thread Sanitizer to validate and test thread safety#
Thread sanitizer (or TSan) is part of the compiler sanitizer suite, originally developed at Google and now integrated with the LLVM and GCC compilers. The sanitizers instrument compiled code with additional runtime checks for common sources of undefined behavior.
Thread Sanitizer specializes in finding data races, a form of undefined behavior that is possible in low-level multithreaded code. While Rust has compile-time guarantees to prevent most data races, C and C++ do not prevent data races from occurring, and in practice many C and C++ extensions turn out to contain data races when support for the free-threaded build is added.
Compiling CPython and foundational packages with Thread Sanitizer#
In this section we provide the commands to build a free-threading compatible CPython interpreter and packages with TSan, along with other hints for discovering potential data races.
cpython_sanity
docker images#
To ease working with Thread Sanitizer in projects that use Python, NumPy, and SciPy, we have created a set of docker images that contain a pre-built Python interpreter and common dependencies, all compiled with Thread Sanitizer, which can be tricky to build yourself.
See the cpython_sanity
repository for more information
about how to use the docker images. Also see NumPy PR #28808,
which adjusted NumPy TSan CI to use the ghcr.io/nascheme/numpy-tsan:3.14t-dev
docker image instead of building Python from source, saving ten minutes of
compute time per CI run.
Compile free-threaded CPython with TSan#
- Clone the latest stable branch (
3.14
):
git clone https://github.com/python/cpython.git -b 3.14
- Configure and build the interpreter. The instructions below are for Linux (Windows and macOS may require some changes). Installing the Clang compiler is not covered here.
cd cpython
CC=/path/to/clang CXX=/path/to/clang++ ./configure --disable-gil --with-thread-sanitizer --prefix $PWD/cpython-tsan
make -j 8
make install
- To use the built Python interpreter:
# Create a virtual environment:
$PWD/cpython-tsan/bin/python3.14t -m venv ~/tsanvenv
# Then activate it:
source ~/tsanvenv/bin/activate
# Exit the `cpython` folder (preparation for the next step below)
cd ..
If you use pyenv, you can also enable a thread sanitizer build with pyenv install
like so:
CC=/path/to/clang CXX=/path/to/clang++ CONFIGURE_OPTS="--with-thread-sanitizer" pyenv install 3.14t-dev
And then activate the build with e.g. pyenv local 3.14t-dev
.
Note
On macOS, you may see messages like this when you start Python:
python(7027,0x1f6dfc240) malloc: nano zone abandoned due to inability to reserve vm space.
This message is emitted by the macOS malloc implementation. As
explained
here,
this happens for any program compiled with TSan on macOS and can
be safely ignored. To silence it, set the MallocNanoZone
environment variable to
0. You should only set this in the session you are running TSan
under, as this setting will slow down other programs that allocate memory.
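Once the interpreter is built by either route above, it can be worth confirming that you are really running the free-threaded, TSan-configured build before spending time on tests. The following is a minimal sketch; the function name describe_build is our own, and the check on CONFIG_ARGS is only a heuristic that assumes the build records its configure flags there (true for typical POSIX builds, possibly not elsewhere).

```python
import sys
import sysconfig


def describe_build():
    """Summarize whether this interpreter looks free-threaded and TSan-configured."""
    # Py_GIL_DISABLED is 1 on builds configured with --disable-gil.
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on Python 3.13+; report None on older versions.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = gil_check() if gil_check is not None else None
    # CONFIG_ARGS holds the ./configure flags on POSIX builds (heuristic check).
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return {
        "free_threaded": free_threaded,
        "gil_enabled": gil_enabled,
        "tsan_configured": "--with-thread-sanitizer" in config_args,
    }


if __name__ == "__main__":
    for key, value in describe_build().items():
        print(f"{key}: {value}")
```

On a build made with the commands above we would expect free_threaded and tsan_configured to be true. If gil_enabled is true on a free-threaded build, an extension module may have re-enabled the GIL at import time.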
Compile NumPy with TSan#
- Get the source code (for example, the
main
branch)
git clone --recursive https://github.com/numpy/numpy.git
- Install the build requirements:
cd numpy
python -m pip install -r requirements/build_requirements.txt
- Build the package
CC=/path/to/clang CXX=/path/to/clang++ python -m pip install -v . --no-build-isolation -Csetup-args=-Db_sanitize=thread
# or with debug info
# CC=/path/to/clang CXX=/path/to/clang++ python -m pip install -v . --no-build-isolation -Csetup-args=-Db_sanitize=thread -Csetup-args=-Dbuildtype=debugoptimized
TSan suppressions#
While TSan is incredibly useful, it can also be difficult to fix all races detected by TSan. Some races are also more impactful than others. To avoid drowning out new issues with existing issues found under TSan testing, it's common practice to create a suppressions file for known issues and point TSan at the suppressions when you run it.
Here's an example suppression file from the TSan docs:
race:llvm::RuntimeDyldELF::registerEHFrames
race:partial_vectorcall_fallback
race:dnnl_sgemm
This suppressions file tells TSan not to report any races it detects in the functions
llvm::RuntimeDyldELF::registerEHFrames
, partial_vectorcall_fallback
, and
dnnl_sgemm
.
You can tell TSan to use your suppressions file by setting suppressions
in
TSAN_OPTIONS
:
TSAN_OPTIONS="suppressions=$PWD/tsan-suppressions" python my_test.py
This would use a suppressions file named tsan-suppressions
located in the
current directory.
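If you maintain the list of known races in a script or CI helper, the suppressions file can be generated rather than edited by hand. This is a minimal sketch, not a required workflow: write_suppressions is a hypothetical helper name, and it only emits the race: entries shown above.

```python
from pathlib import Path


def write_suppressions(path, function_names):
    """Write one `race:<function>` line per name and return a TSAN_OPTIONS value."""
    target = Path(path).resolve()
    lines = [f"race:{name}" for name in function_names]
    target.write_text("\n".join(lines) + "\n")
    # TSan expects the suppressions option to point at the file's location.
    return f"suppressions={target}"
```

The returned string can be exported as TSAN_OPTIONS before running my_test.py, exactly as in the shell command above.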
Using suppressions files from other projects#
Depending on what you are doing, you may see races coming from code outside of your project, including from CPython itself.
There are known races in CPython that are tracked in a suppressions file used for TSan testing in the CPython CI. You can see the version of this file in the 3.14 branch of CPython here. This file might be a good place to start for your own testing, particularly if you see races inside of CPython that are listed in CPython's suppressions file.
In addition to CPython, we are aware of the following projects that run tests in CI with TSan and use suppressions:
- NumPy (TSan Suppressions)
- CFFI (TSan Suppressions)
If you are aware of other suppression files used for TSan testing of Python projects, please add them here.
Reporting TSan issues in your dependencies#
It is possible, or even likely in cases where TSan testing has not been used before, that you will see races coming from code in your dependencies. If you've found a race in a project that already does TSan testing, then just go ahead and make a bug report including the TSan race report and steps to reproduce the race.
If your dependency does not yet regularly test with TSan, consider adding extra context and a link to this guide, to help the project's maintainers reproduce your report and judge how important a fix is.
Running Python under TSan#
If you have successfully compiled CPython and your project and any dependencies with native extensions using TSan instrumentation, you should be able to run a test script or your unit tests as normal. You will likely want to customize your TSan testing with some options. We explain how to do that below.
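TSan is a runtime tool, so it can only report races on code paths that actually execute concurrently during the run. If your test suite has no multithreaded tests yet, a small stress helper is a reasonable starting point. The sketch below is an assumption about how such a test might look, not a prescribed harness: run_threaded is our own name, and the function you pass in would typically call into your native extension with a shared object.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_threaded(func, num_threads=8, iterations=100):
    """Call func(thread_index) repeatedly from several threads that start together."""
    barrier = threading.Barrier(num_threads)

    def worker(index):
        # Wait until every thread is ready so the calls overlap as much as possible.
        barrier.wait()
        return [func(index) for _ in range(iterations)]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(worker, i) for i in range(num_threads)]
        # .result() re-raises any exception raised inside a worker thread.
        return [future.result() for future in futures]
```

The barrier matters: without it, short calls may finish before the next thread has even started, and the racy window is never exercised.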
Useful TSan options#
- By default, TSan reports a warning for each issue it finds and lets the program continue. To stop execution on the first TSan report, use:
TSAN_OPTIONS=halt_on_error=1 python -m pytest test.py
See the TSan documentation for a full listing of options accepted by TSan.
Running pytest tests under TSan#
By default, pytest captures all output from
tests.
This means that you might only see output like ThreadSanitizer: reported 2 warnings
, with no accompanying report giving details about the warnings.
To ensure that pytest doesn't capture any output from TSan, you can
pass -s
(short for --capture=no
) to your pytest invocation.
You can also set log_path=/path/to/log_file
in TSAN_OPTIONS, and logs
will be written to /path/to/log_file.pid
, where pid
is the process ID,
instead of being directed to stderr, which is the default.
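When log_path is set, each process writes its own file, so a CI job can end up with many logs to inspect. The following sketch tallies report kinds across those files. It assumes that each report begins with a header line of the form "WARNING: ThreadSanitizer: data race (pid=12345)"; summarize_tsan_logs is a hypothetical helper name.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed report header, e.g. "WARNING: ThreadSanitizer: data race (pid=12345)".
REPORT_HEADER = re.compile(
    r"^WARNING: ThreadSanitizer: (?P<kind>.+?) \(pid=\d+\)", re.MULTILINE
)


def summarize_tsan_logs(log_prefix):
    """Count report kinds across every `<log_prefix>.<pid>` file."""
    prefix = Path(log_prefix)
    counts = Counter()
    for log_file in sorted(prefix.parent.glob(prefix.name + ".*")):
        text = log_file.read_text(errors="replace")
        counts.update(match.group("kind") for match in REPORT_HEADER.finditer(text))
    return counts
```

Calling summarize_tsan_logs("/path/to/log_file") after a test run gives a quick overview before you open individual reports.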
Some authors of this guide have observed hangs running pytest with
halt_on_error=1
. If you observe hangs, try setting halt_on_error=0
in
TSAN_OPTIONS.
The pytest-xdist plugin can also
sometimes be problematic if a test runner happens to crash during
execution. While pytest-xdist
does have some support for detecting a crashed
worker, it is not foolproof and the authors of this guide have observed hangs on
CI due to pytest-xdist not properly handling a worker failing due to a TSan
error.
The pytest-xdist
plugin also makes it impossible to obtain stdout from
a test runner, so there
is no way to see TSan output if there is an issue. This can lead to hangs on CI
machines with no accompanying error report to explain the nature of the
hang. For that reason we suggest uninstalling pytest-xdist
from your
environment to ensure it isn't used. If you need to use pytest-xdist
to make
the tests complete in a reasonable amount of time, we suggest using
pytest-timeout
to ensure hung
tests eventually exit, particularly on CI.
TSan includes a check to ensure allocators never fail. This can lead to runtime
crashes if a test happens to try allocating a very large block of memory
specifically to ensure such an allocation does fail correctly. Set
allocator_may_return_null=1
in TSAN_OPTIONS
to avoid this.
If a TSan warning is detected, the exit code of the running process will be set
to a nonzero value (66, by default). If for some reason that is problematic in
your test suite then you can set exitcode=0
in TSAN_OPTIONS
to make TSan
quit "successfully" if a warning is detected. For example, you might set this if
a subprocess returning a nonzero exit code unexpectedly breaks a test.
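If you launch test processes from a wrapper script, the exit code gives a cheap way to tell TSan warnings apart from ordinary failures. This is a rough sketch under the assumption that exitcode has been left at its default of 66; run_and_classify is our own name, and a program can of course exit with 66 for unrelated reasons.

```python
import subprocess

TSAN_DEFAULT_EXITCODE = 66  # default exit code when TSan reported warnings


def run_and_classify(cmd, tsan_exitcode=TSAN_DEFAULT_EXITCODE):
    """Run cmd and label the outcome based on its exit code."""
    completed = subprocess.run(cmd)
    if completed.returncode == 0:
        return "ok"
    if completed.returncode == tsan_exitcode:
        # Heuristic: this matches TSan's exit code, but is not proof of a race.
        return "tsan-warnings"
    return "failed"
```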
You might also find that running your test suite is very slow under
TSan. Consider skipping tests that do not use threads, for example by only
testing files that import threading
or
concurrent.futures.ThreadPoolExecutor
. See this NumPy CI
workflow
that runs pytest on a subset of NumPy's tests. This will miss tests that spawn
threads in native code (e.g. with OpenMP or other threading primitives) or use
Python packages that spawn threads, but is a good option if your library doesn't
do that.
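One way to build such a subset is to scan test files for imports of the threading modules. The sketch below is a simple text-based heuristic, not the approach used by the NumPy workflow: find_threaded_tests is a hypothetical helper, it assumes test files are named test_*.py, and it will miss forms such as "import os, threading".

```python
import re
from pathlib import Path

# Matches lines like "import threading" or "from concurrent.futures import ...".
THREAD_IMPORT = re.compile(
    r"^\s*(?:import|from)\s+(?:threading|concurrent)\b", re.MULTILINE
)


def find_threaded_tests(root):
    """Return test files under root that appear to import threading helpers."""
    matches = []
    for path in sorted(Path(root).rglob("test_*.py")):
        if THREAD_IMPORT.search(path.read_text(errors="replace")):
            matches.append(path)
    return matches
```

The resulting paths can be passed straight to pytest as positional arguments.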
Altogether, a pytest invocation using TSan might look like:
$ TSAN_OPTIONS='allocator_may_return_null=1 halt_on_error=1' pytest -s
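The same invocation can be driven from Python, which is convenient when the option set differs between local runs and CI. This is a minimal sketch with hypothetical helper names (format_tsan_options and run_pytest_under_tsan); it assumes pytest is installed in the TSan environment and replaces any TSAN_OPTIONS value already present.

```python
import os
import subprocess
import sys


def format_tsan_options(options):
    """Join key/value pairs into the space-separated form used in TSAN_OPTIONS."""
    return " ".join(f"{key}={value}" for key, value in options.items())


def run_pytest_under_tsan(pytest_args, options):
    """Run pytest in a child process with TSAN_OPTIONS set; return its exit code."""
    env = dict(os.environ)
    env["TSAN_OPTIONS"] = format_tsan_options(options)  # overwrites any existing value
    # -s stops pytest from capturing output, so TSan reports reach the terminal.
    cmd = [sys.executable, "-m", "pytest", "-s", *pytest_args]
    return subprocess.run(cmd, env=env).returncode
```

For example, run_pytest_under_tsan(["tests/"], {"allocator_may_return_null": 1, "halt_on_error": 1}) mirrors the shell command above, with "tests/" standing in for your own test path.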