The known problems with threads on Linux

You've come to this page as a result of a question similar to the following:

What are the known problems with threads on Linux?

This is the Frequently Given Answer to that question.

Linux doesn't provide the rich and complete multithreaded programming semantics that people used to Unices (and indeed OS/2 and Windows) may expect. There are known problems with the threads implementations.

Most of these problems arise from the way that threads were introduced into Linux. The transition to kernel mechanisms for correctly supporting processes and threads is still, after these many years, incomplete. Until the transition is complete, these problems will persist.

Historical background

Linux itself has "tasks", and originally tasks mapped one-to-one to processes. There was no multithreading. Then it was decided to add threads to Linux. This was done by dissociating tasks from processes. The idea was to provide a general-purpose kernel mechanism, whereby the kernel knew only of "tasks", which could optionally share various common resources, such as address spaces, file descriptor tables, and so forth, with one another. This was intended to provide a flexible mechanism on top of which various process/thread models, presented by the application-mode system library, could potentially be built, including the POSIX threading model and various others that might take application developers' fancies.
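
The "tasks that optionally share resources" idea can be seen from application code through the clone() library function. What follows is only a minimal sketch, assuming Linux with glibc's clone() wrapper; the names task_main, shared_flag and flag_seen_by_parent are invented here for illustration. It creates one task that shares the creator's address space (CLONE_VM) and one that does not, and reports whether a write made by the new task is visible to its creator.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Written by the new task, read back by its creator. */
static volatile int shared_flag;

/* Entry point of the new task: write to a variable and terminate. */
static int task_main(void *unused)
{
    (void)unused;
    shared_flag = 1;
    return 0;
}

/* Create a task with or without a shared address space.
 * Returns the value of shared_flag that the creator sees once the
 * new task has terminated, or -1 on error. */
int flag_seen_by_parent(int share_address_space)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    int flags = SIGCHLD;    /* lets the creator use waitpid() as usual */
    int status;
    pid_t pid;

    if (stack == NULL)
        return -1;
    if (share_address_space)
        flags |= CLONE_VM;  /* the resource that is optionally shared */

    shared_flag = 0;
    /* The stack grows downwards on most architectures, so pass its top. */
    pid = clone(task_main, stack + stack_size, flags, NULL);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return shared_flag;
}
```

With CLONE_VM the creator should see the write (a return value of 1); without it the new task works on its own copy of the address space and the creator should see 0. A real threading library passes many more flags than this, to share file descriptor tables, signal handlers and so forth.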

Three threading implementations aiming to provide the POSIX threading model to applications were built. The first was LinuxThreads. Back in 1999, when this list was first compiled, that was the sole threads implementation for Linux. It suffered from many problems, not least because the kernel "task" mechanism wasn't complete. Tasks couldn't share everything that they needed to be able to share, and various parts of the system, from the kernel to standard utility programs, still treated tasks as if they were processes.

A second was "New Generation POSIX Threads", released in 2001-05. This had a multi-level scheduler, and an M:N model (as Solaris does) for layering application threads on top of kernel threads, unlike the 1:1 model used in other Linux implementations of POSIX multithreading. It involved only minimal changes to kernel functionality, and so suffered from several of the same problems as LinuxThreads.

In 2002-09, version 0.1 of "Native POSIX Threads for Linux", the third such implementation, was announced. NPTL took advantage of several later improvements to the kernel's "task" mechanism that allowed a cleaner threading implementation without some of the problems that plagued LinuxThreads. It didn't address the problems with tools such as ps, but it did address the system library and the underlying kernel mechanisms. NPTL became the default threading mechanism in the "unstable" development branch of the kernel as of kernel version 2.5.36, and entered the "stable" kernel as of version 2.6.

However, even with NPTL, the situation is far from perfect. The kernel still lacks the ability to share several resources between tasks that, according to the POSIX threading model, must be shared by all threads within a single process. So there are still, even as of 2010 and kernel version 2.6.33, problems with threads on Linux. Quite a lot of necessary changes to the kernel still remain outstanding.

LinuxThreads problems

These were the known problems with threads on Linux, with LinuxThreads:

It doesn't handle signals correctly.

Signals sent via kill() from other processes are delivered to single individual threads rather than to the process as a whole. This makes it difficult to manually SIGSTOP a process, causes job control to operate incorrectly, and makes it hard to implement debuggers that can freeze a whole process.
A multithreaded process cannot perform asynchronous I/O with SIGIO, handling the signal in a separate thread.
ps shows every thread in a process, and moreover shows each thread as if it were a process. The traditional Unix semantics, and the POSIX IEEE 1003.1:2004 definition of ps (which doesn't differ in this regard from IEEE 1003.2:1992), are that ps lists processes, not threads.

Commercial Unices get this right. Their ps commands list processes. Details of individual threads are simply another sort of additional information about a process that can be printed if desired. (See the -L option to ps on Solaris, for example.)
Core dumps of multi-threaded programs don't contain all the threads (or even necessarily the crashing one!).
getpid() doesn't return the same value for all threads in a single process. In fact, there is no value provided by the Linux kernel to applications corresponding to the standard Unix concept of process ID.

A commonly suggested workaround, that of making getpid() save the first ID that it receives from the kernel in a static variable, is flawed. It doesn't work across exec(), whereas a correctly implemented getpid() must.
A child process fork()ed by one thread in the parent process often cannot be wait()ed for by a different thread in the parent process, depending upon the exact parent-child relationships between threads within the parent.
Threads have parent-child relationships (when they should properly all be peers).
User and group ID information isn't common to all threads in a single process, so (for example) a multithreaded setuid/setgid process can have a very interesting time.

For example, the POSIX IEEE 1003.1:2004 definition of setuid() requires that UID changes occur across an entire process, not just for the one thread within the process making the setuid() call. With LinuxThreads, setuid() only affects the current thread.
rlimit information isn't common to all threads in a single process.

Per the POSIX IEEE 1003.1:2004 definition of setrlimit() and getrlimit(), resource limits apply variously across an entire process or (in the case of RLIMIT_STACK) to the first thread in the process (not the calling thread). Linux only applies them to the calling thread.
A multithreaded session leader process cannot do things like disconnecting from a controlling TTY in any thread other than its first.

This is because the kernel associates session leadership with tasks and not with processes as it should do. Even though it is in the same session leader process, a secondary thread is not considered to be the session leader task.

As per the POSIX IEEE 1003.1:2004 definition of controlling TTYs, a controlling TTY is an attribute of a process, not of a thread. If a session leader process disconnects from a controlling TTY in one of its threads, this should be the case for all other threads in the process, too. Similarly, if a non-session-leader process calls setsid() to disconnect from its controlling TTY in one of its threads, this should be the case for all of the other threads in the process, too.
times() doesn't account for anything other than the thread it is called in.
Regions of files locked by different threads in the same process are not correctly merged.
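
Two of the items above, the getpid() one and the wait() one, are easy to probe from application code. The following is a rough sketch and not a conformance test. The function names are made up for illustration, and it assumes a POSIX threads library (compile with -pthread where the toolchain needs it). Going by the list above, both functions would be expected to report failure (0) under LinuxThreads, whereas a system that gives threads proper process semantics should report 1 for both.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: record the process ID that this thread observes. */
static void *report_pid(void *out)
{
    *(pid_t *)out = getpid();
    return NULL;
}

/* Returns 1 if a second thread sees the same getpid() value as the
 * calling thread, 0 if it sees a different one, -1 on error. */
int threads_share_pid(void)
{
    pthread_t t;
    pid_t seen = (pid_t)-1;

    if (pthread_create(&t, NULL, report_pid, &seen) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return seen == getpid();
}

/* Thread body: wait for a child that a different thread fork()ed.
 * A non-NULL return value signals success. */
static void *reap_child(void *arg)
{
    pid_t child = *(pid_t *)arg;
    int status;

    return waitpid(child, &status, 0) == child ? arg : NULL;
}

/* Returns 1 if a thread other than the one that called fork() can
 * wait for the child, 0 if it cannot, -1 on error. */
int other_thread_can_wait(void)
{
    pthread_t t;
    void *result = NULL;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0)
        _exit(0);           /* the child does nothing but terminate */
    if (pthread_create(&t, NULL, reap_child, &child) != 0)
        return -1;          /* sketch only: the child is left unreaped */
    if (pthread_join(t, NULL == NULL ? &result : NULL) != 0)
        return -1;
    return result != NULL;
}
```

Calling both functions from main() and printing the results is enough to see which behaviour a given system exhibits.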

NPTL problems

As of NPTL version 0.19 and kernel 2.5.36, several of the above problems had not been addressed, and in a couple of cases had not even been acknowledged as problems. Of those, the following were later fixed:

The situation with core dumps of multi-threaded programs, which wasn't acknowledged as a problem as of NPTL 0.19, gradually improved over subsequent years, although kernel and NPTL developers were still ironing out problems over four years later.
Resource usage and limit information continued to be stored directly in the "task" structure, and not shareable amongst tasks; and this continued to be a problem through kernel version 2.6.8 and NPTL version 0.60, in 2005.

As of 2010, resource limit information is shareable amongst tasks, and is no longer directly in the task structure. It is shared amongst all tasks in a process using the same structure used to share CPU accounting information for terminated tasks and child processes.
As of 2010, the controlling TTY and session information for all of the tasks in a "thread group" (i.e. process) are those of a distinguished task that is the "group leader".
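
Whether resource limits really are shared amongst the threads of a process can be checked with something like the following. It is only a sketch under assumptions: the function names are invented for illustration, RLIMIT_NOFILE is picked simply because its soft limit can be lowered and raised again without privileges, and a POSIX threads library is assumed. One thread lowers the soft limit by one; the original thread then looks at whether it sees the new value.

```c
#include <pthread.h>
#include <sys/resource.h>

/* Thread body: lower the soft RLIMIT_NOFILE limit by one and record
 * the new value. A non-NULL return value signals success. */
static void *lower_soft_limit(void *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur < 2)
        return NULL;
    rl.rlim_cur -= 1;       /* the hard limit is left untouched */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return NULL;
    *(rlim_t *)out = rl.rlim_cur;
    return out;
}

/* Returns 1 if a limit set by one thread is seen by another thread,
 * 0 if it is not, -1 on error. */
int limit_change_is_process_wide(void)
{
    pthread_t t;
    rlim_t set_by_thread = 0;
    void *ok = NULL;
    struct rlimit rl;
    int visible;

    if (pthread_create(&t, NULL, lower_soft_limit, &set_by_thread) != 0)
        return -1;
    if (pthread_join(t, &ok) != 0 || ok == NULL)
        return -1;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    visible = (rl.rlim_cur == set_by_thread);

    /* Put the soft limit back to what it was, as far as this thread can. */
    if (visible) {
        rl.rlim_cur += 1;
        (void)setrlimit(RLIMIT_NOFILE, &rl);
    }
    return visible;
}
```

A result of 1 is what the POSIX definition calls for. A result of 0 would indicate the per-task behaviour described for the older kernels.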

These are then the remaining known problems with threads on Linux, with NPTL, that are still outstanding, as of kernel 2.6.33:

User and group ID information continued to be stored directly in the "task" structure, and not shareable amongst tasks; and this continued to be a problem through kernel version 2.6.8 and NPTL version 0.60, in 2005.

In more recent versions, the user and group ID information is no longer stored directly in the structure, but this is a distinction that makes no difference to multithreading. User and group ID information has moved into a separate "credentials" structure. However, credentials structures are only shared amongst tasks as a copy-on-write optimization, and the situation is effectively unchanged. The kernel still, as of version 2.6.33, does not support tasks sharing modifications to a single, shared, set of credentials.

NPTL includes a bodge to alleviate this continued kernel deficiency. The library functions for the various system calls that set UIDs and GIDs for a process are no longer simple wrappers on top of the actual kernel functionality. Instead, they send a signal to all of the threads in the process telling each thread that a "xid_command" is pending. Upon receipt of the signal, the threads invoke the actual kernel calls to set UIDs and GIDs.

This hidden signal mechanism caused problems with VMware and several other applications in early 2005. For example, initial implementations of the library didn't check whether the signaller was another thread in the same process, allowing processes to crash other processes at whim simply by sending this signal without setting up the xid_command data structures that were supposed to accompany it.

The VMware developers complained that the introduction of this bodge into the C library actually created a security hole in VMware, whose binaries were set-UID to the superuser.
CPU time information continued to be stored directly in the "task" structure, and not shareable amongst tasks; and this continued to be a problem for the times() system call through kernel version 2.6.8 and NPTL version 0.60, in 2005.

As of 2010, CPU time information is still not shared amongst tasks. However, a single shared resource accounting structure for all of the tasks in a "thread group" (i.e. process) records cumulative CPU accounting information for terminated tasks and for child processes, and the kernel mechanisms to collect CPU time information are capable of scanning all of the tasks in the process and summing all of the times to produce the per-process total.

This still doesn't produce the behaviour mandated by the POSIX IEEE 1003.1:2004 definition of times(), however. The CPU time consumed by terminated threads within a process is erroneously accounted as CPU time consumed by child processes of that process. It should, per the standard, be reported as CPU time consumed by the process itself, not by its children.

© Copyright 1999–2003,2010 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.