A thread is a basic unit of CPU utilization. It comprises a thread ID, a program counter, a set of registers and a stack. Its code section, data section and other process-specific resources, such as open files and signals, are shared with the other threads in the same process. A process with multiple threads of control can perform more than one task at a time.
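As a minimal sketch of this split between private and shared state, the Python snippet below starts a few threads that each keep a local variable on their own stack while appending to one list that every thread can see. Python is an illustrative choice of language here, and `shared_log`, `worker` and `run_workers` are hypothetical names.

```python
import threading

shared_log = []               # one list, visible to every thread in the process
log_lock = threading.Lock()   # serializes appends to the shared list


def worker(n):
    # 'square' is a local variable: each thread keeps its own copy on its own stack.
    square = n * n
    with log_lock:
        # Every thread appends to the same list object.
        shared_log.append((threading.get_ident(), square))


def run_workers(count):
    shared_log.clear()  # start from an empty log on every call
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(square for _, square in shared_log)


if __name__ == "__main__":
    print(run_workers(4))  # [0, 1, 4, 9]
```

The lock is needed precisely because the list is shared; the local variable needs no protection because no other thread can reach it.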
Reasons for implementing multithreaded applications
Modern desktop software is often multithreaded, since it has to handle many tasks at the same time. A browser, for example, might download content from the internet using one thread while another thread renders the content. A word processor might have separate threads to render the interface, read user input, and perform spell checking and similar operations.
Web applications can get thousands of clients simultaneously requesting similar services. Creating a separate process to serve each of them is resource-consuming and wasteful, since all the resulting processes duplicate their code and most of their data. A much more efficient approach is a single process with multiple threads: one thread listens for client requests, and each of the others serves a different client.
RPC mechanisms (and Java RMI) also typically have multithreaded servers that allocate a separate thread to service every message they receive.
Most modern operating systems are also multithreaded. Solaris creates a set of kernel threads that handle interrupts, while Linux manages free memory using a kernel thread.
Responsiveness is improved, since even if parts of a program are involved in a blocked or lengthy operation, the other parts can still interact with the user.
Threads are able to share resources, since they share an address space. This lets them avoid unnecessary duplication of code and data.
Economy: Creating processes and context switching between them is slow and costly. Because threads share resources, and because user-level threads can context switch without kernel intervention, threads are far more economical to create and run than processes.
Multithreaded systems can also better utilize multiprocessor architectures, where threads can be run in parallel on different processors.
It should be kept in mind that there are two levels of threads: user threads and kernel threads. Kernel threads are supported and managed by the kernel, while user threads are invisible to the kernel, are managed by user-level thread libraries, and need to be mapped onto kernel-level threads. There are three ways of doing this:
- Many-to-One Model: Many user-level threads are mapped onto one kernel thread. Since thread management is done by the thread library in user space, it’s efficient, but if one user thread makes a blocking call, the entire kernel thread will block. Also, since only one thread can access the kernel at a time, multiple threads cannot run in parallel on multiprocessor systems. Examples are Solaris Green threads and GNU Portable Threads.
- One-to-One Model: Each user thread is mapped to its own kernel thread. While this solves many of the issues with the many-to-one model, creating and managing kernel threads carries significant overhead, and hence systems restrict the number of threads they can support at a time. Linux and most modern Windows systems support this model.
- Many-to-Many Model: This model overcomes the shortcomings of both previous models by multiplexing many user-level threads onto an equal or smaller number of kernel threads. This allows developers to reap the benefits of multiple kernel threads without having to worry about creating more kernel threads than the system can support. A variation on this, called the two-level model, also maps many user-level threads onto an equal or smaller number of kernel threads, but additionally allows programmers to bind a particular user thread to a kernel thread. The two-level model is supported by IRIX, HP-UX, Tru64 UNIX and versions of Solaris before version 9.
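To make the many-to-one model concrete, here is a toy sketch in Python in which generators stand in for user-level threads and a small round-robin loop stands in for the thread library. It is an analogy under stated assumptions, not how any real library is implemented: `user_thread` and `run_many_to_one` are hypothetical names, and each `yield` marks a point where a user thread voluntarily hands control back to the scheduler.

```python
from collections import deque


def user_thread(name, steps, trace):
    """A toy user-level thread: does `steps` units of work, yielding after each."""
    for i in range(steps):
        trace.append(f"{name}:{i}")
        yield  # hand control back to the user-space scheduler


def run_many_to_one(threads):
    """Round-robin scheduler running every user thread on the one calling kernel thread."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the thread until its next yield
            ready.append(current)  # it may have more work: back of the ready queue
        except StopIteration:
            pass                   # thread finished: drop it


if __name__ == "__main__":
    trace = []
    run_many_to_one([user_thread("A", 2, trace), user_thread("B", 3, trace)])
    print(trace)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Switching between the toy threads never enters the kernel, which is why this model is cheap. If any of them made a blocking call instead of yielding, the loop would stall and every other user thread would stall with it, and the whole set can never use more than one processor.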
A thread library is the API for creating and managing threads. This could either be a kernel-level library implemented by the OS, which involves system calls to the kernel and their associated overhead, or a user-level library implemented in user space with no kernel support, where calls to the library are treated as local function calls in user space (and are hence more lightweight). Three commonly used thread libraries are:
- Pthreads: The thread creation and synchronization API defined in the POSIX standard for UNIX systems. Since Pthreads is just a specification, different OSes have implemented it in different ways. It may be implemented as either a user-level or kernel-level library. Every Pthreads thread has a thread ID and a set of thread attributes.
- Win32 Threads: This is the kernel-level thread library for Windows systems. Values passed to the CreateThread() function include security information and the stack size.
- Java Threads: Threads are an integral part of the Java language, and hence the API provides a rich set of features for creating and managing threads. All Java programs consist of at least one thread (the main thread). Additional threads can be created either by extending the Thread class and overriding its run() method, or by implementing the Runnable interface. While Pthreads and Win32 programs use global variables to share common data between threads, Java (an object-oriented language with no global variables) puts shared data in an object, a reference to which is then passed to the appropriate threads. Since the JVM usually runs on top of a host OS, Java threads are usually implemented internally using Pthreads or Win32 threads (but these details are hidden from the Java programmer).
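The same ideas can be sketched in Python, used here purely for illustration since it is not one of the three libraries above. Its `threading` module offers two creation styles that parallel Java's: subclassing `Thread` and overriding `run()`, or passing a callable as `target`, which plays roughly the role of a `Runnable`. `Counter`, `CounterThread`, `add_many` and `demo` are hypothetical names; the shared data sits in an object whose reference is handed to each thread.

```python
import threading


class Counter:
    """Shared data wrapped in an object; the lock guards concurrent updates."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, amount):
        with self._lock:
            self.value += amount


class CounterThread(threading.Thread):
    """Style 1: subclass Thread and override run()."""

    def __init__(self, counter, times):
        super().__init__()
        self.counter = counter
        self.times = times

    def run(self):
        for _ in range(self.times):
            self.counter.add(1)


def add_many(counter, times):
    """Style 2: a plain callable handed to Thread(target=...)."""
    for _ in range(times):
        counter.add(1)


def demo(times=1000):
    counter = Counter()  # one object, shared by reference
    threads = [
        CounterThread(counter, times),
        threading.Thread(target=add_many, args=(counter, times)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(demo())  # 2000
```

In both styles the thread begins executing only when `start()` is called; calling `run()` directly would simply execute the method on the calling thread.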
- Semantics of fork() and exec() system calls: When fork() is called in a multithreaded program, the question arises whether the new process should duplicate all threads or just the thread that called fork(). If an exec() call immediately follows the fork() call, duplicating all the threads is wasteful, since they will all be replaced when exec() runs. In that case, only the thread that called fork() should be duplicated. In other cases, all the threads must be duplicated. For this reason, some UNIX systems provide two versions of fork() that behave in these two different ways.
- Thread cancellation: For several reasons, a thread (called the ‘target thread’) might need to be terminated before it has finished execution. Asynchronous cancellation terminates the target thread immediately. This can cause problems: the target thread may be in the middle of updating shared data (which can corrupt that data), and resources allocated to the thread might not be properly reclaimed if the thread doesn’t get the chance to free them itself. Deferred cancellation addresses this by having the target thread periodically check whether it should be cancelled. Usually, a thread only checks when it’s safe to be cancelled (e.g. when it’s not in the middle of updating data). Pthreads calls such points cancellation points.
- Signal handling: Signals can be divided into two types, synchronous and asynchronous. Synchronous signals are delivered to the same process that performed the action that generated them (such as an illegal memory access). A signal generated by an event external to the running process is called an asynchronous signal.
Signals are handled by either a default signal handler or a user-defined one. In a multithreaded process, the thread that receives the signal is determined by the OS and the type of signal generated. Synchronous signals should be delivered to the thread that caused them. Asynchronous signals, however, may need to be delivered to all threads, or to a dedicated signal-handling thread. UNIX allows a thread to specify which signals it will accept and which it will block. An asynchronous signal may therefore be delivered to the first thread found that is not blocking it. Pthreads also provides pthread_kill(), which lets you specify the thread to which a signal should be delivered.
- Thread pools: In a multithreaded web server, separate threads are used to serve different requests. But creating a thread takes some time, and the thread is discarded after use. Also, when the number of requests goes up, the system could create more threads than its resources can handle. One solution is to use a thread pool. A set of threads is created at process startup and kept in a pool. When a request arrives, a thread from the pool is assigned to service it, and is placed back in the pool once it has finished. If no free threads are available, the server waits for one to become free. The thread pool solves both issues mentioned above: it’s faster to reuse an existing thread than to create a new one, and the pool caps the number of threads that can exist at any given time.
- Thread-specific data: While sharing data between threads is one of the main advantages of multithreaded programming, certain applications might need each thread to have its own separate copy of certain data. Most thread libraries provide support for this functionality.
- Scheduler Activations: Many systems place an intermediate data structure, called a lightweight process (LWP), between the user-level and kernel-level threads. The LWP appears to the thread library as a virtual processor on which the application can schedule user-level threads to run. Each LWP is attached to one kernel thread, and the kernel threads get scheduled to run on the physical processors by the OS. If the kernel thread blocks, the LWP attached to it and any user-level thread running on that LWP will also block. Applications might require more than one LWP to run efficiently.
In the scheduler activation scheme, the kernel provides an application with a set of LWPs (virtual processors), and the application schedules its user threads onto them. The kernel informs the application about certain events through a procedure called an upcall. Upcalls are handled by special upcall handlers, which must run on a virtual processor. One event that triggers an upcall is an application thread being about to block. The kernel makes an upcall identifying the specific thread and allocates a new virtual processor to the application. The application then runs the upcall handler on the new virtual processor; the handler saves the state of the blocking thread and gives the virtual processor on which that thread was running back to the OS. The upcall handler then schedules another eligible thread to run on the virtual processor the handler itself is running on. Similarly, when the event the blocked thread was waiting for occurs, the kernel makes another upcall to notify the application that the blocked thread is eligible to run again. To run the handler for this upcall, the kernel may either allocate a new virtual processor or preempt one of the running user threads and use its virtual processor.
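Returning to the fork() question raised above: POSIX specifies that the child of a multithreaded process contains only the thread that called fork(). The sketch below lets you observe this. It assumes a POSIX system and uses Python's `os.fork()` wrapper; `thread_counts_across_fork` is a hypothetical helper, and the pipe exists only to report the child's thread count back to the parent.

```python
import os
import threading


def thread_counts_across_fork():
    """Return (threads in parent, threads in child) around a fork(). POSIX only."""
    stop = threading.Event()
    helper = threading.Thread(target=stop.wait)  # idle second thread
    helper.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: only the thread that called fork() exists here.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)  # leave immediately without running the parent's cleanup code

    # Parent process: read the child's report, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_count = int(pipe.read())
    os.waitpid(pid, 0)

    stop.set()
    helper.join()
    return parent_count, child_count


if __name__ == "__main__":
    print(thread_counts_across_fork())
```

The child reports a single thread even though the parent had at least two. Python 3.12 and later emit a DeprecationWarning when fork() is called while other threads are running, because the child can inherit locks whose owning threads no longer exist; this is the practical reason a fork() in a multithreaded program is normally followed straight away by exec().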
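For the thread cancellation issue, deferred cancellation can be approximated in Python, whose `threading` module offers no call for killing another thread, with a flag that the target thread polls at points where it is safe to stop. This is a sketch of the idea rather than the Pthreads mechanism: `transfer_worker`, `run_and_cancel` and the `ledger` dictionary are hypothetical, and the top of the loop plays the role of a cancellation point.

```python
import threading


def transfer_worker(cancel_requested, started, ledger):
    """Target thread: each loop iteration is one complete, consistent update."""
    started.set()
    while not cancel_requested.is_set():  # cancellation point: no update in progress
        ledger["debits"] += 1
        ledger["credits"] += 1            # both fields change before the next check


def run_and_cancel():
    cancel_requested = threading.Event()
    started = threading.Event()
    ledger = {"debits": 0, "credits": 0}
    target = threading.Thread(
        target=transfer_worker, args=(cancel_requested, started, ledger)
    )
    target.start()
    started.wait()           # make sure the target thread is running
    cancel_requested.set()   # request cancellation; the target honors it when safe
    target.join()            # returns once the target has exited on its own
    return ledger


if __name__ == "__main__":
    ledger = run_and_cancel()
    print(ledger["debits"] == ledger["credits"])  # True
```

Because the flag is only checked between updates, the two fields are always equal when the thread stops. Terminating the thread asynchronously between the two increments would leave them out of step.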
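The thread pool described above can be sketched with `ThreadPoolExecutor` from Python's standard `concurrent.futures` module. `handle_request` and `serve` are hypothetical names standing in for a real server's request handling. One difference from the description in the text: this executor creates its worker threads on demand, up to `max_workers`, rather than all at startup, but once created they are reused for later requests.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Stand-in for real request handling; reports which pool thread ran it."""
    return request_id, threading.current_thread().name


def serve(request_ids, pool_size=4):
    # max_workers caps how many threads can exist at once; requests beyond
    # that wait in the executor's internal queue until a worker is free.
    with ThreadPoolExecutor(max_workers=pool_size, thread_name_prefix="pool") as pool:
        # map() returns results in the order the requests were submitted.
        return list(pool.map(handle_request, request_ids))


if __name__ == "__main__":
    for request_id, thread_name in serve(range(8), pool_size=3):
        print(request_id, thread_name)
```

However many requests arrive, at most `pool_size` distinct thread names appear in the output, which is the cap on resource use that motivates the pool.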
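Thread-specific data is available in Python through `threading.local()`, which can serve as a small illustration of the idea: attributes set on the object are visible only to the thread that set them. The names `context`, `handle` and `run` are hypothetical, and the barrier is there only to make sure every thread has stored its value before any thread reads one back.

```python
import threading

context = threading.local()  # each thread sees its own set of attributes


def handle(user, barrier, seen, seen_lock):
    context.user = user      # stored in this thread's private copy
    barrier.wait()           # wait until every thread has stored its own value
    with seen_lock:
        seen[user] = context.user  # read back: still this thread's value


def run(users):
    seen = {}
    seen_lock = threading.Lock()
    barrier = threading.Barrier(len(users))
    threads = [
        threading.Thread(target=handle, args=(u, barrier, seen, seen_lock))
        for u in users
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The calling thread never set context.user, so it has no such attribute.
    return seen, hasattr(context, "user")


if __name__ == "__main__":
    print(run(["ann", "bob"]))
```

If `context` were an ordinary shared object, the last thread to store its value before the barrier would overwrite the others; with thread-local storage each thread reads back exactly what it wrote.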