
B.5.3. Inter-Process Communications

An isolated process, whether a daemon or an interactive application, is rarely useful on its own, which is why there are several methods allowing separate processes to communicate, either to exchange data or to control one another. The generic term referring to this is inter-process communication, or IPC for short.

The simplest IPC system is to use files. The process that wishes to send data writes it into a file (with a name known in advance), while the recipient only has to open the file and read its contents.
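
As a minimal sketch, two unrelated C programs could communicate this way; the path /tmp/ipc-demo is an arbitrary name chosen for illustration, not a standard location:

    /* writer.c — writes a message into a file known in advance */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/tmp/ipc-demo", "w");
        if (f == NULL) { perror("fopen"); return 1; }
        fprintf(f, "hello from the writer\n");
        fclose(f);
        return 0;
    }

    /* reader.c — a separate program reads the same file back */
    #include <stdio.h>

    int main(void)
    {
        char line[256];
        FILE *f = fopen("/tmp/ipc-demo", "r");
        if (f == NULL) { perror("fopen"); return 1; }
        while (fgets(line, sizeof line, f) != NULL)
            fputs(line, stdout);
        fclose(f);
        return 0;
    }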

In the case where one does not wish to store data on disk, one can use a pipe, which is simply an object with two ends; bytes written in one end, are readable at the other. If the ends are controlled by separate processes, this leads to a simple and convenient inter-process communication channel. Pipes can be classified into two categories: named pipes, and anonymous pipes. A named pipe is represented by an entry on the filesystem (although the transmitted data is not stored there), so both processes can open it independently if the location of the named pipe is known beforehand. In cases where the communicating processes are related (for instance, a parent and its child process), the parent process can also create an anonymous pipe before forking, and the child inherits it. Both processes will then be able to exchange data through the pipe without needing the filesystem.
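
For instance, here is a hedged sketch of the named-pipe case in C; the path /tmp/demo-fifo is an arbitrary example, and an independent reader (even a plain cat /tmp/demo-fifo run from another shell) can pick up the data:

    /* fifo-writer.c — sends data through a named pipe (FIFO) */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        /* Create the filesystem entry for the FIFO; this may fail
         * harmlessly if it already exists. */
        mkfifo("/tmp/demo-fifo", 0600);

        /* Opening for writing blocks until some process opens the
         * other end for reading. */
        int fd = open("/tmp/demo-fifo", O_WRONLY);
        if (fd < 0) { perror("open"); return 1; }

        write(fd, "hello through the FIFO\n", 23);
        close(fd);
        return 0;
    }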

IN PRACTICE A concrete example

Let's describe in some detail what happens when a complex command (a pipeline) is run from a shell. We assume we have a bash process (the standard user shell on Debian), with pid #4374; into this shell, we type the command: ls | sort.

The shell first interprets the command typed in. In our case, it understands there are two programs (ls and sort), with a data stream flowing from one to the other (denoted by the | character, known as pipe). bash first creates an unnamed pipe (which initially exists only within the bash process itself).

Then the shell clones itself; this leads to a new bash process, with pid #4521 (pids are abstract numbers, and generally have no particular meaning). Process #4521 inherits the pipe, which means it is able to write into its “input” side; bash redirects its standard output stream to this pipe's input. Then it executes (and replaces itself with) the ls program, which lists the contents of the current directory. Since ls writes on its standard output, and this output has previously been redirected, the results are effectively sent into the pipe.

A similar operation happens for the second command: bash clones itself again, leading to a new bash process with pid #4522. Since it is also a child process of #4374, it also inherits the pipe; bash then connects its standard input to the pipe output, then executes (and replaces itself with) the sort command, which sorts its input and displays the results.

All the pieces of the puzzle are now set up: ls writes the list of files in the current directory into the pipe; sort reads this list, sorts it alphabetically, and displays the results. Processes #4521 and #4522 then terminate, and #4374 (which was waiting for them during the operation) resumes control and displays the prompt to allow the user to type in a new command.
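
The sequence just described can be condensed into a short (and much simplified) C sketch; a real shell adds job control, redirections and error handling on top of this skeleton:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];                      /* fds[0]: read end, fds[1]: write end */
        if (pipe(fds) < 0) { perror("pipe"); return 1; }

        if (fork() == 0) {               /* first child: will become ls */
            dup2(fds[1], STDOUT_FILENO); /* standard output -> pipe "input" */
            close(fds[0]);
            close(fds[1]);
            execlp("ls", "ls", (char *) NULL);
            perror("execlp ls");         /* only reached if exec fails */
            exit(1);
        }

        if (fork() == 0) {               /* second child: will become sort */
            dup2(fds[0], STDIN_FILENO);  /* standard input <- pipe output */
            close(fds[0]);
            close(fds[1]);
            execlp("sort", "sort", (char *) NULL);
            perror("execlp sort");
            exit(1);
        }

        /* The parent no longer needs the pipe; closing both ends lets
         * sort see end-of-file once ls terminates. */
        close(fds[0]);
        close(fds[1]);
        wait(NULL);                      /* wait for both children */
        wait(NULL);
        return 0;
    }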

Not all inter-process communications are used to move data around, though. In many situations, the only information that needs to be transmitted are control messages such as “pause execution” or “resume execution”. Unix (and Linux) provides a mechanism known as signals, through which a process can simply send a signal (chosen from a fixed list of a few dozen predefined signals) to another process. The only requirement is to know the pid of the target process.
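
For instance, the “pause” and “resume” messages mentioned above correspond to the SIGSTOP and SIGCONT signals, which a process can send with the kill system call (despite its name, it sends arbitrary signals, not only fatal ones). A minimal sketch:

    /* pause-resume.c — suspends a target process for two seconds */
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }
        pid_t target = (pid_t) atoi(argv[1]);

        /* SIGSTOP suspends the target's execution... */
        if (kill(target, SIGSTOP) < 0) { perror("kill"); return 1; }
        sleep(2);
        /* ...and SIGCONT resumes it. */
        if (kill(target, SIGCONT) < 0) { perror("kill"); return 1; }
        return 0;
    }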

For more complex communications, there are also mechanisms allowing a process to open access to, or share, part of its allocated memory with other processes. The memory thus shared between them can then be used to move data across.
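
One such mechanism is POSIX shared memory, sketched below; the object name /demo-shm is an arbitrary example, and older glibc versions require linking with -lrt. Any other process that maps the same name sees the same bytes:

    /* shm-writer.c — places a message in a shared memory region */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Create (or open) the shared memory object and give it a size. */
        int fd = shm_open("/demo-shm", O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("shm_open"); return 1; }
        if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

        /* Map the object into this process's address space. */
        char *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(mem, "hello through shared memory");

        munmap(mem, 4096);
        close(fd);
        /* shm_unlink("/demo-shm") would remove the object when done. */
        return 0;
    }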

Finally, network connections can also help processes communicate; these processes can even be running on different computers, possibly thousands of kilometers apart.
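
A hedged sketch of the client side of such a connection, using the standard sockets API; the host localhost and port 7777 are arbitrary examples, standing in for any server expecting this kind of message:

    /* net-client.c — sends a message over a TCP connection */
    #include <netdb.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        struct addrinfo hints = { 0 }, *res;
        hints.ai_socktype = SOCK_STREAM;   /* TCP */

        /* Resolve the peer's address; it could just as well be a
         * machine thousands of kilometers away. */
        if (getaddrinfo("localhost", "7777", &hints, &res) != 0) {
            fprintf(stderr, "getaddrinfo failed\n");
            return 1;
        }

        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
            perror("connect");
            return 1;
        }

        write(fd, "hello over the network\n", 23);
        close(fd);
        freeaddrinfo(res);
        return 0;
    }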

It is quite standard for a typical Unix-like system to make use of all these mechanisms to various degrees.

B.5.4. Libraries

Function libraries play a crucial role in a Unix-like operating system. They are not proper programs, since they cannot be executed on their own, but collections of code fragments that can be used by standard programs. Among the common libraries, the most noteworthy include:

the standard C library (glibc), which contains basic functions such as ones to open files or network connections, and others facilitating interactions with the kernel;

graphical toolkits, such as Gtk+ and Qt, which allow many programs to reuse the graphical objects they provide;

the libpng library, which allows loading, interpreting and saving images in the PNG format.

Thanks to those libraries, applications can reuse existing code. Their development is correspondingly simplified, in particular when many applications reuse the same functions. Since libraries are often developed by different people, the global development of the system is closer to Unix's historical philosophy.

CULTURE The Unix Way: one thing at a time

One of the fundamental concepts that underlies the Unix family of operating systems is that each tool should only do one thing, and do it well; applications can then reuse these tools to build more advanced logic on top. This Way can be seen in many incarnations. Shell scripts may be the best example: they assemble complex sequences of very simple tools (such as grep, wc, sort, uniq and so on). Another implementation of this philosophy can be seen in code libraries: the libpng library allows reading and writing PNG images, with different options and in different ways, but it does only that; no question of including functions that display or edit images.

Moreover, these libraries are often referred to as “shared libraries”, since the kernel is able to only load them into memory once, even if several processes use the same library at the same time. This allows saving memory, when compared with the opposite (hypothetical) situation where the code for a library would be loaded as many times as there are processes using it.
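
To make this concrete, here is a hedged sketch of how such a shared library might be built and used on a typical system; the file and function names are arbitrary examples:

    /* greet.c — the library's code */
    #include <stdio.h>

    void greet(void)
    {
        printf("hello from the shared library\n");
    }

    /* main.c — a program using the library */
    void greet(void);

    int main(void)
    {
        greet();
        return 0;
    }

    /* Typical build and run commands (illustrative):
     *   gcc -fPIC -shared greet.c -o libgreet.so
     *   gcc main.c -o main -L. -lgreet
     *   LD_LIBRARY_PATH=. ./main
     * However many copies of ./main run at once, the kernel keeps a
     * single copy of libgreet.so's code in memory. */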