Linux process states

PROCESS STATE CODES
   R  running or runnable (on run queue)
   D  uninterruptible sleep (usually IO)
   S  interruptible sleep (waiting for an event to complete)
   Z  defunct/zombie, terminated but not reaped by its parent
   T  stopped, either by a job control signal or because
      it is being traced
A process starts its life in an R "running" state and finishes after its parent reaps it from the Z "zombie" state.
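
The state codes above are the ones the kernel exposes in /proc/<pid>/stat. As a rough sketch (the helper name proc_state is made up for this example, and it assumes a Linux system with /proc mounted), the code for any process can be read like this:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: returns the one-letter state code of `pid`
// as reported by /proc/<pid>/stat, or '?' if it cannot be read.
static char proc_state(pid_t pid) {
    char path[64];
    char buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    // The line looks like "pid (comm) state ...". The command name may
    // itself contain spaces or ')', so look for the last ')' instead of
    // splitting on whitespace.
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

Calling proc_state(getpid()) should give R, since a process is necessarily running while it inspects itself.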


SIGSTOP, SIGCONT

When you press CTRL+z, the kernel terminal driver sends a SIGTSTP signal (the catchable, terminal-generated counterpart of SIGSTOP) to the foreground process group. Similarly, on bg / fg bash sends a SIGCONT signal. The manual page signal(7) describes the signals.
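
The same stop / continue cycle can be driven from a program. The sketch below is only an illustration (the function name stop_and_continue is made up here): it sends SIGSTOP and SIGCONT directly with kill(2) and checks that waitpid reports both state changes.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: fork a child, stop it, then resume it.
// Returns 0 if both state changes were observed, -1 otherwise.
static int stop_and_continue(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();  // child: sleep until a signal arrives
    }

    int status;
    int seen = 0;

    // Stop the child and wait until the kernel reports it as stopped.
    kill(pid, SIGSTOP);
    if (waitpid(pid, &status, WUNTRACED) == pid &&
        WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP)
        seen++;

    // Resume the child and wait for the "continued" notification.
    kill(pid, SIGCONT);
    if (waitpid(pid, &status, WCONTINUED) == pid && WIFCONTINUED(status))
        seen++;

    // Clean up: terminate and reap the child.
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return seen == 2 ? 0 : -1;
}
```

Note that waitpid only reports stopped and continued children when asked to with the WUNTRACED and WCONTINUED flags.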

SIGCHLD and waitpid()

Whenever a child process changes its state - either gets stopped, continues or exits - two things happen to the parent process:
  • it gets a SIGCHLD signal
  • a blocking waitpid(2) (or wait) call may return
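
The status value filled in by waitpid encodes which of these state changes happened. A minimal sketch of decoding it with the standard macros from <sys/wait.h> follows; describe_status and run_child are names invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: turn a waitpid() status into readable text.
static const char *describe_status(int status, char *buf, size_t len) {
    if (WIFEXITED(status))
        snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else if (WIFSTOPPED(status))
        snprintf(buf, len, "stopped by signal %d", WSTOPSIG(status));
    else if (WIFCONTINUED(status))
        snprintf(buf, len, "continued");
    else
        snprintf(buf, len, "unknown status 0x%x", (unsigned)status);
    return buf;
}

// Hypothetical helper: fork a child that exits with `code` and return
// the raw status reported by waitpid(), or -1 on failure.
static int run_child(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);

    int status;
    if (waitpid(pid, &status, WUNTRACED | WCONTINUED) != pid)
        return -1;
    return status;
}
```

For a child started with run_child(7), describe_status produces the text "exited with code 7".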

Zombies

A zombie process is a process that has terminated (with any exit status), but whose state change hasn't yet been acknowledged by the parent. That is, the parent hasn't called wait() / waitpid() for it.
The Z "zombie" process state exists so that the kernel can keep the deceased child's exit status and resource usage until the parent asks for them, for example with waitpid(2) or wait4(2); getrusage(2) with RUSAGE_CHILDREN then reports the usage of children that have already been waited for. A parent informs the kernel that it's done with the child by calling waitpid.
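
A zombie is easy to observe. In the sketch below (function names are made up for this example, and it assumes Linux with /proc mounted), the parent uses waitid(2) with WNOWAIT to block until the child has terminated without reaping it, reads the child's state from /proc, and only then reaps it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: one-letter state code from /proc/<pid>/stat,
// or '?' if the file cannot be read.
static char proc_state(pid_t pid) {
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    char *p = strrchr(buf, ')');  // state follows the "(comm)" field
    return (p != NULL && p[1] == ' ' && p[2] != '\0') ? p[2] : '?';
}

// Hypothetical demo: returns the state of an exited but unreaped child.
static char observe_zombie(void) {
    pid_t pid = fork();
    if (pid < 0)
        return '?';
    if (pid == 0)
        _exit(0);  // child terminates immediately

    // WNOWAIT: block until the child has terminated, but leave it
    // in a waitable (zombie) state.
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return '?';

    char state = proc_state(pid);  // the child is a zombie at this point
    waitpid(pid, NULL, 0);         // reap it: the zombie disappears
    return state;
}
```

On a typical Linux system observe_zombie() returns Z.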
Most often the parent doesn't really care about the child's resource usage or exit status. In that case a common way to avoid zombies is to install a SIGCHLD handler and call waitpid within it. Unfortunately, standard Unix signals are not queued: several instances of the same signal sent to one process while it is still pending may be coalesced into one. A single call to the SIGCHLD handler might therefore be triggered by more than one child state change. So if you have more than one child process you need to run waitpid in a loop to reap all zombies, like this:
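#include <errno.h>
#include <sys/wait.h>

static void sigchld_handler(int sig) {
    int saved_errno = errno;  // waitpid may overwrite errno
    int status;
    pid_t pid;
    (void)sig;
    // Reap every child that has already terminated, without blocking.
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        // `pid` exited with `status`
    }
    errno = saved_errno;
}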
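
The handler still has to be installed. One possible way, sketched below with sigaction(2), also shows the coalescing in action: SIGCHLD is kept blocked while several children exit, so a single handler invocation has to reap all of them. The names install_sigchld_handler, spawn_and_reap and reaped_children are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped_children = 0;

static void sigchld_handler(int sig) {
    int saved_errno = errno;
    (void)sig;
    // One signal may stand for several exited children: loop.
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

static int install_sigchld_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    // SA_RESTART: restart interrupted system calls where possible.
    // SA_NOCLDSTOP: no SIGCHLD when a child merely stops or continues.
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

// Fork `n` children that exit at once and wait until the handler has
// reaped all of them. Returns the number of reaped children.
static int spawn_and_reap(int n) {
    sigset_t block, old, wait_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);  // hold SIGCHLD back for now

    reaped_children = 0;
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);
        if (pid > 0)
            started++;
    }

    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);
    while (reaped_children < started)
        sigsuspend(&wait_mask);  // atomically unblock SIGCHLD and sleep

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped_children;
}
```

After install_sigchld_handler() succeeds, spawn_and_reap(3) returns 3 even if the three exits were delivered as fewer than three signals.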
Alternatively, to avoid zombies altogether, one can explicitly set the SIGCHLD disposition to SIG_IGN or use the SA_NOCLDWAIT flag for sigaction (see NOTES in waitpid(2)):
signal(SIGCHLD, SIG_IGN);
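
The SA_NOCLDWAIT variant could look roughly like the sketch below (function names are made up for this example). On Linux, once the flag is set, exited children are discarded immediately, and a waitpid for such a child fails with ECHILD after the child has terminated.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Ask the kernel not to turn exited children into zombies.
static int disable_zombies(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDWAIT;
    return sigaction(SIGCHLD, &sa, NULL);
}

// Hypothetical demo: returns 1 if the exited child left nothing to
// wait for (waitpid fails with ECHILD), 0 otherwise.
static int child_leaves_no_zombie(void) {
    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0)
        _exit(0);

    errno = 0;
    // Blocks until the child has terminated, then reports ECHILD
    // because the kernel has already discarded it.
    pid_t r = waitpid(pid, NULL, 0);
    return r == -1 && errno == ECHILD;
}
```

The price is that the exit status of the children is lost, so this only suits parents that truly don't care about it.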

Wrapping up process states

Process states form an interesting mechanism that effectively creates a synchronous communication channel between a parent and a child process.
For example, if a child changes its state to "stopped", the parent can wait for that using waitpid. Later it can order the child to continue by sending SIGCONT.
This mechanism is not flawless - if a child goes into "stopped" state and quickly receives SIGCONT, the parent will receive SIGCHLD, but waitpid may miss the state change.

Back to ptrace

"Stopping" and "continuing" is the mechanism used by ptrace to control a debugged process. First, on initialisation ptrace causes the current (debugging) process to temporarily become a parent of the debugged process (let's call it "adoption"). As a parent it will be notified about child process state changes. Next, various ptrace flags inform the kernel to put the child into the "stopped" state when particular debugging events occur. When such an event is triggered, the parent receives SIGCHLD, can retrieve the child's status via waitpid and has a chance to inspect the stopped child. When it's done, it puts the child back into the "running" state.
We'll see how to use ptrace to do this in the next part of this tutorial.
The way ptrace works is a huge abuse of the original Unix process model, but in practice it seems to work quite well. However, this mechanism is not very efficient, because every debugging event costs context switches between the parent and the child.

Ptrace and security

In the beginning I quoted:
ptrace is a nasty, complex part of the kernel which has a long history of problems
Indeed, a number of serious security issues have been found in the kernel's ptrace code. The problem is serious enough that Ubuntu decided to stop untrusted users from running ptrace against unrelated processes. You can check whether your Linux has that restriction enabled by looking at the output of the sysctl command:
$ /sbin/sysctl kernel.yama.ptrace_scope
kernel.yama.ptrace_scope = 1
With this restriction in place, an unprivileged user can only run ptrace against the tracer's own descendants; "adoption" of unrelated processes is no longer possible without administrator rights.
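
A program can check the same setting by reading the file behind that sysctl. This is only a sketch: the function name is made up, and the file exists only on kernels built with the Yama security module.

```c
#include <stdio.h>

// Hypothetical helper: returns the value of kernel.yama.ptrace_scope,
// or -1 if the setting is not available on this system.
static int read_ptrace_scope(void) {
    FILE *f = fopen("/proc/sys/kernel/yama/ptrace_scope", "r");
    if (f == NULL)
        return -1;  // Yama not present, or /proc not mounted

    int value;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A result of 0 means classic ptrace permissions, and 1 is the restricted mode shown above.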
 



