Rambles around computer science

Diverting trains of thought, wasting precious time

Mon, 11 Sep 2023

A process by any other name

On Linux, what is a process's name? I've found this to be more complicated than I thought it would be. Clearly it's non-trivial because it seems top and ps can disagree about a process's name. As usual I have Firefox and some of its worker processes cheerfully burning some CPU for no apparent reason, so let's use one of those.

top - 14:53:48 up 5 days,  3:20, 28 users,  load average: 0.47, 0.82, 0.74
Tasks: 629 total,   1 running, 543 sleeping,   0 stopped,   4 zombie
%Cpu(s):  1.9 us,  0.8 sy,  0.0 ni, 97.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  16064256 total, 15800628 used,   263628 free,    55752 buffers
KiB Swap:  9291780 total,  6922152 used,  2369628 free.  5019028 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                      
 3172 stephen   20   0 2493792 145184  91804 S   6.0  0.9   0:07.80 Isolated Web Co                                              
 1685 stephen   20   0 2693232 264100  95080 S   3.6  1.6   0:28.86 Isolated Web Co                                              
19803 stephen   20   0 13.552g 6.042g 2.603g S   2.6 39.4  37:22.35 firefox-esr                                                  
12266 stephen   20   0 2653868 670136 582816 S   1.7  4.2 103:31.80 Xorg                                                         

Here it looks like process 3172 has a name ‘Isolated Web Co’. But this isn't the name that shows up in ps xf.

$ ps xf | grep [3]172
 3172 pts/2    Sl     0:09  |   |               \_ /usr/lib/firefox-esr/firefox-esr -contentproc -childID 218 -isForBrowser -prefsLen 39352 -prefMapSize 227223 -jsInitLen 277276 -parentBuildID 20230403141754 -appDir /usr/lib/firefox-esr/browser 19803 true tab

Nor is it what is visible in /proc/pid/cmdline.

$ cat /proc/3172/cmdline | tr '\0' '\n'
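That pipeline works because the kernel presents the argument strings back to back, each terminated by a NUL byte. The same decoding can be sketched in Python. The helper names `split_cmdline` and `read_cmdline` are my own, not from any library. The sketch deliberately does not insist on a trailing NUL, since a process that has rewritten its argument area may not leave one.

```python
from __future__ import annotations

import os


def split_cmdline(raw: bytes) -> list[str]:
    """Split the raw bytes of /proc/<pid>/cmdline into argument strings.

    Arguments are NUL-terminated and stored back to back. A single trailing
    NUL is dropped if present, but not required, because a process that has
    overwritten its argument area may not end it with a NUL.
    """
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    if not raw:
        return []  # e.g. kernel threads such as kthreadd have an empty cmdline
    # surrogateescape keeps arbitrary (non-UTF-8) bytes round-trippable.
    return [arg.decode(errors="surrogateescape") for arg in raw.split(b"\0")]


def read_cmdline(pid: int | str = "self") -> list[str]:
    """Return the argument vector the kernel reports for a process."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return split_cmdline(f.read())


if __name__ == "__main__":
    for i, arg in enumerate(read_cmdline()):
        print(f"argv[{i}] = {arg!r}")
```

Unlike `tr '\0' '\n'`, this keeps arguments that themselves contain newlines intact.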

In fact we can get top to flip between these views by pressing the ‘c’ key.

       -c  :Command-line/Program-name toggle
            Starts top with the last remembered 'c' state reversed. Thus, if top was
            displaying command lines, now that field will show program names, and visa versa.
            See the 'c' interactive command for additional information.

So the process name is not the same as its argv[0], and in our case it is the string ‘Isolated Web Co’. I'd like to know where this string is stored. It seems this is called the “program name”, at least by the top manual page. Later in the manual page it talks about “command names” as the same thing as “program names” and observes that even a process without a command line, like kthreadd, has a program name. So where does it live? The /proc filesystem does know about these names.

$ cat /proc/3172/status | head
Name:   Isolated Web Co
Umask:  0022
State:  S (sleeping)
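The Name: line should carry the same string as /proc/<pid>/comm, which has existed since Linux 2.6.33 and holds just the name followed by a newline. Below is a small Python sketch comparing the two; the function names are hypothetical helpers of mine. I'd expect them to agree for ordinary names.

```python
from __future__ import annotations


def read_comm(pid: int | str = "self") -> str:
    """The kernel-side name as exposed by /proc/<pid>/comm."""
    with open(f"/proc/{pid}/comm", "rb") as f:
        return f.read().rstrip(b"\n").decode(errors="surrogateescape")


def read_status_name(pid: int | str = "self") -> str:
    """The same name, taken from the 'Name:' line of /proc/<pid>/status."""
    with open(f"/proc/{pid}/status", "rb") as f:
        for line in f:
            if line.startswith(b"Name:"):
                # The line looks like b"Name:\tIsolated Web Co\n".
                value = line[len(b"Name:"):].strip(b"\t\n")
                return value.decode(errors="surrogateescape")
    raise ValueError(f"no Name: line in /proc/{pid}/status")


if __name__ == "__main__":
    print(repr(read_comm()), repr(read_status_name()))
```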

In libbsd there is a setproctitle() call. Its manual page says it affects the name used by ps, which for us was just the command line, not the other name. We can request that ps show this name, although this seems not to be the default.

$ ps --pid=3172 -o comm
Isolated Web Co

And indeed a simpler way to get it to show this is ps -a (the minus is significant!).

$ ps -a | grep [3]172
 3172 pts/2    00:00:47 Isolated Web Co

This actually checks out. The use of - (dash) in ps options signifies “Unix options” and not BSD options, whereas it is the BSD setproctitle() call that claims to change the name used by ps. The name used by a “BSD options” invocation of ps, such as ps a, is indeed just the command, not the program name. So it looks like on BSD, historically there was just one “program name” and it was stored at argv[0], but somehow a separate “program name” has also arrived in Linux.

Glancing at the implementation of setproctitle() in libbsd, it seems more or less to overwrite the contents of the buffer pointed to by argv[0]. I consider this behaviour broken, and I had previously reported it as a bug in Chromium. It may overwrite existing command-line arguments besides the name (when overwriting the argv[0] string with something longer) and/or erase the framing between them. The latter is, or was, the problem in Chromium's implementation, which concatenates all arguments, joined by spaces, into one big new argv[0] string. This makes arguments that contain spaces ambiguous. (It's also unnecessary, since the goal of the exercise is presumably to update argv[0] while leaving later arguments untouched. Although I haven't tried it, if the new argv[0] string doesn't fit in place, it can presumably just be stored elsewhere and argv[0] updated to point to it. In extremis, as we'll see shortly, it is possible for the entire argv/environ structure to be rebuilt and re-declared to the kernel... although this requires permissions that an ordinary process typically does not have.)
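To make the clobbering concrete, here is a toy Python model. It operates on a byte array standing in for the argument area, not on a live process. `naive_setproctitle` is a hypothetical helper that copies the new title over argv[0] in place and checks only that the title fits within the whole area. It is not libbsd's or Chromium's code.

```python
from __future__ import annotations


def pack_argv(args: list[str]) -> bytearray:
    """Lay out argument strings as execve() leaves them: NUL-terminated, back to back."""
    return bytearray(b"".join(arg.encode() + b"\0" for arg in args))


def unpack_argv(area: bytes | bytearray) -> list[str]:
    """Split a NUL-separated argument area back into strings."""
    raw = bytes(area)
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    return [arg.decode() for arg in raw.split(b"\0")]


def naive_setproctitle(area: bytearray, title: str) -> None:
    """Hypothetical in-place title update: copy title + NUL over argv[0].

    It checks only that the title fits in the whole argument area, not that
    it fits in argv[0], so later arguments can be clobbered.
    """
    data = title.encode() + b"\0"
    if len(data) > len(area):
        raise ValueError("title does not fit in the argument area")
    area[: len(data)] = data  # same-length slice assignment: size unchanged


if __name__ == "__main__":
    area = pack_argv(["prog", "-v", "input.txt"])
    naive_setproctitle(area, "worker: busy")
    print(unpack_argv(area))  # ['worker: busy', '.txt']

    # Joining arguments with spaces loses their framing:
    print(" ".join(["a b", "c"]) == " ".join(["a", "b", "c"]))  # True
```

A title shorter than the original argv[0] leaves debris too. Writing "ab" over "prog" in the same way turns the leftover "g" into a bogus extra argument.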

The source code of the BSD function's reimplementation in Chromium reveals the intricacies of this apparently simple operation on Linux. Linux has overhauled its handling of process names, introducing some bugs that Chromium needed to work around between kernel versions 4.18 and 5.2. The latest major kernel commit affecting this explains more about how things are supposed to work. It splits the logic into two cases: in get_mm_proctitle, with the comment...

    If the user used setproctitle(), we just get the string from
    user space at arg_start, and limit it to a maximum of one page.

... and in get_mm_cmdline with the following confusing comment:

    We allow setproctitle() to overwrite the argument
    strings, and overflow past the original end. But
    only when it overflows into the environment area
    .. and limit it to a maximum of one page of slop.
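The rule those comments describe can be modelled in a few lines. This is my own simplified Python model of the behaviour, not the kernel's code, and it rests on three assumptions:

- `mem` stands for the process's memory starting at arg_start, so offsets are relative to it.
- The page size is 4 KiB.
- The environment strings start exactly at arg_end, as they do right after execve().

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def model_cmdline_read(mem: bytes, arg_end: int, env_end: int) -> bytes:
    """Hypothetical model of what reading /proc/<pid>/cmdline returns.

    Offsets are relative to arg_start; env_start is assumed equal to arg_end.
    """
    if arg_end > 0 and mem[arg_end - 1] != 0:
        # The last byte of the original argument area is no longer a NUL, so
        # assume setproctitle() overflowed it: return one NUL-terminated
        # string from arg_start, capped at a page and at the environment's end.
        limit = min(PAGE_SIZE, env_end, len(mem))
        nul = mem.find(b"\0", 0, limit)
        return bytes(mem[:limit] if nul < 0 else mem[: nul + 1])
    # Normal case: exactly the original [arg_start, arg_end) bytes.
    return bytes(mem[:arg_end])


if __name__ == "__main__":
    normal = b"prog\0-v\0HOME=/h\0"
    print(model_cmdline_read(normal, arg_end=8, env_end=16))
```

Note what the model implies. A title that overflows argv's last byte replaces the whole reported command line. A shorter title leaves the original bounds in force, so any debris after it is still reported as arguments.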

But all this seems to suggest that setproctitle() will change both the “command line” and the “process title”. Yet what we're seeing is that the two exist independently: one says Isolated Web Co, the other firefox-esr .... What is going on?

A bit of digging led me to discover that there is a 16-byte field comm in struct task_struct (defined in include/linux/sched.h). Unlike argv[0], this exists entirely on the kernel side. The comm field turns out to be the “process title” that top is (optionally) displaying. Actually it's a thread title, since there is one per schedulable task. Its fixed size also explains the odd-looking ‘Isolated Web Co’: that is fifteen characters, plus the terminating NUL byte.
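As a check on the 16-byte limit (TASK_COMM_LEN in the kernel), a thread can rename its process's threads by writing to /proc/<pid>/task/<tid>/comm, or to /proc/self/comm for the main thread. This Python sketch writes a longer name and then reads back every thread's comm. The long name is my guess at the untruncated ‘Isolated Web Content’ Firefox presumably asked for, and the helper names are mine. The command line should be untouched throughout.

```python
from __future__ import annotations

import os

TASK_COMM_LEN = 16  # size of task_struct.comm in bytes, including the NUL


def set_main_thread_comm(name: str) -> None:
    """Write a new name to /proc/self/comm (the main thread's comm).

    The kernel keeps at most TASK_COMM_LEN - 1 bytes and silently drops the rest.
    """
    fd = os.open("/proc/self/comm", os.O_WRONLY)
    try:
        os.write(fd, name.encode())
    finally:
        os.close(fd)


def thread_names(pid: int | str = "self") -> dict[int, str]:
    """Map each thread ID in a process to that thread's own comm string."""
    names: dict[int, str] = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/comm", "rb") as f:
                raw = f.read()
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were looking
        names[int(tid)] = raw.rstrip(b"\n").decode(errors="surrogateescape")
    return names


if __name__ == "__main__":
    set_main_thread_comm("Isolated Web Content")
    print(thread_names())
```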

Use of strace reveals that top itself reads it from the /proc filesystem. Could this be the only interface on Linux that exposes the “process title” as distinct from the command line? Not quite! There is also prctl(PR_GET_NAME).

Interestingly, the PR_SET_MM family of prctl() calls also lets you update the various kernel-side pointers that mark the bounds of the argv and envp vectors and their backing character data. You can also modify the position of brk, and so on. One can even change the whole lot in one shot by passing a struct prctl_mm_map, although only if the kernel was built with CONFIG_CHECKPOINT_RESTORE. A process needs CAP_SYS_RESOURCE to be allowed to perform any of these updates. This makes sense: completely re-plumbing your process's command line and environment like this is rarely needed, but it would be needed in a “user-space exec()” or image replacement of the kind a checkpoint-restore system might perform. Meanwhile, tweaking the argv bounds could be seen as modifying the stack size, which might need resource-override permissions. (I find the latter a bit tenuous, since it is really just setting two kernel-side pointers, and they only usefully point into memory that still has to be allocated by the calling process.)
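Python's standard library has no prctl() wrapper, so this sketch calls the libc function through ctypes. The constants 15 and 16 are PR_SET_NAME and PR_GET_NAME from <linux/prctl.h>. Both act on the calling thread only, and PR_SET_NAME silently truncates to 15 bytes. The helper names are mine.

```python
import ctypes
import os

# Option numbers from <linux/prctl.h>.
PR_SET_NAME = 15
PR_GET_NAME = 16
TASK_COMM_LEN = 16

# dlopen(NULL): symbols already loaded into this process, including libc's prctl.
_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option: int, arg2) -> None:
    # prctl() is variadic; pass the unused arguments explicitly as unsigned long.
    zero = ctypes.c_ulong(0)
    if _libc.prctl(option, arg2, zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def set_thread_name(name: str) -> None:
    """prctl(PR_SET_NAME): rename the calling thread (truncated to 15 bytes)."""
    _prctl(PR_SET_NAME, name.encode())


def get_thread_name() -> str:
    """prctl(PR_GET_NAME): read the calling thread's comm into a 16-byte buffer."""
    buf = ctypes.create_string_buffer(TASK_COMM_LEN)
    _prctl(PR_GET_NAME, buf)
    return buf.value.decode(errors="surrogateescape")


if __name__ == "__main__":
    set_thread_name("Isolated Web Content")
    print(get_thread_name())  # Isolated Web Co
```

Because these calls are per-thread, a multi-threaded program can give each worker thread its own name while its command line stays the same.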

Going back to the mysterious kernel distinction between get_mm_proctitle and get_mm_cmdline: confusingly, get_mm_proctitle has nothing to do with the thread name (the 16-byte thing). That is the name top called the “process name”, which a reasonable person might think is the same as the “process title”. Instead, get_mm_proctitle is just an alternative semantics for reading from the argv area. Rather than limiting the read to the argument strings according to their original bounds, it allows overflow into the environment area. In both cases it's only reading the command line, never the 16-byte comm string. It exists only to make allowances for how a setproctitle() call might clobber the environment strings, thereby tacitly redefining part of that region as part of the argument area.

One might also wonder why, on detecting this clobbering, Linux doesn't take the opportunity to move the process's env_start and arg_end pointers, as if the necessary prctl() calls had been made as discussed above. I suppose that might be considered “too clever”. It would also risk creating further changes observable to user-space programs that assume the status quo.
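For a process you are allowed to trace, these kernel-side pointers can be read back from fields 48 to 51 of /proc/<pid>/stat. proc(5) documents those fields as arg_start, arg_end, env_start and env_end, present since Linux 3.5. Below is a small Python sketch; `argv_env_bounds` is my own name. In an unmodified process I'd expect the length of /proc/<pid>/cmdline to equal arg_end minus arg_start.

```python
from __future__ import annotations

# Field numbers from proc(5); present since Linux 3.5.
_BOUNDS_FIELDS = {"arg_start": 48, "arg_end": 49, "env_start": 50, "env_end": 51}


def argv_env_bounds(pid: int | str = "self") -> dict[str, int]:
    """Read the kernel-side argv/environ bounds from /proc/<pid>/stat.

    Per proc(5), these fields read as 0 if we fail the ptrace access check.
    """
    with open(f"/proc/{pid}/stat", "rb") as f:
        data = f.read()
    # Field 2 is the comm in parentheses, and comm may itself contain spaces
    # or ')', so resume parsing after the last ')' in the line.
    rest = data[data.rindex(b")") + 1:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    return {name: int(rest[n - 3]) for name, n in _BOUNDS_FIELDS.items()}


if __name__ == "__main__":
    for name, addr in argv_env_bounds().items():
        print(f"{name:9} {addr:#x}")
```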

All this has consequences for a user-space runtime that wants to understand what is stored at the top of the initial stack, in the argument strings area. When a BSD-style setproctitle() happens, the effective boundary of the argument strings may change. So, remind me to patch liballocs to override setproctitle() with something that does the necessary extra bookkeeping. That probably means tracking separately the kernel's arg_end and “the place the kernel will claim is the end of argv[0]”. And of course for setproctitle() calls that don't overflow into the environment strings, we might still want to track when they've trashed argument strings.

Just to make things more confusing, the implementation of setproctitle() in libbsd does not seem to overwrite the argument strings in all cases. I can successfully set the process title to a longer string without it spilling over into later arguments. I could dig into this, but my guess is that it's using the “separate storage” trick I mentioned as an option for Chromium, above. That is, it's no longer doing the overwriting that the kernel's heuristic approach is hacking around.

Of course, after finding all this out I found a nice blog post by Tycho Andersen covering basically the same material.
