0CTF 2024: Lonely Christmas Time Reversing


Writeups by: xlr8or

The 2024 edition of 0CTF ran from Sat, 21 Dec. 2024, 02:00 UTC to Mon, 23 Dec. 2024, 02:00 UTC.
With end of semester exams out of the way and session exams still far away (although much closer than it seems), I decided to take a look.
No team I am part of participated, therefore I gave it a shot solo, achieving rank 21/187 solving 3/5 reversing challenges and the survey (of course).
Event on CTFtime

Table of Contents

  1. EzLogic
  2. EzLUTs
  3. how2link
  4. Closing thoughts

EzLogic - 141 points - 68 solves

In this challenge we got a nice readme that points us to some helpful resources regarding the challenge material.
The core idea is that we are presented with a logic circuit that checks the value of the flag that we input, and if the output is 1, then the flag is correct, otherwise it is not.
The logic circuit is given both as a schematic PDF and as Verilog files. Verilog is a hardware description language; it deals with lower-level concepts such as clocks, wires, signals and logic gates.

To solve the challenge I used iverilog and gtkwave, as recommended by the readme file.
If we just follow the instructions in the readme file we can simulate the circuit and get the output of the flag checker. Of course, the flag will be reported as invalid, since at this point we only have placeholder values.
In the code for the main module we can see the condition based on which we get different output:

if (success) begin
    $display("Great! You've found the correct flag!");
end
else begin
    $display("Haha, try again!");
end
                

success is a wire that can carry a value of 0 or 1 which is then used to decide if the flag was correct or not.
Let's look at how it is computed a bit below in the file:
wire [0:8*N-1] data_std = 'h30789d5692f2fe23bb2c5d9e16406653b6cb217c952998ce17b7143788d949952680b4bce4c30a96c753;
assign success = (data_std == data_out_all);
                
Here we can see that data_out_all is compared with a fixed hex value, and only if the two match completely have we guessed the correct flag.
data_out_all is constructed by gathering all the values that the circuit places on the data_out wire.
Another important observation is that the flag bytes are supplied to the circuit one by one, on each tick of the clock, and the resulting check values are collected from data_out one by one as well:

always @(posedge clk) begin
    if (start == 1) begin
        if (counter < N) begin
            counter <= counter + 1;
            data_in <= flag_test_arr[counter];
            valid_in <= 1;
        end
        else begin
            data_in <= 0;
            valid_in <= 0;
            start <= 0;
        end
    end
end
                

Because the check value is computed per byte, and the value for a given byte is not updated once we have moved past that byte, it is not hard to bruteforce the flag.
From the main module we know that the flag has 42 bytes, and we know the prefix = 0ops{ and suffix = }, so only 36 bytes are unknown. Furthermore, each of those bytes must be in the printable ASCII range, so 0x20 - 0x7e.
This leaves us with 36 * 95 = 3420 values to try in total, because we can guess each flag byte separately from the others.
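
The count above can be checked with a few lines of Python. This is only a sanity check of the arithmetic, assuming the printable range is 0x20 to 0x7e inclusive (95 characters); the variable names are made up for illustration.

```python
# Sanity check of the brute-force search space.
# Assumption: printable ASCII means 0x20..0x7e inclusive (95 characters).
FLAG_LEN = 42
PREFIX = b"0ops{"
SUFFIX = b"}"

unknown_bytes = FLAG_LEN - len(PREFIX) - len(SUFFIX)
printable = range(0x20, 0x7F)  # upper bound is exclusive, so 0x7e is the last value

# Bytes are checked independently, so the per-byte costs add up ...
independent_guesses = unknown_bytes * len(printable)
# ... instead of multiplying, which is what a joint search would cost.
joint_guesses = len(printable) ** unknown_bytes

print(unknown_bytes, independent_guesses)
```

The gap between the two numbers is the whole reason the per-byte bruteforce is feasible.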

I wrote some python to solve the following tasks:


The VCD file keeps track of changes in the values of the variables that the circuit has.
I studied the file format on Wikipedia. Essentially, each variable gets assigned an identifier at the start of the file, and whenever the value of that variable changes, the file contains a line with the new value suffixed by the identifier of the variable.
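
To make the format concrete, here is a minimal sketch of a VCD reader. It is not the parser from my solve script; it assumes one declaration or value change per line, and the sample dump (signal names, identifiers, timestamps) is invented for illustration.

```python
from collections import defaultdict

def parse_vcd(text):
    """Return ({identifier: name}, {name: [(time, value), ...]})."""
    names = {}
    changes = defaultdict(list)
    now = 0
    in_header = True
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if in_header:
            # Declaration: $var <type> <width> <identifier> <name> ... $end
            if tokens[0] == "$var":
                names[tokens[3]] = tokens[4]
            elif tokens[0] == "$enddefinitions":
                in_header = False
            continue
        head = tokens[0]
        if head.startswith("#"):
            now = int(head[1:])                      # timestamp line
        elif head[0] in "bB" and len(tokens) == 2:
            ident = tokens[1]                        # vector: b<bits> <identifier>
            changes[names.get(ident, ident)].append((now, head[1:]))
        elif head[0] in "01xXzZ" and len(tokens) == 1:
            ident = head[1:]                         # scalar: <value><identifier>
            changes[names.get(ident, ident)].append((now, head[0]))
    return names, dict(changes)

# Invented sample dump, only to exercise the parser.
sample = "\n".join([
    "$scope module main $end",
    "$var wire 1 ! clk $end",
    "$var wire 8 % data_out [7:0] $end",
    "$upscope $end",
    "$enddefinitions $end",
    "#0",
    "0!",
    "b00000000 %",
    "#5",
    "1!",
    "b00110000 %",
])
names, changes = parse_vcd(sample)
print(changes["data_out"])
```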

You can download the solve script to see the specific implementation and some more comments with explanations.

EzLUTs - 392 points - 18 solves

This is the next step in the verilog reversing challenge line!
This time we also get a main module verilog file and another one with the implementation of the actual checking circuit.
To simulate the circuit we can reuse the behavioral models supplied for the previous challenge. However, the .N(N) parameter that the main module passes to the checking circuit seems to cause a compile error, so it has to be removed to get a working simulation. That said, I solved this challenge without using iverilog simulation at all.
The goal is still the same: given some checking circuit, the output is 1 or 0 depending on whether the flag is correct. We need to supply the flag for which the checker signals 1.

This time the main module is very simple and we only have a single success signal that comes from the circuit implementation itself.
Checking the circuit implementation, we can see this is a more complicated circuit compared to the one in the previous challenge, with around 16K lines of verilog defining it!
A good approach here is to follow how the success signal will get its value and how we can ensure it is 1.

OBUF success_OBUF_inst
    (.I(success_OBUF),
    .O(success));
                
This just means that the success signal is driven by an output buffer (OBUF) whose input is the success_OBUF wire, so let's keep tracking where that signal is coming from.

LUT6 #(
    .INIT(64'h8000000000000000)) 
    success_OBUF_inst_i_1
       (.I0(success_OBUF_inst_i_2_n_0),
        .I1(success_OBUF_inst_i_3_n_0),
        .I2(success_OBUF_inst_i_4_n_0),
        .I3(success_OBUF_inst_i_5_n_0),
        .I4(success_OBUF_inst_i_6_n_0),
        .I5(success_OBUF_inst_i_7_n_0),
        .O(success_OBUF));
                
Just on the lines below we find the circuit element that drives the wire we are interested in. We can see this is a LUT6 element and that it has an INIT value, some inputs and one output.
As I didn't know what this was, I read the helpful manual referenced in the first challenge.

To get a full picture, please read the manual as well, but here are the most important takeaways:

Now that we understand how LUTs work, we know all the building blocks of the circuit and can begin to find higher meaning.
You can essentially see the problem as a tree, where the root node is the success variable, and every input that influences it (such as in the above case the inputs of the LUT6) will become a child of the node. Then to figure out which variables the children depend on, we recursively check if any LUTs have them as the output variable.
In this way we recursively construct the dependency tree of the output value. At the leaves of the tree are the variables derived from the flag that we are trying to guess.
Going from the leaves, if we know their value, then we can compute the value of their parent nodes, based on the logic that the LUTs encode. Doing this for all nodes (going from bottom up), we can figure out the value of the root node, to know if the flag is correct or not.
However of course, we don't know the values of the flag, rather we have a known function, a desired output and unknown inputs. We want to find a value for the flag that satisfies the constraint that the output should be 1, given that we know how the flag bytes map to the output value.
Z3 can be used to solve such a constraint satisfaction problem and to get values for the flag bytes which result in output 1.

The simple initial model using Z3 is as follows:

To turn the LUTs into logic functions that Z3 understands, we can think of each LUT as a truth table that describes a custom function of its inputs.
Any truth table can be expressed in terms of its inputs in DNF (disjunctive normal form), using only the and, or and not logic operations.
The basic idea is that if the input values have any of the valid configurations, then the output should be one, in other terms: output = CONFIG1 or CONFIG2 or CONFIG3 .... A configuration is just stating which input variables are one and which are zero and this can be represented as follows: CONFIG1 = BIT1 IS 0 and BIT2 IS 1 and BIT3 IS 1 .... Then BIT2 IS 1 = BIT2, the value of the bit itself, and checking for zero is: BIT1 IS 0 = not BIT1, in other words, the negation of the bit should be 1.
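
The DNF construction can be sketched in plain Python, without Z3, to check that it agrees with a direct table lookup. The function names are hypothetical, and the sketch assumes that bit k of the lookup index is input Ik (I0 being the least significant bit), so that output = INIT[index].

```python
from itertools import product

def lut_minterms(init, n_inputs):
    """Indices whose INIT bit is 1; each index is one valid configuration.
    Assumption: bit k of the index corresponds to input Ik."""
    return [idx for idx in range(1 << n_inputs) if (init >> idx) & 1]

def eval_dnf(minterms, bits):
    """output = CONFIG1 or CONFIG2 or ..., each CONFIG an and over all literals."""
    return any(
        all((bits[k] if (idx >> k) & 1 else not bits[k]) for k in range(len(bits)))
        for idx in minterms
    )

def lut_lookup(init, bits):
    """Direct truth-table lookup, used as the reference."""
    idx = sum(int(b) << k for k, b in enumerate(bits))
    return bool((init >> idx) & 1)

# The LUT6 driving success_OBUF: only bit 63 is set, so all six inputs must be 1.
SUCCESS_INIT = 0x8000000000000000
success_minterms = lut_minterms(SUCCESS_INIT, 6)

# The DNF and the lookup agree on every input combination.
agree = all(
    eval_dnf(success_minterms, list(bits)) == lut_lookup(SUCCESS_INIT, list(bits))
    for bits in product([False, True], repeat=6)
)
print(success_minterms, agree)
```

In a Z3 model the same structure would be built from symbolic booleans instead of Python booleans.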

This model takes a long time to run though, since we have a lot of variables, and we generate potentially large logic functions per intermediate variable.
For example, imagine that a LUT6 implements a simple function, such as: BIT0 should be 0, and we don't care about the values of the other inputs. This is a really simple logic function that can be written as just not BIT0. However, the approach I described above would generate the 32 configurations in which BIT0 is 0 and perform a logical or over all of those configurations, not to mention that each configuration is itself a logical and over all 6 input values.
In short: we can't reduce the number of variables, but we should look if we can simplify the functions that define them.
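
To illustrate the blow-up, the sketch below builds the INIT mask for "BIT0 should be 0", counts its configurations, and then finds which inputs actually influence the output. The helper relevant_inputs is a hypothetical illustration of one generic way to spot such simplifications, not something taken from my solver, and it assumes bit k of the lookup index is input Ik.

```python
def relevant_inputs(init, n_inputs):
    """Inputs whose flip changes the output for at least one configuration."""
    relevant = []
    for k in range(n_inputs):
        for idx in range(1 << n_inputs):
            flipped = idx ^ (1 << k)
            if ((init >> idx) & 1) != ((init >> flipped) & 1):
                relevant.append(k)
                break
    return relevant

# INIT for "output = not BIT0": every even index (BIT0 == 0) maps to 1.
NOT_BIT0_INIT = sum(1 << idx for idx in range(64) if idx & 1 == 0)

configurations = bin(NOT_BIT0_INIT).count("1")   # terms in the naive DNF
support = relevant_inputs(NOT_BIT0_INIT, 6)      # inputs that really matter

print(hex(NOT_BIT0_INIT), configurations, support)
```

Here the naive DNF has 32 terms of 6 literals each, while the function depends on a single input.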

Let's take a look at LUT2 for example.
The circuit contains 119 LUT2 modules and all of them have the same init value of 6.
A LUT2 has 2 inputs, and 6 in binary is 0110.
This means that configurations where both inputs are 1 and where both inputs are 0 will result in a negative result, and otherwise we have a positive output if they are different. Does this remind you of a simple logic function? Hint: xor.

Therefore based on this finding we can replace all LUT2 functions with a single xor operation on the 2 input values.
Similarly we can look at other LUTs and see that LUT3 and LUT4 also only do xor, but on 3 and 4 inputs respectively.
LUT5 has 3 different init values: one of them is just an xor on the 5 inputs, and the other 2 check for only one specific configuration of the 5 input values.
For the cases where only one specific configuration is set (i.e. the init mask has a single bit which is 1 and the rest are 0) we can still use the old method.
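
A small sketch of how such masks can be recognised automatically. The names parity_init and classify are hypothetical, and the sketch assumes bit k of the lookup index is input Ik; the real solver script uses its own list of simplifications.

```python
def parity_init(n_inputs):
    """INIT mask of an n-input xor: bit idx is set when idx has an odd number of 1s."""
    return sum(1 << idx for idx in range(1 << n_inputs) if bin(idx).count("1") % 2)

def classify(init, n_inputs):
    """Coarse classification of an INIT mask."""
    if init == parity_init(n_inputs):
        return "xor"
    if init != 0 and init & (init - 1) == 0:
        return "single"   # exactly one configuration yields 1
    return "other"        # fall back to the generic DNF

print(hex(parity_init(2)), hex(parity_init(3)), hex(parity_init(4)))
print(classify(0x6, 2), classify(0x8000000000000000, 6))
```

An "xor" result can be emitted as one xor over the inputs, and a "single" result as one and over six literals.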

LUT6 is a bit more tricky, as it has 24 potential init values.
I will not list all of the simplifications here, but check out the solver script for a list of all of them and some explanations.
In general they are as follows:

After implementing all the simplifications for the logic functions and running the same script, I managed to get a flag within a reasonable time.
Z3 took around 18 minutes (cpu: i7-8700) to find the correct flag.

You can download the solve script to see the specific implementation and some more comments with explanations.

how2link

In this challenge we get the whole setup that is on the remote and a dockerfile to build the environment.
The main driver is service.py which does the following:

Interestingly the service includes the secret.py file, which contains the flag, but never prints it or writes it to any stream, even if all the linking succeeds.

Furthermore we get 2 object files, that we suppose (from the description) should be linked together to win.
Checking the object files we see only simple hello world prints, and they exit right after. Therefore I was suspicious that linking the 2 files might not be the way to the solution, given that nothing would print the flag that is already on the system.
As an early conclusion I thought (correctly) that we would need to inject something custom into the resulting binary, which is then run, to read out the flag for us.
Since our point of contact is the not_linker file, we should start investigating there!

We see that the first argument (our file) gets opened, but if the argument has value "debug" then there are other options to interact with the binary.

if (param_1 == 2) {
    __s1 = (char *)param_2[1];
    iVar1 = strcmp(__s1,"debug");
    if (iVar1 == 0) {
      // truncated....
    }
    else {
      fp = fopen(__s1,"rb");
      if (fp == (FILE *)0x0) {
        uVar2 = main.cold();
        return uVar2;
      }
    }
                
Next we see that some files are being opened: the 2 object files that we are given, and also the output file which will then be used by ld to generate the final binary.
regs[0] = (char *)open_bfd("a.o");
regs[1] = (char *)open_bfd("b.o");
regs[2] = (char *)bfd_openw("out.o",**(undefined8 **)(regs[0] + 8));
                
Note: You need to set the type of regs to be an array of pointers to get this slightly nicer decompilation.
Here we can already see the main topic for this challenge, which is BFD, through the open_bfd and bfd_openw functions.
As I didn't know what BFD was, I thought these were some challenge-specific functions, but it turns out BFD is a library to interact with binary files of various formats.
To learn more I have turned to the friendly manual, which contains some basic information, examples, documentation of methods and structs.
This was perhaps the most useful resource in understanding the functions offered by the library, but another source of truth is the source itself.

The main function continues (and abruptly ends) with the following lines:
do {
  local_6c[0] = 0;
  if (debug == 0) {
    fread(local_6c,1,1,fp);
  }
  else {
    printf("< ");
    __isoc23_scanf("%d",local_6c);
  }
} while (0x23 < (byte)local_6c[0]);
                
Seemingly this just keeps reading input while the value is above 0x23 and exits otherwise.
This is a case of the Ghidra decompiler messing up. Inspecting the disassembly, there is actually a jump table, which is indexed by the byte that is read from the file, and after the handler runs, control flow jumps back to the processing loop.
0012e83a e8 21 f9        CALL       <EXTERNAL>::fread
         ff ff
                     LAB_0012e83f                                    XREF[1]:     0012f283(j)  
0012e83f 8b 44 24 2c     MOV        EAX,dword ptr [RSP + local_6c]
0012e843 3c 23           CMP        AL,0x23
0012e845 77 c9           JA         LAB_0012e810
0012e847 0f b6 c0        MOVZX      EAX,AL
0012e84a 48 63 04 83     MOVSXD     RAX,dword ptr [RBX + RAX*0x4]=>INT_ARRAY_00244
0012e84e 48 01 d8        ADD        RAX,RBX
0012e851 ff e0           JMP        RAX
                
As you can see, if we don't loop back at 0012e845, then the byte that was read is used as an index into INT_ARRAY_00244 (again, to get the nicer decompiler output you need to retype the memory as an array of integers of size 0x23+1, which we know from the bound check).
The value loaded from the table (RAX) is then added to the base of the array (RBX), and we jump to the resulting address.

General VM Structure

The jump table looks as follows, more precisely it's more like an offset table, as it doesn't contain the destination of the jump, only the offset:

                                 INT_ARRAY_00244c34                              XREF[2]:     main:0012e7cb(*), 
                                                                              main:0012e84a(*)  
00244c34 8c a5 ee        int[36]
     ff 5c a5 
     ee ff 2c 
00244c34 [0]               FFEEA58Ch,    FFEEA55Ch,    FFEEA52Ch,    FFEEA4FCh,
00244c44 [4]               FFEEA4CCh,    FFEEA49Ch,    FFEEA46Ch,    FFEEA434h,
00244c54 [8]               FFEEA3ECh,    FFEEA39Ch,    FFEEA354h,    FFEEA30Ch,
00244c64 [12]              FFEEA2BCh,    FFEEA26Ch,    FFEEA224h,    FFEEA1CCh,
00244c74 [16]              FFEEA17Ch,    FFEEA13Ch,    FFEEA0BCh,    FFEEA03Ch,
00244c84 [20]              FFEE9FCCh,    FFEE9F6Ch,    FFEE9EF4h,    FFEE9EBCh,
00244c94 [24]              FFEE9E8Ch,    FFEE9E1Ch,    FFEE9DECh,    FFEE9DBCh,
00244ca4 [28]              FFEE9D8Ch,    FFEE9D4Ch,    FFEE9D1Ch,    FFEE9CE4h,
00244cb4 [32]              FFEE9CB4h,    FFEE9C6Ch,    FFEE9C24h,    FFEEA5C4h
                
We can see all of them have FF as the most significant byte, which means these are negative 32-bit integers (we know this from dword ptr and from the move being sign-extending).
The base pointer is going to be the start of the array 0x00244c34 and we can now visit these locations in the disassembly view to see what is there.
We are going through the disassembly view, because Ghidra didn't pick up the jump table in the first place and thus doesn't know these are destinations, so it has not even disassembled these parts of memory.
We can manually go through each offset and disassemble the bytes we find there.
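
Computing the destinations by hand gets tedious, so here is a small sketch that turns table entries into addresses. It only uses the base address and a few raw entries from the listing above; the helper names are made up.

```python
import struct

TABLE_BASE = 0x00244C34  # start of the offset table, also the value held in RBX

def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as signed, like MOVSXD does."""
    return struct.unpack("<i", struct.pack("<I", value))[0]

def handler_address(entry):
    """Jump target = table base + sign-extended table entry."""
    return TABLE_BASE + to_signed32(entry)

# A few raw entries copied from the listing: indices 0, 1 and 35.
first = handler_address(0xFFEEA58C)
second = handler_address(0xFFEEA55C)
last = handler_address(0xFFEEA5C4)

print(hex(first), hex(second), hex(last))
```

The resulting addresses land inside main, close to the dispatch loop, which is what we expect for jump targets.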

We end up with 35 functions that can be called based on the input from our file that we are uploading.
I won't go into details about each of the functions here, I have uploaded some scripts and source code for these that details the ones we will end up using.
However let's look a bit at the general structure and some types of functions that we encounter (function is technically not correct, as there are no call/ret instructions, only jumps).

We can think of the file we upload as custom instructions that are going to be executed by the not_linker VM.
Each instruction will first start with an opcode between 0 and 35 (inclusive), that will determine the function to execute.
Our data storage is backed by registers (the regs array, to be specific), and looking at the functions we can determine that we can use 256 registers in total. The first 3 of these registers are pointers to struct bfd for the 3 object files that we are dealing with, namely a.o, b.o and out.o in this order.
Based on the opcode there could be a need to provide more information to the operation:

  1. Providing 8 bit constants directly to be used as values for an operation
  2. Providing 8 bit constants directly to be used as indices into the register array for an operation
  3. Providing at most 8 byte long, zero-terminated buffers to be used as values for an operation

A loose categorization of the different opcodes:

There are some differences in the offsets that each opcode uses for memory access, or the way in which a specific BFD function is called, but this should cover the general structure.
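
To make the instruction format concrete, here is a hypothetical encoder sketch. It assumes that every getinput consumes exactly one byte of the file and that getbuf consumes bytes up to and including a zero terminator; to stay on the safe side the sketch limits buffers to 7 bytes plus the terminator. The encode helper is my own illustration, and the operand order in the example follows the opcode 15 handler shown later in this writeup.

```python
MAX_OPCODE = 0x23  # bound taken from the CMP AL,0x23 check in the dispatcher

def encode(opcode, *parts):
    """Encode one instruction: the opcode byte, then the operands in the
    order the handler reads them.

    int parts become single bytes (constants or register indices),
    bytes parts become zero-terminated buffers (assumed getbuf format).
    """
    if not 0 <= opcode <= MAX_OPCODE:
        raise ValueError("opcode out of range")
    out = bytearray([opcode])
    for part in parts:
        if isinstance(part, int):
            if not 0 <= part <= 0xFF:
                raise ValueError("operand must fit in one byte")
            out.append(part)
        else:
            if len(part) > 7 or b"\x00" in part:
                raise ValueError("buffer must be at most 7 non-zero bytes")
            out += part + b"\x00"
    return bytes(out)

# Opcode 15: destination register, BFD register, section name, flags register.
# The register numbers 4 and 5 are arbitrary picks for the example.
make_text = encode(15, 4, 2, b".text", 5)
print(make_text)
```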

Challenge approach

Okay, so based on the VM we know that we should construct the out.o object, which can be passed to ld for linking and which should leak the flag when executed.
To construct the object we can make use of specific BFD functions permitted by the VM, as well as some register transfer operations possibly writing to/reading from memory using pointers.
As base data we have everything that is in the 2 existing object files, that we can access using the VM's functions.

In short: we want to build a minimal object file, using BFD, that executes some shellcode for us.

As I am not familiar with the minimal steps required to achieve such an object file and have no past experience with BFD, I turned to Mr. GPT for help.
It produced an almost working example; I needed to fix some non-existent functions and constants, and an include issue.
You can look at the minimal C BFD example I had after fixing the GPT output, which generates a simple object file with a main symbol, and a text section with some data.
It can pass through the linker and executing it jumps to the shellcode that we have defined during generation, awesome!
The main steps to produce such a file that we need to replicate:

Implementation in the VM

Now that we know a bit about what we need to do with BFD to generate the desired object file, we can start to work on getting the VM to call the right functions.
Luckily there are some opcodes already which directly call BFD library functions (although it can be a bit tricky to get the proper input to them). Functions we can use are:

Let's look at the challenges for the implementation by tackling each of the main BFD steps!

Setting architecture and machine

In the minimal example we have used bfd_default_set_arch_mach, however that is not available in the VM as a function.
We do however have bfd_set_arch_info, which will also achieve the desired effect of setting the architecture and machine properly.
This function takes a pointer to an existing architecture info struct, which would be tricky to construct in the VM, however luckily we also have bfd_get_arch_info.
The idea is to steal the architecture info from an existing object and set it for our new object.
Opcodes 26 and 27 can be used to set and get the architecture respectively, and our job here is easy as the get function just gives us the pointer in a register of our choice!

Setting the format

The format should be set to bfd_object, which according to the source code has value 1.
Thankfully opcode 29 can be used to set the format, however it's taking the format value from a register and not as an immediate.
Here we face the first challenge of setting register values, since no trivial way exists to set a register to an arbitrary value.
However, since we specifically need the value 1, we get away with it fairly easily this time by using opcode 10.

uVar1 = getinput();
uVar2 = getinput();
uVar3 = getinput();
*(char **)(regs[uVar2 & 0xff] + 0x38) = regs[uVar3 & 0xff];
regs[uVar1 & 0xff] = (char *)0x1;
                
Here we can see the goal: the register referenced by the first operand (reg1) will always be set to the value 1.
An additional annoyance is that the register referenced by the second operand (reg2) needs to hold a valid pointer, and that the value in reg3 gets written to reg2 + 0x38.
Although there are some ways to get valid pointers into registers, we already start with 3 of them, because of the BFD structs.
An important principle I will abuse many times later is that you can freely overwrite the BFDs of the 2 existing objects, provided that you don't care about the information in them anymore (and we won't).
This means by using opcode 10 and supplying a BFD as reg2 we can set reg1 to 1, which we then in turn use to set the format of the BFD.
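
The effect of opcode 10 can be modelled in a few lines. This is a toy model that only mirrors the decompiled handler above: memory is a dictionary from address to value, and all addresses and register numbers are invented.

```python
def op10(regs, mem, dst, ptr_reg, src_reg):
    """Toy model of opcode 10: store a register at pointer + 0x38, then set dst to 1."""
    mem[regs[ptr_reg] + 0x38] = regs[src_reg]
    regs[dst] = 1

regs = [0] * 256
mem = {}
regs[0] = 0x1000   # stand-in for the BFD pointer of a.o (sacrificed as scratch space)
regs[2] = 0x3000   # stand-in for the BFD pointer of out.o

# Use a.o's BFD as the valid pointer and get the constant 1 into register 7.
op10(regs, mem, 7, 0, 2)
print(regs[7], hex(mem[0x1038]))
```

Register 7 now holds the format value, at the cost of clobbering one field of the a.o BFD.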

Creating the .text section

We start going into more difficult operations here!
Opcode 15 can be used to call the bfd_make_section_with_flags function, however we will need to resort to some tricks to pass values to it.

uVar1 = getinput();
uVar2 = getinput();
getbuf(&stack0x0000004e);
uVar3 = getinput();
pcVar4 = (char *)bfd_make_section_with_flags
                           (regs[uVar2 & 0xff],&stack0x0000004e,
                            *(undefined4 *)(regs + (uVar3 & 0xff)));
regs[uVar1 & 0xff] = pcVar4;
                
The first argument to the function should be the pointer to the BFD where the section should be created, this is of course reg2, our output BFD.
Next, it's nice for us that we can provide the name of the section as an immediate value without the need to come up with a pointer that has this exact text.
getbuf just reads user input until the first null byte, with a max size of 8 bytes, and the result is used as the name of the section.
Note: Somewhat unfortunately a pointer to the stack is directly passed, which means if we change the value on the stack at that location, then the section name will also change. To get around this issue, we just make sure the last getbuf write we do is .text so it gets written back to its correct value.

The resulting section pointer gets stored in reg1, which we should remember later to pass this section pointer to functions.
The problem is with argument 3, where the lower 4 bytes of a register are used to set the flag.
As said above, there are no convenient ways to set registers to values, but the flag value we need here is 0x100 = SEC_HAS_CONTENTS so we can't rely on the previous trick of setting a register to one.

Now we introduce a slightly more general way to set register values, that involves an existing buffer, a bit of a noisy function to set values in that buffer, and some problems with null bytes.
Let's first look at opcode 22, which is important in creating pointers to heap with (mostly) arbitrary 8 bytes.

uVar2 = getinput();
uVar3 = getinput();
uVar4 = getinput();
iVar5 = getinput();
bStack0000000000000017 = (byte)iVar5;
getbuf(&stack0x0000004e);
pcVar1 = regs[uVar2 & 0xff];
*(char **)(pcVar1 + 0x20) = regs[uVar3 & 0xff];
*(uint *)(pcVar1 + 0x18) = uVar4 & 0xff;
*(ulong *)(pcVar1 + 0x10) = (ulong)bStack0000000000000017;
pcVar6 = strdup(&stack0x0000004e);
*(char **)(pcVar1 + 8) = pcVar6;
                
Although this opcode as a whole will be very important later when creating the symbol structure, let's only focus on the parts which are important to the current subtask.
The operation takes some register values and requires an existing buffer where it can write to.
As part of the input values getbuf is used, where we can supply 8 bytes up to the first null byte in our arbitrary input. The resulting string is then duplicated using strdup, which just gives you a heap pointer with the same content as the input pointer.
The pointer is then stored at an offset 8 from the existing buffer that is pointed to by input 1.

Now we have placed our custom 8 byte value on the heap and we have a pointer to it. To access this pointer we need to go offset 8 from another pointer, which is stored in a register.
But now we can make use of opcodes 32 and 1 to read the custom value into a register.
Opcode 32:
uVar1 = getinput();
uVar2 = getinput();
regs[uVar1 & 0xff] = *(char **)(regs[uVar2 & 0xff] + 8);
                
Will read the 8 byte value at pointer + 8 and store it in the destination register.
If we supply the same register as for the existing buffer in input 1 to opcode 22, then this will exactly store the heap pointer to our register.
This heap pointer has our custom 8 byte value, we only need to somehow read from it, which is possible thanks to opcode 1:
uVar1 = getinput();
uVar2 = getinput();
regs[uVar1 & 0xff] = *(char **)regs[uVar2 & 0xff];
                
This will read 8 bytes from the pointer that is contained in input 2 and store the result in the register that input 1 references.
In effect, this will store our custom 8 byte value from the heap to a register of our choice!

To summarize: we can use the opcode sequence 22, 32, 1 to place a custom 8 byte value in a register of our choice (we should be careful with null bytes).
This technique of course messes up data in the existing buffer, but as discussed above, we can make that buffer one of the first 2 BFD pointers.
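
The whole sequence can be replayed on a toy model of the VM. The sketch below mirrors the three decompiled handlers; it is not the real not_linker. Memory is a dictionary of 8-byte slots, the heap addresses are invented, and unlike a real strdup the toy version zero-fills the bytes after the terminator.

```python
class ToyVM:
    def __init__(self):
        self.regs = [0] * 256
        self.mem = {}              # address -> value stored at that address
        self.next_heap = 0x10000   # invented heap start

    def alloc(self, nbytes):
        addr = self.next_heap
        self.next_heap += (nbytes + 15) & ~15
        return addr

    def op22(self, buf_reg, ptr_reg, imm18, imm10, data):
        base = self.regs[buf_reg]
        self.mem[base + 0x20] = self.regs[ptr_reg]
        self.mem[base + 0x18] = imm18 & 0xFF
        self.mem[base + 0x10] = imm10 & 0xFF
        # strdup: copy everything up to the first null byte to a fresh heap chunk.
        text = data.split(b"\x00")[0][:8]
        heap = self.alloc(len(text) + 1)
        self.mem[heap] = int.from_bytes(text.ljust(8, b"\x00"), "little")
        self.mem[base + 8] = heap

    def op32(self, dst, src):
        self.regs[dst] = self.mem[self.regs[src] + 8]

    def op1(self, dst, src):
        self.regs[dst] = self.mem[self.regs[src]]

vm = ToyVM()
vm.regs[0] = vm.alloc(0x100)           # stand-in for a BFD we are happy to clobber
vm.op22(0, 0, 0, 0, b"\x01\x01\x01")   # heap chunk now holds 01 01 01 00 ...
vm.op32(5, 0)                          # reg5 = pointer to the heap chunk
vm.op1(5, 5)                           # reg5 = the 8 bytes stored there
print(hex(vm.regs[5]))
```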

Using the method above we give a register the value 0x00010101 to use as the flags, since only the lower 4 bytes are used.
Although a value of 0x100 exactly can be achieved as well, by writing multiple times to the getbuf buffer to place null bytes in desired location, I thought the larger value was easier during the contest.
Yes, this sets some additional flags, but they cause no problems in the big picture (debug and alloc flag set in addition).
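
The null byte restriction is easy to check in code. The helper below is a hypothetical illustration: it returns the bytes that would have to be fed to getbuf to obtain a value in a single write, or None when a null byte would cut the string short.

```python
def single_write_payload(value):
    """Bytes to supply for `value`, or None if one write cannot produce it."""
    raw = value.to_bytes(8, "little").rstrip(b"\x00")  # trailing zeros come for free
    return None if b"\x00" in raw else raw

easy = single_write_payload(0x00010101)   # 01 01 01, then the terminator
hard = single_write_payload(0x100)        # would need 00 01: the leading zero ends the string

print(easy, hard)
```

This is why 0x100 needs the multiple-write trick, while 0x00010101 works in one go.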

Setting section size

Now that we have the section we should set a size for it.
As written during the intro, the size can be larger than the shellcode, but should be large enough to contain it.
Opcode 17 calls this function for us:

uVar2 = getinput();
uVar3 = getinput();
uVar4 = getinput();
bVar1 = bfd_set_section_size(regs[uVar3 & 0xff],regs[uVar4 & 0xff]);
regs[uVar2 & 0xff] = (char *)(ulong)bVar1;
                
We just need the section pointer (we have this from the previous step, in a register already).
The other value is the size of the section, which we could set using the previous technique to a custom value.
I want to note that during the contest I used a different way to get a proper size value, that involves a read from a pointer at offset 0x30 (another opcode exists for this as well).
By reading from reg0 (BFD for object a) at this offset, I would get 0xb0, which is large enough, but not too large.
I thought this way was easier in the moment of the contest, but of course there is a way to make the previous technique place null bytes in the higher bytes by doing multiple dummy writes.

Creating a symbol

Well, this step is going to be the most tricky out of all of them.
I'll try my best to explain it, but there will be a lot of pointer juggling going on with different opcodes and also using a given opcode for 2 different purposes.
If I manage to confuse you, just remember that you can look at the provided scripts, and run through the operations in a debugger, to see how/when/where certain pointers/registers get their value and how that will play into the bigger picture.

First let's consider the bulk of the values that are needed for the symbol table, which are relatively easy to set, then let's figure out the tricky detail.
For starters, let's take a look at what the symbol struct looks like and what values we need to set.

typedef struct symbol_cache_entry
{
  struct _bfd *the_bfd;
  CONST char *name;
  symvalue value;
  flagword flags;
  struct sec *section;

  union
    {
      PTR p;
      bfd_vma i;
    } udata;

} asymbol;
                
Well the good news is that there's a cool opcode that can set all the values (in a nice way I might add), except the_bfd.

Opcode 22 is indeed almost as if it was designed to initialize the symbol structure:

uVar2 = getinput();
uVar3 = getinput();
uVar4 = getinput();
iVar5 = getinput();
bStack0000000000000017 = (byte)iVar5;
getbuf(&stack0x0000004e);
pcVar1 = regs[uVar2 & 0xff];
*(char **)(pcVar1 + 0x20) = regs[uVar3 & 0xff];
*(uint *)(pcVar1 + 0x18) = uVar4 & 0xff;
*(ulong *)(pcVar1 + 0x10) = (ulong)bStack0000000000000017;
pcVar6 = strdup(&stack0x0000004e);
*(char **)(pcVar1 + 8) = pcVar6;
                
The problem now is only where to place the struct, but as we have abused this many times now, let's just use an existing BFD.
This works well, except when the time comes later to write the symbol table with this symbol to the object file we will segfault.
The reason is that the first field of the struct, the_bfd is not actually set to our result BFD, and when it tries to use fields of the struct to index memory bad things happen.

In short we need to make sure that we have a pointer to our BFD at offset 0 of the symbol struct.
This is not that simple, as there are no opcodes which directly dereference a register and set the value at that location to the value taken from another register.
There are a couple of opcodes that do this (set value at a pointer), but they all do so at some offset from the given pointer that is stored in a register.
This would be okay if the rest of the symbol initialization could take place after that given offset. However, that would require us to have a register which contains the value pointer + offset, to be able to pass this as the existing buffer to opcode 22.

So there isn't (or at least I didn't find) a way to set this directly, therefore I found another multi step technique that builds the required structure.

The core of this idea relies on opcode 31. The idea is to create our symbol structure at an offset from the usual BFD pointer, where we can place the proper BFD pointer (offset 0 from the struct).

uVar1 = getinput();
uVar2 = getinput();
regs[uVar1 & 0xff] =
(char *)(*(long *)(regs[uVar2 & 0xff] + 0x10) +
*(long *)(*(long *)(regs[uVar2 & 0xff] + 0x20) + 0x30));
                
Input 1 is only used to store the results of the operation, the tricks happen in input 2.
The left side of the sum is just a value (8 bytes) at offset 0x10 from the pointer in input 2.
The right side of the sum comes from a double pointer dereference at different offsets.
The first one is at offset 0x20 from the same pointer as in the left side, and this should give us another pointer.
Then the resulting pointer is dereferenced at offset 0x30 to get an 8 byte value for the right side of the sum.

Given a proper structure setup for input 2, this opcode could be used to load sums of values into a register. Specifically we will use the left side to hold an offset and the right side to hold a pointer to a BFD to get an offset pointer from the BFD start.
The structure we set up will have the following layout:
0: don't care
8: don't care
10: offset
18: don't care
20: 0 // - 0 because it will point back to the start of the struct, which in this example is at address 0.
28: don't care
30: BFD base pointer
                
Convince yourself that the structure above combined with opcode 31 makes the register that input 1 refers to BFD base pointer + offset.
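
If you would rather let code do the convincing, here is a toy model of opcode 31 applied to that layout. It mirrors the decompiled expression only; the addresses are invented and memory is a dictionary from address to 8-byte value.

```python
def op31(regs, mem, dst, src):
    """Toy model of opcode 31: dst = [src + 0x10] + [[src + 0x20] + 0x30]."""
    base = regs[src]
    left = mem[base + 0x10]
    right = mem[mem[base + 0x20] + 0x30]
    regs[dst] = left + right

STRUCT = 0x5000     # invented address of the helper structure
BFD_BASE = 0x7000   # invented BFD base pointer
OFFSET = 0x38

mem = {
    STRUCT + 0x10: OFFSET,     # left side of the sum
    STRUCT + 0x20: STRUCT,     # points back to the start of the structure
    STRUCT + 0x30: BFD_BASE,   # right side of the sum
}
regs = [0] * 256
regs[3] = STRUCT

op31(regs, mem, 4, 3)
print(hex(regs[4]))
```

Register 4 ends up holding BFD base pointer + offset, which is exactly the kind of offset pointer that no single opcode gives us directly.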
Opcode 22 can be used to set offset 0x10 to a custom value immediate and 0x20 to a custom pointer value stored in a register.
Opcode 9 will allow us to place a value from a register to a pointer + offset 0x30 (details not discussed here).

To put everything together, we will use opcode 10 to write our BFD base pointer (pointer to the correct BFD we are building) at offset 0x38 from one of the dummy BFD pointers.
Then use the above technique to get a pointer to dummy BFD+0x38.
We can then construct our symbol table normally as discussed first, by passing the new dummy BFD+0x38 pointer as the existing buffer.

All these tricks allow us to create the symbol. Now let's take a break and set the symbol table.

Setting the symbol table

Thankfully this is really simple, thanks to opcode 21 being really helpful with the input handling.

uVar2 = getinput();
uVar3 = getinput();
uVar4 = getinput();
in_stack_00000038 = 0;
in_stack_00000030 = regs[uVar4 & 0xff];
bVar1 = bfd_set_symtab(regs[uVar3 & 0xff],&stack0x00000030,1);
regs[uVar2 & 0xff] = (char *)(ulong)bVar1;
                
We just need to pass the BFD pointer (the BFD that we are building up) and the pointer to the symbol struct that we constructed through hard work in the previous step.
The opcode already takes care of making a table out of a single symbol, and passes 1 as the count to signal that we only have a single symbol.

Loading the shellcode

Now all that we need to do is to load shellcode into the .text section and we are good to go.
The shellcode should load the secret.py file and then print it to stdout:

# "./secret.py" stack string load
mov $0x79702e, %rax
pushq %rax
movq $0x7465726365732f2e, %rax
pushq %rax

# x = open("./secret.py", 0)
movq %rsp, %rdi
xorq %rsi, %rsi
movq $2, %rax
syscall

# sendfile(1, x, 0, 128)
movq %rax, %rsi
movq $1, %rdi
xorq %rdx, %rdx
movq $128, %r10
movq $40, %rax
syscall
                
Simple shellcode to open the flag file and send it to stdout.
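
To double check the stack string, the two pushed constants can be decoded in Python. This is a small sketch, not part of the original solve script; it relies only on the fact that x86-64 is little-endian and that the value pushed last ends up at the lowest address (where `%rsp` points).

```python
import struct

first_push = 0x79702E               # pushed first, ends up at the higher address
second_push = 0x7465726365732F2E    # pushed second, %rsp points here

# Memory as seen starting from %rsp: second push, then first push.
stack = struct.pack("<Q", second_push) + struct.pack("<Q", first_push)

# The path is the C string starting at %rsp, up to the first null byte.
path = stack.split(b"\x00", 1)[0]
print(path)
```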

To actually populate the section with the shellcode we can use opcode 18:

iVar2 = getinput();
iVar3 = getinput();
bStack0000000000000017 = (byte)iVar3;
iVar3 = getinput();
uVar4 = getinput();
uVar5 = getinput();
uVar6 = getinput();
bVar1 = bfd_set_section_contents
    (regs[bStack0000000000000017],regs[(byte)iVar3],regs[uVar4 & 0xff],uVar5 & 0xff,
        uVar6 & 0xff);
regs[(byte)iVar2] = (char *)(ulong)bVar1;
                
The API here is nice: we just need a pointer to the BFD and to the section, both of which we have, and a content pointer (which we can make), while offset and length are taken as immediates.
We can use our first trick with opcode 22 to get a pointer to an arbitrary 8 bytes (the original trick also dereferences the pointer to get a value into a register, but here having the pointer is the goal).
We can just iterate over all shellcode bytes, populate our arbitrary 8 byte buffer, then write to the section.
We can play around with the offset to avoid writing null bytes one by one (i.e., if we have consecutive null bytes, just skip them with the offset and continue writing from the next non-null byte).
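
The chunking described above could be planned with something like the following Python sketch. The function name `plan_writes` is hypothetical, and the sketch assumes that the section contents start out zeroed, so null bytes never need to be written explicitly. Note that opcode 18 masks offset and length with 0xff, so this approach only works for shellcode shorter than 256 bytes.

```python
def plan_writes(shellcode: bytes, chunk_size: int = 8):
    """Split shellcode into (offset, chunk) writes of at most chunk_size bytes.

    Leading null runs are skipped via the offset and trailing nulls of a
    chunk are trimmed, assuming the target section is zero-initialized.
    """
    writes = []
    i = 0
    while i < len(shellcode):
        if shellcode[i] == 0:
            i += 1  # skip nulls, the next write simply uses a larger offset
            continue
        chunk = shellcode[i:i + chunk_size].rstrip(b"\x00")
        writes.append((i, chunk))
        i += len(chunk)
    return writes

# Toy input: two bytes, a run of ten nulls, one final byte.
example = b"\x90\x90" + b"\x00" * 10 + b"\xc3"
writes = plan_writes(example)
print(writes)
```

Each resulting pair would then correspond to filling the 8 byte buffer and issuing one opcode 18 call with the given offset and the chunk length.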

Wrapping it all up

Now that we know what opcodes to call with what input, we can just convert our array of bytes to a hexstring as required by the remote.
Additionally, there's the need to implement a PoW solver that bruteforces 4 alphanumeric characters to get a matching sha256 hash.
During the contest I managed to submit my flag in probably the last minute, but I am really happy that I got a solution, that I could implement the PoW solver fast enough, and that it also solved fast enough.
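
A PoW solver of this kind can be sketched in a few lines of Python. The exact challenge format is an assumption here: the sketch assumes that the server gives a known suffix and a target digest, and expects the 4 unknown characters as a prefix, i.e. sha256(XXXX + suffix) == target. The function name `solve_pow` is hypothetical and this is not the solver linked below.

```python
import hashlib
import itertools
import string

ALPHABET = string.ascii_letters + string.digits

def solve_pow(suffix: str, target_hex: str, length: int = 4, alphabet: str = ALPHABET):
    """Brute force a prefix such that sha256(prefix + suffix) matches target_hex.

    Assumed format: the unknown part is a prefix of `length` alphanumeric
    characters. Returns the prefix, or None if no candidate matches.
    """
    for candidate in itertools.product(alphabet, repeat=length):
        prefix = "".join(candidate)
        digest = hashlib.sha256((prefix + suffix).encode()).hexdigest()
        if digest == target_hex:
            return prefix
    return None

# Self-made example, the real values come from the remote.
demo_suffix = "XyZ123"
demo_target = hashlib.sha256(("aabc" + demo_suffix).encode()).hexdigest()
print(solve_pow(demo_suffix, demo_target))
```

With 62 possible characters there are at most 62^4 (about 14.8 million) candidates, which plain Python can go through in well under a minute on typical hardware.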

You can download the solve script to see the specific implementation and some more comments with explanations.
You can download the PoW solver for a very simple PoW written under panic, rushing to have time to submit the flag.

Closing thoughts

To finish up, I had a great time playing in this CTF, and I will definitely try to attend next year, as far as that is possible.
I would recommend participating in this contest, although of course this is a higher rated CTF, so beginners may find it difficult to approach this contest.
I enjoyed the reversing challenges. Although the verilog/hardware focused ones were definitely outside of the tasks I usually do, I welcomed the new challenge environment.
I didn't look at the web or crypto challenges, but I hope they were enjoyable as well.
For misc I tried numbers, but it didn't occur to me to just google the number sequence, skill issue.
I briefly looked at the other 2 reversing challenges, but decided not to invest too much time.
EzRegs seemed like a lot to understand and wrap my head around in the remaining time, and I didn't find a way to quickly make progress on p-nodes.
As for pwn, I looked at ip management system and found the leak and OOB write vulnerabilities, but during the contest I couldn't figure out an OOB write that would give me anything useful, so I decided to pursue other challenges.