# KEKculator

Table of Contents

Intro

This is my first blog post so I thought I would keep it easily digestable :)

For this years rendition of SieberrCTF 2025, I wrote a bytevm reverse engineering/pwn challenge called “KEKculator” with the intention of allowing newer CTF players an introduction to bytevm reversing.
The challenge was used in the qualifying round, and got a total of 1 solve by @azzazo
Additionally, it was my first time writing any sort of bytevm/re/pwn challenge + I’m an re and pwn noob and thus I figured it would be a good opportunity to practice some coding.

I thought that I would approach this writeup from the perspective of creating the challenge first, then present how I envisioned someone with minimal RE experience solving it.
Hopefully this allows for better clarity 😁

Challenge creation

Architecture

The byteVM acts like a 32-bit cpu, with a bytecode convention as follows

opcode (4 bytes)arg1 (4 bytes)arg2 (4 bytes)arg3 (4 bytes)

All the values are big endian and will always be present.

Eg. if the opcode only requires arg1 and arg2, arg3 must still be provided and will simply be ignored.

The “ram”/memory is stored in a simple python list with the following structure
[
- 0:1000 -> stack
- 1001: 1001+len(code) -> code
]

The “stack” grows downward as per common convention
Additionally, the reasoning for placing the stack on top of the code is to allow for the pwn aspect of this challenge(as you’ll see later)

Furthermore, there exists a bunch of standardish registers:
esp, ebp, eip, eax(used for arithmetic calls) ,edx, ecx, ebx, esi, edi

Opcodes

There are a total of 18 opcodes implemented in the byteVM, here’s a quick list and a brief description of them
opcodes:

  • 0x00 -> halt (arg1 = exit_code, arg2 = message, arg3 = 0)
  • 0x01 -> add (arg1 = dest(register), arg2 = src1, arg3 = src2)
  • 0x02 -> sub (arg1 = dest(register), arg2 = src1, arg3 = src2)
  • 0x03 -> mul (arg1 = dest(register), arg2 = src1, arg3 = src2)
  • 0x04 -> div (arg1 = dest(register), arg2 = src1, arg3 = src2)
  • 0x05 -> test (arg1 = src1, arg2 = src2, arg3 = 0) -> clears the flags of register tes
  • 0x06 -> jeq (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps if the eq is set (tested equality)
  • 0x07 -> jne (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps if the neq flag is not set (tested inequality)
  • 0x08 -> jgt (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps if the greater than flag is set (tested greater than)
  • 0x09 -> jlt (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps if the less than flag is set (tested less than)
  • 0x0a -> jz (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps if the zero flag is set (tested equality)
  • 0x0b -> jnz (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps if the zero flag is not set (tested inequality)
  • 0x0c -> jmp (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps unconditionally
  • 0x0d -> push (arg1 = dest(register), arg2 = value)
  • 0x0e -> pop (arg1 = dest(register), arg2 = 0)
  • 0x0f -> store (arg1 = src(register), arg2 - memory addr) -> stores arg1(register) into arg2(memory (stores unconditionally)
  • 0xff -> nop (arg1 = 0, arg2 = 0, arg3 = 0) -> no operation
  • 0xdd -> syscall (arg1(0=print, 1=read), arg2(resgiter), arg3(register))

Code structure

The full source code can be viewed here
PS: I have done my best to make it readable by adding comments, but I am not too great at writing readable code 😛
However, in the interest of keeping this digestable, I shall only provide a layout of the code.

A bunch of helper functions are defined, and individual functions are defined for each opcode.
The path to a program is passed into the class init function, which then initializes the registers in a dictionary and the memory list

Lastly, a router function is used to get the code at the current EIP(Instruction pointer), fill the arg1,2,3 registers with the correct data, call the function and corresponds to the opcode
(you may notice that I used a bunch of if loops instead of a match case switch. This is because most bytecode decompilers only support python versions that do not have the match case switch, and I wanted to introduce the participants to python bytecode reversing, but making it trivial to reverse so as to maintain the crux of the challenge)

Writing a vulneriable program

The vulnerable program that is ran serverside is a calculator program that loads the value of 1 into a register, takes in an instruction and an operand, and performs the corresponding instruction using the current value of the register and the provided operand, then saves the resultant value into the register.

When the “done” instruction is received, it stores the value of the register into the stack and prints the bytes of the value.

I was going to write the bytecode by hand initially, but due to time constraints I decided on writing an assembler to speed up this process.

assembler.py
with open('exploit.asm') as f:
asm = f.readlines()
asm = [line.strip() for line in asm if line[0] != '#' and line != '\n' ]
bytecode = b''
opcode_to_byte = {
'halt': 0x00,
'add': 0x01,
'sub': 0x02,
'mul': 0x03,
'div': 0x04,
'test': 0x05,
'jeq': 0x06,
'jne': 0x07,
'jgt': 0x08,
'jlt': 0x09,
'jz': 0x0a,
'jnz': 0x0b,
'jmp': 0x0c,
'push': 0x0d,
'pop': 0x0e,
'store': 0x0f,
'syscall': 0xdd,
'nop': 0xff
}
def arg_to_bytes(arg):
if arg[:2] == '0x':
arg = int(arg, 16)
else:
arg = int(arg.encode().hex(),16)
return arg
def pad_bytes(arg):
return arg.to_bytes(4, 'big')
for line in asm:
opcode, arg1, arg2, arg3 = line.split(' ')
opcode = opcode_to_byte[opcode]
arg1, arg2, arg3 = arg_to_bytes(arg1), arg_to_bytes(arg2), arg_to_bytes(arg3)
bytecode += pad_bytes(opcode) + pad_bytes(arg1) + pad_bytes(arg2) + pad_bytes(arg3)
with open('exploit.bin', 'wb') as f:
f.write(bytecode)

Here is the code for the simple assembler that I wrote.

With this, I created the following assembly:

program.asm
# simple program to write a string to the stack (starts from 0:1000) and then print it out
# we need to first print out the string "Welcome to KEKulator PRO!"
add edx Welc 0x0
push edx 0x0 0x0
add edx 0x6f6d6520 0x0
push edx 0x0 0x0
add edx 0x746f204b 0x0
push edx 0x0 0x0
add edx 0x454b756c 0x0
push edx 0x0 0x0
add edx 0x61746f72 0x0
push edx 0x0 0x0
add edx 0x2050524f 0x0
push edx 0x0 0x0
add edx 0x21000000 0x0
push edx 0x0 0x0
add edx 0x0 0x0
syscall 0x0 edx 0x0
# print "Your starting number: 1!"
add edx Your 0x0
push edx 0x0 0x0
add edx 0x20737461 0x0
push edx 0x0 0x0
add edx 0x7274696e 0x0
push edx 0x0 0x0
add edx 0x67206e75 0x0
push edx 0x0 0x0
add edx 0x6d626572 0x0
push edx 0x0 0x0
add edx 0x3a203121 0x0
push edx 0x0 0x0
add edx 0x0 0x0
push edx 0x0 0x0
add edx 0x1c 0x0
syscall 0x0 edx 0x0
# print "This is a blackbox so I won't tell you what to do...teehee"
add edx 0x54686973 0x0
push edx 0x0 0x0
add edx 0x20697320 0x0
push edx 0x0 0x0
add edx 0x6120626c 0x0
push edx 0x0 0x0
add edx 0x61636b62 0x0
push edx 0x0 0x0
add edx 0x6f782073 0x0
push edx 0x0 0x0
add edx 0x6f204920 0x0
push edx 0x0 0x0
add edx 0x776f6e27 0x0
push edx 0x0 0x0
add edx 0x74207465 0x0
push edx 0x0 0x0
add edx 0x6c6c2079 0x0
push edx 0x0 0x0
add edx 0x6f752077 0x0
push edx 0x0 0x0
add edx 0x68617420 0x0
push edx 0x0 0x0
add edx 0x746f2064 0x0
push edx 0x0 0x0
add edx 0x6f2e2e2e 0x0
push edx 0x0 0x0
add edx 0x74656568 0x0
push edx 0x0 0x0
add edx 0x65650000 0x0
push edx 0x0 0x0
add edx 0x38 0x0
syscall 0x0 edx 0x0
# push 1 to ecx
add ecx 0x0 0x1
# we need to accept the arithmetic instruction(add, sub, mul, div) and a number to operate on eax
# call read to read 4 bytes into edx (the instruction to run)
syscall 0x1 edx 0x0
# test for "add "
add ebx 0x61646420 0x0
test ebx edx 0x0
# if equal, jump to arithmetic function
jeq 0x8f8 0x0 0x0
# test for "sub "
add ebx 0x73756220 0x0
test ebx edx 0x0
# if equal, jump to arithmetic function
jeq 0x928 0x0 0x0
# test for "mul "
add ebx 0x6d756c20 0x0
test ebx edx 0x0
# if equal, jump to arithmetic function
jeq 0x958 0x0 0x0
# test for "div "
add ebx 0x64697620 0x0
test ebx edx 0x0
# if equal, jump to arithmetic function
jeq 0x988 0x0 0x0
# test for "done"
add ebx 0x646f6e65 0x0
test ebx edx 0x0
# if equal, jump to done function
jeq 0x9b8 0x0 0x0
# add function
# call read to read 4 bytes into edx (the value to add)
syscall 0x1 edx 0x0
add ecx ecx edx
# jump back to input function
jmp 0x7f8 0x0 0x0
# sub function
# call read to read 4 bytes into edx(the value to sub)
syscall 0x1 edx 0x0
sub ecx ecx edx
# jump back to input function
jmp 0x7f8 0x0 0x0
# mul function
# call read to read 4 bytes into edx(the value to mul)
syscall 0x1 edx 0x0
mul ecx ecx edx
# jump back to input function
jmp 0x7f8 0x0 0x0
# div function
# call read to read 4 bytes into edx(the value to div)
syscall 0x1 edx 0x0
div ecx ecx edx
# jump back to input function
jmp 0x7f8 0x0 0x0
# done function
# store the value of ecx onto the stack
store ecx 0x74 0x0
# print the value that was stored
add ecx 0x0 0x74
syscall 0x0 ecx 0x0
# halt the program
halt 0x0 0x0 0x0

PS: The keen eyed among you might have noticed that I (definitely intentionally) left out functionality to do something like mov eax 0x10.
Therefore, in order to “move” a value to a register, one must do the following add eax 0x0 <value> or similar.

The exploit

As there is a current writeup competition going on, I shall make this section purposefully vague for the time being till the deadline has been reached. Additionally, I will not include the section on how I believe a beginner contestant can solve the challenge till then.

The vulnerability of the program comes with the fact that there is no bounds check on the value in the ecx register.
Additionally, the store instruction stores a value of arbitrary size into the stack.
Therefore, if we can execute a sequence of mathematical operations such that when the store instruction is called, we can write a huge value onto the stack that

  1. Contains some shellcode that reads the flag file and prints it (how convenient that the string “flag” is only 4 bytes!)
  2. Overwrites the code @ the current eip to jump to the start of this shellcode
    We can print out the flag and win!

I was initially left wondering on the best way to do the above, but a friend wisely suggested a loop of bitshifting 4 bytes, then adding the value that corresponds to 4 bytes that you want and so forth.
This works really well. However, we cannot bitshift 4 bytes at a time as we would require to multiply by 0x0100000000, which is a little more than the 4 bytes we are allowed.
Therefore, to keep things simple, I opted to bitshift by 3 bytes at a time. The big endianess makes this really uncomplicated and easy.

To do this, we simply multiply by 0x01000000: 0x12340x01000000=0x12340000000x1234 \cdot 0x01000000 = 0x1234000000
Then, we add the value of the 3 bytes we want. We can do this for our entire payload and then call the “done” instruction and win.

In the interests of saving operations however, I opted to spam a few mul, 0xffffffff first to fill up space with “random” bytes that we won’t care about.

We first create a shellcode snippet that can be used to read and print the flag:

exploit.asm
# add "flag" to ecx
add ecx flag 0x0
# push to stack
push ecx 0x0 0x0
# flag is now at addr 0x74
add ecx 0x0 0x74
# call open syscall to read value into
syscall 0x2 ecx ebx
# store the read value from ebx to eax
store ebx 0x100 0x0
add ebx 0x100 0x0
# call print syscall to print the value out
syscall 0x0 ebx 0x0
halt 0x0 0x0 0x0

Our payload(theoretically) will be constructed like so:

0x8c4 bytesshellcodejmp instruction to top of shellcode
0x74-0x9380x938-0x9b80x9b8-0x9c8

Which we can then use this rather clunky solve script to solve

solve.py
<redacted for now>

Revisiting the challenge from the perspective of a participant

Recon

Analysis and decompilation of the bytevm

Exploit creation

Solve script

My avatar

Hopefully this was an interesting read! Feel free to check out my other posts or contact me via the social links in the footer.


More Posts