Description
The following files were given with the challenge:
.
└── pyflagchecker
    ├── chall.flag
    └── chall.py
The file chall.flag contains some binary data, while the file chall.py contains the following Python script:
def main():
    import marshal, inspect
    c=b'\x87\xfcA@\xc7\xc4\xea\xf5\xa6\x87\x84\x02\xd6\x9e\x85\x93\xdeM\xa9\xe7\'\xf1\xaf\xe7\xc5\xde:\xff\xff\x95\x9a\xce\x05v\xd21\xce~\xa5\xb6\x19KI\xafd\xf2\xb5D\x1d\xa9:<7\x97<\xc8\xfd\x02\x9fK\x9a\x14\xb3\xc8\xb82\x90\x1a7\x140j\xffw
\xe6\xb1s\xcd|s\xfd\x99\xf4b\xe1Z\x9e\x86\xf1%q\x05\xf0\xfb\xd2\xf0G\xd5x\xc0A\xac\xc4\xe5\xca~}2\x8e.\x9d6\xbe\xe2\x9a\xa0u=\xe0\x98\xaa\xc1xl\x1aI~T\xec\xfe\x95F\xbbS1)\x7f\x01\xdam\xb7>\x91\xf0\xb9-"1\xa1V\xf4\x18{3\'\x8b\x16\xeb\x92%\r
<\xbe\xe3\xda\xcc\xc1\x03\xae\xf7h\xe6\xd7\xa8h\xe4\x1dM\xf53.-o\xd7\xb4\x1b\xdd\x1f\x7fG\xeb[o\xbab\x1fZF\xa1\xbd\x81\xe5\xb3)\xdf\'\x01\xdb\xe0j\xd6\xf6\x8b+\x1b\xee\x97\xcb\x9c\xf7\x03\x00p\xdb\xe89}\xc8\xd6\xd2Tu\x85y\x08\xfa\xe2\x98\x
d2\xfa\n\x00v\xd4\x0cr\x02ew\x99\xe6\xe2\x1b\x8b\xbd]\x08\xb2\x07\xdfg\xc2\xf3\xd32\xe65\xf1\xa4\x02:\xbe\xf6\xa4\xea\xe1\xc9 \xa4E\x9c\xe3\xf95\x81\xbdd\xb3&\xf7%\x8a\x05\x18\x1a\xdc\x00\x08\xd2\x95\xd8\x06\xc7\xe4\xfaI\xe5\x80K\x99\xfa\x
d9\x8a\xc5d\x18\x03\xc6\xd4\x13m4\xf1ts\x866\xe0\xae8$\xb0z\x85\x0bU\xfdC\xe6\xc2\xef:Ra\xd2\x07h\xdf\x9b\xaa\xd8\xa3\xc1\xee}\x0b\xd0\x97\xb7\x11U\x98\x89E\x88\xfd\xa8\x85\x84Ez@\x92U\x8a6\xcb\x18\xb7\xe0$\xfe\xa6\x0f]\xbd\x05p\xc4\xa8"\x
c0w\xd7I\xfd\x84\xe8\xb5\xa7\xc9A\x01\tV\xbb\xaa@HTiK\xab)W\xb3\xdf\x0e\xb0\xd8\x93Oh`\xf7[b\xbaB\x8f\xe2\x9d\tpXA\x11\x04\xa2N\x9e;\x07J\x9c@"\x90^9\xdc\x10\x87`9\xbe\xfc|\x04\xb6"\x95\xd7l\xf4\x07;\xfb\x8a\xf3\xc4\xd6\xa2T\xcc\x18v~y8RE\
xca,Q\xf5\xb5\xed\xd6,6\xf1i%\x92\xc3\x83\xda\xdc-\x0fWe&/I\x04\x1e\xf4Y\x14]\xb3e\x97\x84\xe1\x922\xd0\x96e\xc2\x161\xebtr\xe8\xa3\xea(\x18!\x83\xf9\x0b+\xc1\x01\x1f\xcec\x1f\x91\xf2\xd8f\xbav\xf9\t\xd7\xab\xd4\x84\x10L\x95\xe7\xf5\xcf\x1
5\xdf\x9d\xad\xfa\xac\x9d\xcbJ\x86\x14^P\n\xbc\xbd\x1f\xbb\xaen\xe4\xd0q\xc0\xd8\xb3\x97\xdc\x92P\xaa\xe4j\x813~\xd0_q\x88y\xff[\x00\xaa@\x90\x87\x905\xb0\xc3\r\x91\t\x9f>\xdd\x17\x19\xe1.\x8eR\x06/\x99\x1b\xff\x8a\x95qY\xa3h\x10\xcau\x0c\
x0b\xb8\xb5\x13\xf6\xde\x06\x9c\xb6\xffS\x819\x8e8\'\x1e\x0fker\xbcD\xc99\x9d\xda\x8b\xbd\x0e\x9cbF?\xe8,\x17w\x8d\xafk\xee\'\x7f\x9b\x07\x91{I\xd0\xbbi$\xf8\xad\xed\xcf\x83\xb8\xee\xbe\x8aM\xa3Ea\xe5\xe3s\xaf\xe5\xcc:<!\xee\x03\x96[\xf6\x
de\xe0\xfa\xb0}\xfa\xdb\x95\x9eio\xba\xc8Oo\xc4^_\x12\x88V\xe5\x0c\x01\x07\x87{\xfc\x93,h\x94\xc1\x80\x96\xac6\xdf\x98+M\xd6\xd0\xcf\x15\xd6>H\x11)\x140\xf7\x12t[\x1a\xa7\xcc\x80v\x18<\xb7\xafOa;o\xf4\x86K\xcf\xe6\x87\xaa\xaaLfk\xc7w \x7fN
k\xc0\x9f\xc8b\x1aFo\x80\x07 YRa\x96\xa6\xbcH\x84)SK\xda\xaeX\xbd\x82\x8d6\x11U6d\x91\xb101I\x17\xe0\xe7\xf0\xcc\xd7\x1a@t\xe4\\\x06]\x19\x975\xa2(\xc3\x13\xe3^my\xbe<\xe0\x05\xb8Cue\x9c\x18o3\xe9\xd4-\x10\x14\x9ea\xad\xcb\xd2\xee\xaa\xa0>
re\x8a\xb9\x1d\xba\xd2\xb81[\xe7\xd19\xf5\x12\xfe~\xf4\xc5)\x15\xe0o\x01%?#\x94\xde\xa9\xc3g(WDN9\xd8\xeb\xb4\xef8?\xf6\x18\xbdu\x16\x8c"\xbe?c\xb7P\xd1V\x8d\x14\x94\xb8\xaa\xac,=\x81g\xd7G\xac\xf6\x84\xe8\x90\x7f\xce\x1crz@\x9e\xaf\xed\xa
1\xb2@\x91\x8c\x89\xe6\xa8\xa5\x9c|\x1cq\xb0|1\x06\xe6\x87\xfcn3\x80\xc2\xb0\x8c\x0cI"\x1c48yC\xe2.\x7f\x7f"k\xbb\xae\xf5a/\xfa\x9f\n\x01\x00\xc7\xb9\xb0C\xff\xb6Y)\xf5a)5`\xf9iqx\xe03Z\xf0\x0bW\x0e\xa7\t\xb4\x98J\xf6\xe2\xa4\xaa\x1eG\xeeD
\xe1\xadD\xbe\xfb\r/\x97v\x86j~\xbe\x9d\xf4F\xc0:$\x8a\x06n\x12`sE\xd4\xdc\xe5i\x1a$^\xd4\x06r\x87-bj9Ik[\ru\x8f\xb3q\xb9Y\xc1\xa6\xe8\xb3\xd8)'
    def e(m, k):
        r = []
        r += [m[0]^(sum(k)&0o377)]
        for _, i in enumerate(m[1:]):
            k = e(k, [m[_]])
            r += [i^(sum(k)&0o377)]
        return bytes(r)
    print("This is a simple python flag checking service.")
    flag = input("Please give the flag to check: ")
    k=[(a*b)&0xff for a,b in zip(map(sum,inspect.getsource(main).encode().splitlines()),map(len,inspect.getsource(main).encode().splitlines()))]
    try: exec(marshal.loads(e(c, k)))
    except: print("That is not the flag")
if __name__ == "__main__":
    main()
Analysis And Solution
Based on what we’re given, we can assume that the file chall.flag contains the encrypted flag and that the script checks our input against it.
Running the script and testing it:
➜ pyflagchecker $ python chall.py
This is a simple python flag checking service.
Please give the flag to check: uoftctf{test_flag}
That is not the flag
Stage 1: Unraveling the First Thread
The main script employs a fascinating encryption function e() whose key is derived from the source code of main() itself. Talk about self-referential security! Here’s the clever bit:
k=[(a*b)&0xff for a,b in zip(map(sum,inspect.getsource(main).encode().splitlines()),map(len,inspect.getsource(main).encode().splitlines()))]
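For every line of main()’s source, the key gets one byte: the sum of the line’s byte values multiplied by the line’s length, reduced modulo 256. This is particularly devious because any attempt to modify the function changes the key itself - a classic example of anti-tampering protection.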
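To make the anti-tampering effect concrete, here is a minimal sketch of the same per-line formula applied to a toy source string instead of inspect.getsource(main). The helper name line_key and both toy sources are made up for illustration; only the formula comes from the challenge.

```python
def line_key(source: str) -> list[int]:
    """One key byte per source line: (sum of bytes * line length) & 0xff."""
    lines = source.encode().splitlines()
    return [(sum(line) * len(line)) & 0xFF for line in lines]


# Toy stand-ins for the real function source (assumption, not challenge code).
original = "def f():\n    return 1\n"
patched = "def f():\n    print('debug')\n    return 1\n"

print(line_key(original))
print(line_key(patched))  # an extra line means an extra key byte
```

Adding a debug print inside the function inserts a line, so the key gains a byte and every later key byte shifts position, which is why editing main() directly breaks the decryption.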
Stage 2: Outsmarting the Key Generation
Instead of fighting against the function’s self-protective nature, we decided to work around it. inspect.getsource(main) only returns the text of main() itself, so code added outside the function doesn’t affect the key. By moving the key calculation outside the main function, we could capture its output without disturbing its delicate internal structure:
if __name__ == "__main__":
    import inspect
    k=[(a*b)&0xff for a,b in zip(map(sum,inspect.getsource(main).encode().splitlines()),map(len,inspect.getsource(main).encode().splitlines()))]
    print(k)
    main()
This revealed our first prize - the encryption key:
k = [117, 167, 0, 48, 34, 58, 159, 220, 195, 70, 136, 149, 48, 125, 238]
Stage 3: The Marshal’s Gambit
We can see that the function e uses the value of k (which we found earlier) to decrypt c into marshal data, which is executed later on, so let’s inspect this decrypted code and see if it’ll lead us to the flag.
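One detail of e() worth spelling out is that the key is updated from the bytes of its input, so when decrypting, the keystream is driven by the ciphertext. The sketch below copies e() from chall.py and pairs it with a hypothetical enc() helper (not part of the challenge, written here only to illustrate the feedback) that feeds the key from the ciphertext it produces, so that e() undoes it. The short key and message are toy values.

```python
def e(m, k):
    # Decryption routine as it appears in chall.py.
    r = []
    r += [m[0] ^ (sum(k) & 0o377)]
    for _, i in enumerate(m[1:]):
        k = e(k, [m[_]])  # key update depends on the previous input byte
        r += [i ^ (sum(k) & 0o377)]
    return bytes(r)


def enc(p, k):
    # Hypothetical counterpart: same keystream rule, but the key is fed
    # with the ciphertext bytes so that e(enc(p, k), k) == p.
    c = [p[0] ^ (sum(k) & 0xFF)]
    for j in range(1, len(p)):
        k = e(k, [c[j - 1]])
        c.append(p[j] ^ (sum(k) & 0xFF))
    return bytes(c)


key = [117, 167, 0, 48]      # toy key, shorter than the real one
message = b"marshal payload"  # toy plaintext
print(e(enc(message, key), key))
```

Because each keystream byte depends on all earlier ciphertext bytes, a wrong key byte or a single altered ciphertext byte garbles everything after it.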
To inspect the decrypted payload, let’s hardcode k and print the decrypted bytes:
def main():
    import marshal, inspect
    c= b'XXXX' # for simplicity
    def e(m, k):
        r = []
        r += [m[0]^(sum(k)&0o377)]
        for _, i in enumerate(m[1:]):
            k = e(k, [m[_]])
            r += [i^(sum(k)&0o377)]
        return bytes(r)
    print("This is a simple python flag checking service.")
    flag = input("Please give the flag to check: ")
    k = [117, 167, 0, 48, 34, 58, 159, 220, 195, 70, 136, 149, 48, 125, 238]
    marshal.loads(e(c, k))
    print(e(c, k))
Just when we thought we were making progress, the marshal data threw us a curveball:
ValueError: bad marshal data (unknown type code)
Fun Fact
If you run the script with a Python version other than the one the challenge was built for (3.10.12), it will say the flag is wrong even when you input the correct one: the marshal deserialization fails, and the bare except in main() turns that failure into “That is not the flag”.
Turns out the challenge creators added another twist by relying on a version-specific marshal format. The way Python serializes and deserializes code objects is not guaranteed to stay compatible between versions, and these folks used Python 3.10.12, so loading that data in a different version can fail outright. Why? Because Python versions can change how code objects are structured: new opcodes, changes in metadata, or different ways of storing data. It’s like trying to play a new video game on an old console. Other 3.10.x releases may well load this first payload, but the next stage also mixes interpreter internals into its key, so I went for an exact match: grab Python 3.10.12’s source code, build it myself, and dive in with the exact environment they used. Let’s roll!
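A quick sanity check, sketched below, is to print the running interpreter's version and bytecode magic number before trying to load the payload. The value 3439 is the bytecode version reported in the decompiler header later in this write-up; the helper name bytecode_magic is made up for this example.

```python
import importlib.util
import sys


def bytecode_magic() -> int:
    """Return the interpreter's bytecode magic number as an integer."""
    # MAGIC_NUMBER is 4 bytes: a little-endian 16-bit number plus b"\r\n".
    return int.from_bytes(importlib.util.MAGIC_NUMBER[:2], "little")


print(sys.version_info[:3], bytecode_magic())
if bytecode_magic() != 3439:
    print("warning: this interpreter does not use the 3.10 bytecode format")
```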
We can download the source from here:
https://www.python.org/downloads/release/python-31012/
Following the instructions we can build it, and then we’ll create a new virtual environment where we can work:
./configure
make
./python -m venv .venv
source .venv/bin/activate
With the custom Python 3.10.12 environment set up and ready, I ran the script again, and this time it gave me the decrypted bytes. Perfect!
The decrypted output was a mix of gibberish and some recognizable patterns—always a good sign. Here’s a snippet of the data dump:
b'c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\x00@\x00\x00\x00sJ\x01\x00\x00e\x00\x83\x00Z\x01e\x02e\x01\x83\x01Z\x03e\x04j\x05Z\x06e\x07d\x00d\x01\x84\x00e\x03\xa0\x08\xa1\x00D\x00\x83\x01\x83\x01\xa0\t\xa1
\x00Z\nd\x02d\x03\x84\x00Z\x0bd\x04d\x05l\x0cZ\x0ce\x0be\x0be\n\x83\x01\x83\x01Z\ne\ne\x07e\x03d\x06\x19\x00\x83\x01\xa0\t\xa1\x00e\x03d\x07\x19\x00\x17\x007\x00Z\ne\x0be\n\x83\x01e\x06e\x0bj\rj\x0e\x83\x01\x17\x00e\x0c\xa0\x0f\xa1\x00d\x0
8\x19\x00\xa0\t\xa1\x00\x04\x00Z\x10\x17\x00Z\ne\x0be\n\x83\x01e\x06e\x11j\x12j\rj\x0e\x83\x01\x17\x00e\x06e\x03d\x03\x19\x00j\rj\x0e\x83\x01\x17\x00Z\ne\x0be\x0be\x0be\x0be\x0be\n\x83\x01e\x0be\x06e\x13e\x14\x83\x01\xa0\x15\xa1\x00\x83\x0
1\x83\x01\x17\x00\x83\x01\x83\x01\x83\x01\x83\x01Z\ne\x13d\td\n\x83\x02Z\x0bd\x0bZ\x01\t\x00e\x0b\xa0\x15d\r\xa1\x01Z\x06e\x16\xa0\x17e\x06d\x0e\xa1\x02\x04\x00Z\x06d\x04k\x02r\x88n\x14e\x01e\x03d\x03\x19\x00e\x0b\xa0\x15e\x06\xa1\x01e\n\x
83\x027\x00Z\x01e\x03d\x03\x19\x00e\ne\n\x83\x02Z\nqxe\x18e\x04\xa0\x19e\x01\xa1\x01\x83\x01\x01\x00d\x05S\x00)\x0fc\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x04\x00\x00\x00C\x00\x00\x00s\x14\x00\x00\x00g\x00|\x00]\x
06}\x01t\x00|\x01\x83\x01\x91\x02q\x02S\x00\xa9\x00)\x01\xda\x03dir)\x02\xda\x02.0\xda\x01_r\x00\x00\x00\x00r\x00\x00\x00\x00\xfa;Nice Progress! I wonder if there is a better way to do this\xda\n<listcomp>\x04\x00\x00\x00s\x02\x00\x00\x00\
x14\x00r\x05\x00\x00\x00c\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x07\x00\x00\x00C\x00\x00\x00s(\x00\x00\x00t\x00d\x01d\x02\x84\x00t\x01|\x00|\x00t\x02|\x00\x83\x01d\x03\x1a\x00d\x00\x85\x02\x19\x00\x83\x02D\x00\x83
\x01\x83\x01S\x00)\x04Nc\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00S\x00\x00\x00s\x18\x00\x00\x00g\x00|\x00]\x08\\\x02}\x01}\x02|\x01|\x02A\x00\x91\x02q\x02S\x00r\x00\x00\x00\x00r\x00\x00\x00\x00)\x03r\
x02\x00\x00\x00\xda\x01a\xda\x01br\x00\x00\x00\x00r\x00\x00\x00\x00r\x04\x00\x00\x00r\x05\x00\x00\x00\x07\x00\x00\x00s\x02\x00\x00\x00\x18\x00z\x15e.<locals>.<listcomp>\xe9\x02\x00\x00\x00)\x03\xda\x05bytes\xda\x03zip\xda\x03len)\x01\xda\x
01mr\x00\x00\x00\x00r\x00\x00\x00\x00r\x04\x00\x00\x00\xda\x01e\x06\x00\x00\x00s\x02\x00\x00\x00(\x01r\r\x00\x00\x00\xe9\x00\x00\x00\x00N\xda\x07marshal\xda\x01c\xe9\xff\xff\xff\xffz\nchall.flag\xda\x02rb\xf3\x00\x00\x00\x00T\xe9\x03\x00\x
00\x00\xda\x03big)\x1a\xda\x06locals\xda\x01I\xda\x04dict\xda\x01lr\x0f\x00\x00\x00\xda\x05dumps\xda\x01i\xda\x03str\xda\x06values\xda\x06encode\xda\x02l2r\r\x00\x00\x00\xda\ttraceback\xda\x08__code__\xda\x07co_code\xda\x0cformat_stack\xda
\x02tb\xda\x07inspect\xda\tgetsource\xda\x04open\xda\x08__file__\xda\x04read\xda\x03int\xda\nfrom_bytes\xda\x04exec\xda\x05loadsr\x00\x00\x00\x00r\x00\x00\x00\x00r\x00\x00\x00\x00r\x04\x00\x00\x00\xda\x08<module>\x01\x00\x00\x00s*\x00\x00\
x00\x06\x00\x08\x01\x06\x01\x1a\x01\x08\x02\x08\x02\x0c\x01\x1c\x01(\x01&\x01,\x01\n\x02\x04\x01\x02\x01\n\x01\x14\x01\x02\x01\x18\x01\x0e\x01\x02\xfb\x12\x06\x1d\x00'
To make sense of this mess, I turned the decrypted bytes into a .pyc file. Python’s marshal module totally came through for me here. I whipped up a quick script to load the bytes, wrap the resulting code object in a .pyc header, and save it as a shiny new .pyc file:
>>> import marshal
>>> code = marshal.loads(b'<THE BYTES FROM EARLIER>')
>>> import importlib
>>> pyc_data = importlib._bootstrap_external._code_to_timestamp_pyc(code)
>>> with open("code.pyc", 'wb') as pyc_file:
...     pyc_file.write(pyc_data)
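Since _code_to_timestamp_pyc is a private helper, it may help to see what it produces. The sketch below builds the same kind of file by hand, assuming the timestamp-based layout used since Python 3.7: a 4-byte magic number, a 4-byte flags field, a 4-byte mtime, a 4-byte source size, then the marshalled code object. The function name code_to_pyc and the demo code object are made up for illustration.

```python
import importlib.util
import marshal
import struct


def code_to_pyc(code, mtime: int = 0, source_size: int = 0) -> bytes:
    """Wrap a code object in a timestamp-based .pyc header (16 bytes)."""
    header = importlib.util.MAGIC_NUMBER             # interpreter-specific magic
    header += struct.pack("<I", 0)                   # flags: 0 = timestamp pyc
    header += struct.pack("<I", mtime & 0xFFFFFFFF)  # source mtime
    header += struct.pack("<I", source_size & 0xFFFFFFFF)
    return header + marshal.dumps(code)


# Toy code object standing in for the decrypted payload.
demo = compile("answer = 6 * 7", "<demo>", "exec")
pyc = code_to_pyc(demo)
print(len(pyc), pyc[:4] == importlib.util.MAGIC_NUMBER)
```

The magic number has to be the one of the interpreter that produced the code object, which is one more reason to do this step inside the matching Python build.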
Next, I threw the .pyc file into a couple of decompilers. First, I tried pycdc, but honestly, something felt off. The output was a hot mess and barely made sense. So, I switched to PyLingual.io, and bam! It spit out a way cleaner version of the code. Definitely good enough to keep going.
# Decompiled with PyLingual (https://pylingual.io)
# Internal filename: Nice Progress! I wonder if there is a better way to do this
# Bytecode version: 3.10.0rc2 (3439)
# Source timestamp: 1970-01-01 00:00:00 UTC (0)
I = locals()
l = dict(I)
i = marshal.dumps
l2 = str([dir(_) for _ in l.values()]).encode()
def e(m):
    return bytes([a ^ b for a, b in zip(m, m[len(m) // 2:])])
import traceback
l2 = e(e(l2))
l2 += str(l['marshal']).encode() + l['c']
l2 = e(l2) + i(e.__code__.co_code) + (tb := traceback.format_stack()[-1].encode())
l2 = e(l2) + i(inspect.getsource.__code__.co_code) + i(l['e'].__code__.co_code)
l2 = e(e(e(e(e(l2) + e(i(open(__file__).read()))))))
e = open('chall.flag', 'rb')
I = b''
while True:
    i = e.read(3)
    if (i := int.from_bytes(i, 'big')) == 0:
        break
    I += l['e'](e.read(i), l2)
    l2 = l['e'](l2, l2)
exec(marshal.loads(I))
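At this point, it’s pretty clear what’s happening. The script is set up to decrypt yet another layer of data pulled from chall.flag, just like the first script did with c. The file is read as a sequence of records, each with a 3-byte big-endian length followed by that many encrypted bytes; the records are decrypted with the first stage’s e() (reached through l['e']) and concatenated into I, which is finally unmarshalled and executed. But here’s the kicker: the new code is just as sneaky as the last, because it’s completely self-referential. The key l2 is built from the script’s own source (open(__file__).read()), bytecode, and stack trace, meaning if you so much as breathe on the source code, the decryption will break. So, how do we peek at what’s going on without messing it all up? The option we went with is to go straight to the heart of it: modify Python itself. Yes, we’re talking about tinkering with the interpreter at its core.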
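Two building blocks of the decompiled stage can be tried in isolation. The sketch below reimplements the one-argument e(m) under the made-up name fold, and the record loop under the made-up name read_records, then runs both on toy data rather than on the real chall.flag. The decryption of each record is left out here.

```python
import io


def fold(m: bytes) -> bytes:
    """Same logic as the decompiled e(m): XOR the buffer with its second half."""
    return bytes(a ^ b for a, b in zip(m, m[len(m) // 2:]))


def read_records(stream):
    """Yield records framed as a 3-byte big-endian length plus payload."""
    while True:
        size = int.from_bytes(stream.read(3), "big")
        if size == 0:  # a zero length (or end of file) ends the sequence
            break
        yield stream.read(size)


# Toy data standing in for the real file contents.
blob = (5).to_bytes(3, "big") + b"hello" + (2).to_bytes(3, "big") + b"hi" + bytes(3)
print(list(read_records(io.BytesIO(blob))))
print(fold(b"\x01\x02\x03\x04"))
```

Each call to fold roughly halves the buffer, which is how the stage squeezes a long string of interpreter and source details into a shorter key.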
Stage 4: Going Nuclear - Modifying Python Itself
After realizing we needed to peek at Python’s variables during execution, I dug around the Python source code. First stop was Python’s documentation (https://docs.python.org/3/c-api/object.html) where I found a neat function called PyObject_Print - it writes an object to a C FILE*, and with its flags argument set to 0 it prints the object’s repr(), so it’s roughly the C-level counterpart of print(repr(obj)).
Next up was finding where exec actually lives. If you poke around the Python source code (specifically in the Python-3.10.12 directory), there’s a file called Python/bltinmodule.c. This is where all the built-in functions like exec, eval, etc. are implemented - pretty much Python’s core functionality.
Inside bltinmodule.c, we find this function:
builtin_exec_impl(PyObject *module, PyObject *source, PyObject *globals,
                  PyObject *locals)
This is Python’s internal implementation of the exec() function we use in Python. It takes 4 parameters:
module: the module object the function belongs to (here, builtins)
source: the actual code being executed
globals and locals: the dictionaries that store all the variables - exactly what we want to see!
So I added our debug prints right after the parameter validation code:
builtin_exec_impl(PyObject *module, PyObject *source, PyObject *globals,
                  PyObject *locals)
{
    PyObject *v;
    // ... parameter validation ...
    if (!PyMapping_Check(locals)) {
        PyErr_Format(PyExc_TypeError,
            "locals must be a mapping or None, not %.100s",
            Py_TYPE(locals)->tp_name);
        return NULL;
    }
    // ...
    printf("---------- Begin Globals and Locals --------------\n\n\n");
    PyObject_Print(globals, stdout, 0);
    PyObject_Print(locals, stdout, 0);
    printf("---------- End Globals and Locals --------------\n\n\n");
    // more code ...
After making these changes, we need to rebuild Python to make them take effect:
make
rm -rf .venv
./python -m venv .venv
source .venv/bin/activate
When we run our script now, every time it calls exec(), Python will dump out all the variables in scope at that moment - including our decrypted code before it gets executed! It’s like adding a temporary print statement, but at the C level, where the challenge’s checks on its own source and bytecode don’t look.
This worked great because the decrypted data has to exist as a Python object somewhere in memory before it can be executed: the second stage accumulates it in the variable I before calling exec(marshal.loads(I)), and globals/locals are where Python stores such variables. By printing them during exec(), we catch our decrypted code in that brief moment between decryption and execution.
Stage 5: The Final Boss - Pickled Flag
When I ran the original challenge script, my terminal basically exploded with messages, dumping out all the local and global variables. It was a chaotic mess, but buried in there was the golden nugget: a variable named I holding the marshalled bytes of the code object we were after. I loaded those bytes with marshal.loads, turned the resulting code object into a .pyc file as before, and ran it through PyLingual.io for decompilation. What came out was a massive Python script, most of it not too interesting, but then I found the gem: the logic that actually checks if the flag is correct. This was the real meat of the challenge!
if flag.encode() == bytes(loads(b'(I117\nI111\nI102\nI116\nI99\nI116\nI102\nI123\nI109\nI48\nI100\nI49\nI102\nI121\nI49\nI110\nI54\nI95\nI55\nI104\nI51\nI95\nI53\nI48\nI117\nI114\nI99\nI51\nI95\nI48\nI102\nI95\nI112\nI121\nI55\nI104\nI48\nI110\nI95\nI49\nI53\nI95\nI102\nI117\nI110\nI125\nt.')):
    print('Congratz!! Now submit that flag')
One final piece of Python magic to unpickle our prize:
>>> import pickle
>>> bytes(pickle.loads(b'(I117\nI111\nI102\nI116\nI99\nI116\nI102\nI123\nI109\nI48\nI100\nI49\nI102\nI121\nI49\nI110\nI54\nI95\nI55\nI104\nI51\nI95\nI53\nI48\nI117\nI114\nI99\nI51\nI95\nI48\nI102\nI95\nI112\nI121\nI55\nI104\nI48\nI110\nI95\nI49\nI53\nI95\nI102\nI117\nI110\nI125\nt.'))
b'uoftctf{m0d1fy1n6_7h3_50urc3_0f_py7h0n_15_fun}'
And we got the flag
uoftctf{m0d1fy1n6_7h3_50urc3_0f_py7h0n_15_fun}
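The pickled data is easy to read by eye once the opcodes are known: "(" pushes a mark, each "I<number>" line pushes an integer, "t" collects everything since the mark into a tuple, and "." ends the pickle. The sketch below rebuilds such a payload for a toy string and disassembles it with pickletools; the helper name tuple_of_ints_pickle and the demo string are made up for illustration.

```python
import pickle
import pickletools


def tuple_of_ints_pickle(data: bytes) -> bytes:
    """Encode bytes as a text-style pickle of a tuple of integers."""
    return b"(" + b"".join(b"I%d\n" % x for x in data) + b"t."


payload = tuple_of_ints_pickle(b"flag{demo}")  # toy value, not the real flag
pickletools.dis(payload)                       # prints MARK, INT ..., TUPLE, STOP
print(bytes(pickle.loads(payload)))
```

So the flag was stored as a tuple of character codes, and bytes() simply turns that tuple back into the string.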
Conclusion
This challenge was a masterclass in Python internals, showing how every level of the Python ecosystem - from C source to pickle serialization - can be used in creative ways for code protection. The flag’s message about modifying Python’s source being “fun” was a fitting reward for our deep dive into the interpreter’s internals!