My previous post uses concolic testing to explore and discover the path that leads to the correct answer. Getting the desired execution path (and basic block) executed means the problem is basically solved. We need to only solve for the inputs that lead to that path.
Unlike #2 that has a focus on finding the right path, this challenge does not have separate paths for success
or failure
per se. Instead of a desired execution path, we do know the desired encoded output that we want the binary to finally produce. The task is then to solve for the input that used to derive the output. We need to somehow represent the desired encoded output as a series of contraints for the SAT solver to solve for the input.
Challenge #5 consists of two files, a binary and a pcap file. The binary, sender
(binary link), is a Windows binary that reads from a key.txt
file, presumably performs some computation on the contents, and sends data (likely derived from the key.txt
file) via a HTTP POST request.
The pcap file, challenge.pcap
, helps illuminate what is being sent. Investigating the pcap in Wireshark, we can tell that the following “hash” or encoded data is sent over separate requests transmitting four bytes at a time.
UDYs1D7bNmdE1o3g5ms1V6RrYCVvODJF1DpxKTxAJ9xuZW==
The disassembly of the main function sub_401100
is very straightforward. The important functions in order of execution are:
ReadFile
: Reads key_original
from key filesub_401250
: In-place encodes key_original
==> key_encoded
sub_401427
: Allocates a heap buffer to store hash
sub_4012A0
: Encodes key_encoded
==> hash
...
00401181 call ds:ReadFile ; Read from "key.txt" file
...
00401198 mov edx, esi ; edx = number of bytes read
0040119A lea ecx, [ebp+Buffer] ; ecx = contents of key file
004011A0 call sub_401250 ; Performs an in-place encoding of the
; original key buffer
...
004011D3 push eax
004011D4 push 1
004011D6 call sub_401427 ; Allocates a buffer to store the
; encoded "hash"
...
004011DD lea ecx, [ebp+Buffer] ; ecx = in-place encoded buffer
004011E3 push edi ; edi = len(hash) =
; math.ceil(len(key)/3) * 4
004011E4 push ebx ; ebx = alloc-ed buffer to store hash
004011E5 mov edx, esi ; edx = original len(key)
004011E7 call sub_4012A0 ; Converts encoded key buffer into
; hash to be sent over network
The chunk of instructions before HeapAlloc sub_401427
is called calculates the length of the encoded “hash” string from the length of the original key. Reversing instructions from 0x4011A5
to 0x4011CD
indicates that len(hash) = math.ceil(len(key)/3) * 4
. This information is particularly useful for us to reduce the search space of the concolic execution. Since we know that the length of the desired “hash” is 48 bytes, we compute the length of the original key as 34 bytes. We will use this to guide our concolic execution.
Now that we get a rough idea of the high-level pipeline of the “hash” generation, we shall proceed to use the concolic execution to derive the symbolic constraints on the inputs, and finally use angr in-built SAT solver to solve for the input satisfying the discovered constraints.
First, we initialize angr, load the challenge binary and begin with an execution state starting after the function call to ReadFile()
.
In [1]: import angr
In [2]: p = angr.Project('sender')
In [3]: state = p.factory.blank_state(addr=0x401198, remove_options={simuvex.o.LAZY_SOLVES})
We configure the stack to the state after the key file is read. More importantly, we need to mark the key file buffer as a symbolic one. Note that here we hard-coded the length of the key contents. It will be interesting to see how much longer the concolic execution will take if we set the length as a symbolic value.
In [4]: LEN_PW = 0x22
In [5]: ADDR_PW_ORI = state.regs.ebp - 0x80004
In [6]: state.regs.esi = LEN_PW
In [7]: for i in xrange(LEN_PW):
...: state.mem[ADDR_PW_ORI+i:].byte = state.BV('pw', 8)
When the encoding function sub_401250
, we face the first complication – the encoding of the original key buffer is performed in-place within the same buffer. As this buffer will subsequently be used as an input to the hashing function sub_4012A0
, solving symbolically for this buffer (used as input and output at the same time) at the end will derive the sequence of bytes representing the buffer after it has been encoded by sub_401250
.
As a workaround, we will retrofit a portion of the stack [ebp - 0x70004]
to be the output of this in-place encoding function. This separation of the input and output buffers used by sub_401250
conveniently allows the constraints to be propagated by the concolic engine.
In [8]: ADDR_PW_ENC = ADDR_PW_ORI + 0x10000
Additionally, we need to use angr hooking functionality to hook the instruction at 0x401259
within the function sub_401250
to duplicate the original key buffer and update the register ebx
to the retrofitted output buffer. The instruction at 0x4011E7
also needs to be hooked to use this output buffer. Note that these are 0-length hooks, i.e. no existing instructions are replaced by the hooks.
In [9]: def hook_duplicate_pw_buf(state):
...: for i in xrange(LEN_PW):
...: char_ori = state.memory.load(ADDR_PW_ORI + i, 1)
...: state.memory.store(ADDR_PW_ENC + i, char_ori)
...: state.regs.ebx = ADDR_PW_ENC
In [10]: p.hook(0x401259, hook_duplicate_pw_buf, length=0)
In [11]: def hook_use_dup_pw_buf(state):
....: state.regs.ecx = ADDR_PW_ENC
In [12]: p.hook(0x4011E7, hook_use_dup_pw_buf, length=0)
The second complication we face is related to the function sub_401427
. This function calls the HeapAlloc()
external library call to allocate a new buffer to hold the “hash” to be computed later. It is challenging to support the use of external libraries in angr, especially if it is just the use of one API import. The best approach to handle this is to (again) retrofit part of the stack as the “new” buffer and use this buffer instead for the hash computation. Note that this time, a hook of length 5 is used because we do not want to replace the original call function at 0x4011D6
.
In [13]: ADDR_HASH = state.regs.ebp - 0x40000
In [14]: def hook_heapalloc(state):
....: state.regs.eax = ADDR_HASH
In [15]: p.hook(0x4011D6, hook_heapalloc, length=5)
Now we are all set to discover the constraints via concolic execution. We will stop the path exploration just after the return of the hashing function sub_4012A0
.
In [16]: paths = p.factory.path_group(state, immutable=False)
In [17]: paths.explore(find=0x4011EC)
Out[17]: <PathGroup with 1 found>
Next, we add the constraints to make final hash buffer equal to the desired goal hash string that we have derived earlier from the pcap file. We also restrict the hash buffer to only printable bytes.
In [18]: found_s = paths.found[0].state
In [19]: GOAL_HASH = 'UDYs1D7bNmdE1o3g5ms1V6RrYCVvODJF1DpxKTxAJ9xuZW=='
In [20]: for i in xrange(len(GOAL_HASH)):
....: char = found_s.memory.load(ADDR_HASH + i, 1)
....: found_s.add_constraints(char >= 0x21, char <= 0x7e,
....: char == ord(GOAL_HASH[i]))
Finally, we solve the constraints with the angr solver, and voila!
In [21]: found_s.se.any_str(found_s.memory.load(ADDR_PW_ORI+0, 1)) + \
....: found_s.se.any_str(found_s.memory.load(ADDR_PW_ORI+1, LEN_PW-1))
Out[21]: 'Sp1cy_7_layer_OSI_dip@flare-on.com'
Some may question why the constraint solving for the input is done over two separate solver se.any_str()
statements. This is an issue that I have been unable to figure out at the moment. When we try to solve the constraints in one statement over the symbolic input buffer as a whole, somehow only the first character is shown. The rest of the buffer consists of only null bytes.
In [22]: found_s.se.any_str(found_s.memory.load(ADDR_PW_ORI, LEN_PW))
Out[22]: 'S\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
Please kindly drop me a note, if you know a way around this.
Here is the full listing of python script to solve Flare-On #5 with angr.
import angr
import simuvex
# Globals
LEN_PW = 0x22
ADDR_PW_ORI = ADDR_PW_ENC = ADDR_HASH = 0
GOAL_HASH = 'UDYs1D7bNmdE1o3g5ms1V6RrYCVvODJF1DpxKTxAJ9xuZW=='
def hook_duplicate_pw_buf(state):
for i in xrange(LEN_PW):
char_ori = state.memory.load(ADDR_PW_ORI + i, 1)
state.memory.store(ADDR_PW_ENC + i, char_ori)
state.regs.ebx = ADDR_PW_ENC
def hook_use_dup_pw_buf(state):
state.regs.ecx = ADDR_PW_ENC
def hook_heapalloc(state):
state.regs.eax = ADDR_HASH
def main():
global ADDR_PW_ORI, ADDR_PW_ENC, ADDR_HASH
# Load binary
p = angr.Project('sender')
# Start with a blank state at the EIP after "key.txt" is read
state = p.factory.blank_state(addr=0x401198,
remove_options={simuvex.o.LAZY_SOLVES})
# Initialize global variables
ADDR_PW_ORI = state.regs.ebp - 0x80004
ADDR_PW_ENC = ADDR_PW_ORI + 0x10000
ADDR_HASH = state.regs.ebp - 0x40000
# Setup stack to simulate the state after which the "key.txt" is read
state.regs.esi = LEN_PW
for i in xrange(LEN_PW):
state.mem[ADDR_PW_ORI+i:].byte = state.BV('pw', 8)
# Hook instructions to use a separate buffer for the XOR-ing function
p.hook(0x401259, hook_duplicate_pw_buf, length=0)
p.hook(0x4011E7, hook_use_dup_pw_buf, length=0)
# To avoid calling imports (HeapAlloc), retrofit part of the stack as
# temporary buffer to hold symbolic copy of the password
p.hook(0x4011D6, hook_heapalloc, length=5)
# Explore the paths until after the hash is computed
paths = p.factory.path_group(state, immutable=False)
paths.explore(find=0x4011EC)
# Add constraints to make final hash equal to the one we want
# Also restrict the hash to only printable bytes
found_s = paths.found[0].state
for i in xrange(len(GOAL_HASH)):
char = found_s.memory.load(ADDR_HASH + i, 1)
found_s.add_constraints(char >= 0x21,
char <= 0x7e,
char == ord(GOAL_HASH[i]))
# Solve for password that will result in the required hash
print found_s.se.any_str(found_s.memory.load(ADDR_PW_ORI+0, 1)) + \
found_s.se.any_str(found_s.memory.load(ADDR_PW_ORI+1, LEN_PW-1))
if __name__ == '__main__':
main()
Thanks to jmgrosen on Reddit, we now know that the sidenote is indeed a weird bug.
More importantly, he suggested a more elegant solution around the problem of getting the original password symbolic buffer getting clobbered (see complication 1 subsection above). The trick is to save the symbolic bytes for the original password buffer and then solve for them at the end.
pw = state.BV('PW', LEN_PW * 8)
state.mem[ADDR_PW_ORI:] = pw
...
print found_s.se.any_str(pw)
However, due to the bug, we need to mark the password buffer byte by byte, instead of as a long range of BitVectors.
Here is the improved script listing, without the need to maintain a duplicate buffer at ADDR_PW_ENC
:
import angr
import simuvex
# Globals
LEN_PW = 0x22
ADDR_PW_ORI = ADDR_HASH = 0
GOAL_HASH = 'UDYs1D7bNmdE1o3g5ms1V6RrYCVvODJF1DpxKTxAJ9xuZW=='
def hook_heapalloc(state):
state.regs.eax = ADDR_HASH
def main():
global ADDR_PW_ORI, ADDR_HASH
# Load binary
p = angr.Project('sender')
# Start with a blank state at the EIP after "key.txt" is read
state = p.factory.blank_state(addr=0x401198,
remove_options={simuvex.o.LAZY_SOLVES})
# Initialize global variables
ADDR_PW_ORI = state.regs.ebp - 0x80004
ADDR_HASH = state.regs.ebp - 0x40000
# Setup stack to simulate the state after which the "key.txt" is read
state.regs.esi = LEN_PW
# Restrict the password to printable bytes, ending with a null byte
pw_char = state.BV('PW_CHAR', 8)
pw_str = pw_char
for i in xrange(LEN_PW):
state.add_constraints(pw_char >= 0x21, pw_char <= 0x7e)
state.mem[ADDR_PW_ORI+i:].byte = pw_char
pw_char = state.BV('PW_CHAR', 8)
pw_str = pw_str.concat(pw_char)
state.add_constraints(pw_char == 0)
# To avoid calling imports (HeapAlloc), retrofit part of the stack as
# temporary buffer to hold symbolic copy of the password
p.hook(0x4011D6, hook_heapalloc, length=5)
# Explore the paths until after the hash is computed
paths = p.factory.path_group(state, immutable=False)
paths.explore(find=0x4011EC)
# Add constraint to make final hash equal to the one we want
# Also restrict the hash to only printable bytes
found_s = paths.found[0].state
for i in xrange(len(GOAL_HASH)):
char = found_s.memory.load(ADDR_HASH + i, 1)
found_s.add_constraints(char >= 0x21,
char <= 0x7e,
char == ord(GOAL_HASH[i]))
# Solve for password that will result in the required hash
print found_s.se.any_str(pw_str)
if __name__ == '__main__':
main()