04 November 2007

update notes

I updated the blog, obviously. I did it by pulling in older posts of mine from another blog, in an effort to bring some of my thoughts into the public realm, so the chronology of the next few posts is out of order. Also, I have a bunch more on double free()s to post; I just haven't gotten it into a web-presentable format atm.

library randomization

So it seems to me, and perhaps I'm just not thinking straight because I haven't worked it out on paper, but it seems like, because Apple says the following:


Mach-O position-independent code design is based on the observation that the __DATA segment is always located at a constant offset from the __TEXT segment. That is, the dynamic loader, when loading any Mach-O file, never moves a file’s __TEXT segment relative to its __DATA segment. Therefore, a function can use its own current address plus a fixed offset to determine the location of the data it wishes to access. All segments of a Mach-O file, not only the __TEXT and __DATA segments, are at fixed offsets relative to the other segments.

Note: If you are familiar with the Executable and Linking Format (ELF), you may note that Mach-O position-independent code is similar to the GOT (global offset table) scheme. The primary difference is that Mach-O code references data using a direct offset, while ELF indirects all data access through the global offset table.


Now, I haven't actually spent any time digging through Leopard, aside from a simple test of compiling a test program with the gcc on there (v4.0.1) to see if it had SSP (it had never heard of the flags -fstack-protector or -fstack-protector-all?), but it seems like if Apple's PIC binaries retain this trait, you end up with a couple of complications.

The first and most obvious is that you can't randomize per segment, although in theory you should be able to randomize the stack and heap (though that may cause problems in one of those funko languages like Obj-C). The second problem is that because the text is not randomized, all I need to know is the base address of the image, which is pretty attainable considering all of the segments that are not randomized. From there I can look for data references that rely on the fact that segment offsets are constant, and it would seem I could reverse the address space layout that way.

I mean, examining the .text of anything dealing with libraries, such as dyld, should reveal a lot of those references. It seems pretty much like overkill at this point in the game, because all you really need is a jmp/call addr/reg, but honestly this seems like a deep flaw in the ASLR logic; maybe I just need more sleep?
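
To make the concern concrete, here's a rough sketch; every offset and address in it is made up for illustration. The point is just that if intra-image offsets are constant, a single leaked text address recovers the layout of the whole image, slide or no slide.

/* sketch: all offsets/addresses below are hypothetical */
#include <stdio.h>
#include <stdint.h>

#define KNOWN_FUNC_OFFSET 0x1a2b0UL  /* made up: function's constant offset from image base */
#define DATA_SEG_OFFSET   0x40000UL  /* made up: __DATA's constant offset from image base */

int main(void)
{
    /* pretend this came from an info leak or a non-randomized reference */
    uintptr_t leaked_func = 0x90a3b2b0UL;

    uintptr_t image_base = leaked_func - KNOWN_FUNC_OFFSET;
    uintptr_t data_seg   = image_base + DATA_SEG_OFFSET;

    printf("image base: %#lx, __DATA: %#lx\n",
           (unsigned long)image_base, (unsigned long)data_seg);
    return 0;
}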

MAX_PATH and the secret life of backslash-backslash-questionmark-backslash

So in Windows, specifically in the Shell API, there is the concept of MAX_PATH, which obviously is the maximum path length, isn't it? Well no, actually, it isn't ;]

If you prepend the string "\\?\" to a path and call the Unicode version of the function, you can access files and directories with names up to something like 32,000 wide characters.

This can in turn lead to incorrect file access (which can cause a lot of problems), or, for a host of API calls that take an output buffer and an output-buffer-size parameter, a 'buffer is too small' return value that is larger than the original output buffer size.

That is to say you should be calling the APIs in the following way:

DWORD siz = len;
DWORD retval = SomethingWithALongNameW(..., siz);

if (0 == retval)
    ; // hard error, consult GetLastError()
else if (retval > siz)
    ; // buffer too small: retval is the size needed, resize and retry (or error)
else
    ; // success: only now is the output buffer initialized


But I'm seeing that in a lot of cases, that's not how people are calling it; instead they're calling it closer to this:

if (0 != SomethingWithALongNameW(..., siz)) {
    // !error -- except a 'buffer too small' return lands here too



When you do that, you end up with a condition where the buffer being passed in as an output isn't initialized to any value, the return value is not properly checked, and there is no way to truly know whether the buffer was initialized or not.
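
For a concrete instance with a real API, GetFullPathNameW follows exactly this convention; a minimal sketch (the input path is arbitrary, return-value semantics per its documentation):

#include <windows.h>
#include <stdio.h>

int wmain(void)
{
    WCHAR buf[MAX_PATH];
    DWORD ret = GetFullPathNameW(L"some\\relative\\path", MAX_PATH, buf, NULL);

    if (0 == ret) {
        // hard failure
        wprintf(L"error: %lu\n", GetLastError());
        return 1;
    }
    if (ret >= MAX_PATH) {
        // buffer too small; ret is the size needed in WCHARs --
        // buf is NOT valid here, resize and retry
        wprintf(L"need %lu WCHARs\n", ret);
        return 1;
    }
    // only now is buf initialized and safe to use
    wprintf(L"%ls\n", buf);
    return 0;
}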

The second half of the problem comes from the fact that a lot of those same API calls will truncate at MAX_PATH, potentially leading to conditions where files are accessed incorrectly. Think of this in the context of signing or verifying the signature of a file, where the truncated path is not the one that actually gets employed, or where the return value is likewise improperly checked.

Seriously, try this out: open Visual Studio and create a directory structure like C:\0\1\2\3\4\5\6, et cetera, longer than MAX_PATH, and then try to access it via, say, cmd.exe or even explorer.exe, and try to delete it using either of those. Well, not cmd.exe, but you'll see why there. (ed. note to the lazy reader: everything breaks)
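
And if you'd rather script the setup than click through it, here's a sketch that builds such a tree with the "\\?\" prefix; the drive letter, names, and depth are all arbitrary:

/* sketch: build a directory tree deeper than MAX_PATH (260) using the
 * \\?\ prefix -- most tools that assume MAX_PATH then choke on it */
#include <windows.h>
#include <string.h>
#include <stdio.h>

int wmain(void)
{
    WCHAR path[1024] = L"\\\\?\\C:\\longpath";
    int i;

    if (!CreateDirectoryW(path, NULL) &&
        GetLastError() != ERROR_ALREADY_EXISTS)
        return 1;

    /* append components until we're well past MAX_PATH */
    for (i = 0; i < 64; i++) {
        wcscat(path, L"\\0123456789");
        if (!CreateDirectoryW(path, NULL) &&
            GetLastError() != ERROR_ALREADY_EXISTS)
            break;
    }
    wprintf(L"final length: %u\n", (unsigned)wcslen(path));
    return 0;
}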

double free()

So, while writing an exploit for a publicly known bug at work I stumbled across another one, a double free(). Then I started looking into how exactly one exploits a double free, and it looked grim. The program has to survive the double free, meaning that it cannot crash. The default action for glibc is to print an error message such as:

*** glibc detected *** double free or corruption (fasttop): 0x12345678 ***



and then call abort(), which terminates the program. So it looked like it was not exploitable, just a DoS, so I looked at BSD libc and the libc from Solaris; neither of them appeared to be affected by double frees either. So I went back and looked at glibc's source code to determine what exactly it checks to detect a double free. (There are a couple of different checks depending on chunk type; I'm only covering one here because it is what pertains to my situation, and the others are essentially the same with a few other easily bypassed checks.)

So, glibc has the concept of fastbins: arrays where it stores free chunks of memory under certain conditions, most importantly small chunks of at most 80 bytes in length. So I looked at the glibc code in malloc/malloc.c and found the following relevant section:

[...]
fb = &(av->fastbins[fastbin_index(size)]);
/* Another simple check: make sure the top of the bin is not the
   record we are going to add (i.e., double free).  */
if (__builtin_expect (*fb == p, 0))
  {
    errstr = "double free or corruption (fasttop)";
    goto errout;
  }




Here we have something interesting; let me explain the code first though. In the first line we take the variable 'fb' and make it point to the address of the element in the fastbin array for the size of the chunk in question. In other words, the fastbin array is indexed by chunk size, and we are assigning the fb variable the address of the entry for the size of our current chunk. Then we check whether what that address points to (*fb) is the address of the chunk currently being free'd (p); if so, we've found the chunk being free'd at the top of the list of free chunks, and we have a double free situation (or linked list corruption).

But what if? What if another chunk of the same size, and thus in the same fastbin, has been deallocated since the current chunk was free'd? For instance, what if we have:

ptr0 = malloc(siz);
ptr1 = malloc(siz);

// presume both calls succeed

free(ptr0);
free(ptr1);



Then the top entry in the bin won't be ptr0, it will be ptr1, and (assuming no other chunks in that bin have been deallocated) ptr0 will be the second entry on the linked list. The check will pass, and we will be able to free the pointer again, not have abort() called, and potentially exploit the situation to our advantage.


#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    void *ptr0;

    ptr0 = malloc((size_t)64);

    if (NULL == ptr0) {
        perror("malloc()");
        return EXIT_FAILURE;
    }

    if (1 < argc) {
        void *ptr1;

        ptr1 = malloc((size_t)64);

        if (NULL == ptr1) {
            perror("malloc()");
            return EXIT_FAILURE;
        }

        /* with any argument: interleave a free of a same-size chunk
         * so ptr0 is no longer the fastbin top -- the second free of
         * ptr0 goes undetected */
        free(ptr0);
        free(ptr1);
        free(ptr0);

    } else {

        /* no arguments: naive back-to-back double free, caught by
         * the fasttop check -> abort() */
        free(ptr0);
        free(ptr0);

    }

    return EXIT_SUCCESS;
}




I've got a bit more on the glibc heap implementation, some of which I thought myself clever for finding only to realize it's been documented in a recent paper, and the rest of which is just redundant to mention at this point.

__dso_handle && __cxa_finalize()

So, I'm working on another program, not the one dealing with authentication and such but the one related to email with the buggy signal handler. There have been some complications in getting a reliable exploit for it, so I backed up and thought maybe I could find something easier to exploit in the signal handler. I expected to perhaps be able to screw with OpenSSL in its cleanup routines, but there are none.

However, the code base is quite ugly and shows all the style of a grad student who thinks they know what they're doing, so I decided to fix it up to improve reliability (FYI: spaces = bad, tabs = good; putting as many things as you can on a single line = bad, new lines = good [they come free with the computer]).

In doing so I found another small bug that I am trying to determine if I can leverage. Basically, there is a static array of signed chars where I have limited control over the index; because it's signed I can provide a negative index, but I only have a single char for an index, so I am limited to a maximum of -127 bytes before the array.

With that, I can do nothing unless the location the index lands on has the value 0x20, so I started digging through .data (where the compiler puts the array) to see what has, or could have, a value of 0x20 within 127 bytes back, and I ran across a symbol named __dso_handle. Not being sure what it is, I dug into GCC a little bit, and here's what I found.

Basically, it's a symbol that deals with C++ destructors for static objects in shared libraries. The relevant code that uses it is in a function called __cxa_finalize(), and it looks something like the following:

void
__cxa_finalize (void *d)
{
  [...]

  if (!d)
    return;

  for (funcs = __exit_funcs; funcs; funcs = funcs->next)
    {
      [...]

      if (f->flavor == ef_cxa && d == f->func.cxa.dso_handle)
        {
          (*f->func.cxa.fn) (f->func.cxa.arg);
          [...]
        }
    }
}




The argument 'd' is the __dso_handle for the shared object. Interestingly enough, if I could modify that, then I would have the possibility of having another object's destructors called, causing any number of circumstances, most likely a double free().
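
For context on how entries get matched by dso_handle in the first place, the registration side (per the Itanium C++ ABI that GCC implements) looks roughly like this; dtor and obj below are stand-ins I made up:

#include <stdio.h>

/* normally declared by the C++ runtime; shown here for the sketch */
extern int __cxa_atexit(void (*fn)(void *), void *arg, void *dso_handle);
extern void *__dso_handle;      /* hidden per-object symbol from crtbegin */

static int obj;                 /* stand-in for a static C++ object */

static void dtor(void *arg)     /* stand-in for the object's destructor */
{
    printf("destroying %p\n", arg);
}

int main(void)
{
    /* what the compiler emits for a static object's construction:
     * register its destructor against this object's __dso_handle */
    __cxa_atexit(dtor, &obj, &__dso_handle);

    /* exit() runs all registered handlers; dlclose() instead calls
     * __cxa_finalize(&__dso_handle) to run only this object's entries */
    return 0;
}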

It's not incredibly useful in this instance, because I am dealing with a program that won't have any C++ static object destructors, but it's interesting nonetheless and something I will keep in mind for the future.

That's that, and that was today in my world. Good night.

sololis

Man, I wish I weren't so tired, as I'd like to explain this properly. This is positively the funniest thing I've come across all week. I'm starting to study the Solaris heap, not only because I want to learn it but because it's starting to look like I have a fairly large vulnerability in Solaris that, of course, deals with the heap. This is what I came across.

/* i may have missed a line here or there but atm this is what i think happens
* (aka i havent actually gotten a sun box running well enough yet to test this)
* WAY TO GO SUN!
* -jf
*/

src/lib/libc/inc/mtlib.h:

[...]
#if defined(THREAD_DEBUG)
extern void assert_no_libc_locks_held(void);
#else
#define assert_no_libc_locks_held()
#endif

src/lib/libc/port/threads/sync.c:

[...]
#pragma weak _private_mutex_lock = __mutex_lock
[...]

int
__mutex_lock(mutex_t *mp)
{
    ASSERT(!curthread->ul_critical || curthread->ul_bindflags);
    return (mutex_lock_impl(mp, NULL));
}
[...]

static int
mutex_lock_impl(mutex_t *mp, timespec_t *tsp)
{
    ulwp_t *self = curthread;
    uberdata_t *udp = self->ul_uberdata;
    uberflags_t *gflags;
    int mtype;

    /*
     * Optimize the case of USYNC_THREAD, including
     * the LOCK_RECURSIVE and LOCK_ERRORCHECK cases,
     * no error detection, no lock statistics,
     * and the process has only a single thread.
     * (Most likely a traditional single-threaded application.)
     */
    if ((((mtype = mp->mutex_type) & ~(LOCK_RECURSIVE|LOCK_ERRORCHECK)) |
        udp->uberflags.uf_all) == 0) {
        /*
         * Only one thread exists so we don't need an atomic operation.
         */
        if (mp->mutex_lockw == 0) {
            mp->mutex_lockw = LOCKSET;
            mp->mutex_owner = (uintptr_t)self;
            DTRACE_PROBE3(plockstat, mutex__acquire, mp, 0, 0);
            return (0);
        }
        if (mtype && MUTEX_OWNER(mp) == self)
            return (mutex_recursion(mp, mtype, MUTEX_LOCK));

        /*
         * We have reached a deadlock, probably because the
         * process is executing non-async-signal-safe code in
         * a signal handler and is attempting to acquire a lock
         * that it already owns.  This is not surprising, given
         * bad programming practices over the years that has
         * resulted in applications calling printf() and such
         * in their signal handlers.  Unless the user has told
         * us that the signal handlers are safe by setting:
         *     export _THREAD_ASYNC_SAFE=1
         * we return EDEADLK rather than actually deadlocking.
         */
        if (tsp == NULL &&
            MUTEX_OWNER(mp) == self && !self->ul_async_safe) {
            DTRACE_PROBE2(plockstat, mutex__error, mp, EDEADLK);
            return (EDEADLK);

[...]

src/lib/libc/port/gen/malloc.c:

[...]
void *
malloc(size_t size)
{
    void *ret;

    if (!primary_link_map) {
        errno = ENOTSUP;
        return (NULL);
    }
    assert_no_libc_locks_held();
    (void) _private_mutex_lock(&libc_malloc_lock);
    ret = _malloc_unlocked(size);
    (void) _private_mutex_unlock(&libc_malloc_lock);
    return (ret);
}
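
In case the punchline isn't obvious (and assuming I'm reading the source right, per my comment above): without THREAD_DEBUG, assert_no_libc_locks_held() compiles away to nothing, and malloc() casts the return value of _private_mutex_lock() to void. So when mutex_lock_impl() helpfully returns EDEADLK instead of deadlocking, say because a signal handler re-entered malloc(), nobody ever checks it, and _malloc_unlocked() runs right on top of the interrupted allocation. A minimal sketch of how one might try to poke at it (hypothetical and untested, again because I don't have a Sun box running yet):

/* hypothetical trigger: interrupt malloc() with a signal handler that
 * calls malloc() again; if the EDEADLK return is discarded, both
 * allocations run unlocked over the same heap state */
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static void handler(int sig)
{
    (void)sig;
    free(malloc(64));   /* re-enter the allocator mid-allocation */
}

int main(void)
{
    struct sigaction sa;
    int i;

    sa.sa_handler = handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    for (;;) {
        alarm(1);       /* raise SIGALRM while we churn the heap below */
        for (i = 0; i < 1000000; i++)
            free(malloc(64));
    }
    /* NOTREACHED */
    return 0;
}
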
While auditing this application I noticed that there was a linked list (std::list) that was accessed from multiple threads. Insertion and deletion of nodes in the list was serialized, but iterating over the list was not, and my little wheels got turning. Here is the situation, essentially, in the code below; I removed a bunch of layers of abstraction, but this is basically what it did:

// thread 0: insertion is serialized...
lock();
list_x.push_back(new_node);
unlock();


// thread 1: ...but iteration takes no lock
for (itr = list_x.begin(); itr != list_x.end(); ++itr)
    if ((*itr)->method())
        // ...

In thread zero, what specifically happens behind the scenes is that the push_back() method first allocates a new node, then hooks the new node into the list, first by modifying the list's pointers, and then by modifying the node's pointers, at which point the new node is linked in and in a stable state. In the second thread, the variable itr is assigned the first node in the list, or more specifically list_x->next. In the middle condition of the for() statement, the iterator is checked to ensure that it does not equal the end of the list, which behind the scenes is actually defined as list_x itself (the list is circular). Assuming this condition is true, the iterator is dereferenced and a member method is called.

However, if this new node is traversed by the for() loop in the second thread while push_back() is still in the process of hooking it in, it is possible that itr->next points neither to a valid node in the list nor to the node returned by end(). Thus when the iterator is advanced to itr->next, it can point to an invalid section of memory, and when the member method is then called, execution can occur in an unintended spot.

interesting thought at least

It's actually pretty neat: you can potentially cause a SIGTERM to be sent to a remote process on Linux if you can cause it to consume large amounts of memory, which in turn can cause a signal handler to be invoked, which can happen at awkward and inopportune moments. Specifically, in the thing I'm working on, the target has a function they call to deinitialize the entire process, which deallocates globally-scoped memory; it does things like tear down database connections and destroy their opaque datatypes, and other similar things, along with actual calls to free().

So, the Linux OOM killer calls a function named badness() (no joke) that scores every process's likelihood of resource abuse in order to reclaim memory for the system. It's a pretty extreme goodbye, but the OS does this when absolutely necessary to reclaim memory. The badness is calculated from various factors, including the process's capability set (specifically CAP_SYS_ADMIN and CAP_SYS_RAWIO), how long the process has been running, how many children it has, and of course how much memory it's consuming. Finally it takes a user-tunable number (/proc/<pid>/oom_adj) and left-shifts the badness score by that amount. Supposedly a value of -17 in that /proc file can cause the OOM killer to not consider the process if it's a process leader. Furthermore, processes that are in the middle of free()ing memory are not candidates.
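
A quick sketch of flipping a process to that don't-kill value (assumes a 2.6-era kernel and sufficient privileges):

/* sketch: exempt a pid from the OOM killer by writing -17
 * (OOM_DISABLE) to its oom_adj entry */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static int oom_exempt(pid_t pid)
{
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/oom_adj", (int)pid);
    if (NULL == (f = fopen(path, "w")))
        return -1;
    fprintf(f, "-17\n");
    return fclose(f);
}

int main(void)
{
    return oom_exempt(getpid()) ? 1 : 0;
}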

So now the stage is set: I have a signal handler that calls a function that is seriously non-reentrant, but I can't reach it via the traditionally applicable signals (signals that can be triggered remotely), i.e. SIGPIPE, SIGURG, et cetera. It can only be reached via things like SIGHUP, SIGTERM, and SIGINT.

I can however cause a SIGTERM to be sent indirectly.

All I need is a memory leak, and the more of them the better. I need as much precision as I can get in triggering this: if I can get the SIGTERM to land while the process is already inside that exit function, and I can use one of the pieces of code that more or less acts as a destructor to make a dangling pointer write 4 bytes into, say, the actual destructors, then when the process calls libc's exit() I may be able to get it to call an atexit function that I control. If all of these conditions are met, I land at a root shell.
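
Boiled down, the pattern I'm chasing looks something like this (every name below is made up):

/* sketch of the bug shape: SIGTERM runs a "deinitialize everything"
 * routine that free()s globals; a signal landing while the process is
 * already mid-teardown free()s dangling pointers a second time */
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static void *db_conn;           /* stand-in for a global opaque handle */

static void shutdown_everything(void)
{
    free(db_conn);              /* non-reentrant: no lock, no NULL-out... */
    /* db_conn = NULL;             ...which would at least be idempotent */
}

static void on_term(int sig)
{
    (void)sig;
    shutdown_everything();      /* async-signal-unsafe: calls free() */
    _exit(1);
}

int main(void)
{
    db_conn = malloc(256);
    signal(SIGTERM, on_term);

    /* the normal exit path also calls shutdown_everything(); a SIGTERM
     * arriving mid-teardown yields the double free() */
    shutdown_everything();
    return 0;
}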

03 April 2007

Apple Computers

Today I interviewed with Apple for a security researcher position. It's a position that has been open since something like August of 2005, and now I know why. Before I detail what occurred in the interview, let me elaborate on my motivations for being interested in a position with Apple.

First let me say, I am not a huge Apple fan; honestly the commercials drive me up a wall because they're largely filled with disinformation. I am not part of the 'cult of Mac', but I am generally not someone who hates Apple either. However, there is one thing that I am fairly certain of: they have the most insecure mainstream operating system currently on the market, and they're about five years behind the curve in regards to security. I'm sure plenty of people will turn their nose up at that statement, but let me explain. If we look at it from a coder's perspective, it's NeXT dressed up as Unix for Halloween, except it lacks the evolution that has occurred throughout the Unix world, and the evolution that has made its way into Redmond as well. On my PPC PowerBook, the stack, heap, .data and .bss are all executable; my understanding is that on Intel Macs this is not the case, however they're lacking any form of ASLR, which makes the non-executable stack et cetera more or less useless. Even more, the gcc on my Apple CD is missing stack smashing protection (SSP, formerly ProPolice), which is something that comes in GCC by default, which means they had to rip the extra security out of the compiler. A lot of people are mistakenly under the impression that SSP is only stack cookies, which is something it does implement, but it also does other rather unique things that make it a really great feature; namely, it will reorder variables to minimize the damage of potential overflows.

So we have a relatively soft target platform. We also have an incredible monoculture never dreamed of in the Windows world: not only is everyone running the same OS, but they're largely running the same hardware, or rather one of a few different configurations. This means you have only 3 or 4 targets, whether for userland or kernel exploitation, which means that by and large any exploit found will work fairly reliably. Think about that for a moment in the context of a worm.

But, my dear Mac friend, you may say, 'oh, but the firewall!', and as previously pointed out by Jay Bealle, the firewall is useless as implemented. To bypass the TCP filter, you just need to fragment your packets, because it will accept any fragment; to bypass the UDP filter, you just need a source port of 53 or 67, because it allows anything with those source ports through.

So we have a soft platform, with a faulty firewall, and then one of the biggest dangers to OSX: its user base. How many times have you seen a random OSX user make some comment about not needing anti-virus, or really having any true concerns in regards to security? After all, everyone is targeting Windows, right? That is a very dangerous perspective that one of these days will eat the OSX user base alive, or so one can hope anyways.

So we have an uneducated user base, a soft platform, a faulty firewall, and what else? One of the things that came out of the Month of Apple Bugs that I found interesting was the format string bugs in the AppKit framework. Why? Because these are APIs written in Cupertino and used by the same people in Cupertino, and they misused them horribly, leading me to believe that they simply don't understand the dangers. For those wondering, I'm specifically referring to the following functions:

* NSBeginAlertSheet
* NSBeginCriticalAlertSheet
* NSBeginInformationalAlertSheet
* NSGetAlertPanel
* NSGetCriticalAlertPanel
* NSGetInformationalPanel
* NSReleaseAlertPanel
* NSRunAlertPanel
* NSRunCriticalAlertPanel
* NSRunInformationalAlertPanel
* NSLog

Let me make sure my point is crystal clear here: Apple defined these APIs, then they misused their own APIs in a very elementary way that makes me question the basic comprehension of security by their developers, QA team, and security engineers/researchers. Furthermore, while I am on the subject of the Month of Apple Bugs (which, to be bluntly honest, left me mostly unimpressed), these were bugs that were largely found, as I understand it, by fuzzing: feeding random data in a semi-intelligent manner to the applications.

So we have a faulty firewall, a soft platform, ignorant users, developers that don't truly understand what they're doing, and an extreme monoculture; what we really have is a recipe for disaster.

So when I looked at Apple, I saw a company in distress; I saw a company whose future intersects with a major security incident, and, naively, I wanted to help. The Apple team is running towards the cliff on a date with destiny, and that's not a threat, that's a prophecy. It's going to happen, and it wasn't until maybe 20-30 minutes after my interview that I realized why nothing is being done to prevent it: they believe their own marketing hype. I'll get back to this soon enough though.

So my interview started around 1330, and I was greeted by a gentleman who told me basically what they do and then proceeded to tell me that their phone was 'half-duplex' and that they could either talk or listen, but not both at the same time. I find this incredibly hard to believe, but I opted not to comment on being treated like an idiot and having the point dumbed down for me: if I give a long answer, I should pause every so often. Thanks, I almost forgot how to talk to people, but with your guidance I'm sure I'll do great!

The interview started with pretty simple questions: they were going to name system calls, and I was going to tell them what each one does and any security implications it might have. The first one was fork(). Okay, well, it, um, forks a process, or creates a child process; the security implications would largely be the inheritance of file descriptors, and that a fault in either one of the processes wouldn't affect the other, as opposed to threads. They wanted more, but I couldn't think of anything, and I still can't. The next was execve(), which we talked about some, mostly about failure to sanitize possible user input and relative paths, et cetera. Then I was asked how a process would drop its privileges, and I mentioned setuid()/setgid() and we discussed that some. Then I was asked how I would drop privileges if I wanted to regain them later, and I commented that it probably wasn't the answer they were directly looking for, but what I would do is keep a parent management process that listened on a Unix socket to perform privilege separation, and if a child wanted to restore its privileges it would make a request to the parent to create a new privileged child and pass on the file descriptors.

Here's what shocked me: I was told I was wrong (!!). They were obviously looking for something about saved/effective UIDs and such, but that's pointless, because you might as well never have dropped your privileges; or to say it another way, if you can regain your privileges, you never dropped them.
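
For the record, the irrevocable drop looks roughly like this (a sketch; assumes the process starts as root, and the ordering matters):

/* sketch of an irrevocable privilege drop: groups first, then gid,
 * then uid, then verify -- once this succeeds there is no way back */
#include <grp.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void drop_privs(uid_t uid, gid_t gid)
{
    if (setgroups(1, &gid) != 0 ||  /* shed supplementary groups */
        setgid(gid) != 0 ||         /* gid before uid, while still root */
        setuid(uid) != 0) {         /* sets real, effective and saved uid */
        perror("drop_privs");
        exit(EXIT_FAILURE);
    }
    /* paranoia: confirm we cannot get root back */
    if (setuid(0) == 0) {
        fprintf(stderr, "privileges not fully dropped\n");
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    drop_privs(65534, 65534);       /* e.g. nobody/nogroup */
    return 0;
}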

I was then asked what I would do if I had this *large* code base of like 30,000 lines of code and like two days to audit it. Firstly, let me comment that 30,000 lines is not large, it's fairly common, but whatever. I said that I'd look for usage of functions known to traditionally have problems, and look at the core internal API to see if it can be misused. They asked me how I would do that, kind of making it sound like I was oversimplifying the situation, and I commented that it typically isn't that hard: grep, or look at the filenames, as there are typically things like alloc.c or similar, then work backwards from there. This again was met with resistance, and I said that it isn't like this is hypothetical, this is what I do for a living. Then they said something like 'While I can appreciate that you supposedly do this for a living, I was asking for a real answer', at which point I had had enough, told them that I didn't like their fucking attitude, that this type of arrogance is why Apple is in the midst of such a security nightmare, and that I wasn't interested any longer in a position with them, and then proceeded to hang up.

Then about five minutes later I got a call back from the recruiter, asking what had happened, and she started to laugh when she repeated back my line about the security nightmare, and I realized what the problem really is: they're mistaking beginner's luck for skill. When I say beginner's luck, I don't mean beginner to the computer industry, but rather beginner to the 'real' industry; up until OSX they were a tinker-toy OS that was largely disregarded, then they stepped up and put themselves on the same level as Microsoft and Unix, and they seem to think that just because no one has ripped them apart means that no one can. Even worse, they don't seem to recognize that their arrogance makes enemies, and generally speaking they're not the types of enemies that a software vendor wants.

So fuck it, here's to you Apple, and to your future insecurity; I wanted to help you, and instead you just ended up with another person interested in auditing your software.