04 November 2007

update notes

I updated the blog, obviously. I did it with older updates of mine from another blog, in an effort to bring some of my thoughts into the public realm, so the chronology of the next few posts is out of order. Also, I have a bunch more on double free()'s to post, I just haven't gotten it into a web presentation format yet.

library randomization

So it seems to me, and perhaps I'm just not thinking right because I haven't worked it out on paper, but it seems like because Apple says that:


Mach-O position-independent code design is based on the observation that the __DATA segment is always located at a constant offset from the __TEXT segment. That is, the dynamic loader, when loading any Mach-O file, never moves a file’s __TEXT segment relative to its __DATA segment. Therefore, a function can use its own current address plus a fixed offset to determine the location of the data it wishes to access. All segments of a Mach-O file, not only the __TEXT and __DATA segments, are at fixed offsets relative to the other segments.

Note: If you are familiar with the Executable and Linking Format (ELF), you may note that Mach-O position-independent code is similar to the GOT (global offset table) scheme. The primary difference is that Mach-O code references data using a direct offset, while ELF indirects all data access through the global offset table.


Now, I haven't actually spent any time digging through Leopard, aside from a simple test of compiling a test program with the gcc on there (v4.0.1) to see if it had SSP (it had never heard of the flags -fstack-protector or -fstack-protector-all?), but it seems like if Apple's PIC binaries retain this trait you end up with a couple of complications.

The first and most obvious is that you can't randomize per segment, although in theory I believe you should be able to randomize the stack and heap, though that may cause problems in one of those funko languages like Obj-C. The second problem is that because the text is not randomized, all I need to know is the base address of the image, which seems pretty obtainable considering all of the segments that are not randomized. From there I can look for variable references that exploit the fact that segment offsets are constant, and it would seem I could reverse the address space layout that way.

I mean, examining the .text of anything dealing with libraries, such as dyld, should reveal a lot of those references. It seems pretty much like overkill at this point in the game, because all you really need is a jmp/call addr/reg, but honestly this seems like a deep flaw in the ASLR logic; maybe I just need more sleep?

MAX_PATH and the secret life of backslash-backslash-questionmark-backslash

So in Windows, specifically in the Shell API, there is the concept of MAX_PATH, which obviously is the maximum path length, isn't it? Well no, actually, it isn't ;]

If you prepend the string "\\?\" to a path and call the Unicode version of the function, you can access files and directories with names up to something like 32,000 wide characters.

This can in turn lead to incorrect file access (which can result in a lot of problems), or, for a host of API calls that take an output buffer and an output buffer size parameter, a 'buffer is too small' return value that is larger than the original output buffer size.

That is to say you should be calling the APIs in the following way:

DWORD siz = len;
DWORD retval = SomethingWithALongNameW(..., siz);

if (0 == retval)
    // error
if (retval > siz)
    // resize buffer or error


But I'm seeing that in a lot of cases, that's not how people are calling it; they're instead calling it closer to this:

if (0 != SomethingWithALongNameW(..., siz)) {
    // !error



When you do that, you end up with a condition where the buffer being passed in as an output isn't initialized to any value, the return value is not properly checked, and there is no way to truly know whether the buffer was initialized or not.

The second half of the problem comes from the fact that a lot of those same API calls will truncate @ MAX_PATH, potentially leading to conditions where files are accessed incorrectly. Think of this in the context of signing or verifying the signature of a file, where the path that gets truncated is not the one that gets employed, or where the return value is, there also, improperly checked.

Seriously, try this out: open Visual Studio and create a directory structure that's like C:\0\1\2\3\4\5\6, et cetera, longer than MAX_PATH, and then try to access it via, say, cmd.exe or even explorer.exe. Try to delete it using either of those (well, not cmd.exe, but you'll see why there). (ed note to the lazy reader: everything breaks)

double free()

So, while writing an exploit for a publicly known bug at work I stumbled across another one, a double free(). Then I started looking into how exactly one exploits a double free, and it looked grim. The program has to survive the double free, meaning that it cannot crash. The default action for glibc is to print an error message such as:

*** glibc detected *** double free or corruption (fasttop): 0x12345678 ***



and then call abort(), which terminates the program. So it looked like it was not exploitable, just a DoS, so I looked at BSD libc and the libc from Solaris; neither of them appeared to be affected by double frees either. So I went back and looked in glibc's source code to determine what exactly it checks to detect a double free. (There are a couple of different checks depending on chunk type; I'm only covering one here because it is what pertains to my situation, and the others are essentially the same with a few other easily bypassed checks.)

So, glibc has the concept of fast bins, arrays where it stores free chunks of memory under certain conditions, most importantly chunks that are less than 512 bytes in length. So I looked at the glibc code in malloc/malloc.c and found the following relevant section:

[...]
fb = &(av->fastbins[fastbin_index(size)]);
/* Another simple check: make sure the top of the bin is not the
   record we are going to add (i.e., double free). */
if (__builtin_expect (*fb == p, 0))
  {
    errstr = "double free or corruption (fasttop)";
    goto errout;
  }




Here we have something interesting; let me explain the code first, though. In the first line we take the variable 'fb' and make it point to the address of the element in the fast bin array for the size of the chunk in question. In other words, the fast bin array is indexed by chunk size, and we are assigning the fb variable the address of the entry for the size of our current chunk. Then we check whether what that address points to (*fb) is the address of the chunk currently being free'd (p); if so, we've found the chunk being free'd at the top of the list of free chunks, and we have a double free situation (or linked list corruption).

But what if? What if another chunk of the same size, and thus in the same fast bin, has been deallocated since the current chunk was free'd? For instance, what if we have:

ptr0 = malloc(siz);
ptr1 = malloc(siz);

// presume both calls succeed

free(ptr0);
free(ptr1);



Then the top entry in the bin won't be ptr0, it will be ptr1, and (assuming no other chunks in that bin have been deallocated) ptr0 will be the second entry in the linked list. The check will pass, and we will be able to free the pointer again, not have abort() called, and potentially exploit the situation to our advantage.


#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    void *ptr0;

    ptr0 = malloc((size_t)64);

    if (NULL == ptr0) {
        perror("malloc()");
        return EXIT_FAILURE;
    }

    if (1 < argc) {
        void *ptr1;

        ptr1 = malloc((size_t)64);

        if (NULL == ptr1) {
            perror("malloc()");
            return EXIT_FAILURE;
        }

        free(ptr0);
        free(ptr1);
        free(ptr0);
    } else {
        free(ptr0);
        free(ptr0);
    }

    return EXIT_SUCCESS;
}




I've got a bit more on the glibc heap implementation, some of which I thought myself clever for finding only to realize it's been documented in a recent paper, and some of which is just redundant to mention at this point.

__dso_handle && __cxa_finalize()

So, I'm working on another program (not the one dealing with authentication and such, but the one related to email with the buggy signal handler). There have been some complications in getting a reliable exploit for it, so I backed up and thought maybe I could find something easier to exploit in the signal handler; I expected to perhaps be able to screw with OpenSSL in its cleanup routines, but there are none.

However, the code base is quite ugly and shows all the style of a grad student who thinks they know what they're doing, so I decided to fix it to improve reliability (FYI: spaces = bad, tabs = good; putting as many things as you can on a single line = bad, new lines = good [they come free with the computer]).

In doing so I found another small bug that I am trying to determine if I can leverage. Basically, there is a static array of signed chars, and I have limited control over the index; because it's signed I can provide a negative index, but I only have a single char for an index, so I am limited to at most -127 bytes before the array.

With that, I can do nothing if the location the index ends up at doesn't have the value 0x20, so I started digging through .data (where the compiler puts the array) to see what has, or could have, a value of 0x20 within 127 bytes back, and I ran across a symbol named __dso_handle. Not sure what it was, I dug into GCC a little bit, and here's what I found.

Basically, it's a symbol that deals with C++ destructors for static objects in shared libraries. The relevant code that uses it is in a function called __cxa_finalize() and looks something like the following:

void
__cxa_finalize (void *d)
{
  [...]

  if (!d)
    return;

  for (funcs = __exit_funcs; funcs; funcs = funcs->next)
    {
      [...]

      if (f->flavor == ef_cxa && d == f->func.cxa.dso_handle)
        {
          (*f->func.cxa.fn) (f->func.cxa.arg);
          [...]
        }
    }
}




The argument 'd' is the __dso_handle for the shared object. Interestingly enough, if I could modify that, then I would have the possibility of having another object's destructors called, causing any number of circumstances, most likely a double free().

It's not incredibly useful in this instance because I am dealing with a program that won't have any C++ static object destructors, but it's interesting nonetheless and something I will keep in mind for the future.

That's that, and that was today in my world. Good night.

sololis

Man, I wish I weren't so tired, as I'd like to explain this; it's positively the funniest thing I've come across all week. I'm starting to study the Solaris heap, not only because I want to learn it but because it's starting to look like I have a fairly large vulnerability in Solaris that, of course, deals with the heap. This is what I came across.

/* i may have missed a line here or there but atm this is what i think happens
* (aka i havent actually gotten a sun box running well enough yet to test this)
* WAY TO GO SUN!
* -jf
*/

src/lib/libc/inc/mtlib.h:

[...]
#if defined(THREAD_DEBUG)
extern void assert_no_libc_locks_held(void);

#else
#define assert_no_libc_locks_held()
#endif

src/lib/libc/port/threads/sync.c:

[...]
#pragma weak _private_mutex_lock = __mutex_lock
[...]
int
__mutex_lock(mutex_t *mp)
{
	ASSERT(!curthread->ul_critical || curthread->ul_bindflags);
	return (mutex_lock_impl(mp, NULL));
}
[...]

static int
mutex_lock_impl(mutex_t *mp, timespec_t *tsp)
{
	ulwp_t *self = curthread;
	uberdata_t *udp = self->ul_uberdata;
	uberflags_t *gflags;
	int mtype;

	/*
	 * Optimize the case of USYNC_THREAD, including
	 * the LOCK_RECURSIVE and LOCK_ERRORCHECK cases,
	 * no error detection, no lock statistics,
	 * and the process has only a single thread.
	 * (Most likely a traditional single-threaded application.)
	 */
	if ((((mtype = mp->mutex_type) & ~(LOCK_RECURSIVE|LOCK_ERRORCHECK)) |
	    udp->uberflags.uf_all) == 0) {
		/*
		 * Only one thread exists so we don't need an atomic operation.
		 */
		if (mp->mutex_lockw == 0) {
			mp->mutex_lockw = LOCKSET;
			mp->mutex_owner = (uintptr_t)self;
			DTRACE_PROBE3(plockstat, mutex__acquire, mp, 0, 0);
			return (0);
		}
		if (mtype && MUTEX_OWNER(mp) == self)
			return (mutex_recursion(mp, mtype, MUTEX_LOCK));

		/*
		 * We have reached a deadlock, probably because the
		 * process is executing non-async-signal-safe code in
		 * a signal handler and is attempting to acquire a lock
		 * that it already owns. This is not surprising, given
		 * bad programming practices over the years that has
		 * resulted in applications calling printf() and such
		 * in their signal handlers. Unless the user has told
		 * us that the signal handlers are safe by setting:
		 *	export _THREAD_ASYNC_SAFE=1
		 * we return EDEADLK rather than actually deadlocking.
		 */
		if (tsp == NULL &&
		    MUTEX_OWNER(mp) == self && !self->ul_async_safe) {
			DTRACE_PROBE2(plockstat, mutex__error, mp, EDEADLK);
			return (EDEADLK);

[...]

src/lib/libc/port/gen/malloc.c:

[...]
void *
malloc(size_t size)
{
	void *ret;

	if (!primary_link_map) {
		errno = ENOTSUP;
		return (NULL);
	}
	assert_no_libc_locks_held();
	(void) _private_mutex_lock(&libc_malloc_lock);
	ret = _malloc_unlocked(size);
	(void) _private_mutex_unlock(&libc_malloc_lock);
	return (ret);
}
If I'm reading this right, the punchline is that malloc() casts the return value of _private_mutex_lock() away to void, so when the deadlock check in mutex_lock_impl() returns EDEADLK (say, because malloc() was re-entered from a signal handler), malloc() just carries on into _malloc_unlocked() without actually holding the lock.

While auditing this application I noticed that there was a linked list (std::list) that was accessed from multiple threads. Insertion and deletion of nodes in the list was serialized; iterating over the list, however, was not, and my little wheels got turning. Here is the situation, essentially, in code below; I removed a bunch of layers of abstraction, but this is basically what it did:

// thread 0
lock();
list_x.push_back(new_node);
unlock();

// thread 1
for (itr = list_x.begin(); itr != list_x.end(); itr++)
    if ((*itr)->method())
        // ...

In thread zero, what specifically happens behind the scenes is that the push_back() method first allocates a new node, then hooks the new node into the list, first by modifying the list's pointers and then by modifying the node's pointers, at which point the new node is linked in and in a stable state. In the second thread, the variable itr is assigned the first node in the list, or more specifically list_x->next. In the middle condition of the for() statement, the iterator is checked to ensure that it does not equal the end of the list, which behind the scenes is actually defined as list_x itself (the list is circular). Assuming this condition is true, the iterator is dereferenced and a member method is called.

However, if the new node is traversed by the for() loop in the second thread while push_back() is in the middle of hooking it in, it is possible that itr->next points neither to a valid node in the list nor to the node returned by end(). Thus when the iterator is advanced to itr->next, it can point to an invalid section of memory, and then when the member method is called, execution can occur in an unintended spot.

interesting thought at least

It's actually pretty neat: you can potentially cause a SIGTERM to be sent to a remote process on Linux if you can cause it to consume large amounts of memory, which in turn can cause a signal handler to be invoked, which can happen at awkward and inopportune moments. Specifically, the thing I'm working on has a function it calls to deinitialize the entire process, which deallocates globally-scoped memory: it tears down database connections and destroys their opaque datatypes, and does other similar things, along with actual calls to free().

So, the Linux OOM killer calls a function named badness() (no joke) that scores every process's likelihood of resource abuse in order to reclaim memory for the system. It's a pretty extreme goodbye, but the OS does this when absolutely necessary to reclaim memory. The badness is calculated from various factors, including the process's capability set (specifically CAP_SYS_ADMIN and CAP_SYS_RAWIO), how long the process has been running, how many children it has, and of course how much memory it's consuming. Finally it takes a user-tunable number and shifts the badness score by that many bits. Supposedly a value of -17 in that /proc file causes the OOM killer to not consider that process if it's a process leader. Furthermore, processes that are in the process of free()'ing memory are not candidates.

So now the stage is set: I have a signal handler that calls a function that is seriously non-reentrant, but I can't reach it via traditionally applicable signals (signals that can be triggered remotely), i.e. SIGPIPE, SIGURG, et cetera. It can be reached via things like SIGHUP, SIGTERM, SIGINT.

I can however cause a SIGTERM to be sent indirectly.

All I need is a memory leak (the more of them the better), and as much precision as I can get in triggering it. If I can get the SIGTERM to arrive while this process is already calling that exit function, and I can use one of the pieces of code that more or less acts as a destructor to take a dangling pointer and write 4 bytes to, say, the actual destructors, then when the process calls libc's exit() I may be able to cause it to call an atexit function that I hopefully control. If all of these conditions are met, I land at a root shell.