Here's an alternate approach to the 'round up' part. Not the most brilliantly coded solution but it gets the job done, and this type of syntax is a bit easier to remember (plus would work for alignment values that aren't a power of 2). The uintptr_t
cast was necessary to appease the compiler; pointer arithmetic isn't very fond of division or multiplication.
void *mem = malloc(1024 + 15);
void *ptr = (void*) ((uintptr_t) mem + 15) / 16 * 16;
memset_16aligned(ptr, 0, 1024);
free(mem);