Quantcast
Channel: How to allocate aligned memory only using the standard library? - Stack Overflow
Viewing all articles
Browse latest Browse all 18

Answer by Ian Ollmann for How to allocate aligned memory only using the standard library?

$
0
0

We do this sort of thing all the time for Accelerate.framework, a heavily vectorized OS X / iOS library, where we have to pay attention to alignment all the time. There are quite a few options, one or two of which I didn't see mentioned above.

The fastest method for a small array like this is just stick it on the stack. With GCC / clang:

 void my_func( void )
 {
     uint8_t array[1024] __attribute__ ((aligned(16)));
     ...
 }

No free() required. This is typically two instructions: subtract 1024 from the stack pointer, then AND the stack pointer with -alignment. Presumably the requester needed the data on the heap because its lifespan of the array exceeded the stack or recursion is at work or stack space is at a serious premium.

On OS X / iOS all calls to malloc/calloc/etc. are always 16 byte aligned. If you needed 32 byte aligned for AVX, for example, then you can use posix_memalign:

void *buf = NULL;
int err = posix_memalign( &buf, 32 /*alignment*/, 1024 /*size*/);
if( err )
   RunInCirclesWaivingArmsWildly();
...
free(buf);

Some folks have mentioned the C++ interface that works similarly.

It should not be forgotten that pages are aligned to large powers of two, so page-aligned buffers are also 16 byte aligned. Thus, mmap() and valloc() and other similar interfaces are also options. mmap() has the advantage that the buffer can be allocated preinitialized with something non-zero in it, if you want. Since these have page aligned size, you will not get the minimum allocation from these, and it will likely be subject to a VM fault the first time you touch it.

Cheesy: Turn on guard malloc or similar. Buffers that are n*16 bytes in size such as this one will be n*16 bytes aligned, because VM is used to catch overruns and its boundaries are at page boundaries.

Some Accelerate.framework functions take in a user supplied temp buffer to use as scratch space. Here we have to assume that the buffer passed to us is wildly misaligned and the user is actively trying to make our life hard out of spite. (Our test cases stick a guard page right before and after the temp buffer to underline the spite.) Here, we return the minimum size we need to guarantee a 16-byte aligned segment somewhere in it, and then manually align the buffer afterward. This size is desired_size + alignment - 1. So, In this case that is 1024 + 16 - 1 = 1039 bytes. Then align as so:

#include <stdint.h>
void My_func( uint8_t *tempBuf, ... )
{
    uint8_t *alignedBuf = (uint8_t*) 
                          (((uintptr_t) tempBuf + ((uintptr_t)alignment-1)) 
                                        & -((uintptr_t) alignment));
    ...
}

Adding alignment-1 will move the pointer past the first aligned address and then ANDing with -alignment (e.g. 0xfff...ff0 for alignment=16) brings it back to the aligned address.

As described by other posts, on other operating systems without 16-byte alignment guarantees, you can call malloc with the larger size, set aside the pointer for free() later, then align as described immediately above and use the aligned pointer, much as described for our temp buffer case.

As for aligned_memset, this is rather silly. You only have to loop in up to 15 bytes to reach an aligned address, and then proceed with aligned stores after that with some possible cleanup code at the end. You can even do the cleanup bits in vector code, either as unaligned stores that overlap the aligned region (providing the length is at least the length of a vector) or using something like movmaskdqu. Someone is just being lazy. However, it is probably a reasonable interview question if the interviewer wants to know whether you are comfortable with stdint.h, bitwise operators and memory fundamentals, so the contrived example can be forgiven.


Viewing all articles
Browse latest Browse all 18

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>