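For the solution I used padding: allocate a few extra bytes, then round the pointer up to the next 16-byte boundary, so at most 16 bytes are wasted.
If the constraint is that you cannot waste a single byte, note that on many 64-bit platforms the pointers returned by malloc are already 16-byte aligned, although the C standard only guarantees alignment suitable for any fundamental type (alignof(max_align_t)).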
If C11 is supported, you can just call aligned_alloc(16, size); C11 requires size to be a multiple of the alignment.
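/* needs <stdint.h> for uintptr_t and <stdlib.h> for malloc/free */
void *mem = malloc(1024 + 16);   /* 16 spare bytes leave room to round up */
/* & is not defined on pointers, so mask the address as a uintptr_t */
void *ptr = (void *)(((uintptr_t)mem + 16) & ~(uintptr_t)0x0F);
memset_16aligned(ptr, 0, 1024);
free(mem);                       /* free the original pointer, not ptr */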
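
For the C11 route, here is a minimal sketch of how aligned_alloc could be used instead of padding. The helper names is_aligned_16, alloc_zeroed_16 and demo are made up for illustration, and plain memset stands in for memset_16aligned, which is not a standard function. It assumes a C11 library that actually provides aligned_alloc (some toolchains do not).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if p sits on a 16-byte boundary. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 0x0F) == 0;
}

/* Hypothetical helper: 16-byte-aligned, zero-filled block.
 * The size is rounded up to a multiple of 16 because C11 requires
 * aligned_alloc's size to be a multiple of the alignment. */
static void *alloc_zeroed_16(size_t size)
{
    size_t rounded = (size + 15) & ~(size_t)15;
    void *p = aligned_alloc(16, rounded);
    if (p != NULL)
        memset(p, 0, rounded);
    return p;
}

/* Usage sketch: returns 1 on success, 0 if the allocation failed. */
static int demo(void)
{
    void *p = alloc_zeroed_16(1024);
    if (p == NULL)
        return 0;
    assert(is_aligned_16(p));
    free(p);   /* memory from aligned_alloc is released with plain free */
    return 1;
}
```

Unlike the padding version, there is no second pointer to keep track of here: the pointer you use is the same one you pass to free.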