Is there any advantage to defining an array's size as a multiple of 8 on a 64-bit UNIX OS? I intend to use this array for loading data from shared memory, so there may be dependencies on the operating system and the page size.
Doesn't matter. Your compiler knows whether or not it wants padding there, so let it decide. Don't muddy up your code because of guesswork.
Get your program working first, then care about performance with a profiler.
Assuming you're dynamically allocating the array on the heap, it's fair to assume that malloc's internal allocation algorithm will be doing some abstraction away from actual memory requests to the kernel. That is to say, there may or may not be a direct relationship between your malloc() call and libc's brk() (or mmap()) system call.
The malloc man page has some more on this.
So in terms of memory usage I would tend to suggest that it won't really matter whether or not you allocate in multiples of 8 bytes since malloc will likely be doing something different (and sensible) beneath you.
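To see that malloc does its own rounding beneath you, here is a minimal sketch that asks the allocator how many bytes it really made usable for a request. It assumes glibc (or another libc that provides the non-standard `malloc_usable_size()` in `<malloc.h>`); the helper name `usable_bytes_for` is made up for illustration.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_usable_size(): a glibc extension, not standard C */
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns the number of bytes the allocator made
 * usable for a request of n bytes, or 0 if the allocation failed. */
static size_t usable_bytes_for(size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        return 0;

    size_t usable = malloc_usable_size(p);
    assert(usable >= n);   /* the block is never smaller than requested */

    free(p);
    return usable;
}
```

Calling `usable_bytes_for(1000)` and `usable_bytes_for(1003)` will typically return values that are already rounded up by the allocator, which is why hand-rounding the request to a multiple of 8 rarely changes anything. The exact numbers depend on the allocator and its version.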
In terms of program performance, the allocation of your data structures in memory can have a huge impact on cache performance. Ultimately, though, you will need to profile your application to see whether you could improve its cache performance. I don't believe there is a hard and fast rule which will let you optimise for this as you write your code.
If you're interested in learning more about memory and Linux, Ulrich Drepper wrote a fantastic series for LWN on the subject a few years ago:
If the question is about memory access alignment: how dynamic allocations are aligned is an internal matter of the environment and libc. An array is not guaranteed to be aligned in any specific way just because its size is aligned. Many allocators return memory blocks aligned to some fixed value (typically 2 or 4 times the machine word size), so the array size is not the place to worry about alignment.
I remember only a few things that may be significant:
You may want to use vector operations and/or unrolled loops to process the array, so some padding may be necessary to keep the program from running past the allocated area. (But if your vector unit requires stricter alignment than the standard C implementation provides, you have to allocate the memory some other way than a plain malloc() anyway.)
Most memory allocators store service information (e.g. the allocated block size) beside the allocated area, so the total amount of memory cut from the free area is slightly larger than what you requested. So it may be best to allocate an area slightly smaller than some round value, so that areas pack densely into a standard allocation block (say, a memory page). For example, if the CPU has 4k pages, a page can hold only 3 blocks of 1024 bytes, but 4 blocks of 1008 bytes (= 1024 - 16).
Also, many memory allocators have a block size threshold: below it, memory is allocated from the heap, but above it, memory is obtained directly from the OS virtual memory manager in whole hardware pages and is thus aligned on a page boundary. In this case it may be worth rounding the allocation size up to the page size to use whole pages.
There may be some other issues, but I don't remember them.
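If you want to check this on your own system, a small sketch like the following tests the alignment of what malloc returns. It assumes a C11 compiler (for `max_align_t` and `alignof`); the helper names `is_aligned` and `malloc_is_max_aligned` are made up for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns nonzero if p is aligned to `alignment` bytes.
 * `alignment` must be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Allocates n bytes with malloc and reports whether the block is aligned
 * for max_align_t. Returns 1 or 0, or -1 if the allocation failed. */
static int malloc_is_max_aligned(size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        return -1;

    int ok = is_aligned(p, alignof(max_align_t));
    free(p);
    return ok;
}
```

On a typical 64-bit system, `malloc_is_max_aligned(1000)` and `malloc_is_max_aligned(1003)` should give the same answer, which shows that the alignment comes from the allocator and not from whether the requested size is a multiple of 8.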
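The three points above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: a POSIX system (for `sysconf()` and `posix_memalign()`), and a per-block bookkeeping overhead that you pass in yourself, since the real overhead depends on the allocator. The names `round_up`, `blocks_per_page` and `alloc_whole_pages` are made up.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#endif

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Rounds n up to the next multiple of `multiple` (which must be > 0).
 * Useful for padding an array to a vector width or to a page size. */
static size_t round_up(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return (n + multiple - 1) / multiple * multiple;
}

/* How many blocks of block_size bytes fit into one page if every block
 * also carries `header` bytes of allocator bookkeeping (an assumed value). */
static size_t blocks_per_page(size_t page, size_t block_size, size_t header)
{
    return page / (block_size + header);
}

/* Allocates at least n bytes, aligned on a page boundary and rounded up
 * to whole pages. Stores the real size in *actual if it is not NULL.
 * Returns NULL on failure; release the block with free(). */
static void *alloc_whole_pages(size_t n, size_t *actual)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    size_t size = round_up(n, (size_t)page);
    void *p = NULL;
    if (posix_memalign(&p, (size_t)page, size) != 0)
        return NULL;

    if (actual != NULL)
        *actual = size;
    return p;
}
```

With a 4096-byte page and an assumed 16-byte header, `blocks_per_page(4096, 1024, 16)` gives 3 while `blocks_per_page(4096, 1008, 16)` gives 4. If your vector code needs stricter alignment than malloc provides, `posix_memalign()` with that alignment, plus `round_up()` on the size for padding, is one way to get it.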