Greetings,
I am developing a program to study the dynamics of a system of particles in a cubic box with periodic boundary conditions. I would like to take advantage of auto-vectorization to speed up the computations. To that end, I have created a data structure that stores the position, orientation, force, and torque vectors (among other quantities of interest).
If I understood the documentation on auto-vectorization correctly, I can ask the compiler to align the data structure (a derived type) to a 32-byte boundary and use the sequence attribute to pack its components without padding. I want 32-byte alignment because the program targets AVX processors. The problem is that none of the components ends up aligned, even though the data structure itself is requested to be aligned to a 32-byte boundary.
To ease troubleshooting, I am posting a minimal version of the code here (with comments for clarity):
module dynamics
    ! Description:
    ! Defines the principal data structure that stores information about the particles
    ! in the system. To take advantage of auto-vectorization the data is organized
    ! as a Structure of Arrays (SoA).
    use, intrinsic :: iso_fortran_env
    implicit none

    type data
        sequence
        ! position vector
        real(kind = real64), allocatable :: r_x(:);
        real(kind = real64), allocatable :: r_y(:);
        real(kind = real64), allocatable :: r_z(:);
        ! orientation vector (director)
        real(kind = real64), allocatable :: d_x(:);
        real(kind = real64), allocatable :: d_y(:);
        real(kind = real64), allocatable :: d_z(:);
        ! force vector
        real(kind = real64), allocatable :: F_x(:);
        real(kind = real64), allocatable :: F_y(:);
        real(kind = real64), allocatable :: F_z(:);
        ! torque vector
        real(kind = real64), allocatable :: T_x(:);
        real(kind = real64), allocatable :: T_y(:);
        real(kind = real64), allocatable :: T_z(:);
        ! displacement vector (the prefix `d' stands for delta or difference)
        real(kind = real64), allocatable :: dr_x(:);
        real(kind = real64), allocatable :: dr_y(:);
        real(kind = real64), allocatable :: dr_z(:);
        ! particle ID array
        integer(kind = int32), allocatable :: ID(:);
        ! padding array, such that we have an equivalent of 16 arrays of 64 bits each
        integer(kind = int32), allocatable :: padding(:);
    end type data

    private
    public data
end module dynamics

program alignment_test
    ! Minimalistic program to test the alignment of derived-types.
    use, intrinsic :: iso_fortran_env
    use dynamics
    implicit none

    ! particle data structure, aligned to 32 bytes
    type(data), target :: pdata;
    !dir$ attributes align: 32 :: pdata
    ! number of particles in the system
    integer(kind = int32), parameter :: n_pdata = 64;
    ! captures the status returned by allocate/deallocate functions
    integer(kind = int32) :: alloc_stat;
    ! pointer to access the components of the data structure
    real(kind = real64), pointer, contiguous :: pr_x(:);
    real(kind = real64), pointer, contiguous :: pr_y(:);
    real(kind = real64), pointer, contiguous :: pr_z(:);

    allocate( pdata %r_x(n_pdata), pdata %r_y(n_pdata), pdata %r_z(n_pdata),&
              pdata %d_x(n_pdata), pdata %d_y(n_pdata), pdata %d_z(n_pdata),&
              pdata %F_x(n_pdata), pdata %F_y(n_pdata), pdata %F_z(n_pdata),&
              pdata %T_x(n_pdata), pdata %T_y(n_pdata), pdata %T_z(n_pdata),&
              pdata %dr_x(n_pdata), pdata %dr_y(n_pdata), pdata %dr_z(n_pdata),&
              pdata %ID(n_pdata), pdata %padding(n_pdata), stat=alloc_stat );

    if ( alloc_stat /= 0 ) then
        stop "insufficient memory to allocate the data structure, program stopped"
    end if

    pr_x => pdata %r_x;
    pr_y => pdata %r_y;
    pr_z => pdata %r_z;

    ! assign pretend values to the components of the position vector of the particles
    !dir$ assume_aligned pr_x: 32
    pr_x = 0.0d+0;
    !dir$ assume_aligned pr_y: 32
    pr_y = 0.0d+0;
    !dir$ assume_aligned pr_z: 32
    pr_z = 0.0d+0;

    ! free structure from memory
    deallocate( pdata %r_x, pdata %r_y, pdata %r_z,&
                pdata %d_x, pdata %d_y, pdata %d_z,&
                pdata %F_x, pdata %F_y, pdata %F_z,&
                pdata %T_x, pdata %T_y, pdata %T_z,&
                pdata %dr_x, pdata %dr_y, pdata %dr_z,&
                pdata %ID, pdata %padding, stat=alloc_stat );

    if ( alloc_stat /= 0 ) then
        stop "unexpected error, failed to deallocate the data structure..."
    end if
end program
The program was compiled in the following manner:
ifort -g -traceback -check all -align nosequence -O0 alignment_test.f90
Here is the output generated at runtime:
forrtl: severe (408): fort: (28): Check for ASSUME_ALIGNED fails for 'PR_X' in routine 'ALIGNMENT_TEST' at line 77.
Image PC Routine Line Source
a.out 0000000000407786 Unknown Unknown Unknown
a.out 000000000040454F MAIN__ 77 alignment_test.f90
a.out 0000000000402F1E Unknown Unknown Unknown
libc.so.6 0000003B1F81ED5D Unknown Unknown Unknown
a.out 0000000000402E29 Unknown Unknown Unknown
The version of the Fortran compiler is the following:
ifort --version
ifort (IFORT) 16.0.3 20160415
Copyright (C) 1985-2016 Intel Corporation. All rights reserved.
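As a cross-check that does not rely on -check all: an array is aligned to a 32-byte boundary when the address of its first element is a multiple of 32. Below is a minimal sketch of how that offset could be printed at run time. The module alignment_check and its functions offset_from_boundary and address_of are hypothetical helpers written only for this illustration; they are not part of the program above. The sketch assumes that an integer of kind c_intptr_t can hold a data address and that the array passed in is non-empty.

```fortran
module alignment_check
    ! Hypothetical helper module, for illustration only.
    use, intrinsic :: iso_c_binding, only: c_intptr_t, c_loc
    use, intrinsic :: iso_fortran_env, only: real64
    implicit none
    private
    public :: offset_from_boundary, address_of
contains
    ! Bytes by which `address` lies past the previous multiple of `boundary`;
    ! a result of zero means the address is aligned to that boundary.
    pure function offset_from_boundary(address, boundary) result(offset)
        integer(kind = c_intptr_t), intent(in) :: address, boundary
        integer(kind = c_intptr_t) :: offset
        offset = modulo(address, boundary)
    end function offset_from_boundary

    ! Address of the first element of a non-empty real64 array, as an integer.
    function address_of(array) result(address)
        real(kind = real64), intent(in), target :: array(:)
        integer(kind = c_intptr_t) :: address
        ! Reinterpret the C pointer bits as an integer of matching size.
        address = transfer(c_loc(array(1)), address)
    end function address_of
end module alignment_check

program alignment_probe
    use, intrinsic :: iso_c_binding, only: c_intptr_t
    use, intrinsic :: iso_fortran_env, only: real64
    use alignment_check
    implicit none
    real(kind = real64), allocatable, target :: r_x(:)

    allocate( r_x(64) )
    r_x = 0.0_real64
    ! A printed offset of 0 means r_x starts on a 32-byte boundary.
    print '(a, i0)', 'r_x offset from a 32-byte boundary: ', &
        offset_from_boundary(address_of(r_x), 32_c_intptr_t)
    deallocate( r_x )
end program alignment_probe
```

The same call could be applied to pdata %r_x in the test program to see how far the allocated storage actually sits from a 32-byte boundary. The printed value depends on where the allocator places the array, so it may differ between runs and systems.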
Thanks in advance for your help,
Misael