Greetings,
I am developing a program to study the dynamics of a system of particles in a cubic box with periodic boundary conditions. I would like to take advantage of auto-vectorization to speed up the computations. To that end, I have created a data structure that stores the position, orientation, force, and torque vectors (among other quantities of interest).
If I understood the documentation on auto-vectorization correctly, I can ask the compiler to align the data structure (derived type) to a 32-byte boundary and use the sequence attribute to pack the components of the structure (no padding). I am interested in 32-byte alignment because I am targeting AVX processors, whose 256-bit registers hold four real64 values (32 bytes). The problem I am facing is that none of the components is aligned, even though the data structure is requested to be aligned to a 32-byte boundary.
To ease troubleshooting, I am posting a minimal version of the code here (with comments for clarity):
module dynamics
! Description:
! Defines the principal data structure that stores information about the particles
! in the system. To take advantage of auto-vectorization the data is organized
! as a Structure of Arrays (SoA).
use, intrinsic :: iso_fortran_env
implicit none
type data
sequence
! position vector
real(kind = real64), allocatable :: r_x(:);
real(kind = real64), allocatable :: r_y(:);
real(kind = real64), allocatable :: r_z(:);
! orientation vector (director)
real(kind = real64), allocatable :: d_x(:);
real(kind = real64), allocatable :: d_y(:);
real(kind = real64), allocatable :: d_z(:);
! force vector
real(kind = real64), allocatable :: F_x(:);
real(kind = real64), allocatable :: F_y(:);
real(kind = real64), allocatable :: F_z(:);
! torque vector
real(kind = real64), allocatable :: T_x(:);
real(kind = real64), allocatable :: T_y(:);
real(kind = real64), allocatable :: T_z(:);
! displacement vector (the prefix `d' stands for delta or difference)
real(kind = real64), allocatable :: dr_x(:);
real(kind = real64), allocatable :: dr_y(:);
real(kind = real64), allocatable :: dr_z(:);
! particle ID array
integer(kind = int32), allocatable :: ID(:);
! padding array, such that we have an equivalent of 16 arrays of 64 bits each
integer(kind = int32), allocatable :: padding(:);
end type data
private
public data
end module dynamics
program alignment_test
! Minimalistic program to test the alignment of derived-types.
use, intrinsic :: iso_fortran_env
use dynamics
implicit none
! particle data structure, aligned to a 32-byte boundary
type(data), target :: pdata;
!dir$ attributes align: 32 :: pdata
! number of particles in the system
integer(kind = int32), parameter :: n_pdata = 64;
! captures the status returned by the allocate/deallocate statements
integer(kind = int32) :: alloc_stat;
! pointer to access the components of the data structure
real(kind = real64), pointer, contiguous :: pr_x(:);
real(kind = real64), pointer, contiguous :: pr_y(:);
real(kind = real64), pointer, contiguous :: pr_z(:);
allocate( pdata %r_x(n_pdata), pdata %r_y(n_pdata), pdata %r_z(n_pdata),&
pdata %d_x(n_pdata), pdata %d_y(n_pdata), pdata %d_z(n_pdata),&
pdata %F_x(n_pdata), pdata %F_y(n_pdata), pdata %F_z(n_pdata),&
pdata %T_x(n_pdata), pdata %T_y(n_pdata), pdata %T_z(n_pdata),&
pdata %dr_x(n_pdata), pdata %dr_y(n_pdata), pdata %dr_z(n_pdata),&
pdata %ID(n_pdata), pdata %padding(n_pdata), stat=alloc_stat );
if ( alloc_stat /= 0 ) then
stop "insufficient memory to allocate the data structure, program stopped"
end if
pr_x => pdata %r_x;
pr_y => pdata %r_y;
pr_z => pdata %r_z;
! assign pretend values to the components of the position vector of the particles
!dir$ assume_aligned pr_x: 32
pr_x = 0.0d+0;
!dir$ assume_aligned pr_y: 32
pr_y = 0.0d+0;
!dir$ assume_aligned pr_z: 32
pr_z = 0.0d+0;
! free structure from memory
deallocate( pdata %r_x, pdata %r_y, pdata %r_z,&
pdata %d_x, pdata %d_y, pdata %d_z,&
pdata %F_x, pdata %F_y, pdata %F_z,&
pdata %T_x, pdata %T_y, pdata %T_z,&
pdata %dr_x, pdata %dr_y, pdata %dr_z,&
pdata %ID, pdata %padding, stat=alloc_stat );
if ( alloc_stat /= 0 ) then
stop "unexpected error, failed to deallocate the data structure..."
end if
end program
The program was compiled in the following manner:
ifort -g -traceback -check all -align nosequence -O0 alignment_test.f90
Here is the output generated at runtime:
forrtl: severe (408): fort: (28): Check for ASSUME_ALIGNED fails for 'PR_X' in routine 'ALIGNMENT_TEST' at line 77.
Image PC Routine Line Source
a.out 0000000000407786 Unknown Unknown Unknown
a.out 000000000040454F MAIN__ 77 alignment_test.f90
a.out 0000000000402F1E Unknown Unknown Unknown
libc.so.6 0000003B1F81ED5D Unknown Unknown Unknown
a.out 0000000000402E29 Unknown Unknown Unknown
The version of the Fortran compiler is the following:
ifort --version
ifort (IFORT) 16.0.3 20160415
Copyright (C) 1985-2016 Intel Corporation. All rights reserved.
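In case it helps with troubleshooting, the alignment of an allocated array can also be checked independently of the assume_aligned directive by looking at its address directly. The following is only a minimal sketch and is not part of the program above: the program name check_alignment and the variable address are made up for illustration, and it assumes that transferring a type(c_ptr) to an integer of kind c_intptr_t yields the numeric address, which holds on common platforms but is not guaranteed by the standard.

```fortran
program check_alignment
    ! Hypothetical stand-alone check: prints the address of an allocated
    ! array modulo 32. A result of 0 means the data starts on a 32-byte boundary.
    use, intrinsic :: iso_fortran_env, only: real64
    use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
    implicit none
    ! stand-in for a single component such as pdata %r_x
    real(kind = real64), allocatable, target :: r_x(:)
    ! numeric value of the address of the first element (assumption: see above)
    integer(kind = c_intptr_t) :: address
    integer :: alloc_stat

    allocate( r_x(64), stat=alloc_stat )
    if ( alloc_stat /= 0 ) then
        stop "allocation failed"
    end if

    ! reinterpret the C address of the first element as an integer
    address = transfer( c_loc(r_x(1)), address )
    print *, 'address mod 32 = ', mod(address, 32_c_intptr_t)

    deallocate( r_x )
end program check_alignment
```

A nonzero remainder would indicate that the array data itself is not on a 32-byte boundary, regardless of how the variable that holds the array is declared.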
Thanks in advance for your help,
Misael