Hi, everyone.
I have a problem with automatic vectorization in ifort (Version 17.0.4.196 Build 20170411, shipped with Parallel Studio XE release 2017.4). I have this simple module with two subroutines:
- test1 gets an inout 2D matrix, assumes it is 64-byte aligned, and doubles the entries
- test2 gets an in 2D matrix, assumes it is 64-byte aligned, allocates a 64-byte aligned second matrix, and copies the entries from the argument to the newly allocated array
The source (attached) uses assume_aligned and align directives. AFAIK, this should in both cases result in aligned memory access. This is the report I get for test1:
remark #15388: vectorization support: reference fsm(i,j) has aligned access   [ loops.f90(20,11) ]
remark #15388: vectorization support: reference fsm(i,j) has aligned access   [ loops.f90(20,22) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.000
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15448: unmasked aligned unit stride loads: 1
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 6
remark #15477: vector cost: 0.620
remark #15478: estimated potential speedup: 6.870
remark #15488: --- end vector cost summary ---
And this is the report for test2:
remark #15388: vectorization support: reference F(i,j) has aligned access   [ loops.f90(39,11) ]
remark #15389: vectorization support: reference fsm(i,j) has unaligned access   [ loops.f90(39,11) ]
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.444
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15450: unmasked unaligned unit stride loads: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 4
remark #15477: vector cost: 1.120
remark #15478: estimated potential speedup: 3.110
remark #15488: --- end vector cost summary ---
In the second loop, the argument matrix fsm is reported as having unaligned access, which I think is wrong. Moreover, the report is the same whether or not I include the assume_aligned and align directives - they don't seem to have any effect. What am I missing here?
FYI, I compile with
ifort -O3 -mavx -fopenmp -qopt-report-phase=vec,loop -qopt-report=5 -qopt-streaming-stores never -mcmodel=medium -c loops.f90
And the code follows:
module loops

  public :: test1, test2

contains

  subroutine test1(fsm, im, jm)
    implicit none
    real, dimension(:,:), intent(inout) :: fsm
    integer, intent(in) :: im, jm
    integer i, j
    !dir$ assume_aligned fsm:64

    do j = 1,jm
       do i = 1,im
          fsm(i,j) = fsm(i,j)*2
       end do
    end do
  end subroutine test1

  subroutine test2(fsm, im, jm)
    implicit none
    real, dimension(:,:), intent(in) :: fsm
    integer, intent(in) :: im, jm
    integer i, j
    real, dimension(:,:), allocatable :: f
    !dir$ assume_aligned fsm:64
    !dir$ attributes align: 64:: f

    allocate(f(im, jm))
    do j = 1,jm
       do i = 1,im
          f(i,j) = fsm(i,j)
       end do
    end do
  end subroutine test2

end module loops
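One way to separate "the data really is unaligned" from "the compiler does not believe the directive" would be a runtime check on the address of the first element. The following is only a minimal sketch: the module align_check and the function misalignment are names made up for illustration and are not part of loops.f90. It also assumes that transferring a c_ptr into an integer(c_intptr_t) yields the machine address, which holds on common platforms but is not guaranteed by the Fortran standard.

```fortran
module align_check
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  implicit none

contains

  ! Hypothetical helper: byte offset of a(1,1) past the previous
  ! `boundary`-byte boundary. A result of 0 means a(1,1) is aligned.
  function misalignment(a, boundary) result(offset)
    real, dimension(:,:), intent(in), target :: a
    integer, intent(in) :: boundary
    integer :: offset
    integer(c_intptr_t) :: addr

    ! Assumption: the bit pattern of the c_ptr is the address itself.
    addr = transfer(c_loc(a(1,1)), addr)
    offset = int(modulo(addr, int(boundary, c_intptr_t)))
  end function misalignment

end module align_check
```

Calling misalignment(fsm, 64) in the caller just before test2 should return 0 if the actual argument starts on a 64-byte boundary. Note that this only checks the first element: for every column to start aligned as well, the array would have to be contiguous and im*4 bytes would have to be a multiple of 64.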
Could anyone shed some light on this?
Thanks a lot!
Marcin