Channel: Intel® Software - Intel® Fortran Compiler for Linux* and macOS*

Floating invalid only when vectorizing


Dear all

First of all, I must apologize: for legal reasons I can't show any actual code, and I have not been able to write dummy code that reproduces the problem I am experiencing. The code in question does scientific HPC calculations and is on the order of 200 000 lines long. Core parts of it are F77, and my job is to update the code and turn everything into more modern Fortran. Serial performance is a very high priority, as we are pushing the limits of parallelization at a few thousand cores. The core subroutines (serial; all parallelization is done at a higher level) are hit billions of times, and the code in those subroutines is optimized for vectorization. Again for legal reasons, I cannot copy any output or code from the system where the code is running, but I hope you can all see beyond any obvious typos.

The code used to run without any problems, but after an update of parts not related(?) to the core routines I am getting error (65), floating invalid, right off the bat in one of the core subroutines. It is obviously something I have done, but I cannot for the life of me figure it out, and the problem has turned out to be very difficult to debug. I only see the error when the compiler vectorizes the code. Anything I do that turns vectorization off (-no-vec, -O0 or -O1, adding any form of -check, or breaking up the vectorized part with a print statement) also "fixes" the problem, and the code runs without any errors.

The update I made was to change the declaration of the local floats in a few supporting subroutines and functions from double precision to reals of kind selected_real_kind(15,307). All floats in modules are still declared as double precision, and the same is true for all local floats in the core routines. The code is compiled with the options "-mcmodel=large -align array64byte -xCORE-AVX512 -O3 -qopt-zmm-usage=low -fp-model fast=2 -g -traceback -qopt-report=5", and that has not been changed. For some reason the code always crashes with -qopt-zmm-usage=high, and Intel's experts have not been able to diagnose that problem, which of course is of some concern.
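To rule out the kind change itself, a small standalone check along the following lines could be compiled with the same options as the real code. It is only a sketch: the names rk and dp are made up for illustration, and it assumes nothing about the real code beyond the two declarations described above. If both kinds come out equal, the two declarations name the same type.

```fortran
program kind_check
  implicit none
  ! Kind used in the updated supporting routines (as described above).
  integer, parameter :: rk = selected_real_kind(15, 307)
  ! Kind of double precision, still used in the modules and core routines.
  integer, parameter :: dp = kind(1.0d0)

  print '(a,i0)', 'selected_real_kind(15,307) = ', rk
  print '(a,i0)', 'kind(1.0d0)                = ', dp
  print '(a,i0)', 'storage size in bits (rk)  = ', storage_size(1.0_rk)
  print '(a,i0)', 'storage size in bits (dp)  = ', storage_size(1.0d0)

  if (rk == dp) then
    print '(a)', 'Same kind: both declarations name the same real type.'
  else
    print '(a)', 'Different kinds: actual and dummy arguments could mismatch.'
  end if
end program kind_check
```

On a compiler where double precision is IEEE binary64, I would expect the two kinds to be identical, but the point of the check is to confirm that with the exact compiler and flags in use rather than to assume it.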

The run-time error and Intel Inspector both point to a specific line in the code, and Inspector also indicates that something goes partially wrong when an array used in that line is allocated. The array in question is declared in a module

double precision, dimension(:), allocatable :: A, B, C

and allocated as

allocate(A(1:nj))

Inspector reports an "Invalid partial memory access" when the array is allocated. When I add code to query the status of the allocation, I get STAT=0. The array A is used without any problem earlier in the code, before the core routine where the code crashes. The segment of the core subroutine where the code crashes looks like this:
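Since the loop relies on 64-byte alignment, one thing that can be checked directly is the address of the first element after allocation. The program below is a minimal sketch of such a check, not the real code: the size nj = 1000 is a placeholder, and A is given the TARGET attribute here only so that c_loc can be applied to it (the real module array has no such attribute, and adding it there could affect optimization).

```fortran
program align_check
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  implicit none
  integer, parameter :: nj = 1000        ! placeholder size, not the real one
  double precision, dimension(:), allocatable, target :: A
  integer :: istat
  integer(c_intptr_t) :: addr

  allocate(A(1:nj), stat=istat)
  if (istat /= 0) error stop 'allocation of A failed'

  ! Reinterpret the C address of the first element as an integer.
  addr = transfer(c_loc(A(1)), addr)

  print '(a,i0)', 'address of A(1) modulo 64 = ', modulo(addr, 64_c_intptr_t)
  print '(a,l1)', 'A(1) is 64-byte aligned: ', modulo(addr, 64_c_intptr_t) == 0
end program align_check
```

A nonzero remainder with the real compile options would mean the ASSUME_ALIGNED promise does not hold for that array. Whether it is zero depends on the compiler and flags, so no particular output is assumed here.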

!DIR$ ASSUME_ALIGNED A(1):64
!DIR$ ASSUME_ALIGNED B(1):64
!DIR$ ASSUME_ALIGNED C(1):64

!DIR$ IVDEP

do j = 1, nj

  locarray1(j) = locvar1 + locvar2*B(j)/A(j) + locvar3*C(j)  ! CRASH HERE

end do

All local arrays and variables are declared as double precision. I can inspect A, B, C and all other variables outside of the loop: the bounds are well defined and all values are non-NaN and nonzero (A ~ 0.5). Inspector reports "Invalid memory access" at the offending line.
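The inspection described above can also be done programmatically around the loop, which avoids putting a print statement inside the vectorized part. The following is a sketch with placeholder data: the array size, the values, and the flag variable names are all invented for illustration, and only the loop body is taken from the segment above. It scans the inputs, clears the IEEE flags, runs the loop, and then queries the flags.

```fortran
program invalid_check
  use, intrinsic :: ieee_arithmetic
  implicit none
  integer, parameter :: nj = 37          ! placeholder; not a multiple of 8
  double precision :: A(nj), B(nj), C(nj), locarray1(nj)
  double precision :: locvar1, locvar2, locvar3
  logical :: invalid_raised, divzero_raised
  integer :: j

  ! Placeholder data standing in for the real arrays (A ~ 0.5 as above).
  A = 0.5d0
  B = 1.0d0
  C = 2.0d0
  locvar1 = 1.0d0
  locvar2 = 2.0d0
  locvar3 = 3.0d0

  ! Scan the inputs over their full extent before entering the loop.
  if (.not. all(ieee_is_finite(A))) error stop 'A contains a NaN or Inf'
  if (.not. all(ieee_is_finite(B))) error stop 'B contains a NaN or Inf'
  if (.not. all(ieee_is_finite(C))) error stop 'C contains a NaN or Inf'
  if (any(A == 0.0d0)) error stop 'A contains a zero'

  ! Clear the flags so that only the loop below can set them.
  call ieee_set_flag(ieee_invalid, .false.)
  call ieee_set_flag(ieee_divide_by_zero, .false.)

  do j = 1, nj
    locarray1(j) = locvar1 + locvar2*B(j)/A(j) + locvar3*C(j)
  end do

  ! Query the flags after the loop instead of printing inside it.
  call ieee_get_flag(ieee_invalid, invalid_raised)
  call ieee_get_flag(ieee_divide_by_zero, divzero_raised)

  print '(a,l1)', 'invalid raised in loop:        ', invalid_raised
  print '(a,l1)', 'divide-by-zero raised in loop: ', divzero_raised
  print '(a,l1)', 'all results finite:            ', all(ieee_is_finite(locarray1))
end program invalid_check
```

The flag queries are only meaningful to the extent that the compiler preserves IEEE flag semantics under the chosen floating-point model; with aggressive settings that may not hold, so a clean result here would not by itself prove that the loop is innocent.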

I have tried to align the arrays A, B, and C "by hand" by using !DIR$ attributes align: 64 :: A, B, C in the module where the arrays are declared, but it made no difference. In the optrpt file the compiler reports

vectorization support: reference A(j) has aligned access

and the same for locarray1, B, and C. The loop in question is vectorized without a peel loop but with a small remainder loop.

Sorry for the wall of text and the lack of actual code. I hope I could make my case anyway. It is frustrating to debug code where any attempt to look closer at the problem makes it go away. Could it be that the KIND parameter used in the updated subroutines messes up the alignment in some global context? Is the "new" KIND parameter not compatible with -align array64byte?

Best regards

Per

 

