Channel: Intel® Software - Intel® Fortran Compiler for Linux* and macOS*

Intel Fortran MPI hyperthreading performance issue


Hello everyone,

I have a Dell Precision Tower 7910 with dual Intel® Xeon® E5-2697 v4 processors (18C, 2.3GHz, 3.6GHz Turbo, 2400MHz, 45MB, 145W). I use it to run various programs written in MPI Fortran. The two processors have 36 physical cores in total, but because of Hyper-Threading the system shows 72 CPUs.

I've always achieved peak performance with my programs at exactly 36 processes. Any higher number of processes resulted in a dramatic slowdown. I took that as a sign that hyperthreading isn't working well, at least for my application.

But recently a friend ran the same code on a similar workstation with dual Intel Xeon Gold 6148 processors (2.4GHz, 3.7GHz Turbo, 20C, 10.4GT/s 3UPI, 27M cache, HT, 150W, DDR4-2666). The same program achieved a continuous speedup up to 80 processes!

This makes me wonder whether I need to change some option in my system, or install a different compiler version. I'm puzzled. I would appreciate any advice on this issue! Below I'm attaching some details of the two systems (output of cpuinfo and compiler versions).

 

My system:

=====  Processor composition  =====
Processor name    : Intel(R) Xeon(R)  E5-2697 v4
Packages(sockets) : 2
Cores             : 36
Processors(CPUs)  : 72
Cores per package : 18
Threads per core  : 2

ifort (IFORT) 19.0.0.117 20180804
Copyright (C) 1985-2018 Intel Corporation.  All rights reserved.

 

Friend's system:

Processor name    : Intel(R) Xeon(R) Gold 6148
Packages(sockets) : 2
Cores             : 40
Processors(CPUs)  : 80
Cores per package : 20
Threads per core  : 2

ifort (IFORT) 19.0.3.199 20190206
Copyright (C) 1985-2018 Intel Corporation.  All rights reserved.
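One thing worth checking when comparing the two machines is how MPI ranks are pinned to cores. As a quick experiment (assuming Intel MPI, whose cpuinfo utility produced the listings above; the executable name is a placeholder), ranks can be pinned to physical cores only:

export I_MPI_PIN=1
export I_MPI_PIN_PROCESSOR_LIST=allcores   # physical cores only, ignore hyperthreads
mpirun -n 36 ./my_mpi_program              # placeholder executable name

If 36 pinned ranks reproduce the previous peak, the slowdown beyond 36 is consistent with ranks starting to share physical cores.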

 


module files transformation


Is there any way to convert a module file generated by a compiler other than ifort (e.g. gfortran) so that an ifort-compiled program can link to it? On Windows there is a tool, Module Wizard, which can generate interfaces for DLLs; is there a Linux tool that can do that job? Thanks for the help.
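For what it's worth, .mod files are compiler-specific, so there is no direct converter on Linux as far as I know; the usual portable route is to give the routine a C-compatible symbol with BIND(C) and write a matching explicit interface on the ifort side. A minimal sketch (the routine name and argument are placeholders):

! compiled with gfortran: export a C-compatible symbol
subroutine work(x) bind(c, name='work')
  use iso_c_binding, only: c_double
  real(c_double), intent(inout) :: x
  x = 2.0_c_double * x
end subroutine work

! on the ifort side, inside the calling scope: a matching explicit interface
interface
  subroutine work(x) bind(c, name='work')
    use iso_c_binding, only: c_double
    real(c_double), intent(inout) :: x
  end subroutine work
end interface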

relocation truncated to fit


Dear all,

When I compile my Fortran code, I get the following errors:

 

ipo_out1.f:(.text.hot00002+0x9a): relocation truncated to fit: R_X86_64_PC32 against symbol `cdata_mp_ilnrho_' defined in .bss.cdata_mp_ilnrho_[cdata_mp_ilnrho_] section in /tmp/ipo_ifort4MDURv1.o
ld: ipo_out1.f:(.text.hot00002+0x105): relocation truncated to fit: R_X86_64_PC32 against symbol `cdata_mp_t_' defined in .bss.cdata_mp_t_[cdata_mp_t_] section in /tmp/ipo_ifort4MDURv1.o
ld: ipo_out1.f:(.text.hot00002+0x136): relocation truncated to fit: R_X86_64_PC32 against symbol `cdata_mp_ilnrho_' defined in .bss.cdata_mp_ilnrho_[cdata_mp_ilnrho_] section in /tmp/ipo_ifort4MDURv1.o
ld: ipo_out1.f:(.text.hot00002+0x383): relocation truncated to fit: R_X86_64_PC32 against symbol `cdata_mp_lfirstpoint_' defined in .bss.cdata_mp_lfirstpoint_[cdata_mp_lfirstpoint_] section in /tmp/ipo_ifort4MDURv1.o
ld: ipo_out1.f:(.text.hot00002+0x390): relocation truncated to fit: R_X86_64_PC32 against symbol `cdata_mp_imn_' defined in .bss.cdata_mp_imn_[cdata_mp_imn_] section in /tmp/ipo_ifort4MDURv1.o
ld: ipo_out1.f:(.text.hot00002+0x39b): relocation truncated to fit: R_X86_64_PC32 against symbol `cdata_mp_necessary_' defined in .bss.cdata_mp_necessary_[cdata_mp_necessary_] section in /tmp/ipo_ifort4MDURv1.o
ld: ipo_out1.f:(.text.hot00002+0x3aa): relocation truncated to fit: R_X86_64_PC32 against symbol `cdata_mp_mm_' defined in .bss.cdata_mp_mm_[cdata_mp_mm_] section in /tmp/ipo_ifort4MDURv1.o
ld: ipo_out1.f:(.text.hot00002+0x3b9): relocation truncated to fit: R_X86_64_PC32 against symbol `cdata_mp_nn_' defined in .bss.cdata_mp_nn_[cdata_mp_nn_] section in /tmp/ipo_ifort4MDURv1.o
ld: ipo_out1.f:(.text.hot00002+0x3d5): relocation truncated to fit: R_X86_64_PC32 against symbol `cdata_mp_n_' defined in .bss.cdata_mp_n_[cdata_mp_n_] section in /tmp/ipo_ifort4MDURv1.o
ld: ipo_out1.f:(.text.hot00002+0x3dc): relocation truncated to fit: R_X86_64_PC32 against symbol `cdata_mp_m_' defined in .bss.cdata_mp_m_[cdata_mp_m_] section in /tmp/ipo_ifort4MDURv1.o
ld: ipo_out1.f:(.text.hot00002+0x462): additional relocation overflows omitted from the output
ld: failed to convert GOTPCREL relocation; relink with --no-relax

The largest array in my simulation is about 17 times smaller than 2^31 - 1 elements.

System: CentOS release 6.10

Intel: intel/2019.1.144-GCC-8.2.0-2.31.1/impi/2018.4.274/bin64/mpiifort

Any idea how this happens?

 

Best regards,

 

Sean Lee
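A commonly suggested workaround for R_X86_64_PC32 relocation overflows (a sketch, not taken from the original report): the 2 GB limit applies to the sum of all statically allocated data, not to the largest single array, so it can be hit even when every individual array is well under 2^31 elements. Building with the medium memory model gives static data 64-bit addressing:

mpiifort -mcmodel=medium -shared-intel -o run.x source.f90   # file names are placeholders

-shared-intel links the Intel runtime libraries dynamically, which the medium memory model generally requires.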

 

Intel Fortran installed but cannot be used


Hello Intel,

I downloaded parallel_studio_xe_2019_update3_cluster_edition and then did a custom install of the Intel Fortran compiler and VTune Amplifier, as shown below:

1. Accept and continue [ default ]
   2. [None] Intel Trace Analyzer and Collector 2019 Update 3
   3. [None] Intel Cluster Checker 2019 Update 2
   4. [All] Intel VTune Amplifier 2019 Update 3
   5. [None] Intel Inspector 2019 Update 3
   6. [None] Intel Advisor 2019 Update 3
   7. [None] Intel C++ Compiler 19.0 Update 3
   8. [All] Intel Fortran Compiler 19.0 Update 3
   9. [None] Intel Math Kernel Library 2019 Update 3 for C/C++
   10.[None] Intel Math Kernel Library 2019 Update 3 for Fortran
   11.[None] Intel Integrated Performance Primitives 2019 Update 3
   12.[None] Intel Threading Building Blocks 2019 Update 4
   13.[None] Intel Data Analytics Acceleration Library 2019 Update 3
   14.[None] Intel MPI Library 2019 Update 3
   15.[None] GNU* GDB 8.0
   16.[None] Intel(R) Distribution for Python*
Then it showed some missing prerequisites, as shown below:

Prerequisites > Missing Prerequisite(s)
--------------------------------------------------------------------------------
There are one or more unresolved issues based on your system configuration and
component selection.

You can resolve all the issues without exiting the installer and re-check, or
you can exit, resolve the issues, and then run the installation again.

--------------------------------------------------------------------------------
Missing optional prerequisites
-- 32-bit libraries not found
--------------------------------------------------------------------------------
   1. Skip prerequisites [ default ]
   2. Show the detailed info about issue(s)
   3. Re-check the prerequisites

 

I proceeded with the installation, which completed successfully:

Installation
--------------------------------------------------------------------------------
Each component will be installed individually. If you cancel the installation,
some components might remain on your system. This installation may take several 
minutes, depending on your system and the options you selected.
--------------------------------------------------------------------------------
Installing Platform Profiler component... done
--------------------------------------------------------------------------------

Installing Command line interface component... done
--------------------------------------------------------------------------------
Installing Sampling Driver kit component... done
--------------------------------------------------------------------------------
Installing Graphical user interface component... done
--------------------------------------------------------------------------------
Installing Intel Fortran Compiler for Intel(R) 64 component... done
--------------------------------------------------------------------------------
Finalizing product configuration...
--------------------------------------------------------------------------------
Installing Application Performance Snapshot... done
--------------------------------------------------------------------------------
Preparing driver configuration scripts... done
--------------------------------------------------------------------------------

But now when I type ifort in the terminal, it says the ifort command is not found. I am using Ubuntu 18.04. Your suggestions are highly appreciated.
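If the install completed but ifort is not on the PATH, the usual missing step (assuming the default /opt/intel install location of Parallel Studio XE) is to source the environment script in each shell, or from ~/.bashrc:

source /opt/intel/bin/compilervars.sh intel64

After that, ifort -V should report the installed compiler.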

 

 

BIND(C,NAME="longliteral") causes ICE


Hello, 

This code causes an internal compiler error with ifort 19.0.4.227.

I would like to be able to use longer character constants.

 

!                                           1         2         3
Integer Function f() Bind (C,Name='_23456789012345678901234567890&
&12345678901234567890123456789012345678901234567890123456789012345')
!         4         5         6         7         8         9
  f = 0
  Return
End Function f

 

How to generate profiling files for each process of an MPI program


Hi All,

I'm looking for help on profiling MPI program with Intel Fortran compiler.

Currently I profile an MPI Fortran program by compiling the code with the "-profile-functions -profile-loops=all" options of the Intel compiler, then running the generated binary on multiple compute nodes. After the execution I get one "loop_prof_funcs_xxx.dump" and one "loop_prof_loops_xxx.dump" profiling file showing the aggregated execution costs at the job level. This is helpful, but it still misses the process-level details. Is there any way to profile an MPI program and generate profiling data (e.g. .dump files) for each individual process/rank?

Any guidance and advice is appreciated!

Thanks,

Qi
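One workaround to try (a sketch that assumes Intel MPI, which exports PMI_RANK to each process; the script and executable names are placeholders) is to launch every rank in its own working directory, so each process writes its own .dump files instead of sharing one location:

#!/bin/bash
# per_rank.sh: run each MPI rank in a private directory
mkdir -p rank_${PMI_RANK}
cd rank_${PMI_RANK}
exec ../my_program.x

Launched as mpirun -n 16 ./per_rank.sh, this would leave one set of loop_prof_*.dump files per rank directory, assuming the profiler writes to the current working directory.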

Problem with big arrays?


Hello. I have a code that was running just fine. Then I increased the size of my arrays substantially, and now it sometimes doesn't compile; with some flags it compiled, but then crashed with errors I don't understand.

I had some arrays of size about 3000, which I changed to 22554 (and also square matrices of this size). I have only my main code and a module, in which I create most of my global variables.

I was originally compiling just with: 

ifort -c global_param.f90; ifort -c dyn.f90; ifort global_param.o dyn.o -check bounds

And with that I got the following errors while building:

global_param.o: In function `global_param_mp_hamiltonian_':
global_param.f90:(.text+0x145): relocation truncated to fit: R_X86_64_32S against symbol `global_param_mp_pot1_' defined in COMMON section in global_param.o
global_param.f90:(.text+0x14e): relocation truncated to fit: R_X86_64_32S against symbol `global_param_mp_pot1_' defined in COMMON section in global_param.o
global_param.f90:(.text+0x173): relocation truncated to fit: R_X86_64_32S against symbol `global_param_mp_pot1_' defined in COMMON section in global_param.o
global_param.f90:(.text+0x1d9): relocation truncated to fit: R_X86_64_32S against symbol `global_param_mp_pot1_' defined in COMMON section in global_param.o
global_param.f90:(.text+0x1e3): relocation truncated to fit: R_X86_64_32S against symbol `global_param_mp_pot1_' defined in COMMON section in global_param.o
global_param.f90:(.text+0x212): relocation truncated to fit: R_X86_64_32S against symbol `global_param_mp_pot1_' defined in COMMON section in global_param.o
global_param.f90:(.text+0x22c): relocation truncated to fit: R_X86_64_32S against symbol `global_param_mp_pot2_' defined in COMMON section in global_param.o
global_param.f90:(.text+0x236): relocation truncated to fit: R_X86_64_32S against symbol `global_param_mp_pot2_' defined in COMMON section in global_param.o
global_param.f90:(.text+0x25a): relocation truncated to fit: R_X86_64_32S against symbol `global_param_mp_pot2_' defined in COMMON section in global_param.o
global_param.f90:(.text+0x2c0): relocation truncated to fit: R_X86_64_32S against symbol `global_param_mp_pot2_' defined in COMMON section in global_param.o
global_param.f90:(.text+0x2ca): additional relocation overflows omitted from the output

Then I searched and found that this could be due to the size of my arrays, and that I should compile with the -fpic and -mcmodel flags. So I tried:

ifort -c global_param.f90 -fpic -mcmodel=large ;ifort -c dyn.f90 -fpic -mcmodel=large ; ifort -fpic -mcmodel=large global_param.o dyn.o -check bounds

And then at run time I simply got the error

Segmentation fault

 

I don't know what to do. Can someone please help me?

Many thanks in advance,

 

Cayo
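A direction often suggested for this situation (a sketch, not taken from the post): two 22554 x 22554 REAL arrays together far exceed the 2 GB of static data that the default small memory model allows, and -mcmodel=large is rarely needed (-mcmodel=medium with -shared-intel is the commonly suggested compile fix). Declaring the big arrays ALLOCATABLE avoids the issue altogether, since heap data has no such relocation limit. For example, in the module (pot1/pot2 are names taken from the error messages; the rest is hypothetical):

module global_param
  implicit none
  real, allocatable :: pot1(:,:), pot2(:,:)
contains
  subroutine init_arrays(n)
    integer, intent(in) :: n
    allocate(pot1(n,n), pot2(n,n))   ! heap allocation, no static relocations
  end subroutine init_arrays
end module global_param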

Abaqus and Intel Fortran Compiler compatibility


Hi,

My company uses Abaqus 2019. As per the vendor's requirements, we need access to the following Fortran compiler:

Intel® Visual Fortran 16.0 Update 1

My question is simple: How do we get it and how much is it?

After a bit of poking about on the Intel website, it looks as though we need Intel Parallel Studio XE. However, there are 3 versions of this, i.e.

Cluster, Professional and Composer.

Which of these do I need?

I don't know how many seats we need - I think one might be enough.

Also, I expect that we will have to access an older version of the software (i.e. ver 16.0.1) than you are currently marketing, but I don't think that will be a problem.

It might be useful to have some sort of editor / development tool (IDE?). However, there are 2 potential problems: cost and complexity.

So, can you tell me how much the IDE costs (or is it included)? Secondly, Abaqus only "wants" to use the compiler; will having the IDE "confuse" it?

Apologies for the somewhat vague terminology, but I am not that techy.

Thanks in advance.
 


About free version Parallel Studio XE


What is the difference between Parallel Studio XE Composer Edition for Fortran macOS 2019  and the free version (Intel® Parallel Studio XE)? Can researchers use this free version for personal academic research?

Reduction with large OpenMP arrays


Hi all,

I know that this has been asked before: https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-ar..., and on StackOverflow: https://stackoverflow.com/questions/24882425/openmp-array-reductions-wit... and https://stackoverflow.com/questions/20413995/reducing-on-array-in-openmp, but I would like to know your opinion, because the scalability that I get is not the one I expect.

So I need to fill a really large array of complex numbers, which I would like to parallelize with OpenMP. Our first approach is this one:

COMPLEX(KIND=DBL), ALLOCATABLE :: huge_array(:)
COMPLEX(KIND=DBL), ALLOCATABLE :: thread_huge_array(:)
INTEGER :: huge_number, index1, index2, index3, index4, index5, bignumber1, bignumber2, smallnumber, depending_index

ALLOCATE(huge_array(huge_number))

!$OMP PARALLEL FIRSTPRIVATE(thread_huge_array)
      ALLOCATE(thread_huge_array(SIZE(huge_array)))
      thread_huge_array = ZERO
!$OMP DO
      DO index1=1,bignumber1
         ! Some calculations
         DO index2=1,bignumber2
            ! Some calculations
            DO index3=1,6
               DO index4=1,6
                  DO index5=1,smallnumber
                     depending_index = function(index1, index2, index3, index4, index5)
                     ! accumulation reduced to a self-assignment in this simplified example
                     thread_huge_array(depending_index) = thread_huge_array(depending_index)
                  ENDDO
               ENDDO 
            ENDDO
         ENDDO 
      ENDDO 
!$OMP END DO
!$OMP BARRIER
!$OMP MASTER
      huge_array = ZERO
!$OMP END MASTER
!$OMP BARRIER   ! without this barrier a thread could add to huge_array before it is zeroed
!$OMP CRITICAL
      huge_array = huge_array + thread_huge_array
!$OMP END CRITICAL
      DEALLOCATE(thread_huge_array)
!$OMP END PARALLEL

So, with that approach we get good scalability up to 8 cores, reasonable scalability up to 32 cores, and from 40 cores on it is slower than with 16 cores (we have a machine with 80 physical cores). Of course, we cannot use the REDUCTION clause, because the array is so big that it doesn't fit on the stack (even after increasing ulimit to the maximum allowed on the machine).

We have tried a different approach with this one:

    COMPLEX(KIND=DBL), ALLOCATABLE :: huge_array(:)
    COMPLEX(KIND=DBL), POINTER:: thread_huge_array(:,:)
    INTEGER :: huge_number, num_thread, index_ii    ! other indices declared as in the first version
    
    ALLOCATE(huge_array(huge_number))
    
    ALLOCATE(thread_huge_array(SIZE(huge_array),omp_get_max_threads()))
    thread_huge_array = ZERO
    
    !$OMP PARALLEL PRIVATE (num_thread)
    
          num_thread = omp_get_thread_num()+1
    !$OMP DO
          DO index1=1,bignumber1
             ! Some calculations
             DO index2=1,bignumber2
                ! Some calculations
                DO index3=1,6
                   DO index4=1,num_weights_sp
                      DO index5=1,smallnumber
                         depending_index = function(index1, index2, index3, index4, index5)
                         ! use num_thread (1-based); omp_get_thread_num() is 0-based and would index column 0
                         thread_huge_array(depending_index, num_thread) = thread_huge_array(depending_index, num_thread)
                      ENDDO
                   ENDDO 
                ENDDO
             ENDDO 
          ENDDO 
    !$OMP END DO
    !$OMP END PARALLEL
    
    huge_array = ZERO
    
    DO index_ii = 1,omp_get_max_threads()
       huge_array = huge_array + thread_huge_array(:,index_ii)
    ENDDO
    
    DEALLOCATE(thread_huge_array)
    
    DEALLOCATE(huge_array)

 

And in this last case we obtain longer times for the method (due to the memory allocation, which is much bigger) and worse relative speedup.

Can you provide some hints on achieving better speedup? Or is it impossible with such huge arrays in OpenMP?

Thanks for reading. This forum always helps me a lot with Fortran, and you have always provided very useful insights into the language and standards.

Best,

Adrian.
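One hint regarding the second version (a sketch using the post's own variable names, with nthreads standing for omp_get_max_threads() and index_jj as one extra integer index): the final merge loop runs serially and streams nthreads full copies of the array through one core. It can itself be parallelized over the array index, so no two threads ever write the same element:

!$OMP PARALLEL DO PRIVATE(index_ii)
DO index_jj = 1, huge_number
   DO index_ii = 1, nthreads
      huge_array(index_jj) = huge_array(index_jj) + thread_huge_array(index_jj, index_ii)
   ENDDO
ENDDO
!$OMP END PARALLEL DO

At 40-80 threads the merge is likely memory-bandwidth-bound either way, which would be consistent with the flattening scalability described above.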

Intrinsic function IS_CONTIGUOUS


Dear all,

I was testing the new intrinsic function IS_CONTIGUOUS of Fortran 2008 and I found that gfortran 9.1.1 and ifort 19.0.2.187 give different results for the real part of a complex array.

Here is the example:

program testing
 implicit none
 complex(8) :: ac(1000)

 ac(:) = (1.0_8,1.0_8)
 write(*,*) 'real part is contiguous:',IS_CONTIGUOUS(ac(:)%re)

end program
$ gfortran contiguous.f90 
$ ./a.out 
 real part is contiguous: F
$ ifort contiguous.f90
$ ./a.out 
 real part is contiguous: T

Who's right?

Best,

Fabien

Installer Broken Link


The installer links for the student edition of Parallel Studio XE are returning 404 errors. I tried several previous versions, both Full and Custom, with the same result.

sigsegv unexpected for parallelisation of private allocated array


Attached is a bit of F90 code that I am using to explore speeding up the reading of a number of files. I have 3 OpenMP parallelisation techniques:

(a) allocate array INPUTS, then enter PAR with PRIVATE(INPUTS) - since each thread reads its own data in to INPUTS from a different file

(b) within the PAR DO combo with PRIVATE(INPUTS), then allocate-read-deallocate per iteration

(c) have a PAR fork with PRIVATE(INPUTS), then allocate INPUTS on each thread, then a distributed DO to share reading of files over threads

 

but despite trying different machines and different versions of the Intel ifort compiler, for (a) I get a SIGSEGV even with 2 threads, sometimes at entry to the PAR DO but sometimes on the 3rd iteration of the loop, whereas the other 2 approaches both work fine. As far as I can see the memory usage is no bigger for (a), so why the SIGSEGV? E.g.:

 

   file read: chkSum:   5243196.    
    file closed successfully
    file read: chkSum:   5243196.    
    file closed successfully
 PAR-(c) reads took:   41.4187059402466     

Program received signal SIGSEGV, Segmentation fault.
0x0000000000406c67 in L_MAIN___132__par_loop3_2_2 () at thrasher.f90:132
132     !$OMP PARALLEL DO DEFAULT(NONE) PRIVATE(filename, myStatus, stream, inputs)
Missing separate debuginfos, use: debuginfo-install libgcc-4.4.7-23.el6.x86_64
(gdb) l
127       ! read files in PARALLEL (a) single alloc pre-PAR
128       start=omp_get_wtime()
129       allocate(inputs(numReals), stat=myStatus)
130       if (myStatus /= 0) stop 'error allocating par-(a) inputs'
131
132     !$OMP PARALLEL DO DEFAULT(NONE) PRIVATE(filename, myStatus, stream, inputs)
133       do i=1, numFiles
134          stream=50+i
135          filename(1:5)="fort."
136          filename(6:8)=val2str(i)
(gdb) quit
 

Attachment: thrasher.f90 (5.35 KB)
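One low-cost experiment sometimes suggested for OpenMP crashes involving large private data (a guess only; private copies of allocated allocatables usually live on the heap, but thread stack pressure is cheap to rule out) is to raise the OpenMP thread stack size before running:

export OMP_STACKSIZE=512m   # size is arbitrary here; defaults are typically a few MB

If approach (a) then survives, thread stack size was the culprit; if not, the stack can be ruled out.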

FPE in vectorized division using ifort 19.0.3


The following code fails with a floating-point exception in the division, but works fine if vectorization is disabled.

        program div_test


           integer, parameter :: n = 180
           character*16 :: out
           real upp(4,n)
           integer i
           integer :: ieee_flags

           upp = 0
           upp(1,:) = 1

           i = ieee_flags('set', 'exception', 'all', out)

           !!DIR$ NOVECTOR
           do i = 1 ,n
              upp(1,i) = 1. / upp(1,i)
           enddo

           print *, maxval(upp)

        end

 

Looking at the assembly it is obvious why:

 

        movss     16+div_test_$UPP.0.1(%rax), %xmm1             #20.15
        movaps    %xmm0, %xmm3                                  #20.15
        movss     div_test_$UPP.0.1(%rax), %xmm2                #20.15
        movaps    %xmm0, %xmm6                                  #20.15
        movss     48+div_test_$UPP.0.1(%rax), %xmm4             #20.15
        movaps    %xmm0, %xmm9                                  #20.15
        movss     32+div_test_$UPP.0.1(%rax), %xmm5             #20.15
        movaps    %xmm0, %xmm12                                 #20.15
        movss     80+div_test_$UPP.0.1(%rax), %xmm7             #20.15
        addl      $8, %edx                                      #19.12
        movss     64+div_test_$UPP.0.1(%rax), %xmm8             #20.15
        movss     112+div_test_$UPP.0.1(%rax), %xmm10           #20.15
        movss     96+div_test_$UPP.0.1(%rax), %xmm11            #20.15
        unpcklps  %xmm1, %xmm2                                  #20.15
        unpcklps  %xmm4, %xmm5                                  #20.15
        unpcklps  %xmm7, %xmm8                                  #20.15
        unpcklps  %xmm10, %xmm11                                #20.15
        divps     %xmm2, %xmm3                                  #20.15
        divps     %xmm5, %xmm6                                  #20.15
        divps     %xmm8, %xmm9                                  #20.15
        divps     %xmm11, %xmm12                                #20.15

 

The xmm* registers carry zeroes into the division: each movss zeroes the upper lanes of its destination, and with only one level of unpcklps two of the four divps lanes still hold zero divisors, so the unmasked divide-by-zero exception fires. It is worth noting that a previous compiler version (14.0.1) seems to behave more sensibly, adding two more unpcklps instructions that move the zeroes out of the registers.

 

        movss     48+div_test_$UPP.0.1(%rax), %xmm0             #20.31
        addl      $8, %edx                                      #19.12
        movss     16+div_test_$UPP.0.1(%rax), %xmm2             #20.31
        movss     32+div_test_$UPP.0.1(%rax), %xmm1             #20.31
        movss     div_test_$UPP.0.1(%rax), %xmm3                #20.31
        movss     64+div_test_$UPP.0.1(%rax), %xmm8             #20.31
        movss     112+div_test_$UPP.0.1(%rax), %xmm5            #20.31
        movss     80+div_test_$UPP.0.1(%rax), %xmm7             #20.31
        movss     96+div_test_$UPP.0.1(%rax), %xmm6             #20.31
        unpcklps  %xmm0, %xmm2                                  #20.31
        unpcklps  %xmm1, %xmm3                                  #20.31
        unpcklps  %xmm5, %xmm7                                  #20.31
        unpcklps  %xmm6, %xmm8                                  #20.31
        unpcklps  %xmm2, %xmm3                                  #20.31
        movaps    .L_2il0floatpacket.0(%rip), %xmm4             #20.15
        movaps    .L_2il0floatpacket.0(%rip), %xmm9             #20.15
        unpcklps  %xmm7, %xmm8                                  #20.31
        divps     %xmm3, %xmm4                                  #20.15
        divps     %xmm8, %xmm9                                  #20.15

 

 

performance implications of -prec-div


Hi ,

Could someone please give an estimate of the expected slowdown from -prec-div on modern hardware?

I.e., what is the performance of an instruction like divps versus the Newton-Raphson sequence:

        rcpps    
        mulps    
        mulps   
        addps    
        subps   

Without -prec-div, the compiler may choose to do Newton-Raphson within the main vector loop while using divps in the peel loop, and that can affect the run-to-run reproducibility that I would like to maintain.

 

Regards
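Rather than a single number, which varies by microarchitecture and vector width, one way to get a figure for the hardware at hand is a small benchmark compiled twice, once with -prec-div and once with -no-prec-div (a sketch; the array size and repetition count are arbitrary):

program div_bench
   implicit none
   integer, parameter :: n = 1000000
   integer :: i, rep
   real :: a(n), b(n), t0, t1
   call random_number(b)
   b = b + 0.5                      ! keep divisors well away from zero
   call cpu_time(t0)
   do rep = 1, 200
      do i = 1, n
         a(i) = 1.0 / b(i)
      end do
      b(1) = b(1) + 1.0e-7 * a(1)   ! stop the compiler hoisting the whole loop
   end do
   call cpu_time(t1)
   print *, 'seconds:', t1 - t0, ' checksum:', sum(a)
end program div_bench

Comparing the two binaries' run times gives the slowdown directly for the division pattern of interest.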

 


Xcode 10 - known compatibility issue

$
0
0

Hello all, and hopefully at least one expert. I have been banging my head against the wall trying to get the Fortran compiler's Xcode integration working. Update 3 supposedly supports Mojave and Xcode 10, yet after an install (both as admin and as root, typical installation) Xcode will not compile my Fortran projects, throwing the typical warning:

no rule to process file '/Users/steve/Xcode Projects/CEA_stand_alone/cea2.for' of type 'sourcecode.fortran' for architecture 'x86_64' (in target 'CEA_stand_alone')

Has anyone tested this product? I have tried two different development Macs with exactly the same results. What do I need to do to get this working? I am at wit's end. I can see the default Fortran Source File build rules, and have even tried adding my own custom build rule to compile the files. The ifort directories all exist and the compiler runs from the command line, so I know it is installed, but the Xcode integration seems totally broken.

ifort beta bug: ICE


This program gives an internal compiler error on a Linux system with ifort Version 19.1.0.056 Pre-Release Beta Build 20190321. Evidence:

cayley[~/Jfh] % cat cr_lf2.f90
program carriagereturn ! F2003: iso_c_binding, length in array constructor
use, intrinsic :: iso_c_binding, only: C_CARRIAGE_RETURN, C_NEW_LINE
character :: cr(3) = [character:: C_CARRIAGE_RETURN,achar(13),'\r']
character :: lf(3) = [character:: C_NEW_LINE,achar(10),'\n']
integer i
do i = 1,3
   print *,'a',cr(i),'b'
   print *,'c',lf(i),'d'
   print *,'e',cr(i),lf(i),'f'
   print *,'g',lf(i),cr(i),'h'
   print *
end do
end program carriagereturn

cayley[~/Jfh] % ifort -V cr_lf2.f90
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.0.056 Pre-Release Beta Build 20190321
Copyright (C) 1985-2019 Intel Corporation.  All rights reserved.

ifort: NOTE: The Beta evaluation period for this product ends on 9-oct-2019 UTC.
 Intel(R) Fortran 19.1-1453
cr_lf2.f90: catastrophic error: **Internal compiler error: segmentation violation signal raised** Please report this error along with the circumstances in which it occurred in a Software Problem Report.  Note: File and line given may not be explicit cause of this error.
compilation aborted for cr_lf2.f90 (code 1)

compiling with openmp in different folder


Hello. 

I am a new user and I am running a code on a Slurm-controlled cluster. There I load the Intel 2018 products, but there is no OpenMP module for me to load. So I downloaded and installed one in my home folder, and now I would like to compile my code while redirecting the -qopenmp flag to my home folder.

Right now I compile with the -qopenmp flag, but when the program reaches the parallel region that does a matrix multiplication, it seems to ignore it and continues the program (I included a print command and it is not printing anything).

How do I solve this?

Many Thanks in advance

Cayo Gonçalves
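For what it's worth, OpenMP support ships inside the compiler itself; there is no separate OpenMP package to install, and -qopenmp must be passed both when compiling and when linking. A minimal check that it is active (a sketch):

program check_omp
   use omp_lib
   implicit none
!$omp parallel
   print *, 'thread', omp_get_thread_num(), 'of', omp_get_num_threads()
!$omp end parallel
end program check_omp

Built with ifort -qopenmp check_omp.f90 and run with OMP_NUM_THREADS=4, this should print four lines; if it prints only one, the parallel region is not active.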

Regarding non-commercial license


I am currently working as an Assistant Professor at MNIT Jaipur. I would like to use Intel Parallel Studio for my research and teaching. My area of research is cosmology. The compiler will be used to compile and run the publicly available codes CosmoMC and CAMB:

https://github.com/cmbant/CosmoMC
https://cosmologist.info/cosmomc/

My research is not related to any industrial work or patents, and I do not receive any compensation. I will be grateful if you can kindly let me know whether I can use the non-commercial license of Intel Parallel Studio.

-check-all issue


Dear all,

when compiled with "-check all", the following simple code fails at compile time or at run time, depending on which line is commented out:

      program test
      implicit none
      real    :: kpt(3,64)
      integer :: i,n
      n = 9
      kpt(:,n:) = kpt(:,[(i,i=64,n,-1)]) ! fails at run time
c      kpt(:,9:) = kpt(:,[(i,i=64,9,-1)]) ! fails at compile time
      end

Compile with:
ifort -check all test.f

Error message during compile time:
test.f(7): error #5581: Shape mismatch: The extent of dimension 1 of array KPT is 3 and the corresponding extent of array <RHS expression> is 56
      kpt(:,9:) = kpt(:,[(i,i=64,9,-1)]) ! fails at compile time
------^

Error message during run time:
forrtl: severe (408): fort: (33): Shape mismatch: The extent of dimension 1 of array KPT is 3 and the corresponding extent of array <RHS expression> is 56

In my opinion, the error messages are incorrect: dimension 1 of the RHS is also 3; it is dimension 2 that is 56 (on both sides).

ifort --version

ifort (IFORT) 19.0.3.199 20190206
Copyright (C) 1985-2019 Intel Corporation.  All rights reserved.

Thank you.

Best
Christoph

 
