Quantcast
Channel: Intel® Software - Intel® Fortran Compiler for Linux* and macOS*
Viewing all articles
Browse latest Browse all 2746

Nested thread parallelism in Python and Fortran

$
0
0

The purpose of this example is to find out how a program behaves when both the upper layer (Python) and the lower layer (Fortran) use their threading capabilities.

The Python3 caller script launches 2 concurrent instances of the same Fortran routine using the Thread class.

The Fortran callee is OpenMP parallel with OMP_NUM_THREADS set to 4.

Looking at the printout it looks like the 2 concurrent OpenMP instances don't interfere with each other, both have their threads numbered from 0 to 3. This is nice as it allows to put a level of parallelism on top of OpenMP. But the question is: Is such nested parallelism a supported feature or is this just an accidental observation ?

-------------------- Output printed --------------------

>python3 pydot.py
 Thread           2
 Thread           3
 Thread           0
 Thread           1
 Thread           0
 Thread           1
1000.0
 Thread           2
 Thread           3
500.0

-------------------- Fortran code ----------------------

! Compile to shared library doing
!    ifort -g -fPIC -shared -openmp -o ddot.so ddot.f90

real(8) function ddot(n,a,b) bind(c)

  use, intrinsic :: iso_c_binding
  use omp_lib

  integer(4),       value, intent(in) :: n
  real(8), dimension(1:n), intent(in) :: a
  real(8), dimension(1:n), intent(in) :: b

  integer :: i

  ddot = 0.0

!$OMP PARALLEL
  print *,'Thread', omp_get_thread_num()

!$OMP DO REDUCTION(+:ddot)
  do i = 1, n
    ddot = ddot + a(i) * b(i)
  end do

!$OMP END PARALLEL

end function

------------------ Python code ----------------------

from ctypes    import *
from threading import Thread

# Load shared library

flib = cdll.LoadLibrary('./ddot.so')

# Prototype for external function

veclen = 1000

flib.ddot.argtypes = [c_int, ARRAY(c_double, veclen), ARRAY(c_double, veclen)]
flib.ddot.restype  = c_double

# Array construction

rvect  = c_double * veclen

a = rvect()
b = rvect()
c = rvect()

for i in range(0,veclen):
  a[i] = 0.5
  b[i] = 2.0
  c[i] = 1.0

# Python wrapper

def pydot(a,b):
  pydot.argtypes = [ARRAY(c_double, veclen), ARRAY(c_double, veclen)]
  res = flib.ddot(c_int(veclen), a, b)
  print(res)

# Do it using 2 threads

# Create
t1 = Thread(target=pydot, args=(a,b,))
t2 = Thread(target=pydot, args=(a,c,))

# Launch
t1.start()
t2.start()

# Wait for completion
t1.join()
t2.join()

# End

 


Viewing all articles
Browse latest Browse all 2746

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>