I have found that one of my programs (sometimes) reports different memory usage numbers when compiled with OpenMP.
After a careful inspection of the code (with debugging) I am pretty sure it does not have "memory leaks" (forgetting to deallocate something, which can happen easily when using pointer arrays).
Finally I have set up a really minimalistic demo program which shows the same behavior:
    program T1

    call mamems(__file__, __line__)

    !$OMP PARALLEL DO SCHEDULE(GUIDED) DEFAULT(NONE) PRIVATE(I)
    do i = 1, 100
       call sub(i)
    enddo
    !$OMP END PARALLEL DO

    call mamems(__file__, __line__)

    end program

    subroutine sub(i)
    s = 0
    do k = 1, 100
       s = s + i * k
    enddo
    end subroutine
(Routine mamems reporting memory usage is not shown here but is included in the attachment. It takes twice as many lines as the example code itself.)
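For readers without the attachment, here is a minimal sketch of how a routine of this kind could obtain such numbers. It is not the attached mamems: the name mamems_sketch is made up for illustration, and it assumes Linux, where /proc/self/status lists the fields VmSize (current) and VmPeak (peak) in kB. Whether these are exactly the figures printed in the output is an assumption as well.

```fortran
! Hypothetical stand-in for mamems; NOT the attached routine.
! Assumes Linux, where /proc/self/status lists VmSize and VmPeak in kB.
subroutine mamems_sketch(file, line)
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  character(len=*), intent(in) :: file
  integer, intent(in) :: line
  character(len=256) :: buf
  integer :: u, ios, ios2, k
  integer(int64) :: cur_kb, peak_kb

  cur_kb = -1      ! stays -1 if the field is not found
  peak_kb = -1
  open(newunit=u, file='/proc/self/status', status='old', &
       action='read', iostat=ios)
  if (ios /= 0) then
    print '(a)', 'mamems_sketch: cannot open /proc/self/status'
    return
  end if

  do
    read(u, '(a)', iostat=ios) buf
    if (ios /= 0) exit
    ! Turn tabs into blanks so the list-directed reads below are portable.
    do k = 1, len(buf)
      if (buf(k:k) == achar(9)) buf(k:k) = ' '
    end do
    if (buf(1:7) == 'VmSize:') read(buf(8:), *, iostat=ios2) cur_kb
    if (buf(1:7) == 'VmPeak:') read(buf(8:), *, iostat=ios2) peak_kb
  end do
  close(u)

  ! Same layout as the output in this post: bytes (current, peak), then MB.
  print '(a,1x,i0,1x,a,2(1x,i0),2(1x,f0.1),1x,a)', trim(file), line, &
        '*** Memsize=', cur_kb*1024, peak_kb*1024, &
        real(cur_kb)/1024.0, real(peak_kb)/1024.0, '***'
end subroutine mamems_sketch
```

It would be called in the same places as mamems in the demo, with the file name and line number as arguments.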
On a quad-core CPU I get the following (the output shows the current memory usage and the peak value, both in bytes and in megabytes):
~> T1
T1.f90 3 *** Memsize= 38715392 38715392 36.9 36.9 ***
T1.f90 11 *** Memsize= 53608448 53608448 51.1 51.1 ***
~> T1
T1.f90 3 *** Memsize= 38715392 38715392 36.9 36.9 ***
T1.f90 11 *** Memsize= 53608448 53608448 51.1 51.1 ***
~> T1
T1.f90 3 *** Memsize= 38715392 38715392 36.9 36.9 ***
T1.f90 11 *** Memsize= 53608448 53608448 51.1 51.1 ***
~> T1
T1.f90 3 *** Memsize= 38715392 38715392 36.9 36.9 ***
T1.f90 11 *** Memsize= 53608448 53608448 51.1 51.1 ***
~> T1
T1.f90 3 *** Memsize= 38715392 38715392 36.9 36.9 ***
T1.f90 11 *** Memsize= 120717312 183627776 115.1 175.1 ***
~> T1
T1.f90 3 *** Memsize= 38715392 38715392 36.9 36.9 ***
T1.f90 11 *** Memsize= 120717312 187826176 115.1 179.1 ***
~> T1
T1.f90 3 *** Memsize= 38715392 38715392 36.9 36.9 ***
T1.f90 11 *** Memsize= 53608448 53608448 51.1 51.1 ***
~> T1
T1.f90 3 *** Memsize= 38715392 38715392 36.9 36.9 ***
T1.f90 11 *** Memsize= 120717312 187826176 115.1 179.1 ***
On a dual quad-core CPU with HT (16 threads):
~ >var/T1
T1.f90 3 *** Memsize= 38699008 38699008 36.9 36.9 ***
T1.f90 11 *** Memsize= 103972864 103972864 99.2 99.2 ***
~ >var/T1
T1.f90 3 *** Memsize= 38699008 38699008 36.9 36.9 ***
T1.f90 11 *** Memsize= 103972864 103972864 99.2 99.2 ***
~ >var/T1
T1.f90 3 *** Memsize= 38699008 38699008 36.9 36.9 ***
T1.f90 11 *** Memsize= 171081728 187809792 163.2 179.1 ***
~ >var/T1
T1.f90 3 *** Memsize= 38699008 38699008 36.9 36.9 ***
T1.f90 11 *** Memsize= 103972864 103972864 99.2 99.2 ***
~ >var/T1
T1.f90 3 *** Memsize= 38699008 38699008 36.9 36.9 ***
T1.f90 11 *** Memsize= 171081728 204603392 163.2 195.1 ***
~ >var/T1
T1.f90 3 *** Memsize= 38699008 38699008 36.9 36.9 ***
T1.f90 11 *** Memsize= 171081728 217198592 163.2 207.1 ***
~ >var/T1
T1.f90 3 *** Memsize= 38699008 38699008 36.9 36.9 ***
T1.f90 11 *** Memsize= 171081728 221396992 163.2 211.1 ***
~ >var/T1
T1.f90 3 *** Memsize= 38699008 38699008 36.9 36.9 ***
T1.f90 11 *** Memsize= 171081728 204603392 163.2 195.1 ***
~ >var/T1
T1.f90 3 *** Memsize= 38699008 38699008 36.9 36.9 ***
T1.f90 11 *** Memsize= 171081728 208801792 163.2 199.1 ***
The amount of memory OpenMP consumes seems to be related to the number of threads, which is not much of a surprise.
But why does it vary between repeated runs of the same program?
(With my real program I have found that the variation is not related to the size of the data set. Though not constant, it is usually about 60 MB. With a small data set of the same magnitude this means a variation ratio of 1:2. With a huge data set of 2 GB the variation is insignificant.)
Markus