Hi,
I've been chasing a very elusive bug for weeks, and I finally decided to go down to the assembly level to track it down. I'm not allowed to share the code, but the generated assembly puzzles me. Simplified, the subroutine looks like this:
```fortran
subroutine anonymized(this, k)
  implicit none
  class(my_type), intent(inout) :: this
  integer, intent(in) :: k
  real(8) :: aux
  integer :: i1, i2
  aux = this%something ...
  do i1 = 1, this%n
    do i2 = 1, this%m
      if (this%value(i1) < 1.0e-10) then
        ...
```
The code crashes at the first comparison involving this%value(i1). The crash only shows up with certain flags, such as -O2 -heap-arrays 0. If I print this%value(i1) just before it is used, the code runs to completion and the bug disappears. Sometimes the bug also disappears when I change code that comes *after* this loop. It's driving me crazy.
So I looked at the disassembly. Here is the beginning of the function:
```
Dump of assembler code for function __anonymized:
=> 0x0000000000522970 <+0>: push %rbp
0x0000000000522971 <+1>: mov %rsp,%rbp
0x0000000000522974 <+4>: push %r12
0x0000000000522976 <+6>: push %r13
0x0000000000522978 <+8>: push %r14
0x000000000052297a <+10>: push %r15
0x000000000052297c <+12>: push %rbx
0x000000000052297d <+13>: sub $0x148,%rsp
0x0000000000522984 <+20>: mov (%rdi),%rbx
0x0000000000522987 <+23>: mov %rsi,-0x80(%rbp)
0x000000000052298b <+27>: mov %rdi,-0x78(%rbp)
0x000000000052298f <+31>: mov 0x79c58(%rbx),%rdx
0x0000000000522996 <+38>: neg %rdx
0x0000000000522999 <+41>: movslq 0x7a6a8(%rbx),%rcx
0x00000000005229a0 <+48>: add %rcx,%rdx
0x00000000005229a3 <+51>: mov 0x79c18(%rbx),%rax
0x00000000005229aa <+58>: movsd 0x7a688(%rbx),%xmm0
0x00000000005229b2 <+66>: mov 0x79ba0(%rbx),%r8d
0x00000000005229b9 <+73>: mov %rcx,-0x88(%rbp)
0x00000000005229c0 <+80>: mulsd (%rax,%rdx,8),%xmm0
0x00000000005229c5 <+85>: mov %r8d,-0x48(%rbp)
0x00000000005229c9 <+89>: mov 0x79bc0(%rbx),%ecx
0x00000000005229cf <+95>: test %r8d,%r8d
0x00000000005229d2 <+98>: jle 0x527bcf <__anonymized+21087>
0x00000000005229d8 <+104>: mov %ecx,%r13d
0x00000000005229db <+107>: xor %r12d,%r12d
0x00000000005229de <+110>: and $0xfffffff8,%r13d
0x00000000005229e2 <+114>: pxor %xmm2,%xmm2
0x00000000005229e6 <+118>: movslq -0x48(%rbp),%rax
0x00000000005229ea <+122>: pxor %xmm3,%xmm3
0x00000000005229ee <+126>: movslq %r13d,%r10
0x00000000005229f1 <+129>: movslq %ecx,%rdx
0x00000000005229f4 <+132>: movsd 0x13bb9c(%rip),%xmm1 # 0x65e598
0x00000000005229fc <+140>: mov %rax,-0x40(%rbp)
0x0000000000522a00 <+144>: mov %r10,-0x160(%rbp)
0x0000000000522a07 <+151>: mov %r13d,-0x168(%rbp)
0x0000000000522a0e <+158>: mov %ecx,-0x30(%rbp)
0x0000000000522a11 <+161>: cmpl $0x0,-0x30(%rbp)
0x0000000000522a15 <+165>: jle 0x522d10 <__anonymized+928>
0x0000000000522a1b <+171>: neg %r11
0x0000000000522a1e <+174>: add %r12,%r11
0x0000000000522a21 <+177>: mov 0x79fd8(%rbx),%rdi
0x0000000000522a28 <+184>: mov 0x7a018(%rbx),%r8
0x0000000000522a2f <+191>: mov 0x7a260(%rbx),%rsi
0x0000000000522a36 <+198>: comisd 0x8(%rdi,%r11,8),%xmm1
```
The crash happens on the comisd instruction. As far as I can tell, neither jle branch is taken (I'm a beginner with assembly). In that instruction, the operand 0x8(%rdi,%r11,8) reads from the address %rdi + 8*%r11 + 8, so %r11 acts as the array index. I have checked %rdi, and it contains the correct address. What surprises me is that %r11 already holds 140737488332700 when the function is entered, and nothing writes to it before the neg at 0x0000000000522a1b. So it looks to me as if %r11 is never initialized.
What do you think of that?
Best regards