The signed/unsigned problem sounds about right.
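For what it's worth, 562949953421312 is exactly 2^49, a single set bit, which looks more like a shifted or sign-mangled length than a real request. A quick sanity check (assuming bash's builtin printf):

# printf '0x%x\n' 562949953421312
0x2000000000000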
I'm not having much luck with addr2line, though.

I just manually migrated the VM to reproduce the problem. I'm not sure whether this partial NUMA config warning could be contributing:

2020-04-09T23:54:05.537028Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2020-04-11 07:25:04.146+0000: initiating migration
tcmalloc: large alloc 562949953421312 bytes == (nil) @  0x7f4a0bb464ef 0x7f4a0bb66367 0x7f4a2364b736 0x561527438ac8 0x5615274398e5 0x5615273e9bae 0x5615273f07b6 0x5615275b8de5 0x5615275b4bdf 0x7f4a0aaeee65 0x7f4a0a81788d

(process:32202): GLib-ERROR **: 08:25:04.151: gmem.c:135: failed to allocate 562949953421312 bytes
2020-04-11 07:25:08.408+0000: shutting down, reason=crashed


Attempting to use addr2line:

# addr2line -e /usr/libexec/qemu-kvm
0x7f4a0bb464ef 0x7f4a0bb66367 0x7f4a2364b736 0x561527438ac8 0x5615274398e5 0x5615273e9bae 0x5615273f07b6 0x5615275b8de5 0x5615275b4bdf 0x7f4a0aaeee65 0x7f4a0a81788d
??:0
??:0


Single addresses give the same result:

0x7f4a0bb464ef
??:0

0x7f4a0a81788d
??:0


Maybe I need the debug packages?
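Debug packages alone may not be enough, as far as I can tell: /usr/libexec/qemu-kvm is a position-independent executable, so the 0x5615... frames are runtime addresses (load base + offset), and the 0x7f4a... frames belong to shared libraries (tcmalloc, glibc) rather than the qemu-kvm binary at all, which would explain the ??:0 either way. A rough sketch of what I might try next, assuming CentOS/RHEL-style debuginfo packages and that /proc can be read before the process dies (the base value below is hypothetical):

# install the symbols
debuginfo-install qemu-kvm

# while the guest is still running, note the load base of the main binary
# (32202 is the PID from the GLib error above)
grep qemu-kvm /proc/32202/maps | head -1
# -> e.g. 561527200000-... r-xp ... /usr/libexec/qemu-kvm

# subtract the base from a frame and resolve the offset instead:
# 0x561527438ac8 - 0x561527200000 = 0x238ac8 (with the hypothetical base)
addr2line -f -C -e /usr/libexec/qemu-kvm 0x238ac8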

On Fri, 10 Apr 2020 at 22:23, <eevans@digitaldatatechs.com> wrote:

I found this thread on Stack Overflow:

https://stackoverflow.com/questions/9077457/how-to-trace-tcmalloc-large-alloc

 

See http://code.google.com/p/gperftools/source/browse/trunk/src/tcmalloc.cc?r=80&redir=1 line 843

Depending on your application, the large allocation may or may not be a bug.

In any case, the part after the @ mark is a stack trace and can be used to locate the source of the message.

The repeating number (4294488064, which seems to be equal to 4G-479232, or 0x100000000-0x75000) makes me suspect the original allocation call got a negative signed value and used it as an unsigned value.
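That arithmetic is easy to check in a shell (0x100000000 is 4294967296, 0x75000 is 479232):

# printf '%u\n' $(( 4294967296 - 479232 ))
4294488064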

It also had this advice on tracing the addresses:

To trace a memory address to a line in your code, use the addr2line command-line tool: run addr2line -e <executable name>, press Enter, then paste an address and press Enter.

 

I’m not sure if this is helpful, but it does sound like a memory leak.

 

A related Microsoft doc stated:

 

1073741824: Allocations larger than this value cause a stack trace to be dumped to stderr. The threshold for dumping stack traces is increased by a factor of 1.125 every time a message is printed, so the threshold automatically goes up by a factor of ~1000 every 60 messages. This bounds the amount of extra logging generated by this flag. The default value of this flag is very large, so you should see no extra logging unless the flag is overridden.

 

The default in Windows is 1 GB. I’m not sure about Linux.
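If it helps, in gperftools that threshold is controlled by an environment variable, so it can be changed without rebuilding. Something like this (illustrative only; under libvirt the variable would have to be injected into the qemu process environment rather than set on a shell like this):

# report any allocation over 1 GiB (value is in bytes)
TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD=1073741824 /usr/libexec/qemu-kvm ...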

 

I hope this is helpful.

 

Eric Evans

Digital Data Services LLC.

304.660.9080

 

From: Maton, Brett <matonb@ltresources.co.uk>
Sent: Friday, April 10, 2020 4:53 PM
To: eevans@digitaldatatechs.com
Cc: Ovirt Users <users@ovirt.org>
Subject: [ovirt-users] Re: Windows 10 Pro 64 (1909) crashes when migrating

 

The hosts are identical, and yes, I'm sure about the 563 terabytes. It's obviously wrong, which is why I mentioned it. Possibly an overflow?

 

On Fri, 10 Apr 2020, 21:31 , <eevans@digitaldatatechs.com> wrote:

I have a Windows 10 guest and a Server 2016 guest that migrate without an issue.
Are your CPU architectures comparable between the hosts?
BTW, 562949953421312 bytes is about 562 terabytes. Are you sure that's correct?