Hi Kevin,
Another host went down, so I have to prepare info for this one.
I could not SSH to it anymore.
Console would show login screen, but no keystrokes were registered.
I could “suspend” the VM and “run” it again, but I still could not SSH to it.
Before the suspension, all QEMU threads were at around 0% CPU; after resuming, 3 of them
hover at 100%.
Attached you will find the gdb output, core dump, and other logs.
Logs:
Is there anything else we could provide?
Since this is a test machine, I will leave it “hanging” for now.
Best,
Dr Christophe Trefois, Dipl.-Ing.
Technical Specialist / Post-Doc
UNIVERSITÉ DU LUXEMBOURG
LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
Campus Belval | House of Biomedicine
6, avenue du Swing
L-4367 Belvaux
T: +352 46 66 44 6124
F: +352 46 66 44 6949
----
This message is confidential and may contain privileged information.
It is intended for the named recipient only.
If you receive it in error please notify me and permanently delete the original message
and any copies.
----
On 29 Mar 2016, at 15:40, Kevin Wolf <kwolf(a)redhat.com> wrote:
On 27.03.2016 at 22:38, Christophe TREFOIS wrote:
> Hi,
>
> MS did not like my previous email, so here it is again with a link to Dropbox
> instead of an attachment.
>
> ——
> Hi Nir,
>
> Inside the core dump tarball is also the output of the two gdb commands you
> mentioned.
>
> Understandably, you might not want to download the big files for that, so I
> attached them here separately.
The gdb dump looks pretty much like an idle qemu that just sits there
and waits for events. The vcpu threads seem to be running guest code,
the I/O thread and SPICE thread are in poll() waiting for events to
respond to, and finally the RCU thread is idle as well.
Does the qemu process still respond to monitor commands, so for example
can you still pause and resume the guest?
Kevin
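
One way to check the monitor from the host is sketched below. It assumes the guest is managed by libvirt and that `virsh` can authenticate on the host (oVirt hosts usually require credentials for read-write access); `check_monitor` and `pause_resume` are hypothetical helper names. `virsh qemu-monitor-command` with `--hmp` relays a human monitor command to qemu, and `info status` only reads the run state. Note that libvirt marks a domain as tainted once this command has been used.

```shell
# Hypothetical helper: ask qemu for its run state through libvirt.
# $1 is the libvirt domain name (or ID/UUID) of the stuck guest.
check_monitor() {
    # --hmp sends a human monitor command; "info status" is read-only.
    virsh qemu-monitor-command "$1" --hmp 'info status'
}

# Hypothetical helper: pause and resume the guest through libvirt.
pause_resume() {
    virsh suspend "$1" && virsh resume "$1"
}
```

If `check_monitor <domain>` never returns, the monitor itself is probably stuck, which is also useful information.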
> For the other logs, here you go.
>
> For gluster I didn’t know which logs were relevant, so I sent all of them.
>
> I got the icinga notification at 17:06 CEST on March 27th (today). So for vdsm,
> I provided logs from 16h-18h.
> The check said that the VM was down for 11 minutes at that time.
>
>
> https://dl.dropboxusercontent.com/u/63261/bioservice-1.tar.gz
>
> Please do let me know if there is anything else I can provide.
>
> Best regards,
>
>
>> On 27 Mar 2016, at 21:24, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>
>> On Sun, Mar 27, 2016 at 8:39 PM, Christophe TREFOIS
>> <christophe.trefois(a)uni.lu> wrote:
>>> Hi Nir,
>>>
>>> Here is another one, this time with strace of children and gdb dump.
>>>
>>> Interestingly, this time qemu seems stuck at 0%, vs. 100% in the other cases.
>>>
>>> The files for strace are attached.
>>
>> Hopefully Kevin can take a look.
>>
>>
>>> The gdb + core dump is found here (too big):
>>>
>>> https://dl.dropboxusercontent.com/u/63261/gdb-core.tar.gz
>>
>> I think it will be more useful to extract a traceback of all threads
>> and send the tiny traceback.
>>
>> gdb --pid <qemu pid> --batch --eval-command='thread apply all bt'
>>
>>> If it helps, most machines get stuck on the host hosting the self-hosted
>>> engine, which runs a local 1-node glusterfs.
>>
>> Getting /var/log/messages and the sanlock, vdsm, glusterfs and
>> libvirt logs for this timeframe would also be helpful.
>>
>> Nir
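
Trimming a log to the relevant timeframe can be sketched as follows. This is a minimal example that assumes the log lines carry a `YYYY-MM-DD HH:MM:SS` timestamp somewhere in the line and that the window is given in the same timezone as the log; `filter_window` is a hypothetical helper name. Logs in classic syslog format (`Mar 27 17:06:01`), as /var/log/messages often is, would not match and need a different pattern.

```shell
# Hypothetical helper: print only the lines read from stdin whose first
# "YYYY-MM-DD HH:MM:SS" timestamp lies within [start, end].
# Lines without such a timestamp are skipped.
filter_window() {
    start=$1   # e.g. "2016-03-27 16:00:00"
    end=$2     # e.g. "2016-03-27 18:00:00"
    awk -v s="$start" -v e="$end" '
        match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            # Timestamps in this format sort correctly as plain strings.
            ts = substr($0, RSTART, RLENGTH)
            if (ts >= s && ts <= e) print
        }
    '
}
```

For example, `filter_window '2016-03-27 16:00:00' '2016-03-27 18:00:00' < vdsm.log > vdsm-window.log` would keep only the two hours around the incident (the file names are placeholders).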
>>
>>>
>>> Thank you for your help,
>>>
>>> —
>>> Christophe
>>>
>>> Dr Christophe Trefois, Dipl.-Ing.
>>> Technical Specialist / Post-Doc
>>>
>>> UNIVERSITÉ DU LUXEMBOURG
>>>
>>> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>>> Campus Belval | House of Biomedicine
>>> 6, avenue du Swing
>>> L-4367 Belvaux
>>> T: +352 46 66 44 6124
>>> F: +352 46 66 44 6949
>>>
>>> http://www.uni.lu/lcsb
>>>
>>>
>>>> On 25 Mar 2016, at 11:53, Nir Soffer <nsoffer(a)redhat.com> wrote:
>>>>
>>>> gdb --pid <qemu pid> --batch --eval-command='thread apply all bt'
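
To keep the traceback as a small file that can be attached to a mail, the gdb command above can be wrapped as in the sketch below. `dump_threads` is a hypothetical helper name, and the PID and output path are placeholders. Bear in mind that gdb stops the process while it is attached, and that the backtraces are more readable when the matching debuginfo packages are installed.

```shell
# Hypothetical wrapper around the gdb command above: save the backtrace
# of every thread of a running process to a file.
# $1 = PID of the qemu process, $2 = output file.
dump_threads() {
    gdb --pid "$1" --batch --eval-command='thread apply all bt' > "$2" 2>&1
}
```

Usage would look like `dump_threads 12345 /tmp/qemu-threads.txt`, with 12345 replaced by the PID of the stuck qemu process.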