[Users] Gluster VM stuck in "waiting for launch" state

Thu Oct 31 07:02:01 UTC 2013

On 10/31/2013 11:51 AM, Sahina Bose wrote:
>
> On 10/31/2013 09:22 AM, Zhou Zheng Sheng wrote:
>>
>> on 2013/10/31 06:24, Dan Kenigsberg wrote:
>>> On Wed, Oct 30, 2013 at 08:41:43PM +0100, Alessandro Bianchi wrote:
>>>> Il 30/10/2013 18:04, Dan Kenigsberg ha scritto:
>>>>> On Wed, Oct 30, 2013 at 02:40:02PM +0100, Alessandro Bianchi wrote:
>>>>>>     Il 30/10/2013 13:58, Dan Kenigsberg ha scritto:
>>>>>>
>>>>>>   On Wed, Oct 30, 2013 at 11:34:21AM +0100, Alessandro Bianchi 
>>>>>> wrote:
>>>>>>
>>>>>>      Hi everyone
>>>>>>
>>>>>>      I've set up a gluster storage with two replicated bricks
>>>>>>
>>>>>>      DC is up and I created a VM to test gluster storage
>>>>>>
>>>>>>      If I start the VM WITHOUT any disk attached (only one 
>>>>>> virtual DVD) it
>>>>>>      starts fine.
>>>>>>
>>>>>>      If I attach a gluster domain disk thin provisioning 30 Gb 
>>>>>> the Vm stucks in
>>>>>>      "waiting for launch" state
>>>>>>
>>>>>>      I see no special activity on the gluster servers (they serve 
>>>>>> several other
>>>>>>      shares with no troubles at all and even the ISO domain  is a 
>>>>>> NFS on
>>>>>>      locally mounted gluster and works fine)
>>>>>>
>>>>>>      I've double checked all the pre requisites and they look 
>>>>>> fine (F 19 -
>>>>>>      gluster setup insecure  in both glusterd.vol and volume 
>>>>>> options -
>>>>>>      uid/gid/insecure )
>>>>>>
>>>>>>      Am I doing something wrong?
>>>>>>
>>>>>>      I'm even unable to stop the VM from the engine GUI
>>>>>>
>>>>>>      Any advise?
>>>>>>
>>>>>>   Which version of ovirt are you using? Hopefully ovirt-3.3.0.1.
>>>>>>   For how long is the VM stuck in its "wait for launch" state?
>>>>>>   What does `virsh -r list` has to say while startup stalls?
>>>>>>   Would you provide more content of your vdsm.log and possibly
>>>>>>   libvirtd.log so we can understand what blocks the VM start-up? 
>>>>>> Please
>>>>>>   use attachement of pastebin, as your mail agents wreaks havoc 
>>>>>> to the log
>>>>>>   lines.
>>>>>>
>>>>>>
>>>>>>     Thank you for your answer.
>>>>>>
>>>>>>     Here are the "facts"
>>>>>>
>>>>>>     In the GUI I see
>>>>>>
>>>>>>     "waiting for launch 3 h"
>>>>>>
>>>>>>      virsh -r list
>>>>>>      Id    Nome                           Stato
>>>>>>     ----------------------------------------------------
>>>>>>      3     CentOS_30                      terminato
>>>>>>
>>>>>>     vdsClient -s 0 list table
>>>>>>     200dfb05-461e-49d9-95a2-c0a7c7ced669      0 CentOS_30
>>>>>>     WaitForLaunch
>>>>>>
>>>>>>     Packages:
>>>>>>
>>>>>>     ovirt-engine-userportal-3.3.0.1-1.fc19.noarch
>>>>>>     ovirt-log-collector-3.3.1-1.fc19.noarch
>>>>>>     ovirt-engine-restapi-3.3.0.1-1.fc19.noarch
>>>>>>     ovirt-engine-setup-3.3.0.1-1.fc19.noarch
>>>>>>     ovirt-engine-backend-3.3.0.1-1.fc19.noarch
>>>>>>     ovirt-host-deploy-java-1.1.1-1.fc19.noarch
>>>>>>     ovirt-release-fedora-8-1.noarch
>>>>>> ovirt-engine-setup-plugin-allinone-3.3.0.1-1.fc19.noarch
>>>>>>     ovirt-engine-webadmin-portal-3.3.0.1-1.fc19.noarch
>>>>>>     ovirt-engine-sdk-python-3.3.0.7-1.fc19.noarch
>>>>>>     ovirt-iso-uploader-3.3.1-1.fc19.noarch
>>>>>>     ovirt-engine-websocket-proxy-3.3.0.1-1.fc19.noarch
>>>>>>     ovirt-engine-dbscripts-3.3.0.1-1.fc19.noarch
>>>>>>     ovirt-host-deploy-offline-1.1.1-1.fc19.noarch
>>>>>>     ovirt-engine-cli-3.3.0.5-1.fc19.noarch
>>>>>>     ovirt-engine-tools-3.3.0.1-1.fc19.noarch
>>>>>>     ovirt-engine-lib-3.3.0.1-1.fc19.noarch
>>>>>>     ovirt-image-uploader-3.3.1-1.fc19.noarch
>>>>>>     ovirt-engine-3.3.0.1-1.fc19.noarch
>>>>>>     ovirt-host-deploy-1.1.1-1.fc19.noarch
>>>>>>
>>>>>>     I attach the full vdsm log
>>>>>>
>>>>>>     Look around 30-10 10:30 to see all what happens
>>>>>>
>>>>>>     Despite the "terminated" label in output from virsh I still 
>>>>>> see the VM
>>>>>>     "waiting for launch" in the GUI, so I suspect the answer to 
>>>>>> "how long" may
>>>>>>     be "forever"
>>>>>>
>>>>>>     Since this is a test VM I can do whatever test you may need 
>>>>>> to track the
>>>>>>     problem included destroy and rebuild
>>>>>>
>>>>>>     It would be great to have gluster support stable in ovirt!
>>>>>>
>>>>>>     Thank you for your efforts
>>>>> The log has an ominous failed attempt to start the VM, followed by an
>>>>> immediate vdsm crash. Is it reproducible?
>>>>>
>>>>> We have plenty of issues lurking here:
>>>>> 1. Why has libvirt failed to create the VM? For this, please find 
>>>>> clues
>>>>>     in the complete non-line-broken CentOS_30.log and libvirtd.log.
>>>> attached to this messages
>>>>> 2. Why was vdsm killed? Does /var/log/message has a clue from 
>>>>> systemd?
>>>> result of cat /var/log/messages | grep vdsm attached
>>> I do not see an explicit attempt to take vdsmd down. Do you see any 
>>> other
>>> incriminating message correlated with
>>>
>>> Oct 30 08:51:15 hypervisor respawn: slave '/usr/share/vdsm/vdsm' 
>>> died, respawning slave
>>>
>>>
>>>>> 3. We may have a nasty race: if Vdsm crashes just before it has
>>>>>     registered that the VM is down.
>>> Actually, this is not the issue: vdsm tries (and fails, due to 
>>> qemu/libvirt
>>> bug) to destroy the VM.
>>>
>>>>> 4. We used to force Vdsm to run with LC_ALL=C. It seems that the 
>>>>> grand
>>>>>     service rewrite by Zhou (http://gerrit.ovirt.org/15578) has 
>>>>> changed
>>>>>     that. This may have adverse effects, since AFAIR we sometimes 
>>>>> parse
>>>>>     application output, and assume that it's in C. Having a 
>>>>> non-English
>>>>>     log file is problematic on it's own for support personal, used to
>>>>>     grep for keywords. ybronhei, was it intensional? Can it be 
>>>>> reverted
>>>>>     or at least scrutinized?
>>>> currentely it still says "waiting for launch 9h"
>>>>
>>>> I don't abort it so if you need any other info I can have them
>>> libvirtd fails to connect to qemu's monitor. This smells like a qemu 
>>> bug that
>>> is beyond my over-the-mailing-list debugging abilities :-(
>>> You may want to strace or gdb the running qemu process, or to try to 
>>> re-attach
>>> to it by restarting libvirtd.
>>>
>>> 2013-10-30 07:51:10.045+0000: 8304: debug : qemuProcessStart:3804 : 
>>> Waiting for monitor to show up
>>> 2013-10-30 07:51:10.045+0000: 8304: debug : 
>>> qemuProcessWaitForMonitor:1707 : Connect monitor to 0x7fc1640eab80 
>>> 'CentOS_30'
>>> 2013-10-30 07:51:10.246+0000: 8304: debug : 
>>> qemuMonitorOpenInternal:751 : QEMU_MONITOR_NEW: mon=0x7fc1640ef6b0 
>>> refs=2 fd=27
>>> 2013-10-30 07:51:10.246+0000: 8304: debug : 
>>> qemuMonitorSetCapabilities:1145 : mon=0x7fc1640ef6b0
>>> 2013-10-30 07:51:10.246+0000: 8304: debug : qemuMonitorSend:887 : 
>>> QEMU_MONITOR_SEND_MSG: mon=0x7fc1640ef6b0 
>>> msg={"execute":"qmp_capabilities","id":"libvirt-1"}
>>>
>>>   fd=-1
>>> 2013-10-30 07:51:13.097+0000: 8296: error : qemuMonitorIORead:505 : 
>>> Unable to read from monitor: Connessione interrotta dal corrispondente
>>> 2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:638 : 
>>> Error on monitor Unable to read from monitor: Connessione interrotta 
>>> dal corrispondente
>>> 2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:672 : 
>>> Triggering error callback
>>> 2013-10-30 07:51:13.097+0000: 8296: debug : 
>>> qemuProcessHandleMonitorError:351 : Received error on 0x7fc1640eab80 
>>> 'CentOS_30'
>>> 2013-10-30 07:51:13.097+0000: 8304: debug : qemuMonitorSend:899 : 
>>> Send command resulted in error Unable to read from monitor: 
>>> Connessione interrotta dal corrispondente
>>> 2013-10-30 07:51:13.097+0000: 8304: debug : qemuProcessStop:3992 : 
>>> Shutting down VM 'CentOS_30' pid=7655 flags=0
>>> 2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:638 : 
>>> Error on monitor Unable to read from monitor: Connessione interrotta 
>>> dal corrispondente
>>> 2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:661 : 
>>> Triggering EOF callback
>>>
>> I noticed in the previous attached CentOS_30.log, qemu said "Gluster
>> connection failed for server=172.16.0.100" for each of the first few
>> attempts to start qemu. That's why libvirt said it could not connect to
>> qemu monitor, the real reason was that qemu process exited after it
>> checked the gluster storage. libvirt error message was not very friendly
>> in this case.
>>
>> Then I see on "2013-10-30 08:28:51.047+0000" in CentOS_30.log, qemu
>> started successfully without reporting gluster error, this was fine. In
>> libvirtd.log on the corresponding time spot, it libvirt connected to the
>> monitor and reported it was about to issue a handshake.
>>
>> Investigate the CentOS_30.log further, I find the attempts to start qemu
>> after "2013-10-30 08:28:51" were failed with the same error "Gluster
>> connection failed for server=172.16.0.100". Maybe there is a problem in
>> the gluster server or firewall? I'm not sure.
>>
>
> http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Important_Pre-requisites 
> - I assume these have been followed w.r.t setting permissions.
>

Sorry, I missed that you had set all the insecure options on the volume.

"Gluster connection failed for server" has a "Transport endpoint is not 
connected" error message.

Are you able to manually mount the gluster volume?

>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users