On 10/31/2013 09:22 AM, Zhou Zheng Sheng wrote:
>
> On 2013/10/31 06:24, Dan Kenigsberg wrote:
>> On Wed, Oct 30, 2013 at 08:41:43PM +0100, Alessandro Bianchi wrote:
>>> On 30/10/2013 18:04, Dan Kenigsberg wrote:
>>>> On Wed, Oct 30, 2013 at 02:40:02PM +0100, Alessandro Bianchi wrote:
>>>>> On 30/10/2013 13:58, Dan Kenigsberg wrote:
>>>>>
>>>>> On Wed, Oct 30, 2013 at 11:34:21AM +0100, Alessandro Bianchi wrote:
>>>>>
>>>>> Hi everyone
>>>>>
>>>>> I've set up a gluster storage with two replicated bricks
>>>>>
>>>>> DC is up and I created a VM to test gluster storage
>>>>>
>>>>> If I start the VM WITHOUT any disk attached (only one
>>>>> virtual DVD) it
>>>>> starts fine.
>>>>>
>>>>> If I attach a gluster domain disk (thin provisioned, 30 GB), the VM
>>>>> gets stuck in the "waiting for launch" state
>>>>>
>>>>> I see no special activity on the gluster servers (they serve several
>>>>> other shares with no trouble at all, and even the ISO domain is an NFS
>>>>> export on a locally mounted gluster volume and works fine)
>>>>>
>>>>> I've double-checked all the prerequisites and they look fine (Fedora
>>>>> 19 - gluster set to insecure in both glusterd.vol and the volume
>>>>> options - uid/gid/insecure)
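>>>>>
>>>>> (For reference, this is roughly the set of options meant above; the
>>>>> volume name is a placeholder:)
>>>>>
>>>>>   # in /etc/glusterfs/glusterd.vol (glusterd restarted afterwards):
>>>>>   #   option rpc-auth-allow-insecure on
>>>>>   gluster volume set <volname> server.allow-insecure on
>>>>>   gluster volume set <volname> storage.owner-uid 36
>>>>>   gluster volume set <volname> storage.owner-gid 36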
>>>>>
>>>>> Am I doing something wrong?
>>>>>
>>>>> I'm even unable to stop the VM from the engine GUI
>>>>>
>>>>> Any advice?
>>>>>
>>>>> Which version of ovirt are you using? Hopefully ovirt-3.3.0.1.
>>>>> For how long is the VM stuck in its "wait for launch" state?
>>>>> What does `virsh -r list` have to say while startup stalls?
>>>>> Would you provide more content of your vdsm.log, and possibly
>>>>> libvirtd.log, so we can understand what blocks the VM start-up? Please
>>>>> use an attachment or pastebin, as your mail agent wreaks havoc on the
>>>>> log lines.
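>>>>>
>>>>> For example, something along these lines would capture what is needed
>>>>> (the log paths are the Fedora defaults; adjust if yours differ):
>>>>>
>>>>>   rpm -q vdsm libvirt-daemon qemu-kvm
>>>>>   virsh -r list --all
>>>>>   tail -n 500 /var/log/vdsm/vdsm.log > vdsm-tail.log
>>>>>   tail -n 500 /var/log/libvirt/libvirtd.log > libvirtd-tail.log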
>>>>>
>>>>>
>>>>> Thank you for your answer.
>>>>>
>>>>> Here are the "facts"
>>>>>
>>>>> In the GUI I see
>>>>>
>>>>> "waiting for launch 3 h"
>>>>>
>>>>> virsh -r list
>>>>> Id    Name                           State
>>>>> ----------------------------------------------------
>>>>> 3     CentOS_30                      terminated
>>>>>
>>>>> vdsClient -s 0 list table
>>>>> 200dfb05-461e-49d9-95a2-c0a7c7ced669 0 CentOS_30
>>>>> WaitForLaunch
>>>>>
>>>>> Packages:
>>>>>
>>>>> ovirt-engine-userportal-3.3.0.1-1.fc19.noarch
>>>>> ovirt-log-collector-3.3.1-1.fc19.noarch
>>>>> ovirt-engine-restapi-3.3.0.1-1.fc19.noarch
>>>>> ovirt-engine-setup-3.3.0.1-1.fc19.noarch
>>>>> ovirt-engine-backend-3.3.0.1-1.fc19.noarch
>>>>> ovirt-host-deploy-java-1.1.1-1.fc19.noarch
>>>>> ovirt-release-fedora-8-1.noarch
>>>>> ovirt-engine-setup-plugin-allinone-3.3.0.1-1.fc19.noarch
>>>>> ovirt-engine-webadmin-portal-3.3.0.1-1.fc19.noarch
>>>>> ovirt-engine-sdk-python-3.3.0.7-1.fc19.noarch
>>>>> ovirt-iso-uploader-3.3.1-1.fc19.noarch
>>>>> ovirt-engine-websocket-proxy-3.3.0.1-1.fc19.noarch
>>>>> ovirt-engine-dbscripts-3.3.0.1-1.fc19.noarch
>>>>> ovirt-host-deploy-offline-1.1.1-1.fc19.noarch
>>>>> ovirt-engine-cli-3.3.0.5-1.fc19.noarch
>>>>> ovirt-engine-tools-3.3.0.1-1.fc19.noarch
>>>>> ovirt-engine-lib-3.3.0.1-1.fc19.noarch
>>>>> ovirt-image-uploader-3.3.1-1.fc19.noarch
>>>>> ovirt-engine-3.3.0.1-1.fc19.noarch
>>>>> ovirt-host-deploy-1.1.1-1.fc19.noarch
>>>>>
>>>>> I attach the full vdsm log
>>>>>
>>>>> Look around 30-10 10:30 to see everything that happens
>>>>>
>>>>> Despite the "terminated" label in the output from virsh, I still see
>>>>> the VM "waiting for launch" in the GUI, so I suspect the answer to
>>>>> "how long" may be "forever"
>>>>>
>>>>> Since this is a test VM I can run whatever test you may need to track
>>>>> down the problem, including destroying and rebuilding it
>>>>>
>>>>> It would be great to have gluster support stable in ovirt!
>>>>>
>>>>> Thank you for your efforts
>>>> The log has an ominous failed attempt to start the VM, followed by an
>>>> immediate vdsm crash. Is it reproducible?
>>>>
>>>> We have plenty of issues lurking here:
>>>> 1. Why has libvirt failed to create the VM? For this, please find
>>>> clues
>>>> in the complete non-line-broken CentOS_30.log and libvirtd.log.
>>> attached to this message
>>>> 2. Why was vdsm killed? Does /var/log/messages have a clue from
>>>> systemd?
>>> the output of `cat /var/log/messages | grep vdsm` is attached
>> I do not see an explicit attempt to take vdsmd down. Do you see any
>> other incriminating message correlated with
>>
>> Oct 30 08:51:15 hypervisor respawn: slave '/usr/share/vdsm/vdsm'
>> died, respawning slave
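>>
>> For example, something along these lines (journalctl is available on
>> Fedora 19; the time window here is only illustrative) may show whether
>> systemd or the OOM killer took vdsmd down:
>>
>>   journalctl --since "2013-10-30 08:45" --until "2013-10-30 08:55" \
>>       | grep -iE 'vdsm|oom|segfault'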
>>
>>
>>>> 3. We may have a nasty race if Vdsm crashes just before it has
>>>> registered that the VM is down.
>> Actually, this is not the issue: vdsm tries (and fails, due to a
>> qemu/libvirt bug) to destroy the VM.
>>
>>>> 4. We used to force Vdsm to run with LC_ALL=C. It seems that the grand
>>>> service rewrite by Zhou (http://gerrit.ovirt.org/15578) has changed
>>>> that. This may have adverse effects, since AFAIR we sometimes parse
>>>> application output and assume that it's in C. Having a non-English
>>>> log file is problematic on its own for support personnel, who are used
>>>> to grepping for keywords. ybronhei, was it intentional? Can it be
>>>> reverted or at least scrutinized?
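>>>>
>>>> (As a quick check of what locale the running vdsm actually sees, and a
>>>> way to pin it back to C via a systemd drop-in - the paths below are
>>>> only illustrative:)
>>>>
>>>>   tr '\0' '\n' < /proc/$(pgrep -f /usr/share/vdsm/vdsm | head -1)/environ | grep -E '^(LC_ALL|LANG)='
>>>>   mkdir -p /etc/systemd/system/vdsmd.service.d
>>>>   printf '[Service]\nEnvironment=LC_ALL=C\n' > /etc/systemd/system/vdsmd.service.d/locale.conf
>>>>   systemctl daemon-reload && systemctl restart vdsmd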
>>> Currently it still says "waiting for launch 9h"
>>>
>>> I haven't aborted it, so if you need any other info I can provide it
>> libvirtd fails to connect to qemu's monitor. This smells like a qemu bug
>> that is beyond my over-the-mailing-list debugging abilities :-(
>> You may want to strace or gdb the running qemu process, or try to
>> re-attach to it by restarting libvirtd.
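>>
>> For instance (the qemu pid 7655 below is taken from the qemuProcessStop
>> line in the log; use whatever pid is live at the time):
>>
>>   strace -f -tt -p 7655 -o /tmp/qemu.strace
>>   gdb -p 7655 -batch -ex 'thread apply all bt'
>>   systemctl restart libvirtd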
>>
>> 2013-10-30 07:51:10.045+0000: 8304: debug : qemuProcessStart:3804 :
>> Waiting for monitor to show up
>> 2013-10-30 07:51:10.045+0000: 8304: debug :
>> qemuProcessWaitForMonitor:1707 : Connect monitor to 0x7fc1640eab80
>> 'CentOS_30'
>> 2013-10-30 07:51:10.246+0000: 8304: debug :
>> qemuMonitorOpenInternal:751 : QEMU_MONITOR_NEW: mon=0x7fc1640ef6b0
>> refs=2 fd=27
>> 2013-10-30 07:51:10.246+0000: 8304: debug :
>> qemuMonitorSetCapabilities:1145 : mon=0x7fc1640ef6b0
>> 2013-10-30 07:51:10.246+0000: 8304: debug : qemuMonitorSend:887 :
>> QEMU_MONITOR_SEND_MSG: mon=0x7fc1640ef6b0
>> msg={"execute":"qmp_capabilities","id":"libvirt-1"}
>> fd=-1
>> 2013-10-30 07:51:13.097+0000: 8296: error : qemuMonitorIORead:505 :
>> Unable to read from monitor: Connection reset by peer
>> 2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:638 :
>> Error on monitor Unable to read from monitor: Connection reset by peer
>> 2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:672 :
>> Triggering error callback
>> 2013-10-30 07:51:13.097+0000: 8296: debug :
>> qemuProcessHandleMonitorError:351 : Received error on 0x7fc1640eab80
>> 'CentOS_30'
>> 2013-10-30 07:51:13.097+0000: 8304: debug : qemuMonitorSend:899 :
>> Send command resulted in error Unable to read from monitor:
>> Connection reset by peer
>> 2013-10-30 07:51:13.097+0000: 8304: debug : qemuProcessStop:3992 :
>> Shutting down VM 'CentOS_30' pid=7655 flags=0
>> 2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:638 :
>> Error on monitor Unable to read from monitor: Connection reset by peer
>> 2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:661 :
>> Triggering EOF callback
>>
> I noticed in the previously attached CentOS_30.log that qemu said "Gluster
> connection failed for server=172.16.0.100" for each of the first few
> attempts to start qemu. That's why libvirt said it could not connect to
> the qemu monitor: the real reason was that the qemu process exited after
> it checked the gluster storage. The libvirt error message was not very
> friendly in this case.
>
> Then I see that at "2013-10-30 08:28:51.047+0000" in CentOS_30.log, qemu
> started successfully without reporting a gluster error, which was fine. In
> libvirtd.log, at the corresponding time, libvirt connected to the monitor
> and reported it was about to issue a handshake.
>
> Investigating CentOS_30.log further, I find that the attempts to start
> qemu after "2013-10-30 08:28:51" failed with the same error, "Gluster
> connection failed for server=172.16.0.100". Maybe there is a problem with
> the gluster server or the firewall? I'm not sure.
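>
> Something like the following might help narrow that down (the ports are
> the usual gluster defaults, so treat them as assumptions):
>
>   gluster volume status
>   # 24007 is glusterd's management port; bricks usually listen on 49152+
>   telnet 172.16.0.100 24007
>   iptables -L -n | grep -E '24007|49152'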
>
http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Important_Pre-requ...
- I assume these have been followed w.r.t. setting permissions.
Sorry, I missed that you had already set all the insecure options on the
volume. The "Gluster connection failed for server" error comes with a
"Transport endpoint is not connected" message.
Are you able to manually mount the gluster volume?
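
For example (the server address is taken from the qemu error above; the
volume name and mount point are placeholders):

  mount -t glusterfs 172.16.0.100:/<volname> /mnt/gluster-test
  sudo -u vdsm dd if=/dev/zero of=/mnt/gluster-test/probe.img bs=1M count=10 oflag=direct

If the mount or the write fails for the vdsm user, that would point at the
gluster or permission side rather than at oVirt itself.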