[ovirt-users] Cannot run specific VM in one node

Omer Frenkel ofrenkel at redhat.com
Tue Jul 21 16:28:41 UTC 2015



----- Original Message -----
> From: "Diego Remolina" <dijuremo at gmail.com>
> To: "Omer Frenkel" <ofrenkel at redhat.com>
> Cc: Users at ovirt.org
> Sent: Tuesday, July 21, 2015 12:48:31 AM
> Subject: Re: [ovirt-users] Cannot run specific VM in one node
> 
> Well... never mind. After leaving the VM running on the other host
> for a few days, I shut it down today, then attempted to start it
> back on ysmha02 and it booted up just fine. I was gonna collect more
> logs, but it seems the issue cleared itself, so I guess there is no
> point in looking at this any further.
> 
> Diego

Sorry for the slow response, I was out for a couple of days.

I'm happy it is fixed for you. I did look at the logs anyway, and I can see that:
first, the VM was migrating for a long time, and during this time it jumped between the not-responding/paused/migrating-from statuses
after 7 minutes, there was an attempt to cancel the migration, which failed with "Timed out during operation: cannot acquire state change lock, code = -32603"
then there were multiple attempts to stop the VM, which also failed (in the vdsm log I can see one of them is:
"libvirtError: Failed to terminate process 12479 with SIGTERM: Device or resource busy")
an attempt to resume the VM (after it had been paused for some time) also failed with
"Timed out during operation: cannot acquire state change lock, code = -32603"

Then vdsm restarted and the VM was finally down, but somehow libvirt still thought the VM was running on this host,
because on start we get:
libvirtError: Requested operation is not valid: domain 'ysmad02' is already active

I assume that restarting vdsm + libvirt cleared this funky state, so now it works.
I wonder about the error that started all this ("cannot acquire state change lock"); I'm not sure whether it is related to a storage issue.
I cannot see this in the vdsm log, because the log starts after the issue.
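
To find where each of these errors first shows up in a log, something like the following could be used. It is only a rough sketch, not part of vdsm or the engine: the labels and the function name count_signatures are made up for illustration, and the sample lines are just the messages quoted above, not real log lines with timestamps.

```python
import re
from collections import Counter

# Error signatures quoted in this thread; the labels are made up for this sketch.
SIGNATURES = {
    "state_change_lock": re.compile(r"cannot acquire state change lock"),
    "terminate_failed": re.compile(r"Failed to terminate process \d+ with SIGTERM"),
    "already_active": re.compile(r"domain '[^']+' is already active"),
}


def count_signatures(lines):
    """Count how many lines match each known error signature."""
    counts = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# The messages quoted in this mail, used here as stand-in input.
sample = [
    "Timed out during operation: cannot acquire state change lock, code = -32603",
    "libvirtError: Failed to terminate process 12479 with SIGTERM: Device or resource busy",
    "Timed out during operation: cannot acquire state change lock, code = -32603",
    "libvirtError: Requested operation is not valid: domain 'ysmad02' is already active",
]

print(dict(count_signatures(sample)))
```

For a real log, any iterable of lines can be passed in instead of the sample list, for example an open file object; an .xz-compressed log can be opened in text mode with Python's lzma.open(path, "rt").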


> 
> On Mon, Jul 20, 2015 at 7:58 AM, Diego Remolina <dijuremo at gmail.com> wrote:
> > Omer et al.,
> >
> > I had uploaded the logs to:
> >
> > https://www.dropbox.com/s/yziky6f9nk3e8aw/engine.log.xz?dl=0
> > https://www.dropbox.com/s/qsweiizwxk37qzg/vdsm.log.4.xz?dl=0
> >
> > Do you have any recommendations for me or need me to provide more info?
> >
> > I will be able to re-run and experiment with this in the evening, so I
> > can collect specific logs with times, etc if you have something in
> > particular you want me to try.
> >
> > Diego
> >
> > On Thu, Jul 16, 2015 at 8:20 AM, Diego Remolina <dijuremo at gmail.com> wrote:
> >> These are the links to the files. If there is another, better/preferred
> >> way to post them, let me know:
> >>
> >> https://www.dropbox.com/s/yziky6f9nk3e8aw/engine.log.xz?dl=0
> >> https://www.dropbox.com/s/qsweiizwxk37qzg/vdsm.log.4.xz?dl=0
> >>
> >> A bit more of an explanation on the infrastructure:
> >>
> >> I have two virtualization/storage servers, ysmha01 and ysmha02 running
> >> Ovirt hosted engine on top of glusterfs storage. I have two Windows
> >> server vms called ysmad01 and ysmad02. The current problem is that
> >> ysmad02 will *not* start on ysmha02 any more.
> >>
> >>
> >> Timeline
> >>
> >> My problems started at around 8:30PM 7/15/2015 when migrating
> >> everything to ysmha01 after having patched and rebooted the server.
> >>
> >> I got things back up at around 10:30PM after rebooting servers, etc.
> >> The hosted engine is running on ysmha02. I got ysmad01 running on
> >> ysmha01, but ysmad02 just would not start at all on ysmha02. I did a
> >> Run Once and set ysmad02 to start on ysmha01, and that works.
> >>
> >> When attempting to start or migrate ysmad02 on ysmha02, if I do a
> >> virsh -r list on ysmha02, I just see the state as: "Shut off" and the
> >> VM just does not run on that hypervisor.
> >>
> >> Diego
> >>
> >>
> >>
> >> On Thu, Jul 16, 2015 at 3:01 AM, Omer Frenkel <ofrenkel at redhat.com> wrote:
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> From: "Diego Remolina" <dijuremo at gmail.com>
> >>>> To: Users at ovirt.org
> >>>> Sent: Thursday, July 16, 2015 7:45:43 AM
> >>>> Subject: [ovirt-users] Cannot run specific VM in one node
> >>>>
> >>>> Hi,
> >>>>
> >>>> Was wondering if I can get some help with this particular situation. I
> >>>> have two ovirt cluster nodes. I had a VM running in node2 and tried to
> >>>> move it to node1. The move failed and the machine was created and
> >>>> paused in both nodes. I tried stopping migration, shutting down the
> >>>> machine, etc but none of that worked.
> >>>>
> >>>> So I decided to simply look for the process number and I killed it for
> >>>> that VM. After that, I was not able to get the VM to run in any of the
> >>>> nodes, so I rebooted them both.
> >>>>
> >>>> At this point, the vm will *not* start in node2 at all. When I try to
> >>>> start it, it just sits there and if I do:
> >>>>
> >>>> virsh -r list
> >>>>
> >>>> from the command line, the output says the vm state is "shut off".
> >>>>
> >>>> I am able to use Run Once to fire up the VM in node 1, but I cannot
> >>>> migrate it to node2.
> >>>>
> >>>> How can I clear this problematic state for node 2?
> >>>
> >>> please attach engine + vdsm logs for the time of the failure
> >>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Diego
> >>>> _______________________________________________
> >>>> Users mailing list
> >>>> Users at ovirt.org
> >>>> http://lists.ovirt.org/mailman/listinfo/users
> >>>>
> 


