[Users] VMs and volumes disappearing

Yair Zaslavsky yzaslavs at redhat.com
Tue Oct 1 11:02:32 UTC 2013



----- Original Message -----
> From: "Martijn Grendelman" <martijn.grendelman at isaac.nl>
> To: users at ovirt.org
> Sent: Tuesday, October 1, 2013 12:29:58 PM
> Subject: Re: [Users] VMs and volumes disappearing
> 
> Hi,
> 
> >> I have recently set up an oVirt environment, I think in a pretty
> >> standard fashion, with engine 3.3 on one host, one oVirt host on a
> >> physical machine, both running CentOS 6.4, using NFS for all storage
> >> domains.
> > 
> > Please provide rpm -qa on the ovirt rpms (ovirt engine).
> 
> martijn at ovirt:~> rpm -qa | grep ovirt
> ovirt-log-collector-3.3.0-1.el6.noarch
> ovirt-engine-3.3.0-1.el6.noarch
> ovirt-host-deploy-1.1.1-1.el6.noarch
> ovirt-engine-cli-3.3.0.4-1.el6.noarch
> ovirt-engine-userportal-3.3.0-1.el6.noarch
> ovirt-engine-tools-3.3.0-1.el6.noarch
> ovirt-engine-setup-3.3.0-4.el6.noarch
> ovirt-engine-sdk-python-3.3.0.6-1.el6.noarch
> ovirt-image-uploader-3.3.0-1.el6.noarch
> ovirt-engine-restapi-3.3.0-1.el6.noarch
> ovirt-engine-webadmin-portal-3.3.0-1.el6.noarch
> ovirt-host-deploy-java-1.1.1-1.el6.noarch
> ovirt-engine-backend-3.3.0-1.el6.noarch
> ovirt-release-el6-8-1.noarch
> ovirt-iso-uploader-3.3.0-1.el6.noarch
> ovirt-engine-dbscripts-3.3.0-1.el6.noarch
> ovirt-engine-lib-3.3.0-4.el6.noarch
> 
> >> Today I was playing around with snapshots, when I noticed that the
> >> Snapshots panel didn't show any of the snapshots I created, not even the
> >> 'Current - Active VM' snapshot that all VMs have.
> > 
> > Not sure why this has happened. How do you know that snapshot
> > creation was completed? Did you look at the events tab? (Asking to be
> > sure) engine.log will be quite helpful here.
> 
> I find engine.log somewhat hard to read, to be honest, and documentation
> is hard to find, but I think I found some clues.

Hi,
I understand what you're saying about engine.log. I asked for it because I'm one of the maintainers of oVirt engine and thought I could give you a hand here, especially since your email gave me the sense that I had seen a similar issue in the past.
> 
> I tried to create 4 snapshots of a certain VM, 2 of which completed
> normally and 2 of which failed:
> 
> "Failed with VDSM error SNAPSHOT_FAILED and code 48"
> 
> However, what I find most upsetting, is that the VMs that disappeared
> were not the subject of my experiments. I was creating snapshots of a
> single VM, and the VMs that disappeared were unrelated. As a matter of
> fact, the VM I was experimenting with IS THE ONLY ONE that survived.
> 
> By the way, the Snapshots panel has been displaying snapshots correctly
> for a while, but when I logged in this morning, it appeared empty again,
> for all VMs.
> 
> Is there anything I can check to see what causes this?
> 
> >> Not sure what to do, I decided to restart the ovirt-engine process.
> >>
> >> When I logged back on to the administrator panel, I was shocked to
> >> see 2 of my 4 VMs completely missing from the inventory. I
> >> haven't been able to find back a single trace of either machine,
> >> neither in the portal nor on disk. It seems like they never
> >> existed. The storage of both VMs seems to be erased from the data
> >> domain.
> 
> > Not sure why storage domain was erased. About Vms disappeared - there
> > were previous discussions on that at users at ovirt.org. In a nutshell,
> > due to a bug (that was already fixed) prior to the restart you might
> > have had records at the async_tasks table that contained value of
> > "empty guid" (a string in UUID format with only 0 and - ) at the
> > vdsm_task_id column. This means that the task is not associated with
> > a real SPM task, and when the engine restarts, if for a given flow
> > (let's say - snapshot creation) there are tasks with such
> > vdsm_task_id, the flow will end with failure. For some flows,
> > ending with failure means erasing the vm (for example - real failure
> > of importing a vm). By the way, similar issue can probably occur with
> > disks as well, as there are flows that run async tasks that deal with
> > disks.
> 
> I think I have an idea about what happened now.
> 
> The 2 disappeared VMs have been imported into oVirt using virt-v2v. The
> 3rd one that's now missing a disk volume was not, but I have been
> playing with storage migration in the past.

Then this is the reason. Other users have reported the same problem at users at ovirt.org.
> 
> Yesterday's engine.log seems to suggest, that all of these tasks
> (importing the 2 VMs and trying to move a volume) have been restarted
> immediately after restarting Engine. After failure, the VMs and volume
> were removed. It seems to fit the above description of the bug.
> 
> So...
> 
> What can I do to prevent this from happening again?
> 
> Should I periodically check the 'async_tasks' table for anomalies? Is
> there a bugfix I can apply, or should I wait for a new release of oVirt?
> If the latter, when is that expected to happen?
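
If you do want to look at the async_tasks rows before restarting the engine, the check described earlier in this thread (a vdsm_task_id that is the "empty guid") could be sketched roughly as below. This is only an illustration: it assumes the rows have already been fetched by some means and are available as Python dicts keyed by column name. The function names are made up for this example, and the only column it relies on is vdsm_task_id.

```python
import uuid

# The "empty guid": a UUID made up only of zeros and dashes.
EMPTY_GUID = uuid.UUID(int=0)  # 00000000-0000-0000-0000-000000000000


def is_empty_guid(value):
    """Return True if value parses as the all-zero UUID."""
    try:
        return uuid.UUID(str(value)) == EMPTY_GUID
    except ValueError:
        # Not a UUID at all (for example None or a free-form string).
        return False


def find_unassociated_tasks(rows):
    """Return the rows whose vdsm_task_id is the empty guid.

    `rows` is assumed to be an iterable of dicts, one per async_tasks
    record. How they are fetched from the database is left out on purpose.
    """
    return [row for row in rows if is_empty_guid(row.get("vdsm_task_id"))]
```

Any row this returns would be a task that is not associated with a real SPM task, so it is worth knowing about before the engine is restarted.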


Upgrade.
I just talked with Ofer (CC'ed), our release engineer, and he said that all packages should be at 3.3.0-4 (notice that your ovirt-engine package is not).
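
To see which packages are still behind, something like the following could be run over the `rpm -qa | grep ovirt` output you pasted. It is a sketch under a few assumptions of mine: that "all packages" means the ones whose name starts with ovirt-engine and whose version is exactly 3.3.0, and that lines follow the usual name-version-release.arch layout. The function names are invented for this example.

```python
def split_rpm_name(line):
    """Split 'name-version-release.arch' into (name, version, release)."""
    base, _arch = line.strip().rsplit(".", 1)
    name, version, release = base.rsplit("-", 2)
    return name, version, release


def packages_behind(lines, version="3.3.0", wanted_release="4"):
    """Return ovirt-engine* packages at `version` whose release is not
    `wanted_release` (the leading part of the release, before '.el6')."""
    behind = []
    for line in lines:
        if not line.strip():
            continue
        name, ver, release = split_rpm_name(line)
        if not name.startswith("ovirt-engine") or ver != version:
            # Packages with their own versioning (cli, sdk, host-deploy)
            # are skipped under the assumption stated above.
            continue
        if release.split(".")[0] != wanted_release:
            behind.append(name)
    return behind
```

With the list from your mail, ovirt-engine-setup and ovirt-engine-lib would pass, while ovirt-engine itself would be reported as behind.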
I hope this helps you out,

Yair


> 
> Thanks,
> Martijn.
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 


