New subject: [Users] VMs and volumes disappearing

1 Oct 2013

      Hi,
...
...
I have recently set up an oVirt environment, I think in a pretty
standard fashion, with engine 3.3 on one host, one oVirt host on a
physical machine, both running CentOS 6.4, using NFS for all storage
domains.
Please provide rpm -qa on the ovirt rpms (ovirt engine).
martijn@ovirt:~> rpm -qa | grep ovirt
ovirt-log-collector-3.3.0-1.el6.noarch
ovirt-engine-3.3.0-1.el6.noarch
ovirt-host-deploy-1.1.1-1.el6.noarch
ovirt-engine-cli-3.3.0.4-1.el6.noarch
ovirt-engine-userportal-3.3.0-1.el6.noarch
ovirt-engine-tools-3.3.0-1.el6.noarch
ovirt-engine-setup-3.3.0-4.el6.noarch
ovirt-engine-sdk-python-3.3.0.6-1.el6.noarch
ovirt-image-uploader-3.3.0-1.el6.noarch
ovirt-engine-restapi-3.3.0-1.el6.noarch
ovirt-engine-webadmin-portal-3.3.0-1.el6.noarch
ovirt-host-deploy-java-1.1.1-1.el6.noarch
ovirt-engine-backend-3.3.0-1.el6.noarch
ovirt-release-el6-8-1.noarch
ovirt-iso-uploader-3.3.0-1.el6.noarch
ovirt-engine-dbscripts-3.3.0-1.el6.noarch
ovirt-engine-lib-3.3.0-4.el6.noarch
...
...
Today I was playing around with snapshots, when I noticed that the
Snapshots panel didn't show any of the snapshots I created, not even the
'Current - Active VM' snapshot that all VMs have.
Not sure why this has happened. How do you know that snapshot
creation was completed? Did you look at the events tab? (Asking to be
sure) engine.log will be quite helpful here.
I find engine.log somewhat hard to read, to be honest, and documentation
is hard to find, but I think I found some clues.

I tried to create 4 snapshots of a certain VM, 2 of which completed
normally and 2 of which failed:

"Failed with VDSM error SNAPSHOT_FAILED and code 48"
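Since engine.log is hard to read by eye, a small filter can help pull out lines like the one quoted above. This is only a sketch: the pattern is modeled on that single message, and other engine.log messages may be worded differently. The function name `find_vdsm_errors` is made up for illustration.

```python
import re

# Pattern modeled on the one message quoted in this thread (an assumption:
# other failures in engine.log may use different wording).
VDSM_ERROR = re.compile(r"Failed with VDSM error (\w+) and code (\d+)")


def find_vdsm_errors(lines):
    """Return (error_name, code) pairs for every line matching the pattern."""
    hits = []
    for line in lines:
        match = VDSM_ERROR.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


if __name__ == "__main__":
    sample = [
        '"Failed with VDSM error SNAPSHOT_FAILED and code 48"',
        "some unrelated log line",
    ]
    for name, code in find_vdsm_errors(sample):
        print(name, code)
```

To run it against a real log, pass an open file object instead of the sample list; the location of engine.log depends on the installation.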

However, what I find most upsetting, is that the VMs that disappeared
were not the subject of my experiments. I was creating snapshots of a
single VM, and the VMs that disappeared were unrelated. As a matter of
fact, the VM I was experimenting with IS THE ONLY ONE that survived.

By the way, the Snapshots panel has been displaying snapshots correctly
for a while, but when I logged in this morning, it appeared empty again,
for all VMs.

Is there anything I can check to see what causes this?
...
...
Not sure what to do, I decided to restart the ovirt-engine process.
When I logged back on to the administrator panel, I was shocked to
see 2 of my 4 VMs completely missing from the inventory. I
haven't been able to find a single trace of either machine,
neither in the portal nor on disk. It seems like they never
existed. The storage of both VMs seems to be erased from the data
domain.
...
Not sure why the storage domain was erased. About the VMs that
disappeared: there were previous discussions on that at
users@ovirt.org. In a nutshell, due to a bug (that was already fixed),
prior to the restart you might have had records in the async_tasks
table that contained a value of "empty guid" (a string in UUID format
with only 0 and -) in the vdsm_task_id column. This means that the
task is not associated with a real SPM task, and when the engine
restarts, if for a given flow (let's say snapshot creation) there are
tasks with such a vdsm_task_id, the flow will end with failure. For
some flows, ending with failure means erasing the VM (for example, a
real failure of importing a VM). By the way, a similar issue can
probably occur with disks as well, as there are flows that run async
tasks that deal with disks.
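The "empty guid" check described above can be sketched as follows. The table name async_tasks and the column vdsm_task_id are the ones mentioned in this thread. Everything else is an assumption made for illustration: the helper names, the `task_id` key in the sample rows, and the parameter style of the query string. It has not been verified against the actual engine schema.

```python
import uuid

# A UUID consisting only of zeros and dashes, i.e. the "empty guid".
EMPTY_GUID = str(uuid.UUID(int=0))

# Query text only; how it is executed (psql, a DB driver) is left open.
# The %s placeholder style is an assumption about the driver in use.
CHECK_QUERY = "SELECT * FROM async_tasks WHERE vdsm_task_id = %s"


def is_empty_guid(value):
    """True if value parses as a UUID whose bits are all zero."""
    try:
        return uuid.UUID(str(value)).int == 0
    except ValueError:
        return False


def suspicious_tasks(rows):
    """Filter rows (dicts with a 'vdsm_task_id' key) down to empty-guid ones."""
    return [row for row in rows if is_empty_guid(row.get("vdsm_task_id"))]


if __name__ == "__main__":
    # Hypothetical rows; 'task_id' is an illustrative key, not a confirmed column.
    rows = [
        {"task_id": "a", "vdsm_task_id": EMPTY_GUID},
        {"task_id": "b", "vdsm_task_id": str(uuid.uuid4())},
    ]
    for row in suspicious_tasks(rows):
        print("task without a real SPM task:", row["task_id"])
```

Any rows this flags would be the ones at risk of ending in failure on the next engine restart, going by the description above. What to do with them is a separate question that the sketch does not answer.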
I think I have an idea about what happened now.

The 2 disappeared VMs have been imported into oVirt using virt-v2v. The
3rd one that's now missing a disk volume was not, but I have been
playing with storage migration in the past.

Yesterday's engine.log seems to suggest that all of these tasks
(importing the 2 VMs and trying to move a volume) were restarted
immediately after restarting Engine. After failure, the VMs and volume
were removed. It seems to fit the above description of the bug.

So...

What can I do to prevent this from happening again?

Should I periodically check the 'async_tasks' table for anomalies? Is
there a bugfix I can apply, or should I wait for a new release of oVirt?
If the latter, when is that expected to happen?

Thanks,
Martijn.

Participants (2): Martijn Grendelman, Yair Zaslavsky