[Users] VMs and volumes disappearing

Martijn Grendelman martijn.grendelman at isaac.nl
Tue Oct 1 09:29:58 UTC 2013


Hi,

>> I have recently set up an oVirt environment, I think in a pretty
>> standard fashion, with engine 3.3 on one host, one oVirt host on a
>> physical machine, both running CentOS 6.4, using NFS for all storage
>> domains.
> 
> Please provide rpm -qa on the ovirt rpms (ovirt engine).

martijn@ovirt:~> rpm -qa | grep ovirt
ovirt-log-collector-3.3.0-1.el6.noarch
ovirt-engine-3.3.0-1.el6.noarch
ovirt-host-deploy-1.1.1-1.el6.noarch
ovirt-engine-cli-3.3.0.4-1.el6.noarch
ovirt-engine-userportal-3.3.0-1.el6.noarch
ovirt-engine-tools-3.3.0-1.el6.noarch
ovirt-engine-setup-3.3.0-4.el6.noarch
ovirt-engine-sdk-python-3.3.0.6-1.el6.noarch
ovirt-image-uploader-3.3.0-1.el6.noarch
ovirt-engine-restapi-3.3.0-1.el6.noarch
ovirt-engine-webadmin-portal-3.3.0-1.el6.noarch
ovirt-host-deploy-java-1.1.1-1.el6.noarch
ovirt-engine-backend-3.3.0-1.el6.noarch
ovirt-release-el6-8-1.noarch
ovirt-iso-uploader-3.3.0-1.el6.noarch
ovirt-engine-dbscripts-3.3.0-1.el6.noarch
ovirt-engine-lib-3.3.0-4.el6.noarch

>> Today I was playing around with snapshots, when I noticed that the
>> Snapshots panel didn't show any of the snapshots I created, not even the
>> 'Current - Active VM' snapshot that all VMs have.
> 
> Not sure why this has happened. How do you know that snapshot
> creation was completed? Did you look at the events tab? (Asking to be
> sure) engine.log will be quite helpful here.

I find engine.log somewhat hard to read, to be honest, and documentation
is hard to find, but I think I found some clues.

I tried to create 4 snapshots of a certain VM, 2 of which completed
normally and 2 of which failed:

"Failed with VDSM error SNAPSHOT_FAILED and code 48"

However, what I find most upsetting is that the VMs that disappeared
were not the subject of my experiments. I was creating snapshots of a
single VM, and the VMs that disappeared were unrelated. As a matter of
fact, the VM I was experimenting with IS THE ONLY ONE that survived.

By the way, the Snapshots panel has been displaying snapshots correctly
for a while, but when I logged in this morning, it appeared empty again,
for all VMs.

Is there anything I can check to see what causes this?

>> Not sure what to do, I decided to restart the ovirt-engine process.
>>
>> When I logged back on to the administrator panel, I was shocked to
>> see 2 of my 4 VMs completely missing from the inventory. I
>> haven't been able to find back a single trace of either machine,
>> neither in the portal nor on disk. It seems like they never
>> existed. The storage of both VMs seems to be erased from the data
>> domain.

> Not sure why storage domain was erased. About Vms disappeared - there
> were previous discussions on that at users at ovirt.org. In a nutshell,
> due to a bug (that was already fixed) prior to the restart you might
> have had records at the async_tasks table that contained value of
> "empty guid" (a string in UUID format with only 0 and - ) at the
> vdsm_task_id column. This means that the task is not associated with
> a real SPM task, and when the engine restarts, if for a given flow
> (let's say - snapshot creation) there are tasks with such
> vdsm_task_id, the flow will end with failure. For some flows,
> ending with failure means erasing the vm (for example - real failure
> of importing a vm). By the way, similar issue can probably occur with
> disks as well, as there are flows that run async tasks that deal with
> disks.

I think I have an idea about what happened now.

The 2 disappeared VMs have been imported into oVirt using virt-v2v. The
3rd one that's now missing a disk volume was not, but I have been
playing with storage migration in the past.

Yesterday's engine.log seems to suggest that all of these tasks
(importing the 2 VMs and trying to move a volume) have been restarted
immediately after restarting Engine. After failure, the VMs and volume
were removed. It seems to fit the above description of the bug.

So...

What can I do to prevent this from happening again?

Should I periodically check the 'async_tasks' table for anomalies? Is
there a bugfix I can apply, or should I wait for a new release of oVirt?
If the latter, when is that expected to happen?
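
To make the idea of a periodic check concrete, here is a minimal Python
sketch of what I have in mind. It is only an illustration: the table name
'async_tasks' and the column name 'vdsm_task_id' are taken from this
thread and I have not verified them against the engine database schema,
and the helper names are made up. The rows would have to be fetched from
the engine database with whatever PostgreSQL client is at hand; the
sketch only covers spotting the "empty guid" values.

```python
import uuid

# The all-zero UUID that the reply above calls an "empty guid".
EMPTY_GUID = uuid.UUID(int=0)

# Hypothetical query; table and column names come from this thread and
# are an assumption, not a verified part of the engine schema.
QUERY = "SELECT vdsm_task_id FROM async_tasks"


def is_empty_guid(value):
    """Return True if value parses to the all-zero UUID."""
    try:
        return uuid.UUID(str(value)) == EMPTY_GUID
    except ValueError:
        # Not a UUID at all (including NULL/None); do not guess.
        return False


def find_suspect_tasks(rows):
    """Return the rows whose vdsm_task_id is an empty guid.

    rows is an iterable of mappings with a 'vdsm_task_id' key, for
    example the result of running QUERY with a dict-style cursor.
    """
    return [row for row in rows if is_empty_guid(row["vdsm_task_id"])]
```

If find_suspect_tasks() returns anything, I would take that as a sign not
to restart Engine until the tasks in question have been looked at.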

Thanks,
Martijn.


