Re: [Users] VMs and volumes disappearing

Hi,
I have recently set up an oVirt environment, I think in a pretty standard fashion, with engine 3.3 on one host, one oVirt host on a physical machine, both running CentOS 6.4, using NFS for all storage domains.
Please provide rpm -qa on the ovirt rpms (ovirt engine).
martijn@ovirt:~> rpm -qa | grep ovirt
ovirt-log-collector-3.3.0-1.el6.noarch
ovirt-engine-3.3.0-1.el6.noarch
ovirt-host-deploy-1.1.1-1.el6.noarch
ovirt-engine-cli-3.3.0.4-1.el6.noarch
ovirt-engine-userportal-3.3.0-1.el6.noarch
ovirt-engine-tools-3.3.0-1.el6.noarch
ovirt-engine-setup-3.3.0-4.el6.noarch
ovirt-engine-sdk-python-3.3.0.6-1.el6.noarch
ovirt-image-uploader-3.3.0-1.el6.noarch
ovirt-engine-restapi-3.3.0-1.el6.noarch
ovirt-engine-webadmin-portal-3.3.0-1.el6.noarch
ovirt-host-deploy-java-1.1.1-1.el6.noarch
ovirt-engine-backend-3.3.0-1.el6.noarch
ovirt-release-el6-8-1.noarch
ovirt-iso-uploader-3.3.0-1.el6.noarch
ovirt-engine-dbscripts-3.3.0-1.el6.noarch
ovirt-engine-lib-3.3.0-4.el6.noarch
Today I was playing around with snapshots, when I noticed that the Snapshots panel didn't show any of the snapshots I created, not even the 'Current - Active VM' snapshot that all VMs have.
Not sure why this has happened. How do you know that snapshot creation was completed? Did you look at the events tab? (Asking to be sure) engine.log will be quite helpful here.
I find engine.log somewhat hard to read, to be honest, and documentation is hard to find, but I think I found some clues. I tried to create 4 snapshots of a certain VM, 2 of which completed normally and 2 of which failed: "Failed with VDSM error SNAPSHOT_FAILED and code 48" However, what I find most upsetting, is that the VMs that disappeared were not the subject of my experiments. I was creating snapshots of a single VM, and the VMs that disappeared were unrelated. As a matter of fact, the VM I was experimenting with IS THE ONLY ONE that survived. By the way, the Snapshots panel has been displaying snapshots correctly for a while, but when I logged in this morning, it appeared empty again, for all VMs. Is there anything I can check to see what causes this?
Not sure what to do, I decided to restart the ovirt-engine process.
When I logged back on to the administrator panel, I was shocked to see 2 of my 4 VMs completely missing from the inventory. I haven't been able to find a single trace of either machine, neither in the portal nor on disk. It seems like they never existed. The storage of both VMs seems to be erased from the data domain.
Not sure why the storage domain was erased. About the VMs that disappeared: there were previous discussions on that at users@ovirt.org. In a nutshell, due to a bug (that was already fixed), prior to the restart you might have had records in the table that contained the value of "empty guid" (a string in UUID format with only 0 and -) in the vdsm_task_id column. This means that the task is not associated with a real SPM task, and when the engine restarts, if for a given flow (let's say, snapshot creation) there are tasks with such a vdsm_task_id, the flow will end with failure. For some flows, ending with failure means erasing the VM (for example, a real failure of importing a VM). By the way, a similar issue can probably occur with disks as well, as there are flows that run async tasks that deal with disks.
I think I have an idea about what happened now. The 2 disappeared VMs have been imported into oVirt using virt-v2v. The 3rd one that's now missing a disk volume was not, but I have been playing with storage migration in the past. Yesterday's engine.log seems to suggest that all of these tasks (importing the 2 VMs and trying to move a volume) have been restarted immediately after restarting Engine. After failure, the VMs and volume were removed. It seems to fit the above description of the bug. So... What can I do to prevent this from happening again? Should I periodically check the 'async_tasks' table for anomalies? Is there a bugfix I can apply, or should I wait for a new release of oVirt? If the latter, when is that expected to happen? Thanks, Martijn.

----- Original Message -----
From: "Martijn Grendelman" <martijn.grendelman@isaac.nl>
To: users@ovirt.org
Sent: Tuesday, October 1, 2013 12:29:58 PM
Subject: Re: [Users] VMs and volumes disappearing
Hi,
I have recently set up an oVirt environment, I think in a pretty standard fashion, with engine 3.3 on one host, one oVirt host on a physical machine, both running CentOS 6.4, using NFS for all storage domains.
Please provide rpm -qa on the ovirt rpms (ovirt engine).
martijn@ovirt:~> rpm -qa | grep ovirt
ovirt-log-collector-3.3.0-1.el6.noarch
ovirt-engine-3.3.0-1.el6.noarch
ovirt-host-deploy-1.1.1-1.el6.noarch
ovirt-engine-cli-3.3.0.4-1.el6.noarch
ovirt-engine-userportal-3.3.0-1.el6.noarch
ovirt-engine-tools-3.3.0-1.el6.noarch
ovirt-engine-setup-3.3.0-4.el6.noarch
ovirt-engine-sdk-python-3.3.0.6-1.el6.noarch
ovirt-image-uploader-3.3.0-1.el6.noarch
ovirt-engine-restapi-3.3.0-1.el6.noarch
ovirt-engine-webadmin-portal-3.3.0-1.el6.noarch
ovirt-host-deploy-java-1.1.1-1.el6.noarch
ovirt-engine-backend-3.3.0-1.el6.noarch
ovirt-release-el6-8-1.noarch
ovirt-iso-uploader-3.3.0-1.el6.noarch
ovirt-engine-dbscripts-3.3.0-1.el6.noarch
ovirt-engine-lib-3.3.0-4.el6.noarch
Today I was playing around with snapshots, when I noticed that the Snapshots panel didn't show any of the snapshots I created, not even the 'Current - Active VM' snapshot that all VMs have.
Not sure why this has happened. How do you know that snapshot creation was completed? Did you look at the events tab? (Asking to be sure) engine.log will be quite helpful here.
I find engine.log somewhat hard to read, to be honest, and documentation is hard to find, but I think I found some clues.
Hi, I understand what you're saying about engine.log. I asked for it because I'm one of the maintainers of ovirt engine, so I thought I could give you a hand here, especially since your email gave me the sense that I had seen a similar issue in the past.
I tried to create 4 snapshots of a certain VM, 2 of which completed normally and 2 of which failed:
"Failed with VDSM error SNAPSHOT_FAILED and code 48"
However, what I find most upsetting, is that the VMs that disappeared were not the subject of my experiments. I was creating snapshots of a single VM, and the VMs that disappeared were unrelated. As a matter of fact, the VM I was experimenting with IS THE ONLY ONE that survived.
By the way, the Snapshots panel has been displaying snapshots correctly for a while, but when I logged in this morning, it appeared empty again, for all VMs.
Is there anything I can check to see what causes this?
Not sure what to do, I decided to restart the ovirt-engine process.
When I logged back on to the administrator panel, I was shocked to see 2 of my 4 VMs completely missing from the inventory. I haven't been able to find a single trace of either machine, neither in the portal nor on disk. It seems like they never existed. The storage of both VMs seems to be erased from the data domain.
Not sure why the storage domain was erased. About the VMs that disappeared: there were previous discussions on that at users@ovirt.org. In a nutshell, due to a bug (that was already fixed), prior to the restart you might have had records in the table that contained the value of "empty guid" (a string in UUID format with only 0 and -) in the vdsm_task_id column. This means that the task is not associated with a real SPM task, and when the engine restarts, if for a given flow (let's say, snapshot creation) there are tasks with such a vdsm_task_id, the flow will end with failure. For some flows, ending with failure means erasing the VM (for example, a real failure of importing a VM). By the way, a similar issue can probably occur with disks as well, as there are flows that run async tasks that deal with disks.
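A minimal sketch of how the "empty guid" condition described above could be spotted before restarting the engine. It only assumes that task records are available as dictionaries with a vdsm_task_id field; the function name, the task_id key and the sample rows are hypothetical, and the query in the comment is an untested guess at how the rows might be fetched from the async_tasks table mentioned later in this thread.

```python
import uuid

# The "empty guid": a UUID-formatted string made up of only 0 and -.
EMPTY_GUID = str(uuid.UUID(int=0))


def find_orphan_tasks(rows):
    """Return the task rows whose vdsm_task_id is the empty guid.

    `rows` is any iterable of dict-like records, for example the result of
    a query such as (assumption, not verified against a real engine DB):
        SELECT * FROM async_tasks;
    """
    return [
        row for row in rows
        if str(row.get("vdsm_task_id", "")).lower() == EMPTY_GUID
    ]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real engine database.
    sample = [
        {"task_id": "a", "vdsm_task_id": "00000000-0000-0000-0000-000000000000"},
        {"task_id": "b", "vdsm_task_id": "6f1c2d3e-1111-2222-3333-444455556666"},
    ]
    for task in find_orphan_tasks(sample):
        print("task without a real SPM task:", task["task_id"])
```

Any row reported this way would correspond to a task that is not associated with a real SPM task, which is the situation in which a flow ends with failure after an engine restart.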
I think I have an idea about what happened now.
The 2 disappeared VMs have been imported into oVirt using virt-v2v. The 3rd one that's now missing a disk volume was not, but I have been playing with storage migration in the past.
Then this is the reason. Other users have complained about it at users@ovirt.org.
Yesterday's engine.log seems to suggest that all of these tasks (importing the 2 VMs and trying to move a volume) have been restarted immediately after restarting Engine. After failure, the VMs and volume were removed. It seems to fit the above description of the bug.
So...
What can I do to prevent this from happening again?
Should I periodically check the 'async_tasks' table for anomalies? Is there a bugfix I can apply, or should I wait for a new release of oVirt? If the latter, when is that expected to happen?
Upgrade. I just talked with Ofer (CC'ed), our release engineer, and he said that all packages should be 3.3.0-4 (notice that ovirt-engine is not).
I hope this helps you out,
Yair
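A rough sketch of how the rpm -qa output above could be compared against the 3.3.0-4 level. It rests on assumptions that are not stated in this thread: that only packages whose name starts with ovirt-engine and whose version is exactly 3.3.0 are meant, and that the release number is the part of the release string before the first dot. The function name is made up for illustration.

```python
def engine_packages_behind(package_lines, version="3.3.0", release="4"):
    """Return names of ovirt-engine* packages at `version` whose release
    number differs from `release`.

    Each line is expected to look like name-version-release.arch,
    e.g. ovirt-engine-3.3.0-1.el6.noarch.
    """
    behind = []
    for line in package_lines:
        nvr = line.strip().rsplit(".", 1)[0]       # drop the .arch suffix
        parts = nvr.rsplit("-", 2)                 # name, version, release
        if len(parts) != 3:
            continue                               # not in the expected form
        name, pkg_version, pkg_release = parts
        if not name.startswith("ovirt-engine") or pkg_version != version:
            continue                               # outside the assumed scope
        if pkg_release.split(".")[0] != release:
            behind.append(name)
    return behind


if __name__ == "__main__":
    installed = [
        "ovirt-engine-3.3.0-1.el6.noarch",
        "ovirt-engine-setup-3.3.0-4.el6.noarch",
        "ovirt-engine-lib-3.3.0-4.el6.noarch",
    ]
    print(engine_packages_behind(installed))
```

Packages with their own version numbering (such as ovirt-engine-cli at 3.3.0.4) are skipped on purpose, since it is not clear from the thread whether they are covered by the 3.3.0-4 remark.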
Thanks, Martijn.
participants (2): Martijn Grendelman, Yair Zaslavsky