----- Original Message -----
From: "Martijn Grendelman" <Martijn.Grendelman(a)isaac.nl>
To: users(a)ovirt.org
Sent: Monday, September 30, 2013 10:43:33 PM
Subject: [Users] VMs and volumes disappearing
Hi,
I have recently set up an oVirt environment, I think in a pretty
standard fashion, with engine 3.3 on one host, one oVirt host on a
physical machine, both running CentOS 6.4, using NFS for all storage
domains.
Please provide the output of rpm -qa for the oVirt RPMs (oVirt engine).
Today I was playing around with snapshots, when I noticed that the
Snapshots panel didn't show any of the snapshots I created, not even the
'Current - Active VM' snapshot that all VMs have.
I'm not sure why this happened. How do you know that snapshot creation completed? Did
you look at the Events tab? (Asking to be sure.)
engine.log will be quite helpful here.
Not sure what to do, I decided to restart the ovirt-engine process.
When I logged back on to the administrator panel, I was shocked to see 2
of my 4 VMs completely missing from the inventory. I haven't been able
to find back a single trace of either machine, neither in the portal nor
on disk. It seems like they never existed. The storage of both VMs seems
to be erased from the data domain.
I'm not sure why the storage domain was erased. As for the VMs that
disappeared: there were previous discussions about this on users(a)ovirt.org.
In a nutshell, due to a bug (already fixed), you might have had records in the
async_tasks table, prior to the restart, that contained an "empty GUID" (a string
in UUID format consisting only of 0 and -) in the vdsm_task_id column. This means
the task is not associated with a real SPM task. When the engine restarts, if a
given flow (say, snapshot creation) has tasks with such a vdsm_task_id, the flow
will end in failure. For some flows, ending in failure means erasing the VM (for
example, a real failure while importing a VM).
By the way, a similar issue can probably occur with disks as well, since there are
flows that run async tasks that deal with disks.
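To make the "empty GUID" condition above concrete, here is a minimal Python sketch of how such records could be recognised once rows from async_tasks have been fetched. Only the table name and the vdsm_task_id column come from this thread. The task_id key, the sample rows, and the helper names are hypothetical and for illustration only. This is not the engine's own code.

```python
import uuid

# The all-zero UUID, i.e. a string in UUID format made up only of 0 and -.
EMPTY_GUID = uuid.UUID(int=0)


def is_empty_guid(value):
    """Return True if value parses as a UUID and equals the all-zero UUID."""
    try:
        return uuid.UUID(value) == EMPTY_GUID
    except (ValueError, AttributeError, TypeError):
        # Not a well-formed UUID string (or not a string at all).
        return False


def find_orphaned_tasks(rows):
    """Return the rows whose vdsm_task_id is the empty GUID.

    rows is an iterable of dicts. The "vdsm_task_id" key mirrors the column
    named in this thread; any other key is an assumption of this sketch.
    """
    return [row for row in rows if is_empty_guid(row.get("vdsm_task_id", ""))]


# Hypothetical sample data, not taken from a real engine database.
sample_rows = [
    {"task_id": "task-a", "vdsm_task_id": "11111111-2222-3333-4444-555555555555"},
    {"task_id": "task-b", "vdsm_task_id": "00000000-0000-0000-0000-000000000000"},
]

if __name__ == "__main__":
    for row in find_orphaned_tasks(sample_rows):
        print("task without a real SPM task:", row["task_id"])
```

If a check like this reports any rows before an engine restart, that would match the situation described above, where the affected flow ends in failure once the engine comes back up.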
A 3rd VM is down and refuses to start: "Exit message: Volume
337a410f-1598-4a7f-9afd-c0160c329563 is corrupted or missing."
and in vdsm.log on the host:
OSError: [Errno 2] No such file or directory:
'/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/d523a48d-7a34-4bb0-9d48-2092934af816/images/e803ad34-94e5-4180-b26f-7271bfca5923/337a410f-1598-4a7f-9afd-c0160c329563'
So it seems something is seriously f*cked up. Now what? Any ideas what
may have caused this? And more importantly, how do I prevent something
like this from happening again?
Perhaps a needless addition, but I am very scared to host anything
remotely important on oVirt now.
Regards,
Martijn.
_______________________________________________
Users mailing list
Users(a)ovirt.org
http://lists.ovirt.org/mailman/listinfo/users