[Users] VMs and volumes disappearing

Yair Zaslavsky yzaslavs at redhat.com
Mon Sep 30 20:04:41 UTC 2013



----- Original Message -----
> From: "Martijn Grendelman" <Martijn.Grendelman at isaac.nl>
> To: users at ovirt.org
> Sent: Monday, September 30, 2013 10:43:33 PM
> Subject: [Users] VMs and volumes disappearing
> 
> Hi,
> 
> I have recently set up an oVirt environment, I think in a pretty
> standard fashion, with engine 3.3 on one host, one oVirt host on a
> physical machine, both running CentOS 6.4, using NFS for all storage
> domains.

Please provide the output of rpm -qa for the oVirt RPMs on the engine machine (for example, rpm -qa | grep -i ovirt).

> 
> Today I was playing around with snapshots, when I noticed that the
> Snapshots panel didn't show any of the snapshots I created, not even the
> 'Current - Active VM' snapshot that all VMs have.

I'm not sure why this happened. How do you know that snapshot creation completed? Did you check the Events tab? (Just asking to be sure.)
engine.log would be quite helpful here.

> 
> Not sure what to do, I decided to restart the ovirt-engine process.
> 
> When I logged back on to the administrator panel, I was shocked to see 2
> of my 4 VMs completely missing from the inventory. I haven't been able
> to find back a single trace of either machine, neither in the portal nor
> on disk. It seems like they never existed. The storage of both VMs seems
> to be erased from the data domain.
Not sure why the storage in the domain was erased. As for the VMs that disappeared, there were previous discussions about that on users at ovirt.org.
In a nutshell, due to a bug (which has already been fixed), prior to the restart you might have had records in the async_tasks table that contained an "empty GUID" value (a string in UUID format consisting only of 0 and -) in the vdsm_task_id column. This means that the task is not associated with a real SPM task. When the engine restarts, if a given flow (say, snapshot creation) has tasks with such a vdsm_task_id, the flow ends with failure. For some flows, ending with failure means erasing the VM (for example, a real failure while importing a VM).
By the way, a similar issue can probably occur with disks as well, since there are flows that run async tasks that deal with disks.
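
To make the "empty GUID" condition concrete, here is a minimal Python sketch that flags task records whose vdsm_task_id is the all-zero UUID. The row dictionaries, their other keys, and the helper names (is_empty_guid, orphaned_tasks) are made up for illustration; only the async_tasks table and vdsm_task_id column names come from the explanation above, and this is not the engine's actual code.

```python
import uuid

# The "empty GUID": a UUID-formatted string made only of 0 and -.
EMPTY_GUID = str(uuid.UUID(int=0))


def is_empty_guid(value):
    """Return True if value parses as a UUID and is the all-zero UUID."""
    try:
        return uuid.UUID(value) == uuid.UUID(int=0)
    except (ValueError, AttributeError, TypeError):
        # Not a valid UUID string (or not a string at all).
        return False


def orphaned_tasks(rows):
    """Pick out rows that are not associated with a real SPM task.

    rows is assumed to be a list of dicts, one per async_tasks record,
    each with a 'vdsm_task_id' key (a hypothetical in-memory shape).
    """
    return [row for row in rows if is_empty_guid(row.get("vdsm_task_id"))]


if __name__ == "__main__":
    # Hypothetical sample records, not taken from a real database.
    sample = [
        {"task_id": "a", "vdsm_task_id": EMPTY_GUID},
        {"task_id": "b", "vdsm_task_id": "337a410f-1598-4a7f-9afd-c0160c329563"},
    ]
    for row in orphaned_tasks(sample):
        print("task without a real SPM task:", row["task_id"])
```

If records like these show up before an engine restart, that would be consistent with the failure path described above, though engine.log is still needed to confirm what actually happened.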


> 
> A 3rd VM is down and refuses to start: "Exit message: Volume
> 337a410f-1598-4a7f-9afd-c0160c329563 is corrupted or missing."
> 
> and in vdsm.log on the host:
> 
> OSError: [Errno 2] No such file or directory:
> '/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/d523a48d-7a34-4bb0-9d48-2092934af816/images/e803ad34-94e5-4180-b26f-7271bfca5923/337a410f-1598-4a7f-9afd-c0160c329563'
> 
> So it seems something is seriously f*cked up. Now what? Any ideas what
> may have caused this? And more importantly, how do I prevent something
> like this from happening again?
> 
> Perhaps a needless addition, but I am very scared to host anything
> remotely important on oVirt now.
> 
> Regards,
> Martijn.
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 


