Hi Nicolas,
Are you still able to reproduce this issue? Are you using Fedora or
CentOS?
If providing the logs is problematic for you, could you try to ping me
on IRC (fsimonce on #ovirt, OFTC) so that we can work on the issue
together?
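For reference, the logs that usually help most are vdsm.log from the
hosts and engine.log from the engine. Something along these lines should
be enough to collect them (the archive names are just examples):

  # on each affected host
  tar czf host-logs.tar.gz /var/log/vdsm/vdsm.log* /var/log/messages*
  # on the engine machine
  tar czf engine-logs.tar.gz /var/log/ovirt-engine/engine.log*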
Thanks,
--
Federico
----- Original Message -----
From: "Nicolas Ecarnot" <nicolas(a)ecarnot.net>
To: "users" <users(a)ovirt.org>
Sent: Monday, January 20, 2014 11:06:21 AM
Subject: [Users] "Volume Group does not exist". Blame device-mapper ?
Hi,
oVirt 3.3, no big issue since the recent snapshot joke; all in all it is
running fine.
All my VMs are stored in an iSCSI SAN. The VMs usually use only one or
two disks (1: system, 2: data) and that works fine.
On Friday, I created a new LUN. Inside a VM, I connected to it via
iscsiadm and successfully logged in to the LUN (session, automatic
attach on boot, read, write): nice.
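Roughly what I did inside the VM, for the record (the portal address and
target IQN below are placeholders, not the real ones):

  # discover the targets exposed by the SAN portal
  iscsiadm -m discovery -t sendtargets -p 192.168.0.10
  # log in to the new LUN's target
  iscsiadm -m node -T iqn.2014-01.net.example:newlun -p 192.168.0.10 --login
  # make the session come back automatically at boot
  iscsiadm -m node -T iqn.2014-01.net.example:newlun -p 192.168.0.10 \
    --op update -n node.startup -v automatic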
Then, after detaching it and shutting down the VM, I tried for the first
time to use the "direct attach" feature to attach the disk directly from
oVirt, logging in the iSCSI session via oVirt.
The connection went fine and I saw the disk appear in my VM as /dev/sda
or whatever. I was able to mount it, read and write.
Then disaster struck: many nodes suddenly became unresponsive, quickly
migrating their VMs to the remaining nodes.
Fortunately, the migrations ran fine and I lost no VM and had no
downtime, but I had to reboot every affected node (other actions failed).
On the failing nodes, /var/log/messages showed the log you can read at
the end of this message.
I first get device-mapper warnings, then the host becomes unable to work
with the logical volumes.
The three volume groups are the three main storage domains, perfectly up
and running, where I store my oVirt VMs.
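For what it is worth, these are the kind of checks that can be run on an
affected node to look at the state of those VGs (the VG name below is a
placeholder; in my setup the VGs are named after the storage domain
UUIDs):

  # multipath devices backing the storage domains
  multipath -ll
  # physical volumes and the VGs they belong to
  pvs -o pv_name,vg_name,vg_uuid
  # this is where "Volume Group does not exist" shows up
  vgs <storage_domain_vg_uuid>
  # the device-mapper tables the warnings refer to
  dmsetup table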
My reflections:
- I'm not sure device-mapper is to blame. I frequently see device-mapper
complaining without anything getting worse (and not specifically with
oVirt).
- I have not changed my network settings for months (bonding, links...).
The only new factor is the use of the direct attach LUN.
- This morning I was able to reproduce the bug, just by trying this
attachment again and booting the VM. No mounting of the LUN; just
booting the VM and waiting is enough to crash oVirt.
- When the disaster happens, usually only three nodes amongst all of
them get hit, the only ones that run VMs. Obviously, after migration,
different nodes are hosting the VMs, and those new nodes are the ones
that then get hit.
This is quite reproducible.
And frightening.