no spm node in cluster and unable to start any vm or stopped storage domain

Hi, after an upgrade I get the following errors in the web GUI:

VDSM ovirt-node01 command SpmStatusVDS failed: (13, 'Sanlock resource read failure', 'Permission denied')
VDSM ovirt-node03 command HSMGetAllTasksStatusesVDS failed: Not SPM

These messages appear on all nodes. I can stop VMs and migrate them, but I cannot start any VM again. How do I get back to a sane state where one node is SPM?

Best, Moritz

Adding Nir, can you please have a look?

On Mon, May 29, 2017 at 2:15 PM, Moritz Baumann <moritz.baumann@inf.ethz.ch> wrote:
Here is the vdsm.log from one node
Cheers, Moritz
-- Sandro Bonazzola
Associate Manager, Software Engineering, EMEA ENG Virtualization R&D
Red Hat EMEA <https://www.redhat.com/>

On Mon, May 29, 2017 at 3:08 PM, Moritz Baumann <moritz.baumann@inf.ethz.ch> wrote:
Hi, after an upgrade I get the following errors in the web gui:
VDSM ovirt-node01 command SpmStatusVDS failed: (13, 'Sanlock resource read failure', 'Permission denied')
What do you see in sanlock.log? What kind of storage do you have that you experience a 'permission denied' error? Is it file-system based, with an actual permission issue? Y.
VDSM ovirt-node03 command HSMGetAllTasksStatusesVDS failed: Not SPM
These messages appear on all nodes.
I can stop VMs and migrate them, but I cannot start any VM again.
How do I get back to a sane state where one node is SPM?
Best, Moritz

Hi,
VDSM ovirt-node01 command SpmStatusVDS failed: (13, 'Sanlock resource read failure', 'Permission denied')
What do you see in sanlock.log?
I have attached the gzipped sanlock log.
What kind of storage do you have that you experience a 'permission denied' error?
It is NFSv3 based, and I did an upgrade from RHEL 6.8 -> 6.9 and rebooted the storage. I paused all running VMs but forgot to put the storage domains into maintenance. At the moment all VMs are stopped (cleanly), so I "could" remove the nodes and reinstall if that helps.
Is it file-system based, with an actual permission issue?
Filesystem permissions are good, and SELinux is permissive at the moment.

Cheers, Mo
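A quick way to gather what Yaniv is asking about: the following is only a sketch, assuming a default oVirt host layout (the /rhev mount root and the 50-line tail are arbitrary choices):

  # last entries of the sanlock log (default location on an oVirt/RHEL host)
  tail -n 50 /var/log/sanlock.log

  # storage is owned by vdsm:kvm (36:36) and sanlock needs group access to it,
  # so check which groups the vdsm and sanlock users actually have after the upgrade
  id vdsm
  id sanlock

  # SELinux mode and the labels under vdsm's mount root
  getenforce
  ls -ldZ /rhev/data-center/mnt/*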

It is NFSv3 based, and I did an upgrade from RHEL 6.8 -> 6.9 and rebooted the storage. I paused all running VMs but forgot to put the storage domains into maintenance.
scratch-inf[0]:/export/scratch/ovirt# ls -lr *
iso:
total 0
-rwxr-xr-x. 1 36 kvm  0 29. Mai 18:02 __DIRECT_IO_TEST__
drwxr-xr-x. 4 36 kvm 32 21. Jul  2015 2851dcfe-3f64-408a-ad4a-c416790696eb

export:
total 0
-rwxr-xr-x. 1 36 kvm  0 29. Mai 16:28 __DIRECT_IO_TEST__
drwxr-xr-x. 5 36 kvm 45 21. Jul  2015 4cda1489-e241-4186-9500-0fd61640d895

data:
total 0
-rwxr-xr-x. 1 36 kvm  0 29. Mai 18:02 __DIRECT_IO_TEST__
drwxr-xr-x. 5 36 kvm 45 23. Jul  2015 c17d9d7f-e578-4626-a5d9-94ea555d7115

scratch-inf[0]:/export/scratch/ovirt# exportfs -v | grep ovirt
/export/scratch/ovirt  @ovirt-scratch(rw,async,wdelay,no_root_squash,no_subtree_check,fsid=200,sec=sys,rw,no_root_squash,no_all_squash)

I can write to the NFS exports from all nodes as root (I did not try with uid 36, but I don't see why it shouldn't work).

Cheers, Mo
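Since the write test above was done only as root (and the export uses no_root_squash), it may be worth repeating it as uid 36; a sketch, with /mnt/ovirt-test standing in for wherever the export happens to be mounted for the test:

  # '#36' runs the command as raw uid 36 even if no vdsm user exists locally
  sudo -u '#36' touch /mnt/ovirt-test/uid36_write_test

  # sanlock and vdsm use O_DIRECT, so also probe direct I/O, not just a plain touch
  sudo -u '#36' dd if=/dev/zero of=/mnt/ovirt-test/uid36_direct_probe bs=4096 count=1 oflag=direct
  sudo -u '#36' dd if=/mnt/ovirt-test/uid36_direct_probe of=/dev/null iflag=direct

  # clean up
  sudo -u '#36' rm /mnt/ovirt-test/uid36_write_test /mnt/ovirt-test/uid36_direct_probe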

I found some info on how to import an abandoned export domain:
https://www.ovirt.org/documentation/how-to/storage/clear-the-storage-domain-...
Would the same (emptying POOL_UUID and SHA_CKSUM in the metadata) allow me to import the data domain as a new one (and keep the existing VMs)?
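For reference, the linked how-to boils down to editing the domain's dom_md/metadata file on the export itself; roughly the following sketch (the data-domain UUID and path are taken from the earlier listing but are only illustrative, and the metadata file should be backed up first; whether the same trick is safe for a data domain with existing VMs is exactly the open question here):

  # on the NFS server, inside the domain directory (UUID and path illustrative)
  cd /export/scratch/ovirt/data/c17d9d7f-e578-4626-a5d9-94ea555d7115/dom_md
  cp metadata metadata.bak

  # clear the pool binding and drop the checksum line, as the how-to describes
  sed -i -e 's/^POOL_UUID=.*/POOL_UUID=/' -e '/SHA_CKSUM/d' metadata

After that, a data domain would normally be brought back through the engine's "Import Domain" flow, which should let the existing VMs be registered again.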

Just an idea, but could this be related to stale mounts from when you rebooted the storage? Please try the following:

1. Place all nodes into maintenance mode
2. Disable the ovirt NFS exports (see the command sketch below):
   1. Comment out the lines in /etc/exports
   2. exportfs -r
3. Reboot your nodes
4. Re-enable the ovirt NFS exports
5. Activate your nodes

On Wed, May 31, 2017 at 10:04 AM, Moritz Baumann <moritz.baumann@inf.ethz.ch> wrote:
I found some info on how to import an abandoned export domain:
https://www.ovirt.org/documentation/how-to/storage/clear-the-storage-domain-pool-config-of-an-exported-nfs-domain/
Would the same (emptying POOL_UUID and SHA_CKSUM in the metadata) allow me to import the data domain as a new one (and keep the existing VMs)?
-- Adam Litke
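The export toggle in steps 2 and 4 of Adam's list above, as it might look on the NFS server (a sketch; the export path is taken from Moritz's exportfs output earlier in the thread):

  # step 2: comment the ovirt export out and re-read /etc/exports
  sed -i 's|^/export/scratch/ovirt|#&|' /etc/exports
  exportfs -r
  exportfs -v          # confirm the export is no longer listed

  # ... step 3: reboot the nodes ...

  # step 4: uncomment the line and re-export
  sed -i 's|^#\(/export/scratch/ovirt\)|\1|' /etc/exports
  exportfs -r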

Hi Adam,
Just an idea, but could this be related to stale mounts from when you rebooted the storage? Please try the following:
1. Place all nodes into maintenance mode
2. Disable the ovirt NFS exports:
   1. Comment out the lines in /etc/exports
   2. exportfs -r
3. Reboot your nodes
4. Re-enable the ovirt NFS exports
5. Activate your nodes
All storage domains (data/iso) are down, as is the data center (non-responsive), and no NFS mount is present on any of the nodes. I can, however, manually mount the data export and touch a file (as root), so I don't think stale mounts are the issue. I did the steps anyway, and the result is the same.

Best, Mo
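For completeness, a quick way to verify on a node that no leftover vdsm mounts are lingering looks something like this (a sketch; /rhev/data-center/mnt is vdsm's default mount root, and the unmount path is a placeholder):

  # list any NFS mounts still present under vdsm's mount root
  findmnt -t nfs,nfs4 | grep /rhev/data-center/mnt

  # lazily unmount a leftover, if one shows up
  umount -l /rhev/data-center/mnt/<server:_export_path>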

I'm no NFS expert, but for development domains I use the following options:

rw,sync,no_subtree_check,all_squash,anonuid=36,anongid=36

I wonder if something subtle changed on upgrade that interacts poorly with your configuration?

On Wed, May 31, 2017 at 11:34 AM, Moritz Baumann <moritz.baumann@inf.ethz.ch> wrote:
Hi Adam,
Just an idea, but could this be related to stale mounts from when you
rebooted the storage? Please try the following:
1. Place all nodes into maintenance mode
2. Disable the ovirt NFS exports:
   1. Comment out the lines in /etc/exports
   2. exportfs -r
3. Reboot your nodes
4. Re-enable the ovirt NFS exports
5. Activate your nodes
All storage domains (data/iso) are down, as is the data center (non-responsive), and no NFS mount is present on any of the nodes.
I can, however, manually mount the data export and touch a file (as root),
so I don't think stale mounts are the issue.
I did the steps anyway, and the result is the same.
Best, Mo
-- Adam Litke
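As a concrete /etc/exports line, Adam's suggested options would look roughly like this (the path and the @ovirt-scratch netgroup are taken from Moritz's earlier exportfs output; a sketch, not a tested configuration):

  # squash all client access to vdsm:kvm (uid/gid 36) instead of relying on no_root_squash
  /export/scratch/ovirt  @ovirt-scratch(rw,sync,no_subtree_check,all_squash,anonuid=36,anongid=36)

  # apply without a full NFS restart
  exportfs -r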
participants (4)
- Adam Litke
- Moritz Baumann
- Sandro Bonazzola
- Yaniv Kaul