[ovirt-users] oVirt 4 master storage down - unable to activate after power loss

Robin Vanderveken rvanderveken at dlttechnologies.com
Mon Jul 25 08:31:24 UTC 2016


Dear oVirt users

I've been having several problems with my oVirt nodes utilising GlusterFS
after simulating a power loss.

Our test setup for oVirt consists of 3 nodes: 2 with GlusterFS storage
and 1 for compute only. Everything appeared to be set up correctly and was
working fine. Then, after simulating a power loss, the master storage
domain went down, and with it the VMs and the data center.
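
For reference, one check I can also run after such a power loss (a rough
sketch; I am assuming the replicated volume vol1 is the one backing the
master storage domain) is whether GlusterFS still reports pending
self-heals or split-brain entries:

# on one of the GlusterFS nodes; vol1 is assumed to back the master domain
gluster volume heal vol1 info
gluster volume heal vol1 info split-brain
gluster volume status vol1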

I checked the GlusterFS configuration and it seems to be correct (see
attachment). I checked the oVirt configuration and it also seems to be
correct (see attachment). I tried putting several nodes into Maintenance
several times, and even reinstalled them while in maintenance. Only putting
a node into maintenance (and choosing to stop the GlusterFS service)
triggers Contending on the other nodes, and only then is there a chance
that a node goes from Contending to SPM, which does not always happen.
After several attempts one of the main nodes did become SPM, but the master
storage remains down. When I select the master storage domain, go to the
Data Center tab, select the data center and click Activate, both the master
storage and the data center go into the Locked state for a few seconds and
then back to Inactive.
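
In case it is useful, this is roughly what I could run on the contending
host to see what sanlock and VDSM themselves report (a sketch; <pool-uuid>
stands for the data center UUID, which I have not filled in here, and I am
assuming vdsClient from vdsm-cli is still installed on these hosts):

# lockspaces and resources sanlock currently holds on this host
sanlock client status
# SPM status as seen by VDSM (replace <pool-uuid> with the data center UUID)
vdsClient -s 0 getSpmStatus <pool-uuid>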

I then upgraded to oVirt 4 (oVirt Engine version 4.0.0.6-1.el7.centos) and
tried everything again, with the same result.

I searched online and found this mailing list thread, which looks very
similar: http://lists.ovirt.org/pipermail/users/2014-August/026905.html .
Unfortunately no solution was posted. I also found this thread:
https://www.mail-archive.com/users@ovirt.org/msg08105.html , which suggests
the kernel version may be the problem, but I am unsure how to check this.
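
If checking this only means comparing kernel versions on the nodes, I
assume something like the following would show it; please correct me if
more than that is needed:

# running kernel and installed kernel packages on each node
uname -r
rpm -q kernel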

I have attached the relevant logs:
- the GlusterFS configuration and service status on both main nodes
- /var/log/vdsm/vdsm.log, /var/log/vdsm/supervdsm.log, /var/log/messages,
/var/log/sanlock.log, /var/log/ovirt-engine/engine.log-20160725.gz and
/var/log/ovirt-engine/engine.log from the oVirt engine
(I clicked Activate on the data center at 9:45:30 am GMT.) It is possible
that I need to send different engine logs; please tell me if so. Any help
would be highly appreciated.
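
If it is easier than going through the full files, I could also send only
the entries around the Activate click; something like this is what I had
in mind (the timestamps in engine.log seem to be in the engine host's
local time, so 9:45:30 GMT would need adjusting accordingly):

# recent errors in the current engine log
grep ERROR /var/log/ovirt-engine/engine.log | tail -n 50
# and in the rotated log from the same day
zgrep ERROR /var/log/ovirt-engine/engine.log-20160725.gz | tail -n 50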

Kind regards
Robin Vanderveken

-- 
Robin Vanderveken
DLT Technologies
Tel. +352 691 412 922

-- 
*Disclaimer*

*The information contained in this message is intended for the addressee 
only and may contain confidential and/or privileged information and/or 
information protected by intellectual property rights or other legal rules. 
If you are not the intended recipient, please delete this message and any 
attachment to it and notify the sender; you may not copy or use this 
message or its attachments in any way nor disclose its contents to anyone. 
Emails cannot be guaranteed to be secure or to be error or virus free. No 
liability is accepted by the sender for any loss or damage arising in any way 
from this message or its use. *
-------------- next part --------------
[root@oVirt-Node1 ~]# gluster volume info

Volume Name: iso
Type: Distribute
Volume ID: 16d455a3-a80e-4087-b20c-b25a9e803930
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.0.0.194:/gluster-bricks/iso/iso
Brick2: 10.0.0.199:/gluster-bricks/iso/iso
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: off
user.cifs: enable
auth.allow: *
storage.owner-gid: 36
storage.owner-uid: 36

Volume Name: vol1
Type: Replicate
Volume ID: 58a38f2c-07a3-44ed-a144-cd9534c7ad5e
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.0.0.194:/gluster-bricks/vol1/vol1
Brick2: 10.0.0.199:/gluster-bricks/vol1/vol1
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: off
user.cifs: enable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 36
storage.owner-gid: 36
server.allow-insecure: on
network.ping-timeout: 10
[root@oVirt-Node1 ~]# gluster pool list
UUID                                    Hostname        State
4a6f1618-23b3-4ed1-9683-0b7e04a441ef    oVirt-Node2     Connected
69db979d-ed31-4394-9eaa-df506da11b44    localhost       Connected
[root@oVirt-Node1 ~]# systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2016-07-21 16:40:04 CEST; 3 days ago
  Process: 23442 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 23443 (glusterd)
   CGroup: /system.slice/glusterd.service
           ├─23443 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level IN...
           ├─23473 /usr/sbin/glusterfsd -s 10.0.0.194 --volfile-id iso.10.0.0...
           ├─23478 /usr/sbin/glusterfsd -s 10.0.0.194 --volfile-id vol1.10.0....
           ├─23767 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs ...
           ├─23776 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glus...
           └─23784 /sbin/rpc.statd

Jul 21 16:40:02 oVirt-Node1 systemd[1]: Starting GlusterFS, a clustered file....
Jul 21 16:40:04 oVirt-Node1 systemd[1]: Started GlusterFS, a clustered file-....
Jul 21 16:40:13 oVirt-Node1 rpc.statd[23784]: Version 1.3.0 starting
Jul 21 16:40:13 oVirt-Node1 sm-notify[23785]: Version 1.3.0 starting
Hint: Some lines were ellipsized, use -l to show in full.
[root@oVirt-Node1 ~]# ping 10.0.0.199
PING 10.0.0.199 (10.0.0.199) 56(84) bytes of data.
64 bytes from 10.0.0.199: icmp_seq=1 ttl=64 time=0.236 ms
64 bytes from 10.0.0.199: icmp_seq=2 ttl=64 time=0.629 ms
64 bytes from 10.0.0.199: icmp_seq=3 ttl=64 time=0.658 ms
64 bytes from 10.0.0.199: icmp_seq=4 ttl=64 time=0.250 ms
64 bytes from 10.0.0.199: icmp_seq=5 ttl=64 time=0.175 ms
64 bytes from 10.0.0.199: icmp_seq=6 ttl=64 time=0.617 ms
^C
--- 10.0.0.199 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5002ms
rtt min/avg/max/mdev = 0.175/0.427/0.658/0.209 ms
-------------- next part --------------
[root@ovirt-node2 ~]# gluster volume info

Volume Name: iso
Type: Distribute
Volume ID: 16d455a3-a80e-4087-b20c-b25a9e803930
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.0.0.194:/gluster-bricks/iso/iso
Brick2: 10.0.0.199:/gluster-bricks/iso/iso
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: off
user.cifs: enable
auth.allow: *
storage.owner-gid: 36
storage.owner-uid: 36

Volume Name: vol1
Type: Replicate
Volume ID: 58a38f2c-07a3-44ed-a144-cd9534c7ad5e
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.0.0.194:/gluster-bricks/vol1/vol1
Brick2: 10.0.0.199:/gluster-bricks/vol1/vol1
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: off
user.cifs: enable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 36
storage.owner-gid: 36
server.allow-insecure: on
network.ping-timeout: 10
[root@ovirt-node2 ~]# gluster pool list
UUID                                    Hostname        State
69db979d-ed31-4394-9eaa-df506da11b44    oVirt-Node1     Connected
4a6f1618-23b3-4ed1-9683-0b7e04a441ef    localhost       Connected
[root@ovirt-node2 ~]# systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-07-25 08:50:20 CEST; 40min ago
 Main PID: 14536 (glusterd)
   CGroup: /system.slice/glusterd.service
           ├─10039 /usr/sbin/glusterfsd -s 10.0.0.199 --volfile-id iso.10.0.0...
           ├─10044 /usr/sbin/glusterfsd -s 10.0.0.199 --volfile-id vol1.10.0....
           ├─14536 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level IN...
           ├─14988 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs ...
           ├─15099 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glus...
           └─15126 /sbin/rpc.statd

Jul 25 08:50:19 ovirt-node2 systemd[1]: Starting GlusterFS, a clustered file....
Jul 25 08:50:20 ovirt-node2 systemd[1]: Started GlusterFS, a clustered file-....
Jul 25 08:50:23 ovirt-node2 rpc.statd[15126]: Version 1.3.0 starting
Jul 25 08:50:23 ovirt-node2 sm-notify[15127]: Version 1.3.0 starting
Hint: Some lines were ellipsized, use -l to show in full.
[root@ovirt-node2 ~]# ping 10.0.0.194
PING 10.0.0.194 (10.0.0.194) 56(84) bytes of data.
64 bytes from 10.0.0.194: icmp_seq=1 ttl=64 time=0.601 ms
64 bytes from 10.0.0.194: icmp_seq=2 ttl=64 time=0.534 ms
64 bytes from 10.0.0.194: icmp_seq=3 ttl=64 time=0.711 ms
64 bytes from 10.0.0.194: icmp_seq=4 ttl=64 time=0.587 ms
64 bytes from 10.0.0.194: icmp_seq=5 ttl=64 time=0.737 ms
64 bytes from 10.0.0.194: icmp_seq=6 ttl=64 time=0.667 ms
^C
--- 10.0.0.194 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5003ms
rtt min/avg/max/mdev = 0.534/0.639/0.737/0.075 ms
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vdsm.log
Type: application/octet-stream
Size: 236297 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20160725/95e9b721/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: supervdsm.log
Type: application/octet-stream
Size: 232939 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20160725/95e9b721/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: messages
Type: application/octet-stream
Size: 13366 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20160725/95e9b721/attachment-0007.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sanlock.log
Type: application/octet-stream
Size: 226896 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20160725/95e9b721/attachment-0008.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: engine.log-20160725.gz
Type: application/x-gzip
Size: 1517139 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20160725/95e9b721/attachment-0001.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: engine.log
Type: application/octet-stream
Size: 1325119 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20160725/95e9b721/attachment-0009.obj>
