<div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><br></div><div class="gmail_quote"><div dir="ltr"><div style="font-family:verdana,sans-serif;font-size:small">Hi everyone,</div><div style="font-family:verdana,sans-serif;font-size:small"><br></div><div style="font-family:verdana,sans-serif;font-size:small">Until today my environment was fully updated (3.6.5 + CentOS 7.2) with 3 nodes (kvm1, kvm2 and kvm3 hosts). I also have 3 external gluster nodes (gluster-root1, gluster1 and gluster2 hosts), replica 3, on which the engine storage domain sits (3.7.11 fully updated + CentOS 7.2).</div><div style="font-family:verdana,sans-serif;font-size:small"><br></div><div style="font-family:verdana,sans-serif;font-size:small"><font color="#ff0000">For some weird reason I've been receiving EngineUnexpectedDown emails from oVirt (picture attached) more or less daily, but the engine seems to be working fine and my VMs are up and running normally. I've never had any issue accessing the User Interface to manage the VMs.</font></div><div style="font-family:verdana,sans-serif;font-size:small"><br></div><div style="font-family:verdana,sans-serif;font-size:small">Today I ran "yum update" on the nodes and realised that vdsm was outdated, so I updated the kvm hosts and they are now, again, fully updated.</div><div style="font-family:verdana,sans-serif;font-size:small"><br></div><div style="font-family:verdana,sans-serif;font-size:small"><br></div><div style="font-family:verdana,sans-serif"><font size="4">Reviewing the logs, this looks like an intermittent connectivity issue when accessing the gluster engine storage domain, as you can see below. I'm confident there is no network issue in play: I have another oVirt cluster on the same network, with its engine storage domain on top of an iSCSI storage array, and it has no issues.</font></div><div style="font-family:verdana,sans-serif;font-size:small"><br></div><div style="font-family:verdana,sans-serif"><b><font size="4">Here is what seems to be the issue:</font></b></div><div style="font-family:verdana,sans-serif;font-size:small"><br></div>
<p><span>Thread-1111::INFO::2016-04-27 23:01:27,864::fileSD::357::Storage.StorageDomain::(validate) sdUUID=03926733-1872-4f85-bb21-18dc320560db</span></p>
<p><span>Thread-1111::DEBUG::2016-04-27 23:01:27,865::persistentDict::234::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=[]</span></p>
<p><span>Thread-1111::DEBUG::2016-04-27 23:01:27,865::persistentDict::252::Storage.PersistentDict::(refresh) Empty metadata</span></p>
<p><span>Thread-1111::</span><span>ERROR</span><span>::2016-04-27 23:01:27,865::task::866::Storage.TaskManager.Task::(_setError) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Unexpected error</span></p>
<p><span>Traceback (most recent call last):</span></p>
<p><span> File "/usr/share/vdsm/storage/task.py", line 873, in _run</span></p>
<p><span> return fn(*args, **kargs)</span></p>
<p><span> File "/usr/share/vdsm/logUtils.py", line 49, in wrapper</span></p>
<p><span> res = f(*args, **kwargs)</span></p>
<p><span> File "/usr/share/vdsm/storage/hsm.py", line 2835, in getStorageDomainInfo</span></p>
<p><span> dom = self.validateSdUUID(sdUUID)</span></p>
<p><span> File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID</span></p>
<p><span> sdDom.validate()</span></p>
<p><span> File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate</span></p>
<p><span> raise se.StorageDomainAccessError(self.sdUUID)</span></p>
<p><span>StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: (u'03926733-1872-4f85-bb21-18dc320560db',)</span></p>
<p><span>Thread-1111::DEBUG::2016-04-27 23:01:27,865::task::885::Storage.TaskManager.Task::(_run) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Task._run: d2acf575-1a60-4fa0-a5bb-cd4363636b94 ('03926733-1872-4f85-bb21-18dc320560db',) {} failed - stopping task</span></p>
<p><span>Thread-1111::DEBUG::2016-04-27 23:01:27,865::task::1246::Storage.TaskManager.Task::(stop) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::stopping in state preparing (force False)</span></p>
<p><span>Thread-1111::DEBUG::2016-04-27 23:01:27,865::task::993::Storage.TaskManager.Task::(_decref) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::ref 1 aborting True</span></p>
<p><span>Thread-1111::INFO::2016-04-27 23:01:27,865::task::1171::Storage.TaskManager.Task::(prepare) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::aborting: Task is aborted: 'Domain is either partially accessible or entirely inaccessible' - code 379</span></p>
<div style="font-family:verdana,sans-serif;font-size:small"><span style="font-family:arial,sans-serif">Thread-1111::DEBUG::2016-04-27 23:01:27,866::task::1176::Storage.TaskManager.Task::(prepare) Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Prepare: aborted: Domain is either partially accessible or entirely inaccessible</span> </div><div style="font-family:verdana,sans-serif;font-size:small"><br></div><div style="font-family:verdana,sans-serif;font-size:small"><br></div><div style="font-family:verdana,sans-serif"><font size="4"><b>Question: <font color="#ff0000">Does anyone know what might be happening?</font> My gluster configuration is shown below; all the storage domains use the same settings.</b></font></div><div style="font-family:verdana,sans-serif;font-size:small"><br></div><div style="font-family:verdana,sans-serif;font-size:small"><br></div><div style="font-family:verdana,sans-serif"><b><font size="4">More information:</font></b></div><div style="font-family:verdana,sans-serif;font-size:small"><br></div><div style="font-family:verdana,sans-serif;font-size:small">I have the "engine", "vmos1" and "master" storage domains, and all three are listed, so everything looks good.</div><div style="font-family:verdana,sans-serif">
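As a quick sanity check I also probe, from the hypervisors, whether the gluster mountpoint for the engine domain still answers within a short timeout (a rough sketch; the glusterSD mount path below is illustrative, not necessarily the exact path on your hosts):

```shell
# Sketch: probe whether a storage-domain mountpoint answers stat within a timeout.
# The glusterSD path is a hypothetical example -- substitute your real mount.
check_mount() {
    # stat must return within 5 seconds, otherwise treat the mount as stale
    if timeout 5 stat -t "$1" >/dev/null 2>&1; then
        echo "responsive"
    else
        echo "stale-or-missing"
    fi
}

check_mount "/rhev/data-center/mnt/glusterSD/gluster1.xyz.com:_engine"
```

When the domain flaps, this kind of probe tends to hang or fail at the same moments the EngineUnexpectedDown alerts fire.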
<p style="font-size:small"><span>[root@kvm1 vdsm]# vdsClient -s 0 getStorageDomainsList</span></p>
<p style="font-size:small"><span>03926733-1872-4f85-bb21-18dc320560db</span></p>
<p style="font-size:small"><span>35021ff4-fb95-43d7-92a3-f538273a3c2e</span></p>
<p style="font-size:small"><span>e306e54e-ca98-468d-bb04-3e8900f8840c</span></p><p style="font-size:small"><span><br></span></p><p><span><b><font size="4">Gluster config:</font></b></span></p><p style="font-size:small"><span>[root@gluster-root1 ~]# gluster volume info</span></p><p style="font-size:small"><span> </span></p><p style="font-size:small"><span>Volume Name: engine</span></p><p style="font-size:small"><span>Type: Replicate</span></p><p style="font-size:small"><span>Volume ID: 64b413d2-c42e-40fd-b356-3e6975e941b0</span></p><p style="font-size:small"><span>Status: Started</span></p><p style="font-size:small"><span>Number of Bricks: 1 x 3 = 3</span></p><p style="font-size:small"><span>Transport-type: tcp</span></p><p style="font-size:small"><span>Bricks:</span></p><p style="font-size:small"><span>Brick1: gluster1.xyz.com:/gluster/engine/brick1</span></p><p style="font-size:small"><span>Brick2: gluster2.xyz.com:/gluster/engine/brick1</span></p><p style="font-size:small"><span>Brick3: gluster-root1.xyz.com:/gluster/engine/brick1</span></p><p style="font-size:small"><span>Options Reconfigured:</span></p><p style="font-size:small"><span>performance.cache-size: 1GB</span></p><p style="font-size:small"><span>performance.write-behind-window-size: 4MB</span></p><p style="font-size:small"><span>performance.write-behind: off</span></p><p style="font-size:small"><span>performance.quick-read: off</span></p><p style="font-size:small"><span>performance.read-ahead: off</span></p><p style="font-size:small"><span>performance.io-cache: off</span></p><p style="font-size:small"><span>performance.stat-prefetch: off</span></p><p style="font-size:small"><span>cluster.eager-lock: enable</span></p><p style="font-size:small"><span>cluster.quorum-type: auto</span></p><p style="font-size:small"><span>network.remote-dio: enable</span></p><p style="font-size:small"><span>cluster.server-quorum-type: server</span></p><p style="font-size:small"><span>cluster.data-self-heal-algorithm: 
full</span></p><p style="font-size:small"><span>performance.low-prio-threads: 32</span></p><p style="font-size:small"><span>features.shard-block-size: 512MB</span></p><p style="font-size:small"><span>features.shard: on</span></p><p style="font-size:small"><span>storage.owner-gid: 36</span></p><p style="font-size:small"><span>storage.owner-uid: 36</span></p><p style="font-size:small"><span>
</span></p><p style="font-size:small"><span>performance.readdir-ahead: on</span></p><p style="font-size:small"><span><br></span></p><p style="font-size:small"><span>Volume Name: master</span></p><p style="font-size:small"><span>Type: Replicate</span></p><p style="font-size:small"><span>Volume ID: 20164808-7bbe-4eeb-8770-d222c0e0b830</span></p><p style="font-size:small"><span>Status: Started</span></p><p style="font-size:small"><span>Number of Bricks: 1 x 3 = 3</span></p><p style="font-size:small"><span>Transport-type: tcp</span></p><p style="font-size:small"><span>Bricks:</span></p><p style="font-size:small"><span>Brick1: gluster1.xyz.com:/home/storage/master/brick1</span></p><p style="font-size:small"><span>Brick2: gluster2.xyz.com:/home/storage/master/brick1</span></p><p style="font-size:small"><span>Brick3: gluster-root1.xyz.com:/home/storage/master/brick1</span></p><p style="font-size:small"><span>Options Reconfigured:</span></p><p style="font-size:small"><span>performance.readdir-ahead: on</span></p><p style="font-size:small"><span>performance.quick-read: off</span></p><p style="font-size:small"><span>performance.read-ahead: off</span></p><p style="font-size:small"><span>performance.io-cache: off</span></p><p style="font-size:small"><span>performance.stat-prefetch: off</span></p><p style="font-size:small"><span>cluster.eager-lock: enable</span></p><p style="font-size:small"><span>network.remote-dio: enable</span></p><p style="font-size:small"><span>cluster.quorum-type: auto</span></p><p style="font-size:small"><span>cluster.server-quorum-type: server</span></p><p style="font-size:small"><span>storage.owner-uid: 36</span></p><p style="font-size:small"><span>storage.owner-gid: 36</span></p><p style="font-size:small"><span>features.shard: on</span></p><p style="font-size:small"><span>features.shard-block-size: 512MB</span></p><p style="font-size:small"><span>performance.low-prio-threads: 32</span></p><p 
style="font-size:small"><span>cluster.data-self-heal-algorithm: full</span></p><p style="font-size:small"><span>performance.write-behind: off</span></p><p style="font-size:small"><span>performance.write-behind-window-size: 4MB</span></p><p style="font-size:small"><span>
</span></p><p style="font-size:small"><span>performance.cache-size: 1GB</span></p><p style="font-size:small"><span><br></span></p><p style="font-size:small"><span>Volume Name: vmos1</span></p><p style="font-size:small"><span>Type: Replicate</span></p><p style="font-size:small"><span>Volume ID: ea8fb50e-7bc8-4de3-b775-f3976b6b4f13</span></p><p style="font-size:small"><span>Status: Started</span></p><p style="font-size:small"><span>Number of Bricks: 1 x 3 = 3</span></p><p style="font-size:small"><span>Transport-type: tcp</span></p><p style="font-size:small"><span>Bricks:</span></p><p style="font-size:small"><span>Brick1: gluster1.xyz.com:/gluster/vmos1/brick1</span></p><p style="font-size:small"><span>Brick2: gluster2.xyz.com:/gluster/vmos1/brick1</span></p><p style="font-size:small"><span>Brick3: gluster-root1.xyz.com:/gluster/vmos1/brick1</span></p><p style="font-size:small"><span>Options Reconfigured:</span></p><p style="font-size:small"><span>network.ping-timeout: 60</span></p><p style="font-size:small"><span>performance.readdir-ahead: on</span></p><p style="font-size:small"><span>performance.quick-read: off</span></p><p style="font-size:small"><span>performance.read-ahead: off</span></p><p style="font-size:small"><span>performance.io-cache: off</span></p><p style="font-size:small"><span>performance.stat-prefetch: off</span></p><p style="font-size:small"><span>cluster.eager-lock: enable</span></p><p style="font-size:small"><span>network.remote-dio: enable</span></p><p style="font-size:small"><span>cluster.quorum-type: auto</span></p><p style="font-size:small"><span>cluster.server-quorum-type: server</span></p><p style="font-size:small"><span>storage.owner-uid: 36</span></p><p style="font-size:small"><span>storage.owner-gid: 36</span></p><p style="font-size:small"><span>features.shard: on</span></p><p style="font-size:small"><span>features.shard-block-size: 512MB</span></p><p style="font-size:small"><span>performance.low-prio-threads: 32</span></p><p 
style="font-size:small"><span>cluster.data-self-heal-algorithm: full</span></p><p style="font-size:small"><span>performance.write-behind: off</span></p><p style="font-size:small"><span>performance.write-behind-window-size: 4MB</span></p><p style="font-size:small"><span>
</span></p><p style="font-size:small"><span>performance.cache-size: 1GB</span></p><p style="font-size:small"><span><br></span></p><p style="font-size:small"><span><br></span></p><p style="font-size:small"><span>All the logs are attached.</span></p><p style="font-size:small"><span><br></span></p><p style="font-size:small"><span><br></span></p><p style="font-size:small"><span>Thanks</span></p><span class="HOEnZb"><font color="#888888"><p style="font-size:small"><span>-Luiz</span></p></font></span></div></div>
</div><br></div>