This seems like the issue reported in
https://bugzilla.redhat.com/show_bug.cgi?id=1327121
Nir, Simone?
On 04/28/2016 05:35 AM, Luiz Claudio Prazeres Goncalves wrote:
Hi everyone,
Until today my environment was fully updated (3.6.5 + CentOS 7.2) with 3
nodes (the kvm1, kvm2 and kvm3 hosts). I also have 3 external gluster
nodes (the gluster-root1, gluster1 and gluster2 hosts), replica 3, which
the engine storage domain is sitting on top of (3.7.11, fully updated +
CentOS 7.2).
For some weird reason I've been receiving emails from oVirt with
EngineUnexpectedDown (picture attached) on a more or less daily basis,
but the engine seems to be working fine and my VMs are up and running
normally. I've never had any issue accessing the User Interface to
manage the VMs.
Today I ran "yum update" on the nodes and realised that vdsm was
outdated, so I updated the kvm hosts and they are now, again, fully
updated.
Reviewing the logs, it seems to be an intermittent connectivity issue
when trying to access the gluster engine storage domain, as you can see
below. I don't have any network issue in place and I'm 100% sure about
that: I have another oVirt cluster on the same network, using an engine
storage domain on top of an iSCSI storage array, with no issues.
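If it helps to catch these drops in the act, a minimal watcher along the
lines below could be left running on one of the kvm hosts, and its
timestamps lined up against the EngineUnexpectedDown emails. This is only
a sketch: the mount point is an assumption based on vdsm's usual
/rhev/data-center/mnt/glusterSD/<server>:_<volume> layout, so adjust the
server and volume names to match your setup.

import os
import time

# Assumed mount point (vdsm's usual layout for glusterfs domains);
# adjust the server and volume name to the real mount on your host.
MOUNT = "/rhev/data-center/mnt/glusterSD/gluster1.xyz.com:_engine"

while True:
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    try:
        os.listdir(MOUNT)  # fails immediately if the fuse mount is down
    except OSError as e:
        print("{} mount unreachable: {}".format(stamp, e))
    time.sleep(5)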
*Here is what seems to be the issue:*
Thread-1111::INFO::2016-04-27
23:01:27,864::fileSD::357::Storage.StorageDomain::(validate)
sdUUID=03926733-1872-4f85-bb21-18dc320560db
Thread-1111::DEBUG::2016-04-27
23:01:27,865::persistentDict::234::Storage.PersistentDict::(refresh)
read lines (FileMetadataRW)=[]
Thread-1111::DEBUG::2016-04-27
23:01:27,865::persistentDict::252::Storage.PersistentDict::(refresh)
Empty metadata
Thread-1111::ERROR::2016-04-27
23:01:27,865::task::866::Storage.TaskManager.Task::(_setError)
Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2835, in getStorageDomainInfo
    dom = self.validateSdUUID(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
    sdDom.validate()
  File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or
entirely inaccessible: (u'03926733-1872-4f85-bb21-18dc320560db',)
Thread-1111::DEBUG::2016-04-27
23:01:27,865::task::885::Storage.TaskManager.Task::(_run)
Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Task._run:
d2acf575-1a60-4fa0-a5bb-cd4363636b94
('03926733-1872-4f85-bb21-18dc320560db',) {} failed - stopping task
Thread-1111::DEBUG::2016-04-27
23:01:27,865::task::1246::Storage.TaskManager.Task::(stop)
Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::stopping in state
preparing (force False)
Thread-1111::DEBUG::2016-04-27
23:01:27,865::task::993::Storage.TaskManager.Task::(_decref)
Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::ref 1 aborting True
Thread-1111::INFO::2016-04-27
23:01:27,865::task::1171::Storage.TaskManager.Task::(prepare)
Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::aborting: Task is
aborted: 'Domain is either partially accessible or entirely
inaccessible' - code 379
Thread-1111::DEBUG::2016-04-27
23:01:27,866::task::1176::Storage.TaskManager.Task::(prepare)
Task=`d2acf575-1a60-4fa0-a5bb-cd4363636b94`::Prepare: aborted: Domain
is either partially accessible or entirely inaccessible
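Note the "Empty metadata" DEBUG line just before the error: validate()
aborts because the metadata refresh read the domain's metadata file back
empty, which on a fuse mount usually points at a transient disconnect
rather than genuinely bad metadata. A rough by-hand mirror of that read
(same assumed glusterSD mount path as in the watcher above, so treat the
paths as placeholders):

import os

SD_UUID = "03926733-1872-4f85-bb21-18dc320560db"
# Assumed mount point, as above -- adjust to the real glusterSD mount.
MOUNT = "/rhev/data-center/mnt/glusterSD/gluster1.xyz.com:_engine"
METADATA = os.path.join(MOUNT, SD_UUID, "dom_md", "metadata")

with open(METADATA) as f:
    data = f.read()
if data.strip():
    print("metadata OK ({} bytes)".format(len(data)))
else:
    # An empty read at the wrong moment is what makes validate()
    # raise StorageDomainAccessError.
    print("metadata read back EMPTY")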
*Question: Does anyone know what might be happening? I have several
gluster configs, as you can see below, and all the storage domains are
using the same configs.*
*More information:*
I have the "engine" storage domain, the "vmos1" storage domain and the
"master" storage domain, so everything looks good.
[root@kvm1 vdsm]# vdsClient -s 0 getStorageDomainsList
03926733-1872-4f85-bb21-18dc320560db
35021ff4-fb95-43d7-92a3-f538273a3c2e
e306e54e-ca98-468d-bb04-3e8900f8840c
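Since the traceback above enters through getStorageDomainInfo, the same
validation can be poked on demand for each of those UUIDs. A loop like
this (a sketch wrapping the vdsClient verb shown above; it assumes
vdsClient exits non-zero when the call fails) would show whether only the
engine domain flaps or all three do:

import subprocess

# The three domain UUIDs from getStorageDomainsList above.
DOMAINS = [
    "03926733-1872-4f85-bb21-18dc320560db",
    "35021ff4-fb95-43d7-92a3-f538273a3c2e",
    "e306e54e-ca98-468d-bb04-3e8900f8840c",
]

for sd_uuid in DOMAINS:
    try:
        subprocess.check_output(
            ["vdsClient", "-s", "0", "getStorageDomainInfo", sd_uuid],
            stderr=subprocess.STDOUT)
        print("{}: OK".format(sd_uuid))
    except subprocess.CalledProcessError as e:
        print("{}: FAILED".format(sd_uuid))
        print(e.output)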
*Gluster config:*
[root@gluster-root1 ~]# gluster volume info
Volume Name: engine
Type: Replicate
Volume ID: 64b413d2-c42e-40fd-b356-3e6975e941b0
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.xyz.com:/gluster/engine/brick1
Brick2: gluster2.xyz.com:/gluster/engine/brick1
Brick3: gluster-root1.xyz.com:/gluster/engine/brick1
Options Reconfigured:
performance.cache-size: 1GB
performance.write-behind-window-size: 4MB
performance.write-behind: off
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
cluster.quorum-type: auto
network.remote-dio: enable
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
Volume Name: master
Type: Replicate
Volume ID: 20164808-7bbe-4eeb-8770-d222c0e0b830
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.xyz.com:/home/storage/master/brick1
Brick2: gluster2.xyz.com:/home/storage/master/brick1
Brick3: gluster-root1.xyz.com:/home/storage/master/brick1
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.write-behind: off
performance.write-behind-window-size: 4MB
performance.cache-size: 1GB
Volume Name: vmos1
Type: Replicate
Volume ID: ea8fb50e-7bc8-4de3-b775-f3976b6b4f13
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.xyz.com:/gluster/vmos1/brick1
Brick2: gluster2.xyz.com:/gluster/vmos1/brick1
Brick3: gluster-root1.xyz.com:/gluster/vmos1/brick1
Options Reconfigured:
network.ping-timeout: 60
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.write-behind: off
performance.write-behind-window-size: 4MB
performance.cache-size: 1GB
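For what it's worth, one small difference does show up in the dumps
above: vmos1 sets network.ping-timeout: 60 while engine and master leave
it at the default. To double-check that the volumes really share the same
reconfigured options, a short parse of the "gluster volume info" output
(a sketch that only compares the "Options Reconfigured" stanzas) would
flag any stragglers:

import subprocess

# Compare the "Options Reconfigured" stanzas across all volumes and
# report any option that is not set identically everywhere.
out = subprocess.check_output(["gluster", "volume", "info"]).decode()

volumes = {}  # volume name -> {option: value}
current = None
in_options = False
for line in out.splitlines():
    line = line.strip()
    if line.startswith("Volume Name:"):
        current = line.split(":", 1)[1].strip()
        volumes[current] = {}
        in_options = False
    elif line.startswith("Options Reconfigured:"):
        in_options = True
    elif in_options and current and ": " in line:
        key, value = line.split(": ", 1)
        volumes[current][key] = value

all_keys = set().union(*volumes.values()) if volumes else set()
for key in sorted(all_keys):
    values = {name: opts.get(key, "<default>")
              for name, opts in volumes.items()}
    if len(set(values.values())) > 1:
        print("{}: {}".format(key, values))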
All the logs are attached...
Thanks
-Luiz