On Wed, Apr 27, 2022 at 3:54 PM José Ferradeira via Users
<users(a)ovirt.org> wrote:
After upgrading to 4.5, the host cannot be activated because it cannot connect to the data domain. I have an NFS data domain (the master) and a GlusterFS domain. It complains about the Gluster domain:

The error message for connection node1-teste.acloud.pt:/data1 returned by VDSM was: XML error
# rpm -qa | grep glusterfs
glusterfs-10.1-1.el8s.x86_64
glusterfs-selinux-2.0.1-1.el8s.noarch
glusterfs-client-xlators-10.1-1.el8s.x86_64
glusterfs-events-10.1-1.el8s.x86_64
libglusterfs0-10.1-1.el8s.x86_64
glusterfs-fuse-10.1-1.el8s.x86_64
glusterfs-server-10.1-1.el8s.x86_64
glusterfs-cli-10.1-1.el8s.x86_64
glusterfs-geo-replication-10.1-1.el8s.x86_64
engine log:
2022-04-27 13:35:16,118+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [ebe79c6] EVENT_ID: VDS_STORAGES_CONNECTION_FAILED(188), Failed to connect Host NODE1 to the Storage Domains DATA1.
2022-04-27 13:35:16,169+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [ebe79c6] EVENT_ID: STORAGE_DOMAIN_ERROR(996), The error message for connection node1-teste.acloud.pt:/data1 returned by VDSM was: XML error
2022-04-27 13:35:16,170+01 ERROR [org.ovirt.engine.core.bll.storage.connection.FileStorageHelper] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [ebe79c6] The connection with details 'node1-teste.acloud.pt:/data1' failed because of error code '4106' and error message is: xml error
vdsm log:
2022-04-27 13:40:07,125+0100 ERROR (jsonrpc/4) [storage.storageServer] Could not connect to storage server (storageServer:92)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 90, in connect_all
    con.connect()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 233, in connect
    self.validate()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 365, in validate
    if not self.volinfo:
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 352, in volinfo
    self._volinfo = self._get_gluster_volinfo()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/storageServer.py", line 405, in _get_gluster_volinfo
    self._volfileserver)
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 56, in __call__
    return callMethod()
  File "/usr/lib/python3.6/site-packages/vdsm/common/supervdsm.py", line 54, in <lambda>
    **kwargs)
  File "<string>", line 2, in glusterVolumeInfo
  File "/usr/lib64/python3.6/multiprocessing/managers.py", line 772, in _callmethod
    raise convert_to_error(kind, result)
vdsm.gluster.exception.GlusterXmlErrorException: XML error: rc=0 out=() err=
<cliOutput>
  <opRet>0</opRet>
  <opErrno>0</opErrno>
  <opErrstr />
  <volInfo>
    <volumes>
      <volume>
        <name>data1</name>
        <id>d7eb2c38-2707-4774-9873-a7303d024669</id>
        <status>1</status>
        <statusStr>Started</statusStr>
        <snapshotCount>0</snapshotCount>
        <brickCount>2</brickCount>
        <distCount>2</distCount>
        <replicaCount>1</replicaCount>
        <arbiterCount>0</arbiterCount>
        <disperseCount>0</disperseCount>
        <redundancyCount>0</redundancyCount>
        <type>0</type>
        <typeStr>Distribute</typeStr>
        <transport>0</transport>
        <bricks>
          <brick uuid="08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b">node1-teste.acloud.pt:/home/brick1<name>node1-teste.acloud.pt:/home/brick1</name><hostUuid>08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b">node1-teste.acloud.pt:/brick2<name>node1-teste.acloud.pt:/brick2</name><hostUuid>08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b</hostUuid><isArbiter>0</isArbiter></brick>
        </bricks>
        <optCount>23</optCount>
        <options>
          <option><name>nfs.disable</name><value>on</value></option>
          <option><name>transport.address-family</name><value>inet</value></option>
          <option><name>storage.fips-mode-rchecksum</name><value>on</value></option>
          <option><name>storage.owner-uid</name><value>36</value></option>
          <option><name>storage.owner-gid</name><value>36</value></option>
          <option><name>cluster.min-free-disk</name><value>5%</value></option>
          <option><name>performance.quick-read</name><value>off</value></option>
          <option><name>performance.read-ahead</name><value>off</value></option>
          <option><name>performance.io-cache</name><value>off</value></option>
          <option><name>performance.low-prio-threads</name><value>32</value></option>
          <option><name>network.remote-dio</name><value>enable</value></option>
          <option><name>cluster.eager-lock</name><value>enable</value></option>
          <option><name>cluster.quorum-type</name><value>auto</value></option>
          <option><name>cluster.server-quorum-type</name><value>server</value></option>
          <option><name>cluster.data-self-heal-algorithm</name><value>full</value></option>
          <option><name>cluster.locking-scheme</name><value>granular</value></option>
          <option><name>cluster.shd-wait-qlength</name><value>10000</value></option>
          <option><name>features.shard</name><value>off</value></option>
          <option><name>user.cifs</name><value>off</value></option>
          <option><name>cluster.choose-local</name><value>off</value></option>
          <option><name>client.event-threads</name><value>4</value></option>
          <option><name>server.event-threads</name><value>4</value></option>
          <option><name>performance.client-io-threads</name><value>on</value></option>
        </options>
      </volume>
      <count>1</count>
    </volumes>
  </volInfo>
</cliOutput>
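Note that in the output above the <brick> elements mix character data and child elements: the brick path appears both as the element's text and in a nested <name> child. A minimal sketch (not vdsm's actual parsing code) that loads a trimmed-down copy of this output with Python's standard xml.etree and shows both locations:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cliOutput above: one <brick> whose path appears
# both as element text and as a nested <name> child (glusterfs 10 format).
CLI_OUTPUT = """\
<cliOutput>
  <volInfo><volumes><volume>
    <name>data1</name>
    <bricks>
      <brick uuid="08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b">node1-teste.acloud.pt:/home/brick1<name>node1-teste.acloud.pt:/home/brick1</name><hostUuid>08c7ba5f-9aca-49c5-abfd-8a3e42dd8c0b</hostUuid><isArbiter>0</isArbiter></brick>
    </bricks>
  </volume></volumes></volInfo>
</cliOutput>
"""

root = ET.fromstring(CLI_OUTPUT)  # parses fine: mixed content is legal XML
for brick in root.iter("brick"):
    print("element text:", brick.text)           # the path as element text
    print("<name> child:", brick.findtext("name"))  # the path as a child element
```

Since rc=0 and the document itself is well-formed, the "XML error" is presumably raised by vdsm's expectations about the structure rather than by the XML parser itself; code that reads only brick.text, or only the <name> child, will see different shapes across gluster versions.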
2022-04-27 13:40:07,125+0100 INFO (jsonrpc/4) [storage.storagedomaincache] Invalidating storage domain cache (sdc:74)
2022-04-27 13:40:07,125+0100 INFO (jsonrpc/4) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'id': 'dede3145-651a-4b01-b8d2-82bff8670696', 'status': 4106}]} from=::ffff:192.168.5.165,42132, flow_id=4c170005, task_id=cec6f36f-46a4-462c-9d0a-feb8d814b465 (api:54)
2022-04-27 13:40:07,410+0100 INFO (jsonrpc/5) [api.host] START getAllVmStats() from=::ffff:192.168.5.165,42132 (api:48)
2022-04-27 13:40:07,411+0100 INFO (jsonrpc/5) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::ffff:192.168.5.165,42132 (api:54)
2022-04-27 13:40:07,785+0100 INFO (jsonrpc/7) [api.host] START getStats() from=::ffff:192.168.5.165,42132 (api:48)
2022-04-27 13:40:07,797+0100 INFO (jsonrpc/7) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.5.165,42132, task_id=4fa4e8c4-7c65-499a-827e-8ae153aa875e (api:48)
2022-04-27 13:40:07,797+0100 INFO (jsonrpc/7) [vdsm.api] FINISH repoStats return={} from=::ffff:192.168.5.165,42132, task_id=4fa4e8c4-7c65-499a-827e-8ae153aa875e (api:54)
2022-04-27 13:40:07,797+0100 INFO (jsonrpc/7) [vdsm.api] START multipath_health() from=::ffff:192.168.5.165,42132, task_id=c6390f2a-845b-420b-a833-475605a24078 (api:48)
2022-04-27 13:40:07,797+0100 INFO (jsonrpc/7) [vdsm.api] FINISH multipath_health return={} from=::ffff:192.168.5.165,42132, task_id=c6390f2a-845b-420b-a833-475605a24078 (api:54)
2022-04-27 13:40:07,802+0100 INFO (jsonrpc/7) [api.host] FINISH getStats return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} from=::ffff:192.168.5.165,42132 (api:54)
2022-04-27 13:40:11,980+0100 INFO (jsonrpc/6) [api.host] START getAllVmStats() from=::1,37040 (api:48)
2022-04-27 13:40:11,980+0100 INFO (jsonrpc/6) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::1,37040 (api:54)
2022-04-27 13:40:12,365+0100 INFO (periodic/2) [vdsm.api] START repoStats(domains=()) from=internal, task_id=f5084096-e5c5-4ca8-9c47-a92fa5790484 (api:48)
2022-04-27 13:40:12,365+0100 INFO (periodic/2) [vdsm.api] FINISH repoStats return={} from=internal, task_id=f5084096-e5c5-4ca8-9c47-a92fa5790484 (api:54)
2022-04-27 13:40:22,417+0100 INFO (jsonrpc/0) [api.host] START getAllVmStats() from=::ffff:192.168.5.165,42132 (api:48)
2022-04-27 13:40:22,417+0100 INFO (jsonrpc/0) [api.host] FINISH getAllVmStats return={'status': {'code': 0, 'message': 'Done'}, 'statsList': (suppressed)} from=::ffff:192.168.5.165,42132 (api:54)
2022-04-27 13:40:22,805+0100 INFO (jsonrpc/1) [api.host] START getStats() from=::ffff:192.168.5.165,42132 (api:48)
2022-04-27 13:40:22,816+0100 INFO (jsonrpc/1) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.5.165,42132, task_id=a9fb939c-ea1a-4116-a22f-d14a99e6eada (api:48)
2022-04-27 13:40:22,816+0100 INFO (jsonrpc/1) [vdsm.api] FINISH repoStats return={} from=::ffff:192.168.5.165,42132, task_id=a9fb939c-ea1a-4116-a22f-d14a99e6eada (api:54)
2022-04-27 13:40:22,816+0100 INFO (jsonrpc/1) [vdsm.api] START multipath_health() from=::ffff:192.168.5.165,42132, task_id=5eee2f63-2631-446a-98dd-4947f9499f8f (api:48)
2022-04-27 13:40:22,816+0100 INFO (jsonrpc/1) [vdsm.api] FINISH multipath_health return={} from=::ffff:192.168.5.165,42132, task_id=5eee2f63-2631-446a-98dd-4947f9499f8f (api:54)
2022-04-27 13:40:22,822+0100 INFO (jsonrpc/1) [api.host] FINISH getStats return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} from=::ffff:192.168.5.165,42132 (api:54)
Please file an upstream issue:
https://github.com/oVirt/vdsm/issues
and include info about your gluster server rpm packages.
I hope that Ritesh can help with this.
Nir