Hi guys,
Something strange is happening and I cannot understand it.
Today I installed the latest version, 4.4.7, on CentOS Stream as a replica 3 setup via Cockpit, with an internal
LAN for sync. Everything seems OK, and rebooting all 3 nodes together is also OK. But if I reboot 1 node
(and mark the node as rebooted through the web UI), the bricks (engine and data) remain down on that
node. The logs show no explicit indication of the problem except
"Server quorum regained for volume data. Starting local bricks" in glusterd.log.
After "systemctl restart glusterd" on the rebooted node, the bricks went down on another node. After
"systemctl restart glusterd" on that node, everything is OK.
Where should I look?
Some errors that I found in the logs:
--------------------------------------------- broker.log:
StatusStorageThread::ERROR::2021-07-12
22:17:02,899::storage_broker::223::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(put_stats)
Failed to write metadata for ho$
Traceback (most recent call last):
File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
line 215, in put_stats
f = os.open(path, direct_flag | os.O_WRONLY | os.O_SYNC)
FileNotFoundError: [Errno 2] No such file or directory:
'/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7eb6-d453-439a-8436-d3694d4b5179/de18b2cc-a4e1-4afc-9b5a-6063$
StatusStorageThread::ERROR::2021-07-12
22:17:02,899::status_broker::90::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run)
Failed to update state.
Traceback (most recent call last):
File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
line 215, in put_stats
f = os.open(path, direct_flag | os.O_WRONLY | os.O_SYNC)
FileNotFoundError: [Errno 2] No such file or directory:
'/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7eb6-d453-439a-8436-d3694d4b5179/de18b2cc-a4e1-4afc-9b5a-6063$
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py",
line 86, in run
entry.data
File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
line 225, in put_stats
.format(str(e)))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: failed to write metadata: [Errno 2] No
such file or directory:
'/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7e$
StatusStorageThread::ERROR::2021-07-12
22:17:02,902::status_broker::70::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(trigger_restart)
Trying to restart the $
StatusStorageThread::ERROR::2021-07-12
22:17:02,902::storage_broker::173::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats)
Failed to read metadata fro$
Traceback (most recent call last):
File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
line 155, in get_raw_stats
f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
FileNotFoundError: [Errno 2] No such file or directory:
'/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7eb6-d453-439a-8436-d3694d4b5179/de18b2cc-a4e1-4afc-9b5a-6063$
StatusStorageThread::ERROR::2021-07-12
22:17:02,902::status_broker::98::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run)
Failed to read state.
Traceback (most recent call last):
File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
line 155, in get_raw_stats
f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
FileNotFoundError: [Errno 2] No such file or directory:
'/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7eb6-d453-439a-8436-d3694d4b5179/de18b2cc-a4e1-4afc-9b5a-6063$
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py",
line 94, in run
self._storage_broker.get_raw_stats()
File
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
line 175, in get_raw_stats
.format(str(e)))
-------------------------------------------------------supervdsm.log:
MainProcess|jsonrpc/0::DEBUG::2021-07-12 22:22:13,264::commands::211::root::(execCmd)
/usr/bin/taskset --cpu-list 0-19 /usr/bin/systemd-run --scope --slice=vdsm-glusterfs
/usr/b$
MainProcess|jsonrpc/0::DEBUG::2021-07-12 22:22:15,083::commands::224::root::(execCmd)
FAILED: <err> = b'Running scope as unit:
run-r91d6411af8114090aa28933d562fa473.scope\nMount$
MainProcess|jsonrpc/0::ERROR::2021-07-12
22:22:15,083::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) Error in mount
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_server.py", line 97, in
wrapper
res = func(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_server.py", line 135, in
mount
cgroup=cgroup)
File "/usr/lib/python3.6/site-packages/vdsm/storage/mount.py", line 280, in
_mount
_runcmd(cmd)
File "/usr/lib/python3.6/site-packages/vdsm/storage/mount.py", line 308, in
_runcmd
raise MountError(cmd, rc, out, err)
vdsm.storage.mount.MountError: Command ['/usr/bin/systemd-run', '--scope',
'--slice=vdsm-glusterfs', '/usr/bin/mount', '-t',
'glusterfs', '-o', 'backup-volfile-servers=cluster2.$
netlink/events::DEBUG::2021-07-12 22:22:15,131::concurrent::261::root::(run) FINISH thread
<Thread(netlink/events, stopped daemon 139867781396224)>
MainProcess|jsonrpc/4::DEBUG::2021-07-12
22:22:15,134::supervdsm_server::102::SuperVdsm.ServerCallback::(wrapper) return
network_caps with {'networks': {'ovirtmgmt': {'ports': [$
---------------------------------------------------------vdsm.log:
2021-07-12 22:17:08,718+0200 INFO (jsonrpc/7) [api.host] FINISH getStats
return={'status': {'code': 0, 'message': 'Done'},
'info': (suppressed)} from=::1,34946 (api:54)
2021-07-12 22:17:09,491+0200 ERROR (monitor/53b068c) [storage.Monitor] Error checking
domain 53b068c1-beb8-4048-a766-3a4e71ded624 (monitor:451)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 432, in
_checkDomainStatus
self.domain.selftest()
File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 712, in
selftest
self.oop.os.statvfs(self.domaindir)
File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line
244, in statvfs
return self._iop.statvfs(path)
File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 510, in
statvfs
resdict = self._sendCommand("statvfs", {"path": path},
self.timeout)
File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 479, in
_sendCommand
raise OSError(errcode, errstr)
FileNotFoundError: [Errno 2] No such file or directory
2021-07-12 22:17:09,619+0200 INFO (jsonrpc/0) [api.virt] START getStats() from=::1,34946,
vmId=9167f682-3c82-4237-93bd-53f0ad32ffba (api:48)
2021-07-12 22:17:09,620+0200 INFO (jsonrpc/0) [api] FINISH getStats error=Virtual machine
does not exist: {'vmId': '9167f682-3c82-4237-93bd-53f0ad32ffba'}
(api:129)
2021-07-12 22:17:09,620+0200 INFO (jsonrpc/0) [api.virt] FINISH getStats
return={'status': {'code': 1, 'message': "Virtual machine
does not exist: {'vmId': '9167f682-3c82-4237-$
2021-07-12 22:17:09,620+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call
VM.getStats failed (error 1) in 0.00 seconds (__init__:312)
2021-07-12 22:17:10,034+0200 INFO (jsonrpc/3) [vdsm.api] START
repoStats(domains=['53b068c1-beb8-4048-a766-3a4e71ded624']) from=::1,34946,
task_id=4e823c98-f95b-45f7-ad64-90f82$
2021-07-12 22:17:10,034+0200 INFO (jsonrpc/3) [vdsm.api] FINISH repoStats
return={'53b068c1-beb8-4048-a766-3a4e71ded624': {'code': 2001,
'lastCheck': '0.5', 'delay': '0', 'vali$
2021-07-12 22:17:10,403+0200 INFO (health) [health] LVM cache hit ratio: 12.50% (hits: 1
misses: 7) (health:131)
2021-07-12 22:17:10,472+0200 INFO (MainThread) [vds] Received signal 15, shutting down
(vdsmd:74)
2021-07-12 22:17:10,472+0200 INFO (MainThread) [root] Stopping DHCP monitor.
(dhcp_monitor:106)
2021-07-12 22:17:10,473+0200 INFO (ioprocess/11056) [IOProcessClient]
(53b068c1-beb8-4048-a766-3a4e71ded624) Poll error 16 on fd 74 (__init__:176)
2021-07-12 22:17:10,473+0200 INFO (ioprocess/11056) [IOProcessClient]
(53b068c1-beb8-4048-a766-3a4e71ded624) ioprocess was terminated by signal 15
(__init__:200)
2021-07-12 22:17:10,476+0200 INFO (ioprocess/19109) [IOProcessClient]
(e10cbd59-d32e-4b69-a4c1-d213e7bd8973) Poll error 16 on fd 75 (__init__:176)
2021-07-12 22:17:10,476+0200 INFO (ioprocess/19109) [IOProcessClient]
(e10cbd59-d32e-4b69-a4c1-d213e7bd8973) ioprocess was terminated by signal 15
(__init__:200)
2021-07-12 22:17:10,513+0200 INFO (ioprocess/44046) [IOProcess]
(e10cbd59-d32e-4b69-a4c1-d213e7bd8973) Starting ioprocess (__init__:465)
2021-07-12 22:17:10,513+0200 INFO (ioprocess/44045) [IOProcess]
(53b068c1-beb8-4048-a766-3a4e71ded624) Starting ioprocess (__init__:465)
2021-07-12 22:17:10,519+0200 WARN (periodic/0) [root] Failed to retrieve Hosted Engine HA
info: timed out (api:198)
2021-07-12 22:17:10,611+0200 ERROR (check/loop) [storage.Monitor] Error checking path
/rhev/data-center/mnt/glusterSD/cluster1.int:_data/e10cbd59-d32e-4b69-a4c1-d213e7bd8973/dom$
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 507, in
_pathChecked
delay = result.delay()
File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in
delay
raise exception.MiscFileReadException(self.path, self.rc, self.err)
vdsm.storage.exception.MiscFileReadException: Internal file read failure:
('/rhev/data-center/mnt/glusterSD/cluster1.int:_data/e10cbd59-d32e-4b69-a4c1-d213e7bd8973/dom_md/metada$
2021-07-12 22:17:10,860+0200 INFO (MainThread) [root] Stopping Bond monitor.
(bond_monitor:53)
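
In case anyone wants to pull the same kind of entries out of their own logs, here is a minimal sketch in Python. It is only an illustration: the function name error_lines and the shortened sample lines are made up for the example, and the two patterns are assumptions based on the log formats shown above ("::ERROR::" in the broker and supervdsm logs, "<timestamp> ERROR (" in vdsm.log).

```python
import re

# Assumed from the excerpts above:
#   broker.log / supervdsm.log entries contain "::ERROR::"
#   vdsm.log entries look like "<date> <time+tz> ERROR (thread) ..."
ERROR_PATTERN = re.compile(r"::ERROR::|^\d{4}-\d{2}-\d{2} \S+ ERROR ")


def error_lines(lines):
    """Return the lines that start an ERROR entry, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]


if __name__ == "__main__":
    # Shortened sample lines in the two formats; not complete log entries.
    sample = [
        "StatusStorageThread::ERROR::2021-07-12 22:17:02,899::storage_broker::223\n",
        "2021-07-12 22:17:08,718+0200 INFO (jsonrpc/7) [api.host] FINISH getStats\n",
        "2021-07-12 22:17:09,491+0200 ERROR (monitor/53b068c) [storage.Monitor] Error\n",
    ]
    for line in error_lines(sample):
        print(line)
```

This only finds the first line of each ERROR entry; the traceback lines that follow are not matched, so the surrounding context still has to be read in the log itself.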
Thanks in advance
Best regards.