4.4.7 gluster quorum problem

Hi guys, one strange thing is happening that I cannot understand. Today I installed the latest version, 4.4.7, on CentOS Stream, replica 3, via Cockpit, with an internal LAN for sync. Everything seems OK; if I reboot all 3 nodes together it is also OK. But if I reboot one node (and confirm the node was rebooted through the web UI), the bricks (engine and data) stay down on that node. The logs show nothing explicit about the situation except "Server quorum regained for volume data. Starting local bricks" in glusterd.log. After "systemctl restart glusterd" on that node, the bricks go down on another node; after "systemctl restart glusterd" on that second node, everything is OK again. Where should I look? Some log errors I found:

---------------------------------------------
broker.log:

statusStorageThread::ERROR::2021-07-12 22:17:02,899::storage_broker::223::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(put_stats) Failed to write metadata for ho$
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 215, in put_stats
    f = os.open(path, direct_flag | os.O_WRONLY | os.O_SYNC)
FileNotFoundError: [Errno 2] No such file or directory: '/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7eb6-d453-439a-8436-d3694d4b5179/de18b2cc-a4e1-4afc-9b5a-6063$
StatusStorageThread::ERROR::2021-07-12 22:17:02,899::status_broker::90::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state.
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 215, in put_stats
    f = os.open(path, direct_flag | os.O_WRONLY | os.O_SYNC)
FileNotFoundError: [Errno 2] No such file or directory: '/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7eb6-d453-439a-8436-d3694d4b5179/de18b2cc-a4e1-4afc-9b5a-6063$
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 86, in run
    entry.data
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 225, in put_stats
    .format(str(e)))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: failed to write metadata: [Errno 2] No such file or directory: '/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7e$
StatusStorageThread::ERROR::2021-07-12 22:17:02,899::storage_broker::223::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(put_stats) Failed to write metadata for ho$
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 215, in put_stats
    f = os.open(path, direct_flag | os.O_WRONLY | os.O_SYNC)
FileNotFoundError: [Errno 2] No such file or directory: '/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7eb6-d453-439a-8436-d3694d4b5179/de18b2cc-a4e1-4afc-9b5a-6063$
StatusStorageThread::ERROR::2021-07-12 22:17:02,899::status_broker::90::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state.
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 215, in put_stats
    f = os.open(path, direct_flag | os.O_WRONLY | os.O_SYNC)
FileNotFoundError: [Errno 2] No such file or directory: '/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7eb6-d453-439a-8436-d3694d4b5179/de18b2cc-a4e1-4afc-9b5a-6063$
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 86, in run
    entry.data
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 225, in put_stats
    .format(str(e)))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: failed to write metadata: [Errno 2] No such file or directory: '/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7e$
StatusStorageThread::ERROR::2021-07-12 22:17:02,902::status_broker::70::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(trigger_restart) Trying to restart the $
StatusStorageThread::ERROR::2021-07-12 22:17:02,902::storage_broker::173::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats) Failed to read metadata fro$
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 155, in get_raw_stats
    f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
FileNotFoundError: [Errno 2] No such file or directory: '/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7eb6-d453-439a-8436-d3694d4b5179/de18b2cc-a4e1-4afc-9b5a-6063$
StatusStorageThread::ERROR::2021-07-12 22:17:02,902::status_broker::98::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to read state.
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 155, in get_raw_stats
    f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
FileNotFoundError: [Errno 2] No such file or directory: '/run/vdsm/storage/53b068c1-beb8-4048-a766-3a4e71ded624/d3df7eb6-d453-439a-8436-d3694d4b5179/de18b2cc-a4e1-4afc-9b5a-6063$
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 94, in run
    self._storage_broker.get_raw_stats()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 175, in get_raw_stats
    .format(str(e)))

-------------------------------------------------------
supervdsm.log:

ainProcess|jsonrpc/0::DEBUG::2021-07-12 22:22:13,264::commands::211::root::(execCmd) /usr/bin/taskset --cpu-list 0-19 /usr/bin/systemd-run --scope --slice=vdsm-glusterfs /usr/b$
MainProcess|jsonrpc/0::DEBUG::2021-07-12 22:22:15,083::commands::224::root::(execCmd) FAILED: <err> = b'Running scope as unit: run-r91d6411af8114090aa28933d562fa473.scope\nMount$
MainProcess|jsonrpc/0::ERROR::2021-07-12 22:22:15,083::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) Error in mount
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_server.py", line 97, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/supervdsm_server.py", line 135, in mount
    cgroup=cgroup)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/mount.py", line 280, in _mount
    _runcmd(cmd)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/mount.py", line 308, in _runcmd
    raise MountError(cmd, rc, out, err)
vdsm.storage.mount.MountError: Command ['/usr/bin/systemd-run', '--scope', '--slice=vdsm-glusterfs', '/usr/bin/mount', '-t', 'glusterfs', '-o', 'backup-volfile-servers=cluster2.$
netlink/events::DEBUG::2021-07-12 22:22:15,131::concurrent::261::root::(run) FINISH thread <Thread(netlink/events, stopped daemon 139867781396224)>
MainProcess|jsonrpc/4::DEBUG::2021-07-12 22:22:15,134::supervdsm_server::102::SuperVdsm.ServerCallback::(wrapper) return network_caps with {'networks': {'ovirtmgmt': {'ports': [$

---------------------------------------------------------
vdsm.log:

2021-07-12 22:17:08,718+0200 INFO (jsonrpc/7) [api.host] FINISH getStats return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} from=::1,34946 (api:54)
2021-07-12 22:17:09,491+0200 ERROR (monitor/53b068c) [storage.Monitor] Error checking domain 53b068c1-beb8-4048-a766-3a4e71ded624 (monitor:451)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 432, in _checkDomainStatus
    self.domain.selftest()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileSD.py", line 712, in selftest
    self.oop.os.statvfs(self.domaindir)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 244, in statvfs
    return self._iop.statvfs(path)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 510, in statvfs
    resdict = self._sendCommand("statvfs", {"path": path}, self.timeout)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 479, in _sendCommand
    raise OSError(errcode, errstr)
FileNotFoundError: [Errno 2] No such file or directory
2021-07-12 22:17:09,619+0200 INFO (jsonrpc/0) [api.virt] START getStats() from=::1,34946, vmId=9167f682-3c82-4237-93bd-53f0ad32ffba (api:48)
2021-07-12 22:17:09,620+0200 INFO (jsonrpc/0) [api] FINISH getStats error=Virtual machine does not exist: {'vmId': '9167f682-3c82-4237-93bd-53f0ad32ffba'} (api:129)
2021-07-12 22:17:09,620+0200 INFO (jsonrpc/0) [api.virt] FINISH getStats return={'status': {'code': 1, 'message': "Virtual machine does not exist: {'vmId': '9167f682-3c82-4237-$
2021-07-12 22:17:09,620+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.00 seconds (__init__:312)
2021-07-12 22:17:10,034+0200 INFO (jsonrpc/3) [vdsm.api] START repoStats(domains=['53b068c1-beb8-4048-a766-3a4e71ded624']) from=::1,34946, task_id=4e823c98-f95b-45f7-ad64-90f82$
2021-07-12 22:17:10,034+0200 INFO (jsonrpc/3) [vdsm.api] FINISH repoStats return={'53b068c1-beb8-4048-a766-3a4e71ded624': {'code': 2001, 'lastCheck': '0.5', 'delay': '0', 'vali$
2021-07-12 22:17:10,403+0200 INFO (health) [health] LVM cache hit ratio: 12.50% (hits: 1 misses: 7) (health:131)
2021-07-12 22:17:10,472+0200 INFO (MainThread) [vds] Received signal 15, shutting down (vdsmd:74)
2021-07-12 22:17:10,472+0200 INFO (MainThread) [root] Stopping DHCP monitor. (dhcp_monitor:106)
2021-07-12 22:17:10,473+0200 INFO (ioprocess/11056) [IOProcessClient] (53b068c1-beb8-4048-a766-3a4e71ded624) Poll error 16 on fd 74 (__init__:176)
2021-07-12 22:17:10,473+0200 INFO (ioprocess/11056) [IOProcessClient] (53b068c1-beb8-4048-a766-3a4e71ded624) ioprocess was terminated by signal 15 (__init__:200)
2021-07-12 22:17:10,476+0200 INFO (ioprocess/19109) [IOProcessClient] (e10cbd59-d32e-4b69-a4c1-d213e7bd8973) Poll error 16 on fd 75 (__init__:176)
2021-07-12 22:17:10,476+0200 INFO (ioprocess/19109) [IOProcessClient] (e10cbd59-d32e-4b69-a4c1-d213e7bd8973) ioprocess was terminated by signal 15 (__init__:200)
2021-07-12 22:17:10,513+0200 INFO (ioprocess/44046) [IOProcess] (e10cbd59-d32e-4b69-a4c1-d213e7bd8973) Starting ioprocess (__init__:465)
2021-07-12 22:17:10,513+0200 INFO (ioprocess/44045) [IOProcess] (53b068c1-beb8-4048-a766-3a4e71ded624) Starting ioprocess (__init__:465)
2021-07-12 22:17:10,519+0200 WARN (periodic/0) [root] Failed to retrieve Hosted Engine HA info: timed out (api:198)
2021-07-12 22:17:10,611+0200 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/glusterSD/cluster1.int:_data/e10cbd59-d32e-4b69-a4c1-d213e7bd8973/dom$
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 507, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/glusterSD/cluster1.int:_data/e10cbd59-d32e-4b69-a4c1-d213e7bd8973/dom_md/metada$
2021-07-12 22:17:10,860+0200 INFO (MainThread) [root] Stopping Bond monitor. (bond_monitor:53)

Thanks in advance, best regards.
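A minimal sketch of the commands normally used to inspect this state from the rebooted node (the volume names engine and data come from the post above; the rest is generic gluster CLI, not a prescribed procedure):

    # peer membership and brick/PID view for both volumes
    gluster peer status
    gluster volume status engine
    gluster volume status data

    # pending heals show whether the surviving bricks kept writing
    gluster volume heal engine info summary
    gluster volume heal data info summary

    # server-quorum options currently in effect
    gluster volume get data cluster.server-quorum-type
    gluster volume get data cluster.server-quorum-ratio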

After a reboot of node1, the bricks must always come up. Most probably VDO had to recover for a longer period, blocking the bricks from coming up in time. Investigate this issue before rebooting another host.

Best Regards,
Strahil Nikolov
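If VDO really is the cause, a rough way to check whether it was still recovering when glusterd tried to start the local bricks (assuming the standard VDO tooling is installed; the volume name vdo_gluster is only a placeholder):

    # VDO operating mode; anything other than "normal" means the bricks' backing store was not ready yet
    vdo status --name=vdo_gluster | grep -i 'operating mode'
    vdostats --human-readable

    # did glusterd actually try to start the local bricks after quorum returned?
    systemctl status glusterd
    grep -i 'Server quorum' /var/log/glusterfs/glusterd.log | tail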

Hi Strahil, I installed another replica 3 setup and got the same problem:

Broadcast message from systemd-journald@cluster1.dus (Tue 2021-07-20 16:56:24 CEST):
gluster_bricks-engine-engine[6051]: [2021-07-20 14:56:24.985428] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-engine-posix: health-check failed, going down
Broadcast message from systemd-journald@cluster1.dus (Tue 2021-07-20 16:56:24 CEST):
gluster_bricks-engine-engine[6051]: [2021-07-20 14:56:24.985672] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-engine-posix: still alive! -> SIGTERM
Message from syslogd@cluster1 at Jul 20 16:56:24 ...
gluster_bricks-engine-engine[6051]:[2021-07-20 14:56:24.985428] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-engine-posix: health-check failed, going down
Message from syslogd@cluster1 at Jul 20 16:56:24 ...
gluster_bricks-engine-engine[6051]:[2021-07-20 14:56:24.985672] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-engine-posix: still alive! -> SIGTERM

grep "aio_read" /var/log/glusterfs/bricks/*
/var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-20 14:56:24.985199] W [MSGID: 113075] [posix-helpers.c:2135:posix_fs_health_check] 0-engine-

It seems to be this one:
https://github.com/gluster/glusterfs/issues/1168
https://www.mail-archive.com/gluster-users@gluster.org/msg36948.html

I tried setting storage.health-check-timeout to 30, but nothing changed: all 3 bricks are red for both engine and data. The VMs started correctly, though, and Cockpit says all bricks are up and OK. I think the problem is in the monitoring service and in how the gluster status data is refreshed, but I do not know how oVirt controls it. Any suggestions? Thanks in advance, best regards
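For reference, the option discussed above is set per volume; this is only a sketch of the knob being tried (30 is the value tested here, 0 disables the timeout on the posix health check, as in the next message):

    # current value of the health-check timeout on both volumes
    gluster volume get engine storage.health-check-timeout
    gluster volume get data storage.health-check-timeout

    # raise it, or set 0 to stop the timeout from killing the brick
    gluster volume set engine storage.health-check-timeout 30
    gluster volume set data storage.health-check-timeout 0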

Hi Strahil, it seems that setting storage.health-check-timeout to 0 in the gluster volume config resolves this issue. I use Dell servers with a PERC 740 controller with battery-backed internal cache, and the disk cache is completely disabled. Maybe this configuration is the problem. Now I have another warning, a red exclamation point on one of the 3 hosts with the message "Unavailable due to HA score". I know what it means but do not know how to resolve it. Are these things related? Thanks again.

Most probably your engine failed on that host. What is the output of 'hosted-engine --vm-status'? If for some reason the engine died there and the system is now back OK, you can always put the hypervisor into maintenance mode (via hosted-engine) and, after a minute, bring it back online (again with hosted-engine).

Best Regards,
Strahil Nikolov
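The sequence Strahil describes, run on the affected host, would look roughly like this (standard hosted-engine CLI, nothing cluster-specific assumed):

    # check agent state and HA score on all hosted-engine hosts
    hosted-engine --vm-status

    # put this host into local maintenance, wait a bit, then bring it back
    hosted-engine --set-maintenance --mode=local
    sleep 60
    hosted-engine --set-maintenance --mode=none

    # the HA score recovers once the HA agent and broker report healthy again
    systemctl status ovirt-ha-agent ovirt-ha-broker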
participants (2)
- radchenko.anatoliy@gmail.com
- Strahil Nikolov