When you set a host to maintenance from the oVirt API/UI, one of the tasks is to unmount any shared storage (including your NFS storage). After that, rebooting should work like a charm.
Why did you reboot without putting the node in maintenance?
P.S.: Do not confuse rebooting with fencing - the latter kills the node ungracefully in order to safely start HA VMs on another node.
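For reference, the maintenance step described above can also be driven from a script. The following is only a minimal sketch, assuming the engine exposes the REST action POST /ovirt-engine/api/hosts/{id}/deactivate and accepts HTTP basic authentication. The engine URL, host ID and credentials are placeholders, and build_deactivate_request is a hypothetical helper name. The sketch only builds the request; sending it is left commented out.

```python
import base64
import urllib.request


def build_deactivate_request(engine_url, host_id, user, password):
    """Build (but do not send) the REST call that moves a host to maintenance."""
    url = f"{engine_url.rstrip('/')}/ovirt-engine/api/hosts/{host_id}/deactivate"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=b"<action/>",  # empty action body
        method="POST",
        headers={
            "Content-Type": "application/xml",
            "Accept": "application/xml",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder values, not taken from a real setup.
req = build_deactivate_request(
    "https://engine.example.com", "HOST-ID", "admin@internal", "secret"
)
print(req.get_method(), req.full_url)
# To actually send it (the engine CA certificate must be trusted):
# urllib.request.urlopen(req)
```

Once the engine reports the host as being in maintenance, the shared storage should already be unmounted and a plain "reboot" on the host should not hang on the NFS mount.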
Best Regards,
Strahil Nikolov
On Tuesday, 27 October 2020, 10:27:01 GMT+2, lifuqiong@sunyainfo.com <lifuqiong@sunyainfo.com> wrote:
Hi everyone:
Description of problem:
When I run "reboot" or "shutdown -h 0" on a vdsm server, the reboot or shutdown takes more than 30 minutes. The screen shows '[FAILED] Failed unmounting /rhev/data-center/mnt/172.18.81.41:_home_nfs_data'.
Other messages that may be useful:
[] watchdog: watchdog0: watchdog did not stop!
[]systemd-shutdown[5594]: Failed to unmount /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy
[]systemd-shutdown[1]: Failed to wait for process: Protocol error
[]systemd-shutdown[5595]: Failed to remount '/' read-only: Device or resource busy
[]systemd-shutdown[1]: Failed to wait for process: Protocol error
dracut Warning: Killing all remaining processes
dracut Warning: Killing all remaining processes
Version-Release number of selected component (if applicable):
Software Version:4.2.8.2-1.el7
OS: CentOS Linux release 7.5.1804 (Core)
How reproducible:
100%
Steps to Reproduce:
1. My test environment is one oVirt engine (172.17.81.17) with 4 vdsm servers. Running "reboot" on one of the vdsm servers (172.17.99.105) makes that server take more than 30 minutes to reboot.
ovirt-engine: 172.17.81.17/16
vdsm: 172.17.99.105/16
nfs server: 172.17.81.14/16
Actual results:
As above. The server takes more than 30 minutes to reboot.
Expected results:
the server will reboot in a short time.
What I have done:
I captured packets on the NFS server while vdsm was rebooting, and found that vdsm keeps sending NFS packets to the NFS server in a loop, as follows:
These are some log files from rebooting vdsm 172.17.99.105 at 2020-10-26 22:12:34. Some conclusions:
1. vdsm.log says: 2020-10-26 22:12:34,461+0800 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/172.18.81.14:_home_nfs_data/02c4c6ea-7ca9-40f1-a1d0-f1636bc1824e/dom_md/metadata
2. sanlock.log says: 2020-10-26 22:13:05 1454 [3301]: s1 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/172.18.81.14:_home_nfs_data/02c4c6ea-7ca9-40f1-a1d0-f1636bc1824e/dom_md/ids
3. I found no other messages that look important to this issue.
The logs are in the attachment. I would very much appreciate it if anyone can help me. Thank you.
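One way to check before a reboot whether the NFS storage domain is still mounted is to scan /proc/mounts for NFS entries below /rhev/data-center/mnt. This is only a rough sketch: nfs_mounts_under is a hypothetical helper, and the sample text (including the export path /home/nfs/data) is illustrative rather than copied from the affected host.

```python
def nfs_mounts_under(mounts_text, prefix="/rhev/data-center/mnt"):
    """Return (source, mountpoint) pairs for NFS mounts below prefix."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mountpoint, fstype = fields[0], fields[1], fields[2]
        # fstype is "nfs" or "nfs4" for NFS mounts
        if fstype.startswith("nfs") and mountpoint.startswith(prefix + "/"):
            found.append((source, mountpoint))
    return found


# Sample in /proc/mounts format; on a real host read the file instead:
#   with open("/proc/mounts") as f:
#       mounts_text = f.read()
sample = (
    "/dev/sda1 / xfs rw 0 0\n"
    "172.18.81.14:/home/nfs/data "
    "/rhev/data-center/mnt/172.18.81.14:_home_nfs_data nfs4 rw 0 0\n"
)
for source, mountpoint in nfs_mounts_under(sample):
    print(f"still mounted: {source} on {mountpoint}")
```

If this still lists the storage domain when the reboot is issued, the host was most likely not in maintenance, and vdsm or sanlock may still be holding files open on the mount.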
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/C2GATAD35SUVWTIF3W3J3DXC53AANYC7/