Hi, Strahil,
    Thank you for your reply.
    I've try setting host to maintenance and the host reboot immediately, What does vdsm do when setting host to maintenance? Thank you
   
Best Regards
Mark Lee

From: Strahil Nikolov via Users
Date: 2020-10-27 23:44
To: users; lifuqiong@sunyainfo.com
Subject: [ovirt-users] Re: vdsm with NFS storage reboot or shutdown more than 15 minutes. with error failed to unmount /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy
When you set a host to maintenance from oVirt API/UI, one of the tasks is to umount any shared storage (incluing the NFS you got). Then rebooting should work like a charm.
 
Why did you reboot without putting the node in maintenance ?
 
P.S.: Do not confuse rebooting with fencing - the latter kills the node ungracefully in order to safely start HA VMs on another node.
 
Best Regards,
Strahil Nikolov
 
 
 
 
 
 
В вторник, 27 октомври 2020 г., 10:27:01 Гринуич+2, lifuqiong@sunyainfo.com <lifuqiong@sunyainfo.com> написа:
 
 
 
 
 
 
 
Hi everyone:    
Description of problem:
 
    When exec "reboot" or "shutdown -h 0" cmd on vdsm server, the vdsm server will reboot or shutdown more than 30 minutes. the screen shows '[FAILED] Failed unmouting /rhev/data-center/mnt/172.18.81.41:_home_nfs_data'.
other messages may be useful: [] watchdog: watchdog0: watchdog did not stop! []systemd-shutdown[5594]: Failed to unmount /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy
[]systemd-shutdown[1]: Failed to wait for process: Protocol error
[]systemd-shutdown[5595]: Failed to remount '/' read-only: Device or resource busy
[]systemd-shutdown[1]: Failed to wait for process: Protocol error
dracut Warning: Killing all remaining processes
dracut Warning: Killing all remaining processes
 
Version-Release number of selected component (if applicable):
Software Version:4.2.8.2-1.el7
OS: CentOS Linux release 7.5.1804 (Core)
How reproducible:
100%
Steps to Reproduce:
1. my test enviroment is one Ovirt engine(172.17.81.17) with 4 vdsm servers, exec "reboot" cmd in one of the vdsm servers(172.17.99.105), the server will reboot more than 30 minutes.ovirt-engine : 172.17.81.17/16
vdsm: 172.17.99.105/16
nfs server: 172.17.81.14/16Actual results:
As above. the server will reboot more than 30 minutes
Expected results:
the server will reboot in a short time.
What I have done:
I have capture packet in nfs server while vdsm is rebooting, I found vdsm is always sending nfs packet to nfs server circularly as follows:this is some log files while I reboot vdsm 172.17.99.105 in 2020-10-26 22:12:34. Some conclusion is:
1. the vdsm.log said the vdsm 2020-10-26 22:12:34,461+0800 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/172.18.81.14:_home_nfs_data/02c4c6ea-7ca9-40f1-a1d0-f1636bc1824e/dom_md/metadata
2. the sanlock.log said 2020-10-26 22:13:05 1454 [3301]: s1 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/172.18.81.14:_home_nfs_data/02c4c6ea-7ca9-40f1-a1d0-f1636bc1824e/dom_md/ids
3. there is nothing message import to this issue.The logs is in the attachment.I'm very appreciate if anyone can help me. Thank you.
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/C2GATAD35SUVWTIF3W3J3DXC53AANYC7/
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/T3ETYUH2QDB7ZVUNWLATSVSPU7TIU76I/