vdsm with NFS storage takes more than 15 minutes to reboot or shut down, with error: failed to unmount /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy
by lifuqiong@sunyainfo.com
Hi everyone:
I have run into the following problem:
Description of problem:
When I run the "reboot" or "shutdown -h 0" cmd on a vdsm server, the server takes more than 30 minutes to reboot or shut down. The screen shows '[FAILED] Failed unmounting /rhev/data-center/mnt/172.18.81.41:_home_nfs_data'.
Other messages that may be useful were shown on the screen:
[] watchdog: watchdog0: watchdog did not stop!
[] systemd-shutdown[5594]: Failed to unmount /rhev/data-center/mnt/172.18.81.14:_home_nfs_data: Device or resource busy
[]systemd-shutdown[1]: Failed to wait for process: Protocol error
[]systemd-shutdown[5595]: Failed to remount '/' read-only: Device or resource busy
[]systemd-shutdown[1]: Failed to wait for process: Protocol error
dracut Warning: Killing all remaining processes
dracut Warning: Killing all remaining processes
Version-Release number of selected component (if applicable):
Software Version: 4.2.8.2-1.el7
OS: CentOS Linux release 7.5.1804 (Core)
How reproducible:
100%
Steps to Reproduce:
1. My test environment is one oVirt engine (172.17.81.17) with 4 vdsm servers. Running the "reboot" cmd on one of the vdsm servers (172.17.99.105), the server takes more than 30 minutes to reboot.
ovirt-engine: 172.17.81.17/16
vdsm: 172.17.99.105/16
nfs server: 172.17.81.14/16
Actual results:
As above: the server takes more than 30 minutes to reboot.
Expected results:
The server should reboot in a short time.
What I have done:
I captured packets on the NFS server while the vdsm host was rebooting and found that the vdsm server keeps sending NFS packets to the NFS server in a loop. Below are some log excerpts from when I rebooted vdsm 172.17.99.105 at 2020-10-26 22:12:34. Some conclusions:
1. vdsm.log says: 2020-10-26 22:12:34,461+0800 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/172.18.81.14:_home_nfs_data/02c4c6ea-7ca9-40f1-a1d0-f1636bc1824e/dom_md/metadata
2. sanlock.log says: 2020-10-26 22:13:05 1454 [3301]: s1 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/172.18.81.14:_home_nfs_data/02c4c6ea-7ca9-40f1-a1d0-f1636bc1824e/dom_md/ids
3. There are no other messages relevant to this issue.
The logs are in the attachment. I would greatly appreciate any help. Thank you.
Yours sincerely,
Mark Lee
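One workaround sketch, assuming the hang comes from vdsm and sanlock still holding the NFS data domain when systemd tears down the network (the unit names and mount path below are the standard ones on an oVirt 4.2 / EL7 host; put the host into maintenance from the engine first):
# on the host, before rebooting
systemctl stop vdsmd supervdsmd    # stop vdsm so it releases the storage domain monitors
systemctl stop sanlock wdmd        # release the delta leases held on dom_md/ids
umount -l /rhev/data-center/mnt/172.18.81.14:_home_nfs_data    # lazy unmount so systemd-shutdown does not block on it
reboot
If the mount still refuses to detach, fuser -vm /rhev/data-center/mnt/172.18.81.14:_home_nfs_data shows which processes keep it busy.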
4 years
update host to 4.4: manual ovn config necessary?
by Gianluca Cecchi
Hello,
I have updated an external engine from 4.3 to 4.4 and the OVN configuration
seems to have been retained:
[root@ovmgr1 ovirt-engine]# ovn-nbctl show
switch fc2fc4e8-ff71-4ec3-ba03-536a870cd483
(ovirt-ovn192-1e252228-ade7-47c8-acda-5209be358fcf)
switch 101d686d-7930-4176-b41a-b306d7c30a1a
(ovirt-ovn17217-4bb1d1a7-020d-4843-9ac7-dc4204b528e5)
port c1ec60a4-b4f3-4cb5-8985-43c086156e83
addresses: ["00:1a:4a:19:01:89 dynamic"]
port 174b69f8-00ed-4e25-96fc-7db11ea8a8b9
addresses: ["00:1a:4a:19:01:59 dynamic"]
port ccbd6188-78eb-437b-9df9-9929e272974b
addresses: ["00:1a:4a:19:01:88 dynamic"]
port 7e96ca70-c9e3-4efe-9ac5-e56c18476437
addresses: ["00:1a:4a:19:01:83 dynamic"]
port d2c2d9f1-8fc3-4f17-9ada-76fe3a168e65
addresses: ["00:1a:4a:19:01:5e dynamic"]
port 4d13d63e-5ff3-41c1-9b6b-feac343b514b
addresses: ["00:1a:4a:19:01:60 dynamic"]
port 66359e79-56c4-47e0-8196-2241706329f6
addresses: ["00:1a:4a:19:01:68 dynamic"]
switch 87012fa6-ffaa-4fb0-bd91-b3eb7c0a2fc1
(ovirt-ovn193-d43a7928-0dc8-49d3-8755-5d766dff821a)
port 2ae7391b-4297-4247-a315-99312f6392e6
addresses: ["00:1a:4a:19:01:51 dynamic"]
switch 9e77163a-c4e4-4abf-a554-0388e6b5e4ce
(ovirt-ovn172-4ac7ba24-aad5-432d-b1d2-672eaeea7d63)
[root@ovmgr1 ovirt-engine]#
Then I updated one of the 3 Linux hosts (not node ng): I removed it from the
web admin GUI, installed CentOS 8.2 from scratch, configured the repos, and
then added it back as a new host (with the same name) in the engine. I was
able to connect to storage (iSCSI) and start VMs on the host.
Coming to the OVN part, it seems it has not been configured on the upgraded host.
Is this expected?
E.g. on the engine I only see chassis for the 2 hosts still on 4.3:
[root@ovmgr1 ovirt-engine]# ovn-sbctl show
Chassis "b8872ab5-4606-4a79-b77d-9d956a18d349"
hostname: "ov301.mydomain"
Encap geneve
ip: "10.4.192.34"
options: {csum="true"}
Port_Binding "174b69f8-00ed-4e25-96fc-7db11ea8a8b9"
Port_Binding "66359e79-56c4-47e0-8196-2241706329f6"
Chassis "ddecf0da-4708-4f93-958b-6af365a5eeca"
hostname: "ov300.mydomain"
Encap geneve
ip: "10.4.192.33"
options: {csum="true"}
Port_Binding "ccbd6188-78eb-437b-9df9-9929e272974b"
[root@ovmgr1 ovirt-engine]#
What should I do to add the upgraded 4.4 host? Can 4.3 and 4.4 hosts live
together for the OVN part?
Thanks,
Gianluca
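If it helps, a sketch of how the OVN chassis is normally (re)registered on a reinstalled host, assuming ovirt-provider-ovn-driver is installed there; the two addresses are placeholders for the OVN central (engine) address and the host's tunnel endpoint:
# on the reinstalled 4.4 host
vdsm-tool ovn-config <ovn-central-ip> <host-tunnel-ip>
# then, on the engine, a chassis for the host should appear
ovn-sbctl show
Reinstalling the host from the web admin GUI with the ovirt-provider-ovn network provider selected in the host dialog may also take care of this during host deploy. As far as I know, 4.3 and 4.4 chassis can coexist on the same OVN central while the remaining hosts are migrated.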
4 years
when configuring multipath, the logical network selection area is empty, hence not able to configure multipathing
by dhanaraj.ramesh@yahoo.com
Hi team,
I have a 4-node cluster. On each node I configured 2 dedicated 10 Gb NICs, each on its own subnet (NIC 1 = 10.10.10.0/24, NIC 2 = 10.10.20.0/24), and on the array side I configured 2 targets on the 10.10.10.0/24 subnet and another 2 targets on the 10.10.20.0/24 subnet. Without any errors I could log in to all four paths and mount the iSCSI LUNs on all 4 nodes. However, when I try to configure multipathing at the data center level, I can see all the paths but not the logical networks; that area stays empty, although I configured logical network labels for both NICs with dedicated names, ISCSI1 and ISCSI2. These logical networks are visible and green at the host network level, with no errors; they are just L2 + IP config.
Am I missing something here? What else should I do to enable multipathing?
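For what it's worth, a quick way to confirm from one of the hosts that both fabrics are logged in and that each LUN already has all four paths (plain iscsiadm/multipath commands, nothing oVirt-specific):
iscsiadm -m session -P 1    # sessions should show portals on both 10.10.10.0/24 and 10.10.20.0/24
multipath -ll               # each LUN should list paths arriving over both subnets
This only verifies the host side; it does not answer why the logical networks are not offered in the data center iSCSI multipath dialog.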
4 years
problems installing standard Linux as nodes in 4.4
by Gianluca Cecchi
Hello,
due to the missing megaraid_sas kernel module in RHEL 8 and CentOS 8, I'm
deploying a new oVirt host using CentOS 8 and the elrepo kernel driver rather
than oVirt Node NG.
Based on installation guide:
- install CentOS 8.2 ("Server" chosen as base environment)
- yum install https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
- yum install cockpit-ovirt-dashboard
- yum update
- reboot
Trying to add the host from the engine web admin GUI, I get:
Host ov200 installation failed. Failed to execute Ansible host-deploy role:
Failed to execute call to start playbook. . Please check logs for more
details: /var/log/ovirt-engine/ansible-runner-service.log.
Inside the log file above on engine:
2020-10-08 11:58:43,389 - runner_service.controllers.hosts - DEBUG -
Request received, content-type :None
2020-10-08 11:58:43,390 - runner_service.controllers.hosts - INFO -
127.0.0.1 - GET /api/v1/hosts/ov200
2020-10-08 11:58:43,398 - runner_service.controllers.playbooks - DEBUG -
Request received, content-type :application/json; charset=UTF-8
2020-10-08 11:58:43,398 - runner_service.controllers.playbooks - INFO -
127.0.0.1 - POST /api/v1/playbooks/ovirt-host-deploy.yml
Do I have to enable any module or pre-install anything else before adding
it?
BTW: on host
[root@ov200 ~]# rpm -q ansible
ansible-2.9.13-2.el8.noarch
[root@ov200 ~]#
Thanks,
Gianluca
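A few standard places to look while the add-host keeps failing, as a sketch (the paths are the usual oVirt 4.4 locations, not taken from this report):
# on the engine: the per-host deploy log plus the runner-service log mentioned in the error
ls -lt /var/log/ovirt-engine/host-deploy/ | head
tail -n 100 /var/log/ovirt-engine/ansible-runner-service.log
# on the host: host deploy drives Ansible over ssh, so root ssh access and python3 must be available
rpm -q python3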
4 years
Leftover hibernation disks that we can't delete
by james@deanimaconsulting.com
After putting the VMs in our environment into hibernation, one of the VMs that came out of hibernation still has the metadata and memory dump hanging around as 2 disks with status OK. They are not attached to any VM, but we are unable to delete them. The VM in question came out of hibernation without any issues.
If they are no longer required, how can we tidy them up?
Thanks, James
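If the two disks really are floating (unattached and not referenced by any snapshot), one option that may work is removing them through the REST API; a sketch, with the engine URL, credentials and disk id as placeholders:
# list all disks and note the ids of the two leftover volumes
curl -k -u admin@internal:PASSWORD "https://engine.example.com/ovirt-engine/api/disks"
# remove a leftover disk by id
curl -k -u admin@internal:PASSWORD -X DELETE "https://engine.example.com/ovirt-engine/api/disks/DISK_ID"
If the engine refuses the removal, the fault it returns usually says what is still referencing the disk.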
4 years
administration portal won't complete load, looping
by Philip Brown
I have an odd situation:
When I go to
https://ovengine/ovirt-engine/webadmin/?locale=en_US
after authentication passes...
it shows the top banner of
oVirt OPEN VIRTUALIZATION MANAGER
and the
Loading ...
in the center, but it never gets past that. Any suggestions on how I could investigate and fix this?
background:
I recently updated certs to be signed wildcard certs, but this broke consoles somehow.
So I restored the original certs, and restarted things... but got stuck with this.
Interestingly, the VM portal loads fine. But not the admin portal.
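A couple of starting points, as a sketch: tail the UI and engine logs while reloading the page, and check that the engine services came back cleanly after swapping the certs back (standard log and unit names on the engine host):
tail -f /var/log/ovirt-engine/ui.log /var/log/ovirt-engine/engine.log
systemctl status ovirt-engine ovirt-websocket-proxy httpd
Clearing the browser cache (or trying a private window) is also worth a try, since the webadmin application is cached by the browser and a certificate change can leave it loading stale resources.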
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
4 years
reinstall 4.3 host from 4.4 engine
by Gianluca Cecchi
Hello,
supposing I have an already upgraded 4.4.2 engine and in my environment I
still have some 4.3.10 hosts based on CentOS Linux 7.x (or 4.3.10 ng
nodes), is it supported to reinstall such a host in case of any modification
of its configuration?
Or will the engine try to install them in a "4.4" way?
Thanks in advance,
Gianluca
4 years
Manual VM Migration fails
by Anton Louw
Hello Everybody,
I am having a strange issue. When I try to manually migrate a VM from one host to another, I get an error stating:
"Migration failed (VM: VM1, Source: node6.example.com, Destination: node3.example.com)"
I have tried with a few different machines, and it pops up with the same error. I have attached the VDSM logs of both source and destination nodes. The time frame is 16:01.
I see the below in that time frame, but I'm not quite sure what I need to change:
2020-10-23 16:01:14,419+0200 ERROR (migsrc/42186f82) [virt.vm] (vmId='42186f82-b84c-7e65-e736-e6331acd04ed') Failed to migrate (migration:450)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 431, in _regular_run
time.time(), migrationParams, machineParams
File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 505, in _startUnderlyingMigration
self._perform_with_conv_schedule(duri, muri)
File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 591, in _perform_with_conv_schedule
self._perform_migration(duri, muri)
File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 525, in _perform_migration
self._migration_flags)
File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 100, in f
ret = attr(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
return func(inst, *args, **kwargs)
File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1781, in migrateToURI3
if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: operation failed: guest CPU doesn't match specification: missing features: spec-ctrl,ssbd
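The last line of the traceback points at the destination host missing the spec-ctrl and ssbd CPU features, which is usually a microcode or kernel difference between the hosts rather than different hardware. A quick comparison sketch to run on both node6 and node3:
grep -owE 'spec_ctrl|ssbd' /proc/cpuinfo | sort -u    # kernel-visible CPU flags (underscore spelling in cpuinfo)
virsh domcapabilities | grep -iE 'spec-ctrl|ssbd'     # what libvirt reports as usable for guests
If node3 is missing the flags, updating its microcode and kernel and rebooting, or lowering the cluster CPU type so it no longer requires those features, are the usual directions to look in.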
Anton Louw
Cloud Engineer: Storage and Virtualization
______________________________________
D: 087 805 1572 | M: N/A
A: Rutherford Estate, 1 Scott Street, Waverley, Johannesburg
anton.louw(a)voxtelecom.co.za
www.vox.co.za
4 years