Unable to start HostedEngine
by Devin A. Bougie
After a failed attempt at migrating our HostedEngine to a new iSCSI storage domain, we're unable to restart the original HostedEngine.
Please see below for some details, and let me know what more information I can provide. "Lnxvirt07" was the Host used to attempt the migration. Any help would be greatly appreciated.
Many thanks,
Devin
------
[root@lnxvirt01 ~]# tail -n 5 /var/log/ovirt-hosted-engine-ha/agent.log
MainThread::INFO::2023-11-01 12:29:53,514::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Global maintenance detected
MainThread::INFO::2023-11-01 12:29:54,151::ovf_store::117::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:05ef954f-d06d-401c-85ec-5992e2afbe7d, volUUID:d2860f1d-19cf-4084-8a7e-d97880c32431
MainThread::INFO::2023-11-01 12:29:54,530::ovf_store::117::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:a375a35b-7a87-4df4-8d29-a5ba371fee85, volUUID:ef8b3dae-bcae-4d58-bea8-cf1a34872267
MainThread::ERROR::2023-11-01 12:29:54,813::config_ovf::65::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
MainThread::INFO::2023-11-01 12:29:54,843::hosted_engine::531::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state GlobalMaintenance (score: 3400)
[root@lnxvirt01 ~]# hosted-engine --vm-start
Command VM.getStats with args {'vmID': 'e6370d8f-c083-4f28-83d0-a232d693e07a'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': 'e6370d8f-c083-4f28-83d0-a232d693e07a'})
Command VM.create with args {'vmID': 'e6370d8f-c083-4f28-83d0-a232d693e07a', 'vmParams': {'vmId': 'e6370d8f-c083-4f28-83d0-a232d693e07a', 'memSize': '16384', 'display': 'vnc', 'vmName': 'HostedEngine', 'smp': '4', 'maxVCpus': '40', 'cpuType': 'Haswell-noTSX', 'emulatedMachine': 'pc', 'devices': [{'index': '2', 'iface': 'ide', 'address': {'controller': '0', 'target': '0', 'unit': '0', 'bus': '1', 'type': 'drive'}, 'specParams': {}, 'readonly': 'true', 'deviceId': 'b3e2f40a-e28d-493c-af50-c1193fb9dc97', 'path': '', 'device': 'cdrom', 'shared': 'false', 'type': 'disk'}, {'index': '0', 'iface': 'virtio', 'format': 'raw', 'poolID': '00000000-0000-0000-0000-000000000000', 'volumeID': '6afa3b19-7a1a-4e5c-a681-eed756d316e9', 'imageID': '94628710-cf73-4589-bd84-e58f741a4d5f', 'specParams': {}, 'readonly': 'false', 'domainID': '555ad71c-1a4e-42b3-af8c-db39d9b9df67', 'optional': 'false', 'deviceId': '6afa3b19-7a1a-4e5c-a681-eed756d316e9', 'address': {'bus': '0x00', 'slot': '0x06', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'disk', 'shared': 'exclusive', 'propagateErrors': 'off', 'type': 'disk', 'bootOrder': '1'}, {'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'nicModel': 'pv', 'macAddr': '00:16:3e:3b:3f:14', 'linkActive': 'true', 'network': 'ovirtmgmt', 'specParams': {}, 'deviceId': '002afd06-9649-4ac5-a5e8-1a4945c3c136', 'address': {'bus': '0x00', 'slot': '0x03', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'bridge', 'type': 'interface'}, {'device': 'console', 'type': 'console'}, {'device': 'vga', 'alias': 'video0', 'type': 'video'}, {'device': 'vnc', 'type': 'graphics'}, {'device': 'virtio', 'specParams': {'source': 'urandom'}, 'model': 'virtio', 'type': 'rng'}]}} failed:
(code=100, message=General Exception: ("'xml'",))
VM failed to launch
[root@lnxvirt01 ~]# cat /etc/ovirt-hosted-engine/hosted-engine.conf
fqdn=lnxvirt-engine.classe.cornell.edu
vm_disk_id=94628710-cf73-4589-bd84-e58f741a4d5f
vm_disk_vol_id=6afa3b19-7a1a-4e5c-a681-eed756d316e9
vmid=e6370d8f-c083-4f28-83d0-a232d693e07a
storage=192.168.56.50,192.168.56.51,192.168.56.52,192.168.56.53
nfs_version=
mnt_options=
conf=/var/run/ovirt-hosted-engine-ha/vm.conf
host_id=8
console=vnc
domainType=iscsi
spUUID=00000000-0000-0000-0000-000000000000
sdUUID=555ad71c-1a4e-42b3-af8c-db39d9b9df67
connectionUUID=e29cf818-5ee5-46e1-85c1-8aeefa33e95d
vdsm_use_ssl=true
gateway=192.168.55.1
bridge=ovirtmgmt
network_test=dns
tcp_t_address=
tcp_t_port=
metadata_volume_UUID=2bf987a2-ab81-454c-9fc7-dc7ec8945fd9
metadata_image_UUID=35429b63-16ca-417a-b87a-d232463bf6a3
lockspace_volume_UUID=b0d09780-2047-433c-812d-10ba0beff788
lockspace_image_UUID=8ccb878d-9938-43c8-908b-e1b416fe991c
conf_volume_UUID=0b40ac60-499e-4ff1-83d0-fc578f1af3dc
conf_image_UUID=551d4fe5-a9f7-4ba1-9951-87418362b434
# The following are used only for iSCSI storage
iqn=iqn.2002-10.com.infortrend:raid.uid58207.001
portal=1
user=
password=
port=3260,3260,3260,3260
[root@lnxvirt01 ~]# hosted-engine --vm-status
!! Cluster is in GLOBAL MAINTENANCE mode !!
--== Host lnxvirt06.classe.cornell.edu (id: 1) status ==--
Host ID : 1
Host timestamp : 3718817
Score : 3400
Engine status : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname : lnxvirt06.classe.cornell.edu
Local maintenance : False
stopped : False
crc32 : 233a1425
conf_on_shared_storage : True
local_conf_timestamp : 3718818
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=3718817 (Wed Nov 1 12:26:35 2023)
host-id=1
score=3400
vm_conf_refresh_time=3718818 (Wed Nov 1 12:26:37 2023)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
--== Host lnxvirt05.classe.cornell.edu (id: 2) status ==--
Host ID : 2
Host timestamp : 3719461
Score : 3400
Engine status : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname : lnxvirt05.classe.cornell.edu
Local maintenance : False
stopped : False
crc32 : b3c81abe
conf_on_shared_storage : True
local_conf_timestamp : 3719462
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=3719461 (Wed Nov 1 12:26:41 2023)
host-id=2
score=3400
vm_conf_refresh_time=3719462 (Wed Nov 1 12:26:42 2023)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
--== Host lnxvirt04.classe.cornell.edu (id: 3) status ==--
Host ID : 3
Host timestamp : 3718684
Score : 3400
Engine status : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname : lnxvirt04.classe.cornell.edu
Local maintenance : False
stopped : False
crc32 : 03a57b14
conf_on_shared_storage : True
local_conf_timestamp : 3718686
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=3718684 (Wed Nov 1 12:26:41 2023)
host-id=3
score=3400
vm_conf_refresh_time=3718686 (Wed Nov 1 12:26:43 2023)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
--== Host lnxvirt03.classe.cornell.edu (id: 4) status ==--
Host ID : 4
Host timestamp : 3719430
Score : 3400
Engine status : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname : lnxvirt03.classe.cornell.edu
Local maintenance : False
stopped : False
crc32 : adb1aad2
conf_on_shared_storage : True
local_conf_timestamp : 3719432
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=3719430 (Wed Nov 1 12:26:35 2023)
host-id=4
score=3400
vm_conf_refresh_time=3719432 (Wed Nov 1 12:26:36 2023)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
--== Host lnxvirt02.classe.cornell.edu (id: 5) status ==--
Host ID : 5
Host timestamp : 3719408
Score : 3400
Engine status : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname : lnxvirt02.classe.cornell.edu
Local maintenance : False
stopped : False
crc32 : 1996a067
conf_on_shared_storage : True
local_conf_timestamp : 3719410
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=3719408 (Wed Nov 1 12:26:39 2023)
host-id=5
score=3400
vm_conf_refresh_time=3719410 (Wed Nov 1 12:26:41 2023)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
--== Host lnxvirt07.classe.cornell.edu (id: 7) status ==--
Host ID : 7
Host timestamp : 495392
Score : 0
Engine status : unknown stale-data
Hostname : lnxvirt07.classe.cornell.edu
Local maintenance : False
stopped : True
crc32 : 2572e907
conf_on_shared_storage : True
local_conf_timestamp : 495352
Status up-to-date : False
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=495392 (Tue Oct 31 10:20:12 2023)
host-id=7
score=0
vm_conf_refresh_time=495352 (Tue Oct 31 10:19:33 2023)
conf_on_shared_storage=True
maintenance=False
state=AgentStopped
stopped=True
--== Host lnxvirt01.classe.cornell.edu (id: 8) status ==--
Host ID : 8
Host timestamp : 1729103
Score : 3400
Engine status : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname : lnxvirt01.classe.cornell.edu
Local maintenance : False
stopped : False
crc32 : 2e57e99d
conf_on_shared_storage : True
local_conf_timestamp : 1729104
Status up-to-date : True
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=1729103 (Wed Nov 1 12:26:31 2023)
host-id=8
score=3400
vm_conf_refresh_time=1729104 (Wed Nov 1 12:26:33 2023)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False
!! Cluster is in GLOBAL MAINTENANCE mode !!
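In case it's useful, this is roughly what we have been checking on lnxvirt01 so far (a sketch, outputs omitted; the conf path and the sdUUID come from the hosted-engine.conf above, and the lvs call assumes the usual block-domain layout where the volume group is named after the sdUUID):

  # The config the agent falls back to after the OVF extraction failure
  ls -l /var/run/ovirt-hosted-engine-ha/vm.conf
  cat /var/run/ovirt-hosted-engine-ha/vm.conf

  # Volumes on the hosted-engine iSCSI domain (VG name = sdUUID)
  lvs -o lv_name,lv_size,lv_tags 555ad71c-1a4e-42b3-af8c-db39d9b9df67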
Hosted-engine restore failing when migrating to new storage domain
by Devin A. Bougie
Hello,
We have a functioning oVirt 4.5.4 cluster running on fully-updated EL9.2 hosts. We are trying to migrate the self-hosted engine to a new iSCSI storage domain using the existing hosts, following the documented procedure:
- set the cluster into global maintenance mode
- backup the engine using "engine-backup --scope=all --mode=backup --file=backup.bck --log=backuplog.log"
- shutdown the engine
- restore the engine using "hosted-engine --deploy --4 --restore-from-file=backup.bck"
This almost works, but fails with the attached log file. Any help or suggestions would be greatly appreciated, including alternate procedures for migrating a self-hosted engine from one domain to another.
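For completeness, the exact sequence we ran was roughly the following (a sketch of the steps above; backup.bck is just the file name we used, and the last comment assumes the usual location of the hosted-engine setup logs):

  # On any HA host
  hosted-engine --set-maintenance --mode=global

  # On the engine VM
  engine-backup --scope=all --mode=backup --file=backup.bck --log=backuplog.log

  # On the host currently running the engine
  hosted-engine --vm-shutdown

  # On the host chosen for the restore (lnxvirt07), pointing it at the new iSCSI domain
  hosted-engine --deploy --4 --restore-from-file=backup.bck

  # The failure shows up in the deploy logs under /var/log/ovirt-hosted-engine-setup/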
Many thanks,
Devin
Call for participation: Virtualization and Cloud infrastructure Room at FOSDEM 2024
by Piotr Kliczewski
We are excited to announce that the call for proposals is now open for the
Virtualization and Cloud infrastructure devroom at the upcoming FOSDEM
2024, to be hosted on February 3rd 2024.
This devroom is a collaborative effort, and is organized by dedicated folks
from projects such as OpenStack, Xen Project, KubeVirt, QEMU, KVM, and
Foreman. We would like to invite all those who are involved in these fields
to submit their proposals by December 8th, 2023.
About the Devroom
The Virtualization & IaaS devroom will feature session topics such as open
source hypervisors or virtual machine managers such as Xen Project, KVM,
bhyve and VirtualBox as well as Infrastructure-as-a-Service projects such
as KubeVirt, Apache CloudStack, OpenStack, QEMU and OpenNebula.
This devroom will host presentations that focus on topics of shared
interest, such as KVM; libvirt; shared storage; virtualized networking;
cloud security; clustering and high availability; interfacing with multiple
hypervisors; hyperconverged deployments; and scaling across hundreds or
thousands of servers.
Presentations in this devroom will be aimed at developers working on these
platforms who are looking to collaborate and improve shared infrastructure
or solve common problems. We seek topics that encourage dialog between
projects and continued work post-FOSDEM.
Important Dates
Submission deadline: 8th December 2023
Acceptance notifications: 10th December 2023
Final schedule announcement: 15th December 2023
Devroom: 3rd February 2024
Submit Your Proposal
All submissions must be made via the Pretalx event planning site[1]. It is
a new submission system so you will need to create an account. If you
submitted proposals for FOSDEM in previous years, you won’t be able to use
your existing account.
During submission please make sure to select Virtualization and Cloud
infrastructure from the Track list. Please fill out all the required
fields, and provide a meaningful abstract and description of your proposed
session.
Submission Guidelines
We expect more proposals than we can possibly accept, so it is vitally
important that you submit your proposal on or before the deadline. Late
submissions are unlikely to be considered.
All presentation slots are 30 minutes, with 20 minutes planned for
presentations, and 10 minutes for Q&A.
All presentations will be recorded and made available under Creative
Commons licenses. In the Submission notes field, please indicate that you
agree that your presentation will be licensed under the CC-By-SA-4.0 or
CC-By-4.0 license and that you agree to have your presentation recorded.
For example:
"If my presentation is accepted for FOSDEM, I hereby agree to license all
recordings, slides, and other associated materials under the Creative
Commons Attribution Share-Alike 4.0 International License.
Sincerely,
<NAME>."
In the Submission notes field, please also confirm that if your talk is
accepted, you will be able to attend FOSDEM and deliver your presentation.
We will not consider proposals from prospective speakers who are unsure
whether they will be able to secure funds for travel and lodging to attend
FOSDEM. (Sadly, we are not able to offer travel funding for prospective
speakers.)
Code of Conduct
Following the release of the updated code of conduct for FOSDEM, we'd like
to remind all speakers and attendees that all of the presentations and
discussions in our devroom are held under the guidelines set in the CoC and
we expect attendees, speakers, and volunteers to follow the CoC at all
times.
If you submit a proposal and it is accepted, you will be required to
confirm that you accept the FOSDEM CoC. If you have any questions about the
CoC or wish to have one of the devroom organizers review your presentation
slides or any other content for CoC compliance, please email us and we will
do our best to assist you.
Questions?
If you have any questions about this devroom, please send your questions to
our devroom mailing list. You can also subscribe to the list to receive
updates about important dates, session announcements, and to connect with
other attendees.
See you all at FOSDEM!
[1] https://pretalx.fosdem.org/fosdem-2024/cfp
[2] virtualization-devroom-manager at fosdem.org
Moving hosted-engine to a new cluster
by Ling Ho
Hello, I set up some new hypervisors recently, created a new cluster using the newer CPU type, and migrated all my VMs by changing their cluster and restarting them. Now I have the hosted engine left running in my old cluster.
I could not find any recent documentation or discussion on the net.
What's the proper way to migrate the hosted engine? I am on oVirt 4.5.4.
Thanks,
...
ling
Cannot "Change CD" from VM portal with standard user
by Gianluca Amato
Hello,
I am trying to attach a CD to a virtual machine from the VM Portal as a standard user. When I click on the box for choosing the ISO image, I only get "[Empty]". However, when I access the VM Portal as an admin, I correctly see the list of ISO images. I suspect it is a permissions problem: I have assigned the unprivileged user the UserRole permission both for the VM and for the ISO disk images. Am I doing something wrong? Do I need to assign different roles?
Thanks,
--gianluca
'NoneType' object has no attribute 'data_center'
by marek
hi,
i'm using VM creation through ansible - ovirt 4.5
- name: Creates a new Virtual Machine from template
  ovirt.ovirt.ovirt_vm:
    auth: "{{ ovirt_auth }}"
    state: present
    cpu_cores: 2
    memory: 4GiB
    name: "{{NAME}}"
    template: "{{ VM_TEMPLATE }}"
    comment: "{{comment}}"
    description: "{{description}}"
    cluster: "{{CLUSTER}}"
    type: server
    high_availability: yes
    operating_system: rhel_9x64
    graphical_console:
      protocol:
        - vnc
    cloud_init_persist: yes
    nics:
      - name: enp1s0
        profile_name: "{{VLAN1}}"
      - name: enp2s0
        profile_name: "{{VLAN2}}"
    cloud_init:
      dns_servers: '1.1.1.1'
      host_name: "{{NAME}}"
      user_name: root
      root_password: "{{ROOT_PASSWORD}}"
      authorized_ssh_keys: "ssh-ed25519 secret"
    cloud_init_nics:
      - nic_name: enp1s0
        nic_boot_protocol: static
        nic_ip_address: "{{PRIVATE_IP}}"
        nic_netmask: 255.255.255.0
        nic_gateway: x.x.x.x
      - nic_name: enp2s0
        nic_boot_protocol: static
        nic_ip_address: "{{PUBLIC_IP}}"
        nic_netmask: "{{PUBLIC_NETMASK}}"
It worked before; we double-checked the input, and there have been no upgrades or changes.
Any ideas where the problem could be? Any tips on how to debug this?
Marek
The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_ovirt.ovirt.ovirt_vm_payload_thzzyp5t/ansible_ovirt.ovirt.ovirt_vm_payload.zip/ansible_collections/ovirt/ovirt/plugins/modules/ovirt_vm.py", line 2694, in main
  File "/tmp/ansible_ovirt.ovirt.ovirt_vm_payload_thzzyp5t/ansible_ovirt.ovirt.ovirt_vm_payload.zip/ansible_collections/ovirt/ovirt/plugins/module_utils/ovirt.py", line 673, in create
    self.build_entity(),
  File "/tmp/ansible_ovirt.ovirt.ovirt_vm_payload_thzzyp5t/ansible_ovirt.ovirt.ovirt_vm_payload.zip/ansible_collections/ovirt/ovirt/plugins/modules/ovirt_vm.py", line 1525, in build_entity
  File "/tmp/ansible_ovirt.ovirt.ovirt_vm_payload_thzzyp5t/ansible_ovirt.ovirt.ovirt_vm_payload.zip/ansible_collections/ovirt/ovirt/plugins/modules/ovirt_vm.py", line 1418, in __get_template_with_version
AttributeError: 'NoneType' object has no attribute 'data_center'
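In case it helps narrow this down: the traceback ends in __get_template_with_version, where (at least in the collection versions I have looked at) the module looks up the cluster by name and then follows cluster.data_center, so a None there usually means the cluster or template name did not resolve for the user the play authenticates as. A hypothetical pre-check along these lines (the *_info tasks and the registered variable names are my additions, not part of the original play) would show whether the names resolve:

  - name: Check that the cluster referenced by CLUSTER resolves
    ovirt.ovirt.ovirt_cluster_info:
      auth: "{{ ovirt_auth }}"
      pattern: "name={{CLUSTER}}"
    register: cluster_info

  - name: Check that the template referenced by VM_TEMPLATE resolves
    ovirt.ovirt.ovirt_template_info:
      auth: "{{ ovirt_auth }}"
      pattern: "name={{VM_TEMPLATE}}"
    register: template_info

  - name: Show what was found (an empty list would explain the error above)
    ansible.builtin.debug:
      msg:
        - "{{ cluster_info.ovirt_clusters }}"
        - "{{ template_info.ovirt_templates }}"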
How to start QA and testing work in the oVirt project?
by song_chao@massclouds.com
Hello everyone, I am a testing and development engineer who has been working in the field for 10 years. I want to learn about and participate in oVirt's testing work and make my own contribution; I believe that, as I learn more, I can also contribute to the oVirt community in the near future. For now, I would like some information about the project's QA and testing processes and methods. Can anyone provide it? Thank you.
Missing OVF_STORE disks. Impact and how to recover them?
by ivan.lezhnjov.iv@gmail.com
Hi!
We have a Local Storage SD (Storage Domain) that is missing both OVF_STORE disks.
Clearly this SD is, strictly speaking, broken, but the oVirt Host where this SD is configured and the VMs that run on that Host seem to be doing fine. There are no major problems, except that the oVirt Events log fills up with a "VDSM command SetVolumeDescriptionVDS failed: Image path does not exist or cannot be accessed/created" error (and a couple of others associated with it) every hour.
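For what it's worth, this is roughly how one can look for OVF_STORE volumes directly on a file-based domain like this one (the mount-path layout and the assumption that the disk description ends up in the volume .meta files are guesses on my part; the <sdUUID> placeholder needs to be replaced with the real domain UUID):

  # Look for OVF_STORE in the volume metadata under the local storage domain
  grep -H '^DESCRIPTION' /rhev/data-center/mnt/_*/<sdUUID>/images/*/*.meta | grep -i ovf_store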
Is this SD in a critical state? Are we risking losing data or anything like that? What is the impact of deleting OVF_STORE disks from an SD?
And most importantly, is there a way to recover or recreate these OVF_STORE disks?
Multiple hosts stuck in Connecting state waiting for storage pool to go up.
by ivan.lezhnjov.iv@gmail.com
Hi!
We have a problem with multiple hosts stuck in Connecting state, which I hoped somebody here could help us wrap our heads around.
All hosts, except one, seem to have very similar symptoms but I'll focus on one host that represents the rest.
So, the host is stuck in the Connecting state, and this is what we see in the oVirt log files.
/var/log/ovirt-engine/engine.log:
2023-04-20 09:51:53,021+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = ABC010-176-XYZ, VdsIdAndVdsVDSCommandParametersBase:{hostId='2c458562-3d4d-4408-afc9-9a9484984a91', vds='Host[ABC010-176-XYZ,2c458562-3d4d-4408-afc9-9a9484984a91]'})' execution failed: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: SSL session is invalid
2023-04-20 09:55:16,556+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-67) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ABC010-176-XYZ command Get Host Capabilities failed: Message timeout which can be caused by communication issues
/var/log/vdsm/vdsm.log:
2023-04-20 17:48:51,977+0300 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList() from=internal, task_id=ebce7c8c-6ded-454e-9aee-86edf72764ef (api:31)
2023-04-20 17:48:51,977+0300 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=ebce7c8c-6ded-454e-9aee-86edf72764ef (api:37)
2023-04-20 17:48:51,978+0300 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:723)
Both engine.log and vdsm.log are flooded with these messages, repeated at regular intervals ad infinitum. This is one common symptom shared by multiple hosts in our deployment: they all have these message loops in their engine.log and vdsm.log files.
Running vdsm-client Host getConnectedStoragePools also returns an empty list ([]) on all hosts (interestingly, there is one host that did show a Storage Pool UUID and yet was still stuck in the Connecting state).
This particular host (ABC010-176-XYZ) is connected to 3 CEPH iSCSI Storage Domains and lsblk shows 3 block devices with matching UUIDs in their device components. So, the storage seems to be connected but the Storage Pool is not? How is that even possible?
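Concretely, the two checks mentioned above were along these lines on the affected host (outputs omitted; the lsblk columns are just the ones we found useful):

  # Storage pools VDSM reports being connected to (an empty list on the stuck hosts)
  vdsm-client Host getConnectedStoragePools

  # Block devices backing the three iSCSI storage domains
  lsblk -o NAME,SIZE,TYPE,MOUNTPOINT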
What's even weirder is that we tried rebooting the host (via the Administrator Portal) and it didn't help. We even tried removing and re-adding the host in the Administrator Portal, but to no avail.
Additionally, the host refused to go into Maintenance mode, so we had to force it by manually updating the Engine DB.
We also tried reinstalling the host via the Administrator Portal and ran into another weird problem, which I'm not sure is related or deserves a dedicated discussion thread, but basically the underlying Ansible playbook exited with the following error message:
"stdout" : "fatal: [10.10.10.176]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Data could not be sent to remote host \\\"10.10.10.176\\\". Make sure this host can be reached over ssh: \", \"unreachable\": true}",
Counterintuitively, just before running Reinstall via the Administrator Portal we had been able to reboot the same host (which, as you know, oVirt also does via Ansible). So, no changes on the host in between, just different Ansible playbooks. To confirm that we actually had ssh access to the host, we successfully ran ssh -p $PORT root@10.10.10.176 -i /etc/pki/ovirt-engine/keys/engine_id_rsa and it worked.
That made us scratch our heads for a while, but what seems to have fixed Ansible's ssh access problem was a manual full stop of all VDSM-related systemd services on the host. It was just a wild guess, but as soon as we stopped all VDSM services, Ansible stopped complaining about not being able to reach the target host and successfully did its job.
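For the record, "all VDSM-related systemd services" was roughly the set below; the exact unit names are my assumption of what a typical EL oVirt host runs, so the list-units call is there to confirm what actually exists before stopping anything:

  # See which VDSM/MOM units are present on the host
  systemctl list-units 'vdsm*' 'supervdsm*' 'mom-vdsm*'

  # Stop them (MOM first, then vdsmd, then supervdsmd)
  systemctl stop mom-vdsm vdsmd supervdsmd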
I'm sure you'd like to see more logs, but I'm not certain what exactly is relevant. There are a ton of logs, as this deployment comprises nearly 80 hosts. So, I guess it's best if you just request specific logs, messages, or configuration details and I'll cherry-pick what's relevant.
We don't really understand what's going on and would appreciate any help. We tried just about anything we could think of to resolve this issue and are running out of ideas what to do next.
If you have any questions just ask and I'll do my best to answer them.
Is this still an active mailing list?
by vince seavello
I am attempting to install oVirt Node 4.5.4. I could use a little help with
a few setup issues. I'm wondering if this mailing list is still active.
I ask because there are several links on the ovirt.org site that are not
functional. If this list is still active, I'll give some of the specifics
of my system and the issues I'm running into.
Vince Seavello
Seavello Web Design
c: 425.478.9682
f: 425.337.8712
e: vseavello(a)gmail.com