This is resolved.
I manually shut down each VM, and then from within oVirt, I went to the host and, in the
upper corner of the page, clicked 'Confirm Host has been rebooted'.
This allowed oVirt to then recognize that the VMs were down, and I was able to bring them
back online on a healthy host.
..... That's what you're supposed to do, anyway.
I intentionally cheated and did things in a slightly different order. I knew that
none of the VMs on that host were currently configured for HA, so I knew that if oVirt
thought the VMs were turned off, oVirt would NOT try to bring them back online.
So just to make sure that it would even work, I marked the problematic host as rebooted
FIRST. Then, once I knew that worked, and the VMs were showing as down in the oVirt UI (but
still online on the problematic host), I ssh'd to each server and manually shut them
down before bringing them back online.
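
For anyone who wants to script that shutdown step, it was roughly a loop like the one below. The VM hostnames here are made up, and I've added a dry-run guard so nothing executes by accident when copy/pasted:

```shell
#!/bin/sh
# Sketch of the manual shutdown step. Hostnames are illustrative;
# DRY_RUN=1 (the default) only prints what would be run.
VMS="vm1.example.com vm2.example.com"
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

for vm in $VMS; do
  # Shut the guest down cleanly from inside the OS, since the
  # engine can no longer talk to the host to do it for us.
  run ssh root@"$vm" poweroff
done
```

Set DRY_RUN=0 only once you're sure the list of hostnames is right.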
Hopefully this helps someone else!
-David
Sent with Proton Mail secure email.
------- Original Message -------
On Monday, September 19th, 2022 at 3:44 PM, David White via Users <users(a)ovirt.org>
wrote:
> Restarting the `vdsmd` service on one of the problematic hosts brought
> that host back, and oVirt can see it.
> But that did not fix the problem on the last remaining host. I'm
> still troubleshooting...
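
(For the archives: the restart itself was just systemd, something like the sketch below, run as root on the affected host. The dry-run guard is only there so a copy/paste doesn't restart anything by accident:)

```shell
# Sketch of the restart step; run on the affected host as root.
# DRY_RUN=1 (the default) only prints the commands instead of running them.
SERVICE=vdsmd
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

run systemctl restart "$SERVICE"
run systemctl is-active "$SERVICE"   # in a real run, should print "active"
```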
> ------- Original Message -------
> On Monday, September 19th, 2022 at 11:37 AM, David White via Users
> <users(a)ovirt.org> wrote:
> > I tried rebooting the engine to see if that would magically solve the problem
> > (worth a try, right?). But as I expected, it didn't help.
>
> > Now one of the hosts is in a "Non Responsive" state and the other is
> > permanently in a "Connecting" state. All VMs associated with those 2 hosts now
> > show a question mark on the oVirt dashboard.
>
> > The storage for these VMs is good, and these VMs are online. Everything is
> > "working" -- I just need to get these VMs moved onto hosts that oVirt is able to
> > manage.
>
> > If it helps for troubleshooting purposes, prior to rebooting the engine, the
> > following errors were showing up in the oVirt UI for both of these hosts:
> >
> > VDSM cha1-storage.example.com command Get Host Capabilities failed: Internal
> > JSON-RPC error: {'reason': '[Errno 24] Too many open files'}
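
(Side note for anyone who lands on this thread via that error: Errno 24 means the vdsmd process exhausted its open-file-descriptor limit. A quick sketch for checking how close it is to the limit, assuming a root shell on the host and a systemd new enough for `--value`; the `/proc` paths are standard Linux:)

```shell
# Count open file descriptors for a pid and show its limit.
# The /proc layout is standard Linux, so fd_count works for any process.
fd_count() { ls "/proc/$1/fd" 2>/dev/null | wc -l; }

pid=$(systemctl show -p MainPID --value vdsmd 2>/dev/null || true)
if [ -n "$pid" ] && [ "$pid" != "0" ]; then
  echo "vdsmd (pid $pid) has $(fd_count "$pid") open fds"
  grep 'Max open files' "/proc/$pid/limits"
else
  echo "vdsmd is not running on this machine"
fi
```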
>
>
> > Any ideas? If I need to take some downtime for these VMs, so be it, but I need
> > to keep downtime to a minimum.
>
>
> > ------- Original Message -------
> > On Monday, September 19th, 2022 at 8:41 AM, David White via Users
> > <users(a)ovirt.org> wrote:
>
>
> > > Ok, now that I'm able to (re)deploy oVirt to new hosts, I need to
> > > migrate VMs that are running on hosts that are currently in an "unassigned"
> > > state in the cluster.
> >
> > > This is the result of having moved the oVirt engine OUT of a hyperconverged
> > > environment onto its own stand-alone system, while simultaneously upgrading oVirt
> > > from v4.4 to the latest v4.5.
> >
> > > See the following email threads:
> >
> > > - https://lists.ovirt.org/archives/list/users@ovirt.org/thread/TZAUCM3GB5ER...
> > > - https://lists.ovirt.org/archives/list/users@ovirt.org/thread/3IWXZ7VXM6CY...
> >
> >
> > > The oVirt engine knows about the VMs, and oVirt knows about the storage
> > > that those VMs are on. But the engine sees 2 of my hosts as "unassigned", and
> > > I've been unable to migrate the disks to new storage, live-migrate a VM off an
> > > unassigned host, or clone an existing VM.
> >
> > > Is there a way to recover from this scenario? I was thinking something
> > > along the lines of manually shutting down the VM on the unassigned host, and then
> > > somehow forcing the engine to bring the VM online again on a healthy host?
> >
> > > Thanks,
> > > David
> >
> >