VM stuck in "reboot in progress" ("virtual machine XXX should be running in a host but it isn't.").

Hello all (and happy new year),
(Note: Also reported as https://bugzilla.redhat.com/show_bug.cgi?id=1880251)
Self-hosted engine, single node, NFS. Attempted to install CentOS over an existing Fedora VM with one host device (USB printer). Reboot failed, with the VM trying to boot from a non-existent CDROM. Tried shutting the VM down, failed. Tried powering off the VM, failed. Dropped the cluster to global maintenance and rebooted host + engine (was planning to upgrade it anyhow...), VM still stuck.
When trying to power off the VM, the following messages can be found in engine.log:
2020-09-18 07:58:51,439+03 INFO [org.ovirt.engine.core.bll.StopVmCommand] (EE-ManagedThreadFactory-engine-Thread-42) [7bc4ac71-f0b2-4af7-b081-100dc99b6123] Running command: StopVmCommand internal: false. Entities affected : ID: b411e573-bcda-4689-b61f-1811c6f03ad5 Type: VMAction group STOP_VM with role type USER
2020-09-18 07:58:51,441+03 WARN [org.ovirt.engine.core.bll.StopVmCommand] (EE-ManagedThreadFactory-engine-Thread-42) [7bc4ac71-f0b2-4af7-b081-100dc99b6123] Strange, according to the status 'RebootInProgress' virtual machine 'b411e573-bcda-4689-b61f-1811c6f03ad5' should be running in a host but it isn't.
2020-09-18 07:58:51,594+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-42) [7bc4ac71-f0b2-4af7-b081-100dc99b6123] EVENT_ID: USER_FAILED_STOP_VM(56), Failed to power off VM kids-home-srv (Host: <UNKNOWN>, User: gilboa@internal-authz).
My question is simple: Pending a solution to the bug, can I somehow drop the state of the VM? It's currently holding a sizable disk image and a USB device I need (printer).
As it's my private VM cluster, I have no problem dropping the site completely for maintenance.
Thanks,
Gilboa

P.S. Full engine log attached to the bugzilla entry. - Gilboa

You have rebooted the host where the VM was previously running, right? If oVirt doesn't detect that the host was rebooted, you can mark it as such: UI -> Hosts -> select the Host -> the 3 dots -> "Confirm 'Host has been rebooted'"
Best Regards, Strahil Nikolov
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/CD536K6HHE5JX7...

"Confirm 'Host has been rebooted'" fails when the host is up. The error message is:
Error while executing action: Cannot confirm 'Host has been rebooted' Host. Valid Host statuses are "Non operational", "Maintenance" or "Connecting".
- Gilboa

Ah, the host is treated as operational (Active)...
I guess you can check the status of the VM in the engine, but keep in mind that direct manipulation of the DB is strongly discouraged (but desperate people require desperate measures).
Usually I ssh to the HE as root:
    ssh root@HE
    su - postgres
    source /opt/rh/rh-postgresql10/enable
    psql engine
List the tables:
    \dt
Then you have to search where the status of the VM is.
Best Regards, Strahil Nikolov

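One possible way to approach the "search where the status of the VM is" step, as a rough sketch rather than an official procedure: PostgreSQL's standard information_schema.columns view can list columns by name pattern. The helper name find_columns_sql and the default patterns (vm, status) are illustrative assumptions, not something taken from the thread. The function only prints SQL; it never connects to the database.

```shell
#!/bin/sh
# Sketch: emit a catalog query listing columns whose name contains a given
# fragment, restricted to tables whose name starts with a given prefix.
# find_columns_sql is a hypothetical helper; it prints SQL and runs nothing.
find_columns_sql() {
    prefix=${1:-vm}        # assumed default: tables starting with "vm"
    fragment=${2:-status}  # assumed default: columns containing "status"
    # Accept plain identifiers only, so the values are safe to embed in SQL.
    case "$prefix$fragment" in
        *[!A-Za-z0-9_]*) echo "invalid identifier" >&2; return 1 ;;
    esac
    # Note: "_" is a single-character wildcard in LIKE, which only widens the match.
    printf "select table_name, column_name from information_schema.columns where table_name like '%s%%' and column_name like '%%%s%%' order by 1, 2;\n" "$prefix" "$fragment"
}
```

A possible invocation, run as the postgres user on the engine VM, would be: find_columns_sql vm status | psql engine
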
I've had situations where the engine UI wouldn't update after VM shutdowns/startups; these were resolved by ssh-ing into the engine VM and running systemctl restart ovirt-engine.service. Running engine-setup also helped on occasion and cleared out old tasks.

It would be best to modify the VM as if it should still be running on the host and let the system discover that it's not running there and update the VM accordingly. You can do it by changing the database with:
update vm_dynamic set run_on_vds='82f92946-9130-4dbd-8663-1ac0b50668a1' where vm_guid='b411e573-bcda-4689-b61f-1811c6f03ad5';

Hello,
Thanks for the prompt answer.
Edward,
Full reboot of both engine and host didn't help. Most likely there's a consistency problem in the oVirt DB.
Arik,
To which DB should I connect, and as which user? E.g. psql -U user db_name
Thanks again, - Gilboa

To the 'engine' database. I usually connect to it by switching to the 'postgres' user as Strahil described.

Arik / Strahil,
Many thanks!
Just in case anyone else is hitting the same issue (*NOTE* Host and VM ID _will_ be different!)
0. Ran a backup:
1. Connect to the hosted-engine and DB:
$ ssh root@vmengine
$ su - postgres
$ psql engine
2. Execute a select query to verify that the VM's run_on_vds is NULL:
# select * from vm_dynamic where vm_guid='b411e573-bcda-4689-b61f-1811c6f03ad5';
3. Execute Arik's update query:
# update vm_dynamic set run_on_vds='82f92946-9130-4dbd-8663-1ac0b50668a1' where vm_guid='b411e573-bcda-4689-b61f-1811c6f03ad5';
4. Re-started the engine:
$ systemctl restart ovirt-engine
5. Everything seems fine now. Profit!
Thanks again,
Gilboa
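Since the host and VM IDs differ on every installation, the select and update statements from the steps in this thread could be generated from arguments instead of being edited by hand. The following is a minimal sketch under that assumption; is_uuid and build_fix_sql are hypothetical helper names, and the script only prints SQL. It does not connect to the database, take a backup, or restart the engine.

```shell
#!/bin/sh
# Sketch: print the verification and update statements from the thread for a
# given host and VM, after checking that both arguments look like UUIDs.
# is_uuid and build_fix_sql are hypothetical helpers, not oVirt tools.

is_uuid() {
    case "$1" in
        *[!0-9a-fA-F-]*) return 1 ;;  # reject anything but hex digits and dashes
    esac
    # Require the 8-4-4-4-12 layout.
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

build_fix_sql() {
    host_id=$1
    vm_id=$2
    if ! is_uuid "$host_id" || ! is_uuid "$vm_id"; then
        echo "usage: build_fix_sql HOST_UUID VM_UUID" >&2
        return 1
    fi
    # Show the row before the change, apply the change, then show it again.
    printf "select * from vm_dynamic where vm_guid='%s';\n" "$vm_id"
    printf "update vm_dynamic set run_on_vds='%s' where vm_guid='%s';\n" "$host_id" "$vm_id"
    printf "select * from vm_dynamic where vm_guid='%s';\n" "$vm_id"
}
```

With the IDs from this thread, build_fix_sql 82f92946-9130-4dbd-8663-1ac0b50668a1 b411e573-bcda-4689-b61f-1811c6f03ad5 prints three statements that can be reviewed and then pasted into psql engine. Printing instead of executing keeps the direct database change a deliberate, manual step.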
participants (4)
- Arik Hadas
- Edward Berger
- Gilboa Davara
- Strahil Nikolov