
I have a two-host, self-hosted-engine cluster/datacenter; I call the hosts Node1 and Node2. They run CentOS 7, and I have been applying updates regularly by putting each host into maintenance mode and installing the update through the web GUI. Node1 is restarting at odd times, causing the VMs on that node to start up on Node2 (which is by design). However, the restarts are becoming more frequent and I am having a hard time figuring out what is causing them. Below is a snippet from /var/log/ovirt-hosted-engine-ha/agent.log on Node1. My current plan is to simply rebuild Node1 by re-installing CentOS 7 and using the cluster tools to import it.

MainThread::INFO::2018-09-10 10:04:25,907::hosted_engine::244::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: node1.***.com
MainThread::INFO::2018-09-10 10:04:34,200::hosted_engine::522::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2018-09-10 10:04:34,202::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.4.1'}
MainThread::ERROR::2018-09-10 10:04:34,203::hosted_engine::538::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2018-09-10 10:04:34,203::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 413, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 535, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 83, in start_monitor
    .format(type, options, e))
RequestError: Failed to start monitor ping, options {'addr': '192.168.4.1'}: [Errno 2] No such file or directory
MainThread::ERROR::2018-09-10 10:04:34,204::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2018-09-10 10:04:34,204::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2018-09-10 10:04:44,545::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.2.16 started
MainThread::INFO::2018-09-10 10:04:44,583::hosted_engine::244::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: node1.***.com
MainThread::INFO::2018-09-10 10:04:44,744::hosted_engine::522::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2018-09-10 10:04:44,745::brokerlink::77::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor ping, options {'addr': '192.168.4.1'}
MainThread::ERROR::2018-09-10 10:04:44,746::hosted_engine::538::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2018-09-10 10:04:44,746::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 413, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 535, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 83, in start_monitor
    .format(type, options, e))
RequestError: Failed to start monitor ping, options {'addr': '192.168.4.1'}: [Errno 2] No such file or directory

We originally noticed the issue when the host would restart at midnight EST on Saturdays. The other day it restarted at 10 AM EST, which was a surprise. The system is under low load, and no hardware issues were found in iDRAC or in the kernel messages.

Can you please also attach /var/log/sanlock.log and /var/log/messages for the relevant time frame?

On Wed, Sep 12, 2018 at 1:59 PM <turboaaa@gmail.com> wrote:
We originally noticed an issue when the host would restart at midnight on Saturdays EST time. The other day it restarted at 10AM EST which was a surprise. The system is under low load and no hardware issues were found in iDRAC nor in kernel messages.
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/37CJPQ6G3V54K2...

This is a stupid question, but how do I attach a file to this post? I am using the web form; as soon as I figure out this mailing list thing I will attach the full logs (NOOOOB!!!!).

Regardless, there are no entries before the reboot of Node1 in sanlock.log on that day. Node1 started coming back online at 10:04 AM.

2018-09-08 09:19:38 3081 [1127]: s1 host 2 16 249 f9704a0b-9a09-42f4-819c-35055c1177af.node2.<host>
2018-09-10 10:03:54 5 [1154]: sanlock daemon started 3.6.0 host edfe4d27-bdbe-45e3-9e4c-a4e02b2101bd.node1.<host>

/var/log/messages has more information; when I figure out how to attach files I will post it. Here are the entries around the reboot time. The node rebooted between 09:58:52 and 10:00:01:

Sep 10 09:58:52 node1 libvirtd: 2018-09-10 13:58:52.405+0000: 2513: error : qemuDomainAgentAvailable:6946 : Guest agent is not responding: QEMU guest agent is not connected
Sep 10 09:58:52 node1 libvirtd: 2018-09-10 13:58:52.405+0000: 2514: error : qemuDomainAgentAvailable:6946 : Guest agent is not responding: QEMU guest agent is not connected
Sep 10 10:00:01 node1 systemd: Started Session 353 of user root.
Sep 10 10:00:01 node1 systemd: Starting Session 353 of user root.
Sep 10 10:03:51 node1 kernel: microcode: microcode updated early to revision 0x42d, date = 2018-04-25
Sep 10 10:03:51 node1 kernel: Initializing cgroup subsys cpuset
Sep 10 10:03:51 node1 kernel: Initializing cgroup subsys cpu
Sep 10 10:03:51 node1 kernel: Initializing cgroup subsys cpuacct
Sep 10 10:03:51 node1 kernel: Linux version 3.10.0-862.11.6.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Tue Aug 14 21:49:04 UTC 2018
Sep 10 10:03:51 node1 kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-862.11.6.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8
Sep 10 10:03:51 node1 kernel: e820: BIOS-provided physical RAM map:
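[Editor's note: since pinning down exactly when the node went down keeps coming up, here is a minimal sketch — not part of the thread's tooling; the marker string and sample lines come from the excerpt above, and the year is an assumption since syslog timestamps omit it — that brackets a reboot in a /var/log/messages extract by finding the kernel's early-boot banner:]

```python
from datetime import datetime

# Sample syslog lines from the excerpt above (message bodies abbreviated).
LINES = [
    "Sep 10 09:58:52 node1 libvirtd: error : Guest agent is not responding",
    "Sep 10 10:00:01 node1 systemd: Started Session 353 of user root.",
    "Sep 10 10:03:51 node1 kernel: microcode: microcode updated early to revision 0x42d",
    "Sep 10 10:03:51 node1 kernel: Initializing cgroup subsys cpuset",
]

def stamp(line, year=2018):
    # syslog timestamps omit the year, so supply one explicitly;
    # the first 15 characters are "Mon dd HH:MM:SS"
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

# The kernel's early-boot microcode banner is among the first lines logged
# after a boot, so the gap between it and the preceding line brackets the
# reboot window.
window = None
for prev, cur in zip(LINES, LINES[1:]):
    if "microcode updated early" in cur:
        window = (stamp(prev), stamp(cur))
        break

print(window)
```

[On a real system you would feed it `open("/var/log/messages")` instead of `LINES`; note that against the full excerpt above this reports the window as 10:00:01 to 10:03:51, since the 10:00:01 session line is the last message syslog managed to write before the crash.]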

On Thu, Sep 13, 2018 at 12:07 AM <turboaaa@gmail.com> wrote:
This is a stupid question, but how do I attach a file to this post? I am using the web form, as soon as I figure out this mailing list thing I will attach full logs (NOOOOB!!!!)
Sorry, which web form? This is a mailing list: users@ovirt.org

I have never participated in a mailing list before. In this case I have been using the following web form: https://lists.ovirt.org/archives/list/users@ovirt.org/thread/AXONQYNLZHO7JBI... For this reply I am simply replying to the email in my inbox; we'll see if it shows up. I am used to forums where I get an email notification but need to log in to add content.

On Thu, Sep 13, 2018 at 4:57 AM Simone Tiraboschi <stirabos@redhat.com> wrote:
Sorry, which web form? This is a mailing list: users@ovirt.org
--
- Michael Mast

My last post worked, good to know. As an FYI, on the ovirt.org website there is a "Forum" link at the top right-hand side that takes you to the mailing list, so with the web form I figured it was more or less a standard forum. But now I know better, lol. Please see the attached log files.

On Thu, Sep 13, 2018 at 7:41 AM Michael Mast <turboaaa@gmail.com> wrote:
I have never participated in a mailing list. In this case I have been using the following web form.
--
- Michael Mast

Anything strange in sanlock logs before the reboot?

On Thu, Sep 13, 2018 at 2:16 PM Michael Mast <turboaaa@gmail.com> wrote:
My last post worked, good to know. As an FYI, on the ovirt.org website there is a "Forum" link at the top right hand side that takes me to the mailing list. With the web form I figured it was more or less just a standard forum. But now I know better lol
Please see attached for log files.

No. The log file literally jumps from the 8th to the 10th; the first entry on the 10th is from when the host booted back up.

On Thu, Sep 13, 2018, 10:14 Simone Tiraboschi <stirabos@redhat.com> wrote:
Anything strange in sanlock logs before the reboot?
--
- Michael Mast
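[Editor's note: for anyone checking their own sanlock.log for a similar silent gap, a small sketch — not from the thread; the format assumption is that each line starts with an ISO date and time, as in the two sanlock lines quoted above — that flags unusually long silences between consecutive entries:]

```python
from datetime import datetime, timedelta

# Sample sanlock.log lines from earlier in the thread (payloads truncated).
LOG = [
    "2018-09-08 09:19:38 3081 [1127]: s1 host 2 16 249 f9704a0b.node2",
    "2018-09-10 10:03:54 5 [1154]: sanlock daemon started 3.6.0 host edfe4d27.node1",
]

def ts(line):
    # the first 19 characters are "YYYY-mm-dd HH:MM:SS"
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")

# Report any silence longer than an hour between consecutive entries.
gaps = [
    (ts(a), ts(b))
    for a, b in zip(LOG, LOG[1:])
    if ts(b) - ts(a) > timedelta(hours=1)
]
for start, end in gaps:
    print("no sanlock entries between", start, "and", end)
```

[Against the real file this immediately surfaces the two-day jump described above, which is consistent with a hard power-off rather than a clean shutdown: nothing got a chance to log.]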

We had it happen again, but this time on the other host. This started happening not long after I tried to update the engine, and I think I screwed that up: the instructions I was working from were outdated, and a lot of the links on the oVirt site are broken. My theory is that the engines were not "synced", since a copy runs on each host. This would cause the engine on one host to come online and think the other host is not responding; at that point it forces the other host offline using iDRAC (the logs show this is how it reboots the other host) and fails all the VMs over.

So what I did was remove one host from the cluster, re-install CentOS 7, and re-add it to the cluster. This way the engine config would be copied from the primary host, keeping them synced. If this does not work, the next step is to convert the hosted engine into a standalone engine. Are there any good guides for making this conversion?
--
- Michael Mast

Use the backup and restore procedure to move your engine, maybe.

On Thu, Sep 27, 2018 at 4:08 PM Michael Mast <turboaaa@gmail.com> wrote:
We had it happen again, but this time on the other host.

I figured as much, but wasn't sure if there would be any changes needed to the database so the engine will know it is no longer self-hosted.

On Thu, Sep 27, 2018, 20:28 Donny Davis <donny@fortnebula.com> wrote:
Use the backup and restore procedure to move your engine maybe
--
- Michael Mast

I am sure there would be some challenges there because you cannot delete that VM; it's protected. However, the rest of your stuff should run just fine. I have never done it myself; most people head in the opposite direction.

On Thu, Sep 27, 2018 at 8:31 PM Michael Mast <turboaaa@gmail.com> wrote:
> I figured as much, but wasn't sure if there would be any changes needed to the database so the engine will know it is no longer self hosted.

Thanks. I will create a new thread for that; I'm getting off topic. What are the top reasons one host would think the other is offline?

On Thu, Sep 27, 2018, 20:42 Donny Davis <donny@fortnebula.com> wrote:
> I am sure there would be some challenges there because you cannot delete that vm, its protected... However the rest of your stuff should run just fine. I have never done it myself, most people head the opposite direction.
--
- Michael Mast
participants (5)
- Donny Davis
- Michael Mast
- p.staniforth@leedsbeckett.ac.uk
- Simone Tiraboschi
- turboaaa@gmail.com