broker.log not rotating
by Anton Louw
Hi All,
I had a disk space alert on one of my nodes this morning, and when I looked around, I saw that /var/log/ovirt-hosted-engine-ha/broker.log was sitting at around 30GB. Does anybody know if it is safe to delete the log file, or is there another process that I should follow?
I had a look at my other nodes, and the broker.log file does not exceed 4GB.
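For comparing nodes, a minimal sketch like the following lists oversized files in the log directory. It assumes Python 3 is available on the host; the function name large_logs and the 4 GiB threshold are illustrative only and are not part of oVirt.

```python
import os


def large_logs(directory, threshold_bytes):
    """Return (path, size) pairs for regular files in `directory`
    whose size is at least `threshold_bytes`, largest first."""
    found = []
    for entry in os.scandir(directory):
        # Skip sub-directories and symlinks; only regular files are measured.
        if entry.is_file(follow_symlinks=False):
            size = entry.stat(follow_symlinks=False).st_size
            if size >= threshold_bytes:
                found.append((entry.path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Directory taken from the post above; threshold is an assumption.
    log_dir = "/var/log/ovirt-hosted-engine-ha"
    if os.path.isdir(log_dir):
        for path, size in large_logs(log_dir, 4 * 1024**3):
            print(f"{size / 1024**3:.1f} GiB  {path}")
```

As a general Linux note, removing a file that a running process still holds open does not release the disk space until that process closes the file, so it is worth checking whether the broker service still has the log open before deleting it.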
Thank you
Anton Louw
Cloud Engineer: Storage and Virtualization
______________________________________
D: 087 805 1572 | M: N/A
A: Rutherford Estate, 1 Scott Street, Waverley, Johannesburg
anton.louw(a)voxtelecom.co.za
www.vox.co.za
3 years, 4 months
VMs shutdown mysteriously
by Bobby
Hello,
All 4 VMs on one of my oVirt cluster nodes shut down for an unknown reason
almost simultaneously.
Please help me to find the root cause.
Thanks.
Please note the host seems to be doing fine and never crashed or hung, and I
was able to migrate VMs back to it later.
Here is the exact timeline of all the related events combined from the host
and the VM(s):
On oVirt host:
/var/log/vdsm/vdsm.log:
2020-06-25 15:25:16,944-0500 WARN (qgapoller/3)
[virt.periodic.VmDispatcher] could not run <function <lambda> at
0x7f4ed2f9f5f0> on ['e0257b06-28fd-4d41-83a9-adf1904d3622'] (periodic:289)
2020-06-25 15:25:19,203-0500 WARN (libvirt/events) [root] File:
/var/lib/libvirt/qemu/channels/e0257b06-28fd-4d41-83a9-adf1904d3622.ovirt-guest-agent.0
already removed (fileutils:54)
2020-06-25 15:25:19,203-0500 WARN (libvirt/events) [root] File:
/var/lib/libvirt/qemu/channels/e0257b06-28fd-4d41-83a9-adf1904d3622.org.qemu.guest_agent.0
already removed (fileutils:54)
[root@athos log]# journalctl -u NetworkManager --since=today
-- Logs begin at Wed 2020-05-20 22:07:33 CDT, end at Thu 2020-06-25
16:36:05 CDT. --
Jun 25 15:25:18 athos NetworkManager[1600]: <info> [1593116718.1136]
device (vnet0): state change: disconnected -> unmanaged (reason
'unmanaged', sys-iface-state: 'removed')
Jun 25 15:25:18 athos NetworkManager[1600]: <info> [1593116718.1146]
device (vnet0): released from master device SRV-VL
/var/log/messages:
Jun 25 15:25:18 athos kernel: SRV-VL: port 2(vnet0) entered disabled state
Jun 25 15:25:18 athos NetworkManager[1600]: <info> [1593116718.1136]
device (vnet0): state change: disconnected -> unmanaged (reason
'unmanaged', sys-iface-state: 'removed')
Jun 25 15:25:18 athos NetworkManager[1600]: <info> [1593116718.1146]
device (vnet0): released from master device SRV-VL
Jun 25 15:25:18 athos libvirtd: 2020-06-25 20:25:18.122+0000: 2713: error :
qemuMonitorIO:718 : internal error: End of file from qemu monitor
/var/log/libvirt/qemu/aries.log:
2020-06-25T20:25:28.353975Z qemu-kvm: terminating on signal 15 from pid
2713 (/usr/sbin/libvirtd)
2020-06-25 20:25:28.584+0000: shutting down, reason=shutdown
=============================================================================================
On the first VM affected (same thing on the others):
/var/log/ovirt-guest-agent/ovirt-guest-agent.log:
MainThread::INFO::2020-06-25
15:25:20,270::ovirt-guest-agent::104::root::Stopping oVirt guest agent
CredServer::INFO::2020-06-25
15:25:20,626::CredServer::262::root::CredServer has stopped.
MainThread::INFO::2020-06-25
15:25:21,150::ovirt-guest-agent::78::root::oVirt guest agent is down.
=============================================================================================
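The excerpts above mix local time (the -0500 offset in vdsm.log, CDT in the syslog lines) with UTC (the +0000 and Z suffixes in the libvirt and qemu lines). A minimal sketch for putting them on one clock follows. It assumes the host's local time is UTC-5 and that the guest clock uses the same zone, which the guest agent log does not state. The labels and the helper name to_utc_timeline are illustrative, and fractional seconds are dropped.

```python
from datetime import datetime, timedelta, timezone

# Assumption: host local time is UTC-5, per the -0500 offset in vdsm.log.
CDT = timezone(timedelta(hours=-5))
UTC = timezone.utc

# (label, timestamp) pairs transcribed from the log excerpts above.
events = [
    ("vdsm.log: qgapoller could not run", datetime(2020, 6, 25, 15, 25, 16, tzinfo=CDT)),
    ("messages: vnet0 entered disabled state", datetime(2020, 6, 25, 15, 25, 18, tzinfo=CDT)),
    ("libvirtd: End of file from qemu monitor", datetime(2020, 6, 25, 20, 25, 18, tzinfo=UTC)),
    # Assumption: the guest agent log uses the same local zone as the host.
    ("guest agent: Stopping oVirt guest agent", datetime(2020, 6, 25, 15, 25, 20, tzinfo=CDT)),
    ("aries.log: terminating on signal 15", datetime(2020, 6, 25, 20, 25, 28, tzinfo=UTC)),
]


def to_utc_timeline(items):
    """Convert every timestamp to UTC and return (utc_time, label) pairs in time order."""
    converted = [(ts.astimezone(UTC), label) for label, ts in items]
    return sorted(converted, key=lambda pair: pair[0])


if __name__ == "__main__":
    for when, label in to_utc_timeline(events):
        print(when.strftime("%Y-%m-%d %H:%M:%S UTC"), label)
```

Under these assumptions, the libvirtd "End of file from qemu monitor" error falls in the same second as the vnet0 removal, and the "terminating on signal 15" line in aries.log is logged about 10 seconds later.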
Installed package versions:
Host OS version: CentOS 7.7.1908:
ovirt-hosted-engine-ha-2.3.5-1.el7.noarch
ovirt-provider-ovn-driver-1.2.22-1.el7.noarch
ovirt-release43-4.3.6-1.el7.noarch
ovirt-imageio-daemon-1.5.2-0.el7.noarch
ovirt-vmconsole-1.0.7-2.el7.noarch
ovirt-imageio-common-1.5.2-0.el7.x86_64
ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch
ovirt-vmconsole-host-1.0.7-2.el7.noarch
ovirt-host-4.3.4-1.el7.x86_64
libvirt-4.5.0-23.el7_7.1.x86_64
libvirt-daemon-4.5.0-23.el7_7.1.x86_64
qemu-kvm-ev-2.12.0-33.1.el7.x86_64
qemu-kvm-common-ev-2.12.0-33.1.el7.x86_64
On guest VM:
ovirt-guest-agent-1.0.13-1.el6.noarch
qemu-guest-agent-0.12.1.2-2.491.el6_8.3.x86_64
3 years, 4 months
Re: Some nodes periodically display as Non Responsive
by Martin Perina
Hi Anton,
To diagnose the issue, we would need logs from both the engine and the
affected host.
Regards,
Martin
On Wed, Jul 1, 2020 at 6:51 AM Anton Louw via Users <users(a)ovirt.org> wrote:
>
> Hi Everybody,
>
> I have got some strange things happening. I have two data centers, DC1
> and DC2. In DC1, some of my nodes (not all the time, and not all the nodes)
> go into a “not responding” state. I can still ping the hosts, and I can
> still access the VMs on the hosts. My Engine sits in DC2, and this does not
> happen to any of the hosts in DC2.
>
> It seems like the Engine loses connectivity to the hosts in DC1, and then
> cannot re-establish the connection.
>
> Is there anywhere I can check to get more insight into what is actually
> happening?
>
> Thanks
--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.
3 years, 5 months
Some nodes periodically display as Non Responsive
by Anton Louw
Hi Everybody,
I have got some strange things happening. I have two data centers, DC1 and DC2. In DC1, some of my nodes (not all the time, and not all the nodes) go into a "not responding" state. I can still ping the hosts, and I can still access the VMs on the hosts. My Engine sits in DC2, and this does not happen to any of the hosts in DC2.
It seems like the Engine loses connectivity to the hosts in DC1, and then cannot re-establish the connection.
Is there anywhere I can check to get more insight into what is actually happening?
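Since ping only shows that ICMP gets through, one first check is whether a TCP connection from the Engine machine to each DC1 host still succeeds. The sketch below is a rough illustration, not an oVirt tool: the host names are hypothetical, and port 54321 is assumed to be the VDSM port (its usual default), so adjust both to the actual setup.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host names; replace with the affected DC1 hosts.
    dc1_hosts = ["dc1-node1.example.com", "dc1-node2.example.com"]
    vdsm_port = 54321  # assumption: default VDSM port
    for name in dc1_hosts:
        state = "reachable" if can_connect(name, vdsm_port) else "NOT reachable"
        print(f"{name}:{vdsm_port} {state}")
```

Running this from the Engine machine while a host is shown as not responding would indicate whether the problem sits at the network level between the two data centers or higher up.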
Thanks
Anton Louw
3 years, 5 months