
On 25.12.2018 10:14, Mike Lykov wrote:
1. Why, when the VM cannot boot due to corruption, does it show nothing at all on the console? I can reach the GRUB menu (if I move fast enough), but if I continue booting I see only a blinking cursor for many minutes and nothing more. The GRUB options do not contain any splash/quiet parameters. (The only exception is the EDD message, which is meaningless; with edd=off I get only a black console.)
Where are the kernel boot logs / console output? Does it at least try to load the initrd?
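(For reference: the HE VM kernel command line quoted at the end of this mail contains console=ttyS0, so boot messages go to the emulated serial port, not the graphical console, and a blinking cursor on VGA does not mean the kernel is silent. A minimal sketch of how to actually see the boot output; hosted-engine --console and virsh console are standard tools, while adding console=tty0 is my assumption about what would duplicate the output to VGA:)
-----------------
# Attach to the HE VM's serial console from the host that runs it
# (the VM cmdline has console=ttyS0, so boot output lands there):
hosted-engine --console
# or via libvirt directly:
virsh -c qemu:///system console HostedEngine

# Alternatively, at the GRUB menu press 'e' and append console=tty0
# after console=ttyS0 so messages also reach the VGA console.
-----------------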
2. How do I set timeouts so that the ha-agent does NOT try to restart the HE VM after 1-2 unsuccessful pings and a 10-second outage? For the HE VM, stability (not crashing or breaking its filesystem) is more important than availability: I can live with it being unreachable for 10-15 seconds, but not with a broken VM.
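(I could not find these timeouts documented. From reading the agent sources, the gateway penalty seems to have become a configurable constant in recent versions; the key name below is my assumption, please verify it on your version before trusting this sketch:)
-----------------
# Check whether your version exposes the penalty as shared config
# (key name is an assumption, not verified):
hosted-engine --get-shared-config gateway-score-penalty --type=he_local
# If it does, lower it so a 10-15 s ping outage no longer makes the
# other host's score "significantly better":
hosted-engine --set-shared-config gateway-score-penalty 400 --type=he_local
systemctl restart ovirt-ha-agent
-----------------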
3. I stopped the ha-agent, the broker, and the HE VM on all (two) nodes and fixed a partition in the VM. Then I started the ha-agent on the nodes, and it BROKE the VM filesystem AGAIN (while deciding which host should start the VM). I fixed the VM filesystem once more, put the cluster into maintenance mode, started the VM on one node by hand, checked that its status/health were OK, and only then put the ha-agent back into work (none) mode. It is an easy way to break the cluster: crash the HE VM filesystem by not putting it into global maintenance mode first. (A sketch of the safer sequence follows below.)
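For comparison, the sequence that finally worked for me, as a sketch (all commands are standard hosted-engine/systemctl ones; the maintenance command can be run on any HE host, the service commands must be run on every HE host):
-----------------
# 1. Stop HA decision making cluster-wide first:
hosted-engine --set-maintenance --mode=global
# 2. On EVERY HE host, stop agent and broker:
systemctl stop ovirt-ha-agent ovirt-ha-broker
# 3. Shut the HE VM down and repair its filesystem
#    (see the xfs_repair sketch at the end of this mail).
# 4. Bring the services back and start the VM on ONE host:
systemctl start ovirt-ha-broker ovirt-ha-agent
hosted-engine --vm-start
hosted-engine --vm-status
# 5. Only when the engine is healthy again, re-enable HA:
hosted-engine --set-maintenance --mode=none
-----------------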
---------------------------
Dec 21 12:32:56 ovirtnode6 kernel: bnx2x 0000:3b:00.0 enp59s0f0: NIC Link is Down
Dec 21 12:32:56 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0) entered disabled state
Dec 21 12:33:13 ovirtnode6 kernel: bnx2x 0000:3b:00.0 enp59s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
Dec 21 12:33:13 ovirtnode6 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp59s0f0: link becomes ready
Dec 21 12:33:13 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0) entered forwarding state
Dec 21 12:33:13 ovirtnode6 NetworkManager[1715]: <info> [1545381193.2204] device (enp59s0f0): carrier: link connected
-----------------------
The outage lasted 17 seconds; at 12:33:13 the link was back. BUT all the events leading to the crash follow later:
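(To reconstruct such a timeline after the fact, the kernel ring buffer entries can be pulled from the journal; a sketch, assuming journald still has the period in question:)
-----------------
journalctl -k --since "2018-12-21 12:32" --until "2018-12-21 12:34" | grep enp59s0f0
-----------------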
HA agent log:
------------------------------
MainThread::INFO::2018-12-21 12:32:59,540::states::444::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm running on localhost
MainThread::INFO::2018-12-21 12:32:59,662::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 3400)
MainThread::INFO::2018-12-21 12:33:09,797::states::136::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Penalizing score by 1280 due to gateway status
MainThread::INFO::2018-12-21 12:33:09,798::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 2120)
MainThread::ERROR::2018-12-21 12:33:19,815::states::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Host ovirtnode1.miac (id 1) score is significantly better than local score, shutting down VM on this host
----------------------------------------------
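The arithmetic is visible in the log: the base score 3400 minus the 1280 gateway penalty leaves 2120, while ovirtnode1 still reports 3400, and that gap is what the agent calls "significantly better". A sketch for watching the per-host scores live while reproducing an outage (hosted-engine --vm-status is standard; the field labels in the grep are from my installation):
-----------------
watch -n 5 'hosted-engine --vm-status | grep -E "Hostname|Score|Engine status"'
-----------------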
syslog messages:
Dec 21 12:33:19 ovirtnode6 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Host ovirtnode1.miac (id 1) score is significantly better than local score, shutting down VM on this host
Dec 21 12:33:29 ovirtnode6 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered disabled state
Dec 21 12:33:37 ovirtnode6 kernel: device vnet1 left promiscuous mode
Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered disabled state
Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]: <info> [1545381217.1796] device (vnet1): state change: disconnected -> unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]: <info> [1545381217.1798] device (vnet1): released from master device ovirtmgmt
Dec 21 12:33:37 ovirtnode6 libvirtd: 2018-12-21 08:33:37.192+0000: 2783: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor   <-- WHAT IS THIS?
Dec 21 12:33:37 ovirtnode6 kvm: 2 guests now active
Dec 21 12:33:37 ovirtnode6 systemd-machined: Machine qemu-2-HostedEngine terminated.
Dec 21 12:33:37 ovirtnode6 firewalld[1693]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -w -D libvirt-out -m physdev --physdev-is-bridged --physdev-out vnet1 -g FP-vnet1' failed: iptables v1.4.21: goto 'FP-vnet1' is not a chain#012#012Try `iptables -h' or 'iptables --help' for more information.
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered blocking state
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered disabled state
Dec 21 12:33:55 ovirtnode6 kernel: device vnet1 entered promiscuous mode
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered blocking state
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered forwarding state
Dec 21 12:33:55 ovirtnode6 lldpad: recvfrom(Event interface): No buffer space available
Dec 21 12:33:55 ovirtnode6 NetworkManager[1715]: <info> [1545381235.8086] manager: (vnet1): new Tun device (/org/freedesktop/NetworkManager/Devices/37)
Dec 21 12:33:55 ovirtnode6 NetworkManager[1715]: <info> [1545381235.8121] device (vnet1): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Dec 21 12:33:55 ovirtnode6 NetworkManager[1715]: <info> [1545381235.8127] device (vnet1): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'external')
---------------------------
*** WHAT IS THIS ***? The link had already been back up for some time; why do these bridge state transitions and iptables errors happen?
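(My own reading, for what it is worth: vnet1 is the VM's tap device, so these transitions are just the bridge port being torn down when qemu exits and recreated when the VM starts again, and the iptables error looks like libvirt deleting a per-VM filter chain that is already gone; they are consequences of the VM stop/start, not of the physical link. A sketch for watching it live, using standard iproute2 tools:)
-----------------
# Watch tap devices joining/leaving bridges in real time:
ip monitor link | grep --line-buffered vnet
# Current bridge membership and port states:
bridge link show
-----------------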
and this machine tries to start again:
-----------------
Dec 21 12:33:56 ovirtnode6 systemd-machined: New machine qemu-15-HostedEngine.
Dec 21 12:33:56 ovirtnode6 systemd: Started Virtual Machine qemu-15-HostedEngine.
-------------------
HA agent log on this host (ovirtnode6):
-----------------------------------------
MainThread::INFO::2018-12-21 12:33:49,880::states::510::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
MainThread::INFO::2018-12-21 12:33:57,884::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-12-21 12:34:04,898::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
.....
MainThread::INFO::2018-12-21 12:36:24,800::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
MainThread::INFO::2018-12-21 12:36:24,921::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
---------------------------
HA agent log on ovirtnode1 (the spare HE host where the VM is trying to start):
----------------------
MainThread::INFO::2018-12-21 12:33:52,984::states::510::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
MainThread::INFO::2018-12-21 12:33:56,787::hosted_engine::947::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
MainThread::INFO::2018-12-21 12:33:59,923::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? sent
MainThread::INFO::2018-12-21 12:33:59,936::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
MainThread::INFO::2018-12-21 12:34:06,950::states::783::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Another host already took over..
-----------------
WHAT IS THIS? What does "took over" mean here, and what does the agent do in this case?
------------------------------------
MainThread::INFO::2018-12-21 12:34:10,240::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineForceStop (score: 3400)
MainThread::INFO::2018-12-21 12:34:10,246::hosted_engine::1006::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) Shutting down vm using `/usr/sbin/hosted-engine --vm-poweroff`
MainThread::INFO::2018-12-21 12:34:10,797::hosted_engine::1011::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) stdout:
MainThread::INFO::2018-12-21 12:34:10,797::hosted_engine::1012::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) stderr:
MainThread::ERROR::2018-12-21 12:34:10,797::hosted_engine::1020::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) Engine VM stopped on localhost
-------------------------------------
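(As far as I can tell, both hosts saw the engine down with equal scores and raced to start it; ovirtnode6 won, ovirtnode1 noticed via the shared metadata ("Another host already took over"), moved to EngineForceStop and powered off its own copy, which is exactly what the log above shows. A sketch for extracting just these transitions from the agent log on each host; the log path is the standard one:)
-----------------
grep -E "Current state|state_transition|took over" \
    /var/log/ovirt-hosted-engine-ha/agent.log | tail -n 50
-----------------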
I mounted the HE VM partition that holds the logs; here are its syslog messages:
WHAT IS THIS guest-shutdown? Why? There were no network problems inside the guest at this time.
Dec 21 12:28:24 ovirtengine ovsdb-server: ovs|36386|reconnect|WARN|ssl:[::ffff:127.0.0.1]:58834: connection dropped (Protocol error)
Dec 21 12:28:24 ovirtengine python: ::ffff:172.16.10.101 - - [21/Dec/2018 12:28:24] "GET /v2.0/networks HTTP/1.1" 200 -
Dec 21 12:30:44 ovirtengine ovsdb-server: ovs|00111|reconnect|ERR|ssl:[::ffff:172.16.10.5]:42032: no response to inactivity probe after 5 seconds, disconnecting
Dec 21 12:30:44 ovirtengine ovsdb-server: ovs|00112|reconnect|ERR|ssl:[::ffff:172.16.10.1]:45624: no response to inactivity probe after 5 seconds, disconnecting
Dec 21 12:31:07 ovirtengine qemu-ga: info: guest-shutdown called, mode: powerdown
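(For context, as far as I understand it: qemu-ga logs guest-shutdown when something on the host side asks the guest agent for a graceful powerdown, so this line is most likely the HA agent's "shutting down VM on this host" decision seen from inside the guest, not a problem in the guest itself. The equivalent manual host-side command would be, as a sketch:)
-----------------
virsh -c qemu:///system shutdown HostedEngine --mode agent
-----------------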
Dec 21 12:33:58 ovirtengine kernel: Linux version 3.10.0-862.14.4.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Wed Sep 26 15:12:11 UTC 2018
Dec 21 12:33:58 ovirtengine kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-862.14.4.el7.x86_64 root=UUID=091e7022-295b-4b3f-96ad-4a7d90a2a9b0 ro crashkernel=auto rd.lvm.lv=ovirt/swap console=ttyS0 LANG=en_US.UTF-8
.....
Dec 21 12:33:59 ovirtengine kernel: XFS (vda3): Ending clean mount
Dec 21 12:34:01 ovirtengine systemd: Mounted /sysroot.
....
Dec 21 12:34:06 ovirtengine lvm: 6 logical volume(s) in volume group "ovirt" now active
Dec 21 12:34:06 ovirtengine systemd: Started LVM2 PV scan on device 252:2.
Dec 21 12:34:06 ovirtengine systemd: Found device /dev/mapper/ovirt-audit.
Dec 21 12:34:06 ovirtengine kernel: XFS (dm-5): Ending clean mount
Dec 21 12:34:06 ovirtengine systemd: Mounted /home.
Dec 21 12:34:07 ovirtengine kernel: XFS (dm-3): Ending clean mount
Dec 21 12:34:07 ovirtengine systemd: Mounted /var.
Dec 21 12:34:07 ovirtengine systemd: Starting Load/Save Random Seed...
Dec 21 12:34:07 ovirtengine systemd: Mounting /var/log...
Dec 21 12:34:07 ovirtengine kernel: XFS (dm-2): Mounting V5 Filesystem
Dec 21 12:34:07 ovirtengine systemd: Started Load/Save Random Seed.
Dec 21 12:34:08 ovirtengine kernel: XFS (dm-2): Ending clean mount
Dec 21 12:34:08 ovirtengine systemd: Mounted /var/log.
Dec 21 12:34:08 ovirtengine systemd: Starting Flush Journal to Persistent Storage...
Dec 21 12:34:08 ovirtengine systemd: Mounting /var/log/audit...
Dec 21 12:34:08 ovirtengine kernel: XFS (dm-1): Mounting V5 Filesystem
Dec 21 12:34:08 ovirtengine systemd: Started Flush Journal to Persistent Storage.
Dec 21 12:34:08 ovirtengine kernel: XFS (dm-1): Ending clean mount
Dec 21 12:34:08 ovirtengine systemd: Mounted /var/log/audit.
Dec 21 12:34:13 ovirtengine kernel: XFS (dm-4): Ending clean mount
Dec 21 12:34:13 ovirtengine systemd: Mounted /tmp.
Dec 21 12:34:13 ovirtengine systemd: Reached target Local File Systems.
.....
Dec 21 12:34:24 ovirtengine sshd[1324]: Server listening on 0.0.0.0 port 2222.
Dec 21 12:34:24 ovirtengine sshd[1324]: Server listening on :: port 2222.
Dec 21 12:34:25 ovirtengine rsyslogd: [origin software="rsyslogd" swVersion="8.24.0" x-pid="1334" x-info="http://www.rsyslog.com"] start
Dec 21 12:34:25 ovirtengine systemd: Started System Logging Service.
......
Dec 21 12:34:25 ovirtengine aliasesdb: BDB0196 Encrypted checksum: no encryption key specified
Dec 21 12:34:25 ovirtengine aliasesdb: BDB0196 Encrypted checksum: no encryption key specified
Dec 21 12:34:25 ovirtengine kernel: XFS (vda3): Internal error XFS_WANT_CORRUPTED_GOTO at line 1664 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_extent+0xaa/0x140 [xfs]
Dec 21 12:34:25 ovirtengine kernel: CPU: 1 PID: 1379 Comm: postalias Not tainted 3.10.0-862.14.4.el7.x86_64 #1
Dec 21 12:34:25 ovirtengine kernel: Hardware name: oVirt oVirt Node, BIOS 1.11.0-2.el7 04/01/2014
Dec 21 12:34:25 ovirtengine kernel: XFS (vda3): xfs_do_force_shutdown(0x8) called from line 236 of file fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffffc02709cb
Dec 21 12:34:27 ovirtengine kernel: XFS (vda3): Corruption of in-memory data detected. Shutting down filesystem
Dec 21 12:34:27 ovirtengine kernel: XFS (vda3): Please umount the filesystem and rectify the problem(s)
......
Dec 21 12:34:27 ovirtengine ovirt-websocket-proxy.py: ImportError: No module named websocketproxy
Dec 21 12:34:27 ovirtengine journal: 2018-12-21 12:34:27,189+0400 ovirt-engine-dwhd: ERROR run:554 Error: list index out of range
Dec 21 12:34:27 ovirtengine systemd: ovirt-websocket-proxy.service: main process exited, code=killed, status=7/BUS
Dec 21 12:34:27 ovirtengine systemd: Failed to start oVirt Engine websockets proxy.
Dec 21 12:34:27 ovirtengine systemd: Unit ovirt-websocket-proxy.service entered failed state.
Dec 21 12:34:27 ovirtengine systemd: ovirt-websocket-proxy.service failed.
Dec 21 12:34:27 ovirtengine systemd: ovirt-engine-dwhd.service: main process exited, code=exited, status=1/FAILURE
......
----------------------------
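For the record, the way I repaired the partition, as a sketch: the device name vda3 is the one from the log above, and I am assuming the HE VM is shut down and its disk is reachable from a rescue environment or a helper VM:
-----------------
umount /dev/vda3 2>/dev/null   # must not be mounted
xfs_repair -n /dev/vda3        # dry run: only report problems
xfs_repair /dev/vda3           # actual repair
# xfs_repair may refuse on a dirty log and suggest -L; be aware
# that -L discards the XFS journal before repairing:
# xfs_repair -L /dev/vda3
-----------------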
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/BBHBZ6ZWU5KLCW...