1. Why (when it cannot boot due to corruption) does it NOT show anything
at all in the console?
I can get to the GRUB menu (if I move fast enough), but if I continue the
boot I see a blinking cursor for many minutes and nothing more. The GRUB
options do not contain any splash/quiet parameters.
(The only exception is the EDD message, which is meaningless; if I use
edd=off I get a completely black console.)
Where are the kernel boot logs / console output? Does it at least try to
load the initrd?
2. How can I set timeouts so that the ha-agent does NOT try to restart
the HE after 1-2 unsuccessful pings and a 10-second outage?
For the HE VM, stability (not crashing or breaking the fs) is more
important than availability (I can live with it being unavailable for
10-15 seconds, but I cannot live with a broken VM).
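
I have not found a documented knob for exactly these thresholds, so
treat the non-maintenance part of the sketch below as an assumption to
verify against your ovirt-hosted-engine-ha version; the only thing I am
sure of is that global maintenance disables the automatic restarts
completely:

----------------------------
# Guaranteed: suspend all HA-driven restarts while the network is flaky
hosted-engine --set-maintenance --mode=global
# ...and re-enable when done:
hosted-engine --set-maintenance --mode=none

# Assumption (check the key names with --get-shared-config first):
# newer versions let you change how the gateway is probed, e.g. use a
# TCP check instead of ping:
hosted-engine --get-shared-config network_test --type=he_shared
hosted-engine --set-shared-config network_test tcp --type=he_shared
hosted-engine --set-shared-config tcp_t_address <some-stable-host> --type=he_shared
hosted-engine --set-shared-config tcp_t_port 80 --type=he_shared
----------------------------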
3. I stopped the ha-agent, the broker, and the HE VM on all (two) nodes
and fixed a partition in the VM. Then I started the ha-agent on the
nodes, and it BROKE the VM fs AGAIN (while the agents were deciding
which host should start the VM)!
I fixed the VM fs once more, put the cluster into global maintenance
mode, started the VM on one node by hand, checked that its status/health
were ok, and only then put the ha-agent back into normal (none) mode.
It is easy to break the cluster by crashing the HE VM fs, simply by not
putting it into global maintenance mode first.
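
For the record, the sequence that finally worked for me (the
hosted-engine/systemctl commands are standard; the xfs_repair step
assumes you can reach the HE disk from a host or rescue system, and
/dev/vda3 is the device name inside my VM, so adjust to your setup):

----------------------------
# 1. On ALL hosts, while the HA services are still running:
hosted-engine --set-maintenance --mode=global
hosted-engine --vm-status                    # see where the VM runs
hosted-engine --vm-poweroff                  # on that host, if still up
systemctl stop ovirt-ha-agent ovirt-ha-broker

# 2. Repair the guest filesystem (device name from my setup):
xfs_repair /dev/vda3            # add -L only if the log cannot be replayed

# 3. Start the VM by hand on ONE host and verify it:
systemctl start ovirt-ha-broker ovirt-ha-agent
hosted-engine --vm-start
hosted-engine --vm-status                    # wait for a good engine status

# 4. Only then leave global maintenance:
hosted-engine --set-maintenance --mode=none
----------------------------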
> ---------------------------
> Dec 21 12:32:56 ovirtnode6 kernel: bnx2x 0000:3b:00.0 enp59s0f0: NIC
> Link is Down
> Dec 21 12:32:56 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0)
> entered disabled state
> Dec 21 12:33:13 ovirtnode6 kernel: bnx2x 0000:3b:00.0 enp59s0f0: NIC
> Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
> Dec 21 12:33:13 ovirtnode6 kernel: IPv6: ADDRCONF(NETDEV_CHANGE):
> enp59s0f0: link becomes ready
> Dec 21 12:33:13 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0)
> entered forwarding state
> Dec 21 12:33:13 ovirtnode6 NetworkManager[1715]: <info>
> [1545381193.2204] device (enp59s0f0): carrier: link connected
> -----------------------
>
> The outage lasted 17 seconds; at 12:33:13 the link was back. BUT all
> the events that lead to the crash follow later:
>
> HA agent log:
> ------------------------------
> MainThread::INFO::2018-12-21 12:32:59,540::states::444::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm running on localhost
> MainThread::INFO::2018-12-21 12:32:59,662::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 3400)
> MainThread::INFO::2018-12-21 12:33:09,797::states::136::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Penalizing score by 1280 due to gateway status
> MainThread::INFO::2018-12-21 12:33:09,798::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 2120)
> MainThread::ERROR::2018-12-21 12:33:19,815::states::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Host ovirtnode1.miac (id 1) score is significantly better than local score, shutting down VM on this host
> ----------------------------------------------
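
The arithmetic in that log is at least self-consistent; this is only my
reading of it, the real thresholds live in the agent's states.py and may
differ:

----------------------------
# Illustration only, numbers taken from the log above:
BASE_SCORE=3400
GATEWAY_PENALTY=1280                  # "Penalizing score by 1280 ..."
LOCAL_SCORE=$((BASE_SCORE - GATEWAY_PENALTY))   # 2120, as logged
REMOTE_SCORE=3400                     # ovirtnode1 still sees the gateway
# 3400 - 2120 = 1280 points of difference, which the agent considers
# "significantly better", so the local VM is shut down ~10 seconds after
# the penalty was applied (12:33:09 -> 12:33:19).
----------------------------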
>
>
> syslog messages:
>
> Dec 21 12:33:19 ovirtnode6 journal: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Host
> ovirtnode1.miac (id 1) score is significantly better than local score,
> shutting down VM on this host
> Dec 21 12:33:29 ovirtnode6 journal: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine
> VM stopped on localhost
> Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered
> disabled state
> Dec 21 12:33:37 ovirtnode6 kernel: device vnet1 left promiscuous mode
> Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered
> disabled state
> Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]: <info>
> [1545381217.1796] device (vnet1): state change: disconnected ->
> unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
> Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]: <info>
> [1545381217.1798] device (vnet1): released from master device ovirtmgmt
> Dec 21 12:33:37 ovirtnode6 libvirtd: 2018-12-21 08:33:37.192+0000:
> 2783: error : qemuMonitorIO:719 : internal error: End of file from
> qemu monitor    *** WHAT IS THIS? ***
> Dec 21 12:33:37 ovirtnode6 kvm: 2 guests now active
> Dec 21 12:33:37 ovirtnode6 systemd-machined: Machine
> qemu-2-HostedEngine terminated.
> Dec 21 12:33:37 ovirtnode6 firewalld[1693]: WARNING: COMMAND_FAILED:
> '/usr/sbin/iptables -w2 -w -D libvirt-out -m physdev
> --physdev-is-bridged --physdev-out vnet1 -g FP-vnet1' failed: iptables
> v1.4.21: goto 'FP-vnet1' is not a chain#012#012Try `iptables -h' or
> 'iptables --help' for more information.
>
> Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered
> blocking state
> Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered
> disabled state
> Dec 21 12:33:55 ovirtnode6 kernel: device vnet1 entered promiscuous mode
> Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered
> blocking state
> Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered
> forwarding state
> Dec 21 12:33:55 ovirtnode6 lldpad: recvfrom(Event interface): No
> buffer space available
> Dec 21 12:33:55 ovirtnode6 NetworkManager[1715]: <info>
> [1545381235.8086] manager: (vnet1): new Tun device
> (/org/freedesktop/NetworkManager/Devices/37)
> Dec 21 12:33:55 ovirtnode6 NetworkManager[1715]: <info>
> [1545381235.8121] device (vnet1): state change: unmanaged ->
> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
> Dec 21 12:33:55 ovirtnode6 NetworkManager[1715]: <info>
> [1545381235.8127] device (vnet1): state change: unavailable ->
> disconnected (reason 'none', sys-iface-state: 'external')
> ---------------------------
>
> *** WHAT IS THIS? *** The link came back up some time before this; why
> do these bridge state transitions and iptables errors happen at all?
>
> and this machine tries to start again:
> -----------------
> Dec 21 12:33:56 ovirtnode6 systemd-machined: New machine
> qemu-15-HostedEngine.
> Dec 21 12:33:56 ovirtnode6 systemd: Started Virtual Machine
> qemu-15-HostedEngine.
> -------------------
>
> HA agent log on this host (ovirtnode6):
>
> -----------------------------------------
> MainThread::INFO::2018-12-21 12:33:49,880::states::510::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
> MainThread::INFO::2018-12-21 12:33:57,884::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
> MainThread::INFO::2018-12-21 12:34:04,898::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
> .....
> MainThread::INFO::2018-12-21 12:36:24,800::states::779::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) VM is powering up..
> MainThread::INFO::2018-12-21 12:36:24,921::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
> ---------------------------
>
> HA agent log on ovirtnode1 (the spare HE host where the VM is trying
> to start):
>
> ----------------------
> MainThread::INFO::2018-12-21 12:33:52,984::states::510::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
>
> MainThread::INFO::2018-12-21 12:33:56,787::hosted_engine::947::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Engine VM started on localhost
> MainThread::INFO::2018-12-21 12:33:59,923::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineStart-EngineStarting) sent? sent
> MainThread::INFO::2018-12-21 12:33:59,936::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStarting (score: 3400)
> MainThread::INFO::2018-12-21 12:34:06,950::states::783::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Another host already took over..
> -----------------
>
> WHAT IS THIS? What does "took over" mean here, and what does the agent
> actually do in this case?
> ------------------------------------
> MainThread::INFO::2018-12-21 12:34:10,240::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineForceStop (score: 3400)
> MainThread::INFO::2018-12-21 12:34:10,246::hosted_engine::1006::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) Shutting down vm using `/usr/sbin/hosted-engine --vm-poweroff`
> MainThread::INFO::2018-12-21 12:34:10,797::hosted_engine::1011::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) stdout:
> MainThread::INFO::2018-12-21 12:34:10,797::hosted_engine::1012::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) stderr:
> MainThread::ERROR::2018-12-21 12:34:10,797::hosted_engine::1020::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_engine_vm) Engine VM stopped on localhost
> -------------------------------------
>
>
> I mounted the HE VM partition that holds the logs; here are the
> syslog messages:
>
> WHAT IS THIS guest-shutdown? Why? There were no network problems at
> this time.
>
> Dec 21 12:28:24 ovirtengine ovsdb-server:
> ovs|36386|reconnect|WARN|ssl:[::ffff:127.0.0.1]:58834: connection
> dropped (Protocol error)
> Dec 21 12:28:24 ovirtengine python: ::ffff:172.16.10.101 - -
> [21/Dec/2018 12:28:24] "GET /v2.0/networks HTTP/1.1" 200 -
> Dec 21 12:30:44 ovirtengine ovsdb-server:
> ovs|00111|reconnect|ERR|ssl:[::ffff:172.16.10.5]:42032: no response to
> inactivity probe after 5 seconds, disconnecting
> Dec 21 12:30:44 ovirtengine ovsdb-server:
> ovs|00112|reconnect|ERR|ssl:[::ffff:172.16.10.1]:45624: no response to
> inactivity probe after 5 seconds, disconnecting
> Dec 21 12:31:07 ovirtengine qemu-ga: info: guest-shutdown called,
> mode: powerdown
>
> Dec 21 12:33:58 ovirtengine kernel: Linux version
> 3.10.0-862.14.4.el7.x86_64 (mockbuild(a)kbuilder.bsys.centos.org) (gcc
> version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC) ) #1 SMP Wed Sep 26
> 15:12:11 UTC 2018
> Dec 21 12:33:58 ovirtengine kernel: Command line:
> BOOT_IMAGE=/vmlinuz-3.10.0-862.14.4.el7.x86_64
> root=UUID=091e7022-295b-4b3f-96ad-4a7d90a2a9b0 ro crashkernel=auto
> rd.lvm.lv=ovirt/swap console=ttyS0 LANG=en_US.UTF-8
> .....
> Dec 21 12:33:59 ovirtengine kernel: XFS (vda3): Ending clean mount
> Dec 21 12:34:01 ovirtengine systemd: Mounted /sysroot.
> ....
> Dec 21 12:34:06 ovirtengine lvm: 6 logical volume(s) in volume group
> "ovirt" now active
> Dec 21 12:34:06 ovirtengine systemd: Started LVM2 PV scan on device
> 252:2.
> Dec 21 12:34:06 ovirtengine systemd: Found device
> /dev/mapper/ovirt-audit.
> Dec 21 12:34:06 ovirtengine kernel: XFS (dm-5): Ending clean mount
> Dec 21 12:34:06 ovirtengine systemd: Mounted /home.
> Dec 21 12:34:07 ovirtengine kernel: XFS (dm-3): Ending clean mount
> Dec 21 12:34:07 ovirtengine systemd: Mounted /var.
> Dec 21 12:34:07 ovirtengine systemd: Starting Load/Save Random Seed...
> Dec 21 12:34:07 ovirtengine systemd: Mounting /var/log...
> Dec 21 12:34:07 ovirtengine kernel: XFS (dm-2): Mounting V5 Filesystem
> Dec 21 12:34:07 ovirtengine systemd: Started Load/Save Random Seed.
> Dec 21 12:34:08 ovirtengine kernel: XFS (dm-2): Ending clean mount
> Dec 21 12:34:08 ovirtengine systemd: Mounted /var/log.
> Dec 21 12:34:08 ovirtengine systemd: Starting Flush Journal to
> Persistent Storage...
> Dec 21 12:34:08 ovirtengine systemd: Mounting /var/log/audit...
> Dec 21 12:34:08 ovirtengine kernel: XFS (dm-1): Mounting V5 Filesystem
> Dec 21 12:34:08 ovirtengine systemd: Started Flush Journal to
> Persistent Storage.
> Dec 21 12:34:08 ovirtengine kernel: XFS (dm-1): Ending clean mount
> Dec 21 12:34:08 ovirtengine systemd: Mounted /var/log/audit.
> Dec 21 12:34:13 ovirtengine kernel: XFS (dm-4): Ending clean mount
> Dec 21 12:34:13 ovirtengine systemd: Mounted /tmp.
> Dec 21 12:34:13 ovirtengine systemd: Reached target Local File Systems.
> .....
> Dec 21 12:34:24 ovirtengine sshd[1324]: Server listening on 0.0.0.0
> port 2222.
> Dec 21 12:34:24 ovirtengine sshd[1324]: Server listening on :: port 2222.
> Dec 21 12:34:25 ovirtengine rsyslogd: [origin software="rsyslogd"
> swVersion="8.24.0" x-pid="1334" x-info="http://www.rsyslog.com"] start
> Dec 21 12:34:25 ovirtengine systemd: Started System Logging Service.
> ......
> Dec 21 12:34:25 ovirtengine aliasesdb: BDB0196 Encrypted checksum: no
> encryption key specified
> Dec 21 12:34:25 ovirtengine aliasesdb: BDB0196 Encrypted checksum: no
> encryption key specified
> Dec 21 12:34:25 ovirtengine kernel: XFS (vda3): Internal error
> XFS_WANT_CORRUPTED_GOTO at line 1664 of file
> fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_extent+0xaa/0x140 [xfs]
> Dec 21 12:34:25 ovirtengine kernel: CPU: 1 PID: 1379 Comm: postalias
> Not tainted 3.10.0-862.14.4.el7.x86_64 #1
> Dec 21 12:34:25 ovirtengine kernel: Hardware name: oVirt oVirt Node,
> BIOS 1.11.0-2.el7 04/01/2014
> Dec 21 12:34:25 ovirtengine kernel: XFS (vda3):
> xfs_do_force_shutdown(0x8) called from line 236 of file
> fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffffc02709cb
>
> Dec 21 12:34:27 ovirtengine kernel: XFS (vda3): Corruption of
> in-memory data detected. Shutting down filesystem
> Dec 21 12:34:27 ovirtengine kernel: XFS (vda3): Please umount the
> filesystem and rectify the problem(s)
> ......
>
> Dec 21 12:34:27 ovirtengine ovirt-websocket-proxy.py: ImportError: No
> module named websocketproxy
> Dec 21 12:34:27 ovirtengine journal: 2018-12-21 12:34:27,189+0400
> ovirt-engine-dwhd: ERROR run:554 Error: list index out of range
> Dec 21 12:34:27 ovirtengine systemd: ovirt-websocket-proxy.service:
> main process exited, code=killed, status=7/BUS
> Dec 21 12:34:27 ovirtengine systemd: Failed to start oVirt Engine
> websockets proxy.
> Dec 21 12:34:27 ovirtengine systemd: Unit
> ovirt-websocket-proxy.service entered failed state.
> Dec 21 12:34:27 ovirtengine systemd: ovirt-websocket-proxy.service
> failed.
> Dec 21 12:34:27 ovirtengine systemd: ovirt-engine-dwhd.service: main
> process exited, code=exited, status=1/FAILURE
> ......
> ----------------------------