On Fri, Oct 23, 2015 at 5:55 PM, Gianluca Cecchi <gianluca.cecchi(a)gmail.com> wrote:
On Fri, Oct 23, 2015 at 5:05 PM, Simone Tiraboschi <stirabos(a)redhat.com> wrote:
>
>>
> OK, can you please try again the whole reboot procedure just to ensure
> that it was just a temporary NFS glitch?
>
It seems reproducible.
This time I was able to shut down the hypervisor without a manual power off.
The only strange thing is that I ran
shutdown -h now
and at some point the VM (I was able to see that the watchdog stopped...)
actually booted again...?
Related lines in /var/log/messages:
Oct 23 17:33:32 ovc71 systemd: Unmounting RPC Pipe File System...
Oct 23 17:33:32 ovc71 systemd: Stopping Session 11 of user root.
Oct 23 17:33:33 ovc71 systemd: Stopped Session 11 of user root.
Oct 23 17:33:33 ovc71 systemd: Stopping user-0.slice.
Oct 23 17:33:33 ovc71 systemd: Removed slice user-0.slice.
Oct 23 17:33:33 ovc71 systemd: Stopping vdsm-dhclient.slice.
Oct 23 17:33:33 ovc71 systemd: Removed slice vdsm-dhclient.slice.
Oct 23 17:33:33 ovc71 systemd: Stopping vdsm.slice.
Oct 23 17:33:33 ovc71 systemd: Removed slice vdsm.slice.
Oct 23 17:33:33 ovc71 systemd: Stopping Sound Card.
Oct 23 17:33:33 ovc71 systemd: Stopped target Sound Card.
Oct 23 17:33:33 ovc71 systemd: Stopping LVM2 PV scan on device 8:2...
Oct 23 17:33:33 ovc71 systemd: Stopping LVM2 PV scan on device 8:16...
Oct 23 17:33:33 ovc71 systemd: Stopping Dump dmesg to /var/log/dmesg...
Oct 23 17:33:33 ovc71 systemd: Stopped Dump dmesg to /var/log/dmesg.
Oct 23 17:33:33 ovc71 systemd: Stopping Watchdog Multiplexing Daemon...
Oct 23 17:33:33 ovc71 systemd: Stopping Multi-User System.
Oct 23 17:33:33 ovc71 systemd: Stopped target Multi-User System.
Oct 23 17:33:33 ovc71 systemd: Stopping ABRT kernel log watcher...
Oct 23 17:33:33 ovc71 systemd: Stopping Command Scheduler...
Oct 23 17:33:33 ovc71 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="690" x-info="http://www.rsyslog.com"] exiting on signal 15.
Oct 23 17:36:24 ovc71 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="697" x-info="http://www.rsyslog.com"] start
Oct 23 17:36:21 ovc71 journal: Runtime journal is using 8.0M (max 500.0M, leaving 750.0M of free 4.8G, current limit 500.0M).
Oct 23 17:36:21 ovc71 kernel: Initializing cgroup subsys cpuset
Coming back up, for the oVirt processes I see:
[root@ovc71 ~]# systemctl status ovirt-ha-broker
ovirt-ha-broker.service - oVirt Hosted Engine High Availability
Communications Broker
Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
enabled)
Active: inactive (dead) since Fri 2015-10-23 17:36:25 CEST; 31s ago
Process: 849 ExecStop=/usr/lib/systemd/systemd-ovirt-ha-broker stop
(code=exited, status=0/SUCCESS)
Process: 723 ExecStart=/usr/lib/systemd/systemd-ovirt-ha-broker start
(code=exited, status=0/SUCCESS)
Main PID: 844 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/ovirt-ha-broker.service
Oct 23 17:36:24 ovc71.localdomain.local systemd-ovirt-ha-broker[723]:
Starting ovirt-ha-broker: [...
Oct 23 17:36:24 ovc71.localdomain.local systemd[1]: Started oVirt Hosted
Engine High Availabili...r.
Oct 23 17:36:25 ovc71.localdomain.local systemd-ovirt-ha-broker[849]:
Stopping ovirt-ha-broker: [...
Hint: Some lines were ellipsized, use -l to show in full.
And
[root@ovc71 ~]# systemctl status nfs-server
nfs-server.service - NFS server and services
Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled)
Active: active (exited) since Fri 2015-10-23 17:36:27 CEST; 1min 9s ago
Process: 1123 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited,
status=0/SUCCESS)
Process: 1113 ExecStartPre=/usr/sbin/exportfs -r (code=exited,
status=0/SUCCESS)
Main PID: 1123 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/nfs-server.service
Oct 23 17:36:27 ovc71.localdomain.local systemd[1]: Starting NFS server
and services...
Oct 23 17:36:27 ovc71.localdomain.local systemd[1]: Started NFS server and
services.
So it seems that the broker tries to start, and fails (17:36:25), before the
NFS server start phase completes (17:36:27)...?
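One way to double-check the ordering from this boot should be to compare the
two units' timestamps in the journal, something like:
journalctl -b -u nfs-server.service -u ovirt-ha-broker.service -o short-precise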
Again, if I then manually start ha-broker and ha-agent, they start OK and
I'm able to become operational again with the self-hosted engine up.
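For reference, the manual start is just something along the lines of:
systemctl start ovirt-ha-broker
systemctl start ovirt-ha-agent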
The systemd unit file for the broker is this:
[Unit]
Description=oVirt Hosted Engine High Availability Communications Broker
[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/ovirt-ha-broker
ExecStart=/usr/lib/systemd/systemd-ovirt-ha-broker start
ExecStop=/usr/lib/systemd/systemd-ovirt-ha-broker stop
[Install]
WantedBy=multi-user.target
Probably inside the [Unit] section I should add:
After=nfs-server.service
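For example, rather than editing the shipped unit file, a drop-in should work
(untested sketch; the drop-in file name is my choice):
mkdir -p /etc/systemd/system/ovirt-ha-broker.service.d
cat > /etc/systemd/system/ovirt-ha-broker.service.d/10-nfs.conf <<'EOF'
[Unit]
# order the broker after the local NFS server; Requires= could be added too
After=nfs-server.service
EOF
systemctl daemon-reload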
OK, I understood.
You are right: the broker was failing because the NFS storage was not ready,
since it is served in loopback and there isn't any explicit service
dependency on that.
We are not imposing it because an NFS shared domain is generally thought to
be served from an external system, while a loopback NFS is just a degenerate
case.
Simply fix it manually.
But this should be true only for a self-hosted engine configured with NFS...
so should it be done at install/setup time?
If you want, I can make this change in my environment and verify...
>
> The issue was here: --spice-host-subject="C=EN, L=Test, O=Test, CN=Test"
> This one was just the temporary subject used by hosted-engine-setup
> during the bootstrap sequence when your engine was still to come.
> At the end that cert got replaced by the engine-CA-signed ones, and so you
> have to substitute that subject to match the one you used during your setup.
>
>
Even using the correct certificate I have a problem.
On the hypervisor:
[root@ovc71 ~]# openssl x509 -in /etc/pki/vdsm/libvirt-spice/ca-cert.pem
-text | grep Subject
Subject: C=US, O=localdomain.local,
CN=shengine.localdomain.local.75331
Subject Public Key Info:
X509v3 Subject Key Identifier:
On the engine:
[root@shengine ~]# openssl x509 -in /etc/pki/ovirt-engine/ca.pem -text |
grep Subject
Subject: C=US, O=localdomain.local,
CN=shengine.localdomain.local.75331
Subject Public Key Info:
X509v3 Subject Key Identifier:
But:
[root@ovc71 ~]# hosted-engine --add-console-password
Enter password:
code = 0
message = 'Done'
[root@ovc71 ~]# remote-viewer
--spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
spice://localhost?tls-port=5900 --spice-host-subject="C=US,
O=localdomain.local, CN=shengine.localdomain.local.75331"
it should be:
remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem
spice://ovc71.localdomain.local?tls-port=5900 --spice-host-subject="C=US,
O=localdomain.local, CN=ovc71.localdomain.local"
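If in doubt, the subject to pass can be read directly from the host SPICE
server certificate (path assuming the usual vdsm layout):
openssl x509 -in /etc/pki/vdsm/libvirt-spice/server-cert.pem -noout -subject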
** (remote-viewer:4297): WARNING **: Couldn't connect to accessibility bus:
Failed to connect to socket /tmp/dbus-Gb5xXSKiKK: Connection refused
GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings
will not be saved or shared with other applications.
(/usr/bin/remote-viewer:4297): Spice-Warning **:
ssl_verify.c:492:openssl_verify: ssl: subject 'C=US, O=localdomain.local,
CN=shengine.localdomain.local.75331' verification failed
(/usr/bin/remote-viewer:4297): Spice-Warning **:
ssl_verify.c:494:openssl_verify: ssl: verification failed
(remote-viewer:4297): GSpice-WARNING **: main-1:0: SSL_connect:
error:00000001:lib(0):func(0):reason(1)
and the remote-viewer window shows:
Unable to connect to the graphic server spice://localhost?tls-port=5900