On Fri, Oct 23, 2015 at 5:05 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:

> OK, can you please try the whole reboot procedure again, just to ensure that it was only a temporary NFS glitch?

It seems reproducible.
This time I was able to shut down the hypervisor without a manual power-off.
The only strange thing is that I ran

shutdown -h now

and the VM actually booted again at some point (I was able to see that the watchdog stopped...)...?
Related lines in /var/log/messages:

Oct 23 17:33:32 ovc71 systemd: Unmounting RPC Pipe File System...
Oct 23 17:33:32 ovc71 systemd: Stopping Session 11 of user root.
Oct 23 17:33:33 ovc71 systemd: Stopped Session 11 of user root.
Oct 23 17:33:33 ovc71 systemd: Stopping user-0.slice.
Oct 23 17:33:33 ovc71 systemd: Removed slice user-0.slice.
Oct 23 17:33:33 ovc71 systemd: Stopping vdsm-dhclient.slice.
Oct 23 17:33:33 ovc71 systemd: Removed slice vdsm-dhclient.slice.
Oct 23 17:33:33 ovc71 systemd: Stopping vdsm.slice.
Oct 23 17:33:33 ovc71 systemd: Removed slice vdsm.slice.
Oct 23 17:33:33 ovc71 systemd: Stopping Sound Card.
Oct 23 17:33:33 ovc71 systemd: Stopped target Sound Card.
Oct 23 17:33:33 ovc71 systemd: Stopping LVM2 PV scan on device 8:2...
Oct 23 17:33:33 ovc71 systemd: Stopping LVM2 PV scan on device 8:16...
Oct 23 17:33:33 ovc71 systemd: Stopping Dump dmesg to /var/log/dmesg...
Oct 23 17:33:33 ovc71 systemd: Stopped Dump dmesg to /var/log/dmesg.
Oct 23 17:33:33 ovc71 systemd: Stopping Watchdog Multiplexing Daemon...
Oct 23 17:33:33 ovc71 systemd: Stopping Multi-User System.
Oct 23 17:33:33 ovc71 systemd: Stopped target Multi-User System.
Oct 23 17:33:33 ovc71 systemd: Stopping ABRT kernel log watcher...
Oct 23 17:33:33 ovc71 systemd: Stopping Command Scheduler...
Oct 23 17:33:33 ovc71 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="690" x-info="http://www.rsyslog.com"] exiting on signal 15.
Oct 23 17:36:24 ovc71 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="697" x-info="http://www.rsyslog.com"] start
Oct 23 17:36:21 ovc71 journal: Runtime journal is using 8.0M (max 500.0M, leaving 750.0M of free 4.8G, current limit 500.0M).
Oct 23 17:36:21 ovc71 kernel: Initializing cgroup subsys cpuset

After it came back up, for the oVirt processes I see:

[root@ovc71 ~]# systemctl status ovirt-ha-broker
ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled)
   Active: inactive (dead) since Fri 2015-10-23 17:36:25 CEST; 31s ago
  Process: 849 ExecStop=/usr/lib/systemd/systemd-ovirt-ha-broker stop (code=exited, status=0/SUCCESS)
  Process: 723 ExecStart=/usr/lib/systemd/systemd-ovirt-ha-broker start (code=exited, status=0/SUCCESS)
 Main PID: 844 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/ovirt-ha-broker.service

Oct 23 17:36:24 ovc71.localdomain.local systemd-ovirt-ha-broker[723]: Starting ovirt-ha-broker: [...
Oct 23 17:36:24 ovc71.localdomain.local systemd[1]: Started oVirt Hosted Engine High Availabili...r.
Oct 23 17:36:25 ovc71.localdomain.local systemd-ovirt-ha-broker[849]: Stopping ovirt-ha-broker: [...
Hint: Some lines were ellipsized, use -l to show in full.

And:

[root@ovc71 ~]# systemctl status nfs-server
nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled)
   Active: active (exited) since Fri 2015-10-23 17:36:27 CEST; 1min 9s ago
  Process: 1123 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
  Process: 1113 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 1123 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nfs-server.service

Oct 23 17:36:27 ovc71.localdomain.local systemd[1]: Starting NFS server and services...
Oct 23 17:36:27 ovc71.localdomain.local systemd[1]: Started NFS server and services.

So it seems that the broker tries to start and fails (17:36:25) before the NFS server start phase completes (17:36:27)...?
Again, if I then manually start ha-broker and ha-agent, they start OK and I'm able to become operational again with the self-hosted engine up.

The systemd unit file for the broker is this:

[Unit]
Description=oVirt Hosted Engine High Availability Communications Broker

[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/ovirt-ha-broker
ExecStart=/usr/lib/systemd/systemd-ovirt-ha-broker start
ExecStop=/usr/lib/systemd/systemd-ovirt-ha-broker stop

[Install]
WantedBy=multi-user.target

Probably inside the [Unit] section I should add

After=nfs-server.service

but this should be true only for a self-hosted engine configured with NFS... so it would have to be done at install/setup time?
If you want, I can make this change in my environment and verify...

> The issue was here:
> --spice-host-subject="C=EN, L=Test, O=Test, CN=Test"
> This one was just the temporary subject used by hosted-engine-setup during the bootstrap sequence, when your engine was still to come.
> At the end that cert got replaced by the ones signed by the engine CA, so you have to substitute that subject to match the one you used during your setup.

Even using the correct certificate I have a problem.
On the hypervisor:

[root@ovc71 ~]# openssl x509 -in /etc/pki/vdsm/libvirt-spice/ca-cert.pem -text | grep Subject
Subject: C=US, O=localdomain.local, CN=shengine.localdomain.local.75331
Subject Public Key Info:
X509v3 Subject Key Identifier:

On the engine:

[root@shengine ~]# openssl x509 -in /etc/pki/ovirt-engine/ca.pem -text | grep Subject
Subject: C=US, O=localdomain.local, CN=shengine.localdomain.local.75331
Subject Public Key Info:
X509v3 Subject Key Identifier:

but:

[root@ovc71 ~]# hosted-engine --add-console-password
Enter password:
code = 0
message = 'Done'

[root@ovc71 ~]# remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://localhost?tls-port=5900 --spice-host-subject="C=US, O=localdomain.local, CN=shengine.localdomain.local.75331"
** (remote-viewer:4297): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-Gb5xXSKiKK: Connection refused
GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
(/usr/bin/remote-viewer:4297): Spice-Warning **: ssl_verify.c:492:openssl_verify: ssl: subject 'C=US, O=localdomain.local, CN=shengine.localdomain.local.75331' verification failed
(/usr/bin/remote-viewer:4297): Spice-Warning **: ssl_verify.c:494:openssl_verify: ssl: verification failed
(remote-viewer:4297): GSpice-WARNING **: main-1:0: SSL_connect: error:00000001:lib(0):func(0):reason(1)

and the remote-viewer window shows:

Unable to connect to the graphic server spice://localhost?tls-port=5900
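One possible explanation, offered as an assumption and not as a confirmed diagnosis: the two openssl commands above compare the CA certificates (ca-cert.pem on the host, ca.pem on the engine), while remote-viewer checks --spice-host-subject against the subject of the certificate that the SPICE server itself presents. That certificate is signed by the CA but normally carries a different subject. The sketch below reads the subject from /etc/pki/vdsm/libvirt-spice/server-cert.pem (that file name is an assumption, not something stated in this thread) and turns OpenSSL's output into the "C=.., O=.., CN=.." form used in the command above. The helper names normalize_subject and cert_subject are hypothetical.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: the SPICE TLS port presents
# $SPICE_PKI/server-cert.pem; adjust the path if your host differs.
SPICE_PKI=${SPICE_PKI:-/etc/pki/vdsm/libvirt-spice}

# Hypothetical helper: turn "openssl x509 -noout -subject -nameopt oneline"
# output into the "C=US, O=example, CN=host" form passed to
# --spice-host-subject. It strips the "subject=" (OpenSSL 1.1+) or
# "subject= " (OpenSSL 1.0.x) prefix and the spaces that the oneline
# format puts around each "=".
normalize_subject() {
    sed -e 's/^subject= *//' -e 's/ = /=/g'
}

# Hypothetical helper: print the normalized subject of the PEM cert in $1.
cert_subject() {
    openssl x509 -in "$1" -noout -subject -nameopt oneline | normalize_subject
}

# Example use on the hypervisor (not executed by this script):
#   remote-viewer --spice-ca-file="$SPICE_PKI/ca-cert.pem" \
#     "spice://localhost?tls-port=5900" \
#     --spice-host-subject="$(cert_subject "$SPICE_PKI/server-cert.pem")"
```

Reading the subject straight from the certificate avoids retyping it by hand. The " = " substitution would also rewrite a field value that itself contains " = ", which seems unlikely for these host certificates.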