On Wed, Oct 21, 2015 at 5:44 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Wed, Oct 21, 2015 at 4:58 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:

Ok, I will crosscheck the output...

That question is:
 Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?
 Note: ensuring that this host could resolve the engine VM hostname is still up to you


You are right. I misunderstood the meaning of the message.

 
 
I ran it through screen and I'm not able to scroll through its output....
 
1. Hit your screen prefix combination (C-a / Control-a by default), then hit Escape.
2. Move up/down with the arrow keys to scroll the output buffer.
3. When you're done, hit Return twice to get back to the end of the scroll buffer.
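
By the way, if screen's default scrollback buffer (100 lines) is too short to hold the whole setup output, you can enlarge it; a minimal sketch, assuming the usual ~/.screenrc location:

  # ~/.screenrc - raise the per-window scrollback buffer
  # (screen's default of 100 lines is usually too little for installer output)
  defscrollback 10000

Windows opened after this change get the larger buffer.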


Ok, thank you very much! I had never used screen before and hadn't found this useful information.

 
 
> Can I re-run using the answer file? In this case do I have to pre-insert the
> self-hosted engine hostname inside /etc/hosts of the hypervisor?

Yes.
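
For example, a minimal entry in /etc/hosts on the hypervisor could look like this (the address and the engine FQDN below are placeholders, adjust them to your setup):

  # /etc/hosts on the hypervisor - make the engine VM name resolvable
  192.168.1.50   shengine.localdomain.local   shengine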



What is the correct command for using the answer file?

hosted-engine --deploy --config-append=/var/lib/ovirt-hosted-engine-setup/answers/answers-20151021154529.conf

OK. It went successfully from the engine point of view.

One note: I chose SPICE as the protocol for the self-hosted engine and, while I was able to connect to it as the root user, I wasn't able to as a normal user.
I got:

  [g.cecchi@ovc71 ~]$ remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://localhost?tls-port=5900 --spice-host-subject="C=EN, L=Test, O=Test, CN=Test"

** (remote-viewer:1633): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-1N7zvOm2Zf: Connection refused
GLib-GIO-Message: Using the 'memory' GSettings backend.  Your settings will not be saved or shared with other applications.

(remote-viewer:1633): GSpice-WARNING **: loading ca certs from /etc/pki/vdsm/libvirt-spice/ca-cert.pem failed
(/usr/bin/remote-viewer:1633): Spice-Warning **: ssl_verify.c:428:openssl_verify: Error in certificate chain verification: self signed certificate in certificate chain (num=19:depth1:/C=EN/L=Test/O=Test/CN=TestCA)

(remote-viewer:1633): GSpice-WARNING **: main-1:0: SSL_connect: error:00000001:lib(0):func(0):reason(1)
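
One way to see which CA actually signed the certificate the SPICE port presents is to compare it with the CA file passed to remote-viewer; a minimal sketch using the standard openssl tools against the same host and port as the command above:

  # Subject/issuer of the CA file remote-viewer is pointed at
  openssl x509 -in /etc/pki/vdsm/libvirt-spice/ca-cert.pem -noout -subject -issuer

  # Certificate chain actually presented on the SPICE TLS port
  openssl s_client -connect localhost:5900 -showcerts </dev/null

If the issuer shown on the port does not match the CA file, --spice-ca-file is pointing at the wrong CA.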


It seems that the hypervisor phase stalled in some way and then failed:

[ INFO  ] Engine is still not reachable, waiting...
[ INFO  ] Engine replied: DB Up!Welcome to Health Status!
[ INFO  ] Connecting to the Engine
[ INFO  ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
..
[ INFO  ] Still waiting for VDSM host to become operational...
[ ERROR ] Timed out while waiting for host to start. Please check the logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager
[ INFO  ] Saving hosted-engine configuration on the shared storage domain
[ INFO  ] Shutting down the engine VM
[ INFO  ] Enabling and starting HA services
          Hosted Engine successfully set up
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20151021173522.conf'
[ INFO  ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination

During this waiting phase, on the hypervisor I got:

[root@ovc71 ~]# systemctl status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Wed 2015-10-21 17:10:55 CEST; 22min ago
 Main PID: 506 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─319 /usr/libexec/ioprocess --read-pipe-fd 29 --write-pipe-fd 23 --max-threads 10 --m...
           ├─506 /usr/bin/python /usr/share/vdsm/vdsm
           └─943 /usr/libexec/ioprocess --read-pipe-fd 57 --write-pipe-fd 56 --max-threads 10 --m...

Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 client step 1
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 ask_user_info()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 client step 1
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 ask_user_info()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 make_client_response()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 client step 2
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 parse_server_challenge()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 ask_user_info()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 make_client_response()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 client step 3

Inside vdsm.log under /var/log/vdsm/ I saw many loops of this type:

Thread-160::DEBUG::2015-10-21 17:31:23,084::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN/f538db13-330f-4a73-bf6d-1f7e3af12370/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-160::DEBUG::2015-10-21 17:31:23,091::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n311 bytes (311 B) copied, 0.000233708 s, 1.3 MB/s\n'; <rc> = 0
Reactor thread::INFO::2015-10-21 17:31:30,028::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 127.0.0.1:40382
Reactor thread::DEBUG::2015-10-21 17:31:30,034::protocoldetector::82::ProtocolDetector.Detector::(__init__) Using required_size=11
Reactor thread::INFO::2015-10-21 17:31:30,035::protocoldetector::118::ProtocolDetector.Detector::(handle_read) Detected protocol xml from 127.0.0.1:40382
Reactor thread::DEBUG::2015-10-21 17:31:30,035::bindingxmlrpc::1297::XmlDetector::(handle_socket) xml over http detected from ('127.0.0.1', 40382)
BindingXMLRPC::INFO::2015-10-21 17:31:30,036::xmlrpc::73::vds.XMLRPCServer::(handle_request) Starting request handler for 127.0.0.1:40382
Thread-284::INFO::2015-10-21 17:31:30,037::xmlrpc::84::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:40382 started
Thread-284::INFO::2015-10-21 17:31:30,043::xmlrpc::92::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:40382 stopped
Thread-160::DEBUG::2015-10-21 17:31:33,095::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN/f538db13-330f-4a73-bf6d-1f7e3af12370/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-160::DEBUG::2015-10-21 17:31:33,104::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n311 bytes (311 B) copied, 0.00030368 s, 1.0 MB/s\n'; <rc> = 0
Thread-160::DEBUG::2015-10-21 17:31:43,109::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN/f538db13-330f-4a73-bf6d-1f7e3af12370/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-160::DEBUG::2015-10-21 17:31:43,116::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n311 bytes (311 B) copied, 0.00109833 s, 283 kB/s\n'; <rc> = 0
Reactor thread::INFO::2015-10-21 17:31:45,067::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 127.0.0.1:40384
Reactor thread::DEBUG::2015-10-21 17:31:45,072::protocoldetector::82::ProtocolDetector.Detector::(__init__) Using required_size=11
Reactor thread::INFO::2015-10-21 17:31:45,073::protocoldetector::118::ProtocolDetector.Detector::(handle_read) Detected protocol xml from 127.0.0.1:40384
Reactor thread::DEBUG::2015-10-21 17:31:45,073::bindingxmlrpc::1297::XmlDetector::(handle_socket) xml over http detected from ('127.0.0.1', 40384)
BindingXMLRPC::INFO::2015-10-21 17:31:45,074::xmlrpc::73::vds.XMLRPCServer::(handle_request) Starting request handler for 127.0.0.1:40384
Thread-285::INFO::2015-10-21 17:31:45,074::xmlrpc::84::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:40384 started
Thread-285::INFO::2015-10-21 17:31:45,080::xmlrpc::92::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:40384 stopped

Full vdsm.log in gzip format here:

Exactly what kind of check does the install run to verify the hypervisor is up?

It polls the engine REST API checking the host status for 10 minutes until it becomes 'up' or 'non_operational'. In your case it reached the 10-minute timeout.
Please check engine and host-deploy logs on the engine VM.
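
If you want to watch the same status yourself, you can query the REST API directly; a minimal sketch (the engine FQDN and the admin password are placeholders; on 3.6 the API lives under /ovirt-engine/api):

  # List the hosts and their status as seen by the engine
  curl -k -u 'admin@internal:password' -H 'Accept: application/xml' \
      https://shengine.localdomain.local/ovirt-engine/api/hosts

Check the status element reported for hosted_engine_1 in the reply.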
 

Can I simply restart the hypervisor and see what happens? I remember similar things in all-in-one setups where the initial setup failed in the VDSM part but a restart was then OK.....

It's better to understand what was wrong first.
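
Before restarting anything, it can help to capture the current state from the host; a minimal sketch with commands that ship with the hosted-engine packages and systemd:

  # Current view of the hosted-engine HA agent
  hosted-engine --vm-status

  # Recent VDSM messages, to see why the host never became operational
  journalctl -u vdsmd --since '2015-10-21 17:10'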
 

Gianluca