oVirt 3.5 2nd test day report - hosted engine

Hi all,
Today I tested Hosted Engine on iSCSI (setup and operation) on Fedora 20. The first part of setup went smoothly, but I eventually ran into some hiccups:
- HA services didn't start after setup [1]
- HA agent failed without reporting an error [2][3]
I also noticed that when an iSCSI target has multiple LUNs, a random (?) one would be chosen by setup. I ended up running setup again with only one LUN available to make sure this didn't cause further errors.
After setup of the first host completed, things seemed to be working well. However, after completing the 2nd host setup and rebooting the first host, I have errors in both agent.log files. [4]
I'll follow up with msivak and jmoskovc to see if these are errors on my part or something I can help troubleshoot.
Thanks, Greg
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1123285
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1124624
[3] http://gerrit.ovirt.org/#/c/30814/
[4] First host: Error: 'path to storage domain 35ff13aa-7ff1-4add-9869-651267e36921 not found in /rhev/data-center/mnt' - trying to restart agent [followed by agent exiting after too many retries]
Second host: Exception: Failed to start monitoring domain (sd_uuid=35ff13aa-7ff1-4add-9869-651267e36921, host_id=2): timeout during domain acquisition

On 07/30/2014 02:02 AM, Greg Padgett wrote:
Hi all,
Today I tested Hosted Engine on iSCSI (setup and operation) on Fedora 20. The first part of setup went smoothly, but there were some hiccups I eventually ran into:
- HA services didn't start after setup [1]
- HA agent failed without reporting an error [2][3]
I also noticed that when an iSCSI target has multiple LUNs, a random (?) one would be chosen by setup. I ended up running setup again with only one LUN available to make sure this didn't cause further errors.
After setup of the first host completed, things seemed to be working well. However, after completing 2nd host setup and rebooting the first host, I have errors in both agent.log files. [4]
I'll follow up with msivak and jmoskovc to see if these are errors on my part or something I can help troubleshoot.
Thanks, Greg
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1123285
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1124624
[3] http://gerrit.ovirt.org/#/c/30814/
[4] First host: Error: 'path to storage domain 35ff13aa-7ff1-4add-9869-651267e36921 not found in /rhev/data-center/mnt' - trying to restart agent [followed by agent exiting after too many retries]
- it also seems that vdsm failed to connect the domain
Second host: Exception: Failed to start monitoring domain (sd_uuid=35ff13aa-7ff1-4add-9869-651267e36921, host_id=2): timeout during domain acquisition
- I'm not sure what we can do about this if vdsm is not able to connect the domain within 90 seconds. We can increase the timeout, but how much is enough and yet not too much?
--Jirka
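To make the timeout trade-off above concrete, here is a minimal sketch of a bounded polling wait. It is not the actual ovirt-hosted-engine-ha or vdsm code: the function name `wait_for_domain`, the `is_connected` callback, and the 5-second poll interval are all hypothetical; only the 90-second default comes from the message above. A larger `timeout` tolerates slow storage but delays failure reporting by the same amount.

```python
import time


def wait_for_domain(is_connected, timeout=90.0, interval=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll is_connected() until it returns True or `timeout` seconds pass.

    All names here are illustrative assumptions, not oVirt APIs.
    Returns the elapsed seconds on success; raises TimeoutError otherwise.
    """
    start = clock()
    deadline = start + timeout
    while True:
        if is_connected():
            return clock() - start
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(
                "domain not acquired within %.0f seconds" % timeout)
        # Never sleep past the deadline, so the caller waits at most `timeout`.
        sleep(min(interval, remaining))
```

Logging the returned elapsed time on successful acquisitions would give real data for choosing a timeout instead of guessing one.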
participants (2): Greg Padgett, Jiri Moskovcak