[ovirt-users] HostedEngine with HA

Simone Tiraboschi stirabos at redhat.com
Thu Aug 18 10:34:22 UTC 2016


On Thu, Aug 18, 2016 at 12:11 PM, Carlos Rodrigues <cmar at eurotux.com> wrote:
> On Thu, 2016-08-18 at 11:53 +0200, Simone Tiraboschi wrote:
>>
>>
>> On Thu, Aug 18, 2016 at 11:50 AM, Carlos Rodrigues <cmar at eurotux.com>
>> wrote:
>> > On Thu, 2016-08-18 at 11:42 +0200, Simone Tiraboschi wrote:
>> > > On Thu, Aug 18, 2016 at 11:25 AM, Carlos Rodrigues <cmar at eurotux.com>
>> > > wrote:
>> > > >
>> > > > On Thu, 2016-08-18 at 11:04 +0200, Simone Tiraboschi wrote:
>> > > > >
>> > > > > On Thu, Aug 18, 2016 at 10:36 AM, Carlos Rodrigues <cmar at eurotux.com>
>> > > > > wrote:
>> > > > > >
>> > > > > >
>> > > > > > On Thu, 2016-08-18 at 10:27 +0200, Simone Tiraboschi wrote:
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Aug 18, 2016 at 10:22 AM, Carlos Rodrigues <cmar at eurotux.com>
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Thu, 2016-08-18 at 08:54 +0200, Simone Tiraboschi
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Tue, Aug 16, 2016 at 12:53 PM, Carlos Rodrigues <cmar at eurotux.com>
>> > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Sun, 2016-08-14 at 14:22 +0300, Roy Golan wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On 12 August 2016 at 20:23, Carlos Rodrigues <cmar at eurotux.com>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > Hello,
>> > > > > > > > > > > >
>> > > > > > > > > > > > I have one cluster with two hosts with power management correctly
>> > > > > > > > > > > > configured and one virtual machine with HostedEngine over shared
>> > > > > > > > > > > > storage with Fibre Channel.
>> > > > > > > > > > > >
>> > > > > > > > > > > > When I shut down the network of the host running the HostedEngine VM,
>> > > > > > > > > > > > should the HostedEngine VM migrate automatically to another host?
>> > > > > > > > > > > >
>> > > > > > > > > > > migrate on which network?
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > What is the expected behaviour in this HA scenario?
>> > > > > > > > > > >
>> > > > > > > > > > > After a few minutes your VM will be shut down by the High Availability
>> > > > > > > > > > > agent, as it can't see the network, and started on another host.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > I'm testing this scenario. After shutting down the network, I expected the
>> > > > > > > > > > HA agent to shut the VM down and start it on another host, but after a
>> > > > > > > > > > couple of minutes nothing happens, and on the host that still has network
>> > > > > > > > > > we get the following messages:
>> > > > > > > > > >
>> > > > > > > > > > Aug 16 11:44:08 ied-blade11.install.eurotux.local ovirt-ha-agent[2779]:
>> > > > > > > > > > ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config
>> > > > > > > > > > ERROR Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
>> > > > > > > > > >
>> > > > > > > > > > I think the HA agent is trying to get the VM configuration but somehow
>> > > > > > > > > > it can't get vm.conf to start the VM.
>> > > > > > > > >
>> > > > > > > > > No, this is a different issue.
>> > > > > > > > > In 3.6 we added a feature to let the engine also manage the engine VM
>> > > > > > > > > itself; ovirt-ha-agent will pick up the latest engine VM configuration
>> > > > > > > > > from the OVF_STORE, which is managed by the engine.
>> > > > > > > > > If something goes wrong, ovirt-ha-agent can fall back to the initial
>> > > > > > > > > (bootstrap time) vm.conf. This normally happens until you add your
>> > > > > > > > > first regular storage domain and the engine imports the engine VM.
>> > > > > > > >
>> > > > > > > > But I already have my first storage domain and the engine storage
>> > > > > > > > domain, and I have already imported the engine VM.
>> > > > > > > >
>> > > > > > > > I'm using version 4.0.
>> > > > > > >
>> > > > > > > This seems to be an issue; can you please share your
>> > > > > > > /var/log/ovirt-hosted-engine-ha/agent.log?
>> > > > > > >
>> > > > > >
>> > > > > > I sent it as an attachment.
>> > > > >
>> > > > > Nothing strange here;
>> > > > > do you see a couple of disks with alias OVF_STORE on the
>> > > > > hosted-engine storage domain if you check it from the engine?
>> > > > >
>> > > >
>> > > > Do you mean a disk label?
>> > > > I don't have any:
>> > > >
>> > > > [root@ied-blade11 ~]# ls /dev/disk/by-label/
>> > > > ls: cannot access /dev/disk/by-label/: No such file or directory
>> > >
>> > > No, I mean: go to the engine web UI, select the hosted-engine storage
>> > > domain, and check the disks there.
>> >
>> > No, the alias is virtio-disk0.
>> >
>>
>> And this is the engine VM disk, so the issue is why the engine still has
>> to create the OVF_STORE.
>> Can you please share your engine.log from the engine VM?
>>
>
> It is attached.

The creation of the OVF_STORE disk failed, but it's not clear why:

2016-08-17 08:43:33,538 ERROR
[org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand]
(DefaultQuartzScheduler6) [6f1f1fd4] Ending command
'org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand'
with failure.
2016-08-17 08:43:33,540 ERROR
[org.ovirt.engine.core.bll.storage.disk.AddDiskCommand]
(DefaultQuartzScheduler6) [6f1f1fd4] Ending command
'org.ovirt.engine.core.bll.storage.disk.AddDiskCommand' with failure.
2016-08-17 08:43:33,541 WARN
[org.ovirt.engine.core.bll.storage.disk.AddDiskCommand]
(DefaultQuartzScheduler6) [6f1f1fd4] VmCommand::EndVmCommand: Vm is
null - not performing endAction on Vm
2016-08-17 08:43:33,553 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler6) [6f1f1fd4] Correlation ID: 6f1f1fd4, Call
Stack: null, Custom Event ID: -1, Message: Add-Disk operation failed
to complete.
2016-08-17 08:43:33,557 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler6) [] Correlation ID: 19ac5bda, Call Stack:
null, Custom Event ID: -1, Message: Failed to create OVF store disk
for Storage Domain hosted_storage.
 OVF data won't be updated meanwhile for that domain.
2016-08-17 08:43:33,585 INFO
[org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
(DefaultQuartzScheduler6) [5f5a8daf] Command
'ProcessOvfUpdateForStorageDomain' (id:
'71aaaafe-7b9e-45e8-a40c-6d33bdf646a0') waiting on child command id:
'eb2e6f1a-c756-4ccd-85a1-60d97d6880de'
type:'CreateOvfVolumeForStorageDomain' to complete
2016-08-17 08:43:33,595 ERROR
[org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand]
(DefaultQuartzScheduler6) [5d314e49] Ending command
'org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand'
with failure.
2016-08-17 08:43:33,596 ERROR
[org.ovirt.engine.core.bll.storage.disk.AddDiskCommand]
(DefaultQuartzScheduler6) [5d314e49] Ending command
'org.ovirt.engine.core.bll.storage.disk.AddDiskCommand' with failure.
2016-08-17 08:43:33,596 WARN
[org.ovirt.engine.core.bll.storage.disk.AddDiskCommand]
(DefaultQuartzScheduler6) [5d314e49] VmCommand::EndVmCommand: Vm is
null - not performing endAction on Vm
2016-08-17 08:43:33,602 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler6) [5d314e49] Correlation ID: 5d314e49, Call
Stack: null, Custom Event ID: -1, Message: Add-Disk operation failed
to complete.
2016-08-17 08:43:33,605 WARN
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler6) [] Correlation ID: 5f5a8daf, Call Stack:
null, Custom Event ID: -1, Message: Failed to create OVF store disk
for Storage Domain hosted_storage.
 OVF data won't be updated meanwhile for that domain.
2016-08-17 08:43:36,460 INFO
[org.ovirt.engine.core.bll.scheduling.HaReservationHandling]
(DefaultQuartzScheduler7) [5d314e49] HA reservation status for cluster
'Default' is 'OK'
2016-08-17 08:43:36,662 INFO
[org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
(DefaultQuartzScheduler4) [5f5a8daf] Command
'ProcessOvfUpdateForStorageDomain' id:
'71aaaafe-7b9e-45e8-a40c-6d33bdf646a0' child commands
'[84959a4b-6a10-4d22-b37e-6c154e17a0da,
eb2e6f1a-c756-4ccd-85a1-60d97d6880de]' executions were completed,
status 'FAILED'
2016-08-17 08:43:37,691 ERROR
[org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand]
(DefaultQuartzScheduler6) [5f5a8daf] Ending command
'org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand'
with failure.

Can you please check vdsm logs for that time frame on the SPM host?
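
If it helps to narrow the vdsm log down to that time frame, here is a minimal Python sketch. It is not an oVirt tool, and the function name records_in_window is made up for illustration. It only assumes that each record carries a timestamp in the same "YYYY-MM-DD HH:MM:SS,mmm" form as the engine.log lines above; lines without a timestamp are treated as continuations of the previous record.

```python
import re
from datetime import datetime

# Matches timestamps such as "2016-08-17 08:43:33,538" anywhere in a line.
# Assumption: the log being filtered uses this timestamp layout.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
TS_FMT = "%Y-%m-%d %H:%M:%S,%f"


def records_in_window(lines, start, end):
    """Yield log records whose timestamp falls within [start, end].

    A line without a timestamp (a wrapped message or a traceback line)
    is attached to the record that precedes it.
    """
    current, keep = [], False
    for line in lines:
        match = TS_RE.search(line)
        if match:
            # A new record starts: emit the previous one if it was in range.
            if keep and current:
                yield "".join(current)
            stamp = datetime.strptime(match.group(0), TS_FMT)
            keep = start <= stamp <= end
            current = [line]
        else:
            current.append(line)
    # Emit the last record if it was in range.
    if keep and current:
        yield "".join(current)
```

For example, opening the log file on the SPM host and passing it with start=datetime(2016, 8, 17, 8, 43) and end=datetime(2016, 8, 17, 8, 44) would print only the records around the failed OVF_STORE creation.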


It seems that you also have an issue in the SPM election procedure:

2016-08-17 18:04:31,053 ERROR
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
(DefaultQuartzScheduler1) [] SPM Init: could not find reported vds or
not up - pool: 'Default' vds_spm_id: '2'
2016-08-17 18:04:31,076 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
(DefaultQuartzScheduler1) [] SPM selection - vds seems as spm
'hosted_engine_2'
2016-08-17 18:04:31,076 WARN
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
(DefaultQuartzScheduler1) [] spm vds is non responsive, stopping spm
selection.
2016-08-17 18:04:31,539 INFO
[org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher]
(DefaultQuartzScheduler7) [] Fetched 1 VMs from VDS
'06372186-572c-41ad-916f-7cbb0aba5302'

probably due to:
2016-08-17 18:02:33,569 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(DefaultQuartzScheduler6) [] Failure to refresh Vds runtime info:
VDSGenericException: VDSNetworkException: Message timeout which can be
caused by communication issues
2016-08-17 18:02:33,569 ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(DefaultQuartzScheduler6) [] Exception:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: Message timeout which can be
caused by communication issues

Can you please check whether the engine VM can correctly resolve and
reach each host?
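
A quick way to run that check from the engine VM is sketched below in Python, using only the standard library. The helper name check_host is hypothetical, and the default port 54321 is an assumption (it is the port vdsm normally listens on; adjust it if your setup differs). The sketch only tests name resolution and a plain TCP connect; it does not speak the vdsm protocol.

```python
import socket


def check_host(name, port=54321, timeout=3.0):
    """Return (resolved_addresses, reachable) for one host name.

    Assumption: port 54321 is where vdsm listens on the host.
    """
    try:
        infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name does not resolve at all.
        return [], False
    addresses = sorted({info[4][0] for info in infos})
    try:
        # Plain TCP connect; success only means the port accepts connections.
        with socket.create_connection((name, port), timeout=timeout):
            return addresses, True
    except OSError:
        return addresses, False
```

Calling check_host() once for each host name exactly as it is registered in the engine should show whether a name fails to resolve (empty address list) or resolves but cannot be reached (reachable is False).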


>> > > > > > > > > > Regards,
>> > > > > > > > > > Carlos Rodrigues
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > Regards,
>> > > > > > > > > > > >
>> > > > > > > > > > > > --
>> > > > > > > > > > > > Carlos Rodrigues
>> > > > > > > > > > > >
>> > > > > > > > > > > > Engenheiro de Software Sénior
>> > > > > > > > > > > >
>> > > > > > > > > > > > Eurotux Informática, S.A. | www.eurotux.com
>> > > > > > > > > > > > (t) +351 253 680 300 (m) +351 911 926 110
>> > > > > > > > > > > >
>> > > > > > > > > > > > _______________________________________________
>> > > > > > > > > > > > Users mailing list
>> > > > > > > > > > > > Users at ovirt.org
>> > > > > > > > > > > > http://lists.ovirt.org/mailman/listinfo/users
>> > > > > > > > > > > >
>> > > > > > > > > > >


