[ovirt-users] HostedEngine with HA

Carlos Rodrigues cmar at eurotux.com
Thu Aug 18 11:14:08 UTC 2016


On Thu, 2016-08-18 at 12:34 +0200, Simone Tiraboschi wrote:
> On Thu, Aug 18, 2016 at 12:11 PM, Carlos Rodrigues <cmar at eurotux.com>
> wrote:
> > 
> > On Thu, 2016-08-18 at 11:53 +0200, Simone Tiraboschi wrote:
> > > On Thu, Aug 18, 2016 at 11:50 AM, Carlos Rodrigues <cmar at eurotux.com> wrote:
> > > > On Thu, 2016-08-18 at 11:42 +0200, Simone Tiraboschi wrote:
> > > > > On Thu, Aug 18, 2016 at 11:25 AM, Carlos Rodrigues <cmar at eurotux.com> wrote:
> > > > > > On Thu, 2016-08-18 at 11:04 +0200, Simone Tiraboschi wrote:
> > > > > > > On Thu, Aug 18, 2016 at 10:36 AM, Carlos Rodrigues <cmar at eurotux.com> wrote:
> > > > > > > > On Thu, 2016-08-18 at 10:27 +0200, Simone Tiraboschi wrote:
> > > > > > > > > On Thu, Aug 18, 2016 at 10:22 AM, Carlos Rodrigues <cmar at eurotux.com> wrote:
> > > > > > > > > > On Thu, 2016-08-18 at 08:54 +0200, Simone Tiraboschi wrote:
> > > > > > > > > > > On Tue, Aug 16, 2016 at 12:53 PM, Carlos Rodrigues <cmar at eurotux.com> wrote:
> > > > > > > > > > > > On Sun, 2016-08-14 at 14:22 +0300, Roy Golan wrote:
> > > > > > > > > > > > > On 12 August 2016 at 20:23, Carlos Rodrigues <cmar at eurotux.com> wrote:
> > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I have one cluster with two hosts, with power management
> > > > > > > > > > > > > > correctly configured, and one HostedEngine virtual machine
> > > > > > > > > > > > > > on shared Fibre Channel storage.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > When I shut down the network of the host running the
> > > > > > > > > > > > > > HostedEngine VM, should the HostedEngine VM migrate
> > > > > > > > > > > > > > automatically to the other host?
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > Migrate on which network?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > What is the expected behaviour in this HA scenario?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > After a few minutes your VM will be shut down by the High
> > > > > > > > > > > > > Availability agent, as it can't see the network, and started
> > > > > > > > > > > > > on another host.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > I'm testing this scenario. After shutting down the network, I
> > > > > > > > > > > > expected the agent to shut down the VM and start it on the
> > > > > > > > > > > > other host, but after a couple of minutes nothing happens, and
> > > > > > > > > > > > on the host that still has network we are getting the
> > > > > > > > > > > > following messages:
> > > > > > > > > > > > 
> > > > > > > > > > > > Aug 16 11:44:08 ied-blade11.install.eurotux.local ovirt-ha-agent[2779]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
> > > > > > > > > > > > 
> > > > > > > > > > > > I think the HA agent is trying to get the VM configuration but
> > > > > > > > > > > > somehow it can't get vm.conf to start the VM.
> > > > > > > > > > > 
> > > > > > > > > > > No, this is a different issue.
> > > > > > > > > > > In 3.6 we added a feature to let the engine also manage the
> > > > > > > > > > > engine VM itself; ovirt-ha-agent will pick up the latest engine
> > > > > > > > > > > VM configuration from the OVF_STORE, which is managed by the
> > > > > > > > > > > engine.
> > > > > > > > > > > If something goes wrong, ovirt-ha-agent can fall back to the
> > > > > > > > > > > initial (bootstrap time) vm.conf. This will normally happen
> > > > > > > > > > > until you add your first regular storage domain and the engine
> > > > > > > > > > > imports the engine VM.
> > > > > > > > > > 
> > > > > > > > > > But I already have my first storage domain and the engine
> > > > > > > > > > storage domain, and I have already imported the engine VM.
> > > > > > > > > > 
> > > > > > > > > > I'm using version 4.0.
> > > > > > > > > 
> > > > > > > > > This seems to be an issue; can you please share your
> > > > > > > > > /var/log/ovirt-hosted-engine-ha/agent.log ?
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > I have sent it as an attachment.
> > > > > > > 
> > > > > > > Nothing strange here;
> > > > > > > do you see a couple of disks with alias OVF_STORE on the
> > > > > > > hosted-engine storage domain if you check it from the engine?
> > > > > > > 
> > > > > > 
> > > > > > Do you mean a disk label?
> > > > > > I don't have any:
> > > > > > 
> > > > > > [root@ied-blade11 ~]# ls /dev/disk/by-label/
> > > > > > ls: cannot access /dev/disk/by-label/: No such file or directory
> > > > > 
> > > > > No, I mean: go to the engine web UI, select the hosted-engine
> > > > > storage domain, and check the disks there.
> > > > 
> > > > No, the alias is virtio-disk0.
> > > > 
> > > 
> > > And this is the engine VM disk, so the issue is why the engine has
> > > not yet created the OVF_STORE.
> > > Can you please share your engine.log from the engine VM?
> > > 
> > 
> > It is attached.
> 
> The creation of the OVF_STORE disk failed but it's not that clear
> why:
> 
> 2016-08-17 08:43:33,538 ERROR [org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand] (DefaultQuartzScheduler6) [6f1f1fd4] Ending command 'org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand' with failure.
> 2016-08-17 08:43:33,540 ERROR [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (DefaultQuartzScheduler6) [6f1f1fd4] Ending command 'org.ovirt.engine.core.bll.storage.disk.AddDiskCommand' with failure.
> 2016-08-17 08:43:33,541 WARN [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (DefaultQuartzScheduler6) [6f1f1fd4] VmCommand::EndVmCommand: Vm is null - not performing endAction on Vm
> 2016-08-17 08:43:33,553 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [6f1f1fd4] Correlation ID: 6f1f1fd4, Call Stack: null, Custom Event ID: -1, Message: Add-Disk operation failed to complete.
> 2016-08-17 08:43:33,557 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [] Correlation ID: 19ac5bda, Call Stack: null, Custom Event ID: -1, Message: Failed to create OVF store disk for Storage Domain hosted_storage.
>  OVF data won't be updated meanwhile for that domain.
> 2016-08-17 08:43:33,585 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler6) [5f5a8daf] Command 'ProcessOvfUpdateForStorageDomain' (id: '71aaaafe-7b9e-45e8-a40c-6d33bdf646a0') waiting on child command id: 'eb2e6f1a-c756-4ccd-85a1-60d97d6880de' type:'CreateOvfVolumeForStorageDomain' to complete
> 2016-08-17 08:43:33,595 ERROR [org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand] (DefaultQuartzScheduler6) [5d314e49] Ending command 'org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand' with failure.
> 2016-08-17 08:43:33,596 ERROR [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (DefaultQuartzScheduler6) [5d314e49] Ending command 'org.ovirt.engine.core.bll.storage.disk.AddDiskCommand' with failure.
> 2016-08-17 08:43:33,596 WARN [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (DefaultQuartzScheduler6) [5d314e49] VmCommand::EndVmCommand: Vm is null - not performing endAction on Vm
> 2016-08-17 08:43:33,602 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [5d314e49] Correlation ID: 5d314e49, Call Stack: null, Custom Event ID: -1, Message: Add-Disk operation failed to complete.
> 2016-08-17 08:43:33,605 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [] Correlation ID: 5f5a8daf, Call Stack: null, Custom Event ID: -1, Message: Failed to create OVF store disk for Storage Domain hosted_storage.
>  OVF data won't be updated meanwhile for that domain.
> 2016-08-17 08:43:36,460 INFO [org.ovirt.engine.core.bll.scheduling.HaReservationHandling] (DefaultQuartzScheduler7) [5d314e49] HA reservation status for cluster 'Default' is 'OK'
> 2016-08-17 08:43:36,662 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler4) [5f5a8daf] Command 'ProcessOvfUpdateForStorageDomain' id: '71aaaafe-7b9e-45e8-a40c-6d33bdf646a0' child commands '[84959a4b-6a10-4d22-b37e-6c154e17a0da, eb2e6f1a-c756-4ccd-85a1-60d97d6880de]' executions were completed, status 'FAILED'
> 2016-08-17 08:43:37,691 ERROR [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand] (DefaultQuartzScheduler6) [5f5a8daf] Ending command 'org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand' with failure.
> 
> Can you please check vdsm logs for that time frame on the SPM host?
> 

I have attached the vdsm logs from both hosts; I think the SPM host
during that time frame was ied-blade13.
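
To narrow down which part of the vdsm logs matters, the failing correlation IDs and timestamps can be pulled out of engine.log first. Below is a rough Python sketch of how I would do that. The line pattern is only inferred from the excerpt quoted above (it is an assumption, not a documented format), it expects unwrapped lines as they appear in the file itself, and `errors_by_correlation` is just an illustrative helper name.

```python
import re
from collections import defaultdict

# Pattern for one unwrapped engine.log line, inferred from the quoted excerpt
# (an assumption, not an official format description):
#   2016-08-17 08:43:33,538 ERROR [logger] (thread) [correlation-id] message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\] "
    r"\((?P<thread>[^)]+)\) "
    r"\[(?P<corr>[^\]]*)\] "
    r"(?P<msg>.*)$"
)


def errors_by_correlation(lines):
    """Group ERROR entries by correlation ID.

    Returns {correlation_id: [(timestamp, short_logger_name, message), ...]}.
    Lines that do not match the pattern (continuations, stack traces) are skipped.
    """
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "ERROR":
            short_logger = match.group("logger").rsplit(".", 1)[-1]
            grouped[match.group("corr")].append(
                (match.group("ts"), short_logger, match.group("msg"))
            )
    return dict(grouped)


# Three lines taken from the excerpt above, joined back into single lines.
sample = [
    "2016-08-17 08:43:33,538 ERROR [org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand] (DefaultQuartzScheduler6) [6f1f1fd4] Ending command 'org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand' with failure.",
    "2016-08-17 08:43:33,541 WARN [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (DefaultQuartzScheduler6) [6f1f1fd4] VmCommand::EndVmCommand: Vm is null - not performing endAction on Vm",
    "2016-08-17 08:43:33,596 ERROR [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (DefaultQuartzScheduler6) [5d314e49] Ending command 'org.ovirt.engine.core.bll.storage.disk.AddDiskCommand' with failure.",
]

for corr_id, entries in errors_by_correlation(sample).items():
    for ts, logger, _msg in entries:
        print(corr_id, ts, logger)
```

The timestamps printed per correlation ID are then the ones to look for in vdsm.log on the SPM host.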

> 
> It seems that you also have an issue in the SPM election procedure:
> 
> 2016-08-17 18:04:31,053 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler1) [] SPM Init: could not find reported vds or not up - pool: 'Default' vds_spm_id: '2'
> 2016-08-17 18:04:31,076 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler1) [] SPM selection - vds seems as spm 'hosted_engine_2'
> 2016-08-17 18:04:31,076 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler1) [] spm vds is non responsive, stopping spm selection.
> 2016-08-17 18:04:31,539 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher] (DefaultQuartzScheduler7) [] Fetched 1 VMs from VDS '06372186-572c-41ad-916f-7cbb0aba5302'
> 
> probably due to:
> 2016-08-17 18:02:33,569 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler6) [] Failure to refresh Vds runtime info: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
> 2016-08-17 18:02:33,569 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler6) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
> 

These messages may have been caused by connection issues yesterday at 6 pm.

> can you please check if the engine VM could correctly resolve and
> reach each host?

Now I can reach the engine VM from both hosts:

[root@ied-blade11 ~]# ping ied-hosted-engine
PING ied-hosted-engine.install.eurotux.local (10.10.4.115) 56(84) bytes of data.
64 bytes from ied-hosted-engine.install.eurotux.local (10.10.4.115): icmp_seq=1 ttl=64 time=0.179 ms
64 bytes from ied-hosted-engine.install.eurotux.local (10.10.4.115): icmp_seq=2 ttl=64 time=0.141 ms
^C
--- ied-hosted-engine.install.eurotux.local ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.141/0.160/0.179/0.019 ms
[root@ied-blade13 ~]# ping ied-hosted-engine
PING ied-hosted-engine.install.eurotux.local (10.10.4.115) 56(84) bytes of data.
64 bytes from ied-hosted-engine.install.eurotux.local (10.10.4.115): icmp_seq=1 ttl=64 time=0.172 ms
64 bytes from ied-hosted-engine.install.eurotux.local (10.10.4.115): icmp_seq=2 ttl=64 time=0.169 ms
^C
--- ied-hosted-engine.install.eurotux.local ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.169/0.170/0.172/0.013 ms
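
The pings above only cover the direction from the hosts to the engine VM. For the other direction (name resolution from the engine VM to each host), a minimal Python sketch that could be run on the engine VM follows. `resolve_all` is only an illustrative helper name, and it checks resolution only, not reachability.

```python
import socket


def resolve_all(hostnames, resolver=socket.gethostbyname):
    """Return {hostname: ip_or_None}; None means the name did not resolve.

    The resolver is a parameter so the function can be exercised without
    touching DNS; by default it uses the system resolver.
    """
    results = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results
```

On the engine VM this would be called as `resolve_all(["ied-blade11.install.eurotux.local", "ied-blade13.install.eurotux.local"])`. The ied-blade13 FQDN is my assumption by analogy with ied-blade11; any host that comes back as `None` is not resolvable from the engine.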


I have a critical low disk space message on the hosted_storage domain.
I have a 50G disk and I created a VM with 40G. Do I need more space for
the OVF_STORE?
What are the minimum disk space requirements for deploying the engine VM?
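
For reference, the arithmetic behind my question as a small Python sketch. It treats the 50G and 40G figures as GiB and ignores any metadata overhead on the domain (both are assumptions), so the real free space is probably lower than what it prints. `free_space` is just an illustrative helper.

```python
def free_space(domain_gib, disk_sizes_gib):
    """Return (free GiB, free percent) for a storage domain, ignoring overhead."""
    used = sum(disk_sizes_gib)
    free = domain_gib - used
    return free, 100.0 * free / domain_gib


# 50G hosted_storage domain holding one 40G engine VM disk.
free_gib, free_pct = free_space(50, [40])
print(f"{free_gib} GiB free ({free_pct:.0f}%)")  # prints "10 GiB free (20%)"
```

Whatever the OVF_STORE disks need has to fit into that remainder.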

Regards,
Carlos

-- 
Carlos Rodrigues 

Engenheiro de Software Sénior

Eurotux Informática, S.A. | www.eurotux.com
(t) +351 253 680 300 (m) +351 911 926 110
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ied-blade13-vdsm.log.28.xz
Type: application/x-xz
Size: 958884 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20160818/c22cd680/attachment-0002.xz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ied-blade11-vdsm.log.24.xz
Type: application/x-xz
Size: 849716 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20160818/c22cd680/attachment-0003.xz>

