Dear Strahil,
Thank you for all the knowledge sharing and support so far.
The procedure went fine and I now have the Engine running on Node1 (it was on
Node3).
However, I see some “strange things”:
* Engine is running and I have access to the WebUI as well - good.
* None of the HA Nodes actually shows who is hosting the Engine at the moment - all crowns are gray - strange.
* If I look at the list of VMs, I see the HostedEngine VM as powered off - strange.
Can I safely assume that the procedure went fine and that the 12-hour Engine
configuration sync period has now started, or did something go wrong?
Kindly awaiting your reply.
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
Sr. System Engineer @ System Administration
ActiveVideo
o: +31 (35) 6774131
e: m.vrgotic@activevideo.com
w: www.activevideo.com
ActiveVideo Networks BV. Mediacentrum 3745 Joop van den Endeplein 1.1217 WJ Hilversum, The
Netherlands. The information contained in this message may be legally privileged and
confidential. It is intended to be read only by the individual or entity to whom it is
addressed or by their designee. If the reader of this message is not the intended
recipient, you are on notice that any distribution of this message, in any form, is
strictly prohibited. If you have received this message in error, please immediately
notify the sender and/or ActiveVideo Networks, LLC by telephone at +1 408.931.9200 and
delete or destroy any copy of this message.
On 17/02/2020, 15:04, "Strahil Nikolov" <hunter86_bg(a)yahoo.com> wrote:
On February 17, 2020 1:55:13 PM GMT+02:00, "Vrgotic, Marko" <M.Vrgotic(a)activevideo.com> wrote:
Good day Strahil,
I believe I found the cause:
HostedEngine.log-20200216:-cpu SandyBridge,pcid=on,spec-ctrl=on,ssbd=on,md-clear=on,vme=on,hypervisor=on,arat=on,xsaveopt=on \
HostedEngine.log-20200216:2020-02-13T17:58:38.674630Z qemu-kvm: warning: host doesn't support requested feature: CPUID.07H:EDX.md-clear [bit 10]
HostedEngine.log-20200216:2020-02-13T17:58:38.676205Z qemu-kvm: warning: host doesn't support requested feature: CPUID.07H:EDX.md-clear [bit 10]
HostedEngine.log-20200216:2020-02-13T17:58:38.676901Z qemu-kvm: warning: host doesn't support requested feature: CPUID.07H:EDX.md-clear [bit 10]
HostedEngine.log-20200216:2020-02-13T17:58:38.677616Z qemu-kvm: warning: host doesn't support requested feature: CPUID.07H:EDX.md-clear [bit 10]
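One way to cross-check these warnings on a given host is to look for the flag in the kernel's CPU flag list. The sketch below assumes a Linux host where /proc/cpuinfo spells the flag md_clear (with an underscore), while libvirt and QEMU call it md-clear; has_cpu_flag is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Report whether a CPU flag is listed in a cpuinfo-style file.
# Assumption: /proc/cpuinfo spells the flag "md_clear" (underscore),
# whereas libvirt/QEMU name the same feature "md-clear".
has_cpu_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # Keep only the "flags" lines, then match the flag as a whole word.
    grep -E '^flags' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"
}

if has_cpu_flag md_clear; then
    echo "md_clear: present on this host"
else
    echo "md_clear: missing on this host"
fi
```

Running it on each HA node shows which hosts can satisfy a VM definition that requires md-clear.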
The "md-clear" CPU flag seems to have been removed as a feature due to the
Spectre vulnerabilities.
However, when I check the CPU type/flags of the VMs on the same Host where the
Engine currently is, as well as on the other hosts, md-clear seems to be
present only on the HostedEngine:
* HostedEngine:
From webUI:
Intel SandyBridge IBRS SSBD Family
Via virsh:
#virsh dumpxml
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>SandyBridge</model>
  <topology sockets='16' cores='4' threads='1'/>
  <feature policy='require' name='pcid'/>
  <feature policy='require' name='spec-ctrl'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='xsaveopt'/>
  <numa>
    <cell id='0' cpus='0-3' memory='16777216' unit='KiB'/>
  </numa>
</cpu>
* Other VMs:
From webUI:
(SandyBridge,+pcid,+spec-ctrl,+ssbd)
Via virsh:
#virsh dumpxml
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>SandyBridge</model>
  <topology sockets='16' cores='1' threads='1'/>
  <feature policy='require' name='pcid'/>
  <feature policy='require' name='spec-ctrl'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='xsaveopt'/>
  <numa>
    <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
  </numa>
</cpu>
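The comparison between the two CPU definitions can also be done mechanically. This is a minimal sketch, assuming the two virsh dumps have been saved to files (engine.xml and other.xml are hypothetical names) and that each feature element sits on one line, as virsh normally prints it; feature_names is a helper invented for this example.

```shell
#!/bin/sh
# Print the required CPU feature names found in a libvirt domain XML dump.
feature_names() {
    grep "<feature " "$1" \
        | grep -o "name='[^']*'" \
        | sed "s/^name='//; s/'\$//" \
        | LC_ALL=C sort -u
}

workdir=$(mktemp -d)

# Feature lists copied from the two dumps quoted in this message.
cat > "$workdir/engine.xml" <<'EOF'
<cpu mode='custom' match='exact' check='full'>
  <feature policy='require' name='pcid'/>
  <feature policy='require' name='spec-ctrl'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='xsaveopt'/>
</cpu>
EOF
cat > "$workdir/other.xml" <<'EOF'
<cpu mode='custom' match='exact' check='full'>
  <feature policy='require' name='pcid'/>
  <feature policy='require' name='spec-ctrl'/>
  <feature policy='require' name='ssbd'/>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='arat'/>
  <feature policy='require' name='xsaveopt'/>
</cpu>
EOF

feature_names "$workdir/engine.xml" > "$workdir/engine.features"
feature_names "$workdir/other.xml"  > "$workdir/other.features"

# comm -23 keeps the lines that appear only in the first (sorted) file.
only_in_engine=$(LC_ALL=C comm -23 "$workdir/engine.features" "$workdir/other.features")
echo "Required only by HostedEngine: $only_in_engine"
```

With the two dumps above, the only feature required by the HostedEngine and not by the other VMs is md-clear.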
Strahil, knowing this, do you propose a different approach, or shall I
just proceed with the initially suggested workaround?
Kindly awaiting your reply.
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
On 16/02/2020, 15:28, "Strahil Nikolov" <hunter86_bg(a)yahoo.com> wrote:
ssh root@engine "poweroff"
ssh host-that-holded-engine "virsh undefine HostedEngine; virsh list --all"
Lots of virsh - less vdsm :)
Good luck
Best Regards,
Strahil Nikolov
On Sunday, 16 February 2020, 16:01:44 GMT+2, Vrgotic, Marko <m.vrgotic(a)activevideo.com> wrote:
Hi Strahil,
Regarding step 3: Stop and undefine the VM on the last working host
One question: How do I undefine HostedEngine from the last Host?
The hosted-engine command does not provide such an option, or it's just not
obvious.
Kindly awaiting your reply.
-----
kind regards/met vriendelijke groeten
Marko Vrgotic
ActiveVideo
On 14/02/2020, 18:44, "Strahil Nikolov" <hunter86_bg(a)yahoo.com> wrote:
On February 14, 2020 4:19:53 PM GMT+02:00, "Vrgotic, Marko" <M.Vrgotic(a)activevideo.com> wrote:
>Good answer Strahil,
>
>Thank you, I forgot.
>
>Libvirt logs are actually showing the reason why:
>
>2020-02-14T12:33:51.847970Z qemu-kvm: -drive file=/var/run/vdsm/storage/054c43fc-1924-4106-9f80-0f2ac62b9886/b019c5fa-8fb5-4bfc-8339-f5b7f590a051/f1ce8ba6-2d3b-4309-bca0-e6a00ce74c75,format=raw,if=none,id=drive-ua-b019c5fa-8fb5-4bfc-8339-f5b7f590a051,serial=b019c5fa-8fb5-4bfc-8339-f5b7f590a051,werror=stop,rerror=stop,cache=none,aio=threads: 'serial' is deprecated, please use the corresponding option of '-device' instead
>Spice-Message: 04:33:51.856: setting TLS option 'CipherString' to 'kECDHE+FIPS:kDHE+FIPS:kRSA+FIPS:!eNULL:!aNULL' from /etc/pki/tls/spice.cnf configuration file
>2020-02-14T12:33:51.863449Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: CPU 4 [socket-id: 1, core-id: 0, thread-id: 0], CPU 5 [socket-id: 1, core-id: 1, thread-id: 0], CPU 6 [socket-id: 1, core-id: 2, thread-id: 0], CPU 7 [socket-id: 1, core-id: 3, thread-id: 0], CPU 8 [socket-id: 2, core-id: 0, thread-id: 0], CPU 9 [socket-id: 2, core-id: 1, thread-id: 0], CPU 10 [socket-id: 2, core-id: 2, thread-id: 0], CPU 11 [socket-id: 2, core-id: 3, thread-id: 0], CPU 12 [socket-id: 3, core-id: 0, thread-id: 0], CPU 13 [socket-id: 3, core-id: 1, thread-id: 0], CPU 14 [socket-id: 3, core-id: 2, thread-id: 0], CPU 15 [socket-id: 3, core-id: 3, thread-id: 0], CPU 16 [socket-id: 4, core-id: 0, thread-id: 0], CPU 17 [socket-id: 4, core-id: 1, thread-id: 0], CPU 18 [socket-id: 4, core-id: 2, thread-id: 0], CPU 19 [socket-id: 4, core-id: 3, thread-id: 0], CPU 20 [socket-id: 5, core-id: 0, thread-id: 0], CPU 21 [socket-id: 5, core-id: 1, thread-id: 0], CPU 22 [socket-id: 5, core-id: 2, thread-id: 0], CPU 23 [socket-id: 5, core-id: 3, thread-id: 0], CPU 24 [socket-id: 6, core-id: 0, thread-id: 0], CPU 25 [socket-id: 6, core-id: 1, thread-id: 0], CPU 26 [socket-id: 6, core-id: 2, thread-id: 0], CPU 27 [socket-id: 6, core-id: 3, thread-id: 0], CPU 28 [socket-id: 7, core-id: 0, thread-id: 0], CPU 29 [socket-id: 7, core-id: 1, thread-id: 0], CPU 30 [socket-id: 7, core-id: 2, thread-id: 0], CPU 31 [socket-id: 7, core-id: 3, thread-id: 0], CPU 32 [socket-id: 8, core-id: 0, thread-id: 0], CPU 33 [socket-id: 8, core-id: 1, thread-id: 0], CPU 34 [socket-id: 8, core-id: 2, thread-id: 0], CPU 35 [socket-id: 8, core-id: 3, thread-id: 0], CPU 36 [socket-id: 9, core-id: 0, thread-id: 0], CPU 37 [socket-id: 9, core-id: 1, thread-id: 0], CPU 38 [socket-id: 9, core-id: 2, thread-id: 0], CPU 39 [socket-id: 9, core-id: 3, thread-id: 0], CPU 40 [socket-id: 10, core-id: 0, thread-id: 0], CPU 41 [socket-id: 10, core-id: 1, thread-id: 0], CPU 42 [socket-id: 10, core-id: 2, thread-id: 0], CPU 43 [socket-id: 10, core-id: 3, thread-id: 0], CPU 44 [socket-id: 11, core-id: 0, thread-id: 0], CPU 45 [socket-id: 11, core-id: 1, thread-id: 0], CPU 46 [socket-id: 11, core-id: 2, thread-id: 0], CPU 47 [socket-id: 11, core-id: 3, thread-id: 0], CPU 48 [socket-id: 12, core-id: 0, thread-id: 0], CPU 49 [socket-id: 12, core-id: 1, thread-id: 0], CPU 50 [socket-id: 12, core-id: 2, thread-id: 0], CPU 51 [socket-id: 12, core-id: 3, thread-id: 0], CPU 52 [socket-id: 13, core-id: 0, thread-id: 0], CPU 53 [socket-id: 13, core-id: 1, thread-id: 0], CPU 54 [socket-id: 13, core-id: 2, thread-id: 0], CPU 55 [socket-id: 13, core-id: 3, thread-id: 0], CPU 56 [socket-id: 14, core-id: 0, thread-id: 0], CPU 57 [socket-id: 14, core-id: 1, thread-id: 0], CPU 58 [socket-id: 14, core-id: 2, thread-id: 0], CPU 59 [socket-id: 14, core-id: 3, thread-id: 0], CPU 60 [socket-id: 15, core-id: 0, thread-id: 0], CPU 61 [socket-id: 15, core-id: 1, thread-id: 0], CPU 62 [socket-id: 15, core-id: 2, thread-id: 0], CPU 63 [socket-id: 15, core-id: 3, thread-id: 0]
>2020-02-14T12:33:51.863475Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
>2020-02-14T12:33:51.863973Z qemu-kvm: warning: host doesn't support requested feature: CPUID.07H:EDX.md-clear [bit 10]
>2020-02-14T12:33:51.865066Z qemu-kvm: warning: host doesn't support requested feature: CPUID.07H:EDX.md-clear [bit 10]
>2020-02-14T12:33:51.865547Z qemu-kvm: warning: host doesn't support requested feature: CPUID.07H:EDX.md-clear [bit 10]
>2020-02-14T12:33:51.865996Z qemu-kvm: warning: host doesn't support requested feature: CPUID.07H:EDX.md-clear [bit 10]
>2020-02-14 12:33:51.932+0000: shutting down, reason=failed
>
>But then I wonder if the following is related to the error above:
>
>Before I started upgrading Host by Host, all Hosts in the Cluster were
>showing CPU Family type "Intel SandyBridge IBRS SSBD MDS Family".
>After the first Host was upgraded, its CPU Family type changed to
>"Intel SandyBridge IBRS SSBD Family", and that forced me to "downgrade"
>the Cluster family type to "Intel SandyBridge IBRS SSBD Family" in order
>to be able to Activate the Host back inside the Cluster.
>Following further, each Host's CPU family type changed after Upgrade from
>"Intel SandyBridge IBRS SSBD MDS Family" to "Intel SandyBridge IBRS SSBD
>Family", except the one the HostedEngine is currently on.
>
>Could this possibly be the reason why I cannot Migrate the HostedEngine
>now, and how do I solve it?
>
>Kindly awaiting your reply.
>
>
>
-----
>
kind regards/met vriendelijke groeten
>
>
Marko Vrgotic
>
Sr. System Engineer @ System
Administration
>
>
ActiveVideo
>
o: +31 (35) 6774131
>e: m.vrgotic(a)activevideo.com
>
>
>On 14/02/2020, 14:01, "Strahil Nikolov" <hunter86_bg(a)yahoo.com> wrote:
>
>On February 14, 2020 2:47:04 PM GMT+02:00, "Vrgotic, Marko" <M.Vrgotic(a)activevideo.com> wrote:
> >Dear oVirt,
> >
> >I have a problem migrating the HostedEngine, the only HA VM server, to the
> >other HA nodes.
> >
> >A bit of background:
> >
> > * We have oVirt SHE 4.3.5
> > * Three Nodes act as HA pool for SHE
> > * Node 3 is currently Hosting SHE
> > * Actions:
> >   * Put Node1 in Maintenance mode, all VMs were successfully migrated,
> >     then Upgrade packages, Activate Host - all looks good
> >   * Put Node2 in Maintenance mode, all VMs were successfully migrated,
> >     then Upgrade packages, Activate Host - all looks good
> >
> >Now the problem:
> >I tried to set Node3 in Maintenance mode; all VMs were successfully
> >migrated, except HostedEngine.
> >
> >When attempting Migration of the VM HostedEngine, it fails with the
> >following error message:
> >2020-02-14 12:33:49,960Z INFO
> >[org.ovirt.engine.core.bll.MigrateVmCommand] (default
task-265)
> >[16f4559e-e262-4c9d-80b4-ec81c2cbf950] Lock Acquired to
object
>>'EngineLock:{exclusiveLocks='[66b6d489-ceb8-486a-951a-355e21f13627=VM]',
> >sharedLocks=''}'
> >2020-02-14 12:33:49,984Z INFO
> >[org.ovirt.engine.core.bll.scheduling.SchedulingManager]
(default
> >task-265) [16f4559e-e262-4c9d-80b4-ec81c2cbf950]
Candidate host
> >'ovirt-sj-04.ictv.com'
('d98843da-bd81-46c9-9425-065b196ac59d') was
> >filtered out by 'VAR__FILTERTYPE__INTERNAL' filter
'HA'
(correlation
> >id: null)
> >2020-02-14 12:33:49,984Z INFO
> >[org.ovirt.engine.core.bll.scheduling.SchedulingManager]
(default
> >task-265) [16f4559e-e262-4c9d-80b4-ec81c2cbf950]
Candidate host
> >'ovirt-sj-05.ictv.com'
('e3176705-9fb0-41d6-8721-367dfa2e62bd') was
> >filtered out by 'VAR__FILTERTYPE__INTERNAL' filter
'HA'
(correlation
> >id: null)
> >2020-02-14 12:33:49,997Z INFO
> >[org.ovirt.engine.core.bll.MigrateVmCommand] (default
task-265)
> >[16f4559e-e262-4c9d-80b4-ec81c2cbf950] Running
command:
> >MigrateVmCommand internal: false. Entities affected
: ID:
> >66b6d489-ceb8-486a-951a-355e21f13627 Type: VMAction group
MIGRATE_VM
> >with role type USER
> >2020-02-14 12:33:50,008Z INFO
> >[org.ovirt.engine.core.bll.scheduling.SchedulingManager]
(default
> >task-265) [16f4559e-e262-4c9d-80b4-ec81c2cbf950]
Candidate host
> >'ovirt-sj-04.ictv.com'
('d98843da-bd81-46c9-9425-065b196ac59d') was
> >filtered out by 'VAR__FILTERTYPE__INTERNAL' filter
'HA'
(correlation
> >id: 16f4559e-e262-4c9d-80b4-ec81c2cbf950)
> >2020-02-14 12:33:50,008Z INFO
> >[org.ovirt.engine.core.bll.scheduling.SchedulingManager]
(default
> >task-265) [16f4559e-e262-4c9d-80b4-ec81c2cbf950]
Candidate host
> >'ovirt-sj-05.ictv.com'
('e3176705-9fb0-41d6-8721-367dfa2e62bd') was
> >filtered out by 'VAR__FILTERTYPE__INTERNAL' filter
'HA'
(correlation
> >id: 16f4559e-e262-4c9d-80b4-ec81c2cbf950)
> >2020-02-14 12:33:50,033Z INFO
>>[org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (default
task-265)
> >[16f4559e-e262-4c9d-80b4-ec81c2cbf950] START,
MigrateVDSCommand(
>>MigrateVDSCommandParameters:{hostId='f8d27efb-1527-45f0-97d6-d34a86abaaa2',
>
>vmId='66b6d489-ceb8-486a-951a-355e21f13627',
> >srcHost='ovirt-sj-03.ictv.com',
>
>dstVdsId='9808f434-5cd4-48b5-8bbc-e639e391c6a5',
> >dstHost='ovirt-sj-01.ictv.com:54321',
migrationMethod='ONLINE',
> >tunnelMigration='false',
migrationDowntime='0',
autoConverge='true',
> >migrateCompressed='false',
consoleAddress='null',
maxBandwidth='40',
> >enableGuestEvents='true',
maxIncomingMigrations='2',
> >maxOutgoingMigrations='2',
> >convergenceSchedule='[init=[{name=setDowntime,
params=[100]}],
>>stalling=[{limit=1, action={name=setDowntime,
params=[150]}},
>{limit=2,
> >action={name=setDowntime, params=[200]}},
{limit=3,
> >action={name=setDowntime, params=[300]}},
{limit=4,
> >action={name=setDowntime, params=[400]}},
{limit=6,
> >action={name=setDowntime, params=[500]}},
{limit=-1,
> >action={name=abort, params=[]}}]]',
dstQemu='10.210.13.11'}), log
id:
> >5c126a47
> >2020-02-14 12:33:50,036Z INFO
>
>[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand]
> >(default task-265) [16f4559e-e262-4c9d-80b4-ec81c2cbf950]
START,
> >MigrateBrokerVDSCommand(HostName =
ovirt-sj-03.ictv.com,
>>MigrateVDSCommandParameters:{hostId='f8d27efb-1527-45f0-97d6-d34a86abaaa2',
>
>vmId='66b6d489-ceb8-486a-951a-355e21f13627',
> >srcHost='ovirt-sj-03.ictv.com',
>
>dstVdsId='9808f434-5cd4-48b5-8bbc-e639e391c6a5',
> >dstHost='ovirt-sj-01.ictv.com:54321',
migrationMethod='ONLINE',
> >tunnelMigration='false',
migrationDowntime='0',
autoConverge='true',
> >migrateCompressed='false',
consoleAddress='null',
maxBandwidth='40',
> >enableGuestEvents='true',
maxIncomingMigrations='2',
> >maxOutgoingMigrations='2',
> >convergenceSchedule='[init=[{name=setDowntime,
params=[100]}],
>>stalling=[{limit=1, action={name=setDowntime,
params=[150]}},
>{limit=2,
> >action={name=setDowntime, params=[200]}},
{limit=3,
> >action={name=setDowntime, params=[300]}},
{limit=4,
> >action={name=setDowntime, params=[400]}},
{limit=6,
> >action={name=setDowntime, params=[500]}},
{limit=-1,
> >action={name=abort, params=[]}}]]',
dstQemu='10.210.13.11'}), log
id:
> >a0f776d
> >2020-02-14 12:33:50,043Z INFO
>
>[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand]
> >(default task-265) [16f4559e-e262-4c9d-80b4-ec81c2cbf950]
FINISH,
> >MigrateBrokerVDSCommand, return: , log id:
a0f776d
> >2020-02-14 12:33:50,046Z INFO
>>[org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (default
task-265)
> >[16f4559e-e262-4c9d-80b4-ec81c2cbf950] FINISH,
MigrateVDSCommand,
> >return: MigratingFrom, log id: 5c126a47
> >2020-02-14 12:33:50,052Z INFO
>>[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> >(default task-265) [16f4559e-e262-4c9d-80b4-ec81c2cbf950]
EVENT_ID:
> >VM_MIGRATION_START(62), Migration started (VM:
HostedEngine,
Source:
> >ovirt-sj-03.ictv.com, Destination:
ovirt-sj-01.ictv.com, User:
> >mvrgotic@ictv.com(a)ictv.com-authz).
> >2020-02-14 12:33:52,893Z INFO
>
>[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
>>(ForkJoinPool-1-worker-8) [] VM
'66b6d489-ceb8-486a-951a-355e21f13627'
> >was reported as Down on VDS
>
>'9808f434-5cd4-48b5-8bbc-e639e391c6a5'(ovirt-sj-01.ictv.com)
> >2020-02-14 12:33:52,893Z INFO
>
>[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
> >(ForkJoinPool-1-worker-8) [] START,
DestroyVDSCommand(HostName =
> >ovirt-sj-01.ictv.com,
>>DestroyVmVDSCommandParameters:{hostId='9808f434-5cd4-48b5-8bbc-e639e391c6a5',
> >vmId='66b6d489-ceb8-486a-951a-355e21f13627',
secondsToWait='0',
> >gracefully='false', reason='',
ignoreNoVm='true'}), log id:
7532a8c0
> >2020-02-14 12:33:53,217Z INFO
>
>[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
> >(ForkJoinPool-1-worker-8) [] Failed to destroy VM
> >'66b6d489-ceb8-486a-951a-355e21f13627' because VM
does not exist,
> >ignoring
> >2020-02-14 12:33:53,217Z INFO
>
>[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
> >(ForkJoinPool-1-worker-8) [] FINISH, DestroyVDSCommand,
return: ,
log
> >id: 7532a8c0
> >2020-02-14 12:33:53,217Z INFO
>
>[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> >(ForkJoinPool-1-worker-8) [] VM
> >'66b6d489-ceb8-486a-951a-355e21f13627'(HostedEngine)
was
unexpectedly
> >detected as 'Down' on VDS
>>'9808f434-5cd4-48b5-8bbc-e639e391c6a5'(ovirt-sj-01.ictv.com)
(expected
> >on
'f8d27efb-1527-45f0-97d6-d34a86abaaa2')
> >2020-02-14 12:33:53,217Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-8) [] Migration of VM 'HostedEngine' to host 'ovirt-sj-01.ictv.com' failed: VM destroyed during the startup.
> >2020-02-14 12:33:53,219Z INFO
>
>[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> >(ForkJoinPool-1-worker-15) [] VM
>
>'66b6d489-ceb8-486a-951a-355e21f13627'(HostedEngine) moved from
> >'MigratingFrom' --> 'Up'
> >2020-02-14 12:33:53,219Z INFO
>
>[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> >(ForkJoinPool-1-worker-15) [] Adding VM
> >'66b6d489-ceb8-486a-951a-355e21f13627'(HostedEngine)
to re-run list
> >2020-02-14 12:33:53,221Z ERROR
>
>[org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring]
> >(ForkJoinPool-1-worker-15) [] Rerun VM
> >'66b6d489-ceb8-486a-951a-355e21f13627'.
Called from VDS
> >'ovirt-sj-03.ictv.com'
> >2020-02-14 12:33:53,259Z INFO
>
>[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand]
> >(EE-ManagedThreadFactory-engine-Thread-377323) []
START,
> >MigrateStatusVDSCommand(HostName =
ovirt-sj-03.ictv.com,
>>MigrateStatusVDSCommandParameters:{hostId='f8d27efb-1527-45f0-97d6-d34a86abaaa2',
> >vmId='66b6d489-ceb8-486a-951a-355e21f13627'}),
log id: 62bac076
> >2020-02-14 12:33:53,265Z INFO
>
>[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand]
> >(EE-ManagedThreadFactory-engine-Thread-377323) []
FINISH,
> >MigrateStatusVDSCommand, return: , log id:
62bac076
> >2020-02-14 12:33:53,277Z WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-377323) [] EVENT_ID: VM_MIGRATION_TRYING_RERUN(128), Failed to migrate VM HostedEngine to Host ovirt-sj-01.ictv.com . Trying to migrate to another Host.
> >2020-02-14 12:33:53,330Z INFO
>
>[org.ovirt.engine.core.bll.scheduling.SchedulingManager]
> >(EE-ManagedThreadFactory-engine-Thread-377323) []
Candidate host
> >'ovirt-sj-04.ictv.com'
('d98843da-bd81-46c9-9425-065b196ac59d') was
> >filtered out by 'VAR__FILTERTYPE__INTERNAL' filter
'HA'
(correlation
> >id: null)
> >2020-02-14 12:33:53,330Z INFO
>
>[org.ovirt.engine.core.bll.scheduling.SchedulingManager]
> >(EE-ManagedThreadFactory-engine-Thread-377323) []
Candidate host
> >'ovirt-sj-05.ictv.com'
('e3176705-9fb0-41d6-8721-367dfa2e62bd') was
> >filtered out by 'VAR__FILTERTYPE__INTERNAL' filter
'HA'
(correlation
> >id: null)
> >2020-02-14 12:33:53,345Z INFO
> >[org.ovirt.engine.core.bll.MigrateVmCommand]
> >(EE-ManagedThreadFactory-engine-Thread-377323) [] Running
command:
> >MigrateVmCommand internal: false. Entities affected
: ID:
> >66b6d489-ceb8-486a-951a-355e21f13627 Type: VMAction group
MIGRATE_VM
> >with role type USER
> >2020-02-14 12:33:53,356Z INFO
>
>[org.ovirt.engine.core.bll.scheduling.SchedulingManager]
> >(EE-ManagedThreadFactory-engine-Thread-377323) []
Candidate host
> >'ovirt-sj-04.ictv.com'
('d98843da-bd81-46c9-9425-065b196ac59d') was
> >filtered out by 'VAR__FILTERTYPE__INTERNAL' filter
'HA'
(correlation
> >id: 16f4559e-e262-4c9d-80b4-ec81c2cbf950)
> >2020-02-14 12:33:53,356Z INFO
>
>[org.ovirt.engine.core.bll.scheduling.SchedulingManager]
> >(EE-ManagedThreadFactory-engine-Thread-377323) []
Candidate host
> >'ovirt-sj-05.ictv.com'
('e3176705-9fb0-41d6-8721-367dfa2e62bd') was
> >filtered out by 'VAR__FILTERTYPE__INTERNAL' filter
'HA'
(correlation
> >id: 16f4559e-e262-4c9d-80b4-ec81c2cbf950)
> >2020-02-14 12:33:53,380Z INFO
>
>[org.ovirt.engine.core.vdsbroker.MigrateVDSCommand]
> >(EE-ManagedThreadFactory-engine-Thread-377323) []
START,
> >MigrateVDSCommand(
>>MigrateVDSCommandParameters:{hostId='f8d27efb-1527-45f0-97d6-d34a86abaaa2',
>
>vmId='66b6d489-ceb8-486a-951a-355e21f13627',
> >srcHost='ovirt-sj-03.ictv.com',
>
>dstVdsId='33e8ff78-e396-4f40-b43c-685bfaaee9af',
> >dstHost='ovirt-sj-02.ictv.com:54321',
migrationMethod='ONLINE',
> >tunnelMigration='false',
migrationDowntime='0',
autoConverge='true',
> >migrateCompressed='false',
consoleAddress='null',
maxBandwidth='40',
> >enableGuestEvents='true',
maxIncomingMigrations='2',
> >maxOutgoingMigrations='2',
> >convergenceSchedule='[init=[{name=setDowntime,
params=[100]}],
>>stalling=[{limit=1, action={name=setDowntime,
params=[150]}},
>{limit=2,
> >action={name=setDowntime, params=[200]}},
{limit=3,
> >action={name=setDowntime, params=[300]}},
{limit=4,
> >action={name=setDowntime, params=[400]}},
{limit=6,
> >action={name=setDowntime, params=[500]}},
{limit=-1,
> >action={name=abort, params=[]}}]]',
dstQemu='10.210.13.12'}), log
id:
> >d99059f
> >2020-02-14 12:33:53,380Z INFO
>
>[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand]
> >(EE-ManagedThreadFactory-engine-Thread-377323) []
START,
> >MigrateBrokerVDSCommand(HostName =
ovirt-sj-03.ictv.com,
>>MigrateVDSCommandParameters:{hostId='f8d27efb-1527-45f0-97d6-d34a86abaaa2',
>
>vmId='66b6d489-ceb8-486a-951a-355e21f13627',
> >srcHost='ovirt-sj-03.ictv.com',
>
>dstVdsId='33e8ff78-e396-4f40-b43c-685bfaaee9af',
> >dstHost='ovirt-sj-02.ictv.com:54321',
migrationMethod='ONLINE',
> >tunnelMigration='false',
migrationDowntime='0',
autoConverge='true',
> >migrateCompressed='false',
consoleAddress='null',
maxBandwidth='40',
> >enableGuestEvents='true',
maxIncomingMigrations='2',
> >maxOutgoingMigrations='2',
> >convergenceSchedule='[init=[{name=setDowntime,
params=[100]}],
>>stalling=[{limit=1, action={name=setDowntime,
params=[150]}},
>{limit=2,
> >action={name=setDowntime, params=[200]}},
{limit=3,
> >action={name=setDowntime, params=[300]}},
{limit=4,
> >action={name=setDowntime, params=[400]}},
{limit=6,
> >action={name=setDowntime, params=[500]}},
{limit=-1,
> >action={name=abort, params=[]}}]]',
dstQemu='10.210.13.12'}), log
id:
> >6f0483ac
> >2020-02-14 12:33:53,386Z INFO
>
>[org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand]
> >(EE-ManagedThreadFactory-engine-Thread-377323) []
FINISH,
> >MigrateBrokerVDSCommand, return: , log id:
6f0483ac
> >2020-02-14 12:33:53,388Z INFO
>
>[org.ovirt.engine.core.vdsbroker.MigrateVDSCommand]
> >(EE-ManagedThreadFactory-engine-Thread-377323) []
FINISH,
> >MigrateVDSCommand, return: MigratingFrom, log id:
d99059f
> >2020-02-14 12:33:53,391Z INFO
>>[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> >(EE-ManagedThreadFactory-engine-Thread-377323) []
EVENT_ID:
> >VM_MIGRATION_START(62), Migration started (VM:
HostedEngine,
Source:
> >ovirt-sj-03.ictv.com, Destination:
ovirt-sj-02.ictv.com, User:
> >mvrgotic@ictv.com(a)ictv.com-authz).
> >2020-02-14 12:33:55,108Z INFO
>
>[org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher]
> >(EE-ManagedThreadFactory-engineScheduled-Thread-96) []
Fetched 10
VMs
> >from VDS
'33e8ff78-e396-4f40-b43c-685bfaaee9af'
> >2020-02-14 12:33:55,110Z INFO
>
>[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
>
>(EE-ManagedThreadFactory-engineScheduled-Thread-96) [] VM
> >'66b6d489-ceb8-486a-951a-355e21f13627' is
migrating to VDS
>
>'33e8ff78-e396-4f40-b43c-685bfaaee9af'(ovirt-sj-02.ictv.com)
ignoring
> >it in the refresh until migration is done
> >2020-02-14 12:33:57,224Z INFO
>
>[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
>>(ForkJoinPool-1-worker-15) [] VM
>
'66b6d489-ceb8-486a-951a-355e21f13627'
> >was reported as Down on VDS
>
>'33e8ff78-e396-4f40-b43c-685bfaaee9af'(ovirt-sj-02.ictv.com)
> >2020-02-14 12:33:57,225Z INFO
>
>[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
> >(ForkJoinPool-1-worker-15) [] START,
DestroyVDSCommand(HostName =
> >ovirt-sj-02.ictv.com,
>>DestroyVmVDSCommandParameters:{hostId='33e8ff78-e396-4f40-b43c-685bfaaee9af',
> >vmId='66b6d489-ceb8-486a-951a-355e21f13627',
secondsToWait='0',
> >gracefully='false', reason='',
ignoreNoVm='true'}), log id:
1dec553e
> >2020-02-14 12:33:57,672Z INFO
>
>[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
> >(ForkJoinPool-1-worker-15) [] Failed to destroy
VM
> >'66b6d489-ceb8-486a-951a-355e21f13627' because VM
does not exist,
> >ignoring
> >2020-02-14 12:33:57,672Z INFO
>
>[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
>>(ForkJoinPool-1-worker-15) [] FINISH, DestroyVDSCommand,
return: ,
log
> >id: 1dec553e
> >2020-02-14 12:33:57,672Z INFO
>
>[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> >(ForkJoinPool-1-worker-15) [] VM
> >'66b6d489-ceb8-486a-951a-355e21f13627'(HostedEngine)
was
unexpectedly
> >detected as 'Down' on VDS
>>'33e8ff78-e396-4f40-b43c-685bfaaee9af'(ovirt-sj-02.ictv.com)
(expected
> >on
'f8d27efb-1527-45f0-97d6-d34a86abaaa2')
> >2020-02-14 12:33:57,672Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-15) [] Migration of VM 'HostedEngine' to host 'ovirt-sj-02.ictv.com' failed: VM destroyed during the startup.
> >2020-02-14 12:33:57,674Z INFO
>
>[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> >(ForkJoinPool-1-worker-8) [] VM
>
>'66b6d489-ceb8-486a-951a-355e21f13627'(HostedEngine) moved from
> >'MigratingFrom' --> 'Up'
> >2020-02-14 12:33:57,674Z INFO
>
>[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> >(ForkJoinPool-1-worker-8) [] Adding VM
> >'66b6d489-ceb8-486a-951a-355e21f13627'(HostedEngine)
to re-run list
> >2020-02-14 12:33:57,676Z ERROR
>
>[org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring]
> >(ForkJoinPool-1-worker-8) [] Rerun VM
> >'66b6d489-ceb8-48
I am afraid that your suspicions are right.
What is the host CPU, and what does the HostedEngine's XML look like?
Have you checked the XML of any working VM? What CPU flags do the
working VMs have?
How to solve it: I think I have a solution, but you might not like it.
1. Get the current VM XML with virsh
2. Set all nodes in maintenance: 'hosted-engine --set-maintenance --mode=global'
3. Stop and undefine the VM on the last working host
4. Edit the XML from step 1 and add/remove the flags that differ from the other (working) VMs
5. Define the HostedEngine on any of the updated hosts
6. Start the HostedEngine via virsh.
7. Try with different CPU flags until the engine starts.
8. Leave the engine for at least 12 hours, so it will have enough time to update its own configuration.
9. Remove the maintenance and migrate the engine to the other upgraded host
10. Patch the last HostedEngine's host
I have done this procedure in order to recover my engine (except
changing the CPU flags).
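The steps above can be outlined as a dry-run script. This is only a sketch under several assumptions: the XML path, the ENGINE_HOST value, the run and plan helpers, and the choice of md-clear as the flag to strip are all illustrative and not part of the procedure itself. The steps also run on different hosts in practice, so the saved XML has to be copied to the upgraded host before it is defined there. By default nothing is executed; each command is only printed.

```shell
#!/bin/sh
# Dry-run outline of the recovery steps. Commands are only printed
# unless DRY_RUN=0 is set explicitly.
DRY_RUN="${DRY_RUN:-1}"
XML="${XML:-/root/HostedEngine.xml}"      # hypothetical path for the saved definition
ENGINE_HOST="${ENGINE_HOST:-engine}"      # placeholder name of the engine VM
# Connection URI with the auth file that vdsm hosts use for virsh.
VIRSH_URI='qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

plan() {
    # Step 1: save the current definition on the host running the engine.
    run sh -c "virsh -c '$VIRSH_URI' dumpxml HostedEngine > '$XML'"
    # Step 2: global maintenance, so the HA agents leave the VM alone.
    run hosted-engine --set-maintenance --mode=global
    # Step 3: stop the engine, then undefine it on the last working host.
    run ssh "root@$ENGINE_HOST" poweroff
    run virsh -c "$VIRSH_URI" undefine HostedEngine
    # Step 4: drop the flag the upgraded hosts no longer offer
    # (md-clear is this thread's case; adjust for other flags).
    run sed -i "/name='md-clear'/d" "$XML"
    # Steps 5-6: on an upgraded host, define and start the VM.
    run virsh -c "$VIRSH_URI" define "$XML"
    run virsh -c "$VIRSH_URI" start HostedEngine
    # Step 9: only after the engine has had time to update its configuration.
    run hosted-engine --set-maintenance --mode=none
}

plan_output=$(plan)
printf '%s\n' "$plan_output"
```

Printing the plan first makes it easier to review the order of operations before touching the only copy of the engine VM.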
Note: You may hit some hiccups:
A) virsh alias
alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'
B) HostedEngine network missing:
[root@ovirt1 ~]# virsh net-dumpxml vdsm-ovirtmgmt
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>986c27cf-a1ec-44d8-ae61-ee09ce75c886</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>
Define in xml and add it via:
virsh net-define somefile.xml
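As a sketch of hiccup B, the definition can be written to a file and then registered. The temporary file name is arbitrary, the UUID is left out on the assumption that libvirt generates one when the element is missing, and the net-start and net-autostart commands are an addition beyond the net-define step mentioned above. The virsh commands are only printed here, not executed.

```shell
#!/bin/sh
# Write the bridge network definition to a file.
net_xml=$(mktemp)
cat > "$net_xml" <<'EOF'
<network>
  <name>vdsm-ovirtmgmt</name>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>
EOF

# Commands to run on the affected host (printed only):
echo "virsh net-define $net_xml"
echo "virsh net-start vdsm-ovirtmgmt"      # assumption: network is not yet active
echo "virsh net-autostart vdsm-ovirtmgmt"  # assumption: it should survive a reboot
```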
C) Missing disk
Vdsm is creating symlinks like these:
[root@ovirt1 808423f9-8a5c-40cd-bc9f-2568c85b8c74]# pwd
/var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74
[root@ovirt1 808423f9-8a5c-40cd-bc9f-2568c85b8c74]# ls -l
total 20
lrwxrwxrwx. 1 vdsm kvm 129 Feb 2 19:05 2c74697a-8bd9-4472-8a98-bf624f3462d5 -> /rhev/data-center/mnt/glusterSD/gluster1:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/2c74697a-8bd9-4472-8a98-bf624f3462d5
lrwxrwxrwx. 1 vdsm kvm 129 Feb 2 19:09 3ec27d6d-921c-4348-b799-f50543b6f919 -> /rhev/data-center/mnt/glusterSD/gluster1:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/3ec27d6d-921c-4348-b799-f50543b6f919
lrwxrwxrwx. 1 vdsm kvm 129 Feb 2 19:09 441abdc8-6cb1-49a4-903f-a1ec0ed88429 -> /rhev/data-center/mnt/glusterSD/gluster1:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429
lrwxrwxrwx. 1 vdsm kvm 129 Feb 2 19:09 94ade632-6ecc-4901-8cec-8e39f3d69cb0 -> /rhev/data-center/mnt/glusterSD/gluster1:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0
lrwxrwxrwx. 1 vdsm kvm 129 Feb 2 19:05 fe62a281-51e9-4b23-87b3-2deb52357304 -> /rhev/data-center/mnt/glusterSD/gluster1:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/fe62a281-51e9-4b23-87b3-2deb52357304
[root@ovirt1 808423f9-8a5c-40cd-bc9f-2568c85b8c74]#
Just create the link so that it points to the correct destination, and power
up again.
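Recreating such a link can be sketched as follows. ensure_image_link is a hypothetical helper, not part of vdsm, and the demonstration runs against a throwaway directory rather than the real /var/run/vdsm/storage and /rhev/data-center paths. On a real host the link would also need the vdsm:kvm ownership shown in the listing, which this sketch does not set.

```shell
#!/bin/sh
# Recreate a missing image link following the layout in the listing:
#   <run_dir>/<image-id> -> <images_dir>/<image-id>
ensure_image_link() {
    run_dir="$1"      # e.g. /var/run/vdsm/storage/<storage-domain-id>
    images_dir="$2"   # e.g. <mount point>/<storage-domain-id>/images
    image_id="$3"
    if [ -e "$run_dir/$image_id" ]; then
        return 0      # the link already resolves, nothing to do
    fi
    # -f replaces a dangling link; -n avoids descending into an existing one.
    ln -sfn "$images_dir/$image_id" "$run_dir/$image_id"
}

# Demonstration in a temporary directory with one image id from the listing.
demo=$(mktemp -d)
image=2c74697a-8bd9-4472-8a98-bf624f3462d5
mkdir -p "$demo/run" "$demo/images/$image"
ensure_image_link "$demo/run" "$demo/images" "$image"
ls -l "$demo/run"
```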
Good luck !
Best Regards,
Strahil Nikolov
Hi Marko,
If the other VMs work without issues -> it's worth trying.
Best Regards,
Strahil Nikolov