[ovirt-users] 4.2 RC hosted-engine doesn't boot, error: cannot allocate kernel buffer

Martin Sivak msivak at redhat.com
Mon Dec 11 11:14:06 UTC 2017


Hi,

we also have a proper fix now and will release it with the next RC build.

Best regards

--
Martin Sivak
SLA / oVirt

On Mon, Dec 11, 2017 at 12:10 PM, Maton, Brett <matonb at ltresources.co.uk>
wrote:

> Really short version (I can't find the link to the oVirt doc at the
> moment):
>
> Put the hosted engine into global maintenance and power off the VM
> (using the hosted-engine command).
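>
> For that step, something like the following should work (assuming the
> standard hosted-engine CLI options; please double-check them against
> your version):
>
> hosted-engine --set-maintenance --mode=global
> hosted-engine --vm-poweroff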
>
> On one of your physical hosts, make a copy of the HE config and update
> the memory:
>
> cp /var/run/ovirt-hosted-engine-ha/vm.conf .
> vim vm.conf
>
> Then start the hosted engine with the new config:
>
>
> hosted-engine --vm-start --vm-conf=./vm.conf
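>
> The value to change should be the memSize entry (in MiB, if I remember
> correctly; please verify against your own vm.conf), for example:
>
> memSize=4096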
>
>
>
> On 11 December 2017 at 10:36, Roberto Nunin <robnunin at gmail.com> wrote:
>
>>
>>
>> 2017-12-11 10:32 GMT+01:00 Maton, Brett <matonb at ltresources.co.uk>:
>>
>>> Hi Roberto, can you check how much RAM is allocated to the HE VM?
>>>
>>>
>>> virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf
>>>
>>> virsh # dominfo HostedEngine
>>>
>>>
>>> The last update I did seems to have changed the HE RAM from 4GB to 4MB!
>>>
>>
>>
>> Yes, you're right...
>>
>> virsh # dominfo HostedEngine
>> Id:             191
>> Name:           HostedEngine
>> UUID:           6831dd96-af48-4673-ac98-f1b9ba60754b
>> OS Type:        hvm
>> State:          running
>> CPU(s):         4
>> CPU time:       9053.7s
>> Max memory:     4096 KiB
>> Used memory:    4096 KiB
>> Persistent:     yes
>> Autostart:      disable
>> Managed save:   no
>> Security model: selinux
>> Security DOI:   0
>> Security label: system_u:system_r:svirt_t:s0:c201,c408 (enforcing)
>>
>>
>>>
>>> On 11 December 2017 at 09:08, Simone Tiraboschi <stirabos at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Dec 11, 2017 at 9:47 AM, Roberto Nunin <robnunin at gmail.com>
>>>> wrote:
>>>>
>>>>> Hello all
>>>>>
>>>>> During the weekend I re-tried to deploy my 4.2_rc lab.
>>>>> Everything was fine, apart from the fact that hosts 2 and 3 weren't
>>>>> imported. I had to add them to the cluster manually, with the NEW
>>>>> function. After this, the Gluster volumes were added to the
>>>>> environment fine.
>>>>>
>>>>> Next, the engine deploy on nodes 2 and 3 ended with an OK status.
>>>>>
>>>>> Migrating HE from host 1 to host 2 was fine, and the same from host
>>>>> 2 to host 3.
>>>>>
>>>>> After these two attempts, there was no way to migrate HE back to any
>>>>> host. I tried setting maintenance mode to global and rebooting the
>>>>> HE, and now I'm in the same condition reported below, no longer able
>>>>> to boot the HE.
>>>>>
>>>>> Here's hosted-engine --vm-status:
>>>>>
>>>>> !! Cluster is in GLOBAL MAINTENANCE mode !!
>>>>>
>>>>>
>>>>>
>>>>> --== Host 1 status ==--
>>>>>
>>>>> conf_on_shared_storage             : True
>>>>> Status up-to-date                  : True
>>>>> Hostname                           : aps-te61-mng.example.com
>>>>> Host ID                            : 1
>>>>> Engine status                      : {"reason": "vm not running on
>>>>> this host", "health": "bad", "vm": "down", "detail": "unknown"}
>>>>> Score                              : 3400
>>>>> stopped                            : False
>>>>> Local maintenance                  : False
>>>>> crc32                              : 7dfc420b
>>>>> local_conf_timestamp               : 181953
>>>>> Host timestamp                     : 181952
>>>>> Extra metadata (valid at timestamp):
>>>>>         metadata_parse_version=1
>>>>>         metadata_feature_version=1
>>>>>         timestamp=181952 (Mon Dec 11 09:21:46 2017)
>>>>>         host-id=1
>>>>>         score=3400
>>>>>         vm_conf_refresh_time=181953 (Mon Dec 11 09:21:47 2017)
>>>>>         conf_on_shared_storage=True
>>>>>         maintenance=False
>>>>>         state=GlobalMaintenance
>>>>>         stopped=False
>>>>>
>>>>>
>>>>> --== Host 2 status ==--
>>>>>
>>>>> conf_on_shared_storage             : True
>>>>> Status up-to-date                  : True
>>>>> Hostname                           : aps-te64-mng.example.com
>>>>> Host ID                            : 2
>>>>> Engine status                      : {"reason": "vm not running on
>>>>> this host", "health": "bad", "vm": "down", "detail": "unknown"}
>>>>> Score                              : 3400
>>>>> stopped                            : False
>>>>> Local maintenance                  : False
>>>>> crc32                              : 67c7dd1d
>>>>> local_conf_timestamp               : 181946
>>>>> Host timestamp                     : 181946
>>>>> Extra metadata (valid at timestamp):
>>>>>         metadata_parse_version=1
>>>>>         metadata_feature_version=1
>>>>>         timestamp=181946 (Mon Dec 11 09:21:49 2017)
>>>>>         host-id=2
>>>>>         score=3400
>>>>>         vm_conf_refresh_time=181946 (Mon Dec 11 09:21:49 2017)
>>>>>         conf_on_shared_storage=True
>>>>>         maintenance=False
>>>>>         state=GlobalMaintenance
>>>>>         stopped=False
>>>>>
>>>>>
>>>>> --== Host 3 status ==--
>>>>>
>>>>> conf_on_shared_storage             : True
>>>>> Status up-to-date                  : True
>>>>> Hostname                           : aps-te68-mng.example.com
>>>>> Host ID                            : 3
>>>>> Engine status                      : {"reason": "failed liveliness
>>>>> check", "health": "bad", "vm": "up", "detail": "Up"}
>>>>> Score                              : 3400
>>>>> stopped                            : False
>>>>> Local maintenance                  : False
>>>>> crc32                              : 4daea041
>>>>> local_conf_timestamp               : 181078
>>>>> Host timestamp                     : 181078
>>>>> Extra metadata (valid at timestamp):
>>>>>         metadata_parse_version=1
>>>>>         metadata_feature_version=1
>>>>>         timestamp=181078 (Mon Dec 11 09:21:53 2017)
>>>>>         host-id=3
>>>>>         score=3400
>>>>>         vm_conf_refresh_time=181078 (Mon Dec 11 09:21:53 2017)
>>>>>         conf_on_shared_storage=True
>>>>>         maintenance=False
>>>>>         state=GlobalMaintenance
>>>>>         stopped=False
>>>>>
>>>>>
>>>>> !! Cluster is in GLOBAL MAINTENANCE mode !!
>>>>>
>>>>> (It is in global maintenance to avoid messages being sent to the
>>>>> admin mailbox.)
>>>>>
>>>>
>>>> As soon as you exit the global maintenance mode, one of the hosts
>>>> should take care of automatically restarting the engine VM within a couple
>>>> of minutes.
>>>>
>>>> If you want to manually start the engine VM on a specific host while
>>>> in maintenance mode, you can run:
>>>>     hosted-engine --vm-start
>>>> on that host.
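>>>>
>>>> For completeness, once you want the HA agents to take over again, you
>>>> can exit global maintenance from any host with:
>>>>     hosted-engine --set-maintenance --mode=none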
>>>>
>>>>
>>>>>
>>>>> The engine image is available on all three hosts, and gluster is
>>>>> working fine:
>>>>>
>>>>> Volume Name: engine
>>>>> Type: Replicate
>>>>> Volume ID: 95355a0b-1f45-4329-95c7-604682e812d0
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 1 x 3 = 3
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: aps-te61-mng.example.com:/gluster_bricks/engine/engine
>>>>> Brick2: aps-te64-mng.example.com:/gluster_bricks/engine/engine
>>>>> Brick3: aps-te68-mng.example.com:/gluster_bricks/engine/engine
>>>>> Options Reconfigured:
>>>>> nfs.disable: on
>>>>> transport.address-family: inet
>>>>> performance.quick-read: off
>>>>> performance.read-ahead: off
>>>>> performance.io-cache: off
>>>>> performance.low-prio-threads: 32
>>>>> network.remote-dio: off
>>>>> cluster.eager-lock: enable
>>>>> cluster.quorum-type: auto
>>>>> cluster.server-quorum-type: server
>>>>> cluster.data-self-heal-algorithm: full
>>>>> cluster.locking-scheme: granular
>>>>> cluster.shd-max-threads: 8
>>>>> cluster.shd-wait-qlength: 10000
>>>>> features.shard: on
>>>>> user.cifs: off
>>>>> storage.owner-uid: 36
>>>>> storage.owner-gid: 36
>>>>> network.ping-timeout: 30
>>>>> performance.strict-o-direct: on
>>>>> cluster.granular-entry-heal: enable
>>>>> features.shard-block-size: 64MB
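>>>>>
>>>>> (If it helps, replica health can also be double-checked with the
>>>>> standard gluster CLI, e.g.:
>>>>>
>>>>> gluster volume status engine
>>>>> gluster volume heal engine info
>>>>>
>>>>> the exact output may vary slightly between gluster versions.)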
>>>>>
>>>>> The engine qemu image seems to be OK:
>>>>>
>>>>> [root at aps-te68-mng 9998de26-8a5c-4495-b450-c8dfc1e016da]# l
>>>>> total 2660796
>>>>> -rw-rw----. 1 vdsm kvm 53687091200 Dec  8 17:22
>>>>> 35ac0f88-e97d-4710-a385-127c751a3190
>>>>> -rw-rw----. 1 vdsm kvm     1048576 Dec 11 09:04
>>>>> 35ac0f88-e97d-4710-a385-127c751a3190.lease
>>>>> -rw-r--r--. 1 vdsm kvm         285 Dec  8 11:19
>>>>> 35ac0f88-e97d-4710-a385-127c751a3190.meta
>>>>> [root at aps-te68-mng 9998de26-8a5c-4495-b450-c8dfc1e016da]# qemu-img
>>>>> info 35ac0f88-e97d-4710-a385-127c751a3190
>>>>> image: 35ac0f88-e97d-4710-a385-127c751a3190
>>>>> file format: raw
>>>>> virtual size: 50G (53687091200 bytes)
>>>>> disk size: 2.5G
>>>>> [root at aps-te68-mng 9998de26-8a5c-4495-b450-c8dfc1e016da]#
>>>>>
>>>>> Attached are the agent and broker logs from the last host where HE
>>>>> startup was attempted (only the last two hours).
>>>>>
>>>>> Any hints for further investigation?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Environment: 3 HPE ProLiant BL680c G7, OS on mirrored volume 1,
>>>>> gluster on mirrored volume 2, 1 TB for each server.
>>>>> Multiple network adapters (6), only one configured.
>>>>> I've used the latest ovirt-node-ng ISO image,
>>>>> ovirt-node-ng-installer-ovirt-4.2-pre-2017120512.iso, and
>>>>> ovirt-hosted-engine-setup-2.2.1-0.0.master.20171206123553.git94f4c9e.el7.centos.noarch
>>>>> to work around the HE static IP address not being masked correctly.
>>>>>
>>>>> --
>>>>> Roberto
>>>>>
>>>>> 2017-12-07 12:44 GMT+01:00 Roberto Nunin <robnunin at gmail.com>:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> after successfully deploying a fresh 4.2_rc with oVirt Node, I'm
>>>>>> facing a blocking problem.
>>>>>>
>>>>>> hosted-engine won't boot. Reaching the console via the VNC hook, I
>>>>>> can see that it is at the initial boot screen, but for any OS
>>>>>> release available, I receive:
>>>>>>
>>>>>> [image: embedded image 1]
>>>>>> then
>>>>>> [image: embedded image 2]
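>>>>>>
>>>>>> (In case it's useful for reproducing this: a temporary console
>>>>>> password can usually be set with
>>>>>>     hosted-engine --add-console-password
>>>>>> and then a VNC client pointed at the host, e.g.
>>>>>>     remote-viewer vnc://<host>:5900
>>>>>> where the 5900 port is just an assumption and may differ.)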
>>>>>>
>>>>>> Googling around, I'm not able to find suggestions. Any hints?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> --
>>>>>> Roberto
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Roberto Nunin
>>
>>
>>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>