[ovirt-users] 4-2rc hosted-engine doesn't boot, error: cannot allocate kernel buffer

Artyom Lukianov alukiano at redhat.com
Mon Dec 11 09:43:58 UTC 2017


I opened a bug because I had the same issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1524331.

Best Regards


On Mon, Dec 11, 2017 at 11:32 AM, Maton, Brett <matonb at ltresources.co.uk>
wrote:

> Hi Roberto, can you check how much RAM is allocated to the HE VM?
>
>
> virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf
>
> virsh # dominfo HostedEngine
>
>
> The last update I did seems to have changed the HE RAM from 4GB to 4MB!
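> (For reference, dominfo reports memory in KiB, so a 4 GB HE VM should show
> something like "Max memory: 4194304 KiB"; a 4 MB one would show only 4096 KiB.)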
>
>
> On 11 December 2017 at 09:08, Simone Tiraboschi <stirabos at redhat.com>
> wrote:
>
>>
>>
>> On Mon, Dec 11, 2017 at 9:47 AM, Roberto Nunin <robnunin at gmail.com>
>> wrote:
>>
>>> Hello all
>>>
>>> during the weekend, I re-tried to deploy my 4.2_rc lab.
>>> Everything was fine, apart from the fact that hosts 2 and 3 weren't imported.
>>> I had to add them to the cluster manually, with the NEW function.
>>> After this Gluster volumes were added fine to the environment.
>>>
>>> The subsequent engine deploy on nodes 2 and 3 ended with OK status.
>>>
>>> Trying to migrate the HE from host 1 to host 2 was fine, and the same from
>>> host 2 to host 3.
>>>
>>> After these two attempts, there was no way to migrate the HE back to any host.
>>> I set maintenance mode to global and rebooted the HE, and now I'm in the
>>> same condition reported below: no longer able to boot the HE.
>>>
>>> Here's hosted-engine --vm-status:
>>>
>>> !! Cluster is in GLOBAL MAINTENANCE mode !!
>>>
>>>
>>>
>>> --== Host 1 status ==--
>>>
>>> conf_on_shared_storage             : True
>>> Status up-to-date                  : True
>>> Hostname                           : aps-te61-mng.example.com
>>> Host ID                            : 1
>>> Engine status                      : {"reason": "vm not running on this
>>> host", "health": "bad", "vm": "down", "detail": "unknown"}
>>> Score                              : 3400
>>> stopped                            : False
>>> Local maintenance                  : False
>>> crc32                              : 7dfc420b
>>> local_conf_timestamp               : 181953
>>> Host timestamp                     : 181952
>>> Extra metadata (valid at timestamp):
>>>         metadata_parse_version=1
>>>         metadata_feature_version=1
>>>         timestamp=181952 (Mon Dec 11 09:21:46 2017)
>>>         host-id=1
>>>         score=3400
>>>         vm_conf_refresh_time=181953 (Mon Dec 11 09:21:47 2017)
>>>         conf_on_shared_storage=True
>>>         maintenance=False
>>>         state=GlobalMaintenance
>>>         stopped=False
>>>
>>>
>>> --== Host 2 status ==--
>>>
>>> conf_on_shared_storage             : True
>>> Status up-to-date                  : True
>>> Hostname                           : aps-te64-mng.example.com
>>> Host ID                            : 2
>>> Engine status                      : {"reason": "vm not running on this
>>> host", "health": "bad", "vm": "down", "detail": "unknown"}
>>> Score                              : 3400
>>> stopped                            : False
>>> Local maintenance                  : False
>>> crc32                              : 67c7dd1d
>>> local_conf_timestamp               : 181946
>>> Host timestamp                     : 181946
>>> Extra metadata (valid at timestamp):
>>>         metadata_parse_version=1
>>>         metadata_feature_version=1
>>>         timestamp=181946 (Mon Dec 11 09:21:49 2017)
>>>         host-id=2
>>>         score=3400
>>>         vm_conf_refresh_time=181946 (Mon Dec 11 09:21:49 2017)
>>>         conf_on_shared_storage=True
>>>         maintenance=False
>>>         state=GlobalMaintenance
>>>         stopped=False
>>>
>>>
>>> --== Host 3 status ==--
>>>
>>> conf_on_shared_storage             : True
>>> Status up-to-date                  : True
>>> Hostname                           : aps-te68-mng.example.com
>>> Host ID                            : 3
>>> Engine status                      : {"reason": "failed liveliness
>>> check", "health": "bad", "vm": "up", "detail": "Up"}
>>> Score                              : 3400
>>> stopped                            : False
>>> Local maintenance                  : False
>>> crc32                              : 4daea041
>>> local_conf_timestamp               : 181078
>>> Host timestamp                     : 181078
>>> Extra metadata (valid at timestamp):
>>>         metadata_parse_version=1
>>>         metadata_feature_version=1
>>>         timestamp=181078 (Mon Dec 11 09:21:53 2017)
>>>         host-id=3
>>>         score=3400
>>>         vm_conf_refresh_time=181078 (Mon Dec 11 09:21:53 2017)
>>>         conf_on_shared_storage=True
>>>         maintenance=False
>>>         state=GlobalMaintenance
>>>         stopped=False
>>>
>>>
>>> !! Cluster is in GLOBAL MAINTENANCE mode !!
>>>
>>> (it is in global maintenance to avoid messages being sent to the admin
>>> mailbox).
>>>
>>
>> As soon as you exit the global maintenance mode, one of the hosts should
>> take care of automatically restarting the engine VM within a couple of
>> minutes.
>>
>> If you want to manually start the engine VM on a specific host while in
>> maintenance mode, you can run:
>>     hosted-engine --vm-start
>> on that host.
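>> For example, to leave global maintenance and let HA restart it automatically:
>>     hosted-engine --set-maintenance --mode=none
>> and then follow the recovery with:
>>     hosted-engine --vm-status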
>>
>>
>>>
>>> Engine image is available on all three hosts, gluster is working fine:
>>>
>>> Volume Name: engine
>>> Type: Replicate
>>> Volume ID: 95355a0b-1f45-4329-95c7-604682e812d0
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: aps-te61-mng.example.com:/gluster_bricks/engine/engine
>>> Brick2: aps-te64-mng.example.com:/gluster_bricks/engine/engine
>>> Brick3: aps-te68-mng.example.com:/gluster_bricks/engine/engine
>>> Options Reconfigured:
>>> nfs.disable: on
>>> transport.address-family: inet
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: off
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 10000
>>> features.shard: on
>>> user.cifs: off
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> network.ping-timeout: 30
>>> performance.strict-o-direct: on
>>> cluster.granular-entry-heal: enable
>>> features.shard-block-size: 64MB
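>>> (Pending heals can be double-checked with "gluster volume heal engine info",
>>> if useful.)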
>>>
>>> Engine qemu image seems to be ok:
>>>
>>> [root at aps-te68-mng 9998de26-8a5c-4495-b450-c8dfc1e016da]# l
>>> total 2660796
>>> -rw-rw----. 1 vdsm kvm 53687091200 Dec  8 17:22
>>> 35ac0f88-e97d-4710-a385-127c751a3190
>>> -rw-rw----. 1 vdsm kvm     1048576 Dec 11 09:04
>>> 35ac0f88-e97d-4710-a385-127c751a3190.lease
>>> -rw-r--r--. 1 vdsm kvm         285 Dec  8 11:19
>>> 35ac0f88-e97d-4710-a385-127c751a3190.meta
>>> [root at aps-te68-mng 9998de26-8a5c-4495-b450-c8dfc1e016da]# qemu-img info
>>> 35ac0f88-e97d-4710-a385-127c751a3190
>>> image: 35ac0f88-e97d-4710-a385-127c751a3190
>>> file format: raw
>>> virtual size: 50G (53687091200 bytes)
>>> disk size: 2.5G
>>> [root at aps-te68-mng 9998de26-8a5c-4495-b450-c8dfc1e016da]#
>>>
>>> Attached are the agent and broker logs from the last host where HE startup
>>> was attempted (only the last two hours).
>>>
>>> Any hints for further investigation?
>>>
>>> Thanks in advance.
>>>
>>> Environment: 3 HPE ProLiant BL680c G7, OS on mirrored volume 1, gluster
>>> on mirrored volume 2, 1 TB for each server.
>>> Multiple network adapters (6), only one configured.
>>> I've used the latest ovirt-node-ng ISO image, ovirt-node-ng-installer-ovirt-
>>> 4.2-pre-2017120512.iso, together with ovirt-hosted-engine-setup-
>>> 2.2.1-0.0.master.20171206123553.git94f4c9e.el7.centos.noarch to work
>>> around the HE static IP address not being masked correctly.
>>>
>>> --
>>> Roberto
>>>
>>> 2017-12-07 12:44 GMT+01:00 Roberto Nunin <robnunin at gmail.com>:
>>>
>>>> Hi
>>>>
>>>> after successfully deploying a fresh 4.2_rc with oVirt Node, I'm facing
>>>> a blocking problem.
>>>>
>>>> The hosted-engine won't boot. Reaching the console via the VNC hook, I can
>>>> see that it is at the initial boot screen, but for any OS release available,
>>>> I receive:
>>>>
>>>> [image: Embedded image 1]
>>>> then
>>>> [image: Embedded image 2]
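>>>> (If it helps, the HE console can also be reached with
>>>> "hosted-engine --add-console-password" followed by "hosted-engine --console",
>>>> as an alternative to the VNC hook.)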
>>>>
>>>> Googling around, I'm not able to find suggestions. Any hints?
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> Roberto
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ovirt.org/pipermail/users/attachments/20171211/8e0df97f/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 18183 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20171211/8e0df97f/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 9015 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20171211/8e0df97f/attachment-0001.png>


More information about the Users mailing list