[ovirt-users] 4.2-rc hosted-engine doesn't boot, error: cannot allocate kernel buffer
Simone Tiraboschi
stirabos at redhat.com
Mon Dec 11 09:08:42 UTC 2017
On Mon, Dec 11, 2017 at 9:47 AM, Roberto Nunin <robnunin at gmail.com> wrote:
> Hello all
>
> during the weekend I re-tried to deploy my 4.2_rc lab.
> Everything was fine, apart from the fact that hosts 2 and 3 weren't
> imported. I had to add them to the cluster manually, with the NEW function.
> After this, Gluster volumes were added fine to the environment.
>
> Next, the engine deploy on nodes 2 and 3 ended with OK status.
>
> Trying to migrate HE from host 1 to host 2 was fine, and the same from
> host 2 to host 3.
>
> After these two attempts, there was no way to migrate HE back to any host.
> I set maintenance mode to global and rebooted the HE, and now I'm in the
> same condition reported below: no longer able to boot the HE.
>
> Here's hosted-engine --vm-status:
>
> !! Cluster is in GLOBAL MAINTENANCE mode !!
>
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : aps-te61-mng.example.com
> Host ID : 1
> Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : 7dfc420b
> local_conf_timestamp : 181953
> Host timestamp : 181952
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=181952 (Mon Dec 11 09:21:46 2017)
> host-id=1
> score=3400
> vm_conf_refresh_time=181953 (Mon Dec 11 09:21:47 2017)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
>
> --== Host 2 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : aps-te64-mng.example.com
> Host ID : 2
> Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : 67c7dd1d
> local_conf_timestamp : 181946
> Host timestamp : 181946
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=181946 (Mon Dec 11 09:21:49 2017)
> host-id=2
> score=3400
> vm_conf_refresh_time=181946 (Mon Dec 11 09:21:49 2017)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
>
> --== Host 3 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : aps-te68-mng.example.com
> Host ID : 3
> Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : 4daea041
> local_conf_timestamp : 181078
> Host timestamp : 181078
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=181078 (Mon Dec 11 09:21:53 2017)
> host-id=3
> score=3400
> vm_conf_refresh_time=181078 (Mon Dec 11 09:21:53 2017)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
>
> !! Cluster is in GLOBAL MAINTENANCE mode !!
>
> (It is in global maintenance to avoid messages being sent to the admin
> mailbox.)
>
As soon as you exit global maintenance mode, one of the hosts should take
care of automatically restarting the engine VM within a couple of minutes.
If you want to manually start the engine VM on a specific host while in
maintenance mode, you can run:
   hosted-engine --vm-start
on that host.
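For reference, the two options would look something like this, run as root
on a hosted-engine host (--set-maintenance is the usual helper for changing
the maintenance mode):

   # leave global maintenance and let the HA agents restart the
   # engine VM on the best-scoring host on their own:
   hosted-engine --set-maintenance --mode=none

   # or, while still in global maintenance, start the VM by hand
   # on the host you want it to run on:
   hosted-engine --vm-start

   # in either case, watch the result:
   hosted-engine --vm-status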
>
> The engine image is available on all three hosts, and gluster is working fine:
>
> Volume Name: engine
> Type: Replicate
> Volume ID: 95355a0b-1f45-4329-95c7-604682e812d0
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: aps-te61-mng.example.com:/gluster_bricks/engine/engine
> Brick2: aps-te64-mng.example.com:/gluster_bricks/engine/engine
> Brick3: aps-te68-mng.example.com:/gluster_bricks/engine/engine
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> storage.owner-uid: 36
> storage.owner-gid: 36
> network.ping-timeout: 30
> performance.strict-o-direct: on
> cluster.granular-entry-heal: enable
> features.shard-block-size: 64MB
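>
> For completeness, pending heals on the engine volume can be ruled out
> with, e.g.:
>
>   gluster volume heal engine info
>
> which on a healthy replica should report "Number of entries: 0" for each
> brick.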
>
> The engine qemu image seems to be OK:
>
> [root@aps-te68-mng 9998de26-8a5c-4495-b450-c8dfc1e016da]# l
> total 2660796
> -rw-rw----. 1 vdsm kvm 53687091200 Dec 8 17:22 35ac0f88-e97d-4710-a385-127c751a3190
> -rw-rw----. 1 vdsm kvm 1048576 Dec 11 09:04 35ac0f88-e97d-4710-a385-127c751a3190.lease
> -rw-r--r--. 1 vdsm kvm 285 Dec 8 11:19 35ac0f88-e97d-4710-a385-127c751a3190.meta
> [root@aps-te68-mng 9998de26-8a5c-4495-b450-c8dfc1e016da]# qemu-img info 35ac0f88-e97d-4710-a385-127c751a3190
> image: 35ac0f88-e97d-4710-a385-127c751a3190
> file format: raw
> virtual size: 50G (53687091200 bytes)
> disk size: 2.5G
> [root@aps-te68-mng 9998de26-8a5c-4495-b450-c8dfc1e016da]#
>
> I've attached the agent and broker logs from the last host where HE
> startup was attempted (only the last two hours).
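> (For reference, assuming the default locations, those are
> /var/log/ovirt-hosted-engine-ha/agent.log and
> /var/log/ovirt-hosted-engine-ha/broker.log on the host.)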
>
> Any hints on how to investigate further?
>
> Thanks in advance.
>
> Environment: 3 HPE ProLiant BL680c G7, OS on mirrored volume 1, gluster on
> mirrored volume 2, 1 TB for each server.
> Multiple network adapters (6), only one configured.
> I used the latest ovirt-node-ng iso image,
> ovirt-node-ng-installer-ovirt-4.2-pre-2017120512.iso, and
> ovirt-hosted-engine-setup-2.2.1-0.0.master.20171206123553.git94f4c9e.el7.centos.noarch
> to work around the HE static IP address not being masked correctly.
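>
> For the record, a hypothetical answer-file snippet for the engine VM's
> static address in CIDR form (key name as found in recent
> ovirt-hosted-engine-setup answer files; treat it as an assumption, and the
> address is purely illustrative):
>
>   [environment:default]
>   OVEHOSTED_VM/cloudinitVMStaticCIDR=str:192.168.122.10/24
>
> fed to the setup with:
>
>   hosted-engine --deploy --config-append=/root/he-answers.conf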
>
>
> --
> Roberto
>
> 2017-12-07 12:44 GMT+01:00 Roberto Nunin <robnunin at gmail.com>:
>
>> Hi
>>
>> after successfully deploying a fresh 4.2_rc with oVirt Node, I'm facing
>> a blocking problem.
>>
>> hosted-engine won't boot. Reaching the console via the VNC hook, I can
>> see that it is at the initial boot screen, but for any OS release
>> available, I receive:
>>
>> [image: Inline image 1]
>> then
>> [image: Inline image 2]
>>
>> Googling around, I'm not able to find suggestions. Any hints?
>>
>> Thanks
>>
>> --
>> Roberto