Hosted Engine VM - CPU and Memory Sizing
by paul.christian.suba@cevalogistics.com
Hi,
Is there a recommended CPU and memory size for the Hosted Engine VM? We have what started as a 4-node physical cluster lab with 4 VMs and has now grown to 44 VMs. The dashboard is slow to load information, and the HE VM is consistently seen at 99% CPU, with the breakdown below.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12076 postgres 20 0 506516 276172 143772 R 99.3 1.7 206:17.15 postmaster
10337 postgres 20 0 484756 254156 144000 R 99.0 1.6 179:52.35 postmaster
38603 postgres 20 0 500068 267836 143992 S 70.1 1.6 528:04.13 postmaster
49217 postgres 20 0 468736 235484 143624 S 19.3 1.4 41:57.55 postmaster
5569 ovirt 20 0 6430912 2.3g 6368 S 1.3 14.5 894:20.75 java
We used the default 4 CPU and 16 GB RAM.
This is oVirt 4.3.1.
I am also curious to find out whether it is normal for the postgres processes to be using 99% CPU.
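A minimal sketch for checking what those postgres backends are actually doing, assuming the engine database is named "engine" and that psql can be run as the postgres user on the HE VM (on 4.3 the PostgreSQL binaries may live in a Software Collection, so the exact invocation can differ):
# su - postgres
$ psql engine -c "SELECT pid, state, now() - query_start AS runtime, left(query, 80) AS query FROM pg_stat_activity WHERE state <> 'idle' ORDER BY runtime DESC;"
Long-running rows here show which queries are keeping those postmaster processes busy, which is more useful than the top output alone when deciding whether it is a sizing problem.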
ovirt-engine-appliance ova
by jingjie.jiang@oracle.com
Hi,
Can someone tell me how to generate the ovirt-engine-appliance OVA file that is shipped in ovirt-engine-appliance-4.3-20190610.1.el7.x86_64.rpm?
I tried to import the ovirt-engine-appliance OVA (ovirt-engine-appliance-4.3-20190610.1.el7.ova) from ovirt-engine, but I got the following error:
Failed to load VM configuration from OVA file: /var/tmp/ovirt-engine-appliance-4.2-20190121.1.el7.ova
I guess ovirt-engine-appliance-4.2-20190121.1.el7.ova contains more than just CentOS 7.6.
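For what it's worth, the OVA is not generated at install time; it ships inside the rpm, so a sketch of how to pull it out (the grep pattern and install path are assumptions and may differ per version):
# rpm2cpio ovirt-engine-appliance-4.3-20190610.1.el7.x86_64.rpm | cpio -idmv
or, if the package is already installed:
# rpm -ql ovirt-engine-appliance | grep '\.ova$'
If you really need to rebuild the appliance image from scratch, that is done outside this rpm (as far as I know, via the oVirt appliance build project), not by anything contained in the package itself.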
Thanks,
Jingjie
Re: major network changes
by Strahil
According to another post on the mailing list, the Engine hosts (those that have ovirt-ha-agent/ovirt-ha-broker running) check http://{fqdn}/ovirt-engine/services/health
As the IP has changed, I think you need to check the URL before and after the migration.
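A quick sketch for testing that health URL from an HA host (the FQDN is a placeholder; -L follows the http-to-https redirect and -k skips CA verification if the engine CA is not installed on the host):
# curl -skL http://ovirt-engine.example.com/ovirt-engine/services/health
An HTTP 200 with a short health-status message means the liveness check should pass; a timeout, or a response only from the old address, points at stale name resolution or routing.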
Best Regards,
Strahil Nikolov
On Jul 23, 2019 16:41, Derek Atkins <derek(a)ihtfp.com> wrote:
>
> Hi,
>
> If I understand it correctly, the HE Hosts try to ping (or SSH, or
> otherwise reach) the Engine host. If it reaches it, then it passes the
> liveness check. If it cannot reach it, then it fails. So to me this error
> means that there is some configuration, somewhere, that is trying to reach
> the engine on the old address (which fails when the engine has the new
> address).
>
> I do not know where in the *host* configuration this data lives, so I
> cannot suggest where you need to change it.
>
> Can 10.16.248.x reach 10.8.236.x and vice-versa?
>
> Maybe multi-home the engine on both networks for now until you figure it out?
>
> -derek
>
> On Tue, July 23, 2019 9:13 am, carl langlois wrote:
> > Hi,
> >
> > We have managed to stabilize the DNS update in our network. Now the current
> > situation is:
> > I have 3 hosts that can run the engine (hosted-engine).
> > They were all in the 10.8.236.x network. Now I have moved one of them to the
> > 10.16.248.x network.
> >
> > If I boot the engine on one of the hosts in the 10.8.236.x network, the
> > engine comes up with status "good". I can access the engine UI, and I can
> > see all my hosts, even the one in the 10.16.248.x network.
> >
> > But if I boot the engine on the hosted-engine host that was switched to the
> > 10.16.248.x network, the engine boots and I can SSH to it, but the status is
> > always "fail for liveliness check".
> > The main difference is that when I boot on the host that is in the
> > 10.16.248.x network, the engine gets an address in the 248.x network.
> >
> > On the engine I have this in
> > /var/log/ovirt-engine-dwh/ovirt-engine-dwhd.log:
> > 2019-07-23
> > 09:05:30|MFzehi|YYTDiS|jTq2w8|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|5|tWarn|tWarn_1|Can
> > not sample data, oVirt Engine is not updating the statistics. Please check
> > your oVirt Engine status.|9704
> > The engine.log seems okay.
> >
> > So I need to understand what this "liveliness check" does (or tries to do) so
> > I can investigate why the engine status is not becoming good.
> >
> > The initial deployment was done in the 10.8.236.x network. Maybe it has
> > something to do with that.
> >
> > Thanks & Regards
> >
> > Carl
> >
> > On Thu, Jul 18, 2019 at 8:53 AM Miguel Duarte de Mora Barroso <
> > mdbarroso(a)redhat.com> wrote:
> >
> >> On Thu, Jul 18, 2019 at 2:50 PM Miguel Duarte de Mora Barroso
> >> <mdbarroso(a)redhat.com> wrote:
> >> >
> >> > On Thu, Jul 18, 2019 at 1:57 PM carl langlois <crl.langlois(a)gmail.com>
> >> wrote:
> >> > >
> >> > > Hi Miguel,
> >> > >
> >> > > I have managed to change the config for the ovn-controller
> >> > > with these commands:
> >> > > ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=ssl:
> >> 10.16.248.74:6642
> >> > > ovs-vsctl set Open_vSwitch . external-ids:ovn-encap-ip=10.16.248.65
> >> > > and restarting the services.
> >> >
> >> > Yes, that's what the script is supposed to do, check [0].
> >> >
> >> > Not sure why running vdsm-tool didn't work for you.
> >> >
> >> > >
> >> > > But even with this I still have the "fail for liveliness check" when
> >> starting the oVirt engine. But one thing I noticed with our new network is
> >> that the reverse DNS does not work (IP -> hostname). The forward lookup is
> >> working fine. I am trying to see with our IT why it is not working.
> >> >
> >> > Do you guys use OVN? If not, you could disable the provider, install
> >> > the hosted-engine VM, then, if needed, re-add / re-activate it.
> >>
> >> I'm assuming it fails for the same reason you've stated initially -
> >> i.e. ovn-controller is involved; if it is not, disregard this msg :)
> >> >
> >> > [0] -
> >> https://github.com/oVirt/ovirt-provider-ovn/blob/master/driver/scripts/se...
> >> >
> >> > >
> >> > > Regards.
> >> > > Carl
> >>
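For reference, the two ovs-vsctl settings quoted above are what the driver script at [0] is meant to configure; a sketch of the equivalent single command plus a quick check, using the IPs from the thread and assuming the vdsm-tool verb shipped by ovirt-provider-ovn-driver is present on the host:
# vdsm-tool ovn-config 10.16.248.74 10.16.248.65
# ovs-vsctl get Open_vSwitch . external_ids
The first argument is the OVN central (engine) IP and the second is the host's local tunnel IP; the second command should show the ovn-remote and ovn-encap-ip values that were just set.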
Active-Passive DR: mutual for different storage domains possible?
by Gianluca Cecchi
Hello,
suppose I want to implement Active-Passive DR between 2 sites.
Sites are SiteA and SiteB.
I have 2 storage domains, SD1 and SD2, that I can configure so that SD1 is
active on the storage array installed in SiteA with a replica in SiteB, and SD2
the reverse.
I have 4 hosts: host1 and host2 in SiteA and host3 and host4 in SiteB.
I would like to optimize compute resources and workload so that:
oVirt env OV1 with ovmgr1 in SiteA (external engine) is composed of host1
and host2 and configured with SD1;
oVirt env OV2 with ovmgr2 in SiteB (external engine) is composed of host3
and host4 and configured with SD2.
Can I use OV2 as DR for OV1 for VMs installed on SD1 and at the same time
OV1 as DR for OV2 for VMs installed on SD2?
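Not a definitive answer, but a sketch of how the tooling is usually driven: the active-passive flow is handled by the ovirt-ansible-disaster-recovery role, and nothing in it prevents keeping one mapping/var file per direction (the role path and script name below are from memory and may differ in your installation):
# cd /usr/share/ansible/roles/oVirt.disaster-recovery/files
# ./ovirt-dr generate   # run once per direction (OV1->OV2 for SD1, OV2->OV1 for SD2), keeping separate var files
# ./ovirt-dr validate   # validate each mapping against both engines
# ./ovirt-dr failover   # run with the mapping that matches the direction you are failing over
Each mapping only needs to cover the storage domains replicated in that direction, so in principle the two environments can act as each other's secondary site.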
Thanks in advance,
Gianluca Cecchi
Failed Installation
by Marcello Gentile
I’m installing a self-hosted engine on CentOS 7.6 using Cockpit. I keep getting this error:
[ INFO ] TASK [ovirt.hosted_engine_setup : Wait for the host to be up]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Check host status]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy.\n"}
[ INFO ] TASK [ovirt.hosted_engine_setup : Fetch logs from the engine VM]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Set destination directory path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Create destination directory]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Find the local appliance image]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Set local_vm_disk_path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Give the vm time to flush dirty buffers]
[ INFO ] ok: [localhost -> localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Copy engine logs]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Remove temporary entry in /etc/hosts for the local VM]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
The logs don’t tell me much, or at least the ones I’m checking don’t.
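In case it helps, the deployment keeps its own logs on the host, and the "Fetch logs from the engine VM" task above copies the engine-side logs locally too; a sketch of where I would look first, assuming default paths (the directory names are approximate):
# grep -iE 'error|fail' /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-*.log | tail -n 50
# ls /var/log/ovirt-hosted-engine-setup/engine-logs-*/
The reason the host went non_operational is normally spelled out in the fetched engine.log (for example a missing required network or a storage problem).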
Any help would be greatly appreciated
Thanks
iSCSI-based storages won't login to the portals with all IPs on reboot
by nicolas@devels.es
Hi,
We're running oVirt 4.3.2. Currently, we have one storage backend
(cabinet) with two controllers, each of them with 2 network interfaces
(4 network interfaces in total). When we added the Storage Domain, we
discovered the target for each of the 4 IPs and marked the LUN so it
would be added with 4 different IPs.
When we put a host into maintenance, all the paths are deactivated, and
when we activate it again it discovers all 4 paths to the storage
backend. However, if we reboot the host, on activation it only activates
one path. We can see this by running 'multipath -ll'.
We can manually activate the rest of the paths using this command for
each of the IPs:
# iscsiadm --mode discovery --type sendtargets --portal 10.X.X.X --login
However, we wonder why oVirt wouldn't log into each of the IPs on
boot. Is there something we're missing? Can this be fixed manually?
Currently we're running a script on boot that will issue the command
above for each of the IPs of the cabinet.
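For what it's worth, a minimal sketch of such a boot-time workaround (the portal IPs are placeholders); once the nodes are recorded, all portals can also be logged into in one shot with iscsiadm's node mode:
for portal in 10.X.X.1 10.X.X.2 10.X.X.3 10.X.X.4; do
    iscsiadm --mode discovery --type sendtargets --portal "$portal" --login
done
# or, after the first discovery has populated the node database:
iscsiadm --mode node --loginall=all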
Thanks for any help!
template permissions not inherited (4.3.4)
by Timmi
Hi oVirt List,
I have just a quick question: should I open a ticket for this, or am I
doing something wrong?
I created a new VM template with specific permissions in addition to the
system-wide permissions. If I create a new VM from the template, I
notice that only the system permissions are copied to the permissions of the
new VM.
Is this the intended behavior? I was somehow under the impression that
the permissions from the template should have been copied to the newly
created VM.
Tested with Version 4.3.4.3-1.el7
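Before opening a ticket it may be worth comparing what the API reports for the template versus the freshly created VM; a hedged sketch using the REST API (the engine FQDN, IDs, password and CA file path are placeholders):
# curl -s --cacert ca.pem -u admin@internal:PASSWORD https://engine.example.com/ovirt-engine/api/templates/TEMPLATE_ID/permissions
# curl -s --cacert ca.pem -u admin@internal:PASSWORD https://engine.example.com/ovirt-engine/api/vms/VM_ID/permissions
If the template-level permissions show up in the first call but not in the second, that output is worth attaching to a bug report.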
Best regards
Christoph
if hosted engine corrupted
by Crazy Ayansh
Hi All,
What is the procedure if the self-hosted engine gets corrupted in an environment and we
need to set it up from scratch, but on the new engine we need the same
hosts and virtual machines without losing data?
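A hedged sketch of the usual recovery path, assuming an engine-backup archive exists (or is taken regularly while the engine is still healthy); the file names are placeholders:
# engine-backup --mode=backup --scope=all --file=engine-backup.tar.gz --log=engine-backup.log
# hosted-engine --deploy --restore-from-file=engine-backup.tar.gz
The hosts and VM definitions come back with the restored engine database, and the VM disks themselves live on the storage domains, which this procedure does not touch. Without any backup, recovery means redeploying the engine and re-importing the existing storage domains, which is considerably more work.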
Thanks
Shashakn
reinstallation information
by nikkognt@gmail.com
Hi,
After a blackout, one host of my oVirt setup is not working properly. I tried to Reinstall it, but it ends with the following error: "Failed to install Host ov1. Failed to execute stage 'Closing up': Failed to start service 'vdsmd'." I tried to start vdsmd manually but it does not start.
Now I would like to reinstall the host from the oVirt Node ISO.
After I put the host into maintenance, must I remove the host from the cluster (Hosts -> host1 -> Remove), or can I reinstall without removing it?
If I remove it from the cluster, will I lose the network configurations or not?
My ovirt version is oVirt Engine Version: 4.1.9.1-1.el7.centos.
Re: Storage domain 'Inactive' but still functional
by Strahil
I forgot to mention that the LVM config has to be modified in order to 'inform' the local LVM stack to rely on clvmd/dlm for locking purposes.
Yet, this brings another layer of complexity which I prefer to avoid, thus I use HA-LVM on my Pacemaker clusters.
@Martin,
Check the link from Benny and if possible check if the 2 cases are related.
Best Regards,
Strahil Nikolov
On Jul 24, 2019 11:07, Benny Zlotnik <bzlotnik(a)redhat.com> wrote:
>
> We have seen something similar in the past and patches were posted to deal with this issue, but it's still in progress[1]
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1553133
>
> On Mon, Jul 22, 2019 at 8:07 PM Strahil <hunter86_bg(a)yahoo.com> wrote:
>>
>> I have a theory... But after all without any proof it will remain theory.
>>
>> The storage volumes are just VGs over shared storage. The SPM host is supposed to be the only one that works with the LVM metadata, but I have observed that when someone executes a simple LVM command (for example lvs, vgs or pvs) while another one is running on another host, your metadata can get corrupted, due to the lack of clvmd.
>>
>> As a protection, I could offer you to try the following solution:
>> 1. Create new iSCSI lun
>> 2. Share it to all nodes and create the storage domain. Set it to maintenance.
>> 3. Start dlm & clvmd services on all hosts
>> 4. Convert the VG of your shared storage domain to have the 'clustered' flag set:
>> vgchange -c y mynewVG
>> 5. Check the lvs of that VG.
>> 6. Activate the storage domain.
>>
>> Of course, test it on a test cluster before implementing it in Prod.
>> This is one of the approaches used in Linux HA clusters in order to avoid LVM metadata corruption.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Jul 22, 2019 15:46, Martijn Grendelman <Martijn.Grendelman(a)isaac.nl> wrote:
>>>
>>> Hi,
>>>
>>> Op 22-7-2019 om 14:30 schreef Strahil:
>>>>
>>>> If you can give directions (some kind of history), the dev might try to reproduce this type of issue.
>>>>
>>>> If it is reproduceable - a fix can be provided.
>>>>
>>>> Based on my experience, if something as used as Linux LVM gets broken, the case is way hard to reproduce.
>>>
>>>
>>> Yes, I'd think so too, especially since this activity (online moving of disk images) is done all the time, mostly without problems. In this case, there was a lot of activity on all storage domains, because I'm moving all my storage (> 10TB in 185 disk images) to a new storage platform. During the online move of one of the images, the metadata checksum became corrupted and the storage domain went offline.
>>>
>>> Of course, I could dig up the engine logs and vdsm logs of when it happened, but that would be some work and I'm not very confident that the actual cause would be in there.
>>>
>>> If any oVirt devs are interested in the logs, I'll provide them, but otherwise I think I'll just see it as an incident and move on.
>>>
>>> Best regards,
>>> Martijn.
>>>
>>>
>>>
>>>
>>> On Jul 22, 2019 10:17, Martijn Grendelman <Martijn.Grendelman(a)isaac.nl> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for the tips! I didn't know about 'pvmove', thanks.
>>>>>
>>>>> In the mean time, I managed to get it fixed by restoring the VG metadata on the iSCSI server, so on the underlying Zvol directly, rather than via the iSCSI session on the oVirt host. That allowed me to perform the restore without bringing all VMs down, which was important to me, because if I had to shut down VMs, I was sure I wouldn't be able to restart them before the storage domain was back online.
>>>>>
>>>>> Of course this is more of a Linux problem than an oVirt problem, but oVirt did cause it ;-)
>>>>>
>>>>> Thanks,
>>>>> Martijn.
>>>>>
>>>>>
>>>>>
>>>>> Op 19-7-2019 om 19:06 schreef Strahil Nikolov:
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> First check what went wrong with the VG, as it could be something simple.
>>>>>> vgcfgbackup -f <filename> VGname will create a file which you can use to compare the current metadata with a previous version.
>>>>>>
>>>>>> If you have Linux boxes - you can add disks from another storage an
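A minimal sketch of the metadata comparison and restore described above (the VG name and archive file name are placeholders; run the --test pass first):
# vgcfgbackup -f /tmp/myVG-current.vg myVG
# ls /etc/lvm/archive/myVG_*.vg
# diff /tmp/myVG-current.vg /etc/lvm/archive/myVG_00042-1234567890.vg
# vgcfgrestore --test -f /etc/lvm/archive/myVG_00042-1234567890.vg myVG
# vgcfgrestore -f /etc/lvm/archive/myVG_00042-1234567890.vg myVG
LVM keeps automatic copies of the metadata under /etc/lvm/archive and /etc/lvm/backup, which is what makes this kind of recovery possible without touching the data area of the PVs.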