A "memory" issue when trying to start VM on oVirt 4.3
by Vrgotic, Marko
Dear oVirt,
We are running oVirt 4.3.3 with SHE on three CentOS nodes for HA, with NFS storage.
We also have two Local Storage DCs managed by the same SHE.
[inline screenshot]
The issue we are reporting was present on 4.3.2 as well.
The memory usage on both hosts (third one is in maintenance mode) does not go above 45%:
[inline screenshot]
I have also attached the UI Dashboard screenshot.
However, when trying to start one VM:
[inline screenshot]
The engine reports that there is not enough memory and writes the following to engine.log:
2019-05-14 11:23:10,809Z INFO [org.ovirt.engine.core.bll.RunVmCommand] (default task-14) [63aa9817-80fd-4b18-8c86-26c910a2954a] Lock Acquired to object 'EngineLock:{exclusiveLocks='[13b2299f-b4b4-48e4-b2f9-09450eff235d=VM]', sharedLocks=''}'
2019-05-14 11:23:10,819Z INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-14) [63aa9817-80fd-4b18-8c86-26c910a2954a] START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='13b2299f-b4b4-48e4-b2f9-09450eff235d'}), log id: 5f80129f
2019-05-14 11:23:10,819Z INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-14) [63aa9817-80fd-4b18-8c86-26c910a2954a] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 5f80129f
2019-05-14 11:23:10,833Z INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-14) [63aa9817-80fd-4b18-8c86-26c910a2954a] Candidate host 'ovirt-staging-hv-01' ('d81b296a-f3b2-49ea-9b4d-04c90f3508e9') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2019-05-14 11:23:10,833Z INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (default task-14) [63aa9817-80fd-4b18-8c86-26c910a2954a] Candidate host 'ovirt-staging-hv-02' ('f4fdeccd-4cf4-4ed7-9eb1-d98ac65c4ce6') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: null)
2019-05-14 11:23:10,838Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-14) [63aa9817-80fd-4b18-8c86-26c910a2954a] EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM csm2-216-d due to a failed validation: [Cannot run VM. There is no host that satisfies current scheduling constraints. See below for details:, The host ovirt-staging-hv-01 did not satisfy internal filter Memory because its available memory is too low (0 MB) to run the VM., The host ovirt-staging-hv-01 did not satisfy internal filter Memory because its available memory is too low (0 MB) to run the VM.] (User: admin@internal-authz).
2019-05-14 11:23:10,838Z WARN [org.ovirt.engine.core.bll.RunVmCommand] (default task-14) [63aa9817-80fd-4b18-8c86-26c910a2954a] Validation of action 'RunVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName ovirt-staging-hv-01,$filterName Memory,$availableMem 0,VAR__DETAIL__NOT_ENOUGH_MEMORY,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL,VAR__FILTERTYPE__INTERNAL,$hostName ovirt-staging-hv-02,$filterName Memory,$availableMem 0,VAR__DETAIL__NOT_ENOUGH_MEMORY,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
2019-05-14 11:23:10,838Z INFO [org.ovirt.engine.core.bll.RunVmCommand] (default task-14) [63aa9817-80fd-4b18-8c86-26c910a2954a] Lock freed to object 'EngineLock:{exclusiveLocks='[13b2299f-b4b4-48e4-b2f9-09450eff235d=VM]', sharedLocks=''}'
Current memory optimization is set to “None”.
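For reference, a quick way to check what the scheduler is actually comparing against (the engine FQDN and credentials below are placeholders): the Memory filter works on each host's max scheduling memory, which subtracts the memory already guaranteed to running and pending VMs plus the host reservation, so it can drop to 0 MB even while the dashboard shows physical usage below 50%.

# Placeholders: adjust the engine FQDN, user and password. Memory values are in bytes.
curl -s -k -u 'admin@internal:PASSWORD' -H 'Accept: application/xml' \
     'https://engine.example.com/ovirt-engine/api/hosts' \
     | grep -E '<name>|<memory>|<max_scheduling_memory>'

The same figure appears in the UI as "Max free Memory for scheduling new VMs" on each host's General tab; comparing it with the summed guaranteed memory of the VMs already running on that host usually shows where the memory went.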
Please let me know if you need additional log files.
Kindly awaiting your reply.
Marko Vrgotic
Sr. System Engineer
Re: Gluster volume heals and after 5 seconds has /dom_md/ids dirty again
by Strahil
I have just noticed this behaviour.
I stopped and redesigned my brick (added a VDO layer), and after the heal (1000+ entries got synced) the /dom_md/ids file needs healing again.
Running 'gluster volume heal volume_name full' heals the file, and a minute later it happens again.
I will try a rolling reboot and see if the issue gets fixed.
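A rough way to watch whether the entries really come back after a heal (the volume name is a placeholder; run it on any node):

# Print the per-brick entry counts every 10 seconds.
while true; do
    date
    gluster volume heal volume_name info | grep 'Number of entries'
    sleep 10
done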
Best Regards,
Strahil Nikolov

On May 13, 2019 18:14, Darrell Budic <budic(a)onholyground.com> wrote:
>
> I see this sometimes after rebooting a server, and it usually stops happening within a few hours; I’ve never tracked it down further. I don’t know for sure, but I assume it’s related to healing and goes away once everything syncs up.
>
> Occasionally it turns out to be a communications problem between servers (usually an update to something screws up my firewall), so I always check my peer status when I see it and make sure all servers are talking to each other.
>
> > On May 13, 2019, at 4:13 AM, Andreas Elvers <andreas.elvers+ovirtforum(a)solutions.work> wrote:
> >
> > I restored my engine to a gluster volume named :/engine on a three-node hyperconverged oVirt 4.3.3.1 cluster. Before restoring, I checked the status of the volumes. They were clean: no heal entries, all peers connected, and gluster volume status looked good. Then I restored. This went well. The engine is up. But the engine gluster volume shows heal entries on node02 and node03. The engine was installed on node01. I have to deploy the engine to the other two hosts to reach full HA, but I bet maintenance is not possible until the volume is healed.
> >
> > I tried "gluster volume heal engine", also with "full" added. The heal entries will disappear for a few seconds and then /dom_md/ids will pop up again. The __DIRECT_IO_TEST__ entry joins later. The split-brain info has no entries. Is this some kind of hidden split-brain? Maybe there is data on the node01 brick which did not get synced to the other two nodes? I can only speculate. The Gluster docs say this should heal, but it doesn't. I have two other volumes. Those are fine. One of them contains 3 VMs that are running. I also tried shutting down the engine, so no one was using the volume, then healing. Same effect. Those two files always show up, but no others. Heal can always be started successfully from any of the participating nodes.
> >
> > Reset the volume bricks one by one and cross fingers?
> >
> > [root@node03 ~]# gluster volume heal engine info
> > Brick node01.infra.solutions.work:/gluster_bricks/engine/engine
> > Status: Connected
> > Number of entries: 0
> >
> > Brick node02.infra.solutions.work:/gluster_bricks/engine/engine
> > /9f4d5ae9-e01d-4b73-8b6d-e349279e9782/dom_md/ids
> > /__DIRECT_IO_TEST__
> > Status: Connected
> > Number of entries: 2
> >
> > Brick node03.infra.solutions.work:/gluster_bricks/engine/engine
> > /9f4d5ae9-e01d-4b73-8b6d-e349279e9782/dom_md/ids
> > /__DIRECT_IO_TEST__
> > Status: Connected
> > Number of entries: 2
Gluster volume heals and after 5 seconds has /dom_md/ids dirty again
by Andreas Elvers
I restored my engine to a gluster volume named :/engine on a three-node hyperconverged oVirt 4.3.3.1 cluster. Before restoring, I checked the status of the volumes. They were clean: no heal entries, all peers connected, and gluster volume status looked good. Then I restored. This went well. The engine is up. But the engine gluster volume shows heal entries on node02 and node03. The engine was installed on node01. I have to deploy the engine to the other two hosts to reach full HA, but I bet maintenance is not possible until the volume is healed.
I tried "gluster volume heal engine", also with "full" added. The heal entries will disappear for a few seconds and then /dom_md/ids will pop up again. The __DIRECT_IO_TEST__ entry joins later. The split-brain info has no entries. Is this some kind of hidden split-brain? Maybe there is data on the node01 brick which did not get synced to the other two nodes? I can only speculate. The Gluster docs say this should heal, but it doesn't. I have two other volumes. Those are fine. One of them contains 3 VMs that are running. I also tried shutting down the engine, so no one was using the volume, then healing. Same effect. Those two files always show up, but no others. Heal can always be started successfully from any of the participating nodes.
Reset the volume bricks one by one and cross fingers?
[root@node03 ~]# gluster volume heal engine info
Brick node01.infra.solutions.work:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0
Brick node02.infra.solutions.work:/gluster_bricks/engine/engine
/9f4d5ae9-e01d-4b73-8b6d-e349279e9782/dom_md/ids
/__DIRECT_IO_TEST__
Status: Connected
Number of entries: 2
Brick node03.infra.solutions.work:/gluster_bricks/engine/engine
/9f4d5ae9-e01d-4b73-8b6d-e349279e9782/dom_md/ids
/__DIRECT_IO_TEST__
Status: Connected
Number of entries: 2
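One way to check whether the file contents actually diverge between the bricks, or whether only the replication metadata is dirty, is to compare checksums and the trusted.afr xattrs directly on each brick (the path is taken from the heal output above; run on each node against its own brick):

B=/gluster_bricks/engine/engine/9f4d5ae9-e01d-4b73-8b6d-e349279e9782/dom_md/ids
# Content checksum of the local brick copy.
md5sum "$B"
# AFR/replication xattrs showing pending heal counters.
getfattr -d -m . -e hex "$B"

Keep in mind that dom_md/ids is the sanlock lockspace and is rewritten continuously while the storage domain is active, so checksums are only comparable if taken at roughly the same moment or with the domain quiet.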
Re: oVirt Open Source Backup solution?
by Strahil
Another option is to create a snapshot, back up the snapshot, and then merge the disks (i.e. delete the snapshot).
Sadly, that option doesn't work well with databases, as you might interrupt a transaction and leave the DB in an inconsistent state.
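A rough sketch of that flow against the REST API (the engine FQDN, credentials, VM id and snapshot id are placeholders; how the snapshot disks are actually copied depends on the backup tool you use):

ENGINE='https://engine.example.com/ovirt-engine/api'   # placeholder
AUTH='admin@internal:PASSWORD'                          # placeholder
VM_ID='<vm-uuid>'                                       # placeholder

# 1. Create a snapshot to back up from.
curl -s -k -u "$AUTH" -H 'Content-Type: application/xml' \
     -d '<snapshot><description>backup</description></snapshot>' \
     "$ENGINE/vms/$VM_ID/snapshots"

# 2. Copy the snapshot disks with your backup tool of choice.

# 3. Delete the snapshot, which merges it back into the base disk.
#    The snapshot id comes from the response of step 1.
curl -s -k -u "$AUTH" -X DELETE "$ENGINE/vms/$VM_ID/snapshots/<snapshot-uuid>"

As noted above, this gives a crash-consistent copy unless the guest (or its database) is quiesced first.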
Best Regards,
Strahil Nikolov
On May 10, 2019 15:37, Derek Atkins <derek(a)ihtfp.com> wrote:
>
> Hi,
>
> Michael Blanchard <michael(a)wanderingmad.com> writes:
>
> > If you haven't seen my other posts, I'm not a very experienced Linux admin, so
> > I'm trying to make it as easy as possible to run and maintain. It's hard
> > enough for me to not break ovirt in crazy ways
>
> This has nothing to do with ovirt.
>
> You could use rdiff-backup on any running machine, be it virtual or bare
> metal. It's just a way to use a combination of diff and rsync to backup
> machines. Indeed, I was using it with my vmware-based systems and, when
> I migrated them to ovirt, the backups just continued working.
>
>
> -derek
> --
> Derek Atkins 617-623-3745
> derek(a)ihtfp.com www.ihtfp.com
> Computer and Internet Security Consultant
New to OVirt
by Slobodan Stevanovic
I am having a hard time installing oVirt.
Gluster deployment for the single node sometimes works, but the hosted engine deployment never works.
Does anyone have a suggestion on which release of oVirt to use as a beginner?
Permission Problem for admin portal
by s.gay@ovea.com
Hello,
I'm using the latest oVirt release.
I would like to add a cluster permission for a user.
This user should only see this cluster, and not other information or other clusters.
I've created a new user (internal or external).
When the ClusterAdmin role is added in the permission tab of a cluster, the user can access the entire infrastructure through the user portal or the administration portal.
They can modify other VMs, and so on.
I checked the other clusters and the user does not appear in their permissions.
When I add the UserRole role to the cluster and remove the ClusterAdmin role, the user sees only the VMs in that cluster in the user portal.
When the user creates a VM, they can see all the clusters in the cluster-selection drop-down.
Is it possible to grant a user permission on only a single cluster?
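In case it helps with debugging, this lists every permission the engine has recorded for the user (the engine FQDN, admin credentials and user id are placeholders; the user id is visible in the Administration portal or via GET /api/users). A role assigned higher up in the hierarchy, for example on the System or Data Center object, propagates down and would explain why the account sees more than the one cluster:

# List all permissions recorded for the user, with the object each one is scoped to.
curl -s -k -u 'admin@internal:PASSWORD' -H 'Accept: application/xml' \
     'https://engine.example.com/ovirt-engine/api/users/<user-uuid>/permissions'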
Thanks
HostedEngine migration from ovirt1:/engine to gluster1:/engine (new ip)
by Strahil Nikolov
Hello Community,
I have added new interfaces and bonded them in order to split storage from oVirt traffic and I have one last issue.
In Storage -> Storage Domains, I have the "hosted_storage" domain pointing to the old "ovirt1.localdomain:/engine" instead of "gluster1:/engine".
I have managed to reconfigure the HA agent to use the new storage, but it seems the engine still mounts the old gluster path, and this causes problems with the HA agent.
How can I safely edit "hosted_storage" to point to "gluster1:/engine" with the mount options "backup-volfile-servers=gluster2:ovirt3"?
Should I edit the DB?
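As a read-only starting point before changing anything, it may help to compare what the host-side agent and the engine each believe hosted_storage points to (paths are the standard ones from ovirt-hosted-engine-setup; the engine FQDN and credentials are placeholders):

# On one of the hosted-engine hosts: the agent's storage server and mount options.
grep -E '^(storage|mnt_options)=' /etc/ovirt-hosted-engine/hosted-engine.conf

# From the engine API: the storage domain definition the engine itself holds.
curl -s -k -u 'admin@internal:PASSWORD' -H 'Accept: application/xml' \
     'https://engine.example.com/ovirt-engine/api/storagedomains?search=name%3Dhosted_storage'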
P.S.: My Google skills did not turn up any results on this topic, so I'm raising it on the mailing list. Thanks in advance.
Best Regards,
Strahil Nikolov
Upgrade success story (4.3.0->4.3.3)
by Juhani Rautiainen
Hi!
Just wanted to let you know that I successfully upgraded our cluster to 4.3.3 (FC SAN based system, no Gluster). I had postponed the upgrade for a long time because of the difficulties I had when upgrading from 4.2.8 to 4.3.0.
This time the Hosted Engine upgrade went without any problems. The first node also upgraded itself from the admin UI without any problems. The second node was a different story. I couldn't upgrade it because yum complained about missing packages. This was mysterious, as I had just upgraded the other node. I worked around it by copying the ovirt-4.3 repo files from the other node, since diffing them showed they differed.
After that the node upgraded from the UI, but went missing after the reboot. Connecting to it via iLO, I found out that it had switched to DHCP again (which it never should do). Somehow the VDSM persistent network settings were still wrong and pointed to DHCP (the host has always had a fixed address). I fixed this by hand, using the VDSM persistent settings from the running node. Now the node reboots reliably.
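For the repo comparison mentioned above, a sketch of the kind of check involved, assuming the standard ovirt-4.3*.repo files installed by ovirt-release43 and a placeholder host name for the node that upgraded cleanly:

# Compare the oVirt repo definitions on this node with the node that upgraded fine.
for f in /etc/yum.repos.d/ovirt-4.3*.repo; do
    echo "== $f"
    diff "$f" <(ssh good-node "cat $f")
done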
Anyway, thanks for all the good work the oVirt people put in. Small problems here and there, but it's getting better with every release.
-Juhani
Node losing management network address?
by Juhani Rautiainen
Hi!
I had a weird occurrence in my two-node oVirt cluster today (I have HE). I noticed that one node had the ovirtmgmt network unsynchronized, and I tried to resynchronize it. This led to the node being rebooted by HP iLO. After the reboot the node came up with a DHCP address. I tried to change it back by fixing ifcfg-ovirtmgmt to the original static address, but it reverted to DHCP whenever I tried to resync the network. I decided to undeploy HE from the node so that I could remove the node and add it back. After I started the HE removal, the address popped back to the static address. I then applied the upgrade that was pending on the node, and after the reboot it came back with a DHCP address again. After this I removed the node from the cluster, added it back, and now it seems to work. I'm just wondering how I can prevent this from happening again. How does this out-of-sync situation happen in the first place, and why does the host decide that DHCP is the way to go?
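One thing worth checking before the next resync (paths are the standard VDSM ones; the entry name assumes the management network is still called ovirtmgmt): the configuration VDSM reapplies at boot lives under its persistence directory, so if DHCP has ended up in there, the host will keep coming back on DHCP regardless of what ifcfg-ovirtmgmt says.

# Persisted (boot-time) network config kept by VDSM.
cat /var/lib/vdsm/persistence/netconf/nets/ovirtmgmt
# Running copy, if present.
cat /var/run/vdsm/netconf/nets/ovirtmgmt 2>/dev/null
# What the interface actually has right now.
ip addr show ovirtmgmt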
-Juhani