Re: Scenario: Ovirt-engine on hardware and recovery
by Strahil
As this is not a VM, you won't have to worry about putting 1 host in maintenance.
I make backups like this:
engine-backup --mode=backup --scope=all --file=/root/engine-bkp-empty-ovirt1-ver4.3.7-rc4--2019-11-30 --log=/var/log/engine-backup.log
I'm specifying the host that will be used for restore, which is needed when the engine is a VM.
In your bare-metal case, you just need to recover the OS/hardware and restore from backup.
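For the restore part, a rough sketch on a freshly installed engine host (just a sketch; the exact flags can differ between versions, so check engine-backup --help first):
engine-backup --mode=restore --file=/root/engine-bkp-empty-ovirt1-ver4.3.7-rc4--2019-11-30 --log=/var/log/engine-restore.log --provision-db --provision-dwh-db --restore-permissions
engine-setup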
Best Regards,
Strahil Nikolov
On Dec 3, 2019 04:36, rwebb(a)ropeguru.com wrote:
>
> So thinking about my setup, I am thinking through different failure scenarios.
>
> So let's say I have a small physical server with 8 cores and 16GB RAM and I install CentOS 7 and ovirt-engine on bare metal. This would also be the same scenario if the engine were on a VM.
>
> I run into a major hardware issue and completely lose the engine. How does one recover the cluster setup and not have to start from scratch by having to rebuild all the nodes? Can the engine just be rebuilt and the ovirt nodes be imported?
>
> This scenario is based on the nodes being built from the ovirt node iso.
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/XBFYDGZNCT4...
Re: host failed to attach one of the storage domains attached to it
by Strahil
Hi Daniel,
The hosts go to unresponsive mode as one of the gluster volumes cannot be mounted.
Verify that all hosts can mount (over /mnt) all gluster volumes. If they do, just set the hosts into maintenance and then activate them.
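For example, on each host (server and volume names are placeholders, adjust to your setup):
mount -t glusterfs gluster1:/data /mnt && ls /mnt && umount /mnt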
Best Regards,
Strahil Nikolov
On Dec 2, 2019 12:03, Daniel Menzel <daniel.menzel(a)menzel-it.net> wrote:
>
> Dear all,
>
> after a GlusterFS reboot of two servers out of three (and thus a loss of quorum) by an admin we've got strange problems:
>
> The hosted engine and one export work perfectly fine. (Same three servers!)
> Another export (main one) seems to be fine within GlusterFS itself. But: When we activate this domain most of the hosts go into "non operational" with the subject's error message. As soon as we deactivate this domain all those hosts come back online. Strange thing: The GlusterFS export seems to be mounted on the SPM.
>
> Does anyone know what could have happened and how to fix that?
>
> Kind regards
> Daniel
Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing
by Strahil
Hi Jiffin,
Thanks for the info.
As I'm now on Gluster v7, I hope it won't happen again.
It's nice to know it got fixed.
Best Regards,
Strahil Nikolov
On Dec 2, 2019 11:32, Jiffin Thottan <jthottan(a)redhat.com> wrote:
>
> Hi Krutika,
>
> Apparently, the acl info in the context got corrupted; see the brick logs:
>
> [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
> client:
> CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
> gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
> req(uid:107,gid:107,perm:1,ngrps:4),
> ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
> [Permission denied]
>
> which resulted in the situation. There was one similar bug reported, https://bugzilla.redhat.com/show_bug.cgi?id=1668286, and it got fixed in the 6.6 release IMO (https://review.gluster.org/#/c/glusterfs/+/23233/). But here he mentioned he saw the issue when he upgraded from 6.5 to 6.6.
>
> One way to work around it is to perform a dummy setfacl (preferably as root) on the corrupted files, which will forcefully fetch the acl info again from the backend and update the context. Another approach is to restart the brick process (kill it and run vol start force).
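> A rough sketch of both ideas (the file path is a placeholder and the volume name is taken from the logs above; not a tested procedure):
> # re-apply the existing ACL as a "dummy" setfacl, without actually changing it
> getfacl /path/to/affected/file | setfacl --set-file=- /path/to/affected/file
> # or restart the brick processes of the affected volume
> gluster volume start data_fast force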
>
> Regards,
> Jiffin
>
> ----- Original Message -----
> From: "Krutika Dhananjay" <kdhananj(a)redhat.com>
> To: "Strahil Nikolov" <hunter86_bg(a)yahoo.com>, "Jiffin Thottan" <jthottan(a)redhat.com>, "raghavendra talur" <rtalur(a)redhat.com>
> Cc: "Nir Soffer" <nsoffer(a)redhat.com>, "Rafi Kavungal Chundattu Parambil" <rkavunga(a)redhat.com>, "users" <users(a)ovirt.org>, "gluster-user" <gluster-users(a)gluster.org>
> Sent: Monday, December 2, 2019 11:48:22 AM
> Subject: Re: [ovirt-users] Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing
>
> Sorry about the late response.
>
> I looked at the logs. These errors are originating from posix-acl
> translator -
>
>
>
> [2019-11-17 07:55:47.090065] E [MSGID: 115050] [server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162496: LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.6 (be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.6), client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, error-xlator: data_fast-access-control [Permission denied]
> [2019-11-17 07:55:47.090174] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:36,gid:36,perm:1,ngrps:3), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
> [2019-11-17 07:55:47.090209] E [MSGID: 115050] [server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162497: LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.7 (be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.7), client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, error-xlator: data_fast-access-control [Permission denied]
> [2019-11-17 07:55:47.090299] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: client: CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:36,gid:36,perm:1,ngrps:3), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
>
> Jiffin/Raghavendra Talur,
> Can you help?
>
> -Krutika
>
> On Wed, Nov 27, 2019 at 2:11 PM Strahil Nikolov <hunter86_bg(a)yahoo.com>
> wrote:
>
> > Hi Nir,All,
> >
> > it seems that 4.3.7 RC3 (and even RC4) are not the problem here (attached
> > screenshot of oVirt running on v7 gluster).
> > It seems strange that both my serious issues with oVirt are related to a
> > gluster issue (first the gluster v3 to v5 migration and now this one).
> >
> > I have just updated to gluster v7.0 (Centos 7 repos), and rebooted all
> > nodes.
> > Now both the Engine and all my VMs are back online - so if you hit issues with
> > 6.6, you should give 7.0 a try (and even 7.1 is coming soon) before
> > deciding to wipe everything.
> >
> > @Krutika,
> >
> > I guess you will ask for the logs, so let's switch to gluster-users about
> > this one ?
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Monday, November 25, 2019 at 16:45:48 GMT-5, Strahil Nikolov <
> > hunter86_bg(a)yahoo.com> wrote:
> >
> >
> > Hi Krutika,
> >
> > I have enabled TRACE log level for the volume data_fast,
> >
> > but the issue is not much clearer:
> > FUSE reports:
> >
> > [2019-11-25 21:31:53.478130] I [MSGID: 133022]
> > [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of
> > gfid=6d9ed2e5-d4f2-4749-839b-2f1
> > 3a68ed472 from backend
> > [2019-11-25 21:32:43.564694] W [MSGID: 114031]
> > [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
> > remote operation failed. Path:
Re: NFS Storage Domain on OpenMediaVault
by Amit Bawer
On Mon, Dec 2, 2019 at 7:21 PM Robert Webb <rwebb(a)ropeguru.com> wrote:
> Thanks for the response.
>
> As it turned out, the issue was on the export side, where I had to remove
> subtree_check and add no_root_squash, and it started working.
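> For reference, the export ended up roughly like this (client network is a placeholder; the oVirt side expects uid/gid 36, i.e. vdsm:kvm, to own the exported directory):
> chown -R 36:36 /export/Datastore-oVirt
> # /etc/exports
> /export/Datastore-oVirt 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
> exportfs -ra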
>
> Being new to the setup, I am still trying to work through some config
> issues and deciding if I want to continue. The main thing I am looking for
> is good HA and failover capability. Been using Proxmox VE and I like the
> admin of it, but failover still needs work. Simple things like auto
> migration when rebooting a host do not exist, and that is something I need.
>
For RHEV, you can check the Cluster scheduling policies:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/...
> Robert
> ------------------------------
> *From:* Strahil Nikolov <hunter86_bg(a)yahoo.com>
> *Sent:* Sunday, December 1, 2019 2:54 PM
> *To:* users(a)ovirt.org <users(a)ovirt.org>; Robert Webb <rwebb(a)ropeguru.com>
> *Subject:* Re: [ovirt-users] NFS Storage Domain on OpenMediaVault
>
> Does the sanlock user have rights on the ...../dom_md/ids?
>
> Check the sanlock.service for issues.
> journalctl -u sanlock.service
>
> Best Regards,
> Strahil Nikolov
>
> On Sunday, December 1, 2019 at 17:22:21 GMT+2, rwebb(a)ropeguru.com <
> rwebb(a)ropeguru.com> wrote:
>
>
> I have a clean install with openmediavault as backend NFS and cannot get
> it to work. Keep getting permission errors even though I created a vdsm
> user and kvm group; and they are the owners of the directory on OMV with
> full permissions.
>
> The directory gets created on the NFS side for the host, but then I get the
> permission error and it is removed from the host, but the directory structure
> is left on the NFS server.
>
> Logs:
>
> From the engine:
>
> Error while executing action New NFS Storage Domain: Unexpected exception
>
> From the oVirt node log:
>
> 2019-11-29 10:03:02 136998 [30025]: open error -13 EACCES: no permission
> to open /rhev/data-center/mnt/192.168.1.56:
> _export_Datastore-oVirt/f38b19e4-8060-4467-860b-09cf606ccc15/dom_md/ids
> 2019-11-29 10:03:02 136998 [30025]: check that daemon user sanlock 179
> group sanlock 179 has access to disk or file.
>
> File system on Openmediavault:
>
> drwxrwsrwx+ 3 vdsm kvm 4096 Nov 29 10:03 .
> drwxr-xr-x 9 root root 4096 Nov 27 20:56 ..
> drwxrwsr-x+ 4 vdsm kvm 4096 Nov 29 10:03
> f38b19e4-8060-4467-860b-09cf606ccc15
>
> drwxrwsr-x+ 4 vdsm kvm 4096 Nov 29 10:03 .
> drwxrwsrwx+ 3 vdsm kvm 4096 Nov 29 10:03 ..
> drwxrwsr-x+ 2 vdsm kvm 4096 Nov 29 10:03 dom_md
> drwxrwsr-x+ 2 vdsm kvm 4096 Nov 29 10:03 images
>
> drwxrwsr-x+ 2 vdsm kvm 4096 Nov 29 10:03 .
> drwxrwsr-x+ 4 vdsm kvm 4096 Nov 29 10:03 ..
> -rw-rw----+ 1 vdsm kvm 0 Nov 29 10:03 ids
> -rw-rw----+ 1 vdsm kvm 16777216 Nov 29 10:03 inbox
> -rw-rw----+ 1 vdsm kvm 0 Nov 29 10:03 leases
> -rw-rw-r--+ 1 vdsm kvm 343 Nov 29 10:03 metadata
> -rw-rw----+ 1 vdsm kvm 16777216 Nov 29 10:03 outbox
> -rw-rw----+ 1 vdsm kvm 1302528 Nov 29 10:03 xleases
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/ILKNT57F6VU...
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/KH2H2V254Z7...
>
NFS Storage Domain on OpenMediaVault
by rwebb@ropeguru.com
I have a clean install with openmediavault as the NFS backend and cannot get it to work. I keep getting permission errors even though I created a vdsm user and kvm group, and they are the owners of the directory on OMV with full permissions.
The directory gets created on the NFS side for the host, but then I get the permission error and it is removed from the host, but the directory structure is left on the NFS server.
Logs:
From the engine:
Error while executing action New NFS Storage Domain: Unexpected exception
From the oVirt node log:
2019-11-29 10:03:02 136998 [30025]: open error -13 EACCES: no permission to open /rhev/data-center/mnt/192.168.1.56:_export_Datastore-oVirt/f38b19e4-8060-4467-860b-09cf606ccc15/dom_md/ids
2019-11-29 10:03:02 136998 [30025]: check that daemon user sanlock 179 group sanlock 179 has access to disk or file.
File system on Openmediavault:
drwxrwsrwx+ 3 vdsm kvm 4096 Nov 29 10:03 .
drwxr-xr-x 9 root root 4096 Nov 27 20:56 ..
drwxrwsr-x+ 4 vdsm kvm 4096 Nov 29 10:03 f38b19e4-8060-4467-860b-09cf606ccc15
drwxrwsr-x+ 4 vdsm kvm 4096 Nov 29 10:03 .
drwxrwsrwx+ 3 vdsm kvm 4096 Nov 29 10:03 ..
drwxrwsr-x+ 2 vdsm kvm 4096 Nov 29 10:03 dom_md
drwxrwsr-x+ 2 vdsm kvm 4096 Nov 29 10:03 images
drwxrwsr-x+ 2 vdsm kvm 4096 Nov 29 10:03 .
drwxrwsr-x+ 4 vdsm kvm 4096 Nov 29 10:03 ..
-rw-rw----+ 1 vdsm kvm 0 Nov 29 10:03 ids
-rw-rw----+ 1 vdsm kvm 16777216 Nov 29 10:03 inbox
-rw-rw----+ 1 vdsm kvm 0 Nov 29 10:03 leases
-rw-rw-r--+ 1 vdsm kvm 343 Nov 29 10:03 metadata
-rw-rw----+ 1 vdsm kvm 16777216 Nov 29 10:03 outbox
-rw-rw----+ 1 vdsm kvm 1302528 Nov 29 10:03 xleases
host failed to attach one of the storage domains attached to it
by Daniel Menzel
Dear all,
after a GlusterFS reboot of two servers out of three (and thus a loss of
quorum) by an admin we've got strange problems:
1. The hosted engine and one export work perfectly fine. (Same three
servers!)
2. Another export (main one) seems to be fine within GlusterFS itself.
But: When we activate this domain most of the hosts go into "non
operational" with the subject's error message. As soon as we
deactivate this domain all those hosts come back online. Strange
thing: The GlusterFS export seems to be mounted on the SPM.
Does anyone know what could have happened and how to fix that?
Kind regards
Daniel
oVirt Networking: VM not pinging
by Vijay Sachdeva
Dear Community,
I have installed oVirt Engine 4.3 and oVirt Node 4.3. The node was successfully added to the engine and the host network setup is also done. When trying to ping the host from a VM that uses "ovirtmgmt" as its vNIC profile, the VM is not even able to ping its host or any other machine on that same network. I also added a VLAN network, passed via the same uplink of the node interface where "ovirtmgmt" is passed; that is also not working.
Also, the VM's vnet interface is of vNIC type VirtIO and its state shows "UNKNOWN"; would this be a problem?
Any help would be highly appreciated.
Thanks
Vijay Sachdeva
Senior Manager – Service Delivery
IndiQus Technologies
O +91 11 4055 1411 | M +91 8826699409
www.indiqus.com
hyperconverged single node with SSD cache fails gluster creation
by thomas@hoberg.net
I am seeing more successes than failures at creating single and triple node hyperconverged setups after some weeks of experimentation, so I am branching out to additional features: in this case the ability to use SSDs as cache media for hard disks.
I tried first with a single node that combined caching and compression, and that failed during the creation of the LVM volumes.
I tried again without the VDO compression, but the results were identical, whilst VDO compression without the LV cache worked ok.
I tried various combinations, using less space etc., but the results are always the same and unfortunately rather cryptic (substituted the physical disk label with {disklabel}):
TASK [gluster.infra/roles/backend_setup : Extend volume group] *****************
failed: [{hostname}] (item={u'vgname': u'gluster_vg_{disklabel}p1', u'cachethinpoolname': u'gluster_thinpool_gluster_vg_{disklabel}p1', u'cachelvname': u'cachelv_gluster_thinpool_gluster_vg_{disklabel}p1', u'cachedisk': u'/dev/sda4', u'cachemetalvname': u'cache_gluster_thinpool_gluster_vg_{disklabel}p1', u'cachemode': u'writeback', u'cachemetalvsize': u'70G', u'cachelvsize': u'630G'}) => {"ansible_loop_var": "item", "changed": false, "err": " Physical volume \"/dev/mapper/vdo_{disklabel}p1\" still in use\n", "item": {"cachedisk": "/dev/sda4", "cachelvname": "cachelv_gluster_thinpool_gluster_vg_{disklabel}p1", "cachelvsize": "630G", "cachemetalvname": "cache_gluster_thinpool_gluster_vg_{disklabel}p1", "cachemetalvsize": "70G", "cachemode": "writeback", "cachethinpoolname": "gluster_thinpool_gluster_vg_{disklabel}p1", "vgname": "gluster_vg_{disklabel}p1"}, "msg": "Unable to reduce gluster_vg_{disklabel}p1 by /dev/dm-15.", "rc": 5}
Somewhere within that I see something that points to a race condition ("still in use").
Unfortunately I have not been able to pinpoint the raw logs which are used at that stage and I wasn't able to obtain more info.
At this point quite a bit of the storage setup is already done, so rolling back for a clean new attempt can be a bit complicated, with reboots to reconcile the kernel with the data on disk.
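For reference, the manual rollback I would try looks roughly like this (only a sketch; the names are the ones from the error above, and everything here is destructive, so check with lvs/pvs first):
lvs -a -o +devices gluster_vg_{disklabel}p1
lvremove gluster_vg_{disklabel}p1
vgremove gluster_vg_{disklabel}p1
pvremove /dev/mapper/vdo_{disklabel}p1
vdo remove --name=vdo_{disklabel}p1
wipefs -a /dev/sda4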
I don't actually believe it's related to single node and I'd be quite happy to move the creation of the SSD cache to a later stage, but in a VDO setup this looks slightly complex to someone without intimate knowledge of LVM-with-cache-and-perhaps-thin/VDO/Gluster all thrown into one.
Needless to say, the feature set (SSD caching & compression/dedup) sounds terribly attractive, but when things don't just work, it's all the more terrifying.