Gluster setup fails - Nearly there I think...
by rob.downer@orbitalsystems.co.uk
Gluster fails with
vdo: ERROR - Device /dev/sdb excluded by a filter.
However, I have run
[root@ovirt1 ~]# vdo create --name=vdo1 --device=/dev/sdb --force
Creating VDO vdo1
Starting VDO vdo1
Starting compression on VDO vdo1
VDO instance 1 volume is ready at /dev/mapper/vdo1
[root@ovirt1 ~]#
There are no filters in lvm.conf.
I have run
wipefs -a /dev/sdb --force
on all hosts before starting.
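For what it's worth, a minimal check sequence before re-running the deployment (a sketch only, assuming /dev/sdb is the intended Gluster/VDO disk):

# Show what currently claims the disk; a leftover VDO/LVM signature or a
# multipath map on top of sdb commonly triggers "excluded by a filter"
# even when lvm.conf itself has no filter line.
lsblk -o NAME,TYPE,FSTYPE,MOUNTPOINT /dev/sdb
blkid /dev/sdb
multipath -ll | grep -B2 -A4 sdb
pvs --all | grep sdb

# If a VDO volume was already created by hand (as above), remove it so
# the deployment can create its own VDO layer on the raw disk:
vdo remove --name=vdo1
wipefs -a /dev/sdb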
Re: Cannot obtain information from export domain
by Strahil
Hi, can you describe your actions?
Usually the export is like this:
1. You make a backup of the VM
2. You migrate the disks to the export storage domain
3. You shut down the VM
4. Set the storage domain in maintenance and then detach it from the oVirt
5. You attach it to the new oVirt
6. Once the domain is active - click on import VM tab and import all VMs (defining the cluster you want them to be running on)
7. Power up VM and then migrate the disks to the permanent storage.
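If the engine reports StorageDomainDoesNotExist when importing (as in the error below), a quick sanity check is to mount the export share on a host and compare the domain UUID in its metadata with the UUID from the error. A minimal sketch; the NFS server and path are placeholders:

# Mount the export share temporarily (placeholder server/path):
mount -t nfs nfs-server:/export/path /mnt/export
# The top-level directory name is the storage domain UUID and should
# match the UUID the engine complains about:
ls /mnt/export
# The domain metadata also records the UUID and the pool it was last
# attached to:
cat /mnt/export/*/dom_md/metadata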
Best Regards,
Strahil Nikolov

On Nov 26, 2019 19:41, Arthur Rodrigues Stilben <arthur.stilben(a)gmail.com> wrote:
>
> Hello everyone,
>
> I'm trying to export a virtual machine, but I'm getting the following error:
>
> 2019-11-26 16:30:06,250-02 ERROR
> [org.ovirt.engine.core.bll.exportimport.GetVmsFromExportDomainQuery]
> (default task-22) [b9a0b9d5-2127-4002-9cee-2e3525bccc89] Exception:
> org.ovirt.engine.core.common.errors.EngineException: EngineException:
> org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException:
> IRSGenericException: IRSErrorException: Failed to GetVmsInfoVDS, error =
> Storage domain does not exist:
> (u'5ac6c35d-0406-4a06-a682-ed8fb2d1933f',), code = 358 (Failed with
> error StorageDomainDoesNotExist and code 358)
>
> 2019-11-26 16:30:06,249-02 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (default task-22) [b9a0b9d5-2127-4002-9cee-2e3525bccc89] EVENT_ID:
> IMPORTEXPORT_GET_VMS_INFO_FAILED(200), Correlation ID: null, Call Stack:
> null, Custom ID: null, Custom Event ID: -1, Message: Failed to retrieve
> VM/Templates information from export domain BackupMV
>
> The version of the oVirt that I am using is 4.1.
>
> Best regards,
>
> --
> Arthur Rodrigues Stilben
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z6FQ45UVQOD...
Cannot activate/deactivate storage domain
by Albl, Oliver
Hi all,
I run an oVirt 4.3.6.7-1.el7 installation (50+ hosts, 40+ FC storage domains on two all-flash arrays) and experienced a problem accessing individual storage domains.
As a result, hosts were set to "not operational" because they could not see all storage domains, and the SPM role started to move around between the hosts.
oVirt messages start with:
2019-11-04 15:10:22.739+01 | VDSM HOST082 command SpmStatusVDS failed: (-202, 'Sanlock resource read failure', 'IO timeout')
2019-11-04 15:10:22.781+01 | Invalid status on Data Center <name>. Setting Data Center status to Non Responsive (On host HOST82, Error: General Exception).
...
2019-11-04 15:13:58.836+01 | Host HOST017 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:58.85+01 | Host HOST005 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:58.85+01 | Host HOST012 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:58.851+01 | Host HOST002 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:58.851+01 | Host HOST010 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:58.851+01 | Host HOST011 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:58.852+01 | Host HOST004 cannot access the Storage Domain(s) HOST_LUN_204 attached to the Data Center <name>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.011+01 | Host HOST017 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.238+01 | Host HOST004 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.249+01 | Host HOST005 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.255+01 | Host HOST012 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.273+01 | Host HOST002 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.279+01 | Host HOST010 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:13:59.386+01 | Host HOST011 cannot access the Storage Domain(s) <UNKNOWN> attached to the Data Center <UNKNOWN>. Setting Host state to Non-Operational.
2019-11-04 15:15:14.145+01 | Storage domain HOST_LUN_221 experienced a high latency of 9.60953 seconds from host HOST038. This may cause performance and functional issues. Please consult your Storage Administrator.
The problem mainly affected two storage domains (on the same array), but I also saw occasional messages for other storage domains (on the other array as well).
Storage domains stayed available to the hosts, all VMs continued to run.
When constantly reading from the storage domains (/bin/dd iflag=direct if=<metadata> bs=4096 count=1 of=/dev/null) we got the expected 20+ MB/s on all but a few storage domains. One of them showed "transfer rates" around 200 bytes/s, but went up to normal performance from time to time. The transfer rate to this domain also differed between hosts.
/var/log/messages contains qla2xxx abort messages on almost all hosts. There are no errors on the SAN switches or the storage array (but the vendor is still investigating). I did not see high load on the storage array.
The system seemed to stabilize when I stopped all VMs on the affected storage domain and this storage domain became "inactive". Currently, this storage domain is still inactive and we can neither place it in maintenance mode ("Failed to deactivate Storage Domain") nor activate it. The OVF metadata seems to be corrupt as well (failed to update OVF disks <id>, OVF data isn't updated on those OVF stores). The first six 512-byte blocks of /dev/<id>/metadata seem to contain only zeros.
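To double-check that, one can dump the start of the metadata LV directly; a minimal sketch, with <id> standing for the storage domain's VG as in the path above:

# Read the first 4 KiB of the metadata LV; an intact domain normally
# shows readable key=value fields (SDUUID=..., ROLE=..., etc.), while
# all zeros matches the corruption described above.
dd if=/dev/<id>/metadata bs=4096 count=1 iflag=direct 2>/dev/null | hexdump -C | head -n 40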
Any advice on how to proceed here?
Is there a way to recover this storage domain?
All the best,
Oliver
Re: Certificate of host is invalid
by Strahil
Hi,
You can try the following:
1. Set the host in maintenance
2. From the Install dropdown, select 'Reinstall' and then configure the necessary info, including whether you would like to use the host as a host for the HostedEngine VM.
Once the reinstall (of the oVirt software) is OK, the node will be activated automatically.
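To verify that the new certificate actually carries the subject alternative name, you can inspect it on the host; a minimal sketch, assuming the default VDSM certificate path:

# The SAN extension should list the host's FQDN after a successful
# reinstall/enrollment:
openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -noout -text | grep -A1 'Subject Alternative Name'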
Best Regards,
Strahil Nikolov

On Nov 27, 2019 18:01, Jon bae <jonbae77(a)gmail.com> wrote:
>
> Hello everybody,
> since the last update to 4.3.7 I get this error message:
>
> Certificate of host host.name is invalid. The certificate doesn't contain valid subject alternative name, please enroll new certificate for the host.
>
> Do you have an idea of how I can fix that?
>
> Regards
> Jonathan
Disk move succeed but didn't move content
by Juan Pablo Lorier
Hi,
I have a fresh install of oVirt 4.3 and tried to import a Gluster
vmstore. I managed to import the former data domain via NFS. The problem
is that when I moved the disks of the VMs to the new iSCSI data domain,
I got a warning that sparse disks would be converted to qcow2 disks,
and after accepting, the disks were moved with no error.
The disks now show as <1 GB instead of their original size, and thus
the VMs fail to start.
Is there any way to recover those disks? I have no backup of the VMs :-(
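For reference, one way to check whether the data is still on the moved volumes is to inspect them with qemu-img on a host; a minimal sketch, with the UUIDs as placeholders (on a block/iSCSI domain the volume LV may first need to be activated):

# Activate and inspect the moved volume; a qcow2 header with the
# original virtual size suggests the data is intact and only the size
# reported in the UI is wrong.
lvchange -ay <storage-domain-uuid>/<volume-uuid>
qemu-img info /dev/<storage-domain-uuid>/<volume-uuid>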
Regards
Current status of Ceph support in oVirt (2019)?
by victorhooi@yahoo.com
Hi,
I currently have a 3-node HA cluster running Proxmox (with integrated Ceph). oVirt looks pretty neat, however, and I'm excited to check it out.
One of the things I love about Proxmox is the integrated Ceph support.
I saw on the mailing lists that there was some talk of Ceph support earlier, but it was via OpenStack/Cinder. What exactly does this mean?
1. Does this require you to install OpenStack, or will a vanilla Ceph installation work?
2. Is it possible to deploy Ceph on the same nodes that run oVirt? (i.e. is a 3-node oVirt + Ceph cluster possible?)
3. Is there any monitoring/management of Ceph from within oVirt? (Guessing no?)
4. Are all the normal VM features working yet, or is this planned?
5. Is making Ceph a first-class citizen (like Gluster) on oVirt on the roadmap?
Thanks,
Victor
https://www.reddit.com/r/ovirt/comments/ci38zp/ceph_rbd_support_in_ovirt_...
oVirt nodes not responding
by Tim Herrmann
Hi everyone,
we have an oVirt cluster with 5 nodes, 3 of which provide the storage
with GlusterFS and replica 3.
The cluster is running 87 VMs and has 9 TB of storage, of which 4 TB is in use.
The oVirt Engine version is 4.1.8.2 and GlusterFS is 3.8.15.
The servers are running in an HP blade center and are connected to each
other with 10 Gbit.
Currently we have a problem where all oVirt nodes periodically stop
responding in the cluster, with the following error messages in the oVirt
web interface:
VDSM glustervirt05 command GetGlusterVolumeHealInfoVDS failed: Message
timeout which can be caused by communication issues
Host glustervirt05 is not responding. It will stay in Connecting state for
a grace period of 68 seconds and after that an attempt to fence the host
will be issued.
Host glustervirt05 does not enforce SELinux. Current status: PERMISSIVE
Executing power management status on Host glustervirt05 using Proxy Host
glustervirt02 and Fence Agent ilo4:xxx.xxx.xxx.xxx.
Manually synced the storage devices from host glustervirt05
Status of host glustervirt05 was set to Up.
In the vdsm logfile I can find the following message:
2019-11-26 11:18:22,909+0100 WARN (vdsm.Scheduler) [Executor] Worker
blocked: <Worker name=jsonrpc/7 running <Task <JsonRpcTask {'params':
{u'volumeName': u'data'}, 'jsonrpc': '2.0', 'method':
u'GlusterVolume.healInfo', 'id': u'2e86ed2c-
3e79-42c1-a7e4-c09bfbfc7794'} at 0x7fb938373190> timeout=60, duration=180
at 0x316a6d0> task#=2859802 at 0x1b70dd0> (executor:351)
And I figured out that the gluster heal info command takes very long:
[root@glustervirt01 ~]# time gluster volume heal data info
Brick glustervirt01:/gluster/data/brick1
Status: Connected
Number of entries: 0
Brick glustervirt02:/gluster/data/brick1
Status: Connected
Number of entries: 0
Brick glustervirt03:/gluster/data/brick2
Status: Connected
Number of entries: 0
real 3m3.626s
user 0m0.593s
sys 0m0.559s
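Before blaming VDSM itself, it may be worth checking whether all bricks and self-heal daemons are connected while heal info is slow; a minimal sketch for the volume named data, as above:

# A down self-heal daemon or a half-connected client is a common reason
# for "heal info" taking minutes instead of seconds:
gluster volume status data
gluster volume status data clients
gluster volume heal data statistics heal-count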
Another strange behavior is that one virtual machine (a PostgreSQL
database) stops running unexpectedly every one or two days ...
The only thing that has been changed on the VM recently was a
resize of the disk.
VM replication-zabbix is down with error. Exit message: Lost connection
with qemu process.
And when we add or delete a larger disk of approximately 100 GB in
GlusterFS, the Gluster cluster freaks out and won't respond anymore.
This also results in paused VMs ...
Does anyone have an idea what could cause such problems?
oVirt Networking: VM not pinging
by Vijay Sachdeva
Dear Community,
I have installed oVirt Engine 4.3 and oVirt Node 4.3. The node was successfully added to the engine and the host network setup is also done. When a VM uses "ovirtmgmt" as its vNIC profile, it is not even able to ping its host or any other machine on that same network. I also added a VLAN network that is passed via the same uplink of the node interface where "ovirtmgmt" is passed; that is not working either.
The vNIC type of this vnet is VirtIO and its state shows "UNKNOWN"; would this be a problem?
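(For reference, an UNKNOWN operstate on a vnet/tap interface is usually normal, so that alone is probably not the issue.) A minimal first check on the host, assuming the VM's tap device is something like vnet0:

# Confirm the tap device is enslaved to the ovirtmgmt bridge and watch
# whether the VM's ARP requests reach the bridge at all:
ip -d link show ovirtmgmt
bridge link show | grep -E 'ovirtmgmt|vnet'
tcpdump -i ovirtmgmt -nn arp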
Any help would be highly appreciated.
Thanks
Vijay Sachdeva
Senior Manager – Service Delivery
IndiQus Technologies
O +91 11 4055 1411 | M +91 8826699409
www.indiqus.com
hyperconverged single node with SSD cache fails gluster creation
by thomas@hoberg.net
I am seeing more successes than failures at creating single- and triple-node hyperconverged setups after some weeks of experimentation, so I am branching out to additional features: in this case, the ability to use SSDs as cache media for hard disks.
I first tried a single node that combined caching and compression, and that failed during the creation of the LVM volumes.
I tried again without the VDO compression, but the results were identical, whereas VDO compression without the LV cache worked OK.
I tried various combinations, using less space etc., but the results are always the same and unfortunately rather cryptic (I substituted the physical disk label with {disklabel}):
TASK [gluster.infra/roles/backend_setup : Extend volume group] *****************
failed: [{hostname}] (item={u'vgname': u'gluster_vg_{disklabel}p1', u'cachethinpoolname': u'gluster_thinpool_gluster_vg_{disklabel}p1', u'cachelvname': u'cachelv_gluster_thinpool_gluster_vg_{disklabel}p1', u'cachedisk': u'/dev/sda4', u'cachemetalvname': u'cache_gluster_thinpool_gluster_vg_{disklabel}p1', u'cachemode': u'writeback', u'cachemetalvsize': u'70G', u'cachelvsize': u'630G'}) => {"ansible_loop_var": "item", "changed": false, "err": " Physical volume \"/dev/mapper/vdo_{disklabel}p1\" still in use\n", "item": {"cachedisk": "/dev/sda4", "cachelvname": "cachelv_gluster_thinpool_gluster_vg_{disklabel}p1", "cachelvsize": "630G", "cachemetalvname": "cache_gluster_thinpool_gluster_vg_{disklabel}p1", "cachemetalvsize": "70G", "cachemode": "writeback", "cachethinpoolname": "gluster_thinpool_gluster_vg_{disklabel}p1", "vgname": "gluster_vg_{disklabel}p1"}, "msg": "Unable to reduce gluster_vg_{disklabel}p1 by /dev/dm-15.", "rc": 5}
Somewhere within that I see something that points to a race condition ("still in use").
Unfortunately I have not been able to pinpoint the raw logs used at that stage, so I wasn't able to obtain more info.
At this point quite a bit of storage setup is already done, so rolling back for a clean new attempt can be a bit complicated, with reboots to reconcile the kernel with the data on disk.
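For what it's worth, the manual rollback between attempts looks roughly like this; a sketch only, reusing the names from the error above (it destroys whatever the failed run created on those devices):

# Tear down the leftovers top-down: LVs, VG, the VDO volume that backs
# the VG's PV, and finally any signatures on the cache disk.
lvremove -f gluster_vg_{disklabel}p1
vgremove -f gluster_vg_{disklabel}p1
vdo remove --name=vdo_{disklabel}p1
wipefs -a /dev/sda4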
I don't actually believe it's related to the single-node setup, and I'd be quite happy to move the creation of the SSD cache to a later stage, but in a VDO setup this looks slightly complex to someone without intimate knowledge of LVM-with-cache-and-perhaps-thin/VDO/Gluster all thrown into one.
Needless to say, the feature set (SSD caching & compression/dedup) sounds terribly attractive, but when things don't just work, it's more terrifying.
Moving HostedEngine
by Joseph Goldman
Hi List,
In one of my installs, I set up the first storage domain (where
the HostedEngine lives) on a bigger NFS NAS. Since then I have created a
Gluster volume that spans the 3 hosts and I'm putting a few VMs in
there for higher reliability (as the SAN is a single point of failure).
In particular, I'd like to put the HostedEngine in there so it stays up no
matter what and can help report if issues occur (network issue to the NAS,
the NAS dies, etc.).
Looking through other posts and documentation, there's no real way to
move the HostedEngine storage, is this correct? The solution I've seen
is to back up the hosted engine DB, blow the deployment away, and re-deploy
it from the backup file, pointing it to the new storage domain in the
deploy script. Is this the only process? How likely is it to fail? Is it
likely that all VMs and settings will be picked straight back up and
continue to operate like normal? I don't have a test setup to play around
with at the moment, so I'm just trying to gauge confidence in such a solution.
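For reference, the backup-and-redeploy path boils down to two commands; a minimal sketch with placeholder file names:

# On the engine VM: take a full backup of the engine database and config.
engine-backup --mode=backup --scope=all --file=engine-backup.tar.gz --log=engine-backup.log
# Copy the backup off the engine VM, then on one of the hosts redeploy
# the hosted engine and point it at the new (Gluster) storage domain
# when the deploy script asks; hosts and VMs come back from the
# restored database.
hosted-engine --deploy --restore-from-file=engine-backup.tar.gz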
Thanks,
Joe