Live storage migration is failing in 4.2.8
by Ladislav Humenik
Hello, we have recently updated a few oVirt installations from 4.2.5 to 4.2.8
(9 oVirt engine nodes in total), and since then live storage migration
has stopped working and leaves the auto-generated snapshot behind.
If we power the guest VM down, the migration works as expected. Is there
a known bug for this? Shall we open a new one?
Setup:
ovirt - Dell PowerEdge R630
- CentOS Linux release 7.6.1810 (Core)
- ovirt-engine-4.2.8.2-1.el7.noarch
- kernel-3.10.0-957.10.1.el7.x86_64
hypervisors - Dell PowerEdge R640
- CentOS Linux release 7.6.1810 (Core)
- kernel-3.10.0-957.10.1.el7.x86_64
- vdsm-4.20.46-1.el7.x86_64
- libvirt-5.0.0-1.el7.x86_64
- qemu-kvm-ev-2.12.0-18.el7_6.3.1.x86_64
storage domain - NetApp NFS share
Logs are attached.
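In case it helps with triage, this is how we list the snapshots of an affected VM from the command line (a sketch; the engine FQDN, credentials and VM id below are placeholders):

# List snapshots through the REST API; the leftover one has a description
# along the lines of "Auto-generated for Live Storage Migration"
curl -k -u 'admin@internal:PASSWORD' \
    "https://engine.example.com/ovirt-engine/api/vms/VM_ID/snapshots"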
--
Ladislav Humenik
System administrator
5 years, 7 months
Global maintenance and fencing of hosts
by Andreas Elvers
I am wondering whether global maintenance inhibits fencing of non-responsive hosts. Is this so?
Background: I plan on migrating the engine from one cluster to another. I understand this means backing up and restoring the engine. While the engine is being migrated it is shut down, and all VMs will continue running. This is good. When starting the engine in the new location, I really don't want it to fence any host on its own, for reasons I cannot yet foresee.
So is global maintenance enough to suppress fencing, or do I have to deactivate fencing on all hosts?
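For reference, this is the sequence I have in mind for the maintenance part (a sketch; whether fencing also has to be disabled per host under Power Management is exactly what I am unsure about):

# Put the hosted-engine cluster into global maintenance (run on any HE host)
hosted-engine --set-maintenance --mode=global

# Confirm that the HA agents report global maintenance before shutting the engine down
hosted-engine --vm-status

# Once the engine is running in the new location and everything looks sane
hosted-engine --set-maintenance --mode=none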
5 years, 7 months
VDSM command CreateStoragePoolVDS failed
by fangjian@linuxtrend.cn
Hi, I tried to set up oVirt in my environment but encountered some problems with data domain creation.
oVirt Manager Version 4.3.2.1-1.el7
oVirt Node Version 4.3.2
1. Create new data center & cluster; ---- successful
2. Create new host 'centos-node-01' in cluster; ---- successful
3. Prepare the NFS storage; ---- successful
4. Tried to create a new data domain and attach the NFS storage to host 'centos-node-01'; ---- failed
Message: VDSM centos-node-01 command CreateStoragePoolVDS failed: Cannot acquire host id: (u'ec225640-f05e-4a9d-bdc4-5219065704ec', SanlockException(2, 'Sanlock lockspace add failure', 'No such file or directory'))
How can I fix this problem and attach the NFS storage to the new data domain?
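For what it's worth, this is the checklist I am running on the host to rule out permissions on the export (a sketch; the NFS server and export path are placeholders):

# Mount the export manually and check that it is owned by vdsm:kvm (36:36)
mkdir -p /tmp/nfs-test
mount -t nfs nfs-server.example.com:/export/data /tmp/nfs-test
ls -ln /tmp/nfs-test
sudo -u vdsm touch /tmp/nfs-test/write-test && sudo -u vdsm rm /tmp/nfs-test/write-test
umount /tmp/nfs-test

# Check sanlock itself on the host
systemctl status sanlock
sanlock client status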
5 years, 7 months
Re: Expand existing gluster storage in ovirt 4.2/4.3
by Strahil
Just add the new servers to the Gluster cluster (in sets of 3).
Then install them as hosts from the oVirt Hosted Engine.
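Roughly like this (hostnames, volume name and brick paths are placeholders, adjust to your layout):

# From an existing gluster node: add the three new peers
gluster peer probe newhost1
gluster peer probe newhost2
gluster peer probe newhost3
gluster peer status

# Grow an existing replica-3 volume by one more replica set, then rebalance
gluster volume add-brick myvolume replica 3 \
    newhost1:/gluster_bricks/myvolume/brick \
    newhost2:/gluster_bricks/myvolume/brick \
    newhost3:/gluster_bricks/myvolume/brick
gluster volume rebalance myvolume start
gluster volume info myvolume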
Best Regards,
Strahil Nikolov

On Apr 11, 2019 15:49, adrianquintero(a)gmail.com wrote:
>
> Would you know of any documentation that I could follow for that type of setup?
>
> I have read quite a bit about oVirt and hyperconverged setups, but I have only found 3-node examples; nobody seems to go past that, or at least I've yet to find one. I already have a 3-node hyperconverged setup working as it should, with no issues and with tested failure scenarios, but I need an environment with at least 12 servers while keeping the correct failure scenarios for storage (Gluster). However, I can't seem to figure out the proper steps to achieve this.
>
> So if anyone can point me in the right direction it will help a lot, and once I test I should be able to provide back any knowledge that I gain with such type of setup.
>
>
> thanks again.
> _______________________________________________
> Users mailing list -- users(a)ovirt.org
> To unsubscribe send an email to users-leave(a)ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/PJIRZGP6NQ4...
5 years, 7 months
Host reinstall: ModuleNotFoundError: No module named 'rpmUtils'
by John Florian
After mucking around trying to use jumbo MTU for my iSCSI storage nets
(which apparently I can't do because my Cisco 3560 switch only supports
an MTU of 1500 on its VLAN interfaces), I got one of my hosts screwed up. I
could likely rebuild it from scratch, but I suspect that's overkill. I
simply tried to do a reinstall via the GUI. That fails. Looking at the
ovirt-host-deploy log I see several tracebacks with $SUBJECT. Since
Python pays my bills I figured this would be an easy fix. Except ... I see
this on the host:
$ rpm -qf /usr/lib/python2.7/site-packages/rpmUtils/
yum-3.4.3-161.el7.centos.noarch
$ python
Python 2.7.5 (default, Oct 30 2018, 23:45:53)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Tab completion has been enabled.
>>> import rpmUtils
>>>
I'm guessing this must mean the tracebacks are from Python 3, since I can
clearly see the module doesn't exist for either Python 3.4 or 3.6. So
this smells like a packaging bug somehow related to upgrading from 4.2.
I mean, I can't imagine a brand new install failing this blatantly.
Either that, or this import error has nothing to do with my reinstall
failure.
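For completeness, this is how I am checking which interpreter is actually missing the module (a sketch):

# rpmUtils is shipped by yum, which is python2-only on this host; check both interpreters
python2 -c 'import rpmUtils; print(rpmUtils.__file__)'
python3 -c 'import rpmUtils'; echo "python3 exit code: $?"

# See which deploy-related packages are installed
rpm -qa 'otopi*' 'ovirt-host-deploy*'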
--
John Florian
5 years, 7 months
Bug 1666795 - Related? - VM's don't start after shutdown on FCP
by nardusg@gmail.com
Hi There
Wondering if this issue is related to our problem and if there is a way around it. We upgraded from 4.2.8 to 4.3.2. Now some of the VMs fail to start. You need to detach the disks, create a new VM, reattach the disks to the new VM, and then the new VM starts.
Thanks
Nar
5 years, 7 months
Unable to start vdsm, upgrade 4.0 to 4.1
by Todd Barton
Looking for some help/suggestions to correct an issue I'm having. I have a 3-host HA setup running a hosted engine and Gluster storage. The hosts are identical hardware configurations and have been running very solidly for several years. I was performing an upgrade to 4.1. The first host went fine. The second upgrade didn't go well: on reboot, the server went into a kernel panic and I had to load the previous kernel to diagnose it.
I couldn't get it out of the panic, so I had to revert the system to the previous kernel, which was a big PITA. I then updated it to current and verified the installation of ovirt/vdsm. Everything seemed to be OK, but vdsm won't start. Gluster is working fine. It appears I have an authentication issue with libvirt. I'm getting the message "libvirt: XML-RPC error : authentication failed: authentication failed", which seems to be the core issue.
I've looked at all the past issues/resolutions for this problem and tried them, but I can't get it to work. For example, when I do vdsm-tool configure --force I get this:
Checking configuration status...
abrt is already configured for vdsm
lvm is configured for vdsm
libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts
Current revision of multipath.conf detected, preserving
Running configure...
Reconfiguration of abrt is done.
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 219, in main
    return tool_command[cmd]["command"](*args)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py", line 38, in wrapper
    func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 141, in configure
    _configure(c)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 88, in _configure
    getattr(module, 'configure', lambda: None)()
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/passwd.py", line 68, in configure
    configure_passwd()
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/passwd.py", line 98, in configure_passwd
    raise RuntimeError("Set password failed: %s" % (err,))
RuntimeError: Set password failed: ['saslpasswd2: invalid parameter supplied']
Any help would be greatly appreciated. I'm not a Linux/oVirt expert by any means, but I desperately need to get this setup back to being stable. This happened many months ago and I gave up on fixing it, but I really need to get this back online again.
Thank you
Todd Barton
5 years, 7 months
Re: Unable to start vdsm, upgrade 4.0 to 4.1
by Strahil
The fastest (Windows-style) approach is to completely wipe the host and do a reinstall -> install vdsm and so on.
You should consider oVirt Node.
Another option that comes to mind is to do:
yum history
yum history rollback <id>
reboot
repeat the upgrade.
For now, I'm planning to do Gluster snapshots (with all machines on the volume stopped) before major upgrades, as this is the fastest recovery approach.
Actually oVirt Node uses thin lvm snapshots to guarantee fast rollback.
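Something lighter you could try before wiping the host: re-run just the libvirt configurator and restart the stack (a sketch; check which module names your vdsm version supports first):

# List supported configurators and the current state
vdsm-tool configure --help
vdsm-tool is-configured

# Re-run only the libvirt configurator, then restart the services
vdsm-tool configure --module libvirt --force
systemctl restart libvirtd vdsmd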
Best Regards,
Strahil Nikolov

On Apr 13, 2019 06:44, Todd Barton <tcbarton(a)ipvoicedatasystems.com> wrote:
> [...]
5 years, 7 months
Tuning Gluster Writes
by Alex McWhirter
I have 8 machines acting as Gluster servers. Each has 12 drives in
RAID 50 (3 sets of 4 drives in RAID 5, then striped together as
one).
They connect to the compute hosts and to each other over LACP'd 10GbE
connections split across two Cisco Nexus switches with vPC.
Gluster has the following options set:
performance.write-behind-window-size: 4MB
performance.flush-behind: on
performance.stat-prefetch: on
server.event-threads: 4
client.event-threads: 8
performance.io-thread-count: 32
network.ping-timeout: 30
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
storage.owner-gid: 36
storage.owner-uid: 36
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: off
transport.address-family: inet
nfs.disable: off
performance.client-io-threads: on
I have the following sysctl values on the Gluster clients and servers, using
libgfapi, MTU 9000:
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_congestion_control=htcp
Reads with this setup are perfect: benchmarked in a VM at about 770MB/s
sequential, with disk access times of < 1ms. Writes, on the other hand, are
all over the place. They peak around 320MB/s sequential write, which is
what I expect, but it seems as if there is some blocking going on.
During the write test I will hit 320MB/s briefly, then 0MB/s as disk
access times shoot to over 3000ms, then back to 320MB/s. It averages out
to about 110MB/s afterwards.
Gluster version is 3.12.15; oVirt is 4.2.7.5.
Any ideas on what I could tune to eliminate or minimize that blocking?
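In case it helps narrow it down, this is how I have been comparing raw brick throughput against throughput through the volume, and watching per-brick latency during the write test (a sketch; volume name, brick and mount paths are placeholders):

# Sequential direct-I/O write straight to one brick's filesystem
dd if=/dev/zero of=/gluster_bricks/ddtest bs=1M count=4096 oflag=direct conv=fsync

# Same write through a fuse mount of the volume, for comparison
mount -t glusterfs server1:/myvolume /mnt/glustertest
dd if=/dev/zero of=/mnt/glustertest/ddtest bs=1M count=4096 oflag=direct conv=fsync
rm -f /gluster_bricks/ddtest /mnt/glustertest/ddtest
umount /mnt/glustertest

# Per-brick latency stats while the test runs
gluster volume profile myvolume start
gluster volume profile myvolume info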
5 years, 7 months
Migrate self-hosted engine between cluster
by raul.caballero.girol@gmail.com
Hello everybody,
I'm new to oVirt and I have a problem. I have deployed a new environment with a self-hosted engine. Now that I am learning about oVirt, I think my engine should be in another cluster. How can I move my engine from one cluster to another?
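From what I have read so far, the move is done with an engine backup plus a restore during a fresh hosted-engine deployment on the target cluster; a rough sketch of the commands involved (paths are placeholders, please correct me if this is wrong):

# On the current engine VM: take a full backup
engine-backup --mode=backup --scope=all \
    --file=/root/engine-backup.tar.gz --log=/root/engine-backup.log

# Copy the backup to a host in the target cluster, then deploy a new
# hosted engine there, restoring from the backup during deployment
hosted-engine --deploy --restore-from-file=/root/engine-backup.tar.gz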
Cheers
5 years, 7 months