
Hello everybody,

We run a 3-node self-hosted cluster with GlusterFS. I had a lot of problems upgrading oVirt from 4.4.10 to 4.5.0.2, and now the cluster is unstable. First I will write down the problems I had while upgrading, so you get the bigger picture:

* The engine update went fine.
* The nodes I could not update because of a wrong version of imgbase, so I did a manual update to 4.5.0.1 and later to 4.5.0.2. The first time after updating, a node still booted into 4.4.10, so I did a reinstall.
* After the second reboot I ended up in emergency mode. After a long search I figured out that lvm.conf now uses use_devicesfile, but it picks the wrong filters. So I commented this out and added the old filters back. I did this on all 3 nodes.
* Then in Cockpit on all nodes I saw errors like: |ovs|00077|stream_ssl|ERR|Private key must be configured to use SSL|. To fix that I ran "vdsm-tool ovn-config [engine IP] ovirtmgmt", and later in the web interface I chose "Enroll Certificate" for every node.
* Between upgrading the nodes, I was a bit too fast migrating all running VMs, including the HostedEngine, from one host to another, and the hosted engine crashed once. But it came back after a few minutes, and since then the engine has run normally.
* Then I finished the installation by updating the cluster compatibility version to 4.7.
* I noticed some unsynced-volume warnings, but because I had seen those after upgrades in the past too, I thought they would disappear after some time. The next day they were still there, so I decided to put the nodes into maintenance mode again and restart the glusterd service. After some time the sync warnings were gone.

So now the actual problem: since then the cluster has been unstable.
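For reference, the change I made in /etc/lvm/lvm.conf on each node looked roughly like this (the filter line is only an example pattern; the actual filter is the old one from our 4.4 configuration and depends on each host's disk layout):

```
devices {
    # use_devicesfile = 1    # commented out: with the devices file enabled,
                             # the node no longer found its PVs at boot
    use_devicesfile = 0
    # old filter restored from the 4.4 config (example pattern only)
    filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-.*|", "r|.*|"]
}
```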
I get different errors and warnings, like:

* VM [name] is not responding
* HA VMs get migrated out of nowhere
* VM migrations can fail
* VM backups with snapshotting and export take very long
* VMs sometimes become very slow
* Storage domain vmstore experienced a high latency of 9.14251
* ovs|00001|db_ctl_base|ERR|no key "dpdk-init" in Open_vSwitch record "." column other_config
* 489279 [1064359]: s8 renewal error -202 delta_length 10 last_success 489249
* 444853 [2243175]: s27 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/dom_md/ids
* 471099 [2243175]: s27 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/dom_md/ids
* many of: 424035 [2243175]: s27 delta_renew long write time XX sec

I will attach the sanlock.log and vdsm.log messages here. Is there a way I can fix these issues?

Regards,
Jonathan