
On Sun, Oct 4, 2020 at 2:07 AM Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Sat, Oct 3, 2020 at 9:42 PM Amit Bawer <abawer@redhat.com> wrote:
On Sat, Oct 3, 2020 at 10:24 PM Amit Bawer <abawer@redhat.com> wrote:
For the gluster bricks being filtered out in 4.4.2, this seems like [1].
Maybe remove the LVM filter from /etc/lvm/lvm.conf while in 4.4.2 maintenance mode. If the filesystem is mounted read-only, try
mount -o remount,rw /
then sync and try to reboot 4.4.2.
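Roughly something like this from the emergency shell (an untested sketch; the only change in lvm.conf should be dropping the filter line that was added under the devices { } section):

mount -o remount,rw /
# edit /etc/lvm/lvm.conf and comment out or remove the added "filter = ..." line
vi /etc/lvm/lvm.conf
sync
reboot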
Indeed, if I run this command from the emergency shell in 4.4.2:
lvs --config 'devices { filter = [ "a|.*|" ] }'
I also see all the gluster volumes, so I think the update injected the nasty filter. Possibly the command "vdsm-tool config-lvm-filter -y" was executed during the update and erroneously created the filter?
Since there wasn't a filter set on the node, the 4.4.2 update added the default filter for the root-lv PV. If some filter had already been set before the upgrade, it would not have been added by the 4.4.2 update.
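If you want to see what the tool would configure before it changes anything, as far as I recall running it without -y only analyzes the host, prints the recommended filter and asks for confirmation:

# show the recommended LVM filter for this host and prompt before applying it
vdsm-tool config-lvm-filter
# apply it non-interactively (presumably what the upgrade flow ran)
vdsm-tool config-lvm-filter -y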
Anyway, remounting the root filesystem read-write, removing the filter line from lvm.conf, and rebooting worked: 4.4.2 booted OK and I was able to exit global maintenance and bring the engine up.
Thanks Amit for the help and all the insights.
Right now there are only two problems:
1) a long-running problem: from the engine web admin all the volumes are seen as up, and the storage domains too, while actually only the hosted-engine one is up and "data" and "vmstore" are down, as I can verify from the host, where there is only one /rhev/data-center/ mount:
[root@ovirt01 ~]# df -h
Filesystem                                              Size  Used Avail Use% Mounted on
devtmpfs                                                 16G     0   16G   0% /dev
tmpfs                                                    16G   16K   16G   1% /dev/shm
tmpfs                                                    16G   18M   16G   1% /run
tmpfs                                                    16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/onn-ovirt--node--ng--4.4.2--0.20200918.0+1  133G  3.9G  129G   3% /
/dev/mapper/onn-tmp                                    1014M   40M  975M   4% /tmp
/dev/mapper/gluster_vg_sda-gluster_lv_engine            100G  9.0G   91G   9% /gluster_bricks/engine
/dev/mapper/gluster_vg_sda-gluster_lv_data              500G  126G  375G  26% /gluster_bricks/data
/dev/mapper/gluster_vg_sda-gluster_lv_vmstore            90G  6.9G   84G   8% /gluster_bricks/vmstore
/dev/mapper/onn-home                                   1014M   40M  975M   4% /home
/dev/sdb2                                               976M  307M  603M  34% /boot
/dev/sdb1                                               599M  6.8M  593M   2% /boot/efi
/dev/mapper/onn-var                                      15G  263M   15G   2% /var
/dev/mapper/onn-var_log                                 8.0G  541M  7.5G   7% /var/log
/dev/mapper/onn-var_crash                                10G  105M  9.9G   2% /var/crash
/dev/mapper/onn-var_log_audit                           2.0G   79M  2.0G   4% /var/log/audit
ovirt01st.lutwyn.storage:/engine                        100G   10G   90G  10% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_engine
tmpfs                                                   3.2G     0  3.2G   0% /run/user/1000
[root@ovirt01 ~]#
I can also wait 10 minutes and nothing changes. The way I exit this stalled situation is to power on a VM, which obviously fails:
VM f32 is down with error. Exit message: Unable to get volume size for domain d39ed9a3-3b10-46bf-b334-e8970f5deca1 volume 242d16c6-1fd9-4918-b9dd-0d477a86424c. 10/4/20 12:50:41 AM
and suddenly all the data storage domains are deactivated (from the engine's point of view, because they actually were not active...):
Storage Domain vmstore (Data Center Default) was deactivated by system because it's not visible by any of the hosts. 10/4/20 12:50:31 AM
Then I can go to Data Centers --> Default --> Storage, activate the "vmstore" and "data" storage domains, and suddenly they get activated and the filesystems mounted.
[root@ovirt01 ~]# df -h | grep rhev
ovirt01st.lutwyn.storage:/engine   100G   10G   90G  10% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_engine
ovirt01st.lutwyn.storage:/data     500G  131G  370G  27% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_data
ovirt01st.lutwyn.storage:/vmstore   90G  7.8G   83G   9% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_vmstore
[root@ovirt01 ~]#
and the VM now starts OK.
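By the way, the gluster volumes themselves look like they were up all along (the bricks are mounted and the domains activate as soon as I trigger it); a quick way to double-check that from the host, assuming the gluster CLI is available on the node, would be something like:

gluster peer status
gluster volume status data
gluster volume status vmstore

so the issue really seems to be only on the engine/mount side.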
I already reported this, but I don't know if there is a Bugzilla open for it yet.
Did you get any response to the original mail? I haven't seen it on the users list.
2) I see that I cannot connect to the Cockpit console of the node.
In Firefox (version 80) on my Fedora 31 client I get:
" Secure Connection Failed
An error occurred during a connection to ovirt01.lutwyn.local:9090. PR_CONNECT_RESET_ERROR
The page you are trying to view cannot be shown because the authenticity of the received data could not be verified. Please contact the website owners to inform them of this problem.
Learn more… "
In Chrome (build 85.0.4183.121) I get:
" Your connection is not private Attackers might be trying to steal your information from ovirt01.lutwyn.local (for example, passwords, messages, or credit cards). Learn more NET::ERR_CERT_AUTHORITY_INVALID " Click Advanced and select to go to the site
" This server could not prove that it is ovirt01.lutwyn.local; its security certificate is not trusted by your computer's operating system. This may be caused by a misconfiguration or an attacker intercepting your connection."
If I then select to go to the site anyway, I get:
" This page isn’t working ovirt01.lutwyn.local didn’t send any data. ERR_EMPTY_RESPONSE "
NOTE: the host is not resolved by DNS, but I put an entry in the hosts file on my client.
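For reference, the workaround on the client is just a line like the following in /etc/hosts (the IP here is only a placeholder for the node's real address):

192.168.1.50   ovirt01.lutwyn.local ovirt01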
Setting up DNS might be required for the authenticity check; maybe other members on the list can tell better.
On host:
[root@ovirt01 ~]# systemctl status cockpit.socket --no-pager
● cockpit.socket - Cockpit Web Service Socket
   Loaded: loaded (/usr/lib/systemd/system/cockpit.socket; disabled; vendor preset: enabled)
   Active: active (listening) since Sun 2020-10-04 00:36:36 CEST; 25min ago
     Docs: man:cockpit-ws(8)
   Listen: [::]:9090 (Stream)
  Process: 1425 ExecStartPost=/bin/ln -snf active.motd /run/cockpit/motd (code=exited, status=0/SUCCESS)
  Process: 1417 ExecStartPost=/usr/share/cockpit/motd/update-motd localhost (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 202981)
   Memory: 1.6M
   CGroup: /system.slice/cockpit.socket

Oct 04 00:36:36 ovirt01.lutwyn.local systemd[1]: Starting Cockpit Web Service Socket.
Oct 04 00:36:36 ovirt01.lutwyn.local systemd[1]: Listening on Cockpit Web Service Socket.
[root@ovirt01 ~]#
[root@ovirt01 ~]# systemctl status cockpit.service --no-pager
● cockpit.service - Cockpit Web Service
   Loaded: loaded (/usr/lib/systemd/system/cockpit.service; static; vendor preset: disabled)
   Active: active (running) since Sun 2020-10-04 00:58:09 CEST; 3min 30s ago
     Docs: man:cockpit-ws(8)
  Process: 19260 ExecStartPre=/usr/sbin/remotectl certificate --ensure --user=root --group=cockpit-ws --selinux-type=etc_t (code=exited, status=0/SUCCESS)
 Main PID: 19263 (cockpit-tls)
    Tasks: 1 (limit: 202981)
   Memory: 1.4M
   CGroup: /system.slice/cockpit.service
           └─19263 /usr/libexec/cockpit-tls

Oct 04 00:59:59 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(http-redirect.sock) failed: Permission denied
Oct 04 00:59:59 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(http-redirect.sock) failed: Permission denied
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
[root@ovirt01 ~]#
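The repeated "Permission denied" on the cockpit sockets makes me wonder about SELinux; a possible check (just a guess on my part, not verified) would be to look for recent AVC denials, e.g.:

ausearch -m avc -ts recent | grep -i cockpit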
Gianluca