On Sat, Oct 3, 2020 at 9:42 PM Amit Bawer <abawer@redhat.com> wrote:
On Sat, Oct 3, 2020 at 10:24 PM Amit Bawer <abawer@redhat.com> wrote:

> For the gluster bricks being filtered out in 4.4.2, this seems like [1].
> Maybe remove the lvm filter from /etc/lvm/lvm.conf while in 4.4.2 maintenance mode.
> If the fs is mounted as read only, try
>
> mount -o remount,rw /
> sync
>
> and try to reboot 4.4.2.

Indeed, if I run the following command while in the emergency shell in 4.4.2:

lvs --config 'devices { filter = [ "a|.*|" ] }'

I also see all the gluster volumes, so I think the update injected the nasty filter. Possibly during the update the command

# vdsm-tool config-lvm-filter -y

was executed and erroneously created the filter?
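For the record, this is roughly the sequence I used from the emergency shell; the exact filter line in /etc/lvm/lvm.conf may look different on other hosts, so treat it as a sketch rather than a recipe:

mount -o remount,rw /
# edit /etc/lvm/lvm.conf and remove (or comment out) the offending
# "filter = [ ... ]" line inside the devices { } section
vi /etc/lvm/lvm.conf
sync
reboot
# once the node is up again, exit global maintenance:
hosted-engine --set-maintenance --mode=none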
Anyway, remounting the root filesystem read-write, removing the filter line from lvm.conf and rebooting worked: 4.4.2 booted OK and I was able to exit global maintenance and have the engine up.

Thanks Amit for the help and all the insights.

Right now I have only two problems:

1) A long-running problem: in the engine web admin all the volumes are shown as up and the storage domains as up, while only the hosted-engine one actually is; "data" and "vmstore" are down, as I can verify from the host, where there is only one /rhev/data-center/ mount:

[root@ovirt01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 16K 16G 1% /dev/shm
tmpfs 16G 18M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/mapper/onn-ovirt--node--ng--4.4.2--0.20200918.0+1 133G 3.9G 129G 3% /
/dev/mapper/onn-tmp 1014M 40M 975M 4% /tmp
/dev/mapper/gluster_vg_sda-gluster_lv_engine 100G 9.0G 91G 9% /gluster_bricks/engine
/dev/mapper/gluster_vg_sda-gluster_lv_data 500G 126G 375G 26% /gluster_bricks/data
/dev/mapper/gluster_vg_sda-gluster_lv_vmstore 90G 6.9G 84G 8% /gluster_bricks/vmstore
/dev/mapper/onn-home 1014M 40M 975M 4% /home
/dev/sdb2 976M 307M 603M 34% /boot
/dev/sdb1 599M 6.8M 593M 2% /boot/efi
/dev/mapper/onn-var 15G 263M 15G 2% /var
/dev/mapper/onn-var_log 8.0G 541M 7.5G 7% /var/log
/dev/mapper/onn-var_crash 10G 105M 9.9G 2% /var/crash
/dev/mapper/onn-var_log_audit 2.0G 79M 2.0G 4% /var/log/audit
ovirt01st.lutwyn.storage:/engine 100G 10G 90G 10% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_engine
tmpfs 3.2G 0 3.2G 0% /run/user/1000
[root@ovirt01 ~]#

I can also wait 10 minutes and nothing changes. The way I exit from this stalled situation is to power on a VM, so that it obviously fails:

VM f32 is down with error. Exit message: Unable to get volume size for domain d39ed9a3-3b10-46bf-b334-e8970f5deca1 volume 242d16c6-1fd9-4918-b9dd-0d477a86424c.
10/4/20 12:50:41 AM

and suddenly all the data storage domains are deactivated (from the engine's point of view; actually they were not active anyway...):

Storage Domain vmstore (Data Center Default) was deactivated by system because it's not visible by any of the hosts.
10/4/20 12:50:31 AM

Then I can go to Data Centers --> Default --> Storage and activate the "vmstore" and "data" storage domains, and suddenly they become active and the filesystems are mounted:

[root@ovirt01 ~]# df -h | grep rhev
ovirt01st.lutwyn.storage:/engine 100G 10G 90G 10% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_engine
ovirt01st.lutwyn.storage:/data 500G 131G 370G 27% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_data
ovirt01st.lutwyn.storage:/vmstore 90G 7.8G 83G 9% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_vmstore
[root@ovirt01 ~]#

and the VM starts OK now.

I already reported this, but I don't know if there is a bugzilla open for it yet.
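By the way, while the engine shows the domains as down, the gluster side on the host looks healthy. What I use to double-check (plain gluster/df commands, nothing oVirt-specific) is:

# confirm the volumes and their bricks are started
gluster volume status data
gluster volume status vmstore
# list only the glusterfs mounts, i.e. what is actually mounted under /rhev/data-center
df -h -t fuse.glusterfs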
2) I see that I cannot connect to the cockpit console of the node.

In Firefox (version 80) on my Fedora 31 I get:

"Secure Connection Failed
An error occurred during a connection to ovirt01.lutwyn.local:9090. PR_CONNECT_RESET_ERROR
The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.
Please contact the website owners to inform them of this problem.
Learn more…"

In Chrome (build 85.0.4183.121) I get:

"Your connection is not private
Attackers might be trying to steal your information from ovirt01.lutwyn.local (for example, passwords, messages, or credit cards). Learn more
NET::ERR_CERT_AUTHORITY_INVALID"

If I click Advanced and select to go to the site, it shows:

"This server could not prove that it is ovirt01.lutwyn.local; its security certificate is not trusted by your computer's operating system. This may be caused by a misconfiguration or an attacker intercepting your connection."

If I select to continue anyway, I get:

"This page isn’t working
ovirt01.lutwyn.local didn’t send any data.
ERR_EMPTY_RESPONSE"

NOTE: the host is not resolved by DNS, but I put an entry in /etc/hosts on my client.
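From the client I can also check what the host actually presents on port 9090 with plain openssl/curl (nothing cockpit-specific), something like:

openssl s_client -connect ovirt01.lutwyn.local:9090 -servername ovirt01.lutwyn.local </dev/null
# or, ignoring the certificate, just see whether any HTTP response comes back:
curl -kIv https://ovirt01.lutwyn.local:9090/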
On the host:

[root@ovirt01 ~]# systemctl status cockpit.socket --no-pager
● cockpit.socket - Cockpit Web Service Socket
Loaded: loaded (/usr/lib/systemd/system/cockpit.socket; disabled; vendor preset: enabled)
Active: active (listening) since Sun 2020-10-04 00:36:36 CEST; 25min ago
Docs: man:cockpit-ws(8)
Listen: [::]:9090 (Stream)
Process: 1425 ExecStartPost=/bin/ln -snf active.motd /run/cockpit/motd (code=exited, status=0/SUCCESS)
Process: 1417 ExecStartPost=/usr/share/cockpit/motd/update-motd localhost (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 202981)
Memory: 1.6M
CGroup: /system.slice/cockpit.socket
Oct 04 00:36:36 ovirt01.lutwyn.local systemd[1]: Starting Cockpit Web Service Socket.
Oct 04 00:36:36 ovirt01.lutwyn.local systemd[1]: Listening on Cockpit Web Service Socket.
[root@ovirt01 ~]#

[root@ovirt01 ~]# systemctl status cockpit.service --no-pager
● cockpit.service - Cockpit Web Service
Loaded: loaded (/usr/lib/systemd/system/cockpit.service; static; vendor preset: disabled)
Active: active (running) since Sun 2020-10-04 00:58:09 CEST; 3min 30s ago
Docs: man:cockpit-ws(8)
Process: 19260 ExecStartPre=/usr/sbin/remotectl certificate --ensure --user=root --group=cockpit-ws --selinux-type=etc_t (code=exited, status=0/SUCCESS)
Main PID: 19263 (cockpit-tls)
Tasks: 1 (limit: 202981)
Memory: 1.4M
CGroup: /system.slice/cockpit.service
└─19263 /usr/libexec/cockpit-tls
Oct 04 00:59:59 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(http-redirect.sock) failed: Permission denied
Oct 04 00:59:59 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(http-redirect.sock) failed: Permission denied
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
[root@ovirt01 ~]#
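Given the "connect(... .sock) failed: Permission denied" lines above, I suspect SELinux may be denying cockpit-tls access to its sockets; if so, something like this should show the denials (standard audit tools, just a guess on my side):

ausearch -m avc -ts recent | grep -i cockpit
# and check the overall SELinux state:
getenforce

Gianluca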