On Sat, Oct 3, 2020 at 9:42 PM Amit Bawer <abawer@redhat.com> wrote:


On Sat, Oct 3, 2020 at 10:24 PM Amit Bawer <abawer@redhat.com> wrote:


For the gluster bricks being filtered out in 4.4.2, this seems like [1].


Maybe remove the LVM filter from /etc/lvm/lvm.conf while in 4.4.2 maintenance mode.
If the filesystem is mounted read-only, try
mount -o remount,rw /
then sync and try to reboot 4.4.2.


Indeed, if I run the following command from the emergency shell in 4.4.2:

lvs --config 'devices { filter = [ "a|.*|" ] }'

I also see all the gluster volumes, so I think the update injected the nasty filter.
Possibly during the update the command
# vdsm-tool config-lvm-filter -y
was executed and erroneously created the filter?
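
To double-check this hypothesis, something along these lines should work from the emergency shell (my own rough idea, not an official procedure; if I remember correctly, without -y vdsm-tool only prints the proposed filter and asks for confirmation, so it does not change anything):

grep -nE '^[[:space:]]*filter' /etc/lvm/lvm.conf    # the filter line currently configured
vdsm-tool config-lvm-filter                         # the filter vdsm would configure (review only, answer no)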

Anyway, remounting the root filesystem read-write, removing the filter line from lvm.conf and rebooting worked: 4.4.2 booted OK and I was able to exit global maintenance and bring the engine up.
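
For the record, the sequence was roughly this (I edited lvm.conf by hand, so the exact steps may differ slightly; the lvs call is just an extra sanity check):

mount -o remount,rw /
vi /etc/lvm/lvm.conf    # removed the filter = [ ... ] line from the devices section
lvs                     # the gluster_vg_sda LVs show up again
sync
reboot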

Thanks Amit for the help and all the insights.

Right now there are only two problems:

1) A long-running problem: in the engine web admin all the volumes are shown as up, and the storage domains too, while actually only the hosted-engine one is up and "data" and "vmstore" are down. I can verify this on the host, where there is only one /rhev/data-center/ mount:

[root@ovirt01 ~]# df -h
Filesystem                                              Size  Used Avail Use% Mounted on
devtmpfs                                                 16G     0   16G   0% /dev
tmpfs                                                    16G   16K   16G   1% /dev/shm
tmpfs                                                    16G   18M   16G   1% /run
tmpfs                                                    16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/onn-ovirt--node--ng--4.4.2--0.20200918.0+1  133G  3.9G  129G   3% /
/dev/mapper/onn-tmp                                    1014M   40M  975M   4% /tmp
/dev/mapper/gluster_vg_sda-gluster_lv_engine            100G  9.0G   91G   9% /gluster_bricks/engine
/dev/mapper/gluster_vg_sda-gluster_lv_data              500G  126G  375G  26% /gluster_bricks/data
/dev/mapper/gluster_vg_sda-gluster_lv_vmstore            90G  6.9G   84G   8% /gluster_bricks/vmstore
/dev/mapper/onn-home                                   1014M   40M  975M   4% /home
/dev/sdb2                                               976M  307M  603M  34% /boot
/dev/sdb1                                               599M  6.8M  593M   2% /boot/efi
/dev/mapper/onn-var                                      15G  263M   15G   2% /var
/dev/mapper/onn-var_log                                 8.0G  541M  7.5G   7% /var/log
/dev/mapper/onn-var_crash                                10G  105M  9.9G   2% /var/crash
/dev/mapper/onn-var_log_audit                           2.0G   79M  2.0G   4% /var/log/audit
ovirt01st.lutwyn.storage:/engine                        100G   10G   90G  10% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_engine
tmpfs                                                   3.2G     0  3.2G   0% /run/user/1000
[root@ovirt01 ~]#

I can also wait 10 minutes and nothing changes. The way I exit this stalled situation is to power on a VM, which obviously fails:
VM f32 is down with error. Exit message: Unable to get volume size for domain d39ed9a3-3b10-46bf-b334-e8970f5deca1 volume 242d16c6-1fd9-4918-b9dd-0d477a86424c.
10/4/20 12:50:41 AM

and suddenly all the data storage domains are deactivated (from the engine's point of view, because actually they were never active...):
Storage Domain vmstore (Data Center Default) was deactivated by system because it's not visible by any of the hosts.
10/4/20 12:50:31 AM

Then I can go to Data Centers --> Default --> Storage, activate the "vmstore" and "data" storage domains, and suddenly they become active and the filesystems get mounted.

[root@ovirt01 ~]# df -h | grep rhev
ovirt01st.lutwyn.storage:/engine                        100G   10G   90G  10% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_engine
ovirt01st.lutwyn.storage:/data                          500G  131G  370G  27% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_data
ovirt01st.lutwyn.storage:/vmstore                        90G  7.8G   83G   9% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_vmstore
[root@ovirt01 ~]#

and the VM starts OK now.

I already reported this, but I don't know if there is a Bugzilla open for it yet.
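
In case it helps whoever looks into this, these are some host-side checks that should show the mismatch between what the engine reports and what is actually mounted (just my guesses on what to look at; the volume names are the ones of my setup):

mount -t fuse.glusterfs                  # should list engine, data and vmstore under /rhev/data-center when all domains are really up
gluster volume status data               # the gluster volumes themselves report as healthy
gluster volume status vmstore
grep -iE 'mount|storagedomain' /var/log/vdsm/vdsm.log | tail -n 50    # recent mount/monitor messages on the vdsm side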

2) I see that I cannot connect to the Cockpit console of the node.

In Firefox (version 80) on my Fedora 31 client I get:
"
Secure Connection Failed

An error occurred during a connection to ovirt01.lutwyn.local:9090. PR_CONNECT_RESET_ERROR

    The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.
    Please contact the website owners to inform them of this problem.

Learn more…
"
In Chrome (build 85.0.4183.121) I get:

"
Your connection is not private
Attackers might be trying to steal your information from ovirt01.lutwyn.local (for example, passwords, messages, or credit cards). Learn more
NET::ERR_CERT_AUTHORITY_INVALID
"
If I click Advanced, I see:

"
This server could not prove that it is ovirt01.lutwyn.local; its security certificate is not trusted by your computer's operating system. This may be caused by a misconfiguration or an attacker intercepting your connection."

and if I then select to proceed to the site anyway, I get:

"
This page isn’t working ovirt01.lutwyn.local didn’t send any data.
ERR_EMPTY_RESPONSE
"

NOTE: the host is not resolved by DNS, but I added an entry to the hosts file on my client.

On the host:

[root@ovirt01 ~]# systemctl status cockpit.socket --no-pager
● cockpit.socket - Cockpit Web Service Socket
   Loaded: loaded (/usr/lib/systemd/system/cockpit.socket; disabled; vendor preset: enabled)
   Active: active (listening) since Sun 2020-10-04 00:36:36 CEST; 25min ago
     Docs: man:cockpit-ws(8)
   Listen: [::]:9090 (Stream)
  Process: 1425 ExecStartPost=/bin/ln -snf active.motd /run/cockpit/motd (code=exited, status=0/SUCCESS)
  Process: 1417 ExecStartPost=/usr/share/cockpit/motd/update-motd  localhost (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 202981)
   Memory: 1.6M
   CGroup: /system.slice/cockpit.socket

Oct 04 00:36:36 ovirt01.lutwyn.local systemd[1]: Starting Cockpit Web Service Socket.
Oct 04 00:36:36 ovirt01.lutwyn.local systemd[1]: Listening on Cockpit Web Service Socket.
[root@ovirt01 ~]#

[root@ovirt01 ~]# systemctl status cockpit.service --no-pager
● cockpit.service - Cockpit Web Service
   Loaded: loaded (/usr/lib/systemd/system/cockpit.service; static; vendor preset: disabled)
   Active: active (running) since Sun 2020-10-04 00:58:09 CEST; 3min 30s ago
     Docs: man:cockpit-ws(8)
  Process: 19260 ExecStartPre=/usr/sbin/remotectl certificate --ensure --user=root --group=cockpit-ws --selinux-type=etc_t (code=exited, status=0/SUCCESS)
 Main PID: 19263 (cockpit-tls)
    Tasks: 1 (limit: 202981)
   Memory: 1.4M
   CGroup: /system.slice/cockpit.service
           └─19263 /usr/libexec/cockpit-tls

Oct 04 00:59:59 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(http-redirect.sock) failed: Permission denied
Oct 04 00:59:59 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(http-redirect.sock) failed: Permission denied
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
[root@ovirt01 ~]#
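
The "Permission denied" on the cockpit sockets makes me suspect SELinux, so the next thing I plan to try is something like this (just a guess on my side, I have not verified this is really the cause):

getenforce
ausearch -m avc -ts recent | grep -i cockpit    # any recent AVC denials mentioning cockpit?
setenforce 0                                    # switch to permissive only as a quick test
# retry https://ovirt01.lutwyn.local:9090 from the client, then put it back:
setenforce 1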


Gianluca