On Sun, Oct 4, 2020 at 2:07 AM Gianluca Cecchi <gianluca.cecchi(a)gmail.com>
wrote:
On Sat, Oct 3, 2020 at 9:42 PM Amit Bawer <abawer(a)redhat.com>
wrote:
>
>
> On Sat, Oct 3, 2020 at 10:24 PM Amit Bawer <abawer(a)redhat.com> wrote:
>
>>
>>
>> For the gluster bricks being filtered out in 4.4.2, this seems like [1].
>>
>> [1]
https://bugzilla.redhat.com/show_bug.cgi?id=1883805
>>
>
> Maybe remove the LVM filter from /etc/lvm/lvm.conf while in 4.4.2
> maintenance mode.
> If the fs is mounted as read-only, try
>
> mount -o remount,rw /
>
> then sync and try to reboot 4.4.2.
>
>
Indeed, if I run the following command from the emergency shell in 4.4.2:
lvs --config 'devices { filter = [ "a|.*|" ] }'
I also see all the gluster volumes, so I think the update injected the
nasty filter.
Possibly during the update the command
# vdsm-tool config-lvm-filter -y
was executed and erroneously created the filter?
Since there wasn't a filter set on the node, the 4.4.2 update added the
default filter for the root-lv PV. If there had been a filter set before the
upgrade, it would not have been added by the 4.4.2 update.
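For reference, the filter that vdsm-tool config-lvm-filter writes normally
accepts only the PVs backing the host's own volume groups and rejects
everything else, so it typically looks something like this (the PV UUID
below is just a placeholder):

filter = [ "a|^/dev/disk/by-id/lvm-pv-uuid-XXXXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XXXXXX$|", "r|.*|" ]

With such a filter the gluster brick PVs are rejected, so their LVs are not
activated at boot, which matches what you saw.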
Anyway, remounting the root filesystem read-write, removing the filter line
from lvm.conf and rebooting worked: 4.4.2 booted OK and I was able to exit
global maintenance and bring the engine up.
Thanks Amit for the help and all the insights.
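For anyone hitting the same issue, the recovery sequence was roughly this,
from the 4.4.2 emergency shell (assuming the filter line is the one added
during the upgrade):

mount -o remount,rw /
vi /etc/lvm/lvm.conf     # remove or comment out the added "filter = [...]" line
sync
reboot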
Right now there are only two problems:
1) A long-running problem: from the engine web admin all the volumes are
seen as up and the storage domains as up, while in reality only the
hosted-engine one is up and "data" and "vmstore" are down, as I can verify
from the host, where there is only one /rhev/data-center/ mount:
[root@ovirt01 ~]# df -h
Filesystem                                               Size  Used Avail Use% Mounted on
devtmpfs                                                  16G     0   16G   0% /dev
tmpfs                                                     16G   16K   16G   1% /dev/shm
tmpfs                                                     16G   18M   16G   1% /run
tmpfs                                                     16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/onn-ovirt--node--ng--4.4.2--0.20200918.0+1   133G  3.9G  129G   3% /
/dev/mapper/onn-tmp                                     1014M   40M  975M   4% /tmp
/dev/mapper/gluster_vg_sda-gluster_lv_engine             100G  9.0G   91G   9% /gluster_bricks/engine
/dev/mapper/gluster_vg_sda-gluster_lv_data               500G  126G  375G  26% /gluster_bricks/data
/dev/mapper/gluster_vg_sda-gluster_lv_vmstore             90G  6.9G   84G   8% /gluster_bricks/vmstore
/dev/mapper/onn-home                                    1014M   40M  975M   4% /home
/dev/sdb2                                                976M  307M  603M  34% /boot
/dev/sdb1                                                599M  6.8M  593M   2% /boot/efi
/dev/mapper/onn-var                                       15G  263M   15G   2% /var
/dev/mapper/onn-var_log                                  8.0G  541M  7.5G   7% /var/log
/dev/mapper/onn-var_crash                                 10G  105M  9.9G   2% /var/crash
/dev/mapper/onn-var_log_audit                            2.0G   79M  2.0G   4% /var/log/audit
ovirt01st.lutwyn.storage:/engine                         100G   10G   90G  10% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_engine
tmpfs                                                    3.2G     0  3.2G   0% /run/user/1000
[root@ovirt01 ~]#
I can also wait 10 minutes and nothing changes. The way I get out of this
stalled situation is to power on a VM, which obviously fails:

VM f32 is down with error. Exit message: Unable to get volume size for
domain d39ed9a3-3b10-46bf-b334-e8970f5deca1 volume
242d16c6-1fd9-4918-b9dd-0d477a86424c.
10/4/20 12:50:41 AM

and suddenly all the data storage domains are deactivated (from the
engine's point of view, because actually they were not active...):

Storage Domain vmstore (Data Center Default) was deactivated by system
because it's not visible by any of the hosts.
10/4/20 12:50:31 AM

Then I can go to Data Centers --> Default --> Storage, activate the
"vmstore" and "data" storage domains, and suddenly they get activated and
the filesystems mounted.
[root@ovirt01 ~]# df -h | grep rhev
ovirt01st.lutwyn.storage:/engine   100G   10G   90G  10% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_engine
ovirt01st.lutwyn.storage:/data     500G  131G  370G  27% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_data
ovirt01st.lutwyn.storage:/vmstore   90G  7.8G   83G   9% /rhev/data-center/mnt/glusterSD/ovirt01st.lutwyn.storage:_vmstore
[root@ovirt01 ~]#
and the VM starts OK now.
I already reported this, but I don't know if there is yet a bugzilla open
for it.
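In case it helps with triage, a rough sketch of host-side checks one could
run while the domains show as down (assuming the gluster volume names match
the mounts above), to confirm the volumes themselves are healthy and simply
not mounted under /rhev:

gluster volume status data
gluster volume heal data info
grep glusterSD /proc/mounts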
Did you get any response to the original mail? I haven't seen it on the
users list.
2) I see that I cannot connect to the cockpit console of the node.
In Firefox (version 80) on my Fedora 31 I get:
"
Secure Connection Failed
An error occurred during a connection to ovirt01.lutwyn.local:9090.
PR_CONNECT_RESET_ERROR
The page you are trying to view cannot be shown because the
authenticity of the received data could not be verified.
Please contact the website owners to inform them of this problem.
Learn more…
"
In Chrome (build 85.0.4183.121) I get:
"
Your connection is not private
Attackers might be trying to steal your information from
ovirt01.lutwyn.local (for example, passwords, messages, or credit cards).
Learn more
NET::ERR_CERT_AUTHORITY_INVALID
"
If I click Advanced, it shows:
"
This server could not prove that it is ovirt01.lutwyn.local; its security
certificate is not trusted by your computer's operating system. This may be
caused by a misconfiguration or an attacker intercepting your connection."
and if I then select to proceed to the site anyway, I get:
"
This page isn’t working ovirt01.lutwyn.local didn’t send any data.
ERR_EMPTY_RESPONSE
"
NOTE: the host is not resolved by DNS, but I put an entry in the hosts file
on my client.
Setting up DNS might be required for authenticity; maybe other members on
the list can tell better.
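One way to check which certificate cockpit is actually serving would be
something like this (just a sketch; use whatever hostname the browser uses):

openssl s_client -connect ovirt01.lutwyn.local:9090 -servername ovirt01.lutwyn.local </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates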
On host:
[root@ovirt01 ~]# systemctl status cockpit.socket --no-pager
● cockpit.socket - Cockpit Web Service Socket
   Loaded: loaded (/usr/lib/systemd/system/cockpit.socket; disabled; vendor preset: enabled)
   Active: active (listening) since Sun 2020-10-04 00:36:36 CEST; 25min ago
     Docs: man:cockpit-ws(8)
   Listen: [::]:9090 (Stream)
  Process: 1425 ExecStartPost=/bin/ln -snf active.motd /run/cockpit/motd (code=exited, status=0/SUCCESS)
  Process: 1417 ExecStartPost=/usr/share/cockpit/motd/update-motd localhost (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 202981)
   Memory: 1.6M
   CGroup: /system.slice/cockpit.socket

Oct 04 00:36:36 ovirt01.lutwyn.local systemd[1]: Starting Cockpit Web Service Socket.
Oct 04 00:36:36 ovirt01.lutwyn.local systemd[1]: Listening on Cockpit Web Service Socket.
[root@ovirt01 ~]#
[root@ovirt01 ~]# systemctl status cockpit.service --no-pager
● cockpit.service - Cockpit Web Service
   Loaded: loaded (/usr/lib/systemd/system/cockpit.service; static; vendor preset: disabled)
   Active: active (running) since Sun 2020-10-04 00:58:09 CEST; 3min 30s ago
     Docs: man:cockpit-ws(8)
  Process: 19260 ExecStartPre=/usr/sbin/remotectl certificate --ensure --user=root --group=cockpit-ws --selinux-type=etc_t (code=exited, status=0/SUCCESS)
 Main PID: 19263 (cockpit-tls)
    Tasks: 1 (limit: 202981)
   Memory: 1.4M
   CGroup: /system.slice/cockpit.service
           └─19263 /usr/libexec/cockpit-tls

Oct 04 00:59:59 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(http-redirect.sock) failed: Permission denied
Oct 04 00:59:59 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(http-redirect.sock) failed: Permission denied
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:11 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: gnutls_handshake failed: A TLS fatal alert has been received.
Oct 04 01:00:16 ovirt01.lutwyn.local cockpit-tls[19263]: cockpit-tls: connect(https-factory.sock) failed: Permission denied
[root@ovirt01 ~]#
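The connect(...) failed: Permission denied messages make me wonder whether
SELinux is involved; a quick way to check for denials would be something
like (assuming auditd is running):

ausearch -m avc -ts recent | grep -i cockpit
journalctl -u cockpit.service --since "1 hour ago"

but I have not verified that this is actually the cause.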
Gianluca