This means VDSM lost connectivity to the storage (the read of the domain's metadata file timed out), but it also looks like it recovered eventually.

On Thu, Aug 8, 2019 at 12:26 PM Vrgotic, Marko <M.Vrgotic@activevideo.com> wrote:

Another one that seems to be related:

 

2019-08-07 14:43:59,069-0700 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/10.210.13.64:_ovirt__production/6effda5e-1a0d-4312-bf93-d97fa9eb5aee/dom_md/metadata (monitor:499)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 497, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/check.py", line 391, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/10.210.13.64:_ovirt__production/6effda5e-1a0d-4312-bf93-d97fa9eb5aee/dom_md/metadata', 1, 'Read timeout')

2019-08-07 14:43:59,116-0700 WARN  (monitor/6effda5) [storage.Monitor] Host id for domain 6effda5e-1a0d-4312-bf93-d97fa9eb5aee was released (id: 1) (monitor:445)

 

From: "Vrgotic, Marko" <M.Vrgotic@activevideo.com>
Date: Wednesday, 7 August 2019 at 09:50
To: "users@ovirt.org" <users@ovirt.org>
Subject: Re: oVirt 4.3.5 potential issue with NFS storage

 

Log lines from VDSM:

 

“[root@ovirt-sj-05 ~]# tail -f /var/log/vdsm/vdsm.log | grep WARN

2019-08-07 09:40:03,556-0700 WARN  (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/10.210.13.64:_ovirt__production/bda97276-a399-448f-9113-017972f6b55a/dom_md/metadata' is blocked for 20.00 seconds (check:282)

2019-08-07 09:40:47,132-0700 WARN  (monitor/bda9727) [storage.Monitor] Host id for domain bda97276-a399-448f-9113-017972f6b55a was released (id: 5) (monitor:445)

2019-08-07 09:44:53,564-0700 WARN  (check/loop) [storage.check] Checker u'/rhev/data-center/mnt/10.210.13.64:_ovirt__production/bda97276-a399-448f-9113-017972f6b55a/dom_md/metadata' is blocked for 20.00 seconds (check:282)

2019-08-07 09:46:38,604-0700 WARN  (monitor/bda9727) [storage.Monitor] Host id for domain bda97276-a399-448f-9113-017972f6b55a was released (id: 5) (monitor:445)”
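
To see how often this happens over a longer period than a live `tail -f`, the two WARN messages above could be counted with a small script. This is only a sketch: the regular expressions are modeled on the exact wording of the lines quoted here (other VDSM versions may phrase them differently), and `summarize` is a hypothetical helper name, not part of VDSM.

```python
import re

# Patterns modeled on the two WARN messages quoted in this thread.
# Assumption: the message wording matches the VDSM version shown above.
BLOCKED_RE = re.compile(
    r"^(?P<ts>\S+ \S+) WARN\s+\(check/loop\) \[storage\.check\] "
    r"Checker u?'(?P<path>[^']+)' is blocked for (?P<secs>[\d.]+) seconds"
)
RELEASED_RE = re.compile(
    r"^(?P<ts>\S+ \S+) WARN\s+\(monitor/\w+\) \[storage\.Monitor\] "
    r"Host id for domain (?P<domain>[0-9a-f-]+) was released "
    r"\(id: (?P<host_id>\d+)\)"
)


def summarize(lines):
    """Return (blocked, released) lists of tuples extracted from log lines."""
    blocked, released = [], []
    for line in lines:
        m = BLOCKED_RE.match(line)
        if m:
            blocked.append((m.group("ts"), m.group("path"), float(m.group("secs"))))
            continue
        m = RELEASED_RE.match(line)
        if m:
            released.append((m.group("ts"), m.group("domain"), int(m.group("host_id"))))
    return blocked, released


# Two of the lines quoted above, used as sample input.
sample = [
    "2019-08-07 09:40:03,556-0700 WARN  (check/loop) [storage.check] Checker "
    "u'/rhev/data-center/mnt/10.210.13.64:_ovirt__production/"
    "bda97276-a399-448f-9113-017972f6b55a/dom_md/metadata' "
    "is blocked for 20.00 seconds (check:282)",
    "2019-08-07 09:40:47,132-0700 WARN  (monitor/bda9727) [storage.Monitor] "
    "Host id for domain bda97276-a399-448f-9113-017972f6b55a was released "
    "(id: 5) (monitor:445)",
]
blocked, released = summarize(sample)
print(blocked)
print(released)
```

On a host, the same function could be fed `open("/var/log/vdsm/vdsm.log")` instead of the sample list.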

 

 

 

From: "Vrgotic, Marko" <M.Vrgotic@activevideo.com>
Date: Wednesday, 7 August 2019 at 09:09
To: "users@ovirt.org" <users@ovirt.org>
Subject: oVirt 4.3.5 potential issue with NFS storage

 

Dear oVirt,

 

This is my third oVirt platform in the company, but this is the first time I am seeing the following logs:

 

“2019-08-07 16:00:16,099Z INFO  [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-51) [1b85e637] Lock freed to object 'EngineLock:{exclusiveLocks='[2350ee82-94ed-4f90-9366-451e0104d1d6=PROVIDER]', sharedLocks=''}'

2019-08-07 16:00:25,618Z WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-37723) [] domain 'bda97276-a399-448f-9113-017972f6b55a:ovirt_production' in problem 'PROBLEMATIC'. vds: 'ovirt-sj-05.ictv.com'

2019-08-07 16:00:40,630Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-37735) [] Domain 'bda97276-a399-448f-9113-017972f6b55a:ovirt_production' recovered from problem. vds: 'ovirt-sj-05.ictv.com'

2019-08-07 16:00:40,652Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-37737) [] Domain 'bda97276-a399-448f-9113-017972f6b55a:ovirt_production' recovered from problem. vds: 'ovirt-sj-01.ictv.com'

2019-08-07 16:00:40,652Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-37737) [] Domain 'bda97276-a399-448f-9113-017972f6b55a:ovirt_production' has recovered from problem. No active host in the DC is reporting it as problematic, so clearing the domain recovery timer.”
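
To get a feel for how long each domain stays in the problematic state, the engine log lines above could be paired up per domain and host. The following is a rough sketch, assuming the message wording shown in this excerpt; `outage_durations` is a hypothetical helper, not an oVirt API.

```python
import re
from datetime import datetime

# Assumption: engine.log messages are worded as in the excerpt above.
PROBLEM_RE = re.compile(
    r"^(?P<ts>\S+ \S+Z) WARN .*?domain '(?P<dom>[^']+)' in problem "
    r"'(?P<state>\w+)'\. vds: '(?P<vds>[^']+)'"
)
RECOVERED_RE = re.compile(
    r"^(?P<ts>\S+ \S+Z) INFO .*?Domain '(?P<dom>[^']+)' recovered from "
    r"problem\. vds: '(?P<vds>[^']+)'"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%fZ"


def outage_durations(lines):
    """Map (domain, host) to seconds between 'in problem' and 'recovered'."""
    started = {}
    durations = {}
    for line in lines:
        m = PROBLEM_RE.match(line)
        if m:
            key = (m.group("dom"), m.group("vds"))
            # Keep the first report if the problem is logged repeatedly.
            started.setdefault(key, datetime.strptime(m.group("ts"), TS_FORMAT))
            continue
        m = RECOVERED_RE.match(line)
        if m:
            key = (m.group("dom"), m.group("vds"))
            begin = started.pop(key, None)
            if begin is not None:  # ignore recoveries with no matching start
                end = datetime.strptime(m.group("ts"), TS_FORMAT)
                durations[key] = (end - begin).total_seconds()
    return durations


DOMAIN = "bda97276-a399-448f-9113-017972f6b55a:ovirt_production"
sample = [
    "2019-08-07 16:00:25,618Z WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] "
    "(EE-ManagedThreadFactory-engine-Thread-37723) [] domain '" + DOMAIN + "' "
    "in problem 'PROBLEMATIC'. vds: 'ovirt-sj-05.ictv.com'",
    "2019-08-07 16:00:40,630Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] "
    "(EE-ManagedThreadFactory-engine-Thread-37735) [] Domain '" + DOMAIN + "' "
    "recovered from problem. vds: 'ovirt-sj-05.ictv.com'",
    "2019-08-07 16:00:40,652Z INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] "
    "(EE-ManagedThreadFactory-engine-Thread-37737) [] Domain '" + DOMAIN + "' "
    "recovered from problem. vds: 'ovirt-sj-01.ictv.com'",
]
durations = outage_durations(sample)
for (dom, vds), secs in durations.items():
    print(vds, dom, round(secs, 3))
```

For the excerpt above, only ovirt-sj-05 has both a start and a recovery line, roughly 15 seconds apart.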

 

Can you help me understand why this is being reported?

 

This setup is:

 

5 hosts, 3 in HA

Self-hosted engine

Version 4.3.5

NFS-based NetApp storage, NFS version 4.1

“10.210.13.64:/ovirt_hosted_engine on /rhev/data-center/mnt/10.210.13.64:_ovirt__hosted__engine type nfs4 (rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=10.210.11.14,local_lock=none,addr=10.210.13.64)

 

10.210.13.64:/ovirt_production on /rhev/data-center/mnt/10.210.13.64:_ovirt__production type nfs4 (rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=10.210.11.14,local_lock=none,addr=10.210.13.64)

tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=9878396k,mode=700)”
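
For comparing mount options across hosts, the `mount` output above could be parsed into a dictionary. This is a minimal sketch: `parse_mount_line` is a hypothetical helper, and the conversion of `timeo` to seconds relies on the nfs(5) convention that `timeo` is given in tenths of a second.

```python
import re

MOUNT_RE = re.compile(
    r"^(?P<src>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<opts>[^)]*)\)$"
)


def parse_mount_line(line):
    """Split one line of `mount` output into (source, target, fstype, options)."""
    m = MOUNT_RE.match(line.strip())
    if m is None:
        raise ValueError("not a mount line: %r" % line)
    opts = {}
    for item in m.group("opts").split(","):
        key, sep, value = item.partition("=")
        # Flags such as 'soft' or 'rw' carry no value; store them as True.
        opts[key] = value if sep else True
    return m.group("src"), m.group("target"), m.group("fstype"), opts


# The ovirt_production mount quoted above.
line = (
    "10.210.13.64:/ovirt_production on "
    "/rhev/data-center/mnt/10.210.13.64:_ovirt__production type nfs4 "
    "(rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,soft,"
    "nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,"
    "clientaddr=10.210.11.14,local_lock=none,addr=10.210.13.64)"
)
src, target, fstype, opts = parse_mount_line(line)

# timeo is expressed in tenths of a second (per nfs(5)).
timeo_seconds = int(opts["timeo"]) / 10.0
print(src, fstype, "soft" in opts, timeo_seconds, int(opts["retrans"]))
```

With these options, the mount is `soft` with a 60-second `timeo` and `retrans=6`, so a stalled NFS request is eventually failed back to the caller rather than retried indefinitely.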

 

The first mount is dedicated storage for the self-hosted engine (SHE).

The second mount, “ovirt_production”, is for other VM guests.

 

Kindly awaiting your reply.

 

Marko Vrgotic

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/H4FH6GYAYLUP5OIVHUTG7JAUTOZNP7Y3/