Hi Frank,
I sometimes had the same issues with all-in-one setups, so I don't use local
storage in an all-in-one setup anymore.
Instead I share a directory on my node via NFS, create a new NFS data center and mount it
locally (see the sketch below).
This might not be the best way to do it, but I've had better experience with this setup than
with local storage.
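A minimal sketch of what I mean, assuming a directory /srv/ovirt-data and the usual
vdsm:kvm ownership (uid/gid 36) - adjust paths and export options to your environment:

  # on the node: export a local directory via NFS
  mkdir -p /srv/ovirt-data
  chown 36:36 /srv/ovirt-data
  echo '/srv/ovirt-data *(rw,sync,no_subtree_check,anonuid=36,anongid=36)' >> /etc/exports
  exportfs -ra
  systemctl enable nfs-server.service
  systemctl start nfs-server.service

Then create an NFS data center in the engine and attach a storage domain pointing at
<node-fqdn>:/srv/ovirt-data.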
Btw, when changing multipath.conf make sure you add "# RHEV PRIVATE" below the
"# RHEV REVISION X.Y" line to avoid losing your changes during the next reboot.
With iSCSI and FC backends vdsm works fine in combination with multipath. In such
setups multipath absolutely makes sense, but I also don't understand why multipathing
is used for local storage - the disks are controlled by a (hardware) RAID controller and
there's no alternate path oVirt could use in case of a path failure or for better
throughput...
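If you do want multipathd running but don't want it to grab the local disks, a
blacklist entry in the private section of multipath.conf should do it - the wwid below
is made up, take yours from the output of multipath -ll or /lib/udev/scsi_id:

  blacklist {
      # keep multipath away from the local RAID volume
      wwid "3600508b1001c5a5c1234567890abcdef"
      # or match the device node directly
      devnode "^sda$"
  }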
Regards,
René
-----Original message-----
From: Frank Wall <fw@moov.de>
Sent: Sunday 1st September 2013 16:40
To: users@ovirt.org
Subject: Re: [Users] Host stuck in unresponsive state
On 01.09.2013 01:28, Frank Wall wrote:
> OK, for some reason it got stuck trying to start "iscsid" and
> "multipathd". I was able to solve the issues with these services and
> now the real error message is visible:
Did some more fiddling... I removed my /etc/multipath.conf and started over with the
freshly generated file. Apparently there is a syntax error in this
auto-generated config:
[root@aio ~]# multipath -ll
Sep 01 00:32:27 | multipath.conf +5, invalid keyword: getuid_callout
Sep 01 00:32:27 | multipath.conf +18, invalid keyword: getuid_callout
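Apparently getuid_callout was dropped from multipath-tools in favor of uid_attribute,
which would explain why these keywords are rejected. The offending lines presumably
looked roughly like this (a guess based on typical configs of that era, exact values
may differ):

  # old keyword, no longer recognized:
  getuid_callout "/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/%n"
  # current equivalent in the same defaults/device section:
  uid_attribute "ID_SERIAL"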
OK, I removed lines 5 and 18 and now multipathd is working again. This
time it was possible to successfully start vdsmd afterwards:
[root@aio ~]# systemctl status vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since So 2013-09-01 16:25:45 CEST; 1min 30s ago
  Process: 3138 ExecStart=/lib/systemd/systemd-vdsmd start (code=exited, status=0/SUCCESS)
 Main PID: 3285 (respawn)
   CGroup: name=systemd:/system/vdsmd.service
           ├─3285 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/vdsm/respawn.pid /us...
           └─3288 /usr/bin/python /usr/share/vdsm/vdsm
Sep 01 16:25:45 aio.exmaple.com python[3288]: DIGEST-MD5 client step 2
Sep 01 16:25:45 aio.exmaple.com python[3288]: DIGEST-MD5 parse_server_challenge()
Sep 01 16:25:45 aio.exmaple.com python[3288]: DIGEST-MD5 ask_user_info()
Sep 01 16:25:45 aio.exmaple.com vdsm[3288]: vdsm vds WARNING Unable to load the json rpc server module. Please make su...alled.
Sep 01 16:25:45 aio.exmaple.com python[3288]: DIGEST-MD5 client step 2
Sep 01 16:25:45 aio.exmaple.com python[3288]: DIGEST-MD5 ask_user_info()
Sep 01 16:25:45 aio.exmaple.com python[3288]: DIGEST-MD5 make_client_response()
Sep 01 16:25:45 aio.exmaple.com python[3288]: DIGEST-MD5 client step 3
Sep 01 16:25:54 aio.exmaple.com vdsm[3288]: vdsm TaskManager.Task ERROR Task=`7fc3840c-1518-4260-9f27-ee20434b5a7a`::U... error
Sep 01 16:25:54 aio.exmaple.com vdsm[3288]: vdsm TaskManager.Task ERROR Task=`82f757b5-a669-40fa-b09d-9cad90c971e1`::U... error
Still, this doesn't feel right. I think vdsmd is just too unstable and fragile. Why did
vdsmd dump core with a different multipath config in place? Why does it even have such a
strict dependency on multipathd? There have been several similar reports in the last
months, and I wonder if there is a way to make vdsmd more robust. It would be better to
have vdsmd start and report an error to ovirt-engine, instead of the vdsmd service
failing to start every time. The current behaviour makes it hard to debug.
Thanks
- Frank