
On Thu, Feb 2, 2017 at 6:05 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Feb 2, 2017 at 3:51 PM, Nir Soffer <nsoffer@redhat.com> wrote:
Can you confirm that the host can be active when I restart vdsmd service?
Sure. This may abort a storage operation if one is running when you restart vdsm, but vdsm is designed so you can restart or kill it safely.
For example, if you abort a disk copy in the middle, the operation will fail and the destination disk will be deleted.
If you want to avoid such issues, you can put the host into maintenance, but this requires migrating its VMs to other hosts.
Nir
OK. Created 50_thin_block_extension_rules.conf under /etc/vdsm/vdsm.conf.d and restarted vdsmd
One last (latest, probably... ;-) question: is it expected that if I restart vdsmd on the host that is the SPM, then SPM is shifted to another node?
Yes, the engine will move the SPM role to another host when the SPM fails, unless you disabled the SPM role for the other hosts (see host > spm tab).
I ask because when I restarted vdsmd on the host that is not the SPM, I didn't get any message in the web admin GUI and the restart of vdsmd itself was very fast. On the host with the SPM, instead, the command took several seconds and I got these events:
It is expected that restarting the SPM is slower, but we need to see the vdsm logs to understand why.
Feb 2, 2017 4:01:23 PM Host ovmsrv05 power management was verified successfully.
Feb 2, 2017 4:01:23 PM Status of host ovmsrv05 was set to Up.
Feb 2, 2017 4:01:19 PM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Feb 2, 2017 4:01:18 PM Storage Pool Manager runs on Host ovmsrv06 (Address: ovmsrv06.datacenter.polimi.it).
Feb 2, 2017 4:01:13 PM VDSM ovmsrv05 command failed: Recovering from crash or Initializing
Feb 2, 2017 4:01:11 PM Host ovmsrv05 is initializing. Message: Recovering from crash or Initializing
Feb 2, 2017 4:01:11 PM VDSM ovmsrv05 command failed: Recovering from crash or Initializing
Feb 2, 2017 4:01:11 PM Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host ovmsrv05, Error: Recovering from crash or Initializing).
Feb 2, 2017 4:01:11 PM VDSM ovmsrv05 command failed: Recovering from crash or Initializing
Feb 2, 2017 4:01:05 PM Host ovmsrv05 is not responding. It will stay in Connecting state for a grace period of 80 seconds and after that an attempt to fence the host will be issued.
Feb 2, 2017 4:01:05 PM Host ovmsrv05 is not responding. It will stay in Connecting state for a grace period of 80 seconds and after that an attempt to fence the host will be issued.
Feb 2, 2017 4:01:05 PM VDSM ovmsrv05 command failed: Connection reset by peer
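To put numbers on "several seconds", the timestamps in the events quoted above can be compared directly. The following is only a rough Python sketch, not an oVirt tool: the helper names (parse_event, seconds_between) are made up for illustration, the timestamp format is inferred from these event lines, and an English locale is assumed for the month name and AM/PM marker.

```python
from datetime import datetime

# A subset of the event lines quoted above, copied verbatim.
EVENTS = """\
Feb 2, 2017 4:01:23 PM Status of host ovmsrv05 was set to Up.
Feb 2, 2017 4:01:18 PM Storage Pool Manager runs on Host ovmsrv06 (Address: ovmsrv06.datacenter.polimi.it).
Feb 2, 2017 4:01:11 PM Host ovmsrv05 is initializing. Message: Recovering from crash or Initializing
Feb 2, 2017 4:01:05 PM VDSM ovmsrv05 command failed: Connection reset by peer
"""

# Assumed format, inferred from the lines above (English locale).
TIMESTAMP_FORMAT = "%b %d, %Y %I:%M:%S %p"


def parse_event(line):
    """Split one event line into (timestamp, message)."""
    # The timestamp is the first five space-separated tokens,
    # e.g. "Feb", "2,", "2017", "4:01:23", "PM".
    parts = line.split(" ", 5)
    stamp = " ".join(parts[:5])
    return datetime.strptime(stamp, TIMESTAMP_FORMAT), parts[5]


def seconds_between(events, start_text, end_text):
    """Seconds from the earliest event containing start_text
    to the latest event containing end_text."""
    start = min(t for t, msg in events if start_text in msg)
    end = max(t for t, msg in events if end_text in msg)
    return (end - start).total_seconds()


events = [parse_event(line) for line in EVENTS.splitlines() if line.strip()]

# Time from the connection reset until the host was reported Up again.
outage = seconds_between(events, "Connection reset by peer", "was set to Up")
# Time from the connection reset until the SPM was running on another host.
spm_move = seconds_between(events, "Connection reset by peer",
                           "Storage Pool Manager runs on")

print(f"host back Up after {outage:.0f}s, SPM moved after {spm_move:.0f}s")
```

This only measures what the engine reported; the vdsm logs are still needed to see why the restart itself was slower on the SPM host.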
It looks like the engine discovered that the SPM was down, and reconnected. It is expected that changes in the SPM status are detected early and that the engine tries to recover the SPM, because the SPM role is critical in oVirt.
Are you sure you did not get any message when restarting the other host? I would expect the engine to detect and report a restart of any host. If you can reproduce this, with a vdsm restart that is not detected by the engine and not reported in the engine event log, please file a bug.
Nir