[ovirt-users] VM has been paused due to storage I/O problem

Nir Soffer nsoffer at redhat.com
Thu Feb 2 16:27:58 UTC 2017


On Thu, Feb 2, 2017 at 6:05 PM, Gianluca Cecchi
<gianluca.cecchi at gmail.com> wrote:
> On Thu, Feb 2, 2017 at 3:51 PM, Nir Soffer <nsoffer at redhat.com> wrote:
>>
>>
>> > Can you confirm that the host can be active when I restart vdsmd
>> > service?
>>
>> Sure. This may abort a storage operation if one is running when you
>> restart vdsm, but vdsm is designed so you can restart or kill it safely.
>>
>> For example, if you abort a disk copy in the middle, the operation will
>> fail and the destination disk will be deleted.
>>
>> If you want to avoid such issue, you can put a host to maintenance, but
>> this requires migration of vms to other hosts.
>>
>> Nir
>
>
> OK. Created 50_thin_block_extension_rules.conf under /etc/vdsm/vdsm.conf.d
> and restarted vdsmd
>
> One last (latest probably... ;-) question
> Is it expected that if I restart vdsmd on the host that is the SPM, then SPM
> is shifted to another node?

Yes, the engine will move the SPM role to another host when the SPM fails,
unless you have disabled the SPM role on all other hosts (see host > spm tab).

> Because when restarting vdsmd on the host that is not SPM I didn't get any
> message in web admin gui and restart of vdsmd itself was very fast.
> Instead on the host with SPM, the command took several seconds and I got
> these events

It is expected that restarting vdsm on the SPM is slower, but we need to see
the vdsm logs to understand why.
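
To narrow the vdsm log down to the restart window before sending it, something
like the following Python sketch may help. It is only an illustration, not an
oVirt tool: the function names are made up here, and it assumes each log record
contains a timestamp of the form "YYYY-MM-DD HH:MM:SS" in the host's local
time. Lines without a timestamp (such as traceback continuations) are kept or
dropped together with the record before them.

```python
import re
from datetime import datetime

# Assumption: every log record carries a "YYYY-MM-DD HH:MM:SS" timestamp
# somewhere in its first line.
STAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def lines_in_window(lines, start, end):
    """Yield the lines whose record timestamp falls within [start, end]."""
    keep = False
    for line in lines:
        match = STAMP.search(line)
        if match:
            when = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
            keep = start <= when <= end
        # Lines without a timestamp inherit the decision of the last record.
        if keep:
            yield line


def dump_window(path, start, end):
    """Print the part of the log file at `path` that lies in the window."""
    with open(path) as log:
        for line in lines_in_window(log, start, end):
            print(line, end="")


# Example call (the window is a guess around the events quoted below):
# dump_window("/var/log/vdsm/vdsm.log",
#             datetime(2017, 2, 2, 16, 0, 50),
#             datetime(2017, 2, 2, 16, 1, 30))
```

The times shown in the web admin events may not be in the same time zone as
the host log, so the window may need adjusting.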

> Feb 2, 2017 4:01:23 PM Host ovmsrv05 power management was verified
> successfully.
> Feb 2, 2017 4:01:23 PM Status of host ovmsrv05 was set to Up.
> Feb 2, 2017 4:01:19 PM Executing power management status on Host ovmsrv05
> using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
> Feb 2, 2017 4:01:18 PM Storage Pool Manager runs on Host ovmsrv06 (Address:
> ovmsrv06.datacenter.polimi.it).
> Feb 2, 2017 4:01:13 PM VDSM ovmsrv05 command failed: Recovering from crash
> or Initializing
> Feb 2, 2017 4:01:11 PM Host ovmsrv05 is initializing. Message: Recovering
> from crash or Initializing
> Feb 2, 2017 4:01:11 PM VDSM ovmsrv05 command failed: Recovering from crash
> or Initializing
> Feb 2, 2017 4:01:11 PM Invalid status on Data Center Default. Setting Data
> Center status to Non Responsive (On host ovmsrv05, Error: Recovering from
> crash or Initializing).
> Feb 2, 2017 4:01:11 PM VDSM ovmsrv05 command failed: Recovering from crash
> or Initializing
> Feb 2, 2017 4:01:05 PM Host ovmsrv05 is not responding. It will stay in
> Connecting state for a grace period of 80 seconds and after that an attempt
> to fence the host will be issued.
> Feb 2, 2017 4:01:05 PM Host ovmsrv05 is not responding. It will stay in
> Connecting state for a grace period of 80 seconds and after that an attempt
> to fence the host will be issued.
> Feb 2, 2017 4:01:05 PM VDSM ovmsrv05 command failed: Connection reset by
> peer

It looks like the engine discovered that the SPM was down, and reconnected.

It is expected that changes in the SPM status are detected early and that the
engine tries to recover the SPM, since the SPM role is critical in oVirt.
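
As a rough check of the timeline in the events quoted above, here is a minimal
Python sketch that parses a few of those event lines and prints how many
seconds passed since the first failure. The helper names are made up for this
illustration, the messages are shortened, and the date parsing assumes an
English locale for "Feb" and "PM".

```python
from datetime import datetime

# A few event lines from the quoted event list (messages shortened).
EVENTS = [
    "Feb 2, 2017 4:01:23 PM Status of host ovmsrv05 was set to Up.",
    "Feb 2, 2017 4:01:18 PM Storage Pool Manager runs on Host ovmsrv06",
    "Feb 2, 2017 4:01:11 PM Host ovmsrv05 is initializing.",
    "Feb 2, 2017 4:01:05 PM VDSM ovmsrv05 command failed: Connection reset by peer",
]

STAMP_WORDS = 5  # "Feb", "2,", "2017", "4:01:23", "PM"


def parse_event(line):
    """Split an event line into (timestamp, message)."""
    parts = line.split(" ", STAMP_WORDS)
    stamp = " ".join(parts[:STAMP_WORDS])
    when = datetime.strptime(stamp, "%b %d, %Y %I:%M:%S %p")
    return when, parts[STAMP_WORDS]


def timeline(lines):
    """Return (seconds since first event, message) pairs, oldest first."""
    events = sorted(parse_event(line) for line in lines)
    start = events[0][0]
    return [((when - start).total_seconds(), msg) for when, msg in events]


for offset, message in timeline(EVENTS):
    print(f"+{offset:4.0f}s  {message}")
```

With these four lines, the SPM is reported on ovmsrv06 13 seconds after the
connection reset, and ovmsrv05 is back Up after 18 seconds.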

Are you sure you did not get any message when restarting the other host?
I would expect the engine to detect and report a restart of vdsm on any host.

If you can reproduce this, that is, a vdsm restart that is not detected by the
engine and not reported in the engine event log, please file a bug.

Nir

