I'm also seeing this error with a Dell MD3800i array, though the multipath
errors shown in our logs are different.
Feb 1 15:11:58 ovirt-node-production2 kernel: dd: sending ioctl
80306d02 to a partition!
Feb 1 15:21:01 ovirt-node-production2 multipathd: dm-31: remove map
(uevent)
Feb 1 15:21:01 ovirt-node-production2 multipathd: dm-31: devmap not
registered, can't remove
Feb 1 15:21:01 ovirt-node-production2 multipathd: dm-31: remove map
(uevent)
The dd error seems to happen every time that SPM runs a test.
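For what it's worth, to see which map those dm-31 messages refer to, something
like this should work (assuming the map still exists and the /dev/mapper
entries are symlinks to the dm-N nodes, as they usually are):

ls -l /dev/mapper/ | grep -w dm-31
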
On 01/31/2017 09:23 AM, Nathanaël Blanchet wrote:
Exactly the same issue here with an FC EMC storage domain...
On 31/01/2017 at 15:20, Gianluca Cecchi wrote:
> Hello,
> my test environment consists of 2 old HP BL685c G1 blades
> (ovmsrv05 and ovmsrv06), connected through FC switches in a SAN
> to an old IBM DS4700 storage array.
> Apart from being old, they all seem OK from a hardware point of view.
> I have configured oVirt 4.0.6 and an FCP storage domain.
> The hosts are plain CentOS 7.3 servers, fully updated.
> It is not a hosted engine environment: the manager is a VM outside
> of the cluster.
> I have configured power management on both hosts and it works well.
>
> At the moment I have only one test VM, and it is doing almost nothing.
>
> Starting point: ovmsrv05 has been in maintenance for about 2 days and
> the VM is running on ovmsrv06.
> I updated the qemu-kvm package on ovmsrv05 and then restarted the host
> from the web admin GUI:
> Power Mgmt --> Restart
>
> Sequence of events in the events pane, including the problem mentioned in the subject:
> Jan 31, 2017 10:29:43 AM Host ovmsrv05 power management was verified
> successfully.
> Jan 31, 2017 10:29:43 AM Status of host ovmsrv05 was set to Up.
> Jan 31, 2017 10:29:38 AM Executing power management status on Host
> ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
> Jan 31, 2017 10:29:29 AM Activation of host ovmsrv05 initiated by
> admin@internal-authz.
> Jan 31, 2017 10:28:05 AM VM ol65 has recovered from paused back to up.
> Jan 31, 2017 10:27:55 AM VM ol65 has been paused due to storage I/O
> problem.
> Jan 31, 2017 10:27:55 AM VM ol65 has been paused.
> Jan 31, 2017 10:25:52 AM Host ovmsrv05 was restarted by
> admin@internal-authz.
> Jan 31, 2017 10:25:52 AM Host ovmsrv05 was started by
> admin@internal-authz.
> Jan 31, 2017 10:25:52 AM Power management start of Host ovmsrv05
> succeeded.
> Jan 31, 2017 10:25:50 AM Executing power management status on Host
> ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
> Jan 31, 2017 10:25:37 AM Executing power management start on Host
> ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
> Jan 31, 2017 10:25:37 AM Power management start of Host ovmsrv05
> initiated.
> Jan 31, 2017 10:25:37 AM Auto fence for host ovmsrv05 was started.
> Jan 31, 2017 10:25:37 AM All VMs' status on Non Responsive Host
> ovmsrv05 were changed to 'Down' by admin@internal-authz
> Jan 31, 2017 10:25:36 AM Host ovmsrv05 was stopped by
> admin@internal-authz.
> Jan 31, 2017 10:25:36 AM Power management stop of Host ovmsrv05
> succeeded.
> Jan 31, 2017 10:25:34 AM Executing power management status on Host
> ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
> Jan 31, 2017 10:25:15 AM Executing power management stop on Host
> ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
> Jan 31, 2017 10:25:15 AM Power management stop of Host ovmsrv05
> initiated.
> Jan 31, 2017 10:25:12 AM Executing power management status on Host
> ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
>
> Looking at the timestamps, the culprit seems to be the boot of ovmsrv05,
> which detects some LUNs as owned and others as unowned.
> Full messages of both hosts here:
>
> https://drive.google.com/file/d/0BwoPbcrMv8mvekZQT1pjc0NMRlU/view?usp=sha...
> and
>
> https://drive.google.com/file/d/0BwoPbcrMv8mvcjBCYVdFZWdXTms/view?usp=sha...
>
> At this time there are 4 LUNs seen by both hosts, but only 1 of them
> is currently configured as the only storage domain in the oVirt
> cluster.
>
> [root@ovmsrv05 ~]# multipath -l | grep ^36
> 3600a0b8000299aa80000d08b55014119 dm-5 IBM ,1814 FAStT
> 3600a0b80002999020000cd3c5501458f dm-3 IBM ,1814 FAStT
> 3600a0b80002999020000ccf855011198 dm-2 IBM ,1814 FAStT
> 3600a0b8000299aa80000d08955014098 dm-4 IBM ,1814 FAStT
>
> the configured one:
> [root@ovmsrv05 ~]# multipath -l 3600a0b8000299aa80000d08b55014119
> 3600a0b8000299aa80000d08b55014119 dm-5 IBM ,1814 FAStT
> size=4.0T features='0' hwhandler='1 rdac' wp=rw
> |-+- policy='service-time 0' prio=0 status=active
> | |- 0:0:1:3 sdl 8:176 active undef running
> | `- 2:0:1:3 sdp 8:240 active undef running
> `-+- policy='service-time 0' prio=0 status=enabled
> |- 0:0:0:3 sdd 8:48 active undef running
> `- 2:0:0:3 sdi 8:128 active undef running
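>
> (Side note: prio shows as 0 and the path states as "undef" above only
> because multipath -l does not query the prioritizer or the path checker;
> multipath -ll should show the actual rdac priorities and states.)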
>
> In the messages of the booting node, around the time of the problem registered by the storage:
> [root@ovmsrv05 ~]# grep owned /var/log/messages
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:1: rdac: LUN 1 (RDAC) (owned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:2: rdac: LUN 2 (RDAC) (owned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:3: rdac: LUN 3 (RDAC)
> (unowned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:1: rdac: LUN 1 (RDAC) (owned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:4: rdac: LUN 4 (RDAC)
> (unowned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:2: rdac: LUN 2 (RDAC) (owned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:1: rdac: LUN 1 (RDAC)
> (unowned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:3: rdac: LUN 3 (RDAC)
> (unowned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:4: rdac: LUN 4 (RDAC)
> (unowned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:2: rdac: LUN 2 (RDAC)
> (unowned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:1: rdac: LUN 1 (RDAC)
> (unowned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:3: rdac: LUN 3 (RDAC) (owned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:2: rdac: LUN 2 (RDAC)
> (unowned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:4: rdac: LUN 4 (RDAC) (owned)
> Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:3: rdac: LUN 3 (RDAC) (owned)
> Jan 31 10:27:39 ovmsrv05 kernel: scsi 2:0:1:4: rdac: LUN 4 (RDAC) (owned)
>
> I don't know exactly what owned/unowned means in the output above.
> Possibly it detects the 0:0:1:3 and 2:0:1:3 paths (those of the
> active group) as "owned", and this could have created problems on
> the active node?
>
> Strangely, on the active node I don't lose all the paths, but the VM
> was paused anyway:
>
> [root@ovmsrv06 log]# grep "remaining active path" /var/log/messages
> Jan 31 10:27:48 ovmsrv06 multipathd:
> 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
> Jan 31 10:27:49 ovmsrv06 multipathd:
> 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
> Jan 31 10:27:56 ovmsrv06 multipathd:
> 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
> Jan 31 10:27:56 ovmsrv06 multipathd:
> 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
> Jan 31 10:27:56 ovmsrv06 multipathd:
> 3600a0b8000299aa80000d08b55014119: remaining active paths: 1
> Jan 31 10:27:57 ovmsrv06 multipathd:
> 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
> Jan 31 10:28:01 ovmsrv06 multipathd:
> 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
> Jan 31 10:28:01 ovmsrv06 multipathd:
> 3600a0b8000299aa80000d08b55014119: remaining active paths: 4
>
> I'm not an expert on this storage array in particular, or on the
> rdac hardware handler in general.
>
> What I see in multipath.conf on both nodes is:
>
> # VDSM REVISION 1.3
>
> defaults {
> polling_interval 5
> no_path_retry fail
> user_friendly_names no
> flush_on_last_del yes
> fast_io_fail_tmo 5
> dev_loss_tmo 30
> max_fds 4096
> }
>
>
> devices {
>     device {
>         # These settings overrides built-in devices settings. It does not apply
>         # to devices without built-in settings (these use the settings in the
>         # "defaults" section), or to devices defined in the "devices" section.
>         # Note: This is not available yet on Fedora 21. For more info see
>         # https://bugzilla.redhat.com/1253799
>         all_devs                yes
>         no_path_retry           fail
>     }
> }
>
>
> beginning of /proc/scsi/scsi
>
> [root@ovmsrv06 ~]# cat /proc/scsi/scsi
> Attached devices:
> Host: scsi1 Channel: 01 Id: 00 Lun: 00
> Vendor: HP Model: LOGICAL VOLUME Rev: 1.86
> Type: Direct-Access ANSI SCSI revision: 05
> Host: scsi0 Channel: 00 Id: 00 Lun: 01
> Vendor: IBM Model: 1814 FAStT Rev: 0916
> Type: Direct-Access ANSI SCSI revision: 05
> ...
>
> To get the default configuration acquired for this storage:
>
> multipathd -k
> > show config
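>
> For reference, the same output can also be captured non-interactively,
> assuming the installed multipathd accepts a command appended to -k:
>
> multipathd -k"show config"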
>
> I can see:
>
> device {
> vendor "IBM"
> product "^1814"
> product_blacklist "Universal Xport"
> path_grouping_policy "group_by_prio"
> path_checker "rdac"
> features "0"
> hardware_handler "1 rdac"
> prio "rdac"
> failback immediate
> rr_weight "uniform"
> no_path_retry "fail"
> }
>
>
> and
>
> defaults {
> verbosity 2
> polling_interval 5
> max_polling_interval 20
> reassign_maps "yes"
> multipath_dir "/lib64/multipath"
> path_selector "service-time 0"
> path_grouping_policy "failover"
> uid_attribute "ID_SERIAL"
> prio "const"
> prio_args ""
> features "0"
> path_checker "directio"
> alias_prefix "mpath"
> failback "manual"
> rr_min_io 1000
> rr_min_io_rq 1
> max_fds 4096
> rr_weight "uniform"
> no_path_retry "fail"
> queue_without_daemon "no"
> flush_on_last_del "yes"
> user_friendly_names "no"
> fast_io_fail_tmo 5
> dev_loss_tmo 30
> bindings_file "/etc/multipath/bindings"
> wwids_file /etc/multipath/wwids
> log_checker_err always
> find_multipaths no
> retain_attached_hw_handler no
> detect_prio no
> hw_str_match no
> force_sync no
> deferred_remove no
> ignore_new_boot_devs no
> skip_kpartx no
> config_dir "/etc/multipath/conf.d"
> delay_watch_checks no
> delay_wait_checks no
> retrigger_tries 3
> retrigger_delay 10
> missing_uev_wait_timeout 30
> new_bindings_in_boot no
> }
>
> Any hint on how to tune multipath.conf so that a server powering on
> doesn't cause problems for running VMs?
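> For example, would a per-device override along these lines be the right
> direction (the retry count below is just a placeholder I haven't tested),
> or is there a better knob for this scenario?
>
> devices {
>     device {
>         vendor "IBM"
>         product "^1814"
>         no_path_retry 12
>     }
> }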
>
> Thanks in advance,
> Gianluca
>
>
> _______________________________________________
> Users mailing list
> Users@ovirt.org
>
> http://lists.ovirt.org/mailman/listinfo/users
--
Nathanaël Blanchet
Supervision réseau
Pôle Infrastrutures Informatiques
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet@abes.fr
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users