Exactly the same issue over here with an FC EMC storage domain...

On 31/01/2017 at 15:20, Gianluca Cecchi wrote:
Hello,
my test environment is composed of 2 old HP BL685c G1 blades (ovmsrv05
and ovmsrv06), connected through FC switches in a SAN to an old IBM
DS4700 storage array.
Apart from being old, they all seem OK from a hardware point of view.
I have configured oVirt 4.0.6 and an FCP storage domain.
The hosts are plain CentOS 7.3 servers fully updated.
It is not a hosted-engine environment: the manager is a VM outside of
the cluster.
I have configured power management on both hosts and it works well.
At the moment I have only one VM for testing and it is doing almost nothing.
Starting point: ovmsrv05 has been in maintenance for about 2 days and
the VM is running on ovmsrv06.
I update the qemu-kvm package on ovmsrv05 and then restart it from the
web admin GUI:
Power Mgmt --> Restart
Sequence of events in the events pane, including the problem in the subject:
Jan 31, 2017 10:29:43 AM Host ovmsrv05 power management was verified successfully.
Jan 31, 2017 10:29:43 AM Status of host ovmsrv05 was set to Up.
Jan 31, 2017 10:29:38 AM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:29:29 AM Activation of host ovmsrv05 initiated by admin@internal-authz.
Jan 31, 2017 10:28:05 AM VM ol65 has recovered from paused back to up.
Jan 31, 2017 10:27:55 AM VM ol65 has been paused due to storage I/O problem.
Jan 31, 2017 10:27:55 AM VM ol65 has been paused.
Jan 31, 2017 10:25:52 AM Host ovmsrv05 was restarted by admin@internal-authz.
Jan 31, 2017 10:25:52 AM Host ovmsrv05 was started by admin@internal-authz.
Jan 31, 2017 10:25:52 AM Power management start of Host ovmsrv05 succeeded.
Jan 31, 2017 10:25:50 AM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:37 AM Executing power management start on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:37 AM Power management start of Host ovmsrv05 initiated.
Jan 31, 2017 10:25:37 AM Auto fence for host ovmsrv05 was started.
Jan 31, 2017 10:25:37 AM All VMs' status on Non Responsive Host ovmsrv05 were changed to 'Down' by admin@internal-authz
Jan 31, 2017 10:25:36 AM Host ovmsrv05 was stopped by admin@internal-authz.
Jan 31, 2017 10:25:36 AM Power management stop of Host ovmsrv05 succeeded.
Jan 31, 2017 10:25:34 AM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:15 AM Executing power management stop on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Jan 31, 2017 10:25:15 AM Power management stop of Host ovmsrv05 initiated.
Jan 31, 2017 10:25:12 AM Executing power management status on Host ovmsrv05 using Proxy Host ovmsrv06 and Fence Agent ilo:10.4.192.212.
Looking at the timestamps, the culprit seems to be the boot of ovmsrv05,
which detects some LUNs as owned and others as unowned.
Full messages of both hosts here:
https://drive.google.com/file/d/0BwoPbcrMv8mvekZQT1pjc0NMRlU/view?usp=sharing
and
https://drive.google.com/file/d/0BwoPbcrMv8mvcjBCYVdFZWdXTms/view?usp=sharing
At this time there are 4 LUNs seen by the two hosts, but only 1 of them
is currently configured as the only storage domain in the oVirt cluster.
[root@ovmsrv05 ~]# multipath -l | grep ^36
3600a0b8000299aa80000d08b55014119 dm-5 IBM ,1814 FAStT
3600a0b80002999020000cd3c5501458f dm-3 IBM ,1814 FAStT
3600a0b80002999020000ccf855011198 dm-2 IBM ,1814 FAStT
3600a0b8000299aa80000d08955014098 dm-4 IBM ,1814 FAStT
the configured one:
[root@ovmsrv05 ~]# multipath -l 3600a0b8000299aa80000d08b55014119
3600a0b8000299aa80000d08b55014119 dm-5 IBM ,1814 FAStT
size=4.0T features='0' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 0:0:1:3 sdl 8:176 active undef running
| `- 2:0:1:3 sdp 8:240 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
|- 0:0:0:3 sdd 8:48 active undef running
`- 2:0:0:3 sdi 8:128 active undef running
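(Side note: if I read the multipath(8) man page correctly, "multipath -l" only uses sysfs and device-mapper information and does not call out to the path checkers or prio routines, so the prio=0 above is probably just an artifact of using -l; a command like the one below should also query the checkers and show the real rdac priorities. I haven't pasted that output here, it's only a suggestion for cross-checking.)

[root@ovmsrv05 ~]# multipath -ll 3600a0b8000299aa80000d08b55014119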
In the messages of the booting node, around the time of the problem
registered by the storage:
[root@ovmsrv05 ~]# grep owned /var/log/messages
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:1: rdac: LUN 1 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:2: rdac: LUN 2 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:3: rdac: LUN 3 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:1: rdac: LUN 1 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:0:4: rdac: LUN 4 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:2: rdac: LUN 2 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:1: rdac: LUN 1 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:3: rdac: LUN 3 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:0:4: rdac: LUN 4 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:2: rdac: LUN 2 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:1: rdac: LUN 1 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:3: rdac: LUN 3 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:2: rdac: LUN 2 (RDAC) (unowned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 0:0:1:4: rdac: LUN 4 (RDAC) (owned)
Jan 31 10:27:38 ovmsrv05 kernel: scsi 2:0:1:3: rdac: LUN 3 (RDAC) (owned)
Jan 31 10:27:39 ovmsrv05 kernel: scsi 2:0:1:4: rdac: LUN 4 (RDAC) (owned)
I don't know exactly what owned/unowned means in the output above.
Possibly it detects the 0:0:1:3 and 2:0:1:3 paths (those of the active
group) as "owned", and this could have created problems on the active node?
Strangely, on the active node I don't lose all the paths, but the VM
has been paused anyway:
[root@ovmsrv06 log]# grep "remaining active path" /var/log/messages
Jan 31 10:27:48 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:27:49 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:27:56 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 1
Jan 31 10:27:57 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 2
Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 3
Jan 31 10:28:01 ovmsrv06 multipathd: 3600a0b8000299aa80000d08b55014119: remaining active paths: 4
I'm not an expert on this storage array in particular, nor on the rdac
hardware handler in general.
This is the multipath.conf on both nodes:
# VDSM REVISION 1.3
defaults {
polling_interval 5
no_path_retry fail
user_friendly_names no
flush_on_last_del yes
fast_io_fail_tmo 5
dev_loss_tmo 30
max_fds 4096
}
devices {
    device {
        # These settings overrides built-in devices settings. It does not apply
        # to devices without built-in settings (these use the settings in the
        # "defaults" section), or to devices defined in the "devices" section.
        # Note: This is not available yet on Fedora 21. For more info see
        # https://bugzilla.redhat.com/1253799
        all_devs                yes
        no_path_retry           fail
    }
}
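Just a thought on my side (an untested sketch; the file name and the retry value are pure assumptions): since the all_devs section above forces no_path_retry to fail for every array, maybe a device-specific override for this IBM 1814, dropped into the config_dir shown in the defaults further down (/etc/multipath/conf.d, which also avoids touching the VDSM-generated /etc/multipath.conf), could let multipathd retry failing paths for a short while instead of failing I/O immediately while the peer host boots:

# hypothetical /etc/multipath/conf.d/ds4700.conf -- untested sketch
devices {
    device {
        vendor          "IBM"
        product         "^1814"
        # retry for up to 6 polling intervals (~30s with polling_interval 5)
        # before failing I/O, instead of failing right away
        no_path_retry   6
    }
}

I'm not sure whether such a stanza would really win over the all_devs section, nor whether a non-fail no_path_retry is considered safe for oVirt, so it's only an idea to discuss.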
Beginning of /proc/scsi/scsi:
[root@ovmsrv06 ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 01 Id: 00 Lun: 00
Vendor: HP Model: LOGICAL VOLUME Rev: 1.86
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi0 Channel: 00 Id: 00 Lun: 01
Vendor: IBM Model: 1814 FAStT Rev: 0916
Type: Direct-Access ANSI SCSI revision: 05
...
To get the config acquired by default for this storage:
multipathd -k
> show config
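(I believe the same dump can also be taken non-interactively, e.g. multipathd -k"show config" > /tmp/mpconf.txt, if that's easier to share; the /tmp path is just an example.)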
I can see:
device {
vendor "IBM"
product "^1814"
product_blacklist "Universal Xport"
path_grouping_policy "group_by_prio"
path_checker "rdac"
features "0"
hardware_handler "1 rdac"
prio "rdac"
failback immediate
rr_weight "uniform"
no_path_retry "fail"
}
and
defaults {
verbosity 2
polling_interval 5
max_polling_interval 20
reassign_maps "yes"
multipath_dir "/lib64/multipath"
path_selector "service-time 0"
path_grouping_policy "failover"
uid_attribute "ID_SERIAL"
prio "const"
prio_args ""
features "0"
path_checker "directio"
alias_prefix "mpath"
failback "manual"
rr_min_io 1000
rr_min_io_rq 1
max_fds 4096
rr_weight "uniform"
no_path_retry "fail"
queue_without_daemon "no"
flush_on_last_del "yes"
user_friendly_names "no"
fast_io_fail_tmo 5
dev_loss_tmo 30
bindings_file "/etc/multipath/bindings"
wwids_file /etc/multipath/wwids
log_checker_err always
find_multipaths no
retain_attached_hw_handler no
detect_prio no
hw_str_match no
force_sync no
deferred_remove no
ignore_new_boot_devs no
skip_kpartx no
config_dir "/etc/multipath/conf.d"
delay_watch_checks no
delay_wait_checks no
retrigger_tries 3
retrigger_delay 10
missing_uev_wait_timeout 30
new_bindings_in_boot no
}
Any hint on how to tune multipath.conf so that a server powering on
doesn't create problems for running VMs?
Thanks in advance,
Gianluca
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
-- 
Nathanaël Blanchet
Supervision réseau
Pôle Infrastructures Informatiques
227 avenue Professeur-Jean-Louis-Viala
34193 MONTPELLIER CEDEX 5
Tél. 33 (0)4 67 54 84 55
Fax 33 (0)4 67 54 84 14
blanchet@abes.fr