Re: [ovirt-users] Host remains in contending for storage pool manager

On Tue, Nov 4, 2014 at 11:39 AM, Itamar Heim <iheim@redhat.com> wrote:
On 10/27/2014 06:48 PM, Gianluca Cecchi wrote:
Hello, an iSCSI SD went down and it seems unable to come up again.
From an iSCSI point of view all seems OK:
[root@ovnode04 vdsm]# iscsiadm -m session -P 1
Target: iqn.2014-07.local.localdomain:store1
        Current Portal: 10.10.1.71:3260,1
        Persistent Portal: 10.10.1.71:3260,1
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1994-05.com.redhat:5d9b31319a8e
                Iface IPaddress: 10.10.1.61
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
[root@ovnode04 vdsm]# multipath -l
1p_iscsi_store1_l dm-2 IET,VIRTUAL-DISK
size=200G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  `- 1:0:0:1 sdb 8:16 active undef running
Access to the LUN seems OK:

[root@ovnode04 ~]# time dd if=/dev/mapper/1p_iscsi_store1_l of=/dev/null bs=1024k count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.24202 s, 116 MB/s

real    0m9.247s
user    0m0.002s
sys     0m0.463s
But it continues to remain in "Contending" and never becomes master.
Failed to activate Storage Domain istore1 (Data Center iscsidc) by ovadmin
I only have this host in this cluster. What can I do? I have already tried restarting both the engine service on the engine host and the vdsmd service on the host.
Are there any other commands I can run to check anything?
Gianluca
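
A few generic checks that can help in this kind of SPM "contending" situation (a sketch, not from the original thread; the pool UUID below is a placeholder to fill in):

# follow SPM election attempts and storage errors on the host
tail -f /var/log/vdsm/vdsm.log

# on 3.x block domains the SPM lease is held through sanlock; check its view
sanlock client status

# ask vdsm directly about the SPM state of the pool (replace with the real spUUID)
vdsClient -s 0 getSpmStatus <spUUID>

On the engine side, /var/log/ovirt-engine/engine.log usually shows why the "Failed to activate Storage Domain" event was raised.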
resolved?
Yes, thanks. I also had to restart the host itself to regain access to the SD...
One note about the SD: it is iSCSI, configured as a software iSCSI target in HA with Pacemaker and drbd. I had a problem on one node and apparently this was not so transparent to the oVirt host. I didn't have time to check more deeply...
Gianluca

It would be interesting to see your iSCSI setup with drbd. Did you get a split-brain before the failure? Did you check if your target went to read-only mode?
Thanks,
Arman.

On Tue, Nov 4, 2014 at 6:34 PM, Arman Khalatyan <arm2arm@gmail.com> wrote:
It would be interesting to see your iSCSI setup with drbd. Did you get a split-brain before the failure? Did you check if your target went to read-only mode? Thanks, Arman.
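
A couple of generic checks for those two questions, as a sketch (not from the thread), assuming the DRBD resource is named iscsiha as in the configuration below:

# DRBD connection state; StandAlone on both nodes typically means split-brain
cat /proc/drbd
drbdadm cstate iscsiha

# DRBD logs "Split-Brain detected" to the kernel log when it happens
grep -i split-brain /var/log/messages

# the per-LUN read-only flag as exported by tgt
tgtadm --mode target --op show | grep -i readonly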
I used some information provided here, even if it is with CentOS 5.7 and lvm on top of drbd, while in my setup I have CentOS 6.5 and drbd on top of lvm:
http://blogs.mindspew-age.com/2012/04/05/adventures-in-high-availability-ha-...

- My drbd resource definition for iSCSI HA:

[root@srvmgmt01 ~]# cat iscsiha.res
resource iscsiha {
        disk {
                disk-flushes no;
                md-flushes no;
                fencing resource-and-stonith;
        }
        device minor 2;
        disk /dev/iscsihavg/iscsihalv;
        syncer {
                rate 30M;
                verify-alg md5;
        }
        handlers {
                fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        on srvmgmt01.localdomain.local {
                address 192.168.230.51:7790;
                meta-disk internal;
        }
        on srvmgmt02.localdomain.local {
                address 192.168.230.52:7790;
                meta-disk internal;
        }
}

- tgtd is set up to start on both nodes at boot; the iscsi and iscsid services are configured off.

- Put the iSCSILogicalUnit and iSCSITarget agents under /usr/lib/ocf/resource.d/heartbeat/ on both nodes, downloaded from here, as they are not provided in plain CentOS:
http://linux-ha.org/doc/man-pages/re-ra-iSCSITarget.html

- Here below the pcs steps to create the group:

pcs cluster cib iscsiha_cfg
pcs -f iscsiha_cfg resource create p_drbd_iscsiha ocf:linbit:drbd drbd_resource=iscsiha \
  op monitor interval="29s" role="Master" timeout="30" op monitor interval="31s" \
  role="Slave" timeout="30" op start interval="0" timeout="240" op stop interval="0" timeout="100"
pcs -f iscsiha_cfg resource master ms_drbd_iscsiha p_drbd_iscsiha \
  master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs -f iscsiha_cfg resource create p_iscsi_store1 ocf:heartbeat:iSCSITarget \
  params implementation="tgt" iqn="iqn.2014-07.local.localdomain:store1" tid="1" \
  allowed_initiators="10.10.1.61 10.10.1.62 10.10.1.63" incoming_username="iscsiuser" incoming_password="iscsipwd" \
  op start interval="0" timeout="60" \
  op stop interval="0" timeout="60" \
  op monitor interval="30" timeout="60"
pcs -f iscsiha_cfg resource create p_iscsi_store1_lun1 ocf:heartbeat:iSCSILogicalUnit \
  params implementation="tgt" target_iqn="iqn.2014-07.local.localdomain:store1" lun="1" \
  path="/dev/drbd/by-res/iscsiha" \
  op start interval="0" timeout="60" \
  op stop interval="0" timeout="60" \
  op monitor interval="30" timeout="60"
pcs -f iscsiha_cfg resource create p_ip_iscsi ocf:heartbeat:IPaddr2 \
  params ip="10.10.1.71" \
  op start interval="0" timeout="20" \
  op stop interval="0" timeout="20" \
  op monitor interval="30" timeout="20"
pcs -f iscsiha_cfg resource create p_portblock-store1-block ocf:heartbeat:portblock \
  params ip="10.10.1.71" portno="3260" protocol="tcp" action="block"
pcs -f iscsiha_cfg resource create p_portblock-store1-unblock ocf:heartbeat:portblock \
  params ip="10.10.1.71" portno="3260" protocol="tcp" action="unblock" \
  op monitor interval="30s"
pcs -f iscsiha_cfg resource group add g_iscsiha p_portblock-store1-block p_ip_iscsi p_iscsi_store1 \
  p_iscsi_store1_lun1 p_portblock-store1-unblock
pcs -f iscsiha_cfg constraint colocation add Started g_iscsiha with Master ms_drbd_iscsiha INFINITY
pcs -f iscsiha_cfg constraint order promote ms_drbd_iscsiha then start g_iscsiha
pcs cluster cib-push iscsiha_cfg

- Output of "crm_mon -1":

 Resource Group: g_iscsiha
     p_portblock-store1-block    (ocf::heartbeat:portblock):        Started srvmgmt01.localdomain.local
     p_ip_iscsi                  (ocf::heartbeat:IPaddr2):          Started srvmgmt01.localdomain.local
     p_iscsi_store1              (ocf::heartbeat:iSCSITarget):      Started srvmgmt01.localdomain.local
     p_iscsi_store1_lun1         (ocf::heartbeat:iSCSILogicalUnit): Started srvmgmt01.localdomain.local
     p_portblock-store1-unblock  (ocf::heartbeat:portblock):        Started srvmgmt01.localdomain.local

- Output of tgtadm on both nodes while srvmgmt01 is active for the group:

[root@srvmgmt01 ~]# tgtadm --mode target --op show
Target 1: iqn.2014-07.local.localdomain:store1
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET 00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: p_iscsi_store1_l
            SCSI SN: 66666a41
            Size: 214738 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/drbd/by-res/iscsiha
            Backing store flags:
    Account information:
        iscsiuser
    ACL information:
        10.10.1.61
        10.10.1.62
        10.10.1.63

On the passive node:

[root@srvmgmt02 heartbeat]# tgtadm --mode target --op show
[root@srvmgmt02 heartbeat]#

TBV: performance and tuning values taken from here:
http://www.dbarticles.com/centos-6-iscsi-tgtd-setup-and-performance-adjustme...

My cluster is basic, for testing, so not critical for my environment... at the moment only a 1 Gbit/s network, with one adapter for the drbd replica and one for iSCSI traffic. Tested with some basic I/O benchmarks on a VM insisting on the SD, and I got about 90-95 MB/s on both the drbd and iSCSI networks. Also, relocating the iSCSI service while a benchmark was active seemed not to cause problems with the SD and VM.

- I also enabled iptables on the cluster nodes so that the initiators (oVirt hosts) can connect to the IP alias dedicated to iSCSI servicing; in /etc/sysconfig/iptables:

# iSCSI
-A INPUT -p tcp -m tcp -d 10.10.1.71 --dport 3260 -j ACCEPT

I have to recheck the logs to give the exact scenario of what happened to cause the problem... not being a critical system, it is not so well monitored at the moment... comments welcome.

Gianluca
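
Note the portblock pair wrapped around g_iscsiha: it blocks TCP/3260 while the target resources move between nodes, so initiators never see a half-configured target during failover. As a sanity check of the initiator side, a manual discovery and login against this target could look like the following (a sketch; oVirt normally performs discovery and login itself, and the CHAP credentials are the ones set on p_iscsi_store1 above):

# discover the target through the cluster IP alias
iscsiadm -m discovery -t sendtargets -p 10.10.1.71:3260

# set CHAP credentials on the node record
iscsiadm -m node -T iqn.2014-07.local.localdomain:store1 -p 10.10.1.71:3260 \
  -o update -n node.session.auth.authmethod -v CHAP
iscsiadm -m node -T iqn.2014-07.local.localdomain:store1 -p 10.10.1.71:3260 \
  -o update -n node.session.auth.username -v iscsiuser
iscsiadm -m node -T iqn.2014-07.local.localdomain:store1 -p 10.10.1.71:3260 \
  -o update -n node.session.auth.password -v iscsipwd

# log in and verify the session
iscsiadm -m node -T iqn.2014-07.local.localdomain:store1 -p 10.10.1.71:3260 --login
iscsiadm -m session -P 1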
participants (2)
- Arman Khalatyan
- Gianluca Cecchi