On Tue, Nov 4, 2014 at 6:34 PM, Arman Khalatyan <arm2arm@gmail.com> wrote:

It will be interesting to see your iSCSI setup with DRBD. Did you get a split-brain before the failure?
Did you check if your target went to readonly mode?
Thanks
Arman.



 I used some information from the guide below, even though it covers CentOS 5.7 with LVM on top of DRBD, while my setup runs CentOS 6.5 with DRBD on top of LVM:

http://blogs.mindspew-age.com/2012/04/05/adventures-in-high-availability-ha-iscsi-with-drbd-iscsi-and-pacemaker/

- my drbd resource definition for iSCSI HA:
[root@srvmgmt01 ~]# cat iscsiha.res
resource iscsiha {
 disk {
   disk-flushes no;
   md-flushes no;
   fencing resource-and-stonith;
 }
 device minor 2;
 disk /dev/iscsihavg/iscsihalv;
 syncer {
 rate 30M;
 verify-alg md5;
 }
 handlers {
 fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
 after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
 }
 on srvmgmt01.localdomain.local {
 address 192.168.230.51:7790;
 meta-disk internal;
 }
 on srvmgmt02.localdomain.local {
 address 192.168.230.52:7790;
 meta-disk internal;
 }
}
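
The bring-up steps for the resource are not shown above; a typical sequence for a new DRBD resource like this (assuming the LVM volume /dev/iscsihavg/iscsihalv already exists on both nodes) would be something like:

```shell
# On BOTH nodes: write the internal metadata and activate the resource
drbdadm create-md iscsiha
drbdadm up iscsiha

# On ONE node only: force it Primary to kick off the initial full sync
drbdadm primary --force iscsiha

# Watch sync progress
cat /proc/drbd
```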


- tgtd is set up to start at boot on both nodes;
the iscsi and iscsid services are configured off
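
On CentOS 6 that service setup can be done with chkconfig; a quick sketch (run on both nodes):

```shell
# tgtd must start at boot on both cluster nodes
chkconfig tgtd on
service tgtd start

# the initiator-side services must NOT run on the target nodes
chkconfig iscsi off
chkconfig iscsid off
```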

- Put the iSCSILogicalUnit and iSCSITarget agents under
 /usr/lib/ocf/resource.d/heartbeat/ on both nodes,

downloaded from here, as they are not shipped with plain CentOS:
http://linux-ha.org/doc/man-pages/re-ra-iSCSITarget.html
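
Whatever the download source, the agents just need to be dropped in place with the right names and permissions; a sketch (assuming the two agent files are in the current directory):

```shell
# Copy the downloaded agents into the OCF resource directory on BOTH nodes
# (file names must match the agent names used in the pcs commands below)
install -m 755 iSCSITarget iSCSILogicalUnit /usr/lib/ocf/resource.d/heartbeat/

# Sanity check: pcs should now list the agents
pcs resource agents ocf:heartbeat | grep -i iscsi
```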

- Below are the pcs steps used to create the group:

pcs cluster cib iscsiha_cfg

pcs -f iscsiha_cfg resource create p_drbd_iscsiha ocf:linbit:drbd drbd_resource=iscsiha \
op monitor interval="29s" role="Master" timeout="30" op monitor interval="31s" \
role="Slave" timeout="30" op start interval="0" timeout="240" op stop interval="0" timeout="100"

pcs -f iscsiha_cfg resource master ms_drbd_iscsiha p_drbd_iscsiha \
master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

pcs -f iscsiha_cfg resource create p_iscsi_store1 ocf:heartbeat:iSCSITarget \
params implementation="tgt" iqn="iqn.2014-07.local.localdomain:store1" tid="1" \
allowed_initiators="10.10.1.61 10.10.1.62 10.10.1.63" incoming_username="iscsiuser" incoming_password="iscsipwd" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60" \
op monitor interval="30" timeout="60"

pcs -f iscsiha_cfg resource create p_iscsi_store1_lun1 ocf:heartbeat:iSCSILogicalUnit \
params implementation="tgt" target_iqn="iqn.2014-07.local.localdomain:store1" lun="1" \
path="/dev/drbd/by-res/iscsiha" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60" \
op monitor interval="30" timeout="60"

pcs -f iscsiha_cfg resource create p_ip_iscsi ocf:heartbeat:IPaddr2 \
params ip="10.10.1.71" \
op start interval="0" timeout="20" \
op stop interval="0" timeout="20" \
op monitor interval="30" timeout="20"

pcs -f iscsiha_cfg resource create p_portblock-store1-block ocf:heartbeat:portblock \
params ip="10.10.1.71" portno="3260" protocol="tcp" action="block"

pcs -f iscsiha_cfg resource create p_portblock-store1-unblock ocf:heartbeat:portblock \
params ip="10.10.1.71" portno="3260" protocol="tcp" action="unblock" \
op monitor interval="30s"

pcs -f iscsiha_cfg resource group add g_iscsiha p_portblock-store1-block p_ip_iscsi p_iscsi_store1 \
p_iscsi_store1_lun1 p_portblock-store1-unblock

pcs -f iscsiha_cfg constraint colocation add Started g_iscsiha with Master ms_drbd_iscsiha INFINITY

pcs -f iscsiha_cfg constraint order promote ms_drbd_iscsiha then start g_iscsiha

pcs cluster cib-push iscsiha_cfg


- output of "crm_mon -1"

 Resource Group: g_iscsiha
     p_portblock-store1-block (ocf::heartbeat:portblock): Started srvmgmt01.localdomain.local 
     p_ip_iscsi (ocf::heartbeat:IPaddr2): Started srvmgmt01.localdomain.local 
     p_iscsi_store1 (ocf::heartbeat:iSCSITarget): Started srvmgmt01.localdomain.local 
     p_iscsi_store1_lun1 (ocf::heartbeat:iSCSILogicalUnit): Started srvmgmt01.localdomain.local 
     p_portblock-store1-unblock (ocf::heartbeat:portblock): Started srvmgmt01.localdomain.local 

- output of tgtadm on both nodes while srvmgmt01 is active for the group

[root@srvmgmt01 ~]#  tgtadm --mode target --op show 
Target 1: iqn.2014-07.local.localdomain:store1
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: null
            Backing store path: None
            Backing store flags: 
        LUN: 1
            Type: disk
            SCSI ID: p_iscsi_store1_l
            SCSI SN: 66666a41
            Size: 214738 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/drbd/by-res/iscsiha
            Backing store flags: 
    Account information:
        iscsiuser
    ACL information:
        10.10.1.61
        10.10.1.62
        10.10.1.63

on the passive node:
[root@srvmgmt02 heartbeat]# tgtadm --mode target --op show 
[root@srvmgmt02 heartbeat]# 

Performance and tuning values (still to be verified) were taken from here:
http://www.dbarticles.com/centos-6-iscsi-tgtd-setup-and-performance-adjustments/

my cluster is a basic one for testing, so it is not critical for my environment...
at the moment there is only a 1 Gbit/s network, with one adapter for the DRBD replica and one for iSCSI traffic.
Tested with some basic I/O benchmarks on a VM stressing the storage domain (SD): I got about 90-95 MB/s on both the DRBD and iSCSI networks. Relocating the iSCSI service while a benchmark was active also seemed to cause no problems for the SD or the VM.
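
Not from the original post, but a quick way to reproduce that kind of sequential-throughput check from inside a VM (the path and size below are placeholders; on a real VM, point the output file at the iSCSI-backed disk):

```shell
# Simple sequential write test; /tmp/ddtest.img is a placeholder path.
# conv=fsync makes dd flush before reporting, so the throughput figure
# reflects data actually written out, not just buffered in page cache.
dd if=/dev/zero of=/tmp/ddtest.img bs=1M count=64 conv=fsync
```

dd prints the achieved throughput on stderr when it finishes; for a fairer read test, drop caches first or use iflag=direct.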

- I also enabled iptables on the cluster nodes so that the initiators (oVirt hosts) can connect to the IP alias dedicated to the iSCSI service:
in /etc/sysconfig/iptables:
# iSCSI
-A INPUT -p tcp -m tcp -d 10.10.1.71 --dport 3260 -j ACCEPT
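
To make that rule effective without restarting the firewall, on CentOS 6 something like this should work (same rule as above, inserted live and then persisted):

```shell
# Insert the rule into the running firewall...
iptables -I INPUT -p tcp -m tcp -d 10.10.1.71 --dport 3260 -j ACCEPT
# ...and persist it back to /etc/sysconfig/iptables
service iptables save
```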

I have to recheck the logs to give the exact scenario of what caused the problem... not being a critical system, it is not so closely monitored at the moment...

comments welcome
Gianluca