----- Original Message -----
From: "Boyan Tabakov" <blade(a)alslayer.net>
To: "Nir Soffer" <nsoffer(a)redhat.com>
Cc: users(a)ovirt.org
Sent: Wednesday, February 19, 2014 7:18:36 PM
Subject: Re: [Users] SD Disk's Logical Volume not visible/activated on some nodes
Hello,
On 19.2.2014, 17:09, Nir Soffer wrote:
> ----- Original Message -----
>> From: "Boyan Tabakov" <blade(a)alslayer.net>
>> To: users(a)ovirt.org
>> Sent: Tuesday, February 18, 2014 3:34:49 PM
>> Subject: [Users] SD Disk's Logical Volume not visible/activated on some
>> nodes
>>
>> Hello,
>>
>> I have ovirt 3.3 installed on FC 19 hosts with vdsm 4.13.3-2.fc19.
>
> Which version of ovirt 3.3 is this? (3.3.2? 3.3.3?)
ovirt-engine is 3.3.2-1.fc19
>> One of the hosts (host1) is engine + node + SPM and the other host2 is
>> just a node. I have an iSCSI storage domain configured and accessible
>> from both nodes.
>>
>> When creating a new disk in the SD, the underlying logical volume gets
>> properly created (seen in vgdisplay output on host1), but doesn't seem
>> to be automatically picked up by host2.
>
> How do you know it is not seen on host2?
It's not present in the output of either vgdisplay -v or vgs.
>
>> Consequently, when creating/booting
>> a VM with the said disk attached, the VM fails to start on host2,
>> because host2 can't see the LV. Similarly, if the VM is started on
>> host1, it fails to migrate to host2. Extract from host2 log is in the
>> end. The LV in question is 6b35673e-7062-4716-a6c8-d5bf72fe3280.
>>
>> As far as I could quickly track through the vdsm code, there is only a
>> call to lvs and not to lvscan or lvchange, so LVM on host2 doesn't
>> fully refresh.
>> The only workaround so far has been to restart VDSM on host2, which
>> makes it refresh all LVM data properly.
>>
>> When is host2 supposed to pick up any newly created LVs in the SD VG?
>> Any suggestions where the problem might be?
>
> When you create a new lv on the shared storage, the new lv should be
> visible on the other host. Let's start by verifying that you do see
> the new lv after a disk was created.
>
> Try this:
>
> 1. Create a new disk, and check the disk uuid in the engine ui
> 2. On another machine, run this command:
>
> lvs -o vg_name,lv_name,tags
>
> You can identify the new lv using tags, which should contain the new disk
> uuid.
>
> If you don't see the new lv from the other host, please provide
> /var/log/messages
> and /var/log/sanlock.log.
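[Editor's note: the tag-based lookup suggested above can be sketched as follows. The sample lvs output and the function are illustrative only; the assumption here is that vdsm tags each image's LVs with an IU_<disk-uuid> tag, which is what the lookup matches on.]

```python
# Sketch: find the LV backing a given disk image from the output of
#   lvs -o vg_name,lv_name,tags --noheadings
# SAMPLE_LVS_OUTPUT is fabricated for illustration; the VG/LV/disk UUIDs
# are the ones mentioned in this thread.
SAMPLE_LVS_OUTPUT = """\
  3307f6fa-dd58-43db-ab23-b1fb299006c7 6b35673e-7062-4716-a6c8-d5bf72fe3280 IU_4d15543c-4c45-4c23-bbe3-f10b9084472a,MD_7
  3307f6fa-dd58-43db-ab23-b1fb299006c7 ids
"""

def find_lv_for_disk(lvs_output, disk_uuid):
    """Return (vg_name, lv_name) for the LV tagged with the disk uuid, or None."""
    for line in lvs_output.splitlines():
        fields = line.split()
        # A data LV line has three columns: vg, lv, comma-separated tags.
        if len(fields) == 3 and ("IU_" + disk_uuid) in fields[2].split(","):
            return fields[0], fields[1]
    return None

print(find_lv_for_disk(SAMPLE_LVS_OUTPUT,
                       "4d15543c-4c45-4c23-bbe3-f10b9084472a"))
```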
Just tried that. The disk is not visible on the non-SPM node.
This means that storage is not accessible from this host.
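[Editor's note: the "storage not accessible" diagnosis can be checked mechanically by scanning sanlock.log for delta-lease renewal errors. A minimal sketch, using sample lines taken from the log quoted below:]

```python
# Sketch: extract sanlock delta-lease renewal errors from a sanlock.log
# excerpt. Repeated renewal errors indicate the host cannot write to the
# lockspace, i.e. shared storage is not accessible.
SAMPLE_LOG = """\
2014-02-06 05:24:10+0200 563065 [31453]: s1 delta_renew read rv -202 offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
2014-02-06 05:24:10+0200 563065 [31453]: s1 renewal error -202 delta_length 10 last_success 563034
2014-02-06 05:26:09+0200 563183 [614]: s2 lockspace 3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
"""

def renewal_errors(log_text):
    """Yield (timestamp, return_code) for each sanlock renewal error line."""
    for line in log_text.splitlines():
        if " renewal error " in line:
            fields = line.split()
            # Timestamp is the first two fields; the code follows "error".
            yield fields[0] + " " + fields[1], int(fields[fields.index("error") + 1])

errors = list(renewal_errors(SAMPLE_LOG))
print(errors)
```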
On the SPM node (where the LV is visible in vgs output):
Feb 19 19:10:43 host1 vdsm root WARNING File:
/rhev/data-center/61f15cc0-8bba-482d-8a81-cd636a581b58/3307f6fa-dd58-43db-ab23-b1fb299006c7/images/4d15543c-4c45-4c23-bbe3-f10b9084472a/3e0ce8cb-3740-49d7-908e-d025875ac9a2
already removed
Feb 19 19:10:45 host1 multipathd: dm-65: remove map (uevent)
Feb 19 19:10:45 host1 multipathd: dm-65: devmap not registered, can't remove
Feb 19 19:10:45 host1 multipathd: dm-65: remove map (uevent)
Feb 19 19:10:54 host1 kernel: [1652684.864746] dd: sending ioctl
80306d02 to a partition!
Feb 19 19:10:54 host1 kernel: [1652684.963931] dd: sending ioctl
80306d02 to a partition!
No recent entries in sanlock.log on the SPM node.
On the non-SPM node (the one that doesn't show the LV in vgs output),
there are no relevant entries in /var/log/messages.
Strange - sanlock errors are logged to /var/log/messages. It would be helpful if
you attached this log; we may find something in it.
Here's the full sanlock.log for that host:
2014-01-30 16:28:09+0200 1324 [2335]: sanlock daemon started 2.8 host
18bd0a27-c280-4007-98f2-d2e7e73cd8b5.xenon.futu
2014-01-30 16:59:51+0200 5 [609]: sanlock daemon started 2.8 host
4a7627e2-296a-4e48-a7e2-f6bcecac07ab.xenon.futu
2014-01-31 09:51:43+0200 60717 [614]: s1 lockspace
3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-01-31 16:03:51+0200 83045 [613]: s1:r1 resource
3307f6fa-dd58-43db-ab23-b1fb299006c7:SDM:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/leases:1048576
for 8,16,30268
2014-01-31 16:18:01+0200 83896 [614]: s1:r2 resource
3307f6fa-dd58-43db-ab23-b1fb299006c7:SDM:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/leases:1048576
for 8,16,30268
2014-02-06 05:24:10+0200 563065 [31453]: 3307f6fa aio timeout 0
0x7fc37c0008c0:0x7fc37c0008d0:0x7fc391f5f000 ioto 10 to_count 1
2014-02-06 05:24:10+0200 563065 [31453]: s1 delta_renew read rv -202
offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
Sanlock cannot write to the ids lockspace
2014-02-06 05:24:10+0200 563065 [31453]: s1 renewal error -202
delta_length 10 last_success 563034
2014-02-06 05:24:21+0200 563076 [31453]: 3307f6fa aio timeout 0
0x7fc37c000910:0x7fc37c000920:0x7fc391d5c000 ioto 10 to_count 2
2014-02-06 05:24:21+0200 563076 [31453]: s1 delta_renew read rv -202
offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
2014-02-06 05:24:21+0200 563076 [31453]: s1 renewal error -202
delta_length 11 last_success 563034
2014-02-06 05:24:32+0200 563087 [31453]: 3307f6fa aio timeout 0
0x7fc37c000960:0x7fc37c000970:0x7fc391c5a000 ioto 10 to_count 3
2014-02-06 05:24:32+0200 563087 [31453]: s1 delta_renew read rv -202
offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
2014-02-06 05:24:32+0200 563087 [31453]: s1 renewal error -202
delta_length 11 last_success 563034
2014-02-06 05:24:40+0200 563094 [609]: s1 check_our_lease warning 60
last_success 563034
2014-02-06 05:24:41+0200 563095 [609]: s1 check_our_lease warning 61
last_success 563034
2014-02-06 05:24:42+0200 563096 [609]: s1 check_our_lease warning 62
last_success 563034
2014-02-06 05:24:43+0200 563097 [609]: s1 check_our_lease warning 63
last_success 563034
2014-02-06 05:24:43+0200 563098 [31453]: 3307f6fa aio timeout 0
0x7fc37c0009b0:0x7fc37c0009c0:0x7fc391b58000 ioto 10 to_count 4
2014-02-06 05:24:43+0200 563098 [31453]: s1 delta_renew read rv -202
offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
2014-02-06 05:24:43+0200 563098 [31453]: s1 renewal error -202
delta_length 11 last_success 563034
2014-02-06 05:24:44+0200 563098 [609]: s1 check_our_lease warning 64
last_success 563034
2014-02-06 05:24:45+0200 563099 [609]: s1 check_our_lease warning 65
last_success 563034
2014-02-06 05:24:46+0200 563100 [609]: s1 check_our_lease warning 66
last_success 563034
2014-02-06 05:24:47+0200 563101 [609]: s1 check_our_lease warning 67
last_success 563034
2014-02-06 05:24:48+0200 563102 [609]: s1 check_our_lease warning 68
last_success 563034
2014-02-06 05:24:49+0200 563103 [609]: s1 check_our_lease warning 69
last_success 563034
2014-02-06 05:24:50+0200 563104 [609]: s1 check_our_lease warning 70
last_success 563034
2014-02-06 05:24:51+0200 563105 [609]: s1 check_our_lease warning 71
last_success 563034
2014-02-06 05:24:52+0200 563106 [609]: s1 check_our_lease warning 72
last_success 563034
2014-02-06 05:24:53+0200 563107 [609]: s1 check_our_lease warning 73
last_success 563034
2014-02-06 05:24:54+0200 563108 [609]: s1 check_our_lease warning 74
last_success 563034
2014-02-06 05:24:54+0200 563109 [31453]: s1 delta_renew read rv -2
offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
2014-02-06 05:24:54+0200 563109 [31453]: s1 renewal error -2
delta_length 11 last_success 563034
2014-02-06 05:24:55+0200 563109 [609]: s1 check_our_lease warning 75
last_success 563034
2014-02-06 05:24:56+0200 563110 [609]: s1 check_our_lease warning 76
last_success 563034
2014-02-06 05:24:57+0200 563111 [609]: s1 check_our_lease warning 77
last_success 563034
2014-02-06 05:24:58+0200 563112 [609]: s1 check_our_lease warning 78
last_success 563034
2014-02-06 05:24:59+0200 563113 [609]: s1 check_our_lease warning 79
last_success 563034
2014-02-06 05:25:00+0200 563114 [609]: s1 check_our_lease failed 80
2014-02-06 05:25:00+0200 563114 [609]: s1 all pids clear
2014-02-06 05:25:05+0200 563119 [31453]: s1 delta_renew read rv -2
offset 0 /dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids
2014-02-06 05:25:05+0200 563119 [31453]: s1 renewal error -2
delta_length 10 last_success 563034
2014-02-06 05:25:15+0200 563129 [31453]: 3307f6fa close_task_aio 0
0x7fc37c0008c0 busy
2014-02-06 05:25:15+0200 563129 [31453]: 3307f6fa close_task_aio 1
0x7fc37c000910 busy
2014-02-06 05:25:15+0200 563129 [31453]: 3307f6fa close_task_aio 2
0x7fc37c000960 busy
2014-02-06 05:25:15+0200 563129 [31453]: 3307f6fa close_task_aio 3
0x7fc37c0009b0 busy
2014-02-06 05:25:25+0200 563139 [31453]: 3307f6fa close_task_aio 0
0x7fc37c0008c0 busy
2014-02-06 05:25:25+0200 563139 [31453]: 3307f6fa close_task_aio 1
0x7fc37c000910 busy
2014-02-06 05:25:25+0200 563139 [31453]: 3307f6fa close_task_aio 2
0x7fc37c000960 busy
2014-02-06 05:25:25+0200 563139 [31453]: 3307f6fa close_task_aio 3
0x7fc37c0009b0 busy
2014-02-06 05:25:35+0200 563149 [31453]: 3307f6fa close_task_aio 0
0x7fc37c0008c0 busy
2014-02-06 05:25:35+0200 563149 [31453]: 3307f6fa close_task_aio 1
0x7fc37c000910 busy
2014-02-06 05:25:35+0200 563149 [31453]: 3307f6fa close_task_aio 2
0x7fc37c000960 busy
2014-02-06 05:25:35+0200 563149 [31453]: 3307f6fa close_task_aio 3
0x7fc37c0009b0 busy
2014-02-06 05:25:45+0200 563159 [31453]: 3307f6fa close_task_aio 0
0x7fc37c0008c0 busy
2014-02-06 05:25:45+0200 563159 [31453]: 3307f6fa close_task_aio 1
0x7fc37c000910 busy
2014-02-06 05:25:45+0200 563159 [31453]: 3307f6fa close_task_aio 2
0x7fc37c000960 busy
2014-02-06 05:25:45+0200 563159 [31453]: 3307f6fa close_task_aio 3
0x7fc37c0009b0 busy
2014-02-06 05:25:55+0200 563169 [31453]: 3307f6fa close_task_aio 0
0x7fc37c0008c0 busy
2014-02-06 05:25:55+0200 563169 [31453]: 3307f6fa close_task_aio 1
0x7fc37c000910 busy
2014-02-06 05:25:55+0200 563169 [31453]: 3307f6fa close_task_aio 2
0x7fc37c000960 busy
2014-02-06 05:25:55+0200 563169 [31453]: 3307f6fa close_task_aio 3
0x7fc37c0009b0 busy
2014-02-06 05:25:59+0200 563173 [31453]: 3307f6fa aio collect
0x7fc37c0008c0:0x7fc37c0008d0:0x7fc391f5f000 result 1048576:0 close free
2014-02-06 05:25:59+0200 563173 [31453]: 3307f6fa aio collect
0x7fc37c000910:0x7fc37c000920:0x7fc391d5c000 result 1048576:0 close free
2014-02-06 05:25:59+0200 563173 [31453]: 3307f6fa aio collect
0x7fc37c000960:0x7fc37c000970:0x7fc391c5a000 result 1048576:0 close free
2014-02-06 05:25:59+0200 563173 [31453]: 3307f6fa aio collect
0x7fc37c0009b0:0x7fc37c0009c0:0x7fc391b58000 result 1048576:0 close free
2014-02-06 05:26:09+0200 563183 [614]: s2 lockspace
3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
Sanlock can access the storage again - did you change something around this time?
2014-02-18 14:22:16+0200 1632150 [614]: s3 lockspace
3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 14:38:16+0200 1633111 [614]: s4 lockspace
3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 16:13:07+0200 1638801 [613]: s5 lockspace
3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
2014-02-18 19:28:09+0200 1650504 [614]: s6 lockspace
3307f6fa-dd58-43db-ab23-b1fb299006c7:2:/dev/3307f6fa-dd58-43db-ab23-b1fb299006c7/ids:0
The last entry is from yesterday, even though I just created a new disk.
What was the status of this host in the engine from 2014-02-06 05:24:10+0200 to 2014-02-18
14:22:16?
vdsm.log and engine.log for this time frame would make this clearer.
Nir
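[Editor's note: on the workaround mentioned earlier - restarting vdsm forces a full reread of LVM metadata. A lighter-weight alternative is refreshing just the missing LV's device-mapper table with `lvchange --refresh`. The helper below is a hypothetical sketch, not actual vdsm code; the VG and LV names are the ones from this thread.]

```python
# Sketch of a host-side refresh for a single LV whose metadata update was
# missed. build_refresh_cmd is a hypothetical helper; running refresh_lv
# would require root and the actual shared VG.
import subprocess

def build_refresh_cmd(vg_name, lv_name):
    """Build an `lvchange --refresh` command line for one LV."""
    return ["lvchange", "--refresh", "%s/%s" % (vg_name, lv_name)]

def refresh_lv(vg_name, lv_name):
    """Reload the LV's device-mapper table from on-disk LVM metadata."""
    subprocess.check_call(build_refresh_cmd(vg_name, lv_name))

cmd = build_refresh_cmd(
    "3307f6fa-dd58-43db-ab23-b1fb299006c7",
    "6b35673e-7062-4716-a6c8-d5bf72fe3280",
)
print(" ".join(cmd))
```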