Federico, thank you for your help so far. Lots more information below.
On 1/27/2014 4:46 PM, Federico Simoncelli wrote:
> ----- Original Message -----
>> From: "Ted Miller" <tmiller(a)hcjb.org>
>>
>> On 1/27/2014 3:47 AM, Federico Simoncelli wrote:
>>> Maybe someone from gluster can identify easily what happened. Meanwhile if
>>> you just want to repair your data-center you could try with:
>>>
>>> $ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
>>> $ touch ids
>>> $ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576
I tried your suggestion, and it helped, but it was not enough.
[root@office4a ~]$ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
[root@office4a dom_md]$ touch ids
[root@office4a dom_md]$ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576
init done 0
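
(For what it is worth, here is how I have been sanity-checking the ids file after running the init. I am assuming "sanlock direct dump" and a plain ls are reasonable ways to look at it, so correct me if there is a better check:)

[root@office4a dom_md]$ ls -l ids    # expect the 1048576 bytes the init was told to write
[root@office4a dom_md]$ sanlock direct dump /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids    # should list the delta-lease leader records, if I understand the tool correctly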
Let me explain a little.
When the problem originally happened, the sanlock.log started having -223
error messages. 10 seconds later the log switched from -223 messages to -90
messages. Running your little script changed the error from -90 back to -223.

I hope you can send me another script that will get rid of the -223 messages.
Here is the sanlock.log as I ran your script:
2014-01-27 19:40:41-0500 39281 [3803]: s13 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0
2014-01-27 19:40:41-0500 39281 [22751]: 0322a407 aio collect 0 0x7f54240008c0:0x7f54240008d0:0x7f5424101000 result 0:0 match len 512
2014-01-27 19:40:41-0500 39281 [22751]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
2014-01-27 19:40:42-0500 39282 [3803]: s13 add_lockspace fail result -90
2014-01-27 19:40:47-0500 39287 [3803]: s14 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0
2014-01-27 19:40:47-0500 39287 [22795]: 0322a407 aio collect 0 0x7f54240008c0:0x7f54240008d0:0x7f5424101000 result 0:0 match len 512
2014-01-27 19:40:47-0500 39287 [22795]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
2014-01-27 19:40:48-0500 39288 [3803]: s14 add_lockspace fail result -90
2014-01-27 19:40:56-0500 39296 [3802]: s15 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0
2014-01-27 19:40:56-0500 39296 [22866]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
2014-01-27 19:40:56-0500 39296 [22866]: leader1 delta_acquire_begin error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
2014-01-27 19:40:56-0500 39296 [22866]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
2014-01-27 19:40:56-0500 39296 [22866]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
2014-01-27 19:40:56-0500 39296 [22866]: leader4 sn rn ts 0 cs 0
2014-01-27 19:40:57-0500 39297 [3802]: s15 add_lockspace fail result -223
2014-01-27 19:40:57-0500 39297 [3802]: s16 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0
2014-01-27 19:40:57-0500 39297 [22870]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
2014-01-27 19:40:57-0500 39297 [22870]: leader1 delta_acquire_begin error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
2014-01-27 19:40:57-0500 39297 [22870]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
2014-01-27 19:40:57-0500 39297 [22870]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
2014-01-27 19:40:57-0500 39297 [22870]: leader4 sn rn ts 0 cs 0
2014-01-27 19:40:58-0500 39298 [3802]: s16 add_lockspace fail result -223
2014-01-27 19:41:07-0500 39307 [3802]: s17 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0
Unfortunately, I think the error looks about the same to vdsm, because
/var/log/messages shows the same two lines in the calling scripts on the
callback lists (66 & 425, if I remember right).

When I get up in the morning, I will be looking for another magic potion
from your pen. :)
>> Federico,
>>
>> I won't be able to do anything to the ovirt setup for another 5 hours or so
>> (it is a trial system I am working on at home, I am at work), but I will try
>> your repair script and report back.
>>
>> In bugzilla 862975 they suggested turning off write-behind caching and "eager
>> locking" on the gluster volume to avoid/reduce the problems that come from
>> many different computers all writing to the same file(s) on a very frequent
>> basis. If I interpret the comment in the bug correctly, it did seem to help
>> in that situation. My situation is a little different. My gluster setup is
>> replicate only, replica 3 (though there are only two hosts). I was not
>> stress-testing it, I was just using it, trying to figure out how I can import
>> some old VMWare VMs without an ESXi server to run them on.
> Have you done anything similar to what is described here in comment 21?
> https://bugzilla.redhat.com/show_bug.cgi?id=859589#c21
I have no idea, because that bug is so dangerous that the general public is
not allowed to even see it. I get a "You are not authorized to access ...."
message. :) However, see just below.
> When did you realize that you weren't able to use the data-center anymore?
> Can you describe exactly what you did and what happened, for example:
> 1. I created the data center (up and running)
> 2. I tried to import some VMs from VMWare
> 3. During the import (or after it) the data-center went in the contending state
> ...
> Did something special happen?
It took me a few hours of going back through and matching times on logs to
figure it out, as it has been a week since it happened. When I saw some
things in the logs, I finally remembered the circumstances. I will summarize
it for you, because I think this is what matters. If you want actual logs,
tell me which logs and for which events, and I will package them up for you.

The host suspended all VMs because gluster hit the 90% full limit. It took me
a while to figure out what was going on. I expanded the logical volume, but
the Data Center would not come back up.
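
(Side note: my guess is that the 90% threshold I ran into is gluster's cluster.min-free-disk setting, which I believe defaults to keeping 10% free. If that is right, something like the following should show the brick usage and whether the option has been changed. I have not verified these exact commands, so treat them as a sketch:)

[root@office4a ~]$ df -h /path/to/brick    # brick mount point on this host (path here is made up)
[root@office4a ~]$ gluster volume info VM2    # "Options Reconfigured" should list cluster.min-free-disk if it was changed
[root@office4a ~]$ gluster volume status VM2 detail    # per-brick free space, if this subcommand is available in my gluster version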
The GUI events log says:
Invalid status on Data Center mill. Setting Data Center status to Non Responsive (On host office4a, Error: Network error during communications with the Host.).
The host /var/log/messages says:
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader4 sn rn ts 0 cs 0
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: s123 delta_renew verify_leader error -223
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: s123 renewal error -223 delta_length 0 last_success 521369
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [3362]: s123 check_our_lease corrupt -223
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [3362]: s123 kill 32599 sig 15 count 1
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader4 sn rn ts 0 cs 0
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 delta_renew verify_leader error -223
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 renewal error -223 delta_length 0 last_success 521369
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader4 sn rn ts 0 cs 0
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 delta_renew verify_leader error -223
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 renewal error -223 delta_length 0 last_success 521369
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [3362]: s123 kill 32599 sig 15 count 2
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader4 sn rn ts 0 cs 0
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: s123 delta_renew verify_leader error -223
[snip many repeats]
Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: leader4 sn rn ts 0 cs 0
Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: s123 delta_renew verify_leader error -223
Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: s123 renewal error -223 delta_length 0 last_success 521369
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [3362]: s123 kill 32599 sig 15 count 10
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [3362]: dead 32599 ci 4 count 10
Jan 22 00:44:58 office4a respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died, respawning slave
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [3362]: s123 all pids clear
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader4 sn rn ts 0 cs 0
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: s123 delta_renew verify_leader error -223
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: s123 renewal error -223 delta_length 0 last_success 521369
Jan 22 00:44:59 office4a vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.
Jan 22 00:44:59 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found
Jan 22 00:44:59 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}' found
Jan 22 00:44:59 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'spicevmc', 'alias': 'channel2', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '3'}}' found
Jan 22 00:45:00 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::_readPauseCode unsupported by libvirt vm
Jan 22 00:45:00 office4a kernel: ata1: hard resetting link
Jan 22 00:45:00 office4a kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 22 00:45:00 office4a kernel: ata1.00: configured for UDMA/133
Jan 22 00:45:00 office4a kernel: ata1: EH complete
Jan 22 00:45:01 office4a kernel: ata2: hard resetting link
Jan 22 00:45:01 office4a kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 22 00:45:01 office4a kernel: ata2.00: configured for UDMA/133
Jan 22 00:45:01 office4a kernel: ata2: EH complete
Jan 22 00:45:02 office4a kernel: ata3: hard resetting link
Jan 22 00:45:02 office4a kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 22 00:45:02 office4a kernel: ata3.00: configured for UDMA/133
Jan 22 00:45:02 office4a kernel: ata3: EH complete
Jan 22 00:45:03 office4a kernel: ata4: hard resetting link
Jan 22 00:45:03 office4a kernel: ata4: SATA link down (SStatus 0 SControl 300)
Jan 22 00:45:03 office4a kernel: ata4: EH complete
Jan 22 00:45:04 office4a kernel: ata5: soft resetting link
Jan 22 00:45:04 office4a kernel: ata5.00: configured for UDMA/33
Jan 22 00:45:04 office4a kernel: ata5: EH complete
Jan 22 00:45:05 office4a kernel: ata6: soft resetting link
Jan 22 00:45:05 office4a kernel: ata6: EH complete
Jan 22 00:45:08 office4a kernel: ata1: hard resetting link
Jan 22 00:45:08 office4a kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 22 00:45:08 office4a kernel: ata1.00: configured for UDMA/133
Jan 22 00:45:08 office4a kernel: ata1: EH complete
Jan 22 00:45:09 office4a kernel: ata2: hard resetting link
Jan 22 00:45:09 office4a kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 22 00:45:09 office4a kernel: ata2.00: configured for UDMA/133
Jan 22 00:45:09 office4a kernel: ata2: EH complete
Jan 22 00:45:10 office4a kernel: ata3: hard resetting link
Jan 22 00:45:10 office4a kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 22 00:45:10 office4a kernel: ata3.00: configured for UDMA/133
Jan 22 00:45:10 office4a kernel: ata3: EH complete
Jan 22 00:45:11 office4a kernel: ata4: hard resetting link
Jan 22 00:45:11 office4a kernel: ata4: SATA link down (SStatus 0 SControl 300)
Jan 22 00:45:11 office4a kernel: ata4: EH complete
Jan 22 00:45:12 office4a kernel: ata5: soft resetting link
Jan 22 00:45:12 office4a kernel: ata5.00: configured for UDMA/33
Jan 22 00:45:12 office4a kernel: ata5: EH complete
Jan 22 00:45:13 office4a kernel: ata6: soft resetting link
Jan 22 00:45:13 office4a kernel: ata6: EH complete
Jan 22 00:45:15 office4a sanlock[3362]: 2014-01-22 00:45:15-0500 521416 [17150]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:45:16 office4a sanlock[3362]: 2014-01-22 00:45:16-0500 521417 [3367]: s124 add_lockspace fail result -90
Jan 22 00:45:20 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found
Jan 22 00:45:20 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}' found
Jan 22 00:45:20 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'spicevmc', 'alias': 'channel2', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '3'}}' found
Jan 22 00:45:21 office4a sanlock[3362]: 2014-01-22 00:45:21-0500 521422 [17199]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:45:22 office4a sanlock[3362]: 2014-01-22 00:45:22-0500 521423 [3368]: s125 add_lockspace fail result -90
Jan 22 00:45:22 office4a vdsm TaskManager.Task ERROR Task=`2fefe6f9-4dac-45f2-b237-0104ce890146`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 857, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/storage/task.py", line 318, in run#012 return self.cmd(*self.argslist, **self.argsdict)#012 File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012 self.masterDomain.acquireHostId(self.id)#012 File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012 self._clusterLock.acquireHostId(hostId, async)#012 File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId#012 raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))
Jan 22 00:45:25 office4a sanlock[3362]: 2014-01-22 00:45:25-0500 521426 [17243]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:45:26 office4a sanlock[3362]: 2014-01-22 00:45:26-0500 521427 [3368]: s126 add_lockspace fail result -90
Jan 22 00:45:33 office4a vdsm TaskManager.Task ERROR Task=`8f8931ba-a9f6-4225-a951-2d608b3b1d70`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 857, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012 res = f(*args, **kwargs)#012 File "/usr/share/vdsm/storage/hsm.py", line 2110, in getAllTasksStatuses#012 allTasksStatus = sp.getAllTasksStatuses()#012 File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper#012 raise SecureError()#012SecureError
Jan 22 00:45:36 office4a sanlock[3362]: 2014-01-22 00:45:36-0500 521437 [17334]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:45:37 office4a sanlock[3362]: 2014-01-22 00:45:37-0500 521438 [3368]: s127 add_lockspace fail result -90
Jan 22 00:45:37 office4a vdsm TaskManager.Task ERROR Task=`6b7da394-63b6-4e52-ac73-ab98f3f1e60f`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 857, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/storage/task.py", line 318, in run#012 return self.cmd(*self.argslist, **self.argsdict)#012 File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012 self.masterDomain.acquireHostId(self.id)#012 File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012 self._clusterLock.acquireHostId(hostId, async)#012 File "/usr/share/vdsm/storage/clusterlock.py", line 185, in acquireHostId#012 raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(115, 'Sanlock lockspace add failure', 'Operation now in progress'))
Jan 22 00:45:46 office4a sanlock[3362]: 2014-01-22 00:45:46-0500 521447 [17410]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:45:47 office4a sanlock[3362]: 2014-01-22 00:45:47-0500 521448 [3367]: s128 add_lockspace fail result -90
Jan 22 00:45:48 office4a vdsm TaskManager.Task ERROR Task=`c2486380-a0c8-45d1-a63f-e0e660f1e118`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 857, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012 res = f(*args, **kwargs)#012 File "/usr/share/vdsm/storage/hsm.py", line 2110, in getAllTasksStatuses#012 allTasksStatus = sp.getAllTasksStatuses()#012 File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper#012 raise SecureError()#012SecureError
Jan 22 00:45:50 office4a sanlock[3362]: 2014-01-22 00:45:50-0500 521451 [17457]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:45:51 office4a sanlock[3362]: 2014-01-22 00:45:51-0500 521452 [3368]: s129 add_lockspace fail result -90
>> I am guessing that what makes cluster storage have the (Master) designation
>> is that this is the one that actually contains the sanlocks? If so, would it
>> make sense to set up a gluster volume to be (Master), but not use it for VM
>> storage, just for storing the sanlock info? Separate gluster volume(s) could
>> then have the VMs on it(them), and would not need the optimizations turned
>> off.
> Any domain must be able to become the master at any time. Without a master
> the data center is unusable (at the present time), that's why we migrate (or
> reconstruct) it on another domain when necessary.
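
Understood. One last note: when I do get around to trying the bug 862975 suggestion I quoted above, I assume the knobs involved are the gluster volume options performance.write-behind and cluster.eager-lock, i.e. something along these lines (not yet run here, so please correct me if those are the wrong option names):

[root@office4a ~]$ gluster volume set VM2 performance.write-behind off
[root@office4a ~]$ gluster volume set VM2 cluster.eager-lock off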
Ted Miller