[Users] Data Center stuck between "Non Responsive" and "Contending"
Ted Miller
tmiller at hcjb.org
Tue Jan 28 02:22:36 EST 2014
Federico, thank you for your help so far. Lots more information below.
On 1/27/2014 4:46 PM, Federico Simoncelli wrote:
> ----- Original Message -----
>> From: "Ted Miller" <tmiller at hcjb.org>
>>
>> On 1/27/2014 3:47 AM, Federico Simoncelli wrote:
>>> Maybe someone from gluster can identify easily what happened. Meanwhile if
>>> you just want to repair your data-center you could try with:
>>>
>>> $ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
>>> $ touch ids
>>> $ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576
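(A note on what that last command does, as far as I can tell from the sanlock
man page -- this breakdown is my own reading, not something Federico wrote:

  # -s takes lockspace_name:host_id:path:offset, so this re-initializes the
  # delta-lease area of the ids file:
  #   lockspace = 0322a407-2b16-40dc-ac67-13d387c6eb4c (the storage domain UUID)
  #   host_id   = 0
  #   path      = ids
  #   offset    = 1048576 bytes
  $ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576
)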
I tried your suggestion, and it helped, but it was not enough.
[root@office4a ~]$ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
[root@office4a dom_md]$ touch ids
[root@office4a dom_md]$ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576
init done 0
Let me explain a little.
When the problem originally happened, sanlock.log started logging -223 errors.
Ten seconds later the log switched from -223 errors to -90 errors. Running
your little script changed the error from -90 back to -223.
I hope you can send me another script that will get rid of the -223 messages.
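If it would help you diagnose the -223, I can also dump what sanlock sees on
disk. A sketch of what I would run (based on my reading of the sanlock man
page; the output is whatever sanlock prints for the leader records):

  $ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
  # dump the on-disk leader/delta-lease records from the ids file
  $ sanlock direct dump ids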
Here is the sanlock.log as I ran your script:
2014-01-27 19:40:41-0500 39281 [3803]: s13 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0
2014-01-27 19:40:41-0500 39281 [22751]: 0322a407 aio collect 0 0x7f54240008c0:0x7f54240008d0:0x7f5424101000 result 0:0 match len 512
2014-01-27 19:40:41-0500 39281 [22751]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
2014-01-27 19:40:42-0500 39282 [3803]: s13 add_lockspace fail result -90
2014-01-27 19:40:47-0500 39287 [3803]: s14 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0
2014-01-27 19:40:47-0500 39287 [22795]: 0322a407 aio collect 0 0x7f54240008c0:0x7f54240008d0:0x7f5424101000 result 0:0 match len 512
2014-01-27 19:40:47-0500 39287 [22795]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
2014-01-27 19:40:48-0500 39288 [3803]: s14 add_lockspace fail result -90
2014-01-27 19:40:56-0500 39296 [3802]: s15 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0
2014-01-27 19:40:56-0500 39296 [22866]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
2014-01-27 19:40:56-0500 39296 [22866]: leader1 delta_acquire_begin error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
2014-01-27 19:40:56-0500 39296 [22866]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
2014-01-27 19:40:56-0500 39296 [22866]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
2014-01-27 19:40:56-0500 39296 [22866]: leader4 sn rn ts 0 cs 0
2014-01-27 19:40:57-0500 39297 [3802]: s15 add_lockspace fail result -223
2014-01-27 19:40:57-0500 39297 [3802]: s16 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0
2014-01-27 19:40:57-0500 39297 [22870]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
2014-01-27 19:40:57-0500 39297 [22870]: leader1 delta_acquire_begin error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
2014-01-27 19:40:57-0500 39297 [22870]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
2014-01-27 19:40:57-0500 39297 [22870]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
2014-01-27 19:40:57-0500 39297 [22870]: leader4 sn rn ts 0 cs 0
2014-01-27 19:40:58-0500 39298 [3802]: s16 add_lockspace fail result -223
2014-01-27 19:41:07-0500 39307 [3802]: s17 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0
Unfortunately, I think the error looks about the same to vdsm, because
/var/log/messages shows the same two lines from the calling scripts on the
callback lists (lines 66 & 425, if I remember right).
When I get up in the morning, I will be looking for another magic potion from
your pen. :)
>> Federico,
>>
>> I won't be able to do anything to the ovirt setup for another 5 hours or so
>> (it is a trial system I am working on at home, I am at work), but I will try
>> your repair script and report back.
>>
>> In bugzilla 862975 they suggested turning off write-behind caching and "eager
>> locking" on the gluster volume to avoid/reduce the problems that come from
>> many different computers all writing to the same file(s) on a very frequent
>> basis. If I interpret the comment in the bug correctly, it did seem to help
>> in that situation. My situation is a little different. My gluster setup is
>> replicate only, replica 3 (though there are only two hosts). I was not
>> stress-testing it, I was just using it, trying to figure out how I can import
>> some old VMWare VMs without an ESXi server to run them on.
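(For the record, what I take that suggestion from bug 862975 to mean in
gluster terms -- a sketch only, assuming the volume is the VM2 that appears in
the mount path and that I have the option names right:

  $ gluster volume set VM2 performance.write-behind off
  $ gluster volume set VM2 cluster.eager-lock off
)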
> Have you done anything similar to what is described here in comment 21?
>
> https://bugzilla.redhat.com/show_bug.cgi?id=859589#c21
I have no idea, because that bug is so dangerous that the general public is
not allowed to even see it. I get a "You are not authorized to access ...."
message. :) However, see just below.
> When did you realize that you weren't able to use the data-center anymore?
> Can you describe exactly what you did and what happened, for example:
>
> 1. I created the data center (up and running)
> 2. I tried to import some VMs from VMWare
> 3. During the import (or after it) the data-center went in the contending state
> ...
>
> Did something special happen?
It took me a few hours of going back through the logs and matching timestamps to
figure it out, as it has been a week since it happened. When I saw some
things in the logs, I finally remembered the circumstances. I will summarize
it for you, because I think this is what matters. If you want actual logs,
tell me which logs and for which events, and I will package them up for you.
The host suspended all VMs because gluster hit the 90% full limit. It took me
a while to figure out what was going on. I expanded the logical volume, but
the Data Center would not come back up.
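The expansion itself was just the usual LVM steps, roughly along these lines
(the VG/LV names, size, and filesystem here are placeholders, not my exact
layout):

  # grow the logical volume backing the gluster brick
  $ lvextend -L +50G /dev/VolGroup/brick_lv
  # grow the filesystem on it (xfs_growfs if the brick is XFS, resize2fs if ext4)
  $ xfs_growfs /gluster/brick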
The GUI events log says:
Invalid status on Data Center mill. Setting Data Center status to Non Responsive (On host office4a, Error: Network error during communications with the Host.).
The host /var/log/messages says:
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader4 sn rn ts 0 cs 0
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: s123 delta_renew verify_leader error -223
Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: s123 renewal error -223 delta_length 0 last_success 521369
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [3362]: s123 check_our_lease corrupt -223
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [3362]: s123 kill 32599 sig 15 count 1
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader4 sn rn ts 0 cs 0
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 delta_renew verify_leader error -223
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 renewal error -223 delta_length 0 last_success 521369
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader4 sn rn ts 0 cs 0
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 delta_renew verify_leader error -223
Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 renewal error -223 delta_length 0 last_success 521369
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [3362]: s123 kill 32599 sig 15 count 2
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader4 sn rn ts 0 cs 0
Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: s123 delta_renew verify_leader error -223
[snip many repeats]
Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: leader4 sn rn ts 0 cs 0
Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: s123 delta_renew verify_leader error -223
Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: s123 renewal error -223 delta_length 0 last_success 521369
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [3362]: s123 kill 32599 sig 15 count 10
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [3362]: dead 32599 ci 4 count 10
Jan 22 00:44:58 office4a respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died, respawning slave
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [3362]: s123 all pids clear
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader4 sn rn ts 0 cs 0
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: s123 delta_renew verify_leader error -223
Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: s123 renewal error -223 delta_length 0 last_success 521369
Jan 22 00:44:59 office4a vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.
Jan 22 00:44:59 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found
Jan 22 00:44:59 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}' found
Jan 22 00:44:59 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'spicevmc', 'alias': 'channel2', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '3'}}' found
Jan 22 00:45:00 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::_readPauseCode unsupported by libvirt vm
Jan 22 00:45:00 office4a kernel: ata1: hard resetting link
Jan 22 00:45:00 office4a kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 22 00:45:00 office4a kernel: ata1.00: configured for UDMA/133
Jan 22 00:45:00 office4a kernel: ata1: EH complete
Jan 22 00:45:01 office4a kernel: ata2: hard resetting link
Jan 22 00:45:01 office4a kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 22 00:45:01 office4a kernel: ata2.00: configured for UDMA/133
Jan 22 00:45:01 office4a kernel: ata2: EH complete
Jan 22 00:45:02 office4a kernel: ata3: hard resetting link
Jan 22 00:45:02 office4a kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 22 00:45:02 office4a kernel: ata3.00: configured for UDMA/133
Jan 22 00:45:02 office4a kernel: ata3: EH complete
Jan 22 00:45:03 office4a kernel: ata4: hard resetting link
Jan 22 00:45:03 office4a kernel: ata4: SATA link down (SStatus 0 SControl 300)
Jan 22 00:45:03 office4a kernel: ata4: EH complete
Jan 22 00:45:04 office4a kernel: ata5: soft resetting link
Jan 22 00:45:04 office4a kernel: ata5.00: configured for UDMA/33
Jan 22 00:45:04 office4a kernel: ata5: EH complete
Jan 22 00:45:05 office4a kernel: ata6: soft resetting link
Jan 22 00:45:05 office4a kernel: ata6: EH complete
Jan 22 00:45:08 office4a kernel: ata1: hard resetting link
Jan 22 00:45:08 office4a kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jan 22 00:45:08 office4a kernel: ata1.00: configured for UDMA/133
Jan 22 00:45:08 office4a kernel: ata1: EH complete
Jan 22 00:45:09 office4a kernel: ata2: hard resetting link
Jan 22 00:45:09 office4a kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 22 00:45:09 office4a kernel: ata2.00: configured for UDMA/133
Jan 22 00:45:09 office4a kernel: ata2: EH complete
Jan 22 00:45:10 office4a kernel: ata3: hard resetting link
Jan 22 00:45:10 office4a kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 22 00:45:10 office4a kernel: ata3.00: configured for UDMA/133
Jan 22 00:45:10 office4a kernel: ata3: EH complete
Jan 22 00:45:11 office4a kernel: ata4: hard resetting link
Jan 22 00:45:11 office4a kernel: ata4: SATA link down (SStatus 0 SControl 300)
Jan 22 00:45:11 office4a kernel: ata4: EH complete
Jan 22 00:45:12 office4a kernel: ata5: soft resetting link
Jan 22 00:45:12 office4a kernel: ata5.00: configured for UDMA/33
Jan 22 00:45:12 office4a kernel: ata5: EH complete
Jan 22 00:45:13 office4a kernel: ata6: soft resetting link
Jan 22 00:45:13 office4a kernel: ata6: EH complete
Jan 22 00:45:15 office4a sanlock[3362]: 2014-01-22 00:45:15-0500 521416 [17150]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:45:16 office4a sanlock[3362]: 2014-01-22 00:45:16-0500 521417 [3367]: s124 add_lockspace fail result -90
Jan 22 00:45:20 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found
Jan 22 00:45:20 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}' found
Jan 22 00:45:20 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'spicevmc', 'alias': 'channel2', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '3'}}' found
Jan 22 00:45:21 office4a sanlock[3362]: 2014-01-22 00:45:21-0500 521422 [17199]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:45:22 office4a sanlock[3362]: 2014-01-22 00:45:22-0500 521423 [3368]: s125 add_lockspace fail result -90
Jan 22 00:45:22 office4a vdsm TaskManager.Task ERROR Task=`2fefe6f9-4dac-45f2-b237-0104ce890146`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 857, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/storage/task.py", line 318, in run#012 return self.cmd(*self.argslist, **self.argsdict)#012 File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012 self.masterDomain.acquireHostId(self.id)#012 File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012 self._clusterLock.acquireHostId(hostId, async)#012 File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId#012 raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))
Jan 22 00:45:25 office4a sanlock[3362]: 2014-01-22 00:45:25-0500 521426 [17243]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:45:26 office4a sanlock[3362]: 2014-01-22 00:45:26-0500 521427 [3368]: s126 add_lockspace fail result -90
Jan 22 00:45:33 office4a vdsm TaskManager.Task ERROR Task=`8f8931ba-a9f6-4225-a951-2d608b3b1d70`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 857, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012 res = f(*args, **kwargs)#012 File "/usr/share/vdsm/storage/hsm.py", line 2110, in getAllTasksStatuses#012 allTasksStatus = sp.getAllTasksStatuses()#012 File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper#012 raise SecureError()#012SecureError
Jan 22 00:45:36 office4a sanlock[3362]: 2014-01-22 00:45:36-0500 521437 [17334]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:45:37 office4a sanlock[3362]: 2014-01-22 00:45:37-0500 521438 [3368]: s127 add_lockspace fail result -90
Jan 22 00:45:37 office4a vdsm TaskManager.Task ERROR Task=`6b7da394-63b6-4e52-ac73-ab98f3f1e60f`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 857, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/storage/task.py", line 318, in run#012 return self.cmd(*self.argslist, **self.argsdict)#012 File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012 self.masterDomain.acquireHostId(self.id)#012 File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012 self._clusterLock.acquireHostId(hostId, async)#012 File "/usr/share/vdsm/storage/clusterlock.py", line 185, in acquireHostId#012 raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(115, 'Sanlock lockspace add failure', 'Operation now in progress'))
Jan 22 00:45:46 office4a sanlock[3362]: 2014-01-22 00:45:46-0500 521447 [17410]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:45:47 office4a sanlock[3362]: 2014-01-22 00:45:47-0500 521448 [3367]: s128 add_lockspace fail result -90
Jan 22 00:45:48 office4a vdsm TaskManager.Task ERROR Task=`c2486380-a0c8-45d1-a63f-e0e660f1e118`::Unexpected error#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line 857, in _run#012 return fn(*args, **kargs)#012 File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012 res = f(*args, **kwargs)#012 File "/usr/share/vdsm/storage/hsm.py", line 2110, in getAllTasksStatuses#012 allTasksStatus = sp.getAllTasksStatuses()#012 File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper#012 raise SecureError()#012SecureError
Jan 22 00:45:50 office4a sanlock[3362]: 2014-01-22 00:45:50-0500 521451 [17457]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
Jan 22 00:45:51 office4a sanlock[3362]: 2014-01-22 00:45:51-0500 521452 [3368]: s129 add_lockspace fail result -90
>
>> I am guessing that what makes cluster storage have the (Master) designation
>> is that this is the one that actually contains the sanlocks? If so, would it
>> make sense to set up a gluster volume to be (Master), but not use it for VM
>> storage, just for storing the sanlock info? Separate gluster volume(s) could
>> then have the VMs on it(them), and would not need the optimizations turned
>> off.
> Any domain must be able to become the master at any time. Without a master
> the data center is unusable (at the present time), that's why we migrate (or
> reconstruct) it on another domain when necessary.
Ted Miller