[Users] Data Center stuck between "Non Responsive" and "Contending"

Ted Miller tmiller at hcjb.org
Tue Jan 28 07:22:36 UTC 2014


Federico, thank you for your help so far.  Lots more information below.

On 1/27/2014 4:46 PM, Federico Simoncelli wrote:
> ----- Original Message -----
>> From: "Ted Miller" <tmiller at hcjb.org>
>>
>> On 1/27/2014 3:47 AM, Federico Simoncelli wrote:
>>> Maybe someone from gluster can identify easily what happened. Meanwhile if
>>> you just want to repair your data-center you could try with:
>>>
>>>    $ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/
>>>    $ touch ids
>>>    $ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576

I tried your suggestion, and it helped, but it was not enough.

    [root@office4a ~]$ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/

    [root@office4a dom_md]$ touch ids

    [root@office4a dom_md]$ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576

    init done 0
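
(If it helps, I can sanity-check the file right after the init with something like the
following; "sanlock direct dump" is what I see in the sanlock man page for inspecting
leases on disk, so correct me if that is not the right way to look at it:)

    ls -l ids                    # confirm the ids file is no longer empty
    sanlock direct dump ids      # inspect the on-disk delta-lease leader records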

Let me explain a little.

When the problem originally happened, the sanlock.log started filling with -223 
error messages.  Ten seconds later the log switched from -223 messages to -90 
messages.  Running your little script changed the error from -90 back to -223.
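
(For what it's worth, my reading of the two errors from the logs: the -90 lines 
('Message too long' in the vdsm traceback further down) look like the read of the 
leader area coming back shorter than expected, while the -223 lines come with 
"verify_leader 2 wrong magic 0", i.e. the file reads fine now but the leader record 
at offset 0 is still all zeros.  A quick way to eyeball that, if it is worth anything, 
with paths relative to dom_md as above -- correct me if I am misreading it:)

    stat -c '%s' ids                 # file size; a short/empty file would explain the -90 reads
    hexdump -C -n 512 ids | head     # first sector; all zeros would match "wrong magic 0"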

I hope you can send me another script that will get rid of the -223 messages.

Here is the sanlock.log as I ran your script:

    2014-01-27 19:40:41-0500 39281 [3803]: s13 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

    2014-01-27 19:40:41-0500 39281 [22751]: 0322a407 aio collect 0 0x7f54240008c0:0x7f54240008d0:0x7f5424101000 result 0:0 match len 512

    2014-01-27 19:40:41-0500 39281 [22751]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    2014-01-27 19:40:42-0500 39282 [3803]: s13 add_lockspace fail result -90

    2014-01-27 19:40:47-0500 39287 [3803]: s14 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

    2014-01-27 19:40:47-0500 39287 [22795]: 0322a407 aio collect 0 0x7f54240008c0:0x7f54240008d0:0x7f5424101000 result 0:0 match len 512

    2014-01-27 19:40:47-0500 39287 [22795]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    2014-01-27 19:40:48-0500 39288 [3803]: s14 add_lockspace fail result -90

    2014-01-27 19:40:56-0500 39296 [3802]: s15 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

    2014-01-27 19:40:56-0500 39296 [22866]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    2014-01-27 19:40:56-0500 39296 [22866]: leader1 delta_acquire_begin error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

    2014-01-27 19:40:56-0500 39296 [22866]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

    2014-01-27 19:40:56-0500 39296 [22866]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

    2014-01-27 19:40:56-0500 39296 [22866]: leader4 sn  rn  ts 0 cs 0

    2014-01-27 19:40:57-0500 39297 [3802]: s15 add_lockspace fail result -223

    2014-01-27 19:40:57-0500 39297 [3802]: s16 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

    2014-01-27 19:40:57-0500 39297 [22870]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    2014-01-27 19:40:57-0500 39297 [22870]: leader1 delta_acquire_begin error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

    2014-01-27 19:40:57-0500 39297 [22870]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

    2014-01-27 19:40:57-0500 39297 [22870]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

    2014-01-27 19:40:57-0500 39297 [22870]: leader4 sn  rn  ts 0 cs 0

    2014-01-27 19:40:58-0500 39298 [3802]: s16 add_lockspace fail result -223

    2014-01-27 19:41:07-0500 39307 [3802]: s17 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

Unfortunately, I think the error looks about the same to vdsm, because 
/var/log/messages shows the same two lines from the calling scripts on the 
callback lists (lines 66 & 425, if I remember right).

When I get up in the morning, I will be looking for another magic potion from 
your pen. :)

>> Federico,
>>
>> I won't be able to do anything to the ovirt setup for another 5 hours or so
>> (it is a trial system I am working on  at home, I am at work), but I will try
>> your repair script and report back.
>>
>> In bugzilla 862975 they suggested turning off write-behind caching and "eager
>> locking" on the gluster volume to avoid/reduce the problems that come from
>> many different computers all writing to the same file(s) on a very frequent
>> basis.  If I interpret the comment in the bug correctly, it did seem to help
>> in that situation.  My situation is a little different.  My gluster setup is
>> replicate only, replica 3 (though there are only two hosts).  I was not
>> stress-testing it, I was just using it, trying to figure out how I can import
>> some old VMWare VMs without an ESXi server to run them on.
> Have you done anything similar to what is described here in comment 21?
>
> https://bugzilla.redhat.com/show_bug.cgi?id=859589#c21

I have no idea, because that bug is so dangerous that the general public is 
not even allowed to see it.  I get a "You are not authorized to access ...." 
message. :)   However, see just below.

> When did you realize that you weren't able to use the data-center anymore?
> Can you describe exactly what you did and what happened, for example:
>
> 1. I created the data center (up and running)
> 2. I tried to import some VMs from VMWare
> 3. During the import (or after it) the data-center went in the contending state
> ...
>
> Did something special happen?

It took me a few hours of going back through the logs and matching up 
timestamps to figure it out, since it has been a week since it happened.  Once 
I saw certain entries in the logs, I finally remembered the circumstances.  I 
will summarize it for you, because I think this is what matters.  If you want 
the actual logs, tell me which logs and which events, and I will package them 
up for you.
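
(In case it is useful, the "matching up timestamps" mostly amounts to pulling the 
same time window out of the different logs, along the lines of the following; log 
paths as they are on my hosts:)

    grep '2014-01-22 00:4' /var/log/sanlock.log     # sanlock's view of the failure window
    grep 'Jan 22 00:4' /var/log/messages            # the same window from syslog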

The host suspended all VMs because gluster hit the 90% full limit. It took me 
a while to figure out what was going on.  I expanded the logical volume, but 
the Data Center would not come back up.
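
(The expansion itself was just the usual lvextend against the LV under the gluster 
brick, roughly like the sketch below; the VG/LV names and size here are placeholders, 
not my actual ones:)

    lvextend -r -L +50G /dev/vg_bricks/lv_vm2    # -r also grows the filesystem on the LV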

The GUI events log says:

    Invalid status on Data Center mill. Setting Data Center status to Non Responsive (On host office4a, Error: Network error during communications with the Host.).


The host /var/log/messages says:

    Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

    Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

    Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

    Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader4 sn  rn  ts 0 cs 0

    Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: s123 delta_renew verify_leader error -223

    Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: s123 renewal error -223 delta_length 0 last_success 521369

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [3362]: s123 check_our_lease corrupt -223

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [3362]: s123 kill 32599 sig 15 count 1

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader4 sn  rn  ts 0 cs 0

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 delta_renew verify_leader error -223

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 renewal error -223 delta_length 0 last_success 521369

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader4 sn  rn  ts 0 cs 0

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 delta_renew verify_leader error -223

    Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 renewal error -223 delta_length 0 last_success 521369

    Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [3362]: s123 kill 32599 sig 15 count 2

    Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

    Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

    Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

    Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader4 sn  rn  ts 0 cs 0

    Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: s123 delta_renew verify_leader error -223

    [snip many repeats]

    Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

    Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: leader4 sn  rn  ts 0 cs 0

    Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: s123 delta_renew verify_leader error -223

    Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: s123 renewal error -223 delta_length 0 last_success 521369

    Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [3362]: s123 kill 32599 sig 15 count 10

    Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [3362]: dead 32599 ci 4 count 10

    Jan 22 00:44:58 office4a respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died, respawning slave

    Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [3362]: s123 all pids clear

    Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

    Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

    Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

    Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader4 sn  rn  ts 0 cs 0

    Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: s123 delta_renew verify_leader error -223

    Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: s123 renewal error -223 delta_length 0 last_success 521369

    Jan 22 00:44:59 office4a vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.

    Jan 22 00:44:59 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found

    Jan 22 00:44:59 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}' found

    Jan 22 00:44:59 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'spicevmc', 'alias': 'channel2', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '3'}}' found

    Jan 22 00:45:00 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::_readPauseCode unsupported by libvirt vm

    Jan 22 00:45:00 office4a kernel: ata1: hard resetting link

    Jan 22 00:45:00 office4a kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

    Jan 22 00:45:00 office4a kernel: ata1.00: configured for UDMA/133

    Jan 22 00:45:00 office4a kernel: ata1: EH complete

    Jan 22 00:45:01 office4a kernel: ata2: hard resetting link

    Jan 22 00:45:01 office4a kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

    Jan 22 00:45:01 office4a kernel: ata2.00: configured for UDMA/133

    Jan 22 00:45:01 office4a kernel: ata2: EH complete

    Jan 22 00:45:02 office4a kernel: ata3: hard resetting link

    Jan 22 00:45:02 office4a kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

    Jan 22 00:45:02 office4a kernel: ata3.00: configured for UDMA/133

    Jan 22 00:45:02 office4a kernel: ata3: EH complete

    Jan 22 00:45:03 office4a kernel: ata4: hard resetting link

    Jan 22 00:45:03 office4a kernel: ata4: SATA link down (SStatus 0 SControl 300)

    Jan 22 00:45:03 office4a kernel: ata4: EH complete

    Jan 22 00:45:04 office4a kernel: ata5: soft resetting link

    Jan 22 00:45:04 office4a kernel: ata5.00: configured for UDMA/33

    Jan 22 00:45:04 office4a kernel: ata5: EH complete

    Jan 22 00:45:05 office4a kernel: ata6: soft resetting link

    Jan 22 00:45:05 office4a kernel: ata6: EH complete

    Jan 22 00:45:08 office4a kernel: ata1: hard resetting link

    Jan 22 00:45:08 office4a kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

    Jan 22 00:45:08 office4a kernel: ata1.00: configured for UDMA/133

    Jan 22 00:45:08 office4a kernel: ata1: EH complete

    Jan 22 00:45:09 office4a kernel: ata2: hard resetting link

    Jan 22 00:45:09 office4a kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

    Jan 22 00:45:09 office4a kernel: ata2.00: configured for UDMA/133

    Jan 22 00:45:09 office4a kernel: ata2: EH complete

    Jan 22 00:45:10 office4a kernel: ata3: hard resetting link

    Jan 22 00:45:10 office4a kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

    Jan 22 00:45:10 office4a kernel: ata3.00: configured for UDMA/133

    Jan 22 00:45:10 office4a kernel: ata3: EH complete

    Jan 22 00:45:11 office4a kernel: ata4: hard resetting link

    Jan 22 00:45:11 office4a kernel: ata4: SATA link down (SStatus 0 SControl 300)

    Jan 22 00:45:11 office4a kernel: ata4: EH complete

    Jan 22 00:45:12 office4a kernel: ata5: soft resetting link

    Jan 22 00:45:12 office4a kernel: ata5.00: configured for UDMA/33

    Jan 22 00:45:12 office4a kernel: ata5: EH complete

    Jan 22 00:45:13 office4a kernel: ata6: soft resetting link

    Jan 22 00:45:13 office4a kernel: ata6: EH complete

    Jan 22 00:45:15 office4a sanlock[3362]: 2014-01-22 00:45:15-0500 521416 [17150]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    Jan 22 00:45:16 office4a sanlock[3362]: 2014-01-22 00:45:16-0500 521417 [3367]: s124 add_lockspace fail result -90

    Jan 22 00:45:20 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found

    Jan 22 00:45:20 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}' found

    Jan 22 00:45:20 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'spicevmc', 'alias': 'channel2', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '3'}}' found

    Jan 22 00:45:21 office4a sanlock[3362]: 2014-01-22 00:45:21-0500 521422 [17199]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    Jan 22 00:45:22 office4a sanlock[3362]: 2014-01-22 00:45:22-0500 521423 [3368]: s125 add_lockspace fail result -90

    Jan 22 00:45:22 office4a vdsm TaskManager.Task ERROR Task=`2fefe6f9-4dac-45f2-b237-0104ce890146`::Unexpected error#012Traceback (most recent call last):#012  File "/usr/share/vdsm/storage/task.py", line 857, in _run#012    return fn(*args, **kargs)#012  File "/usr/share/vdsm/storage/task.py", line 318, in run#012    return self.cmd(*self.argslist, **self.argsdict)#012  File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012    self.masterDomain.acquireHostId(self.id)#012  File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012    self._clusterLock.acquireHostId(hostId, async)#012  File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId#012    raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long'))

    Jan 22 00:45:25 office4a sanlock[3362]: 2014-01-22 00:45:25-0500 521426 [17243]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    Jan 22 00:45:26 office4a sanlock[3362]: 2014-01-22 00:45:26-0500 521427 [3368]: s126 add_lockspace fail result -90

    Jan 22 00:45:33 office4a vdsm TaskManager.Task ERROR Task=`8f8931ba-a9f6-4225-a951-2d608b3b1d70`::Unexpected error#012Traceback (most recent call last):#012  File "/usr/share/vdsm/storage/task.py", line 857, in _run#012    return fn(*args, **kargs)#012  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012    res = f(*args, **kwargs)#012  File "/usr/share/vdsm/storage/hsm.py", line 2110, in getAllTasksStatuses#012    allTasksStatus = sp.getAllTasksStatuses()#012  File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper#012    raise SecureError()#012SecureError

    Jan 22 00:45:36 office4a sanlock[3362]: 2014-01-22 00:45:36-0500 521437 [17334]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    Jan 22 00:45:37 office4a sanlock[3362]: 2014-01-22 00:45:37-0500 521438 [3368]: s127 add_lockspace fail result -90

    Jan 22 00:45:37 office4a vdsm TaskManager.Task ERROR Task=`6b7da394-63b6-4e52-ac73-ab98f3f1e60f`::Unexpected error#012Traceback (most recent call last):#012  File "/usr/share/vdsm/storage/task.py", line 857, in _run#012    return fn(*args, **kargs)#012  File "/usr/share/vdsm/storage/task.py", line 318, in run#012    return self.cmd(*self.argslist, **self.argsdict)#012  File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012    self.masterDomain.acquireHostId(self.id)#012  File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012    self._clusterLock.acquireHostId(hostId, async)#012  File "/usr/share/vdsm/storage/clusterlock.py", line 185, in acquireHostId#012    raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(115, 'Sanlock lockspace add failure', 'Operation now in progress'))

    Jan 22 00:45:46 office4a sanlock[3362]: 2014-01-22 00:45:46-0500 521447 [17410]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    Jan 22 00:45:47 office4a sanlock[3362]: 2014-01-22 00:45:47-0500 521448 [3367]: s128 add_lockspace fail result -90

    Jan 22 00:45:48 office4a vdsm TaskManager.Task ERROR Task=`c2486380-a0c8-45d1-a63f-e0e660f1e118`::Unexpected error#012Traceback (most recent call last):#012  File "/usr/share/vdsm/storage/task.py", line 857, in _run#012    return fn(*args, **kargs)#012  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012    res = f(*args, **kwargs)#012  File "/usr/share/vdsm/storage/hsm.py", line 2110, in getAllTasksStatuses#012    allTasksStatus = sp.getAllTasksStatuses()#012  File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper#012    raise SecureError()#012SecureError

    Jan 22 00:45:50 office4a sanlock[3362]: 2014-01-22 00:45:50-0500 521451 [17457]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

    Jan 22 00:45:51 office4a sanlock[3362]: 2014-01-22 00:45:51-0500 521452 [3368]: s129 add_lockspace fail result -90

>
>> I am guessing that what makes cluster storage have the (Master) designation
>> is that this is the one that actually contains the sanlocks?  If so, would it
>> make sense to set up a gluster volume to be (Master), but not use it for VM
>> storage, just for storing the sanlock info?  Separate gluster volume(s) could
>> then have the VMs on it(them), and would not need the optimizations turned
>> off.
> Any domain must be able to become the master at any time. Without a master
> the data center is unusable (at the present time), that's why we migrate (or
> reconstruct) it on another domain when necessary.
Ted Miller
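
P.S. On the bug 862975 workaround I mentioned above (turning off write-behind 
caching and eager locking): if I do end up trying it, my understanding is that it 
would be along these lines, with VM2 being the volume name from the mount path.  I 
have not run these yet, so treat it as a sketch:

    gluster volume set VM2 performance.write-behind off
    gluster volume set VM2 cluster.eager-lock off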
