<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Federico, thank you for your help so far.&nbsp; Lots of more information

    below.<br>

    <br>

    On 1/27/2014 4:46 PM, Federico Simoncelli wrote:<br>

    <blockquote

      cite="mid:1916841992.8207492.1390859170488.JavaMail.root@redhat.com"

      type="cite">

      <pre wrap="">----- Original Message -----

</pre>

      <blockquote type="cite">

        <pre wrap="">From: "Ted Miller" <a class="moz-txt-link-rfc2396E" href="mailto:tmiller@hcjb.org">&lt;tmiller@hcjb.org&gt;</a>

On 1/27/2014 3:47 AM, Federico Simoncelli wrote:

</pre>

        <blockquote type="cite">

          <pre wrap="">Maybe someone from gluster can identify easily what happened. Meanwhile if

you just want to repair your data-center you could try with:

  $ cd /rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/

  $ touch ids

  $ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576</pre>

        </blockquote>

      </blockquote>

    </blockquote>

    <br>

    I tried your suggestion, and it helped, but it was not enough.&nbsp; <br>

    <blockquote>

      <pre>[root@office4a ~]$ cd 

<span class="moz-txt-slash"><span class="moz-txt-tag">/</span>rhev/data-center/mnt/glusterSD/10.41.65.2\:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md<span class="moz-txt-tag">/</span></span>

</pre>

      <pre>[root@office4a dom_md]$ touch ids

</pre>

      <pre>[root@office4a dom_md]$ sanlock direct init -s 0322a407-2b16-40dc-ac67-13d387c6eb4c:0:ids:1048576

</pre>

      <pre>init done 0

</pre>

    </blockquote>

    Let me explain a little.<br>

    <br>

    When the problem originally happened, the sanlock.log started having

    -223 error messages.&nbsp; 10 seconds later the log switched from -223

    messages to -90 messages.&nbsp; Running your little script changed the

    error from -90 back to -223.<br>

    <br>

    I hope you can send me another script that will get rid of the -223

    messages.<br>

    <br>

    Here is the sanlock.log as I ran your script:<br>

    <br>

    <blockquote>

      <pre>2014-01-27 19:40:41-0500 39281 [3803]: s13 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

</pre>

      <pre>2014-01-27 19:40:41-0500 39281 [22751]: 0322a407 aio collect 0 0x7f54240008c0:0x7f54240008d0:0x7f5424101000 result 0:0 match len 512

</pre>

      <pre>2014-01-27 19:40:41-0500 39281 [22751]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

      <pre>2014-01-27 19:40:42-0500 39282 [3803]: s13 add_lockspace fail result -90

</pre>

      <pre>2014-01-27 19:40:47-0500 39287 [3803]: s14 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

</pre>

      <pre>2014-01-27 19:40:47-0500 39287 [22795]: 0322a407 aio collect 0 0x7f54240008c0:0x7f54240008d0:0x7f5424101000 result 0:0 match len 512

</pre>

      <pre>2014-01-27 19:40:47-0500 39287 [22795]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

      <pre>2014-01-27 19:40:48-0500 39288 [3803]: s14 add_lockspace fail result -90

</pre>

      <pre>2014-01-27 19:40:56-0500 39296 [3802]: s15 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

</pre>

      <pre>2014-01-27 19:40:56-0500 39296 [22866]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

      <pre>2014-01-27 19:40:56-0500 39296 [22866]: leader1 delta_acquire_begin error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

</pre>

      <pre>2014-01-27 19:40:56-0500 39296 [22866]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

</pre>

      <pre>2014-01-27 19:40:56-0500 39296 [22866]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

</pre>

      <pre>2014-01-27 19:40:56-0500 39296 [22866]: leader4 sn&nbsp; rn&nbsp; ts 0 cs 0

</pre>

      <pre>2014-01-27 19:40:57-0500 39297 [3802]: s15 add_lockspace fail result -223

</pre>

      <pre>2014-01-27 19:40:57-0500 39297 [3802]: s16 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

</pre>

      <pre>2014-01-27 19:40:57-0500 39297 [22870]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

      <pre>2014-01-27 19:40:57-0500 39297 [22870]: leader1 delta_acquire_begin error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

</pre>

      <pre>2014-01-27 19:40:57-0500 39297 [22870]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

</pre>

      <pre>2014-01-27 19:40:57-0500 39297 [22870]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

</pre>

      <pre>2014-01-27 19:40:57-0500 39297 [22870]: leader4 sn&nbsp; rn&nbsp; ts 0 cs 0

</pre>

      <pre>2014-01-27 19:40:58-0500 39298 [3802]: s16 add_lockspace fail result -223

</pre>

      <pre>2014-01-27 19:41:07-0500 39307 [3802]: s17 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c:2:/rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids:0

</pre>

    </blockquote>

    Unfortunately, I think the error looks about the same to vdsm,

    because /var/log/messages shows the same two lines in the calling

    scripts on the callback lists (66 &amp; 425, if I remember right).<br>

    <br>

    When I get up in the morning, I will be looking for another magic

    potion from your pen. :)<br>

    <br>

    <blockquote

      cite="mid:1916841992.8207492.1390859170488.JavaMail.root@redhat.com"

      type="cite">

      <blockquote type="cite">

        <pre wrap="">Federico,

I won't be able to do anything to the ovirt setup for another 5 hours or so

(it is a trial system I am working on  at home, I am at work), but I will try

your repair script and report back.

In bugzilla 862975 they suggested turning off write-behind caching and "eager

locking" on the gluster volume to avoid/reduce the problems that come from

many different computers all writing to the same file(s) on a very frequent

basis.  If I interpret the comment in the bug correctly, it did seem to help

in that situation.  My situation is a little different.  My gluster setup is

replicate only, replica 3 (though there are only two hosts).  I was not

stress-testing it, I was just using it, trying to figure out how I can import

some old VMWare VMs without an ESXi server to run them on.

</pre>

      </blockquote>

      <pre wrap="">

Have you done anything similar to what is described here in comment 21?

<a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=859589#c21">https://bugzilla.redhat.com/show_bug.cgi?id=859589#c21</a></pre>

    </blockquote>

    <br>

    I have no idea, because that bug is so dangerous that the general

    public is not allowed to even see it.&nbsp; I get an "You are not

    authorized to access ...." message. :) &nbsp; However, see just below.<br>

    <br>

    <blockquote

      cite="mid:1916841992.8207492.1390859170488.JavaMail.root@redhat.com"

      type="cite">

      <pre wrap="">When did you realize that you weren't able to use the data-center anymore?

Can you describe exactly what you did and what happened, for example:

1. I created the data center (up and running)

2. I tried to import some VMs from VMWare

3. During the import (or after it) the data-center went in the contending state

...

Did something special happened?

</pre>

    </blockquote>

    <br>

    It took me a few hours of going back through and matching times on

    logs to figure it out, as it has been a week since it happened.&nbsp;

    When I saw some things in the logs, I finally remembered the

    circumstances.&nbsp; I will summarize it for you, because I think this is

    what matters.&nbsp; If you want actual logs, tell me which logs and for

    which events, and I will package them up for you.<br>

    <br>

    The host suspended all VMs because gluster hit the 90% full limit.&nbsp;

    It took me a while to figure out what was going on.&nbsp; I expanded the

    logical volume, but the Data Center would not come back up.<br>

    <br>

    The GUI events log says:&nbsp; <br>

    <blockquote>

      <pre>Invalid status on Data Center mill. Setting Data Center status to Non Responsive (On host office4a, Error: Network error during communications with the Host.).</pre>

    </blockquote>

    <br>

    The host /var/log/messages says:<br>

    <br>

    <div class="moz-text-flowed" style="font-size: 15px;"

      lang="x-western">

      <blockquote>

        <pre>Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

        <pre>Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

</pre>

        <pre>Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

</pre>

        <pre>Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

</pre>

        <pre>Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: leader4 sn&nbsp; rn&nbsp; ts 0 cs 0

</pre>

        <pre>Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: s123 delta_renew verify_leader error -223

</pre>

        <pre>Jan 22 00:44:48 office4a sanlock[3362]: 2014-01-22 00:44:48-0500 521389 [30933]: s123 renewal error -223 delta_length 0 last_success 521369

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [3362]: s123 check_our_lease corrupt -223

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [3362]: s123 kill 32599 sig 15 count 1

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader4 sn&nbsp; rn&nbsp; ts 0 cs 0

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 delta_renew verify_leader error -223

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 renewal error -223 delta_length 0 last_success 521369

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: leader4 sn&nbsp; rn&nbsp; ts 0 cs 0

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 delta_renew verify_leader error -223

</pre>

        <pre>Jan 22 00:44:49 office4a sanlock[3362]: 2014-01-22 00:44:49-0500 521390 [30933]: s123 renewal error -223 delta_length 0 last_success 521369

</pre>

        <pre>Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [3362]: s123 kill 32599 sig 15 count 2

</pre>

        <pre>Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

        <pre>Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

</pre>

        <pre>Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

</pre>

        <pre>Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

</pre>

        <pre>Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: leader4 sn&nbsp; rn&nbsp; ts 0 cs 0

</pre>

        <pre>Jan 22 00:44:50 office4a sanlock[3362]: 2014-01-22 00:44:50-0500 521391 [30933]: s123 delta_renew verify_leader error -223

</pre>

        <pre>[snip many repeats]

Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

</pre>

        <pre>Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: leader4 sn&nbsp; rn&nbsp; ts 0 cs 0

</pre>

        <pre>Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: s123 delta_renew verify_leader error -223

</pre>

        <pre>Jan 22 00:44:57 office4a sanlock[3362]: 2014-01-22 00:44:57-0500 521398 [30933]: s123 renewal error -223 delta_length 0 last_success 521369

</pre>

        <pre>Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [3362]: s123 kill 32599 sig 15 count 10

</pre>

        <pre>Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [3362]: dead 32599 ci 4 count 10

</pre>

        <pre>Jan 22 00:44:58 office4a respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died, respawning slave

</pre>

        <pre>Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [3362]: s123 all pids clear

</pre>

        <pre>Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: verify_leader 2 wrong magic 0 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

        <pre>Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader1 delta_renew error -223 lockspace 0322a407-2b16-40dc-ac67-13d387c6eb4c host_id 2

</pre>

        <pre>Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader2 path /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids offset 0

</pre>

        <pre>Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader3 m 0 v 0 ss 0 nh 0 mh 0 oi 0 og 0 lv 0

</pre>

        <pre>Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: leader4 sn&nbsp; rn&nbsp; ts 0 cs 0

</pre>

        <pre>Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: s123 delta_renew verify_leader error -223

</pre>

        <pre>Jan 22 00:44:58 office4a sanlock[3362]: 2014-01-22 00:44:58-0500 521399 [30933]: s123 renewal error -223 delta_length 0 last_success 521369

</pre>

        <pre>Jan 22 00:44:59 office4a vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.

</pre>

        <pre>Jan 22 00:44:59 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found

</pre>

        <pre>Jan 22 00:44:59 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}' found

</pre>

        <pre>Jan 22 00:44:59 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'spicevmc', 'alias': 'channel2', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '3'}}' found

</pre>

        <pre>Jan 22 00:45:00 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::_readPauseCode unsupported by libvirt vm

</pre>

        <pre>Jan 22 00:45:00 office4a kernel: ata1: hard resetting link

</pre>

        <pre>Jan 22 00:45:00 office4a kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

</pre>

        <pre>Jan 22 00:45:00 office4a kernel: ata1.00: configured for UDMA/133

</pre>

        <pre>Jan 22 00:45:00 office4a kernel: ata1: EH complete

</pre>

        <pre>Jan 22 00:45:01 office4a kernel: ata2: hard resetting link

</pre>

        <pre>Jan 22 00:45:01 office4a kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

</pre>

        <pre>Jan 22 00:45:01 office4a kernel: ata2.00: configured for UDMA/133

</pre>

        <pre>Jan 22 00:45:01 office4a kernel: ata2: EH complete

</pre>

        <pre>Jan 22 00:45:02 office4a kernel: ata3: hard resetting link

</pre>

        <pre>Jan 22 00:45:02 office4a kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

</pre>

        <pre>Jan 22 00:45:02 office4a kernel: ata3.00: configured for UDMA/133

</pre>

        <pre>Jan 22 00:45:02 office4a kernel: ata3: EH complete

</pre>

        <pre>Jan 22 00:45:03 office4a kernel: ata4: hard resetting link

</pre>

        <pre>Jan 22 00:45:03 office4a kernel: ata4: SATA link down (SStatus 0 SControl 300)

</pre>

        <pre>Jan 22 00:45:03 office4a kernel: ata4: EH complete

</pre>

        <pre>Jan 22 00:45:04 office4a kernel: ata5: soft resetting link

</pre>

        <pre>Jan 22 00:45:04 office4a kernel: ata5.00: configured for UDMA/33

</pre>

        <pre>Jan 22 00:45:04 office4a kernel: ata5: EH complete

</pre>

        <pre>Jan 22 00:45:05 office4a kernel: ata6: soft resetting link

</pre>

        <pre>Jan 22 00:45:05 office4a kernel: ata6: EH complete

</pre>

        <pre>Jan 22 00:45:08 office4a kernel: ata1: hard resetting link

</pre>

        <pre>Jan 22 00:45:08 office4a kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

</pre>

        <pre>Jan 22 00:45:08 office4a kernel: ata1.00: configured for UDMA/133

</pre>

        <pre>Jan 22 00:45:08 office4a kernel: ata1: EH complete

</pre>

        <pre>Jan 22 00:45:09 office4a kernel: ata2: hard resetting link

</pre>

        <pre>Jan 22 00:45:09 office4a kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

</pre>

        <pre>Jan 22 00:45:09 office4a kernel: ata2.00: configured for UDMA/133

</pre>

        <pre>Jan 22 00:45:09 office4a kernel: ata2: EH complete

</pre>

        <pre>Jan 22 00:45:10 office4a kernel: ata3: hard resetting link

</pre>

        <pre>Jan 22 00:45:10 office4a kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

</pre>

        <pre>Jan 22 00:45:10 office4a kernel: ata3.00: configured for UDMA/133

</pre>

        <pre>Jan 22 00:45:10 office4a kernel: ata3: EH complete

</pre>

        <pre>Jan 22 00:45:11 office4a kernel: ata4: hard resetting link

</pre>

        <pre>Jan 22 00:45:11 office4a kernel: ata4: SATA link down (SStatus 0 SControl 300)

</pre>

        <pre>Jan 22 00:45:11 office4a kernel: ata4: EH complete

</pre>

        <pre>Jan 22 00:45:12 office4a kernel: ata5: soft resetting link

</pre>

        <pre>Jan 22 00:45:12 office4a kernel: ata5.00: configured for UDMA/33

</pre>

        <pre>Jan 22 00:45:12 office4a kernel: ata5: EH complete

</pre>

        <pre>Jan 22 00:45:13 office4a kernel: ata6: soft resetting link

</pre>

        <pre>Jan 22 00:45:13 office4a kernel: ata6: EH complete

</pre>

        <pre>Jan 22 00:45:15 office4a sanlock[3362]: 2014-01-22 00:45:15-0500 521416 [17150]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

        <pre>Jan 22 00:45:16 office4a sanlock[3362]: 2014-01-22 00:45:16-0500 521417 [3367]: s124 add_lockspace fail result -90

</pre>

        <pre>Jan 22 00:45:20 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found

</pre>

        <pre>Jan 22 00:45:20 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}' found

</pre>

        <pre>Jan 22 00:45:20 office4a vdsm vm.Vm WARNING vmId=`d2983edf-7a76-438f-9ea3-6e6dd3d6f54b`::Unknown type found, device: '{'device': 'spicevmc', 'alias': 'channel2', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '3'}}' found

</pre>

        <pre>Jan 22 00:45:21 office4a sanlock[3362]: 2014-01-22 00:45:21-0500 521422 [17199]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

        <pre>Jan 22 00:45:22 office4a sanlock[3362]: 2014-01-22 00:45:22-0500 521423 [3368]: s125 add_lockspace fail result -90

</pre>

        <pre>Jan 22 00:45:22 office4a vdsm TaskManager.Task ERROR Task=`2fefe6f9-4dac-45f2-b237-0104ce890146`::Unexpected error#012Traceback (most recent call last):#012&nbsp; File "/usr/share/vdsm/storage/task.py", line 857, in _run#012&nbsp;&nbsp;&nbsp; return fn(*args, **kargs)#012&nbsp; File "/usr/share/vdsm/storage/task.py", line 318, in run#012&nbsp;&nbsp;&nbsp; return self.cmd(*self.argslist, **self.argsdict)#012&nbsp; File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012&nbsp;&nbsp;&nbsp; self.masterDomain.acquireHostId(self.id)#012&nbsp; File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012&nbsp;&nbsp;&nbsp; self._clusterLock.acquireHostId(hostId, async)#012&nbsp; File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId#012&nbsp;&nbsp;&nbsp; raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(90, 'Sanlock lockspace add failure', 'Me

ssage too long'))

</pre>

        <pre>Jan 22 00:45:25 office4a sanlock[3362]: 2014-01-22 00:45:25-0500 521426 [17243]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

        <pre>Jan 22 00:45:26 office4a sanlock[3362]: 2014-01-22 00:45:26-0500 521427 [3368]: s126 add_lockspace fail result -90

</pre>

        <pre>Jan 22 00:45:33 office4a vdsm TaskManager.Task ERROR Task=`8f8931ba-a9f6-4225-a951-2d608b3b1d70`::Unexpected error#012Traceback (most recent call last):#012&nbsp; File "/usr/share/vdsm/storage/task.py", line 857, in _run#012&nbsp;&nbsp;&nbsp; return fn(*args, **kargs)#012&nbsp; File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012&nbsp;&nbsp;&nbsp; res = f(*args, **kwargs)#012&nbsp; File "/usr/share/vdsm/storage/hsm.py", line 2110, in getAllTasksStatuses#012&nbsp;&nbsp;&nbsp; allTasksStatus = sp.getAllTasksStatuses()#012&nbsp; File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper#012&nbsp;&nbsp;&nbsp; raise SecureError()#012SecureError

</pre>

        <pre>Jan 22 00:45:36 office4a sanlock[3362]: 2014-01-22 00:45:36-0500 521437 [17334]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

        <pre>Jan 22 00:45:37 office4a sanlock[3362]: 2014-01-22 00:45:37-0500 521438 [3368]: s127 add_lockspace fail result -90

</pre>

        <pre>Jan 22 00:45:37 office4a vdsm TaskManager.Task ERROR Task=`6b7da394-63b6-4e52-ac73-ab98f3f1e60f`::Unexpected error#012Traceback (most recent call last):#012&nbsp; File "/usr/share/vdsm/storage/task.py", line 857, in _run#012&nbsp;&nbsp;&nbsp; return fn(*args, **kargs)#012&nbsp; File "/usr/share/vdsm/storage/task.py", line 318, in run#012&nbsp;&nbsp;&nbsp; return self.cmd(*self.argslist, **self.argsdict)#012&nbsp; File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012&nbsp;&nbsp;&nbsp; self.masterDomain.acquireHostId(self.id)#012&nbsp; File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012&nbsp;&nbsp;&nbsp; self._clusterLock.acquireHostId(hostId, async)#012&nbsp; File "/usr/share/vdsm/storage/clusterlock.py", line 185, in acquireHostId#012&nbsp;&nbsp;&nbsp; raise se.AcquireHostIdFailure(self._sdUUID, e)#012AcquireHostIdFailure: Cannot acquire host id: ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(115, 'Sanlock lockspace add failure', 'O

peration now in progress'))

</pre>

        <pre>Jan 22 00:45:46 office4a sanlock[3362]: 2014-01-22 00:45:46-0500 521447 [17410]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

        <pre>Jan 22 00:45:47 office4a sanlock[3362]: 2014-01-22 00:45:47-0500 521448 [3367]: s128 add_lockspace fail result -90

</pre>

        <pre>Jan 22 00:45:48 office4a vdsm TaskManager.Task ERROR Task=`c2486380-a0c8-45d1-a63f-e0e660f1e118`::Unexpected error#012Traceback (most recent call last):#012&nbsp; File "/usr/share/vdsm/storage/task.py", line 857, in _run#012&nbsp;&nbsp;&nbsp; return fn(*args, **kargs)#012&nbsp; File "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012&nbsp;&nbsp;&nbsp; res = f(*args, **kwargs)#012&nbsp; File "/usr/share/vdsm/storage/hsm.py", line 2110, in getAllTasksStatuses#012&nbsp;&nbsp;&nbsp; allTasksStatus = sp.getAllTasksStatuses()#012&nbsp; File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper#012&nbsp;&nbsp;&nbsp; raise SecureError()#012SecureError

</pre>

        <pre>Jan 22 00:45:50 office4a sanlock[3362]: 2014-01-22 00:45:50-0500 521451 [17457]: read_sectors delta_leader offset 512 rv -90 /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids

</pre>

        <pre>Jan 22 00:45:51 office4a sanlock[3362]: 2014-01-22 00:45:51-0500 521452 [3368]: s129 add_lockspace fail result -90

</pre>

      </blockquote>

      <pre>

</pre>

    </div>

    <blockquote

      cite="mid:1916841992.8207492.1390859170488.JavaMail.root@redhat.com"

      type="cite">

      <pre wrap="">

</pre>

      <blockquote type="cite">

        <pre wrap="">I am guessing that what makes cluster storage have the (Master) designation

is that this is the one that actually contains the sanlocks?  If so, would it

make sense to set up a gluster volume to be (Master), but not use it for VM

storage, just for storing the sanlock info?  Separate gluster volume(s) could

then have the VMs on it(them), and would not need the optimizations turned

off.

</pre>

      </blockquote>

      <pre wrap="">

Any domain must be able to become the master at any time. Without a master

the data center is unusable (at the present time), that's why we migrate (or

reconstruct) it on another domain when necessary.

</pre>

    </blockquote>

    Ted Miller<br>

    <br>

  </body>

</html>