Hi Dafna,
My sincere apologies for not coming back to you sooner on this. I've
finally had a chance to start investigating, but since we last spoke
both the hosts and the engine have been updated, so perhaps something
there has fixed it: I haven't had a pause in quite a long time.
While gathering the info you requested, I think I've found what is
causing all the excessive logging that I sent through previously.
I have a VM called Proxy which, a few years back, ran out of disk space
and wouldn't boot because it required an fsck. Running an fsck on the
image gave us an unknown storage error, so we had to attach a new LUN,
dd the entire image out to it, run the fsck and then re-import the
image, which got the VM operational again. A while back we tried to
remove the old disk image and received a storage error, and looking at
it now, it appears the old image was never successfully removed. Under
the VM's Disks I can still see the old disk attached, but with an
hourglass instead of a green arrow. Also, when right-clicking the disk
the only option available is Add, so something still seems to have it
locked.
In the logs the same errors show up over and over:
AttributeError: 'Drive' object has no attribute 'format'
Thread-313::DEBUG::2014-02-24 16:44:30,056::libvirtconnection::108::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 8 edom: 10 level: 2 message: invalid argument: invalid path /rhev/data-center/mnt/blockSD/0e6991ae-6238-4c61-96d2-ca8fed35161e/images/6128b18f-eee9-422e-bc8a-f3b9fe331b09/38ac4afa-22e9-4359-ac16-3ff5d7b3b6db not assigned to domain
Thread-313::ERROR::2014-02-24 16:44:30,057::sampling::355::vm.Vm::(collect) vmId=`23b9212c-1e25-4003-aa18-b1e819bf6bb1`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x1c9de30>
Traceback (most recent call last):
  File "/usr/share/vdsm/sampling.py", line 351, in collect
    statsFunction()
  File "/usr/share/vdsm/sampling.py", line 226, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/vm.py", line 528, in _highWrite
    self._vm.extendDrivesIfNeeded()
  File "/usr/share/vdsm/vm.py", line 2288, in extendDrivesIfNeeded
    capacity, alloc, physical = self._dom.blockInfo(drive.path, 0)
  File "/usr/share/vdsm/vm.py", line 841, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1814, in blockInfo
    if ret is None: raise libvirtError ('virDomainGetBlockInfo() failed', dom=self)
libvirtError: invalid argument: invalid path /rhev/data-center/mnt/blockSD/0e6991ae-6238-4c61-96d2-ca8fed35161e/images/6128b18f-eee9-422e-bc8a-f3b9fe331b09/38ac4afa-22e9-4359-ac16-3ff5d7b3b6db not assigned to domain
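As the message itself says, libvirt is being asked about a disk path that it does not list as one of the running domain's disks. The vmId `23b9212c-1e25-4003-aa18-b1e819bf6bb1` matches proxy02 in the `vdsClient` listing further down this thread. A minimal sketch, assuming you save the output of `virsh -r dumpxml proxy02` and feed it in, that lists the disk source paths libvirt knows about and checks whether the path from the error is among them. The sample XML and the "NEW-IMAGE/NEW-VOLUME" path are hypothetical placeholders, and the function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Path copied from the libvirtError above.
STALE_PATH = ("/rhev/data-center/mnt/blockSD/0e6991ae-6238-4c61-96d2-ca8fed35161e"
              "/images/6128b18f-eee9-422e-bc8a-f3b9fe331b09"
              "/38ac4afa-22e9-4359-ac16-3ff5d7b3b6db")

# Hypothetical, trimmed-down dumpxml output; EXAMPLE-SD/NEW-IMAGE/NEW-VOLUME
# are placeholders, not real IDs from this setup.
SAMPLE_XML = """<domain type='kvm'>
  <name>proxy02</name>
  <devices>
    <disk type='block' device='disk'>
      <source dev='/rhev/data-center/mnt/blockSD/EXAMPLE-SD/images/NEW-IMAGE/NEW-VOLUME'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <target dev='hdc' bus='ide'/>
    </disk>
  </devices>
</domain>"""


def disk_source_paths(domain_xml):
    """Return the source path of every <disk> element in a libvirt domain XML."""
    root = ET.fromstring(domain_xml)
    paths = []
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        if source is None:
            continue  # e.g. an empty cdrom drive has no <source>
        # Block-backed disks carry the path in 'dev', file-backed ones in 'file'.
        path = source.get("dev") or source.get("file")
        if path:
            paths.append(path)
    return paths


def is_assigned(domain_xml, path):
    """True if libvirt lists `path` as one of the domain's disk sources."""
    return path in disk_source_paths(domain_xml)


if __name__ == "__main__":
    print(disk_source_paths(SAMPLE_XML))
    print(is_assigned(SAMPLE_XML, STALE_PATH))
```

If the real dumpxml does not contain the stale path while oVirt still shows the old disk attached, that would at least confirm the engine and libvirt disagree about that disk; the sketch cannot tell you why.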
Any ideas on how to finally get rid of the "corrupt" disk?
Thanks.
Regards.
Neil Wilson.
On Wed, Jan 29, 2014 at 5:32 PM, Dafna Ron <dron(a)redhat.com> wrote:
Hmm... I think there is a bug with the ISO domain, and I am not sure
whether it has already been opened.
Can you help me debug this and see if it's related? :)
I think you have some intermittent network issues reaching the ISO
domain, and every time this happens, the VMs that were booted with a CD
(even if you detached it) pause.
I have a second suspicion: is it possible that the VMs that pause had a
CD and you ejected it at some point, perhaps after or during the network
issues you had on the 14th?
Can you run dumpxml from libvirt? Let me know if you need help with this
command.
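One way to check the CD theory is to look at the `<disk device='cdrom'>` entries in the dumpxml output. A minimal sketch that maps each cdrom target to the file attached to it, or `None` when the drive is empty. The sample XML, VM name and ISO path are hypothetical; on a host the input would come from `virsh -r dumpxml <vm>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical dumpxml excerpt: one cdrom with an ISO attached, one empty.
SAMPLE_XML = """<domain type='kvm'>
  <name>example-vm</name>
  <devices>
    <disk type='file' device='cdrom'>
      <source file='/rhev/data-center/mnt/EXAMPLE-ISO-DOMAIN/images/EXAMPLE.iso'/>
      <target dev='hdc' bus='ide'/>
    </disk>
    <disk type='file' device='cdrom'>
      <target dev='hdd' bus='ide'/>
    </disk>
  </devices>
</domain>"""


def cdrom_sources(domain_xml):
    """Map each cdrom target name (e.g. 'hdc') to its attached file, or None if empty."""
    root = ET.fromstring(domain_xml)
    result = {}
    for disk in root.findall("./devices/disk[@device='cdrom']"):
        target = disk.find("target")
        source = disk.find("source")
        name = target.get("dev") if target is not None else "unknown"
        result[name] = source.get("file") if source is not None else None
    return result


if __name__ == "__main__":
    for dev, iso in sorted(cdrom_sources(SAMPLE_XML).items()):
        print(dev, iso or "(empty)")
```

A VM that still references an ISO path on the NFS ISO domain would be the kind Dafna suspects could be affected by connectivity problems to that domain.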
Thanks,
Dafna
On 01/29/2014 02:16 PM, Neil wrote:
>
> Hi Dafna,
>
>
> On Wed, Jan 29, 2014 at 1:14 PM, Dafna Ron <dron(a)redhat.com> wrote:
>>
>> The reason I asked about the size if because this was the original issue
>> no?
>> vm's pausing on lack of space?
>
> Apologies, I just wanted to make sure it was still about this pausing
> and not the original migration issue that I think you were also
> helping me with a few weeks back.
>
>> You're having a problem with your data domains.
>> Can you check the route from the hosts to the storage? I think you have
>> some disconnection to the storage from the hosts. Since it's random and
>> not on all the VMs, I would suggest that it's a routing problem.
>> Thanks,
>> Dafna
>
> The connections to the main data domain are 8Gb Fibre Channel, directly
> from each of the hosts to the FC SAN, so if it is a connection issue
> then I can't understand how anything would be working. Or am I barking
> up the wrong tree completely? There were some Ethernet network
> bridging changes on each of the hosts in early January, but these would
> only affect the NFS-mounted ISO domain. Or could this be the cause of
> the problems?
>
> Is this disconnection causing the huge log files that I sent previously?
>
> Thank you.
>
> Regards.
>
> Neil Wilson.
>
>
>> On 01/29/2014 08:00 AM, Neil wrote:
>>>
>>> Sorry, more on this issue: I see my logs are rapidly filling up the
>>> disk on node02 with this error in /var/log/messages:
>>>
>>> Jan 29 09:56:53 node02 vdsm vm.Vm ERROR
>>> vmId=`dfa2cf7c-3f0e-42e3-b495-10ccb3e0c71b`::Stats function failed:
>>> <AdvancedStatsFunction _highWrite at 0x1c2fb90>#012Traceback (most
>>> recent call last):#012 File "/usr/share/vdsm/sampling.py", line 351,
>>> in collect#012 statsFunction()#012 File
>>> "/usr/share/vdsm/sampling.py", line 226, in __call__#012 retValue =
>>> self._function(*args, **kwargs)#012 File "/usr/share/vdsm/vm.py",
>>> line 513, in _highWrite#012 self._vm._dom.blockInfo(vmDrive.path,
>>> 0)#012 File "/usr/share/vdsm/vm.py", line 835, in f#012 ret =
>>> attr(*args, **kwargs)#012 File
>>> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line
>>> 76, in wrapper#012 ret = f(*args, **kwargs)#012 File
>>> "/usr/lib64/python2.6/site-packages/libvirt.py", line 1814, in
>>> blockInfo#012 if ret is None: raise libvirtError
>>> ('virDomainGetBlockInfo() failed', dom=self)#012libvirtError: invalid
>>> argument: invalid path
>>> /rhev/data-center/mnt/blockSD/0e6991ae-6238-4c61-96d2-ca8fed35161e/images/fac8a3bb-e414-43c0-affc-6e2628757a28/6c3e5ae8-23fc-4196-ba42-778bdc0fbad8
>>> not assigned to domain
>>> Jan 29 09:56:53 node02 vdsm vm.Vm ERROR
>>> vmId=`ac2a3f99-a6db-4cae-955d-efdfb901abb7`::Stats function failed:
>>> <AdvancedStatsFunction _highWrite at 0x1c2fb90>#012Traceback (most
>>> recent call last):#012 File "/usr/share/vdsm/sampling.py", line 351,
>>> in collect#012 statsFunction()#012 File
>>> "/usr/share/vdsm/sampling.py", line 226, in __call__#012 retValue =
>>> self._function(*args, **kwargs)#012 File "/usr/share/vdsm/vm.py",
>>> line 509, in _highWrite#012 if not vmDrive.blockDev or
>>> vmDrive.format != 'cow':#012AttributeError: 'Drive' object has no
>>> attribute 'format'
>>>
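Each /var/log/messages entry above is a whole Python traceback squeezed onto one line; the `#012` sequences look like rsyslog's default escaping of control characters (a `#` followed by three octal digits, so `#012` is a newline). A minimal sketch, assuming that escaping, that restores the line breaks and counts how many "Stats function failed" entries each vmId produces, which may help confirm which VM is behind the log growth. The function names are illustrative, not part of vdsm.

```python
import re
from collections import Counter

# Assumed rsyslog escaping: '#' plus three octal digits, e.g. '#012' == '\n'.
_OCTAL_ESCAPE = re.compile(r"#([0-7]{3})")
_VM_ID = re.compile(r"vmId=`([0-9a-f-]+)`")


def decode_syslog_escapes(line):
    """Turn '#ooo' escapes back into characters so the traceback is readable.

    Caveat: a literal '#' followed by three octal digits would also be decoded.
    """
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)


def failures_per_vm(lines):
    """Count 'Stats function failed' entries per vmId."""
    counts = Counter()
    for line in lines:
        match = _VM_ID.search(line)
        if match and "Stats function failed" in line:
            counts[match.group(1)] += 1
    return counts
```

For example, `failures_per_vm(open("/var/log/messages")).most_common()` on node02 would list the noisiest vmIds first, and `decode_syslog_escapes()` on one of those lines prints the traceback in its usual multi-line form.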
>>> Not sure if this is related at all though?
>>>
>>> Thanks.
>>>
>>> Regards.
>>>
>>> Neil Wilson.
>>>
>>> On Wed, Jan 29, 2014 at 9:02 AM, Neil <nwilson123(a)gmail.com> wrote:
>>>>
>>>> Hi Dafna,
>>>>
>>>> Thanks for clarifying that. I found the migration issue, and it was
>>>> resolved once I sorted out the ISO domain problem.
>>>>
>>>> I'm sorry, I don't understand your last question:
>>>> "after the engine restart, do you still see a problem with the size,
>>>> or has the reported size changed?"
>>>>
>>>> The migration issue has been resolved; now it's just a matter of
>>>> tracking down why the two VMs paused on their own, one on the 8th of
>>>> Jan (I think) and one on the 19th of Jan.
>>>>
>>>> Thank you.
>>>>
>>>>
>>>> Regards.
>>>>
>>>> Neil Wilson.
>>>>
>>>>
>>>> On Tue, Jan 28, 2014 at 8:18 PM, Dafna Ron <dron(a)redhat.com> wrote:
>>>>>
>>>>> Yes, the engine lost communication with vdsm, and it has no way of
>>>>> knowing whether the host is down or there was a network issue, so a
>>>>> network issue would cause the same errors that I see in the logs.
>>>>>
>>>>> The ISO error you reported is the reason the VMs have failed
>>>>> migration: if a VM is run with a CD and the CD is gone, then the VM
>>>>> cannot be migrated.
>>>>>
>>>>> After the engine restart, do you still see a problem with the size,
>>>>> or has the reported size changed?
>>>>>
>>>>> Dafna
>>>>>
>>>>>
>>>>> On 01/28/2014 01:02 PM, Neil wrote:
>>>>>>
>>>>>> Hi Dafna,
>>>>>>
>>>>>> Thanks for coming back to me. I'll try to answer your queries one
>>>>>> by one.
>>>>>>
>>>>>> On Tue, Jan 28, 2014 at 1:38 PM, Dafna Ron <dron(a)redhat.com> wrote:
>>>>>>>
>>>>>>> You had a problem with your storage on the 14th of Jan and one of
>>>>>>> the hosts rebooted (if you have the vdsm log from that day, then I
>>>>>>> can see what happened on the vdsm side).
>>>>>>> In the engine, I could see a problem with the export domain, and
>>>>>>> this should not have caused a reboot.
>>>>>>
>>>>>> 1.) Unfortunately I don't have logs going back that far. Looking at
>>>>>> the uptime of all 3 hosts, the lowest is 21 days and the others are
>>>>>> all over 40 days, so there definitely wasn't a host that rebooted on
>>>>>> the 14th of Jan. Would a network or firewall issue also make the
>>>>>> errors you've seen look as if a host had rebooted? There was a
>>>>>> bonding mode change on the 14th of January, so perhaps this caused
>>>>>> the issue?
>>>>>>
>>>>>>
>>>>>>> Can you tell me if you had a problem with the data domain as well,
>>>>>>> or was it just the export domain? Were you exporting/importing any
>>>>>>> VMs at that time?
>>>>>>> In any case, this is a bug.
>>>>>>
>>>>>> 2.) I think this was the same day that the bonding mode was changed
>>>>>> on the host while it was live (by mistake) and running SPM. I haven't
>>>>>> done any importing or exporting on this oVirt setup for a few years.
>>>>>>
>>>>>>
>>>>>>> As for the VMs: if they are no longer in migrating state, then
>>>>>>> please restart the ovirt-engine service (it looks like a cache
>>>>>>> issue).
>>>>>>
>>>>>> 3.) Restarted ovirt-engine; logging now appears to be normal, without
>>>>>> any errors.
>>>>>>
>>>>>>
>>>>>>> If they are in migrating state, there should have been a timeout a
>>>>>>> long time ago.
>>>>>>> Can you please run 'vdsClient -s 0 list table' and 'virsh -r list'
>>>>>>> on all hosts?
>>>>>>
>>>>>> 4.) Ran on all hosts...
>>>>>>
>>>>>> node01.blabla.com
>>>>>> 63da7faa-f92a-4652-90f2-b6660a4fb7b3  11232  adam     Up
>>>>>> 502170aa-0fc6-4287-bb08-5844be6e0352  13986  babbage  Up
>>>>>> ff9036fb-1499-45e4-8cde-e350eee3c489  26733  reports  Up
>>>>>> 2736197b-6dc3-4155-9a29-9306ca64881d  13804  tux      Up
>>>>>> 0a3af7b2-ea94-42f3-baeb-78b950af4402  25257  Moodle   Up
>>>>>>
>>>>>>  Id    Name       State
>>>>>> ----------------------------------------------------
>>>>>>  1     adam       running
>>>>>>  2     reports    running
>>>>>>  4     tux        running
>>>>>>  6     Moodle     running
>>>>>>  7     babbage    running
>>>>>>
>>>>>> node02.blabla.com
>>>>>> dfa2cf7c-3f0e-42e3-b495-10ccb3e0c71b   2879  spam     Up
>>>>>> 23b9212c-1e25-4003-aa18-b1e819bf6bb1  32454  proxy02  Up
>>>>>> ac2a3f99-a6db-4cae-955d-efdfb901abb7   5605  software Up
>>>>>> 179c293b-e6a3-4ec6-a54c-2f92f875bc5e   8870  zimbra   Up
>>>>>>
>>>>>>  Id    Name       State
>>>>>> ----------------------------------------------------
>>>>>>  9     proxy02    running
>>>>>>  10    spam       running
>>>>>>  12    software   running
>>>>>>  13    zimbra     running
>>>>>>
>>>>>> node03.blabla.com
>>>>>> e42b7ccc-ce04-4308-aeb2-2291399dd3ef  25809  dhcp     Up
>>>>>> 16d3f077-b74c-4055-97d0-423da78d8a0c  23939  oliver   Up
>>>>>>
>>>>>>  Id    Name       State
>>>>>> ----------------------------------------------------
>>>>>>  13    oliver     running
>>>>>>  14    dhcp       running
>>>>>>
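Dafna's request boils down to comparing what vdsm thinks is running with what libvirt thinks is running. A minimal sketch that parses the two listings (column layouts assumed from the output above; names containing spaces would be misparsed) and reports names that appear in only one of them. For the three hosts above, the two listings contain the same VMs. The function names are illustrative.

```python
def vdsm_names(table_text):
    """VM names from `vdsClient -s 0 list table` output (vmId, pid, name, status)."""
    names = set()
    for line in table_text.splitlines():
        parts = line.split()
        # A vmId is a 36-character UUID; anything else (blanks, headers) is skipped.
        if len(parts) >= 4 and len(parts[0]) == 36:
            names.add(parts[2])
    return names


def virsh_names(list_text):
    """VM names from `virsh -r list` output (Id, Name, State)."""
    names = set()
    for line in list_text.splitlines():
        parts = line.split()
        # Data rows start with a numeric domain Id; the header and dashes do not.
        if len(parts) >= 3 and parts[0].isdigit():
            names.add(parts[1])
    return names


def compare(table_text, list_text):
    """Return (names only vdsm reports, names only libvirt reports)."""
    vdsm, virsh = vdsm_names(table_text), virsh_names(list_text)
    return sorted(vdsm - virsh), sorted(virsh - vdsm)
```

A VM listed by vdsm but missing from `virsh -r list` (or the reverse) would point at vdsm and libvirt being out of sync on that host.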
>>>>>>
>>>>>>> Last thing: your ISO domain seems to be having issues as well.
>>>>>>> This should not affect the host status, but if any of the VMs were
>>>>>>> booted from an ISO or have an ISO attached in the boot sequence,
>>>>>>> this would explain the migration issue.
>>>>>>
>>>>>> There was an ISO domain issue a while back, after iptables re-enabled
>>>>>> itself on boot after running updates, but this was corrected about 2
>>>>>> weeks ago. I've checked now, and the ISO domain appears to be fine and
>>>>>> I can see all the images stored within.
>>>>>>
>>>>>> I've stumbled across what appears to be another error: all three
>>>>>> hosts are showing this over and over in /var/log/messages, and I'm
>>>>>> not sure if it's related...
>>>>>>
>>>>>> Jan 28 14:58:59 node01 vdsm vm.Vm ERROR
>>>>>> vmId=`63da7faa-f92a-4652-90f2-b6660a4fb7b3`::Stats function failed:
>>>>>> <AdvancedStatsFunction _highWrite at 0x2ce0998>#012Traceback (most
>>>>>> recent call last):#012 File "/usr/share/vdsm/sampling.py", line 351,
>>>>>> in collect#012 statsFunction()#012 File
>>>>>> "/usr/share/vdsm/sampling.py", line 226, in __call__#012 retValue =
>>>>>> self._function(*args, **kwargs)#012 File "/usr/share/vdsm/vm.py",
>>>>>> line 509, in _highWrite#012 if not vmDrive.blockDev or
>>>>>> vmDrive.format != 'cow':#012AttributeError: 'Drive' object has no
>>>>>> attribute 'format'
>>>>>>
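The `AttributeError` means the `Drive` object used at line 509 of vm.py has no `format` attribute at all: because `blockDev` is true, Python goes on to evaluate `vmDrive.format`, and that lookup fails before the comparison with `'cow'` can run. A minimal sketch reproducing this with a hypothetical stand-in class (assumption: attributes are copied from the drive's configuration, so a missing key means a missing attribute), plus a `getattr()` guard that treats a missing `format` as "not cow". It illustrates the Python behaviour only; it is not vdsm's code or an upstream fix.

```python
class Drive(object):
    """Hypothetical stand-in for vdsm's Drive class.

    Assumption: attributes are copied from the device's configuration,
    so a key missing from that configuration becomes a missing attribute.
    """

    def __init__(self, **conf):
        for key, value in conf.items():
            setattr(self, key, value)


def should_check_watermark(drive):
    """Same intent as `not (not drive.blockDev or drive.format != 'cow')`,
    but a drive without 'format' is treated as "not cow" instead of raising."""
    return bool(getattr(drive, "blockDev", False)) and getattr(drive, "format", None) == "cow"


if __name__ == "__main__":
    old_disk = Drive(blockDev=True, path="/dev/example")  # hypothetical: no 'format' key
    try:
        old_disk.format != "cow"
    except AttributeError as exc:
        print(exc)  # the same kind of error as in the log
    print(should_check_watermark(old_disk))
```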
>>>>>> I've attached the full vdsm log from node02 to this reply.
>>>>>>
>>>>>> Please shout if you need anything else.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> Neil Wilson.
>>>>>>
>>>>>>> On 01/28/2014 09:28 AM, Neil wrote:
>>>>>>>>
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> Sorry for the very late reply; I've been out of the office doing
>>>>>>>> installations.
>>>>>>>> Unfortunately, due to the time delay, my oldest logs only go as far
>>>>>>>> back as the attached.
>>>>>>>>
>>>>>>>> I've only grep'd for Thread-286029 in the vdsm log. For the
>>>>>>>> engine.log I'm not sure what info is required, so the full log is
>>>>>>>> attached.
>>>>>>>>
>>>>>>>> Please shout if you need any info or further details.
>>>>>>>>
>>>>>>>> Thank you very much.
>>>>>>>>
>>>>>>>> Regards.
>>>>>>>>
>>>>>>>> Neil Wilson.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 24, 2014 at 10:55 AM, Meital Bourvine
>>>>>>>> <mbourvin(a)redhat.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Could you please attach the engine.log from the same time?
>>>>>>>>>
>>>>>>>>> thanks!
>>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>>>
>>>>>>>>>> From: "Neil" <nwilson123(a)gmail.com>
>>>>>>>>>> To: dron(a)redhat.com
>>>>>>>>>> Cc: "users" <users(a)ovirt.org>
>>>>>>>>>> Sent: Wednesday, January 22, 2014 1:14:25 PM
>>>>>>>>>> Subject: Re: [Users] Vm's being paused
>>>>>>>>>>
>>>>>>>>>> Hi Dafna,
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> The vdsm logs are quite large, so I've only attached the logs for
>>>>>>>>>> the pause of the VM called Babbage on the 19th of Jan.
>>>>>>>>>>
>>>>>>>>>> As for snapshots, Babbage has one from June 2013 and Reports has
>>>>>>>>>> two, from June and Oct 2013.
>>>>>>>>>>
>>>>>>>>>> I'm using FC storage, with 11 VMs and 3 nodes/hosts; 9 of the 11
>>>>>>>>>> VMs have thin provisioned disks.
>>>>>>>>>>
>>>>>>>>>> Please shout if you'd like any further info or logs.
>>>>>>>>>>
>>>>>>>>>> Thank you.
>>>>>>>>>>
>>>>>>>>>> Regards.
>>>>>>>>>>
>>>>>>>>>> Neil Wilson.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 22, 2014 at 10:58 AM, Dafna Ron <dron(a)redhat.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Neil,
>>>>>>>>>>>
>>>>>>>>>>> Can you please attach the vdsm logs?
>>>>>>>>>>> Also, as for the VMs, do they have any snapshots?
>>>>>>>>>>> From your suggestion to allocate more LUNs, are you using iSCSI
>>>>>>>>>>> or FC?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Dafna
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 01/22/2014 08:45 AM, Neil wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the replies guys,
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at the two VMs that have paused so far, the oVirt GUI
>>>>>>>>>>>> shows the following sizes under Disks.
>>>>>>>>>>>>
>>>>>>>>>>>> VM Reports:
>>>>>>>>>>>> Virtual Size 35GB, Actual Size 41GB
>>>>>>>>>>>> On the CentOS side, the disk size is 33G, with 12G used and 19G
>>>>>>>>>>>> available (40% usage).
>>>>>>>>>>>>
>>>>>>>>>>>> VM Babbage:
>>>>>>>>>>>> Virtual Size 40GB, Actual Size 53GB
>>>>>>>>>>>> On the Server 2003 side, the disk size is 39.9GB with 16.3GB
>>>>>>>>>>>> used, so under 50% usage.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you see any issues with the above stats?
>>>>>>>>>>>>
>>>>>>>>>>>> My main Datacenter storage is as follows...
>>>>>>>>>>>>
>>>>>>>>>>>> Size: 6887 GB
>>>>>>>>>>>> Available: 1948 GB
>>>>>>>>>>>> Used: 4939 GB
>>>>>>>>>>>> Allocated: 1196 GB
>>>>>>>>>>>> Over Allocation: 61%
>>>>>>>>>>>>
>>>>>>>>>>>> Could there be a problem here? I can allocate additional LUNs if
>>>>>>>>>>>> you feel the space isn't correctly allocated.
>>>>>>>>>>>>
>>>>>>>>>>>> Apologies for going on about this, but I'm really concerned that
>>>>>>>>>>>> something isn't right and I might have a serious problem if an
>>>>>>>>>>>> important machine locks up.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you and much appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards.
>>>>>>>>>>>>
>>>>>>>>>>>> Neil Wilson.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 21, 2014 at 7:02 PM, Dafna Ron <dron(a)redhat.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The storage space limit is configured as a percentage, not as a
>>>>>>>>>>>>> physical size. So if 20G is less than 10% (the default config)
>>>>>>>>>>>>> of your storage, it will pause the VMs regardless of how many
>>>>>>>>>>>>> GB you still have. This is configurable, though, so you can
>>>>>>>>>>>>> change it to less than 10% if you like.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To answer the second question, VMs will not pause on an ENOSPC
>>>>>>>>>>>>> error if they run out of space internally, but only if the
>>>>>>>>>>>>> external storage cannot be consumed. So they pause only if you
>>>>>>>>>>>>> run out of space in the storage, and not if the VM runs out of
>>>>>>>>>>>>> space in its own filesystem.
>>>>>>>>>>>>>
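Dafna's point about percentages can be checked with simple arithmetic. A minimal sketch, assuming the limit is a plain percentage of the storage domain's total size with the 10% default she mentions (the function names are illustrative, not oVirt's), applied to the figures Neil reported elsewhere in this thread for the main data domain (Size 6887 GB, Available 1948 GB):

```python
def free_percent(total_gb, free_gb):
    """Free space as a percentage of the storage domain's total size."""
    return free_gb * 100.0 / total_gb


def below_threshold(total_gb, free_gb, threshold_pct=10.0):
    """True if free space is under the percentage threshold.

    The 10% default is taken from the explanation above, not from oVirt's code.
    """
    return free_percent(total_gb, free_gb) < threshold_pct


if __name__ == "__main__":
    # Figures reported for the main data domain: 6887 GB total, 1948 GB available.
    print(round(free_percent(6887, 1948), 1))  # 28.3
    print(below_threshold(6887, 1948))         # False
    print(below_threshold(6887, 20))           # True: 20 GB is far under 10% here
```

On those figures about 28% of the domain is free, so a 10% threshold would not have been crossed.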
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 01/21/2014 09:51 AM, Neil wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sorry, attached is engine.log; I've taken out the two sections
>>>>>>>>>>>>>> where each of the VMs was paused.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does the error "VM babbage has paused due to no Storage space
>>>>>>>>>>>>>> error" mean the main storage domain has run out of storage, or
>>>>>>>>>>>>>> that the VM has run out?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Both VMs appear to have been running on node01 when they were
>>>>>>>>>>>>>> paused.
>>>>>>>>>>>>>> My vdsm versions are all...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> vdsm-cli-4.13.0-11.el6.noarch
>>>>>>>>>>>>>> vdsm-python-cpopen-4.13.0-11.el6.x86_64
>>>>>>>>>>>>>> vdsm-xmlrpc-4.13.0-11.el6.noarch
>>>>>>>>>>>>>> vdsm-4.13.0-11.el6.x86_64
>>>>>>>>>>>>>> vdsm-python-4.13.0-11.el6.x86_64
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I currently have a 61% over-allocation ratio on my primary
>>>>>>>>>>>>>> storage domain, with 1948GB available.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Neil Wilson.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 21, 2014 at 11:24 AM, Neil <nwilson123(a)gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry for only coming back to you now.
>>>>>>>>>>>>>>> The VMs are thin provisioned. The Server 2003 VM hasn't run
>>>>>>>>>>>>>>> out of disk space: there are about 20 GB free, and usage
>>>>>>>>>>>>>>> barely grows as the VM only shares printers. The other VM that
>>>>>>>>>>>>>>> paused is also on thin provisioned disks and also has plenty
>>>>>>>>>>>>>>> of space; this guest is running CentOS 6.3 64-bit and only
>>>>>>>>>>>>>>> runs basic reporting.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> After the 2003 guest was rebooted, the network card showed up
>>>>>>>>>>>>>>> as unplugged in oVirt, and we had to remove it and re-add it
>>>>>>>>>>>>>>> in order to correct the issue. The CentOS VM did not have the
>>>>>>>>>>>>>>> same issue.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm concerned that this might happen to a VM that's quite
>>>>>>>>>>>>>>> critical. Any thoughts or ideas?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The only recent changes have been updating from Dreyou 3.2 to
>>>>>>>>>>>>>>> the official CentOS repo and updating to 3.3.1-2. Prior to
>>>>>>>>>>>>>>> updating I hadn't had this issue.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any assistance is greatly appreciated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Neil Wilson.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Jan 19, 2014 at 8:20 PM, Dan Yasny <dyasny(a)gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do you have the VMs on thin provisioned storage or sparse
>>>>>>>>>>>>>>>> disks?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Pausing happens when the VM has an IO error or runs out of
>>>>>>>>>>>>>>>> space on the storage domain, and it is done intentionally, so
>>>>>>>>>>>>>>>> that the VM will not experience disk corruption. If you have
>>>>>>>>>>>>>>>> thin provisioned disks and the VM writes to its disks faster
>>>>>>>>>>>>>>>> than the disks can grow, this is exactly what you will see.
>>>>>>>>>>>>>>>>
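Dan's point can be made concrete with a toy model of a thin-provisioned disk that is grown in fixed chunks whenever free headroom falls under a watermark (roughly the idea suggested by the `_highWrite`/`extendDrivesIfNeeded` frames in the tracebacks in this thread). The model and all numbers are hypothetical simplifications, not vdsm's actual algorithm or defaults.

```python
def first_pause(write_mb_per_step, chunk_mb, watermark_mb, latency_steps,
                initial_physical_mb, steps):
    """Toy model of a thin-provisioned volume that is grown in chunks.

    Each step the guest writes `write_mb_per_step`. When free headroom
    (physical - allocated) drops below `watermark_mb` and no extension is
    in flight, one extension of `chunk_mb` is requested and lands
    `latency_steps` later. Returns the step at which writes overrun the
    volume (where the VM would be paused), or None within `steps`.
    """
    allocated = 0
    physical = initial_physical_mb
    pending = []  # steps at which in-flight extensions complete
    for step in range(steps):
        # Apply any extensions that have completed by now.
        finished = [t for t in pending if t <= step]
        physical += chunk_mb * len(finished)
        pending = [t for t in pending if t > step]
        # Request a new extension when headroom is low and none is in flight.
        if physical - allocated < watermark_mb and not pending:
            pending.append(step + latency_steps)
        allocated += write_mb_per_step
        if allocated > physical:
            return step
    return None


if __name__ == "__main__":
    # Same volume and extension policy; only the write rate differs.
    print(first_pause(100, 1024, 512, 2, 1024, 100))  # None: extensions keep up
    print(first_pause(600, 1024, 512, 2, 1024, 100))  # 1: writes outrun the extension
```

With identical extension settings, only the write rate decides whether the guest outruns the volume, which is the situation Dan describes.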
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Jan 19, 2014 at 10:04 AM, Neil <nwilson123(a)gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've had two different VMs randomly pause this past week, and
>>>>>>>>>>>>>>>>> inside oVirt the error received is something like 'vm ran out
>>>>>>>>>>>>>>>>> of storage and was paused'. Resuming the VMs didn't work, and
>>>>>>>>>>>>>>>>> I had to force them off and then on, which resolved the issue.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Has anyone had this issue before?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I realise this is very vague, so please let me know which
>>>>>>>>>>>>>>>>> logs to send in.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Neil Wilson
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> Users mailing list
>>>>>>>>>>>>>>>>> Users(a)ovirt.org
>>>>>>>>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Dafna Ron
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Dafna Ron
>>>>>>>>>>
>>>>>>>>>>
>>>>>>> --
>>>>>>> Dafna Ron
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Dafna Ron
>>
>>
>>
>> --
>> Dafna Ron
--
Dafna Ron