[ovirt-users] Ovirt backups lead to unresponsive VM

Shani Leviim sleviim at redhat.com
Wed Feb 7 15:31:28 UTC 2018


Hi Alex,
Sorry for the delayed reply.

From a brief look at your logs, I've noticed that the error in the engine's
log was logged at 2018-02-03 00:22:56,
while your vdsm log ends at 2018-02-03 00:01:01.
Could you provide a fuller vdsm log that covers the time of the error?
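
A quick way to check whether a vdsm log actually covers the time of an engine error is to compare the log's first and last timestamps with the error time. The following is only a minimal sketch: it assumes each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, that both logs use the same time zone, and the helper names (`log_time_range`, `covers`) and sample lines are hypothetical, not taken from the attached logs.

```python
import re
from datetime import datetime

# Assumed line prefix: "YYYY-MM-DD HH:MM:SS"; anything after the seconds is ignored.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def log_time_range(lines):
    """Return (earliest, latest) timestamps found at the start of lines, or None."""
    stamps = []
    for line in lines:
        match = TS_RE.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    if not stamps:
        return None
    return min(stamps), max(stamps)


def covers(lines, event_time):
    """True if event_time falls within the time range spanned by the log lines."""
    time_range = log_time_range(lines)
    return time_range is not None and time_range[0] <= event_time <= time_range[1]


# Illustrative lines only; the message text is made up.
sample_vdsm = [
    "2018-02-02 23:58:10,000 INFO example line",
    "2018-02-03 00:01:01,000 INFO example line",
]
engine_error = datetime(2018, 2, 3, 0, 22, 56)
print(covers(sample_vdsm, engine_error))  # False
```

To run this against a real file, pass an open file object instead of the sample list, for example `covers(open(path), engine_error)`, where `path` is wherever the vdsm log lives on the host.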


*Regards,*

*Shani Leviim*

On Sat, Feb 3, 2018 at 5:41 PM, Alex K <rightkicktech at gmail.com> wrote:

> Attaching the vdsm log from the host that triggered the error, where the VM
> that was being cloned was running at that time.
>
> thanx,
> Alex
>
> On Sat, Feb 3, 2018 at 5:20 PM, Yaniv Kaul <ykaul at redhat.com> wrote:
>
>>
>>
>> On Feb 3, 2018 3:24 PM, "Alex K" <rightkicktech at gmail.com> wrote:
>>
>> Hi All,
>>
>> I have reproduced the backups failure. The VM that failed is named
>> Win-FileServer and is a Windows 2016 server 64bit with 300GB of disk.
>> During the cloning step the VM went unresponsive and I had to stop/start
>> it.
>> I am attaching the logs. I have another VM with the same OS (named DC-Server
>> within the logs) but with a smaller disk (60GB), which does not give any
>> error when it is cloned.
>> I see a line:
>>
>> EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call
>> Stack: null, Custom ID: null, Custom Event ID: -1, Message: VDSM
>> v2.sitedomain command SnapshotVDS failed: Message timeout which can be
>> caused by communication issues
>>
>>
>> I suggest adding relevant vdsm.log as well.
>> Y.
>>
>>
>> I would appreciate any advice on why I am facing this issue with the backups.
>>
>> thanx,
>> Alex
>>
>> On Tue, Jan 30, 2018 at 12:49 AM, Alex K <rightkicktech at gmail.com> wrote:
>>
>>> Ok. I will reproduce and collect logs.
>>>
>>> Thanx,
>>> Alex
>>>
>>> On Jan 29, 2018 20:21, "Mahdi Adnan" <mahdi.adnan at outlook.com> wrote:
>>>
>>> I have Windows VMs, both client and server.
>>> If you provide the engine.log file, we can have a look at it.
>>>
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> ------------------------------
>>> *From:* Alex K <rightkicktech at gmail.com>
>>> *Sent:* Monday, January 29, 2018 5:40 PM
>>> *To:* Mahdi Adnan
>>> *Cc:* users
>>> *Subject:* Re: [ovirt-users] Ovirt backups lead to unresponsive VM
>>>
>>> Hi,
>>>
>>> I have observed this logged at the host when the issue occurs:
>>>
>>> VDSM command GetStoragePoolInfoVDS failed: Connection reset by peer
>>>
>>> or
>>>
>>> VDSM host.domain command GetStatsVDS failed: Connection reset by peer
>>>
>>> I have not been able to correlate these with anything in the engine logs.
>>>
>>> Are you hosting Windows 2016 server and Windows 10 VMs?
>>> The weird thing is that I have the same setup on other clusters with no
>>> issues.
>>>
>>> Thanx,
>>> Alex
>>>
>>> On Sun, Jan 28, 2018 at 9:21 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> We have a cluster of 17 nodes, backed by GlusterFS storage, and we use
>>> this same script for backup.
>>> We have had no issues with it so far.
>>> Have you checked the engine log file?
>>>
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> ------------------------------
>>> *From:* users-bounces at ovirt.org <users-bounces at ovirt.org> on behalf of
>>> Alex K <rightkicktech at gmail.com>
>>> *Sent:* Wednesday, January 24, 2018 4:18 PM
>>> *To:* users
>>> *Subject:* [ovirt-users] Ovirt backups lead to unresponsive VM
>>>
>>> Hi all,
>>>
>>> I have a cluster with 3 nodes, using ovirt 4.1 in a self hosted setup on
>>> top of glusterfs.
>>> On some VMs (especially one Windows server 2016 64bit with 500 GB of
>>> disk), I almost always observe that during the backup the VM is rendered
>>> unresponsive (the dashboard shows a question mark at the VM status and the
>>> VM does not respond to ping or to anything). Guest agents are installed on
>>> the VMs.
>>>
>>> For scheduled backups I use:
>>>
>>> https://github.com/wefixit-AT/oVirtBackup
>>>
>>> The script does the following:
>>>
>>> 1. snapshot VM (this is done ok without any failure)
>>>
>>> 2. Clone snapshot (this step renders the VM unresponsive)
>>>
>>> 3. Export Clone
>>>
>>> 4. Delete clone
>>>
>>> 5. Delete snapshot
>>>
>>>
>>> Do you have any similar experience? Any suggestions to address this?
>>>
>>> I have never seen such an issue with hosted Linux VMs.
>>>
>>> The cluster has enough storage to accommodate the clone.
>>>
>>>
>>> Thanx,
>>>
>>> Alex
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>>
>