On Sun, Mar 29, 2020 at 2:42 AM Gianluca Cecchi
<gianluca.cecchi(a)gmail.com> wrote:
On Sat, Mar 28, 2020 at 7:34 PM Nir Soffer <nsoffer(a)redhat.com> wrote:
>
> On Sat, Mar 28, 2020 at 5:00 AM Gianluca Cecchi
> <gianluca.cecchi(a)gmail.com> wrote:
> ...
> > Further information.
> > What I see around that time frame in the gluster brick log file
> > gluster_bricks-vmstore-vmstore.log (the timestamp is 1 hour behind in the log file)
> >
> > [2020-03-27 23:30:38.575808] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-vmstore-server: Shutting down connection CTX_ID:6e8f70b8-1946-4505-860f-be90e5807cb3-GRAPH_ID:0-PID:223418-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0
> > [2020-03-27 23:35:15.281449] E [MSGID: 113072] [posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed: offset 0, [Invalid argument]
> > [2020-03-27 23:35:15.281545] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-vmstore-server: 34139378: WRITEV 10 (00d9fe81-8a31-498e-8401-7b9d1477378e), client: CTX_ID:d04437ba-ef98-43df-864f-5e9d3738620a-GRAPH_ID:0-PID:27687-HOST:ovirt.mydomain.local-PC_NAME:vmstore-client-0-RECON_NO:-0, error-xlator: vmstore-posix [Invalid argument]
> > [2020-03-27 23:40:15.415794] E [MSGID: 113072] [posix-inode-fd-ops.c:1886:posix_writev] 0-vmstore-posix: write failed: offset 0, [Invalid argument]
>
> Invalid arguments are expected when activating a storage domain, and every 5 minutes
> when storage domains are refreshed. The writes are performed to a temporary file at
> /rhev/data-center/mnt/server:_path/.prob-random-uuid
>
> These logs do not show the path, so we don't know if the writes are
> related to block size probing.
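
For context, the probing works roughly like this: vdsm opens the probe file with
O_DIRECT and tries a small aligned write; if the storage rejects that size with EINVAL
("Invalid argument"), it falls back to a larger block size. A minimal Python sketch of
the idea (not vdsm's actual code; the path and sizes are only illustrative):

    import errno
    import mmap
    import os

    # Illustrative path; vdsm writes to ".prob-<random uuid>" in the domain mount.
    PROBE_PATH = "/rhev/data-center/mnt/server:_path/.prob-example"

    def probe_block_size(path=PROBE_PATH, candidates=(512, 4096)):
        # O_DIRECT bypasses the page cache, so the storage enforces its
        # alignment rules and rejects unsupported write sizes with EINVAL.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
        try:
            for size in candidates:
                # mmap gives page-aligned memory, enough for 512 and 4096.
                buf = mmap.mmap(-1, size)
                try:
                    os.write(fd, buf)
                    return size  # smallest size the storage accepted
                except OSError as e:
                    if e.errno != errno.EINVAL:
                        raise
                    # "Invalid argument": size unsupported, try the next one.
                finally:
                    buf.close()
            raise RuntimeError("could not detect storage block size")
        finally:
            os.close(fd)
            os.unlink(path)

The "write failed: offset 0, [Invalid argument]" entries in the brick log would be
consistent with the storage rejecting the smaller probe write.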
>
> But in vdsm log we see:
>
> 2020-03-27 00:40:08,979+0100 INFO (monitor/665ff83) [storage.StorageDomain] Removing remnants of deleted images [] (fileSD:726)
>
> This call happens when vdsm is refreshing the storage domain. Right before this log,
> vdsm tries to detect the underlying storage block size.
>
> So it looks like the gluster logs are related to block size probing and are not related
> to the I/O error that caused the VM to pause.
[snip]
>
> Looking at both "abnormal vm stop" and storage refresh events:
>
[snip]
>
> I don't see any relation between refreshes and the abnormal vm stop events.
>
> I think the key to understanding this is to enable more verbose logs in gluster to
> understand what failure caused the VM to stop.
>
Ah, ok. Thanks.
It seems the default gluster log level is INFO, and I can make the logs more verbose for a
limited amount of time to see if more information is provided.
Can I do it with the VMs running, only doing some sort of reload of the service, or do I
have to stop everything to do it?
I don't know about gluster logs; you obviously cannot stop the server or the mount
helper on the client side just to change the log level, and I don't know if they support
reloading configuration while running. Asking on the gluster mailing list will help.
> It would also help if we had detailed error logs in the qemu log at
> /var/log/libvirt/qemu/vm-name.log
I will find them. The system is not available to check right now.
>
> Did you enable libvirt logs? We may have more information about the error there.
>
> You can enable logs by modifying these lines in /etc/libvirt/libvirtd.conf:
>
> log_filters="1:qemu 1:libvirt 4:object 4:json 4:event 1:util"
> log_outputs="1:file:/var/log/libvirt/libvirtd.log"
>
> And restart libvirt.
>
> Note that the libvirt log may be huge, so you need to watch it and change the log level
> or filters after you collect what you need.
>
> To log only warnings and errors, use:
>
> log_outputs="3:file:/var/log/libvirt/libvirtd.log"
>
> Someone from gluster should help with debugging this.
>
> Nir
>
Ok, I could also try this way if enabling more verbose gluster logs is not sufficient.
Enabling warnings in the libvirt logs is probably a good idea anyway. The warnings and
errors can add more info about this failure.
See this page for changing the libvirt log level without restarting libvirt:
https://wiki.libvirt.org/page/DebugLogs#Runtime_setting
Nir