[JIRA] (OVIRT-1422) "systemctl restart docker" can get stuck forever
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1422?page=com.atlassian.jir... ]
Barak Korren commented on OVIRT-1422:
-------------------------------------
Stack trace of the stuck process (PID 12221). Hopefully we can find a way to recover from this automatically:
{code}
[root@vm0064 ~]# pstack 12221
#0 0x00007f5bb876b5dd in ppoll () from /lib64/libc.so.6
#1 0x00005624383beae0 in bus_poll.lto_priv ()
#2 0x000056243838b92e in start_unit.lto_priv ()
#3 0x0000562438350b97 in main ()
{code}
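The trace appears to show `systemctl` itself blocked in `ppoll()` while it waits for the D-Bus reply to the restart job. As a first step towards automatic recovery, here is a minimal sketch of a hypothetical `run_with_deadline` helper. It assumes bash and, optionally, `pstack`. The helper runs a command in the background. If the command is still running after N seconds, it dumps the process's stack, kills the process, and returns 124, the same status coreutils `timeout` uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any existing tool): run a command with a
# deadline. If it is still running after $1 seconds, capture its stack with
# pstack (when installed), kill it, and return 124 like coreutils `timeout`.
run_with_deadline() {
    local secs="$1"
    shift
    "$@" &
    local pid=$!
    local waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$secs" ]; then
            echo "still running after ${secs}s, giving up: $*" >&2
            # Grab the stack trace before killing, so the hang can be analysed later.
            if command -v pstack >/dev/null 2>&1; then
                pstack "$pid" >&2 || true
            fi
            kill "$pid" 2>/dev/null || true
            wait "$pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The command finished in time: propagate its exit status.
    wait "$pid"
}
```

On a slave it would be called as `run_with_deadline 120 systemctl restart docker`. The 120-second deadline is an arbitrary assumption. A plain `timeout 120 systemctl restart docker` would also stop the script from hanging, but it kills the command without giving us a chance to capture the stack first. Killing the `systemctl` client does not cancel the restart job already queued in systemd. `systemctl list-jobs` should show whether that job is still pending.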
> "systemctl restart docker" can get stuck forever
> -------------------------------------------------
>
> Key: OVIRT-1422
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1422
> Project: oVirt - virtualization made easy
> Issue Type: Bug
> Components: oVirt CI
> Reporter: Barak Korren
> Assignee: infra
> Labels: containers
>
> It seems that in some situations the "{{systemctl restart docker}}" command can get stuck forever.
> This seems to be related to a storage setup issue, but may not be.
> The issue was detected on an FC24 slave:
> vm0064.workers-phx.ovirt.org
> See the linked tickets for more details and implications.
--
This message was sent by Atlassian JIRA
(v1000.1010.2#100044)
[JIRA] (OVIRT-1421) Docker storage issues on FC24 nodes
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1421?page=com.atlassian.jir... ]
Barak Korren edited comment on OVIRT-1421 at 5/31/17 10:49 AM:
---------------------------------------------------------------
It looks like this issue may not be the cause of the Docker daemon's startup failure after all: the same situation exists on other FC24 slaves, and the daemon starts up fine there.
Opening a separate ticket about that: OVIRT-1422
was (Author: bkorren(a)redhat.com):
It looks like this issue may not be the cause of the Docker daemon's startup failure after all: the same situation exists on other FC24 slaves, and the daemon starts up fine there.
Opening a separate ticket about that.
> Docker storage issues on FC24 nodes
> -----------------------------------
>
> Key: OVIRT-1421
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1421
> Project: oVirt - virtualization made easy
> Issue Type: Bug
> Components: oVirt CI
> Reporter: Barak Korren
> Assignee: infra
> Priority: High
> Labels: containers
>
> Docker startup seems to be failing on FC24 nodes, which makes other Docker commands get stuck forever.
> The core reason behind the failure seems to be that the Docker storage setup failed:
> {code}
> [root@vm0064 ~]# systemctl status docker-storage-setup
> ● docker-storage-setup.service - Docker Storage Setup
> Loaded: loaded (/usr/lib/systemd/system/docker-storage-setup.service; disabled; vendor preset: disabled)
> Active: failed (Result: exit-code) since Tue 2017-05-30 03:14:49 UTC; 1 day 2h ago
> Process: 1489 ExecStart=/usr/bin/docker-storage-setup (code=exited, status=1/FAILURE)
> Main PID: 1489 (code=exited, status=1/FAILURE)
> May 30 03:14:49 vm0064.workers-phx.ovirt.org systemd[1]: Starting Docker Storage Setup...
> May 30 03:14:49 vm0064.workers-phx.ovirt.org docker-storage-setup[1489]: INFO: Volume group backing root filesystem could not be determined
> May 30 03:14:49 vm0064.workers-phx.ovirt.org docker-storage-setup[1489]: ERROR: No valid volume group found. Exiting.
> May 30 03:14:49 vm0064.workers-phx.ovirt.org systemd[1]: docker-storage-setup.service: Main process exited, code=exited, status=1/FAILURE
> May 30 03:14:49 vm0064.workers-phx.ovirt.org systemd[1]: Failed to start Docker Storage Setup.
> May 30 03:14:49 vm0064.workers-phx.ovirt.org systemd[1]: docker-storage-setup.service: Unit entered failed state.
> May 30 03:14:49 vm0064.workers-phx.ovirt.org systemd[1]: docker-storage-setup.service: Failed with result 'exit-code'.
> {code}
> This seems to be due to the fact that the system's root filesystem is on a plain disk rather than on LVM, even though Docker's default devicemapper (DM) storage driver is supposed to be able to use a plain file on the filesystem instead. We need to check whether this issue is specific to this slave or affects all FC24 slaves.
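To answer the "this slave or all FC24 slaves" question, we could run a rough first check on each slave. The sketch below assumes bash and awk and uses hypothetical helper names (`root_source`, `root_storage_kind`). It only reads the mounts table, which is `/proc/mounts` by default. LVM logical volumes normally show up there as `/dev/mapper/...` devices. Device-mapper also backs other things (e.g. LUKS), so `lsblk` or `lvs` remain the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of docker-storage-setup.

# Print the device mounted on "/" according to a mounts table
# (default /proc/mounts). The last matching entry wins, since it is the
# topmost mount.
root_source() {
    local table="${1:-/proc/mounts}"
    awk '$2 == "/" { src = $1 } END { if (src != "") print src }' "$table"
}

# Roughly classify the root device. LVM volumes usually appear under
# /dev/mapper/ (or as /dev/dm-N); a plain partition such as /dev/vda1
# suggests there is no volume group backing "/".
root_storage_kind() {
    local src
    src=$(root_source "$@")
    case "$src" in
        "") echo "unknown" ;;
        /dev/mapper/* | /dev/dm-*) echo "device-mapper" ;;
        /dev/*) echo "plain-block-device" ;;
        *) echo "other:$src" ;;
    esac
}
```

If the analysis above is right, `root_storage_kind` should print `plain-block-device` on vm0064. Comparing its output across the FC24 slaves would show whether they all share the same layout.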