[JIRA] (OVIRT-1421) Docker storage issues on FC24 nodes
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1421?page=com.atlassian.jir... ]
Barak Korren commented on OVIRT-1421:
-------------------------------------
It looks like this issue may not be the cause of the Docker daemon startup failure, because the same situation exists on other FC24 slaves, and the daemon starts up fine there.
Opening a separate ticket about that.
> Docker storage issues on FC24 nodes
> -----------------------------------
>
> Key: OVIRT-1421
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1421
> Project: oVirt - virtualization made easy
> Issue Type: Bug
> Components: oVirt CI
> Reporter: Barak Korren
> Assignee: infra
> Priority: High
> Labels: containers
>
> Docker startup seems to be failing on FC24 nodes, causing other docker commands to get stuck forever.
> The core reason behind the failure seems to be that the Docker storage setup failed:
> {code}
> [root@vm0064 ~]# systemctl status docker-storage-setup
> ● docker-storage-setup.service - Docker Storage Setup
> Loaded: loaded (/usr/lib/systemd/system/docker-storage-setup.service; disabled; vendor preset: disabled)
> Active: failed (Result: exit-code) since Tue 2017-05-30 03:14:49 UTC; 1 day 2h ago
> Process: 1489 ExecStart=/usr/bin/docker-storage-setup (code=exited, status=1/FAILURE)
> Main PID: 1489 (code=exited, status=1/FAILURE)
> May 30 03:14:49 vm0064.workers-phx.ovirt.org systemd[1]: Starting Docker Storage Setup...
> May 30 03:14:49 vm0064.workers-phx.ovirt.org docker-storage-setup[1489]: INFO: Volume group backing root filesystem could not be determined
> May 30 03:14:49 vm0064.workers-phx.ovirt.org docker-storage-setup[1489]: ERROR: No valid volume group found. Exiting.
> May 30 03:14:49 vm0064.workers-phx.ovirt.org systemd[1]: docker-storage-setup.service: Main process exited, code=exited, status=1/FAILURE
> May 30 03:14:49 vm0064.workers-phx.ovirt.org systemd[1]: Failed to start Docker Storage Setup.
> May 30 03:14:49 vm0064.workers-phx.ovirt.org systemd[1]: docker-storage-setup.service: Unit entered failed state.
> May 30 03:14:49 vm0064.workers-phx.ovirt.org systemd[1]: docker-storage-setup.service: Failed with result 'exit-code'.
> {code}
> This seems to be due to the fact that the system's root filesystem is on a plain disk rather than LVM. This happens even though the default Docker DM storage driver is supposed to be able to use a plain file on the filesystem. Need to check whether this issue is specific to this slave or affects all FC24 slaves.
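The LVM question raised in the description can be checked per slave. The following is a minimal sketch, not something taken from the ticket: it assumes `findmnt` and `lsblk` from util-linux are available, and the function names `classify_root_type` and `root_backing` are made up for illustration. `lsblk` reports the TYPE `lvm` for LVM logical volumes and `part` or `disk` for plain block devices.

```shell
# Map an lsblk TYPE value to a coarse answer about how root is backed.
# (Hypothetical helper; the names are not part of docker-storage-setup.)
classify_root_type() {
    case "$1" in
        lvm)       echo "lvm" ;;      # root sits on an LVM logical volume
        part|disk) echo "plain" ;;    # root sits directly on a partition or disk
        *)         echo "unknown" ;;  # anything else (raid, crypt, empty, ...)
    esac
}

# Report how the root filesystem of the current host is backed.
root_backing() {
    src=$(findmnt -n -o SOURCE /) || return 1
    classify_root_type "$(lsblk -n -d -o TYPE "$src")"
}
```

Running `root_backing` on each slave would show whether the plain-disk layout is specific to one node or common to all FC24 slaves.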
--
This message was sent by Atlassian JIRA
(v1000.1010.2#100044)
[JIRA] (OVIRT-1421) Docker storage issues on FC24 nodes
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1421?page=com.atlassian.jir... ]
Barak Korren commented on OVIRT-1421:
-------------------------------------
https://bugzilla.redhat.com/show_bug.cgi?id=1330714
[JIRA] (OVIRT-1421) Docker storage issues on FC24 nodes
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1421?page=com.atlassian.jir... ]
Barak Korren commented on OVIRT-1421:
-------------------------------------
https://github.com/projectatomic/container-storage-setup/issues/53
[JIRA] (OVIRT-1421) Docker storage issues on FC24 nodes
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1421?page=com.atlassian.jir... ]
Barak Korren commented on OVIRT-1421:
-------------------------------------
Looks like the DM loopback devices exist on both EL7 and FC24:
{code}
[root@vm0024 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 1024M 0 rom
vda 253:0 0 40G 0 disk
├─vda1 253:1 0 512M 0 part /boot
├─vda2 253:2 0 4G 0 part [SWAP]
└─vda3 253:3 0 35.5G 0 part /
loop0 7:0 0 100G 0 loop
└─docker-253:3-101073748-pool 252:0 0 100G 0 dm
loop1 7:1 0 2G 0 loop
└─docker-253:3-101073748-pool 252:0 0 100G 0 dm
{code}
{code}
[root@vm0064 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop1 7:1 0 2G 0 loop
└─docker-252:3-33877273-pool 253:0 0 100G 0 dm
sr0 11:0 1 1024M 0 rom
loop0 7:0 0 100G 0 loop
└─docker-252:3-33877273-pool 253:0 0 100G 0 dm
vda 252:0 0 40G 0 disk
├─vda2 252:2 0 4G 0 part [SWAP]
├─vda3 252:3 0 35.8G 0 part /
└─vda1 252:1 0 256M 0 part /boot
{code}
This looks more and more like a recent regression in "{{docker-storage-setup}}" that has more severe repercussions on FC24 because of a stricter systemd setup.
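The same check can be scripted for other slaves. This is a minimal sketch, assuming `lsblk` output in the default column layout shown above (TYPE in the sixth column); the function name `has_loop_backed_pool` is made up for illustration.

```shell
# Read `lsblk` output on stdin and succeed if a docker thin pool
# (a dm device named docker-*-pool) is stacked on a loop device.
has_loop_backed_pool() {
    awk '
        # Top-level devices start with a plain letter; child rows start
        # with tree-drawing glyphs, so they never match this rule.
        $1 ~ /^[A-Za-z]/ { parent_type = $6; next }
        # A child row naming a docker pool under a loop parent is a hit.
        $1 ~ /docker-.*-pool$/ && $6 == "dm" && parent_type == "loop" { found = 1 }
        END { exit !found }
    '
}
```

For example, `lsblk | has_loop_backed_pool && echo "loopback pool present"` would print the message on both vm0024 and vm0064, given the listings above.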
[JIRA] (OVIRT-1421) Docker storage issues on FC24 nodes
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1421?page=com.atlassian.jir... ]
Barak Korren commented on OVIRT-1421:
-------------------------------------
More information:
The original issue was found on "{{vm0064.workers-phx.ovirt.org}}" which is an FC24 VM.
This is what you get when you try to run "{{docker-storage-setup}}" manually on it:
{code}
# docker-storage-setup
INFO: Volume group backing root filesystem could not be determined
ERROR: No valid volume group found. Exiting.
{code}
We seem to be seeing a very similar issue on EL7. The following is from "{{vm0024.workers-phx.ovirt.org}}":
{code}
[root@vm0024 ~]# systemctl status docker-storage-setup
● docker-storage-setup.service - Docker Storage Setup
Loaded: loaded (/usr/lib/systemd/system/docker-storage-setup.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2017-05-31 03:00:42 UTC; 3h 53min ago
Process: 16223 ExecStart=/usr/bin/docker-storage-setup (code=exited, status=1/FAILURE)
Main PID: 16223 (code=exited, status=1/FAILURE)
May 31 03:00:42 vm0024.workers-phx.ovirt.org systemd[1]: Starting Docker Storage Setup...
May 31 03:00:42 vm0024.workers-phx.ovirt.org docker-storage-setup[16223]: INFO: Volume group backing root filesystem could not be determined
May 31 03:00:42 vm0024.workers-phx.ovirt.org docker-storage-setup[16223]: ERROR: No valid volume group found. Exiting.
May 31 03:00:42 vm0024.workers-phx.ovirt.org systemd[1]: docker-storage-setup.service: main process exited, code=exited, status=1/FAILURE
May 31 03:00:42 vm0024.workers-phx.ovirt.org systemd[1]: Failed to start Docker Storage Setup.
May 31 03:00:42 vm0024.workers-phx.ovirt.org systemd[1]: Unit docker-storage-setup.service entered failed state.
May 31 03:00:42 vm0024.workers-phx.ovirt.org systemd[1]: docker-storage-setup.service failed.
{code}
This is the output when running the command manually:
{code}
[root@vm0024 ~]# docker-storage-setup
INFO: Volume group backing root filesystem could not be determined
ERROR: No valid volume group found. Exiting.
{code}
This is, strangely enough, not causing processes to get stuck on EL7 though:
{code}
[root@vm0024 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
{code}
In fact, things seem to be working fine despite the failure:
{code}
[root@vm0024 ~]# docker run --rm -it centos hostname
62d53a5ab5a7
{code}
I suspect we're seeing a recent regression in docker, and previously the setup script knew how to handle nodes without LVM. I'm not finding evidence of an upgrade in yum/dnf logs though.
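One way to search the package logs for upgrade evidence is sketched below. It assumes log lines of the form `Installed:`, `Updated:` or `Upgraded:` followed by an optional epoch and the package name, as yum writes them; the dnf log layout and the log paths may differ, and the function name `docker_pkg_events` is made up for illustration.

```shell
# Filter package-manager log lines on stdin, keeping only install or
# upgrade events for docker or container-storage-setup packages.
docker_pkg_events() {
    grep -Ei '(installed|updated|upgraded): *([0-9]+:)?(docker|container-storage-setup)'
}
```

It could be used as `docker_pkg_events < /var/log/yum.log` on EL7, with the log path adjusted for dnf on FC24.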
[JIRA] (OVIRT-1421) Docker storage issues on FC24 nodes
by Barak Korren (oVirt JIRA)
[ https://ovirt-jira.atlassian.net/browse/OVIRT-1421?page=com.atlassian.jir... ]
Barak Korren commented on OVIRT-1421:
-------------------------------------
We also need to look into why we haven't seen this until now. Have no jobs run on FC24 slaves so far? Are slaves without LVM very rare?
[~ederevea] can you provide some insights here?