[Users] Gluster not aligned with double maintenance... problem
Gianluca Cecchi
gianluca.cecchi at gmail.com
Thu Oct 17 11:45:51 EDT 2013
Hello,
One engine and two hosts, all running updated f19 (despite their names),
with the ovirt updates-testing repo enabled.
So I have 3.3.0.1-1 and vdsm-4.12.1-4
kernel is 3.11.2-201.fc19.x86_64 (problems booting with latest
3.11.4-201.fc19.x86_64)
The storage domain is configured on gluster as shipped in f19 (3.4.1-1.fc19.x86_64,
recompiled to bind to ports 50152+) with distributed replicated bricks.
I perform this sequence of operations:
- power off all VMs (to start clean)
- put both hosts in maintenance
- shutdown both hosts
- startup one host
- activate one host in webadmin gui
after a delay of about 2-3 minutes it comes up, so its own
gluster copy is active
- power on VM and write 3Gb on it
[g.cecchi@c6s ~]$ sudo time dd if=/dev/zero bs=1024k count=3096 of=/testfile
3096+0 records in
3096+0 records out
3246391296 bytes (3.2 GB) copied, 42.3414 s, 76.7 MB/s
0.01user 7.99system 0:42.34elapsed 18%CPU (0avgtext+0avgdata 7360maxresident)k
0inputs+6352984outputs (0major+493minor)pagefaults 0swaps
Originally the gluster fs had 13Gb used, so now it has 16Gb (see
/rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata below):
[root@f18ovn01 ~]# df -h
Filesystem                                      Size  Used Avail Use% Mounted on
/dev/mapper/fedora-root                          15G  5.6G  8.1G  41% /
devtmpfs                                         24G     0   24G   0% /dev
tmpfs                                            24G  8.0K   24G   1% /dev/shm
tmpfs                                            24G  656K   24G   1% /run
tmpfs                                            24G     0   24G   0% /sys/fs/cgroup
tmpfs                                            24G     0   24G   0% /tmp
/dev/mapper/3600508b1001037414d4b3039383300021  477M  103M  345M  23% /boot
/dev/mapper/fedora-ISO_GLUSTER                   10G   33M   10G   1% /gluster/ISO_GLUSTER
/dev/mapper/fedora-DATA_GLUSTER                  30G   16G   15G  52% /gluster/DATA_GLUSTER
f18ovn01.mydomain:gvdata                         30G   16G   15G  52% /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata
- power on second host
From a gluster point of view it seems ok
[root@f18ovn03 glusterfs]# gluster volume status
Status of volume: gviso
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick f18ovn01.mydomain:/gluster/ISO_GLUSTER/brick1     50153   Y       1314
Brick f18ovn03.mydomain:/gluster/ISO_GLUSTER/brick1     50153   Y       1275
NFS Server on localhost                                 2049    Y       1288
Self-heal Daemon on localhost                           N/A     Y       1295
NFS Server on 192.168.3.1                               2049    Y       1328
Self-heal Daemon on 192.168.3.1                         N/A     Y       1335
There are no active volume tasks
Status of volume: gvdata
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick f18ovn01.mydomain:/gluster/DATA_GLUSTER/brick1    50152   Y       1313
Brick f18ovn03.mydomain:/gluster/DATA_GLUSTER/brick1    50152   Y       1280
NFS Server on localhost                                 2049    Y       1288
Self-heal Daemon on localhost                           N/A     Y       1295
NFS Server on 192.168.3.1                               2049    Y       1328
Self-heal Daemon on 192.168.3.1                         N/A     Y       1335
There are no active volume tasks
But actually I don't see any network sync and fs remains in fact at 13GB...
[root@f18ovn03 ~]# df -h
Filesystem                                      Size  Used Avail Use% Mounted on
/dev/mapper/fedora-root                          15G  4.7G  8.9G  35% /
devtmpfs                                         16G     0   16G   0% /dev
tmpfs                                            16G     0   16G   0% /dev/shm
tmpfs                                            16G  560K   16G   1% /run
tmpfs                                            16G     0   16G   0% /sys/fs/cgroup
tmpfs                                            16G     0   16G   0% /tmp
/dev/mapper/3600508b1001037424d4b3035343800031  477M  103M  345M  23% /boot
/dev/mapper/fedora-DATA_GLUSTER                  30G   13G   18G  44% /gluster/DATA_GLUSTER
/dev/mapper/fedora-ISO_GLUSTER                   10G   33M   10G   1% /gluster/ISO_GLUSTER
I waited a few minutes but nothing changed...
- I activate this second host from the webadmin gui and it comes up in Up state
But actually it is not synced from a storage point of view, so in my
opinion it should not come up...
Now I see on it:
[root@f18ovn03 bricks]# df -h
Filesystem                                      Size  Used Avail Use% Mounted on
/dev/mapper/fedora-root                          15G  4.7G  8.9G  35% /
devtmpfs                                         16G     0   16G   0% /dev
tmpfs                                            16G     0   16G   0% /dev/shm
tmpfs                                            16G  588K   16G   1% /run
tmpfs                                            16G     0   16G   0% /sys/fs/cgroup
tmpfs                                            16G     0   16G   0% /tmp
/dev/mapper/3600508b1001037424d4b3035343800031  477M  103M  345M  23% /boot
/dev/mapper/fedora-DATA_GLUSTER                  30G   13G   18G  44% /gluster/DATA_GLUSTER
/dev/mapper/fedora-ISO_GLUSTER                   10G   33M   10G   1% /gluster/ISO_GLUSTER
f18ovn01.mydomain:gvdata                         30G   13G   18G  44% /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata
So also the /rhev/data-center/mnt/glusterSD/f18ovn01.mydomain:gvdata
view is incorrect (13Gb instead of 16Gb)
I think that oVirt should have some way of detecting this and
avoid activation, or show a warning, because in case of f18ovn01
maintenance it could not become SPM.
For example, oVirt could check heal info.
Normally I see no heal entries for healthy volumes,
e.g. on a gluster volume (gviso) on this same cluster, planned to be
used for ISOs and with no data, I get:
[root@f18ovn03 bricks]# gluster volume heal gviso info
Gathering Heal info on volume gviso has been successful
Brick f18ovn01.mydomain:/gluster/ISO_GLUSTER/brick1
Number of entries: 0
Brick f18ovn03.mydomain:/gluster/ISO_GLUSTER/brick1
Number of entries: 0
On gvdata, instead, I now get:
[root@f18ovn03 bricks]# gluster volume heal gvdata info
Gathering Heal info on volume gvdata has been successful
Brick f18ovn01.mydomain:/gluster/DATA_GLUSTER/brick1
Number of entries: 6
/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/images/15f9ca1c-c435-4892-9eb7-0c84583b2a7d/a123801a-0a4d-4a47-a426-99d8480d2e49
/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/images/a5e4f67b-50b5-4740-9990-39deb8812445/53408cb0-bcd4-40de-bc69-89d59b7b5bc2
/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/leases
/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md
/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/metadata
Brick f18ovn03.mydomain:/gluster/DATA_GLUSTER/brick1
Number of entries: 5
/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
<gfid:59a4d113-5881-4147-95c4-b5dee9872ad3>
/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md
<gfid:db9985f8-117e-4340-98b9-a62bd963f6bf>
/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/metadata
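
As a rough illustration of the heal info check suggested above, here is a minimal Python sketch that counts the pending entries per brick. It assumes the output layout shown in this message (gluster 3.4.1); other versions may print it differently. The function names and the shortened SAMPLE text (with made-up entry counts) are only for the example and are not part of oVirt or vdsm.

```python
# Shortened, illustrative sample in the layout printed by
# `gluster volume heal <volume> info` in this message (counts are made up).
SAMPLE = """\
Gathering Heal info on volume gvdata has been successful
Brick f18ovn01.mydomain:/gluster/DATA_GLUSTER/brick1
Number of entries: 2
/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/ids
/d0b96d4a-62aa-4e9f-b50e-f7a0cb5be291/dom_md/leases
Brick f18ovn03.mydomain:/gluster/DATA_GLUSTER/brick1
Number of entries: 0
"""


def parse_heal_info(text):
    """Return {brick: pending_entry_count} parsed from heal info output."""
    counts = {}
    brick = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # Remember which brick the following count belongs to.
            brick = line[len("Brick "):]
        elif line.startswith("Number of entries:") and brick is not None:
            counts[brick] = int(line.split(":", 1)[1])
    return counts


def volume_needs_heal(text):
    """True if any brick reports at least one entry still to be healed."""
    return any(n > 0 for n in parse_heal_info(text).values())


if __name__ == "__main__":
    for brick, n in parse_heal_info(SAMPLE).items():
        print(brick, n)
    print("needs heal:", volume_needs_heal(SAMPLE))
```

On a real host the text would come from running gluster volume heal gvdata info (for example through subprocess.run) instead of SAMPLE. A nonzero count on any brick could then be treated as a reason to warn or to delay activation.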
What do you think about this?
What is the recommended gluster command to fix f18ovn03?
Let me know which oVirt/gluster logs could help.
Thanks in advance,
Gianluca