On Thu, Jan 12, 2017 at 12:02 PM, Mark Greenall
<m.greenall(a)iontrading.com> wrote:
Firstly, thanks @Yaniv and thanks @Nir for your responses.
@Yaniv, in answer to this:
>> Why do you have 1 SD per VM?
It's a combination of performance and ease of management. We ran some IO tests with
various configurations and settled on this one for a balance of reduced IO contention and
ease of management. If there is a better recommended way of handling these then I'm
all ears. If you believe having a large amount of storage domains adds to the problem then
we can also review the setup.
Yes, having one storage domain per vm is an extremely fragile way to use storage
domains. This makes you system very fragile - any problem in
monitoring one of the
45 storage domains can make entire host non-operational.
You should use storage domains for grouping volumes that need to be separated
from other volumes, for example production, staging, different users,
different types
of storage, etc.
If some vms need high IO, and you want to have one or more devices per vm,
you should use direct luns.
If you need snapshots, live storage migration, etc, use volumes on storage
domain.
I looked at the logs, and I can explain why your system becomes non-operational.
Grepping the domain monitor logs, we see that many storage domains have
very slow (up to 749 seconds read delay):
(filtered the log with awk, I don't have the command now)
Thread-12::DEBUG::2017-01-11
15:09:18,785::check::327::storage.check::(_check_completed)
'/dev/7dfeac70-eaa1-4ba6-ad2a-e3c11564ee3b/metadata' elapsed=0.05
Thread-12::DEBUG::2017-01-11
15:09:28,780::check::327::storage.check::(_check_completed)
'/dev/7dfeac70-eaa1-4ba6-ad2a-e3c11564ee3b/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:09:38,778::check::327::storage.check::(_check_completed)
'/dev/7dfeac70-eaa1-4ba6-ad2a-e3c11564ee3b/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:09:48,777::check::327::storage.check::(_check_completed)
'/dev/7dfeac70-eaa1-4ba6-ad2a-e3c11564ee3b/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:09:55,863::check::327::storage.check::(_check_completed)
'/dev/e70839af-77dd-40c0-a541-d364d30e859a/metadata' elapsed=0.02
Thread-12::DEBUG::2017-01-11
15:09:55,957::check::327::storage.check::(_check_completed)
'/dev/1198e513-bdc8-4d5f-8ee5-8e8dc30d309d/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:09:56,070::check::327::storage.check::(_check_completed)
'/dev/640ac4d3-1e14-465a-9a72-cc2f2c4cfe26/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:09:56,741::check::327::storage.check::(_check_completed)
'/dev/6e98b678-a955-49b8-aad7-e1e52e26db1f/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:09:56,798::check::327::storage.check::(_check_completed)
'/dev/02d31cfc-f095-42e6-8396-d4dbebbb4fed/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:09:57,080::check::327::storage.check::(_check_completed)
'/dev/4b23a421-5c1f-4541-a007-c93b7af4986b/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:09:57,248::check::327::storage.check::(_check_completed)
'/dev/5d8d49e2-ce0e-402e-9348-94f9576e2e28/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:09:57,425::check::327::storage.check::(_check_completed)
'/dev/9fcbb7b1-a13a-499a-a534-119360d57f00/metadata' elapsed=0.08
Thread-12::DEBUG::2017-01-11
15:09:57,715::check::327::storage.check::(_check_completed)
'/dev/a25ded63-2c31-4f1d-a65a-5390e47fda99/metadata' elapsed=0.04
Thread-12::DEBUG::2017-01-11
15:09:57,750::check::327::storage.check::(_check_completed)
'/dev/f6a91d2f-ccae-4440-b1a7-f62ee750a58c/metadata' elapsed=0.05
Thread-12::DEBUG::2017-01-11
15:09:58,007::check::327::storage.check::(_check_completed)
'/dev/bfb1d6b2-b610-4565-b818-ab6ee856e023/metadata' elapsed=0.07
Thread-12::DEBUG::2017-01-11
15:09:58,170::check::327::storage.check::(_check_completed)
'/dev/84cfcb68-190f-4836-8294-d5752c07b762/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:09:58,556::check::327::storage.check::(_check_completed)
'/dev/2204a85e-c8c7-4e1e-b8e6-e392645077c6/metadata' elapsed=0.07
Thread-12::DEBUG::2017-01-11
15:09:58,805::check::327::storage.check::(_check_completed)
'/dev/7dfeac70-eaa1-4ba6-ad2a-e3c11564ee3b/metadata' elapsed=0.07
Thread-12::DEBUG::2017-01-11
15:09:59,093::check::327::storage.check::(_check_completed)
'/dev/78e59ee0-13ac-4176-8950-837498ba6038/metadata' elapsed=0.06
Thread-12::DEBUG::2017-01-11
15:09:59,159::check::327::storage.check::(_check_completed)
'/dev/b66a2944-a056-4a48-a3f9-83f509df5d1b/metadata' elapsed=0.06
Thread-12::DEBUG::2017-01-11
15:09:59,218::check::327::storage.check::(_check_completed)
'/dev/da05d769-27c2-4270-9bba-5277bf3636e6/metadata' elapsed=0.06
Thread-12::DEBUG::2017-01-11
15:09:59,247::check::327::storage.check::(_check_completed)
'/dev/819b51c0-96d7-43c2-b120-7adade60a2e2/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:09:59,363::check::327::storage.check::(_check_completed)
'/dev/24499abc-0e16-48a2-8512-6c34e99dfa5f/metadata' elapsed=0.02
Thread-12::DEBUG::2017-01-11
15:09:59,573::check::327::storage.check::(_check_completed)
'/dev/6b39fdce-38ac-4d36-8e58-300c7082a9c0/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:10:00,077::check::327::storage.check::(_check_completed)
'/dev/1bde2522-d1dd-414f-9c48-c8a961071689/metadata' elapsed=0.10
Thread-12::DEBUG::2017-01-11
15:10:00,121::check::327::storage.check::(_check_completed)
'/dev/0d2ad817-56b5-413e-886b-1be1777ed99a/metadata' elapsed=0.04
Thread-12::DEBUG::2017-01-11
15:10:00,238::check::327::storage.check::(_check_completed)
'/dev/e54f3d6e-4790-430c-9fd9-fd6bf89d6413/metadata' elapsed=0.13
Thread-12::DEBUG::2017-01-11
15:10:00,307::check::327::storage.check::(_check_completed)
'/dev/e6062875-dcf9-4401-8738-befc9629c2d5/metadata' elapsed=0.10
Thread-12::DEBUG::2017-01-11
15:10:00,345::check::327::storage.check::(_check_completed)
'/dev/b819ee61-e907-441b-b3c1-9ff3ab1686d4/metadata' elapsed=0.04
Thread-12::DEBUG::2017-01-11
15:10:00,381::check::327::storage.check::(_check_completed)
'/dev/1335e5e4-3c3e-4538-966e-0717098f52db/metadata' elapsed=0.05
Thread-12::DEBUG::2017-01-11
15:10:00,702::check::327::storage.check::(_check_completed)
'/dev/9567770b-dfbc-489b-98f7-c7b2ca636e3d/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:10:02,268::check::327::storage.check::(_check_completed)
'/dev/6e77516f-e074-426e-b375-64b6e22ac579/metadata' elapsed=0.06
Thread-12::DEBUG::2017-01-11
15:10:02,296::check::327::storage.check::(_check_completed)
'/dev/8b1d2548-5c2b-46df-a1ac-4acf28ca99a8/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:10:05,873::check::327::storage.check::(_check_completed)
'/dev/e70839af-77dd-40c0-a541-d364d30e859a/metadata' elapsed=0.03
Thread-12::DEBUG::2017-01-11
15:12:06,446::check::327::storage.check::(_check_completed)
'/dev/6e77516f-e074-426e-b375-64b6e22ac579/metadata' elapsed=0.08
Thread-12::DEBUG::2017-01-11
15:18:06,707::check::327::storage.check::(_check_completed)
'/dev/26ece5da-9e44-45c9-af7e-df074a3da9ac/metadata' elapsed=361.77
Thread-12::DEBUG::2017-01-11
15:18:06,711::check::327::storage.check::(_check_completed)
'/dev/fb1ff10b-151f-4344-86d0-691c79acc16e/metadata' elapsed=361.75
Thread-12::DEBUG::2017-01-11
15:18:06,914::check::327::storage.check::(_check_completed)
'/dev/6b50dac1-eba7-448d-bb8f-892f8cb41197/metadata' elapsed=361.92
Thread-12::DEBUG::2017-01-11
15:18:07,048::check::327::storage.check::(_check_completed)
'/dev/4fa1053f-ca53-4e0d-b33f-5e10443d0f76/metadata' elapsed=362.04
Thread-12::DEBUG::2017-01-11
15:20:09,530::check::327::storage.check::(_check_completed)
'/dev/b4d27568-7c66-4447-8330-f145f886fded/metadata' elapsed=484.50
Thread-12::DEBUG::2017-01-11
15:20:47,848::check::327::storage.check::(_check_completed)
'/dev/60afa0bc-b022-4e71-8687-0e5afadd0ef2/metadata' elapsed=522.79
Thread-12::DEBUG::2017-01-11
15:21:20,545::check::327::storage.check::(_check_completed)
'/dev/8890fd59-45b3-46ac-88d5-3020ec71bb92/metadata' elapsed=555.45
Thread-12::DEBUG::2017-01-11
15:21:21,360::check::327::storage.check::(_check_completed)
'/dev/ed3abca9-742e-4b83-891b-958c03aa3fcb/metadata' elapsed=556.17
Thread-12::DEBUG::2017-01-11
15:21:22,094::check::327::storage.check::(_check_completed)
'/dev/9b941172-1b62-4e3c-9af4-168c90140ffe/metadata' elapsed=556.98
Thread-12::DEBUG::2017-01-11
15:21:22,097::check::327::storage.check::(_check_completed)
'/dev/02393b5e-ae7b-42e2-95c7-863ad0ad7d29/metadata' elapsed=556.89
Thread-12::DEBUG::2017-01-11
15:21:22,098::check::327::storage.check::(_check_completed)
'/dev/5b930223-96e2-4eb5-b8bd-575c349c880c/metadata' elapsed=556.84
Thread-12::DEBUG::2017-01-11
15:21:22,209::check::327::storage.check::(_check_completed)
'/dev/7af85dde-01ef-442f-8475-4d96ab84e4c7/metadata' elapsed=556.92
Thread-12::DEBUG::2017-01-11
15:22:49,640::check::327::storage.check::(_check_completed)
'/dev/f35bda70-0be1-4907-82b6-5c676ee83cbe/metadata' elapsed=644.31
Thread-12::DEBUG::2017-01-11
15:23:27,979::check::327::storage.check::(_check_completed)
'/dev/e69c4f4c-a195-4c21-bd69-9d6a2a38abf8/metadata' elapsed=682.63
Thread-12::DEBUG::2017-01-11
15:24:15,172::check::327::storage.check::(_check_completed)
'/dev/664039e4-30d8-4aea-8a40-9abdc36ea3f4/metadata' elapsed=729.78
Thread-12::DEBUG::2017-01-11
15:24:15,250::check::327::storage.check::(_check_completed)
'/dev/d3341c6f-c11e-4edd-82f6-90dcbd08b8c9/metadata' elapsed=729.84
Thread-12::DEBUG::2017-01-11
15:24:15,298::check::327::storage.check::(_check_completed)
'/dev/1a380869-8c8e-4cf3-9fc1-758addf7a30d/metadata' elapsed=729.84
Thread-12::DEBUG::2017-01-11
15:24:19,416::check::327::storage.check::(_check_completed)
'/dev/1b26f8ac-10c6-4595-b0f0-95bdfd42f05c/metadata' elapsed=733.94
Thread-12::DEBUG::2017-01-11
15:24:22,984::check::327::storage.check::(_check_completed)
'/dev/9017d145-4dab-44bb-864a-f4b4616a239f/metadata' elapsed=737.48
Thread-12::DEBUG::2017-01-11
15:24:35,552::check::327::storage.check::(_check_completed)
'/dev/1198e513-bdc8-4d5f-8ee5-8e8dc30d309d/metadata' elapsed=750.00
Thread-12::DEBUG::2017-01-11
15:24:35,553::check::327::storage.check::(_check_completed)
'/dev/640ac4d3-1e14-465a-9a72-cc2f2c4cfe26/metadata' elapsed=749.98
Thread-12::DEBUG::2017-01-11
15:24:35,553::check::327::storage.check::(_check_completed)
'/dev/6e98b678-a955-49b8-aad7-e1e52e26db1f/metadata' elapsed=749.94
Thread-12::DEBUG::2017-01-11
15:24:35,553::check::327::storage.check::(_check_completed)
'/dev/02d31cfc-f095-42e6-8396-d4dbebbb4fed/metadata' elapsed=749.90
Thread-12::DEBUG::2017-01-11
15:24:35,554::check::327::storage.check::(_check_completed)
'/dev/4b23a421-5c1f-4541-a007-c93b7af4986b/metadata' elapsed=749.88
Thread-12::DEBUG::2017-01-11
15:24:35,554::check::327::storage.check::(_check_completed)
'/dev/5d8d49e2-ce0e-402e-9348-94f9576e2e28/metadata' elapsed=749.85
Thread-12::DEBUG::2017-01-11
15:24:35,555::check::327::storage.check::(_check_completed)
'/dev/9fcbb7b1-a13a-499a-a534-119360d57f00/metadata' elapsed=749.83
Thread-12::DEBUG::2017-01-11
15:24:35,555::check::327::storage.check::(_check_completed)
'/dev/a25ded63-2c31-4f1d-a65a-5390e47fda99/metadata' elapsed=749.78
Thread-12::DEBUG::2017-01-11
15:24:35,555::check::327::storage.check::(_check_completed)
'/dev/f6a91d2f-ccae-4440-b1a7-f62ee750a58c/metadata' elapsed=749.76
Thread-12::DEBUG::2017-01-11
15:24:35,556::check::327::storage.check::(_check_completed)
'/dev/bfb1d6b2-b610-4565-b818-ab6ee856e023/metadata' elapsed=749.73
Thread-12::DEBUG::2017-01-11
15:24:35,556::check::327::storage.check::(_check_completed)
'/dev/84cfcb68-190f-4836-8294-d5752c07b762/metadata' elapsed=749.69
Thread-12::DEBUG::2017-01-11
15:24:35,556::check::327::storage.check::(_check_completed)
'/dev/2204a85e-c8c7-4e1e-b8e6-e392645077c6/metadata' elapsed=749.67
Thread-12::DEBUG::2017-01-11
15:24:35,557::check::327::storage.check::(_check_completed)
'/dev/7dfeac70-eaa1-4ba6-ad2a-e3c11564ee3b/metadata' elapsed=749.63
Thread-12::DEBUG::2017-01-11
15:24:35,557::check::327::storage.check::(_check_completed)
'/dev/78e59ee0-13ac-4176-8950-837498ba6038/metadata' elapsed=749.61
Thread-12::DEBUG::2017-01-11
15:24:35,557::check::327::storage.check::(_check_completed)
'/dev/b66a2944-a056-4a48-a3f9-83f509df5d1b/metadata' elapsed=749.56
Thread-12::DEBUG::2017-01-11
15:24:35,558::check::327::storage.check::(_check_completed)
'/dev/da05d769-27c2-4270-9bba-5277bf3636e6/metadata' elapsed=749.54
Thread-12::DEBUG::2017-01-11
15:24:35,558::check::327::storage.check::(_check_completed)
'/dev/819b51c0-96d7-43c2-b120-7adade60a2e2/metadata' elapsed=749.52
Thread-12::DEBUG::2017-01-11
15:24:35,559::check::327::storage.check::(_check_completed)
'/dev/24499abc-0e16-48a2-8512-6c34e99dfa5f/metadata' elapsed=749.48
Thread-12::DEBUG::2017-01-11
15:24:35,559::check::327::storage.check::(_check_completed)
'/dev/6b39fdce-38ac-4d36-8e58-300c7082a9c0/metadata' elapsed=749.45
Thread-12::DEBUG::2017-01-11
15:24:35,559::check::327::storage.check::(_check_completed)
'/dev/1bde2522-d1dd-414f-9c48-c8a961071689/metadata' elapsed=749.41
Thread-12::DEBUG::2017-01-11
15:24:35,560::check::327::storage.check::(_check_completed)
'/dev/0d2ad817-56b5-413e-886b-1be1777ed99a/metadata' elapsed=749.39
Thread-12::DEBUG::2017-01-11
15:24:35,560::check::327::storage.check::(_check_completed)
'/dev/e54f3d6e-4790-430c-9fd9-fd6bf89d6413/metadata' elapsed=749.34
Thread-12::DEBUG::2017-01-11
15:24:35,560::check::327::storage.check::(_check_completed)
'/dev/e6062875-dcf9-4401-8738-befc9629c2d5/metadata' elapsed=749.33
Thread-12::DEBUG::2017-01-11
15:24:35,561::check::327::storage.check::(_check_completed)
'/dev/b819ee61-e907-441b-b3c1-9ff3ab1686d4/metadata' elapsed=749.28
Thread-12::DEBUG::2017-01-11
15:24:35,561::check::327::storage.check::(_check_completed)
'/dev/1335e5e4-3c3e-4538-966e-0717098f52db/metadata' elapsed=749.26
Thread-12::DEBUG::2017-01-11
15:24:35,561::check::327::storage.check::(_check_completed)
'/dev/9567770b-dfbc-489b-98f7-c7b2ca636e3d/metadata' elapsed=749.22
Thread-12::DEBUG::2017-01-11
15:24:36,207::check::327::storage.check::(_check_completed)
'/dev/8b1d2548-5c2b-46df-a1ac-4acf28ca99a8/metadata' elapsed=749.80
Thread-12::DEBUG::2017-01-11
15:24:36,210::check::327::storage.check::(_check_completed)
'/dev/e70839af-77dd-40c0-a541-d364d30e859a/metadata' elapsed=749.78
Thread-12::DEBUG::2017-01-11
15:24:36,211::check::327::storage.check::(_check_completed)
'/dev/6e77516f-e074-426e-b375-64b6e22ac579/metadata' elapsed=0.65
Thread-12::DEBUG::2017-01-11
15:24:36,264::check::327::storage.check::(_check_completed)
'/dev/26ece5da-9e44-45c9-af7e-df074a3da9ac/metadata' elapsed=0.67
Thread-12::DEBUG::2017-01-11
15:24:36,264::check::327::storage.check::(_check_completed)
'/dev/fb1ff10b-151f-4344-86d0-691c79acc16e/metadata' elapsed=0.67
Thread-12::DEBUG::2017-01-11
15:24:36,265::check::327::storage.check::(_check_completed)
'/dev/6b50dac1-eba7-448d-bb8f-892f8cb41197/metadata' elapsed=0.62
Thread-12::DEBUG::2017-01-11
15:24:36,265::check::327::storage.check::(_check_completed)
'/dev/4fa1053f-ca53-4e0d-b33f-5e10443d0f76/metadata' elapsed=0.60
Thread-12::DEBUG::2017-01-11
15:24:36,265::check::327::storage.check::(_check_completed)
'/dev/b4d27568-7c66-4447-8330-f145f886fded/metadata' elapsed=0.55
Thread-12::DEBUG::2017-01-11
15:24:36,266::check::327::storage.check::(_check_completed)
'/dev/60afa0bc-b022-4e71-8687-0e5afadd0ef2/metadata' elapsed=0.53
Thread-12::DEBUG::2017-01-11
15:24:36,266::check::327::storage.check::(_check_completed)
'/dev/8890fd59-45b3-46ac-88d5-3020ec71bb92/metadata' elapsed=0.48
Thread-12::DEBUG::2017-01-11
15:24:36,267::check::327::storage.check::(_check_completed)
'/dev/9b941172-1b62-4e3c-9af4-168c90140ffe/metadata' elapsed=0.46
Thread-12::DEBUG::2017-01-11
15:24:36,267::check::327::storage.check::(_check_completed)
'/dev/ed3abca9-742e-4b83-891b-958c03aa3fcb/metadata' elapsed=0.42
Looking in vdsm at the time when this issue started, we can see:
1. Thread-12 starting read check for domain 6e77516f-e074-426e-b375-64b6e22ac579
Thread-12::DEBUG::2017-01-11
15:12:06,365::check::296::storage.check::(_start_process) START check
'/dev/6e77516f-e074-426e-b375-64b6e22ac579/metadata'
cmd=['/usr/bin/taskset', '--cpu-list', '0-47',
'/usr/bin/dd',
'if=/dev/6e77516f-e074-426e-b375-64b6e22ac579/metadata',
'of=/dev/null', 'bs=4096', 'count=1', 'iflag=direct']
delay=114.16
2. Thread-12 finished checking domain 6e77516f-e074-426e-b375-64b6e22ac579
Thread-12::DEBUG::2017-01-11
15:12:06,446::check::327::storage.check::(_check_completed) FINISH
check '/dev/6e77516f-e074-426e-b375-64b6e22ac579/metadata' rc=0
err=bytearray(b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB)
copied, 0.000452604 s, 9.0 MB/s\n') elapsed=0.08
The next thing this thread should do is to acquire the host id, but Thread-24
tried first, and is blocking now Thread-12
(there no logs from this thread until 15:15:25).
3. Thread 24 trying to acquire host id for domain
6e77516f-e074-426e-b375-64b6e22ac579:
Thread-24::INFO::2017-01-11
15:12:06,491::clusterlock::236::Storage.SANLock::(acquireHostId)
Acquiring host id for domain 6e77516f-e074-426e-b375-64b6e22ac579 (id:
1)
4. This should take no time, as we are using async=True, but it took 3 minutes:
Thread-24::DEBUG::2017-01-11
15:15:25,011::clusterlock::254::Storage.SANLock::(acquireHostId) Host
id for domain 6e77516f-e074-426e-b375-64b6e22ac579 successfully
acquired (id: 1)
5. Thread-12 trying to acquire the host id for same domain:
Thread-12::INFO::2017-01-11
15:15:25,012::clusterlock::236::Storage.SANLock::(acquireHostId)
Acquiring host id for domain 6e77516f-e074-426e-b375-64b6e22ac579 (id:
1)
This is a bug, only one of the threads should try to take acquire the host id,
but normally this operation is very fast so it does not matter, but here it took
3 minutes:
6. Thread 12 finish to acquire host id
Thread-12::DEBUG::2017-01-11
15:18:06,707::clusterlock::254::Storage.SANLock::(acquireHostId) Host
id for domain 6e77516f-e074-426e-b375-64b6e22ac579 successfully
acquired (id: 1)
Thread-12 is responsible for checking readability of the storage
domain, it should
never block. Here it was blocked for 6 minutes, this cause host to become
non-operational.
The root cause of these issues is sanlock blocking for long time when acquiring
host id in async mode.
There is no information in sanlock log why async operation was blocked for long
time.
Can you file a bug about this, and attach the log you sent in this thread?
https://bugzilla.redhat.com/enter_bug.cgi?product=vdsm
Next steps:
1. Please enable debug logging in sanlock log:
edit /etc/sanlock/sanlock.conf and set:
logfile_priority = 7
Move the host to maintenance and restart sanlock
This may give us more details in the next try, to understand why sanlock
blocks for long time.
2. Try vdsm patch elimiting the delays in the monitoring thrad.
You can try this patch, eliminating the delays in domain monitoring if sanlock
blocks for long time when acquiring a host id. With this patch, only storage
domains where acquiring the host id is slow will be affected.
https://gerrit.ovirt.org/70450
Adding David (sanlock maintainer), he will probably add more info how to
debug the unexpected blocking in sanlock.add_lockspace().
Please report the results with these changes.
Nir