
On Thu, Jan 12, 2017 at 12:02 PM, Mark Greenall <m.greenall@iontrading.com> wrote:
Firstly, thanks @Yaniv and thanks @Nir for your responses.
@Yaniv, in answer to this:
Why do you have 1 SD per VM?
It's a combination of performance and ease of management. We ran some IO tests with various configurations and settled on this one for a balance of reduced IO contention and ease of management. If there is a better recommended way of handling these then I'm all ears. If you believe having a large amount of storage domains adds to the problem then we can also review the setup.
Yes, having one storage domain per vm is an extremely fragile way to use storage domains. This makes you system very fragile - any problem in monitoring one of the 45 storage domains can make entire host non-operational. You should use storage domains for grouping volumes that need to be separated from other volumes, for example production, staging, different users, different types of storage, etc. If some vms need high IO, and you want to have one or more devices per vm, you should use direct luns. If you need snapshots, live storage migration, etc, use volumes on storage domain. I looked at the logs, and I can explain why your system becomes non-operational. Grepping the domain monitor logs, we see that many storage domains have very slow (up to 749 seconds read delay): (filtered the log with awk, I don't have the command now) Thread-12::DEBUG::2017-01-11 15:09:18,785::check::327::storage.check::(_check_completed) '/dev/7dfeac70-eaa1-4ba6-ad2a-e3c11564ee3b/metadata' elapsed=0.05 Thread-12::DEBUG::2017-01-11 15:09:28,780::check::327::storage.check::(_check_completed) '/dev/7dfeac70-eaa1-4ba6-ad2a-e3c11564ee3b/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:09:38,778::check::327::storage.check::(_check_completed) '/dev/7dfeac70-eaa1-4ba6-ad2a-e3c11564ee3b/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:09:48,777::check::327::storage.check::(_check_completed) '/dev/7dfeac70-eaa1-4ba6-ad2a-e3c11564ee3b/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:09:55,863::check::327::storage.check::(_check_completed) '/dev/e70839af-77dd-40c0-a541-d364d30e859a/metadata' elapsed=0.02 Thread-12::DEBUG::2017-01-11 15:09:55,957::check::327::storage.check::(_check_completed) '/dev/1198e513-bdc8-4d5f-8ee5-8e8dc30d309d/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:09:56,070::check::327::storage.check::(_check_completed) '/dev/640ac4d3-1e14-465a-9a72-cc2f2c4cfe26/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:09:56,741::check::327::storage.check::(_check_completed) '/dev/6e98b678-a955-49b8-aad7-e1e52e26db1f/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:09:56,798::check::327::storage.check::(_check_completed) '/dev/02d31cfc-f095-42e6-8396-d4dbebbb4fed/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:09:57,080::check::327::storage.check::(_check_completed) '/dev/4b23a421-5c1f-4541-a007-c93b7af4986b/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:09:57,248::check::327::storage.check::(_check_completed) '/dev/5d8d49e2-ce0e-402e-9348-94f9576e2e28/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:09:57,425::check::327::storage.check::(_check_completed) '/dev/9fcbb7b1-a13a-499a-a534-119360d57f00/metadata' elapsed=0.08 Thread-12::DEBUG::2017-01-11 15:09:57,715::check::327::storage.check::(_check_completed) '/dev/a25ded63-2c31-4f1d-a65a-5390e47fda99/metadata' elapsed=0.04 Thread-12::DEBUG::2017-01-11 15:09:57,750::check::327::storage.check::(_check_completed) '/dev/f6a91d2f-ccae-4440-b1a7-f62ee750a58c/metadata' elapsed=0.05 Thread-12::DEBUG::2017-01-11 15:09:58,007::check::327::storage.check::(_check_completed) '/dev/bfb1d6b2-b610-4565-b818-ab6ee856e023/metadata' elapsed=0.07 Thread-12::DEBUG::2017-01-11 15:09:58,170::check::327::storage.check::(_check_completed) '/dev/84cfcb68-190f-4836-8294-d5752c07b762/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:09:58,556::check::327::storage.check::(_check_completed) '/dev/2204a85e-c8c7-4e1e-b8e6-e392645077c6/metadata' elapsed=0.07 Thread-12::DEBUG::2017-01-11 15:09:58,805::check::327::storage.check::(_check_completed) '/dev/7dfeac70-eaa1-4ba6-ad2a-e3c11564ee3b/metadata' elapsed=0.07 Thread-12::DEBUG::2017-01-11 15:09:59,093::check::327::storage.check::(_check_completed) '/dev/78e59ee0-13ac-4176-8950-837498ba6038/metadata' elapsed=0.06 Thread-12::DEBUG::2017-01-11 15:09:59,159::check::327::storage.check::(_check_completed) '/dev/b66a2944-a056-4a48-a3f9-83f509df5d1b/metadata' elapsed=0.06 Thread-12::DEBUG::2017-01-11 15:09:59,218::check::327::storage.check::(_check_completed) '/dev/da05d769-27c2-4270-9bba-5277bf3636e6/metadata' elapsed=0.06 Thread-12::DEBUG::2017-01-11 15:09:59,247::check::327::storage.check::(_check_completed) '/dev/819b51c0-96d7-43c2-b120-7adade60a2e2/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:09:59,363::check::327::storage.check::(_check_completed) '/dev/24499abc-0e16-48a2-8512-6c34e99dfa5f/metadata' elapsed=0.02 Thread-12::DEBUG::2017-01-11 15:09:59,573::check::327::storage.check::(_check_completed) '/dev/6b39fdce-38ac-4d36-8e58-300c7082a9c0/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:10:00,077::check::327::storage.check::(_check_completed) '/dev/1bde2522-d1dd-414f-9c48-c8a961071689/metadata' elapsed=0.10 Thread-12::DEBUG::2017-01-11 15:10:00,121::check::327::storage.check::(_check_completed) '/dev/0d2ad817-56b5-413e-886b-1be1777ed99a/metadata' elapsed=0.04 Thread-12::DEBUG::2017-01-11 15:10:00,238::check::327::storage.check::(_check_completed) '/dev/e54f3d6e-4790-430c-9fd9-fd6bf89d6413/metadata' elapsed=0.13 Thread-12::DEBUG::2017-01-11 15:10:00,307::check::327::storage.check::(_check_completed) '/dev/e6062875-dcf9-4401-8738-befc9629c2d5/metadata' elapsed=0.10 Thread-12::DEBUG::2017-01-11 15:10:00,345::check::327::storage.check::(_check_completed) '/dev/b819ee61-e907-441b-b3c1-9ff3ab1686d4/metadata' elapsed=0.04 Thread-12::DEBUG::2017-01-11 15:10:00,381::check::327::storage.check::(_check_completed) '/dev/1335e5e4-3c3e-4538-966e-0717098f52db/metadata' elapsed=0.05 Thread-12::DEBUG::2017-01-11 15:10:00,702::check::327::storage.check::(_check_completed) '/dev/9567770b-dfbc-489b-98f7-c7b2ca636e3d/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:10:02,268::check::327::storage.check::(_check_completed) '/dev/6e77516f-e074-426e-b375-64b6e22ac579/metadata' elapsed=0.06 Thread-12::DEBUG::2017-01-11 15:10:02,296::check::327::storage.check::(_check_completed) '/dev/8b1d2548-5c2b-46df-a1ac-4acf28ca99a8/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:10:05,873::check::327::storage.check::(_check_completed) '/dev/e70839af-77dd-40c0-a541-d364d30e859a/metadata' elapsed=0.03 Thread-12::DEBUG::2017-01-11 15:12:06,446::check::327::storage.check::(_check_completed) '/dev/6e77516f-e074-426e-b375-64b6e22ac579/metadata' elapsed=0.08 Thread-12::DEBUG::2017-01-11 15:18:06,707::check::327::storage.check::(_check_completed) '/dev/26ece5da-9e44-45c9-af7e-df074a3da9ac/metadata' elapsed=361.77 Thread-12::DEBUG::2017-01-11 15:18:06,711::check::327::storage.check::(_check_completed) '/dev/fb1ff10b-151f-4344-86d0-691c79acc16e/metadata' elapsed=361.75 Thread-12::DEBUG::2017-01-11 15:18:06,914::check::327::storage.check::(_check_completed) '/dev/6b50dac1-eba7-448d-bb8f-892f8cb41197/metadata' elapsed=361.92 Thread-12::DEBUG::2017-01-11 15:18:07,048::check::327::storage.check::(_check_completed) '/dev/4fa1053f-ca53-4e0d-b33f-5e10443d0f76/metadata' elapsed=362.04 Thread-12::DEBUG::2017-01-11 15:20:09,530::check::327::storage.check::(_check_completed) '/dev/b4d27568-7c66-4447-8330-f145f886fded/metadata' elapsed=484.50 Thread-12::DEBUG::2017-01-11 15:20:47,848::check::327::storage.check::(_check_completed) '/dev/60afa0bc-b022-4e71-8687-0e5afadd0ef2/metadata' elapsed=522.79 Thread-12::DEBUG::2017-01-11 15:21:20,545::check::327::storage.check::(_check_completed) '/dev/8890fd59-45b3-46ac-88d5-3020ec71bb92/metadata' elapsed=555.45 Thread-12::DEBUG::2017-01-11 15:21:21,360::check::327::storage.check::(_check_completed) '/dev/ed3abca9-742e-4b83-891b-958c03aa3fcb/metadata' elapsed=556.17 Thread-12::DEBUG::2017-01-11 15:21:22,094::check::327::storage.check::(_check_completed) '/dev/9b941172-1b62-4e3c-9af4-168c90140ffe/metadata' elapsed=556.98 Thread-12::DEBUG::2017-01-11 15:21:22,097::check::327::storage.check::(_check_completed) '/dev/02393b5e-ae7b-42e2-95c7-863ad0ad7d29/metadata' elapsed=556.89 Thread-12::DEBUG::2017-01-11 15:21:22,098::check::327::storage.check::(_check_completed) '/dev/5b930223-96e2-4eb5-b8bd-575c349c880c/metadata' elapsed=556.84 Thread-12::DEBUG::2017-01-11 15:21:22,209::check::327::storage.check::(_check_completed) '/dev/7af85dde-01ef-442f-8475-4d96ab84e4c7/metadata' elapsed=556.92 Thread-12::DEBUG::2017-01-11 15:22:49,640::check::327::storage.check::(_check_completed) '/dev/f35bda70-0be1-4907-82b6-5c676ee83cbe/metadata' elapsed=644.31 Thread-12::DEBUG::2017-01-11 15:23:27,979::check::327::storage.check::(_check_completed) '/dev/e69c4f4c-a195-4c21-bd69-9d6a2a38abf8/metadata' elapsed=682.63 Thread-12::DEBUG::2017-01-11 15:24:15,172::check::327::storage.check::(_check_completed) '/dev/664039e4-30d8-4aea-8a40-9abdc36ea3f4/metadata' elapsed=729.78 Thread-12::DEBUG::2017-01-11 15:24:15,250::check::327::storage.check::(_check_completed) '/dev/d3341c6f-c11e-4edd-82f6-90dcbd08b8c9/metadata' elapsed=729.84 Thread-12::DEBUG::2017-01-11 15:24:15,298::check::327::storage.check::(_check_completed) '/dev/1a380869-8c8e-4cf3-9fc1-758addf7a30d/metadata' elapsed=729.84 Thread-12::DEBUG::2017-01-11 15:24:19,416::check::327::storage.check::(_check_completed) '/dev/1b26f8ac-10c6-4595-b0f0-95bdfd42f05c/metadata' elapsed=733.94 Thread-12::DEBUG::2017-01-11 15:24:22,984::check::327::storage.check::(_check_completed) '/dev/9017d145-4dab-44bb-864a-f4b4616a239f/metadata' elapsed=737.48 Thread-12::DEBUG::2017-01-11 15:24:35,552::check::327::storage.check::(_check_completed) '/dev/1198e513-bdc8-4d5f-8ee5-8e8dc30d309d/metadata' elapsed=750.00 Thread-12::DEBUG::2017-01-11 15:24:35,553::check::327::storage.check::(_check_completed) '/dev/640ac4d3-1e14-465a-9a72-cc2f2c4cfe26/metadata' elapsed=749.98 Thread-12::DEBUG::2017-01-11 15:24:35,553::check::327::storage.check::(_check_completed) '/dev/6e98b678-a955-49b8-aad7-e1e52e26db1f/metadata' elapsed=749.94 Thread-12::DEBUG::2017-01-11 15:24:35,553::check::327::storage.check::(_check_completed) '/dev/02d31cfc-f095-42e6-8396-d4dbebbb4fed/metadata' elapsed=749.90 Thread-12::DEBUG::2017-01-11 15:24:35,554::check::327::storage.check::(_check_completed) '/dev/4b23a421-5c1f-4541-a007-c93b7af4986b/metadata' elapsed=749.88 Thread-12::DEBUG::2017-01-11 15:24:35,554::check::327::storage.check::(_check_completed) '/dev/5d8d49e2-ce0e-402e-9348-94f9576e2e28/metadata' elapsed=749.85 Thread-12::DEBUG::2017-01-11 15:24:35,555::check::327::storage.check::(_check_completed) '/dev/9fcbb7b1-a13a-499a-a534-119360d57f00/metadata' elapsed=749.83 Thread-12::DEBUG::2017-01-11 15:24:35,555::check::327::storage.check::(_check_completed) '/dev/a25ded63-2c31-4f1d-a65a-5390e47fda99/metadata' elapsed=749.78 Thread-12::DEBUG::2017-01-11 15:24:35,555::check::327::storage.check::(_check_completed) '/dev/f6a91d2f-ccae-4440-b1a7-f62ee750a58c/metadata' elapsed=749.76 Thread-12::DEBUG::2017-01-11 15:24:35,556::check::327::storage.check::(_check_completed) '/dev/bfb1d6b2-b610-4565-b818-ab6ee856e023/metadata' elapsed=749.73 Thread-12::DEBUG::2017-01-11 15:24:35,556::check::327::storage.check::(_check_completed) '/dev/84cfcb68-190f-4836-8294-d5752c07b762/metadata' elapsed=749.69 Thread-12::DEBUG::2017-01-11 15:24:35,556::check::327::storage.check::(_check_completed) '/dev/2204a85e-c8c7-4e1e-b8e6-e392645077c6/metadata' elapsed=749.67 Thread-12::DEBUG::2017-01-11 15:24:35,557::check::327::storage.check::(_check_completed) '/dev/7dfeac70-eaa1-4ba6-ad2a-e3c11564ee3b/metadata' elapsed=749.63 Thread-12::DEBUG::2017-01-11 15:24:35,557::check::327::storage.check::(_check_completed) '/dev/78e59ee0-13ac-4176-8950-837498ba6038/metadata' elapsed=749.61 Thread-12::DEBUG::2017-01-11 15:24:35,557::check::327::storage.check::(_check_completed) '/dev/b66a2944-a056-4a48-a3f9-83f509df5d1b/metadata' elapsed=749.56 Thread-12::DEBUG::2017-01-11 15:24:35,558::check::327::storage.check::(_check_completed) '/dev/da05d769-27c2-4270-9bba-5277bf3636e6/metadata' elapsed=749.54 Thread-12::DEBUG::2017-01-11 15:24:35,558::check::327::storage.check::(_check_completed) '/dev/819b51c0-96d7-43c2-b120-7adade60a2e2/metadata' elapsed=749.52 Thread-12::DEBUG::2017-01-11 15:24:35,559::check::327::storage.check::(_check_completed) '/dev/24499abc-0e16-48a2-8512-6c34e99dfa5f/metadata' elapsed=749.48 Thread-12::DEBUG::2017-01-11 15:24:35,559::check::327::storage.check::(_check_completed) '/dev/6b39fdce-38ac-4d36-8e58-300c7082a9c0/metadata' elapsed=749.45 Thread-12::DEBUG::2017-01-11 15:24:35,559::check::327::storage.check::(_check_completed) '/dev/1bde2522-d1dd-414f-9c48-c8a961071689/metadata' elapsed=749.41 Thread-12::DEBUG::2017-01-11 15:24:35,560::check::327::storage.check::(_check_completed) '/dev/0d2ad817-56b5-413e-886b-1be1777ed99a/metadata' elapsed=749.39 Thread-12::DEBUG::2017-01-11 15:24:35,560::check::327::storage.check::(_check_completed) '/dev/e54f3d6e-4790-430c-9fd9-fd6bf89d6413/metadata' elapsed=749.34 Thread-12::DEBUG::2017-01-11 15:24:35,560::check::327::storage.check::(_check_completed) '/dev/e6062875-dcf9-4401-8738-befc9629c2d5/metadata' elapsed=749.33 Thread-12::DEBUG::2017-01-11 15:24:35,561::check::327::storage.check::(_check_completed) '/dev/b819ee61-e907-441b-b3c1-9ff3ab1686d4/metadata' elapsed=749.28 Thread-12::DEBUG::2017-01-11 15:24:35,561::check::327::storage.check::(_check_completed) '/dev/1335e5e4-3c3e-4538-966e-0717098f52db/metadata' elapsed=749.26 Thread-12::DEBUG::2017-01-11 15:24:35,561::check::327::storage.check::(_check_completed) '/dev/9567770b-dfbc-489b-98f7-c7b2ca636e3d/metadata' elapsed=749.22 Thread-12::DEBUG::2017-01-11 15:24:36,207::check::327::storage.check::(_check_completed) '/dev/8b1d2548-5c2b-46df-a1ac-4acf28ca99a8/metadata' elapsed=749.80 Thread-12::DEBUG::2017-01-11 15:24:36,210::check::327::storage.check::(_check_completed) '/dev/e70839af-77dd-40c0-a541-d364d30e859a/metadata' elapsed=749.78 Thread-12::DEBUG::2017-01-11 15:24:36,211::check::327::storage.check::(_check_completed) '/dev/6e77516f-e074-426e-b375-64b6e22ac579/metadata' elapsed=0.65 Thread-12::DEBUG::2017-01-11 15:24:36,264::check::327::storage.check::(_check_completed) '/dev/26ece5da-9e44-45c9-af7e-df074a3da9ac/metadata' elapsed=0.67 Thread-12::DEBUG::2017-01-11 15:24:36,264::check::327::storage.check::(_check_completed) '/dev/fb1ff10b-151f-4344-86d0-691c79acc16e/metadata' elapsed=0.67 Thread-12::DEBUG::2017-01-11 15:24:36,265::check::327::storage.check::(_check_completed) '/dev/6b50dac1-eba7-448d-bb8f-892f8cb41197/metadata' elapsed=0.62 Thread-12::DEBUG::2017-01-11 15:24:36,265::check::327::storage.check::(_check_completed) '/dev/4fa1053f-ca53-4e0d-b33f-5e10443d0f76/metadata' elapsed=0.60 Thread-12::DEBUG::2017-01-11 15:24:36,265::check::327::storage.check::(_check_completed) '/dev/b4d27568-7c66-4447-8330-f145f886fded/metadata' elapsed=0.55 Thread-12::DEBUG::2017-01-11 15:24:36,266::check::327::storage.check::(_check_completed) '/dev/60afa0bc-b022-4e71-8687-0e5afadd0ef2/metadata' elapsed=0.53 Thread-12::DEBUG::2017-01-11 15:24:36,266::check::327::storage.check::(_check_completed) '/dev/8890fd59-45b3-46ac-88d5-3020ec71bb92/metadata' elapsed=0.48 Thread-12::DEBUG::2017-01-11 15:24:36,267::check::327::storage.check::(_check_completed) '/dev/9b941172-1b62-4e3c-9af4-168c90140ffe/metadata' elapsed=0.46 Thread-12::DEBUG::2017-01-11 15:24:36,267::check::327::storage.check::(_check_completed) '/dev/ed3abca9-742e-4b83-891b-958c03aa3fcb/metadata' elapsed=0.42 Looking in vdsm at the time when this issue started, we can see: 1. Thread-12 starting read check for domain 6e77516f-e074-426e-b375-64b6e22ac579 Thread-12::DEBUG::2017-01-11 15:12:06,365::check::296::storage.check::(_start_process) START check '/dev/6e77516f-e074-426e-b375-64b6e22ac579/metadata' cmd=['/usr/bin/taskset', '--cpu-list', '0-47', '/usr/bin/dd', 'if=/dev/6e77516f-e074-426e-b375-64b6e22ac579/metadata', 'of=/dev/null', 'bs=4096', 'count=1', 'iflag=direct'] delay=114.16 2. Thread-12 finished checking domain 6e77516f-e074-426e-b375-64b6e22ac579 Thread-12::DEBUG::2017-01-11 15:12:06,446::check::327::storage.check::(_check_completed) FINISH check '/dev/6e77516f-e074-426e-b375-64b6e22ac579/metadata' rc=0 err=bytearray(b'1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000452604 s, 9.0 MB/s\n') elapsed=0.08 The next thing this thread should do is to acquire the host id, but Thread-24 tried first, and is blocking now Thread-12 (there no logs from this thread until 15:15:25). 3. Thread 24 trying to acquire host id for domain 6e77516f-e074-426e-b375-64b6e22ac579: Thread-24::INFO::2017-01-11 15:12:06,491::clusterlock::236::Storage.SANLock::(acquireHostId) Acquiring host id for domain 6e77516f-e074-426e-b375-64b6e22ac579 (id: 1) 4. This should take no time, as we are using async=True, but it took 3 minutes: Thread-24::DEBUG::2017-01-11 15:15:25,011::clusterlock::254::Storage.SANLock::(acquireHostId) Host id for domain 6e77516f-e074-426e-b375-64b6e22ac579 successfully acquired (id: 1) 5. Thread-12 trying to acquire the host id for same domain: Thread-12::INFO::2017-01-11 15:15:25,012::clusterlock::236::Storage.SANLock::(acquireHostId) Acquiring host id for domain 6e77516f-e074-426e-b375-64b6e22ac579 (id: 1) This is a bug, only one of the threads should try to take acquire the host id, but normally this operation is very fast so it does not matter, but here it took 3 minutes: 6. Thread 12 finish to acquire host id Thread-12::DEBUG::2017-01-11 15:18:06,707::clusterlock::254::Storage.SANLock::(acquireHostId) Host id for domain 6e77516f-e074-426e-b375-64b6e22ac579 successfully acquired (id: 1) Thread-12 is responsible for checking readability of the storage domain, it should never block. Here it was blocked for 6 minutes, this cause host to become non-operational. The root cause of these issues is sanlock blocking for long time when acquiring host id in async mode. There is no information in sanlock log why async operation was blocked for long time. Can you file a bug about this, and attach the log you sent in this thread? https://bugzilla.redhat.com/enter_bug.cgi?product=vdsm Next steps: 1. Please enable debug logging in sanlock log: edit /etc/sanlock/sanlock.conf and set: logfile_priority = 7 Move the host to maintenance and restart sanlock This may give us more details in the next try, to understand why sanlock blocks for long time. 2. Try vdsm patch elimiting the delays in the monitoring thrad. You can try this patch, eliminating the delays in domain monitoring if sanlock blocks for long time when acquiring a host id. With this patch, only storage domains where acquiring the host id is slow will be affected. https://gerrit.ovirt.org/70450 Adding David (sanlock maintainer), he will probably add more info how to debug the unexpected blocking in sanlock.add_lockspace(). Please report the results with these changes. Nir