Hi Sahina,
Many thanks for your response and apologies for my delay in getting back to you.
> How was the schedule created - is this using the Remote Data Sync Setup under Storage domain?
oVirt is configured in ‘Gluster’ mode only, with no VM support. When snapshotting, we take a snapshot of the full Gluster volume.
To configure the snapshot schedule, I did the following:
1. Logged in to the oVirt web UI.
2. From the left-hand menu, selected ‘Storage’ and then ‘Volumes’.
3. Selected the volume I wanted to snapshot by clicking its link in the ‘Name’ column.
4. Selected the ‘Snapshots’ tab.
5. From the top menu options, opened the ‘Snapshot’ drop-down.
6. Selected ‘New’ from the drop-down options.
7. A new window titled ‘Create/Schedule Snapshot’ appeared.
8. Entered a snapshot prefix and description into the available fields, then selected the ‘Schedule’ page.
9. On the ‘Schedule’ page, selected ‘Minute’ from the ‘Recurrence’ drop-down.
10. Set ‘Interval’ to every ‘30’ minutes.
11. Changed the timezone to ‘Europe/London=(GMT+00:00) London Standard Time’.
12. Left ‘Start Schedule by’ at its default value.
13. Set the schedule to ‘No End Date’.
14. Clicked ‘OK’.
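To check whether the scheduler has fired at all, it helps to know exactly when the schedule configured above should have run. The sketch below is a hypothetical model, not oVirt's scheduler code: it assumes a ‘Minute’ recurrence with an interval of N fires whenever the wall-clock minute is a multiple of N (as a cron `*/N` minute field would), and it works in UTC to match the engine log timestamps. Europe/London was on BST (UTC+1) in May 2018, but a whole-hour offset does not change which minutes a 30-minute schedule matches.

```python
from datetime import datetime, timedelta, timezone


def expected_fire_times(start, end, interval_minutes):
    """List the times a cron-style '*/N' minute schedule would fire in [start, end].

    Hypothetical model: a run happens at second 0 of every minute whose value is
    a multiple of interval_minutes (0, N, 2N, ... within each hour).
    """
    if not 0 < interval_minutes < 60:
        raise ValueError("interval_minutes must be between 1 and 59")
    # Round the start up to the next whole minute; runs happen at second 0.
    t = start.replace(second=0, microsecond=0)
    if t < start:
        t += timedelta(minutes=1)
    fires = []
    while t <= end:
        if t.minute % interval_minutes == 0:
            fires.append(t)
        t += timedelta(minutes=1)
    return fires


if __name__ == "__main__":
    # Illustrative window: the schedule was saved at 09:24:11Z according to the
    # engine log, and nothing had appeared roughly 2 hours later.
    saved = datetime(2018, 5, 14, 9, 24, 11, tzinfo=timezone.utc)
    for run in expected_fire_times(saved, saved + timedelta(hours=2), 30):
        print(run.isoformat())
```

Under this model, four runs (09:30, 10:00, 10:30 and 11:00 UTC) fall inside the 2-hour window, so several snapshots would be expected by the time the report was written.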
Interestingly, before I click OK, the ‘Create/Schedule Snapshot’ page shows the following messages:
Frequent creation of snapshots would overload the cluster
Gluster CLI based snapshot scheduling is enabled. It would be disabled once volume snapshots scheduled from UI.
This is surprising, as I have not enabled Gluster CLI based snapshot scheduling.
After clicking OK, I am returned to the volume's ‘Snapshots’ tab. From that point on, no snapshots are created according to the schedule.
When I click OK in the web UI to enable the schedule, the following appears in the engine log:
2018-05-14 09:24:11,068Z WARN [org.ovirt.engine.core.dal.job.ExecutionMessageDirector] (default task-128) [85d0b16f-2c0c-464f-bbf1-682c062a4871] The message key 'ScheduleGlusterVolumeSnapshot' is missing from 'bundles/ExecutionMessages'
2018-05-14 09:24:11,090Z INFO [org.ovirt.engine.core.bll.gluster.ScheduleGlusterVolumeSnapshotCommand] (default task-128) [85d0b16f-2c0c-464f-bbf1-682c062a4871] Before acquiring and wait lock 'EngineLock:{exclusiveLocks='[712da1df-4c11-405a-8fb6-f99aebc185c1=GLUSTER_SNAPSHOT]', sharedLocks=''}'
2018-05-14 09:24:11,090Z INFO [org.ovirt.engine.core.bll.gluster.ScheduleGlusterVolumeSnapshotCommand] (default task-128) [85d0b16f-2c0c-464f-bbf1-682c062a4871] Lock-wait acquired to object 'EngineLock:{exclusiveLocks='[712da1df-4c11-405a-8fb6-f99aebc185c1=GLUSTER_SNAPSHOT]', sharedLocks=''}'
2018-05-14 09:24:11,111Z INFO [org.ovirt.engine.core.bll.gluster.ScheduleGlusterVolumeSnapshotCommand] (default task-128) [85d0b16f-2c0c-464f-bbf1-682c062a4871] Running command: ScheduleGlusterVolumeSnapshotCommand internal: false. Entities affected : ID: 712da1df-4c11-405a-8fb6-f99aebc185c1 Type: GlusterVolumeAction group MANIPULATE_GLUSTER_VOLUME with role type ADMIN
2018-05-14 09:24:11,148Z INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-128) [85d0b16f-2c0c-464f-bbf1-682c062a4871] EVENT_ID: GLUSTER_VOLUME_SNAPSHOT_SCHEDULED(4,134), Snapshots scheduled on volume glustervol0 of cluster NOSS-LD5.
2018-05-14 09:24:11,156Z INFO [org.ovirt.engine.core.bll.gluster.ScheduleGlusterVolumeSnapshotCommand] (default task-128) [85d0b16f-2c0c-464f-bbf1-682c062a4871] Lock freed to object 'EngineLock:{exclusiveLocks='[712da1df-4c11-405a-8fb6-f99aebc185c1=GLUSTER_SNAPSHOT]', sharedLocks=''}'
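Rather than scanning the whole attached log by hand, the entries for one request can be pulled out by correlation ID (the bracketed 85d0b16f-2c0c-464f-bbf1-682c062a4871 value) or by a search term such as the volume name. The sketch below assumes the line layout shown in the excerpt above (timestamp, level, [logger], (thread), [correlation ID], message); `entries_for` and the script name are hypothetical, not part of oVirt.

```python
import re
import sys

# Layout of an engine.log entry as seen in the excerpt above:
# timestamp, level, [logger], (thread), [correlation id], message.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}Z)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+"
    r"(?P<msg>.*)$"
)


def entries_for(lines, needle):
    """Yield parsed entries whose logger, correlation id or message contains needle.

    Lines that do not match the layout (e.g. stack-trace continuations) are skipped.
    """
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and any(needle in m.group(k) for k in ("logger", "corr", "msg")):
            yield m.groupdict()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name):
    #   python engine_grep.py /var/log/ovirt-engine/engine.log glustervol0
    path, needle = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8", errors="replace") as fh:
        for e in entries_for(fh, needle):
            print(e["ts"], e["level"], e["msg"])
```

Filtering on glustervol0 for the period after the schedule was saved gives a quick view of whether the engine logged anything at all around the expected run times.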
> Could you please provide the engine.log from the time the schedule was set up and including the time the schedule was supposed to run?
The original log file is no longer present, so earlier today I removed the old schedule and created a new one, following the steps above. I have therefore attached today's engine log. The new schedule, which was set to run every 30 minutes, has not produced any snapshots after around 2 hours.
Please let me know if you require any further information.
Many thanks,
Mark Betham.
>
>
>
> On Thu, May 3, 2018 at 4:37 PM, Mark Betham <mark.betham(a)googlemail.com> wrote:
> Hi oVirt community,
>
> I am hoping you will be able to help with a problem I am experiencing when trying to schedule a snapshot of my Gluster volumes using the oVirt portal.
>
> Below is an overview of the environment:
>
> I have an oVirt instance managing our Gluster storage. We are running oVirt version "4.2.2.6-1.el7.centos", Gluster version "glusterfs-3.13.2-2.el7" on a base OS of "CentOS Linux release 7.4.1708 (Core)", kernel "3.10.0-693.21.1.el7.x86_64" and VDSM version "vdsm-4.20.23-1.el7.centos". All software is at the latest release and has been fully patched where necessary.
>
> oVirt has been installed and configured in "Gluster" mode only, with no virtualisation. The oVirt platform runs on one of the Gluster storage nodes.
>
> Gluster runs as 2 clusters, each located at a different physical site (UK and DE). Each storage cluster contains 3 storage nodes and a single Gluster volume, which is 3-way replicated. The Gluster volume runs on top of an LVM thin volume formatted with an XFS filesystem. Geo-replication runs between the 2 geo-diverse clusters.
>
> The host servers at the primary site each have 1 x Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz (4 cores, 8 threads with HT), 64GB RAM, an LSI MegaRAID SAS 9271 with BBU and cache, and 8 x SAS 10K 2.5" 1.8TB enterprise drives configured in a RAID 10 array, giving 6.52TB of usable space. The host servers at the secondary site each have 1 x Intel(R) Xeon(R) CPU E3-1271 v3 @ 3.60GHz (4 cores, 8 threads with HT), 32GB RAM, an LSI MegaRAID SAS 9260 with BBU and cache, and 8 x SAS 10K 2.5" 1.8TB enterprise drives configured in a RAID 10 array, giving 6.52TB of usable space. The secondary site is for DR use only.
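As a quick sanity check on the quoted capacity: RAID 10 mirrors pairs of drives, so usable space is half the raw total. A minimal sketch of the arithmetic, assuming the 1.8TB drive size is in decimal terabytes as drive vendors usually quote it:

```python
TB = 10**12   # decimal terabyte, as drive vendors quote capacity
TiB = 2**40   # binary tebibyte, as many OS tools report capacity


def raid10_usable_bytes(drive_count, drive_bytes):
    """RAID 10 stripes across mirrored pairs, so only half the drives add capacity."""
    if drive_count < 4 or drive_count % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return drive_count // 2 * drive_bytes


usable = raid10_usable_bytes(8, 1_800_000_000_000)  # 8 x 1.8TB drives
print(f"{usable / TB:.2f} TB = {usable / TiB:.2f} TiB")
```

It prints `7.20 TB = 6.55 TiB`, close to the 6.52TB figure above; the small remaining gap is not explained in the thread.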
>
> When I first started experiencing the issue and was unable to resolve it, I carried out a full rebuild from scratch across the two storage clusters. I had spent some time troubleshooting, but felt it worthwhile to ensure I had a clean platform, free of any potential issues left over from the earlier work. The platform was rebuilt and the data re-ingested. It is probably worth mentioning that this environment will become our new production platform; we will be migrating data and services to it from our existing Gluster storage cluster. The migration date is getting closer, so time is now limited and will not permit another full rebuild of the platform without impacting the delivery date.
>
> After the rebuild, with both storage clusters online, available and managed within the oVirt platform, I carried out some basic commissioning checks and found no issues. Next, I set up geo-replication. It came online without issues and data synchronised without any problems. The data re-ingestion was then started, and the new data was synchronised by geo-replication.
>
> The first step in bringing the snapshot schedule online was to validate that snapshots could be taken outside of the scheduler. Taking a manual snapshot via the oVirt portal worked without issue, and several were taken on both the primary and secondary clusters. A schedule was then created on the primary site cluster via the oVirt portal to snapshot the storage at hourly intervals. The schedule was created successfully; however, no snapshots were ever created. Examining the logs did not reveal anything I could link directly to the failing schedule, but it is quite possible I missed something.
>
How was the schedule created - is this using the Remote Data Sync Setup under Storage domain?
>
>
> I reviewed many online articles, bug reports and application manuals relating to snapshotting. Several support articles were loosely related, but none of their recommendations worked, and the manuals did not help either. I did find several references to running snapshots alongside geo-replication, stating that geo-replication should be paused while snapshots are created. So I removed all existing snapshot schedules, paused geo-replication and recreated the snapshot schedule. The schedule was never actioned and no snapshots were created. I then removed geo-replication entirely, removed all schedules and rebooted the entire platform. Once the system was fully back online with no pending heal operations, the schedule was re-added for the primary site only. The result was the same: no snapshots were created from the schedule.
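The pause, snapshot, resume approach described above can be written as an explicit command sequence for a manual test. This is a sketch of a manual equivalent, not what the oVirt scheduler runs internally; the slave host `de-node1`, the slave volume name and the snapshot name are hypothetical placeholders, and the commands are only printed unless `dry_run` is turned off.

```python
import subprocess


def snapshot_with_georep_paused(master_vol, slave_host, slave_vol, snap_name, dry_run=True):
    """Pause geo-replication, snapshot the master volume, then resume geo-replication.

    Returns the gluster CLI commands in the order they were issued.
    """
    session = [master_vol, f"{slave_host}::{slave_vol}"]
    pause = ["gluster", "volume", "geo-replication", *session, "pause"]
    snap = ["gluster", "snapshot", "create", snap_name, master_vol]
    resume = ["gluster", "volume", "geo-replication", *session, "resume"]
    issued = []

    def call(cmd):
        issued.append(cmd)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise if the command fails

    call(pause)
    try:
        call(snap)
    finally:
        # Resume even if the snapshot fails, so geo-replication is not left paused.
        call(resume)
    return issued


if __name__ == "__main__":
    # Hypothetical names: 'de-node1' and the slave volume are placeholders.
    snapshot_with_georep_paused("glustervol0", "de-node1", "glustervol0", "manual-snap")
```

The `try`/`finally` is the main design choice: a failed snapshot should not leave the geo-replication session paused.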
>
> I have now reached the point where I feel I need assistance, hence this email.
>
> If you require any further data, please let me know and I will do my best to get it for you.
>
Could you please provide the engine.log from the time the schedule was set up and including the time the schedule was supposed to run?
>
>
>
> Any help you can give would be greatly appreciated.
>
> Many thanks,
>
> Mark Betham
>
> _______________________________________________
> Users mailing list
> Users(a)ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>