On Wed, Dec 30, 2015 at 7:50 PM, John Florian <jflorian(a)doubledog.org> wrote:
> On 12/29/2015 02:02 AM, Yedidyah Bar David wrote:
>> On Tue, Dec 29, 2015 at 12:51 AM, John Florian <jflorian(a)doubledog.org> wrote:
>>> I'm trying to run the engine-backup script via a Bacula job, using the
>>> RunScript option so that engine-backup dumps its output someplace
>>> where Bacula will collect it once engine-backup finishes. However, the
>>> job is failing, and with enough digging I eventually learned that the
>>> script was writing the following to /tmp/hs_err_pid5789.log:
>>>
>>> #
>>> # There is insufficient memory for the Java Runtime Environment to continue.
>>> # Native memory allocation (mmap) failed to map 2555904 bytes for committing reserved memory.
>>> # Possible reasons:
>>> # The system is out of physical RAM or swap space
>>> # In 32 bit mode, the process size limit was hit
>>> # Possible solutions:
>>> # Reduce memory load on the system
>>> # Increase physical memory or swap space
>>> # Check if swap backing store is full
>>> # Use 64 bit Java on a 64 bit OS
>>> # Decrease Java heap size (-Xmx/-Xms)
>>> # Decrease number of Java threads
>>> # Decrease Java thread stack sizes (-Xss)
>>> # Set larger code cache with -XX:ReservedCodeCacheSize=
>>> # This output file may be truncated or incomplete.
>>> #
>>> # Out of Memory Error (os_linux.cpp:2627), pid=5789, tid=140709998221056
>>> #
>>> # JRE version: (8.0_65-b17) (build )
>>> # Java VM: OpenJDK 64-Bit Server VM (25.65-b01 mixed mode linux-amd64 compressed oops)
>>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>>> #
>>>
>>>
>>> So is there any good way to reduce the Java heap size? I mean, I know
>>> what -Xmx does, but where might I try setting it, ideally so that it
>>> affects only engine-backup? Any idea of a good setting for a very
>>> small environment with a dozen VMs?
>> engine-backup neither calls nor needs java directly.
>>
>> AFAICS it only calls it indirectly, as part of some other initialization,
>> by running java-home [1], which is a script that decides what JAVA_HOME
>> to use for the engine. This script only runs 'java -version', which IMO
>> should not need that much memory. Perhaps there is something else I do
>> not fully understand, such as Bacula severely limiting the resources
>> available to the process it runs, or something like that.
>>
>> If you only want to debug it, and not as a recommended final solution,
>> you can create a script [2] which only outputs the needed Java home.
>> Simply run [1] and make [2] echo the same thing. If [2] exists, [1] will
>> only run it and nothing else, as you can see inside it.
>>
>> I do not think this will work: quite likely engine-backup will fail
>> shortly afterwards, if it really gets access to so little memory. Please
>> report back. Thanks and good luck,
>>
>> [1] /usr/share/ovirt-engine/bin/java-home
>> [2] /usr/share/ovirt-engine/bin/java-home.local
> Thanks for the info and response, Didi. Doing the above did allow the
> backup to run successfully.
OK.
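For reference, the debugging workaround can be sketched as a small POSIX sh helper. This is a minimal sketch under assumptions: the name make_java_home_local and its optional directory argument are hypothetical, and it relies on java-home running $0.local when that file exists, as described above. Run it once as root from an interactive shell, where the normal `java -version` probe still works.

```sh
#!/bin/sh
# Hypothetical helper: freeze the JAVA_HOME that oVirt's java-home script
# detects, so later runs (e.g. from a Bacula RunScript) skip "java -version".
# Debugging aid only; remove java-home.local once the real cause is known.
make_java_home_local() {
    bindir=${1:-/usr/share/ovirt-engine/bin}
    # Run the normal detection once, from a context where the JVM can start.
    home=$("$bindir/java-home") || return 1
    [ -n "$home" ] || return 1
    # Write a replacement script that only prints the detected path.
    printf '#!/bin/sh\necho "%s"\n' "$home" > "$bindir/java-home.local" || return 1
    chmod 755 "$bindir/java-home.local"
}
```

Afterwards, /usr/share/ovirt-engine/bin/java-home should print the same path without starting a JVM. Delete java-home.local when done debugging, since the pinned path will not follow later JRE changes.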
> I had also replaced the Bacula RunScript command
> with "bash -c ulimit", which reported unlimited, but I don't play with
> those types of limits enough to know whether that correctly reports
> what engine-backup is constrained to.
And was this enough?
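Note that with no options, bash's `ulimit` reports only the file-size limit (`-f`), so its "unlimited" says nothing about memory or address-space limits. A minimal sketch of a check to run from the same RunScript, assuming a Linux host with /proc; the function name report_limits and the default output path are hypothetical:

```sh
#!/bin/sh
# Hypothetical RunScript payload: dump the limits this process inherited.
# A bare "ulimit" shows only the file-size limit; "ulimit -a" shows all of
# them, and /proc/self/limits adds units such as "Max address space ... bytes".
report_limits() {
    out=${1:-/tmp/runscript-limits.txt}
    {
        echo "== ulimit -a =="
        ulimit -a
        echo "== /proc/self/limits =="
        cat /proc/self/limits
        echo "== /proc/self/cgroup =="
        cat /proc/self/cgroup
    } > "$out" 2>&1
}
```

Comparing this file with the output of the same function run from an interactive root shell should show whether the job inherits tighter limits.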
> It did occur to me that perhaps a
> better way to learn of any such constraints would be to query Bacula's
> file daemon (the only Bacula component that needs to run on client
> systems being backed up), since I suspect it is this component that
> actually spawns the RunScript on the client side. From the Bacula
> Director (server side) I queried the status of the client, which is
> my oVirt engine, and it reports:
> europa.doubledog.org-fd Version: 5.2.13 (19 February 2013) x86_64-redhat-linux-gnu redhat (Core)
> Daemon started 28-Dec-15 16:08. Jobs: run=2 running=0.
> Heap: heap=32,768 smbytes=190,247 max_bytes=1,599,864 bufs=100 max_bufs=6,758
> Sizeof: boffset_t=8 size_t=8 debug=0 trace=0
> Alas, I know of no way to increase any of the bacula-fd limits. If I
> hit a dead end here, perhaps I'll ask on the Bacula mailing lists.
For your sake and for others', I think it's best to continue down this
route.
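Child processes normally inherit the resource limits of the process that spawns them, so the file daemon's own limits can also be read from outside the job. A sketch assuming Linux /proc; proc_limits is a hypothetical helper name:

```sh
#!/bin/sh
# Hypothetical helper: print the resource limits of a running process,
# e.g. the Bacula file daemon that spawns RunScript commands.
proc_limits() {
    pid=$1
    # An empty or stale PID leaves nothing readable; report failure.
    [ -r "/proc/$pid/limits" ] || return 1
    cat "/proc/$pid/limits"
}
```

For example, `proc_limits "$(pgrep -o -x bacula-fd)"`. On a systemd host, limits applied to the daemon would normally come from its unit; `systemctl show bacula-fd -p LimitAS -p LimitDATA` displays two relevant properties, assuming the unit is named bacula-fd.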
Also note that I have no idea how much memory pg_dump might need on
a larger database, including the dwh database, which tends to grow
faster than the engine's.
> Meanwhile, I tried the following as a more permanent solution, but it
> failed the same way as before:
> # diff -u java-home.orig-3.6.1.3 java-home
> --- java-home.orig-3.6.1.3 2015-12-10 13:07:44.000000000 -0500
> +++ java-home 2015-12-30 12:12:45.779462769 -0500
> @@ -13,7 +13,7 @@
>  local ret=1
>  if [ -x "${dir}/bin/java" ]; then
> - local version="$("${dir}/bin/java" -version 2>&1 | sed \
> + local version="$("${dir}/bin/java" -Xmx 8 -version 2>&1 | sed \
>  -e 's/^openjdk version "1\.8\.0.*/VERSION_OK/' \
>  -e 's/^java version "1\.7\.0.*/VERSION_OK/' \
>  -e 's/^OpenJDK .*(.*).*/VENDOR_OK/' \
No idea here. You might try passing other options, and/or strace/valgrind/etc.,
and/or monitor with other (including Java-specific) tools, and/or ask
Java experts (I am not one). Adding Juan.
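On the patch itself: HotSpot expects the heap size attached to the option, with an optional k, m or g suffix (for example -Xmx8m). The launcher also treats the first argument that is not an option as the main class and passes everything after it to that class, so in `-Xmx 8 -version` the bare -Xmx is rejected as an invalid heap size, "8" becomes a class name, and -version never reaches the launcher. A sketch of the probe with valid syntax follows; the helper name and the 16m value are assumptions, and since the log reports a failed native mmap of about 2.5 MB, it is not certain that a smaller heap helps at all.

```sh
#!/bin/sh
# Hypothetical helper: run a JVM version probe with a capped heap.
# JVM options go before any non-option argument, and the size is attached
# to -Xmx with a unit suffix ("-Xmx16m"), unlike the "-Xmx 8" in the patch.
# 16m is an illustrative guess, not a tested minimum.
probe_java_version() {
    java_bin=${1:-java}
    "$java_bin" -Xmx16m -version 2>&1
}
```

In java-home, the equivalent change would put `-Xmx16m` (or a similar value) directly before `-version` in the existing pipeline into sed.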
> If this script is merely checking the validity of the JRE/JDK, shouldn't
> it be possible to test the rpm details first, and only proceed as it
> does now if that doesn't work? The current tests work without much
> regard for how the JRE/JDK got installed, but if it was installed via
> rpm, it seems a simpler test could be used as a shortcut.
Patches are welcome :-)
Note that the current code is designed to be compatible with many environments,
including different EL/Fedora versions, upgrades within them, etc., and
$0.local was added mainly to allow supporting other systems (including
Gentoo), where $0.local will also be shipped/packaged by the distribution.
Obviously we can add similar patches to make it even more complex, but as
I wrote above, I'm not sure it's worth it, because if memory is your only
problem, you might simply be postponing it this way.
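A minimal sketch of the rpm-based shortcut proposed above, as a hypothetical helper that is not part of oVirt: it asks rpm which package owns "$dir/bin/java" and accepts OpenJDK 1.7/1.8 package names, roughly mirroring the versions the existing sed checks accept. The package-name patterns are assumptions and narrower than the current checks. Any other result, including rpm being unable to map the path (for example through alternatives symlinks), returns 1 so the caller can fall back to the existing `java -version` probe, which keeps non-rpm systems working.

```sh
#!/bin/sh
# Hypothetical shortcut: decide from rpm metadata alone, without starting
# a JVM. Returns 0 if an accepted OpenJDK package owns "$1/bin/java",
# and 1 otherwise, so the caller can fall back to "java -version".
rpm_java_ok() {
    dir=$1
    command -v rpm >/dev/null 2>&1 || return 1
    # Name of the owning package, e.g. java-1.8.0-openjdk-headless.
    # rpm exits non-zero when no package owns the file.
    name=$(rpm -qf --queryformat '%{NAME}\n' "$dir/bin/java" 2>/dev/null) || return 1
    case $name in
        java-1.8.0-openjdk*|java-1.7.0-openjdk*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Whether such an extra code path is worth its maintenance cost across all supported distributions is the trade-off discussed above.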
Best,
--
Didi