On 12/31/2015 08:48 AM, Yedidyah Bar David wrote:
On Wed, Dec 30, 2015 at 7:50 PM, John Florian
<jflorian(a)doubledog.org> wrote:
> On 12/29/2015 02:02 AM, Yedidyah Bar David wrote:
>> On Tue, Dec 29, 2015 at 12:51 AM, John Florian <jflorian(a)doubledog.org> wrote:
>>> I'm trying to run the engine-backup script via a Bacula job using the
>>> RunScript option so that the engine-backup dumps its output someplace
>>> where Bacula will collect it once engine-backup finishes. However the
>>> job is failing and with enough digging I eventually learned the script
>>> was writing the following in /tmp/hs_err_pid5789.log:
>>>
>>> #
>>> # There is insufficient memory for the Java Runtime Environment to continue.
>>> # Native memory allocation (mmap) failed to map 2555904 bytes for committing reserved memory.
>>> # Possible reasons:
>>> # The system is out of physical RAM or swap space
>>> # In 32 bit mode, the process size limit was hit
>>> # Possible solutions:
>>> # Reduce memory load on the system
>>> # Increase physical memory or swap space
>>> # Check if swap backing store is full
>>> # Use 64 bit Java on a 64 bit OS
>>> # Decrease Java heap size (-Xmx/-Xms)
>>> # Decrease number of Java threads
>>> # Decrease Java thread stack sizes (-Xss)
>>> # Set larger code cache with -XX:ReservedCodeCacheSize=
>>> # This output file may be truncated or incomplete.
>>> #
>>> # Out of Memory Error (os_linux.cpp:2627), pid=5789, tid=140709998221056
>>> #
>>> # JRE version: (8.0_65-b17) (build )
>>> # Java VM: OpenJDK 64-Bit Server VM (25.65-b01 mixed mode linux-amd64 compressed oops)
>>> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>>> #
>>>
>>>
>>> So is there any good way to reduce the Java heap size? I mean I know
>>> what -Xmx does, but where might I try setting it, ideally so that it
>>> affects the engine-backup only? Any idea of good setting for a very
>>> small environment with a dozen VMs?
>> engine-backup does not directly call nor need java.
>>
>> AFAICS it only calls it indirectly as part of some other initialization
>> by running java-home [1], which is a script that decides what JAVA_HOME
>> to use for the engine. This script only runs 'java -version', which imo
>> should not need that much memory. Perhaps there is something else I do
>> not fully understand, such as bacula severely limiting available resources
>> for the process it runs, or something like that.
>>
>> If you only want to debug it, and not as a recommended final solution,
>> you can create a script [2] which only outputs the needed java home.
>> Simply run [1] and make [2] echo the same thing. If [2] exists, [1] will
>> only run it and nothing else, as you can see inside it.
>>
>> I do not think this will work - quite likely engine-backup will fail
>> shortly later, if indeed it gets access to so little memory. Please
>> report back. Thanks and good luck,
>>
>> [1] /usr/share/ovirt-engine/bin/java-home
>> [2] /usr/share/ovirt-engine/bin/java-home.local
> Thanks for the info and response Didi. Doing the above did allow the
> backup to run successfully.
OK.
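For anyone else who wants to try the same debugging workaround, here is a
rough sketch of it. The helper name make_java_home_local and the chmod step
are my own assumptions, not something shipped with the engine; the only
parts taken from this thread are the two paths [1] and [2] and the idea
that [2] should echo whatever [1] prints.

```shell
# Sketch only: write a java-home.local style script that echoes a fixed
# Java home. make_java_home_local is a hypothetical helper name.
# Usage: make_java_home_local JAVA_HOME DEST
make_java_home_local() {
    local java_home="$1" dest="$2"
    # The generated script does nothing but print the given path.
    printf '#!/bin/sh\necho "%s"\n' "$java_home" > "$dest"
    # Assumption: the caller expects the script to be executable.
    chmod +x "$dest"
}
```

Run from a normal root shell (not from Bacula), before [2] exists, this
would be used as:
make_java_home_local "$(/usr/share/ovirt-engine/bin/java-home)" /usr/share/ovirt-engine/bin/java-home.local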
> I had also replaced the Bacula RunScript
> with "bash -c ulimit" which reported unlimited but I don't play with
> those types of limits enough to know if that's correctly reporting to
> what engine-backup is constrained.
And was this enough?
> It did occur to me that perhaps a
> better way to learn of any such constraints would be to query Bacula's
> file daemon (the only necessary Bacula component running on client
> systems that are getting backed up) since I suspect it must be this
> component that's actually spawning the RunScript client side. From the
> Bacula Director (server side) I queried the status of the client which
> is my oVirt engine and it reports:
>
> europa.doubledog.org-fd Version: 5.2.13 (19 February 2013)
> x86_64-redhat-linux-gnu redhat (Core)
> Daemon started 28-Dec-15 16:08. Jobs: run=2 running=0.
> Heap: heap=32,768 smbytes=190,247 max_bytes=1,599,864 bufs=100
> max_bufs=6,758
> Sizeof: boffset_t=8 size_t=8 debug=0 trace=0
>
> Alas, I know of no way to increase any of the bacula-fd limits. If I
> dead-end here, perhaps I'll query the Bacula mailing lists.
For both yourself and for others, I think it's best to continue with
this route.
Also note that I have no idea how much memory pg_dump might need on
a larger database, also including dwh which tends to get larger faster
than the engine's.
>
> Meanwhile I tried the following for a more permanent solution but this
> failed same as before:
>
> # diff -u java-home.orig-3.6.1.3 java-home
> --- java-home.orig-3.6.1.3 2015-12-10 13:07:44.000000000 -0500
> +++ java-home 2015-12-30 12:12:45.779462769 -0500
> @@ -13,7 +13,7 @@
> local ret=1
>
> if [ -x "${dir}/bin/java" ]; then
> - local version="$("${dir}/bin/java" -version 2>&1 | sed \
> + local version="$("${dir}/bin/java" -Xmx 8 -version 2>&1 | sed \
> -e 's/^openjdk version "1\.8\.0.*/VERSION_OK/' \
> -e 's/^java version "1\.7\.0.*/VERSION_OK/' \
> -e 's/^OpenJDK .*(.*).*/VENDOR_OK/' \
No idea here, you might try passing other options, and/or strace/valgrind/etc,
and/or monitor with other (including java-specific) tools, etc., and/or ask
Java experts (I am not one). Adding Juan.
I believe that this isn't really a memory problem, as the amount of
memory that the Java virtual machine is requesting is very small, less
than 3 MiB. It is probably related to the fact that the Bacula daemon
that runs the script runs in its own SELinux "bacula_t" context. You can
quickly verify this by temporarily disabling SELinux, trying to perform
the backup, and then enabling it again:
# setenforce 0
# Perform the backup
# setenforce 1
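If you are worried about forgetting the last step, the three steps above
can be wrapped in a small function. This is only a sketch: run_permissive
and the SETENFORCE override are hypothetical names I made up (the override
exists so the function can be dry-run without root), and it assumes the
system was in enforcing mode to begin with.

```shell
# Sketch only: run a command with SELinux temporarily permissive, then
# switch back to enforcing even if the command fails.
# SETENFORCE is a hypothetical override, e.g. SETENFORCE=echo for a dry run.
# Usage: run_permissive COMMAND [ARGS...]
run_permissive() {
    local setenforce="${SETENFORCE:-setenforce}"
    local rc=0
    "$setenforce" 0 || return 1   # switch to permissive mode
    "$@" || rc=$?                 # run the command, remember its exit status
    "$setenforce" 1               # assumption: the system was enforcing before
    return "$rc"
}
```

This only helps for a test started from a root shell. To reproduce the
failure itself, the backup still has to be started by the Bacula daemon
while the system is permissive.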
You should also see a description of the problem in the
/var/log/audit/audit.log file. When I tried it I saw this:
type=AVC msg=audit(1451571576.334:336): avc: denied { execmem } for
pid=4622 comm="java" scontext=system_u:system_r:bacula_t:s0
tcontext=system_u:system_r:bacula_t:s0 tclass=process
That message says that the Java virtual machine is trying to map an area
of memory that is both writeable and executable. That makes sense, it is
probably an area used by the HotSpot compiler, that generates code
during runtime. But this happens to be forbidden for the "bacula_t"
SELinux context.
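To pick such records out of a busy audit log, a filter along these lines
may help. It is a sketch, not a polished tool: avc_denials_for is a
hypothetical helper name, and it assumes each record sits on a single line
in the log (the message above is wrapped only by my mail client).

```shell
# Sketch only: print AVC denials whose source context has the given
# SELinux type. avc_denials_for is a hypothetical helper name.
# Usage: avc_denials_for TYPE [LOGFILE]
avc_denials_for() {
    local type="$1"
    local log="${2:-/var/log/audit/audit.log}"
    # Keep AVC records, then denials, then those with a matching scontext type.
    grep 'type=AVC' "$log" | grep 'denied' | grep "scontext=[^ ]*:${type}:"
}
```

For example, "avc_denials_for bacula_t" run as root should list the execmem
denial shown above. The ausearch tool from the audit package can do a
similar job, for instance "ausearch -m AVC -c java".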
You have several alternatives here. The more drastic one is to disable
SELinux permanently, setting the SELINUX variable in /etc/selinux/config
to permissive or disabled. This is a bad idea in general, and if I
remember correctly oVirt doesn't work well with SELinux disabled.
You can also just disable SELinux for the bacula daemon, removing the
"bacula" policy module, and then restarting it:
# semodule -r bacula
# systemctl restart bacula-fd
This isn't a good idea either, as it removes the "bacula.pp" file,
which isn't a configuration file, so it will come back when you upgrade
the SELinux RPMs.
Another thing you can do is set only the "bacula_t" type to permissive:
# semanage permissive -a bacula_t
This service won't then enjoy the SELinux protection, but the others
will. This is probably the better choice.
Finally, you can also create your own policy module that allows the
"bacula_t" context to perform the "execmem" operation. The easiest way
to do this is to use the "audit2allow" tool, which generates the policy
module from the audit log:
# audit2allow -M mypolicy <<.
type=AVC msg=audit(1451571576.334:336): avc: denied { execmem } for pid=4622 comm="java" scontext=system_u:system_r:bacula_t:s0 tcontext=system_u:system_r:bacula_t:s0 tclass=process
.
This will generate a "mypolicy.pp" file that allows that operation. You
can then activate it like this:
# semodule -i mypolicy.pp
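For reference, the module generated from that single denial is small
enough to write by hand. The sketch below produces roughly the
type-enforcement source I would expect audit2allow to generate; the helper
name write_execmem_te is hypothetical, and the exact text audit2allow emits
may differ slightly between versions.

```shell
# Sketch only: write a type-enforcement source file that allows the
# bacula_t domain to use execmem. write_execmem_te is a hypothetical name.
# Usage: write_execmem_te MODULE_NAME OUTPUT_FILE
write_execmem_te() {
    local name="$1" out="$2"
    cat > "$out" <<EOF
module ${name} 1.0;

require {
	type bacula_t;
	class process execmem;
}

# Allow processes in the bacula_t domain to map writable+executable memory.
allow bacula_t self:process execmem;
EOF
}
```

After "write_execmem_te mypolicy mypolicy.te", the file can be compiled with
"checkmodule -M -m -o mypolicy.mod mypolicy.te" and packaged with
"semodule_package -o mypolicy.pp -m mypolicy.mod". Keep in mind that the
rule applies to everything the Bacula daemon runs, not only to Java.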
>
>
> If this script is merely checking the validity of the JRE/JDK, should
> it not be possible to have a test on the rpm details first and only
> proceed as it does now if that doesn't work? The current tests should
> work w/o much regard for how the JRE/JDK got installed, but if it was
> installed via rpm it seems a simpler test could be used as a shortcut.
Patches are welcome :-)
Note that current code is designed to be compatible with many environments,
including different el/fedora versions, upgrades inside them etc., and
the $0.local was added mainly to allow supporting other systems (including
gentoo) where $0.local will also be shipped/packaged by the distribution.
Obviously we can add similar patches to make it even more complex, but as
I wrote above, not sure it's worth it - because if memory is your only
problem, you might simply postpone it this way.
Best,
--