[ovirt-users] Can I reduce the Java heap size of engine-backup???

John Florian jflorian at doubledog.org
Thu Dec 31 16:41:03 UTC 2015


On 12/31/2015 10:42 AM, Juan Hernández wrote:
> On 12/31/2015 08:48 AM, Yedidyah Bar David wrote:
>> On Wed, Dec 30, 2015 at 7:50 PM, John Florian <jflorian at doubledog.org> wrote:
>>> On 12/29/2015 02:02 AM, Yedidyah Bar David wrote:
>>>> On Tue, Dec 29, 2015 at 12:51 AM, John Florian <jflorian at doubledog.org> wrote:
>>>>> I'm trying to run the engine-backup script via a Bacula job using the
>>>>> RunScript option so that the engine-backup dumps its output someplace
>>>>> where Bacula will collect it once engine-backup finishes.  However the
>>>>> job is failing and with enough digging I eventually learned the script
>>>>> was writing the following in /tmp/hs_err_pid5789.log:
>>>>>
>>>>> #
>>>>> # There is insufficient memory for the Java Runtime Environment to continue.
>>>>> # Native memory allocation (mmap) failed to map 2555904 bytes for
>>>>> committing reserved memory.
>>>>> # Possible reasons:
>>>>> #   The system is out of physical RAM or swap space
>>>>> #   In 32 bit mode, the process size limit was hit
>>>>> # Possible solutions:
>>>>> #   Reduce memory load on the system
>>>>> #   Increase physical memory or swap space
>>>>> #   Check if swap backing store is full
>>>>> #   Use 64 bit Java on a 64 bit OS
>>>>> #   Decrease Java heap size (-Xmx/-Xms)
>>>>> #   Decrease number of Java threads
>>>>> #   Decrease Java thread stack sizes (-Xss)
>>>>> #   Set larger code cache with -XX:ReservedCodeCacheSize=
>>>>> # This output file may be truncated or incomplete.
>>>>> #
>>>>> #  Out of Memory Error (os_linux.cpp:2627), pid=5789, tid=140709998221056
>>>>> #
>>>>> # JRE version:  (8.0_65-b17) (build )
>>>>> # Java VM: OpenJDK 64-Bit Server VM (25.65-b01 mixed mode linux-amd64
>>>>> compressed oops)
>>>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>>>> #
>>>>>
>>>>>
>>>>> So is there any good way to reduce the Java heap size?  I mean I know
>>>>> what -Xmx does, but where might I try setting it, ideally so that it
>>>>> affects the engine-backup only?  Any idea of good setting for a very
>>>>> small environment with a dozen VMs?
>>>> engine-backup does not directly call nor need java.
>>>>
>>>> AFAICS it only calls it indirectly as part of some other initialization
>>>> by running java-home [1], which is a script that decides what JAVA_HOME
>>>> to use for the engine. This script only runs 'java -version', which imo
>>>> should not need that much memory. Perhaps there is something else I do
>>>> not fully understand, such as bacula severely limiting available resources
>>>> for the process it runs, or something like that.
>>>>
>>>> If you only want to debug it, and not as a recommended final solution,
>>>> you can create a script [2] which only outputs the needed java home.
>>>> Simply run [1] and make [2] echo the same thing. If [2] exists, [1] will
>>>> only run it and nothing else, as you can see inside it.
>>>>
>>>> I do not think this will work - quite likely engine-backup will fail
>>>> shortly later, if indeed it gets access to so little memory. Please
>>>> report back. Thanks and good luck,
>>>>
>>>> [1] /usr/share/ovirt-engine/bin/java-home
>>>> [2] /usr/share/ovirt-engine/bin/java-home.local
>>> Thanks for the info and response Didi.  Doing the above did allow the
>>> backup to run successfully.
>> OK.
>>
>>>  I had also replaced the Bacula RunScript
>>> with "bash -c ulimit" which reported unlimited but I don't play with
>>> those types of limits enough to know if that's correctly reporting to
>>> what engine-backup is constrained.
>> And was this enough?
>>
>>>  It did occur to me that perhaps a
>>> better way to learn of any such constraints would be to query Bacula's
>>> file daemon (the only necessary Bacula component running on client
>>> systems that are getting backed up) since I suspect it must be this
>>> component that's actually spawning the RunScript client side.  From the
>>> Bacula Director (server side) I queried the status of the client which
>>> is my oVirt engine and it reports:
>>>
>>> europa.doubledog.org-fd Version: 5.2.13 (19 February 2013)
>>> x86_64-redhat-linux-gnu redhat (Core)
>>> Daemon started 28-Dec-15 16:08. Jobs: run=2 running=0.
>>>  Heap: heap=32,768 smbytes=190,247 max_bytes=1,599,864 bufs=100
>>> max_bufs=6,758
>>>  Sizeof: boffset_t=8 size_t=8 debug=0 trace=0
>>>
>>> Alas, I know of no way to increase any of the bacula-fd limits.  If I
>>> dead-end here, perhaps I'll query the Bacula mailing lists.
>> For both yourself and for others, I think it's best to continue with
>> this route.
>>
>> Also note that I have no idea how much memory pg_dump might need on
>> a larger database, also including dwh which tends to get larger faster
>> than the engine's.
>>
>>> Meanwhile I tried the following for a more permanent solution but this
>>> failed same as before:
>>>
>>> # diff -u java-home.orig-3.6.1.3 java-home
>>> --- java-home.orig-3.6.1.3      2015-12-10 13:07:44.000000000 -0500
>>> +++ java-home   2015-12-30 12:12:45.779462769 -0500
>>> @@ -13,7 +13,7 @@
>>>         local ret=1
>>>
>>>         if [ -x "${dir}/bin/java" ]; then
>>> -               local version="$("${dir}/bin/java" -version 2>&1 | sed \
>>> +               local version="$("${dir}/bin/java" -Xmx 8 -version 2>&1 | sed \
>>>                         -e 's/^openjdk version "1\.8\.0.*/VERSION_OK/' \
>>>                         -e 's/^java version "1\.7\.0.*/VERSION_OK/' \
>>>                         -e 's/^OpenJDK .*(.*).*/VENDOR_OK/' \
>> No idea here, you might try passing other options, and/or strace/valgrind/etc,
>> and/or monitor with other (including java-specific) tools, etc., and/or ask
>> Java experts (I am not one). Adding Juan.
>>
> I believe that this isn't really a memory problem, as the amount of
> memory that the Java virtual machine is requesting is very small, less
> than 3 MiB. It is probably related to the fact that the Bacula daemon
> that runs the script runs in its own SELinux "bacula_t" context. You can
> quickly verify this by temporarily disabling SELinux, trying to perform
> the backup, and then enabling it again:
>
>   # setenforce 0
>   # Perform the backup
>   # setenforce 1
>
> You should also see a description of the problem in the
> /var/log/audit/audit.log file. When I tried it I saw this:
>
>   type=AVC msg=audit(1451571576.334:336): avc:  denied  { execmem } for
>  pid=4622 comm="java" scontext=system_u:system_r:bacula_t:s0
> tcontext=system_u:system_r:bacula_t:s0 tclass=process
>
> That message says that the Java virtual machine is trying to map an area
> of memory that is both writeable and executable. That makes sense, it is
> probably an area used by the HotSpot compiler, that generates code
> during runtime. But this happens to be forbidden for the "bacula_t"
> SELinux context.

Bingo!  I almost discovered this last night.  My original RunScript sent
the output of engine-backup to /tmp for simplicity but my Bacula file
set ignores /tmp so I had to target elsewhere.  That led to AVCs and I
dug into the policy to discover that /var/bacula would be an acceptable,
writable location per SEL policy and still be included in my file set. 
It had occurred to me at that time that perhaps SEL was interfering with
the engine-backup also but I failed to go back and look for that.
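For anyone retracing this, a minimal sketch of how the audit log could be filtered for denials coming from the bacula_t domain. The function name bacula_denials is made up for illustration, and it assumes each AVC record sits on a single line in the log, as it does in /var/log/audit/audit.log (the record quoted in this thread was wrapped by the mail client).

```shell
# Hypothetical helper: list AVC denials whose source context is bacula_t.
# $1 is the audit log to read (defaults to /var/log/audit/audit.log).
# Prints one line per denial: the command name, then the denied permissions.
bacula_denials() {
    grep 'avc: *denied' "${1:-/var/log/audit/audit.log}" \
        | grep 'scontext=[^ ]*:bacula_t:' \
        | sed -n 's/.*denied *{ *\([^}]*[^ }]\) *}.*comm="\([^"]*\)".*/\2 \1/p'
}
```

Run as root with no argument to read the live log, or pass a saved copy of the log as the first argument.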

>
> You have several alternatives here. The more drastic one is to disable
> SELinux permanently, setting the SELINUX variable in /etc/selinux/config
> to permissive or disabled. This is a bad idea in general, and if I
> remember correctly oVirt doesn't work well with SELinux disabled.
>
> You can also just disable SELinux for the bacula daemon, removing the
> "bacula" policy module, and then restarting it:
>
>   # semodule -r bacula
>   # systemctl restart bacula-fd
>
> This isn't a good idea either, as it will remove the "bacula.pp" file,
> which isn't a configuration file and will come back when you upgrade the
> SELinux RPMs.
>
> Another thing you can do is set only the "bacula_t" type to permissive:
>
>   # semanage permissive -a bacula_t

Oh cool, I was unaware you could disable selectively like that.

>
> This service won't then enjoy the SELinux protection, but the others
> will. This is probably the better choice.
>
> Finally, you can also create your own policy module that allows the
> "execmem" operation for the "bacula_t" context. The easiest way to do this
> is to use the "audit2allow" tool, which generates the policy module from
> the audit log:
>
>   # audit2allow -M mypolicy <<.
> type=AVC msg=audit(1451571576.334:336): avc:  denied  { execmem } for
> pid=4622 comm="java" scontext=system_u:system_r:bacula_t:s0
> tcontext=system_u:system_r:bacula_t:s0 tclass=process
> .
>
> This will generate a "mypolicy.pp" file that allows that operation. You
> can then activate it like this:
>
>   # semodule -i mypolicy.pp

This is the route I went and it seems to work perfectly.  Thanks for the
excellent write up.  Now I'm glad I forgot to follow through on my SEL
investigation as I wouldn't have come up with so correct a solution. 
This seems like a good case for a new SE Boolean, so I've submitted:

https://bugs.centos.org/view.php?id=10052

I'm relatively new to CentOS so hopefully this will get addressed as
fast as most SEL issues reported for Fedora.
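For reference, a minimal sketch of what a module like this amounts to when written by hand instead of generated by audit2allow. The module name mypolicy comes from Juan's example; the two function names are made up for illustration, and the rule shown is my reading of the single execmem denial quoted above, not a dump of the generated file.

```shell
# Hypothetical helper: write the type-enforcement source for a module named
# "mypolicy" into directory $1 (defaults to the current directory).
write_policy_source() {
    cat > "${1:-.}/mypolicy.te" <<'EOF'
module mypolicy 1.0;

require {
    type bacula_t;
    class process execmem;
}

# Let processes in the bacula_t domain map memory as writable and executable.
allow bacula_t self:process execmem;
EOF
}

# Hypothetical helper: compile, package and load the module from the current
# directory. Needs root and the SELinux policy tools installed.
build_and_load_policy() {
    checkmodule -M -m -o mypolicy.mod mypolicy.te &&
    semodule_package -o mypolicy.pp -m mypolicy.mod &&
    semodule -i mypolicy.pp
}
```

Keeping the .te source around makes it easier to see exactly what was allowed, and "semodule -r mypolicy" removes the module again if a proper boolean ever lands in the distribution policy.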


Thanks Juan and Didi for the excellent help!  Best wishes for 2016.  :-)

>
>>>
>>> If this script is merely checking the validity of  the JRE/JDK, should
>>> it not be possible to have a test on the rpm details first and only
>>> proceed as it does now if that doesn't work?  The current tests should
>>> work w/o much regard for how the JRE/JDK got installed, but if it was
>>> installed via rpm it seems a simpler test could be used as a shortcut.
>> Patches are welcome :-)
>>
>> Note that current code is designed to be compatible with many environments,
>> including different el/fedora versions, upgrades inside them etc., and
>> the $0.local was added mainly to allow supporting other systems (including
>> gentoo) where $0.local will also be shipped/packaged by the distribution.
>> Obviously we can add similar patches to make it even more complex, but as
>> I wrote above, not sure it's worth it - because if memory is your only
>> problem, you might simply postpone it this way.
>>
>> Best,
>>
>


-- 
John Florian



