Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Engine
    • Security Level: Users (General product issues)
    • Labels:
      None
    • QA Testing:
      UNDECIDED

      Description

      There seems to be a major problem with direct memory buffers in CloverETL or Java; these buffers are used by default for edge buffers. When many graphs with lots of edges are executed, the allocated direct memory can grow up to the size of the Java heap if enough RAM is available; otherwise Clover uses the heap. The direct buffers are freed after a GC run, but the resident memory (RES in `top`) that the Java process used for direct memory is not released. After all threads from the previous job are closed (after ~1 minute), the next run allocates new direct memory and the resident size of the Java process grows. This can be repeated until there is no free RAM left on the server. The resident memory is truly allocated: when I tried to start another application, my Java process was killed by the kernel OOM killer.

      The attached jobflow can consume the majority of system memory in a fairly short time using the run – wait 1 minute – run strategy.

      The workaround is to disable direct memory usage in Clover:
      Create a new file tomcat_dir/conf/engine.config with the following content: USE_DIRECT_MEMORY = false
      Then add the following line at the end of tomcat_dir/conf/clover.config: engine.config.file=tomcat_dir/conf/engine.config

      This forces Clover to use the heap for edges, so you may want to increase the heap size accordingly.
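
The two workaround steps could be scripted roughly as follows. This is only a sketch, assuming a Bash shell and an existing conf directory; the function name `disable_direct_memory` and its argument are hypothetical, and the argument stands for the real Tomcat installation directory (tomcat_dir above).

```shell
# Sketch of the workaround described above. The function name and its
# argument are illustrative only; pass the real Tomcat installation directory.
disable_direct_memory() {
    local tomcat_dir="$1"
    local engine_cfg="$tomcat_dir/conf/engine.config"
    local clover_cfg="$tomcat_dir/conf/clover.config"

    # Step 1: create engine.config, which turns off direct memory usage.
    printf 'USE_DIRECT_MEMORY = false\n' > "$engine_cfg"

    # Step 2: point clover.config at engine.config unless an entry already exists.
    # The leading newline guards against a last line without a line terminator.
    if ! grep -q '^engine\.config\.file' "$clover_cfg" 2>/dev/null; then
        printf '\nengine.config.file=%s\n' "$engine_cfg" >> "$clover_cfg"
    fi
}
```

It would be called as `disable_direct_memory /path/to/tomcat_dir`. Running it twice does not add a second engine.config.file entry.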

        Attachments

          Issue Links

            Activity

            krivanekm Milan Krivanek added a comment - http://log4gang.blogspot.cz/2015/01/java-rss-increased-by-memory.html
            krivanekm Milan Krivanek added a comment - https://bugzilla.redhat.com/show_bug.cgi?id=843478
            krivanekm Milan Krivanek added a comment -

            See sun.nio.ch.Util.getTemporaryDirectBuffer() - an implementation of buffer cache used in FileChannelImpl.

            krivanekm Milan Krivanek added a comment -

            https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en

            distribution:/opt/apache-tomcat-javlinApps/bin # ldd --version
            ldd (GNU libc) 2.11.2
            Copyright (C) 2009 Free Software Foundation, Inc.
            This is free software; see the source for copying conditions.  There is NO
            warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
            Written by Roland McGrath and Ulrich Drepper.
            
            krivanekm Milan Krivanek added a comment - - edited

            The server crashed even though the machine had 5.5 GB of RAM. See pmap_Xmx2.5g,5.5GB_RAM.txt.

            getconf PAGE_SIZE
            4096
            
            urbanj Jaroslav Urban added a comment -

            Milan, try this stress test:

            • 2GB heap & 0.5GB direct

            If it still crashes, then try the same with direct memory disabled.

            krivanekm Milan Krivanek added a comment - - edited

            Ubuntu 14.04.2 LTS:

            • 2 GB heap, 1 GB direct: ran for about two weeks with 4.7 GB RAM, about 100 MB remained
            • 2 GB heap, 2 GB direct:
              • crashed even with 8.5 GB RAM, RSS was more than 7.73 GB
              • ran with 9 GB RAM, peak RSS was 8.187 GB

            Captured with the following command:
            watch -n 1 -t "(ps -p1377 -o rss | tail -n 1) | tee -a rss.log"

            This does not seem to be deterministic: during one attempt with 8 GB RAM the server kept running with about 5.5 GB RSS, while during another it crashed with an RSS of 6.82 GB.

            zatopekm Martin Zatopek added a comment -

            This is how to run the monitoring in the background:

            while true; do sleep 2; ps -p8227 -o rss | tail -n 1 >> rss.log; done &
            disown
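
To compare runs, the peak value can be read back from the log. A minimal sketch, assuming rss.log holds one sample per line in kilobytes (the unit `ps -o rss` reports on Linux); the function name `peak_rss_gb` is made up for illustration.

```shell
# Print the largest RSS sample from a log file, converted from KB to GB.
# Assumes one numeric sample (in KB) per line, as written by the loop above.
peak_rss_gb() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk '$1 + 0 > max { max = $1 + 0 }
                  END { printf "%.2f\n", max / 1048576 }' "$1"
}
```

Usage: `peak_rss_gb rss.log`.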
            

              People

              • Assignee:
                krivanekm Milan Krivanek
                Reporter:
                zatopekm Martin Zatopek
              • Votes:
                0
                Watchers:
                5

                Dates

                • Created:
                  Updated: