
ONR TPCP Project

University of California, Los Angeles


Welcome to our ONR TPCP project webpage.

This page is an access-controlled website for sharing tool deliverables with the CMU SEI team for validation. We will periodically update the tools resulting from our research, and we are happy to answer any questions from the CMU SEI team. The contact person for JShrink is Jaspreet Arora (jasa92@g.ucla.edu), for JReduce is Christian Kalhauge (kalhauge@cs.ucla.edu), and for JInline is Christian Navasca (cnavasca253@cs.ucla.edu).


You can find our repository here.

Setup



There are two ways to run the tools, explained below.

Using a VM

The VM image contains the tools and the subject programs. You need to install the following software:

  • VirtualBox 5.2.22
  • vagrant 2.2.2

To set up and enter the VM, run the following:


vagrant up
vagrant ssh
cd /vagrant

Running on a Linux Machine

You need to install the following software:

  • Stack 1.9.3 (Installation instructions here)
    • After installation, run stack upgrade --binary-version 1.9.3
  • Java 8
  • Git
  • Python 3

Setup

Some of the tools require setup. You can run the setup for all the projects with the following command:

./jdebloat.py setup

Running the tools

The tools are run through the interface provided by the jdebloat.py script.

The script's usage can be listed with the help (-h) option:


./jdebloat.py -h

usage: jdebloat.py [-h] {clean,setup,run}

positional arguments:
  {clean,setup,run}

optional arguments:
  -h, --help         show this help message and exit

The three positional arguments are available for each tool in the package:

  1. setup - performs setup and compilation for the tool
  2. run - executes the tool on the benchmark projects
  3. clean - performs cleanup for the tool
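
As an illustration of this interface, the following is a minimal sketch of a parser with the same shape as the usage text. It is not the actual jdebloat.py source. The optional trailing tool names and the rule that omitting them selects all tools are assumptions inferred from the examples on this page.

```python
import argparse

# Tool names taken from the examples on this page.
KNOWN_TOOLS = ("jreduce", "jshrink", "jinline")


def build_parser():
    """Build a parser shaped like the jdebloat.py usage text (illustrative only)."""
    parser = argparse.ArgumentParser(prog="jdebloat.py")
    parser.add_argument("command", choices=["clean", "setup", "run"])
    # Assumption: zero or more tool names may follow the command.
    parser.add_argument("tools", nargs="*")
    return parser


def parse(argv):
    """Return (command, tools); an empty tool list is taken to mean all tools."""
    args = build_parser().parse_args(argv)
    unknown = [tool for tool in args.tools if tool not in KNOWN_TOOLS]
    if unknown:
        raise ValueError("unknown tool(s): " + ", ".join(unknown))
    return args.command, list(args.tools) or list(KNOWN_TOOLS)
```

With this sketch, parse(["setup", "jshrink"]) selects only JShrink, while parse(["run"]) selects all three tools.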

Examples:

To run all three debloating tools in sequence, run:


./jdebloat.py setup
./jdebloat.py run
./jdebloat.py clean

To run the JReduce tool, run:


./jdebloat.py setup jreduce
./jdebloat.py run jreduce

To run the JShrink tool, run:


./jdebloat.py setup jshrink
./jdebloat.py run jshrink

To run the JInline tool, run:


./jdebloat.py setup jinline
./jdebloat.py run jinline

Directory Structure

  • results [Directory containing the benchmark results]
  • data [Contains misc. data used by the tools]
  • jdebloat.py [The script which runs JDebloat]
  • output [The output directory]
  • README.mkd [The setup README]
  • scripts [Contains scripts used by jdebloat.py to run the tools]
  • tools [Contains the JShrink, JReduce, and JInline tools]
    • javaq [Contains the javaq tool, used for data collection]
    • jinline [Contains the JInline tool]
      • README.md [The JInline tool README file]
    • jshrink [Contains the JShrink tool]
      • README.md [The JShrink README file]
    • jreduce [Contains the JReduce tool]
      • README.md [The JReduce README file]

Benchmark Results

We tested JDebloat on 25 benchmarks and found the following reductions:

Name Reduction
aragozin/jvm-tools 64.20%
ata4/disunity 25.64%
Bukkit/Bukkit 66.49%
eirslett/frontend-maven-plugin 99.99%
google/gson 30.05%
JakeWharton/DiskLruCache 20.20%
JakeWharton/retrofit1-okhttp3-client 22.70%
JakeWharton/RxReplayingShare 47.70%
JCTools/JCTools 90.70%
junit-team/junit4 20.21%
kevinsawicki/http-request 19.80%
mabe02/lanterna 24.99%
pagehelper/Mybatis-PageHelper 30.25%
pedrovgs/Algorithms 36.74%
qiujiayu/AutoLoadCache 71.02%
square/javapoet 20.51%
square/moshi 99.56%
takari/maven-wrapper 74.45%
alibaba/TProfiler 97.15%
dieforfree/qart4j 100.00%
dubboclub/dubbokeeper 80.09%
JakeWharton/RxRelay 27.80%
sockeqwe/fragmentargs 23.73%
tomighty/tomighty 29.13%
zeroturnaround/zt-zip 26.61%

The links to all of these repositories, as well as the commits we used, are listed in data/benchmarks.csv

JShrink



JShrink takes a Java project as input and removes uninvoked methods and classes based on static and dynamic call graph analysis. While this functionality is similar to JRed, it differs in three major ways. First, to identify call targets invoked through Java reflection, JShrink uses TamiFlex reflection call analysis, which improves the safety of method removal. We also use JMtrace, a native profiling agent built on the JVM TI API, which captures the use of dynamic features in Java code and augments the static reachability analysis in JShrink. Second, we remove the body of each uninvoked method and allow a custom warning message to be inserted to indicate where debloating has been applied. Third, we allow various options for entry points: all main methods, all public methods (excluding tests), and/or all JUnit tests.

Warning: The current release is a first prototype and is still in active development. For the duration of the ONR-TPCP project, we will make continuous improvements and release upgraded versions in a timely manner.

Technical Details

JShrink works by generating a static call graph of an input program. It proceeds to remove methods that are not used based on static call graph analysis. When using JShrink, the user is required to specify entry points for constructing the call graph. JShrink provides three pre-programmed options: (1) all main methods, (2) all public methods (excluding tests), and/or (3) all JUnit Tests. The user may also specify custom entry points if required.

Using the Soot Bytecode optimization framework, we remove unused Java bytecode methods. The user has the option of either completely removing the method, removing the method's body, or replacing the method's body with a RuntimeException.

Because of Java's reflection functionality, standard call graph analysis libraries alone cannot produce a complete call graph. To overcome this, we use TamiFlex. TamiFlex observes the execution of a Java program under the given test suite and records the reflective method invocations: where the reflective calls are made within the application and what their call targets are.

JShrink runs TamiFlex with the target Java project's existing test cases as input. We then extract all method invocations that were made via reflection. JMtrace is used to extract any additional method invocations which might have resulted from the use of Java dynamic features such as dynamic classloading, dynamic proxy, JNI, etc.

We add these invocations as additional entry points for the static call graph analysis, which results in safer debloating.
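
The idea can be sketched as a reachability computation over a call graph. This is a minimal illustration under stated assumptions, not JShrink's implementation: the call graph is a plain dictionary, the method names are hypothetical, and dynamic_targets stands in for the invocations reported by TamiFlex and JMtrace.

```python
from collections import deque


def reachable_methods(call_graph, entry_points):
    """Return every method reachable from the entry points (breadth-first)."""
    seen = set(entry_points)
    queue = deque(seen)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def unused_methods(call_graph, entry_points, dynamic_targets=()):
    """Methods that no static or dynamically observed entry point reaches."""
    all_methods = set(call_graph)
    for callees in call_graph.values():
        all_methods.update(callees)
    roots = set(entry_points) | set(dynamic_targets)
    return all_methods - reachable_methods(call_graph, roots)


# Hypothetical toy program: main calls parse; Plugin.run is only invoked
# reflectively, so a purely static analysis does not see it.
graph = {
    "App.main": ["App.parse"],
    "App.parse": [],
    "Plugin.run": ["Plugin.helper"],
    "Plugin.helper": [],
    "Legacy.unused": [],
}

print(sorted(unused_methods(graph, ["App.main"])))
# ['Legacy.unused', 'Plugin.helper', 'Plugin.run']
print(sorted(unused_methods(graph, ["App.main"], dynamic_targets=["Plugin.run"])))
# ['Legacy.unused']
```

Without the dynamically observed target, the plugin methods would be removed even though they are used at run time. Adding them as entry points keeps them.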

Current Restrictions and Limitations

  1. JShrink works only with Java 1.8.
  2. It requires a user to specify an entry point.
  3. Handling reflective calls and other dynamic features is enabled for Maven projects only. In other words, the --tamiflex and --jmtrace options only work when targeting a Maven Project.
  4. If the --tamiflex option is specified, the --test-entry option is set automatically, since TamiFlex uses tests as entry points to analyze reflective calls.
  5. --use-spark will use the Spark call graph analysis. Spark is not as conservative as the default call graph analysis (CHA) and may cause errors (we know of instances where Spark does not produce a complete call graph).

Usage

To execute the JShrink tool with the benchmarks, simply run ./jdebloat.py run jshrink in the VM provided. The debloated programs can be found in output/JShrink, along with a summary of the size reduction achieved in output/JShrink/<BENCHMARK>/size_info.dat.

If running the tool independently is required, please read the following usage notes:

usage: JShrink.jar [-a <arg>] [-c <arg>] [-ch <path>] [-d] [-e <Exception Message>]
       [-f <TamiFlex Jar>] [-h] [-i <arg>] [-jm <path>] [-k] [-l <arg>] [-m] [-n <arg>]
       [-o] [-p] [-r] [-s] [-t <arg>] [-u] [--usecache] [-v]
An application to get the call-graph analysis of an application and to
wipe unused methods
 -a,--app-classpath <arg>                     Specify the application
                                              classpath
 -c,--custom-entry <arg>                      Specify custom entry points
                                              in syntax of
                                              '<[classname]:[public?]
                                              [static?] [returnType]
                                              [methodName]([args...?])>'
 -ch,--checkpoint <path>                      Maintain and revert to
                                              checkpoints in case a
                                              transformation leads to test
                                              failure
 -d,--debug                                   Run JShrink in 'debug'
                                              mode. Used for testing
 -e,--include-exception <Exception Message>   Specify if an exception
                                              message should be included
                                              in a wiped method (Optional
                                              argument: the message)
 -f,--tamiflex <TamiFlex Jar>                 Enable TamiFlex
 -jm,--jmtrace <path/to/jmtrace/folder>       Enable Dynamic Profiling
 -h,--help                                    Help
 -i,--ignore-classes <arg>                    Specify classes that should
                                              not be delete or modified
 -k,--use-spark                               Use Spark call graph
                                              analysis (Uses CHA by
                                              default)
 -l,--lib-classpath <arg>                     Specify the classpath for
                                              libraries
 -m,--main-entry                              Include the main method as
                                              an entry point
 -n,--maven-project <arg>                     Instead of targeting using
                                              lib/app/test classpaths, a
                                              Maven project directory may
                                              be specified
 -o,--remove-classes                          Remove unused classes
 -p,--prune-app                               Prune the application
                                              classes as well
 -r,--remove-methods                          Remove methods header and
                                              body (by default, the bodies
                                              are wiped)
 -s,--test-entry                              Include the test methods as
                                              entry points
 -t,--test-classpath <arg>                    Specify the test classpath
 -u,--public-entry                            Include public methods as
                                              entry points
 --use-cache                                  Cache static analysis call
                                              graph of project
 -v,--verbose                                 Run JShrink in 'verbose'
                                              mode. Outputs analysed
                                              methods and touched methods

Example usage case 1: Use a Maven project as an application, specify entry points as all main methods, all public methods, and all existing test cases, and consider Java reflective calls using TamiFlex

java -jar jshrink.jar --maven-project <PROJECT_DIR> --public-entry --main-entry --test-entry --prune-app --remove-methods --tamiflex <TAMIFLEX_JAR>

--maven-project <PROJECT_DIR> specifies the Maven project to be debloated.

--public-entry --main-entry --test-entry specify that all public methods, main methods, and test methods should be used as entry points to generate the call graph.

--prune-app specifies that the application code should be debloated as well as the dependency code.

--remove-methods specifies that methods should be removed in their entirety. By default, only their bodies are removed.

--tamiflex <TAMIFLEX_JAR> specifies that TamiFlex should be used to find reflective calls. The argument is the location of the TamiFlex Jar.

Example usage case 2: Use a non-Maven project as an application, specify main methods as an entry point, and do not consider reflective calls using Tamiflex

java -jar jshrink.jar --app-classpath <APP_CLASSPATH> --lib-classpath <LIBRARY_CLASSPATH> --test-classpath <TEST_CLASSPATH> --main-entry --include-exception "ERROR, METHOD REMOVED"

--app-classpath <APP_CLASSPATH> --lib-classpath <LIBRARY_CLASSPATH> --test-classpath <TEST_CLASSPATH> specify the application, library, and test classpaths of the target.

--include-exception "ERROR, METHOD REMOVED" specifies that when a method's body is wiped, it should be replaced with code that throws a RuntimeException with the message "ERROR, METHOD REMOVED".

Example usage case 3: Use a Maven project as an application, perform call graph analysis with Spark, and remove unused classes

java -jar jshrink.jar --maven-project <PROJECT_DIR> --main-entry --remove-classes --use-spark

--remove-classes specifies that classes whose methods are all removed, and contain no accessible static methods, are to be removed completely.

--use-spark specifies that Spark Call Graph analysis should be used.
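
When JShrink is driven from a script, command lines like the ones above can be assembled programmatically. The helper below is a hypothetical sketch (the function name and its parameters are not part of JShrink). It uses only flags listed in the usage notes and covers Maven projects only.

```python
# Maps a short, hypothetical entry-point name to the documented JShrink flag.
ENTRY_FLAGS = {
    "main": "--main-entry",
    "public": "--public-entry",
    "test": "--test-entry",
}


def jshrink_command(jar, maven_project, entries, prune_app=False,
                    remove_methods=False, remove_classes=False,
                    use_spark=False, tamiflex_jar=None):
    """Return the argument list for running JShrink on a Maven project."""
    if not entries:
        # JShrink requires the user to specify an entry point.
        raise ValueError("at least one entry point option is required")
    command = ["java", "-jar", jar, "--maven-project", maven_project]
    command += [ENTRY_FLAGS[name] for name in entries]
    if prune_app:
        command.append("--prune-app")
    if remove_methods:
        command.append("--remove-methods")
    if remove_classes:
        command.append("--remove-classes")
    if use_spark:
        command.append("--use-spark")
    if tamiflex_jar is not None:
        command += ["--tamiflex", tamiflex_jar]
    return command


# Reproduces example usage case 3 for a hypothetical project directory.
print(" ".join(jshrink_command("jshrink.jar", "my-project", ["main"],
                               remove_classes=True, use_spark=True)))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues with paths that contain spaces.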

Results

Running our tool on the benchmarks yields the following result.

Benchmark              Size before debloat (bytes)  Size after debloat (bytes)  Reduction
JavaPoet               234746                       230375                      1.86%
JavaVerbalExpressions  14746                        14746                       0.00%
Curator                10427613                     8071252                     22.60%
RxRelay                5108491                      4574410                     10.45%
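
The reduction figures in this table are consistent with computing (size before - size after) / size before. The short sketch below recomputes them from the byte counts listed above. The formula is inferred from the numbers rather than stated by the tool.

```python
def reduction_percent(before, after):
    """Size reduction as a percentage of the original size."""
    return 100.0 * (before - after) / before


# Byte counts copied from the results table: (before, after).
sizes = {
    "JavaPoet": (234746, 230375),
    "JavaVerbalExpressions": (14746, 14746),
    "Curator": (10427613, 8071252),
    "RxRelay": (5108491, 4574410),
}

for name, (before, after) in sizes.items():
    print(f"{name}: {reduction_percent(before, after):.2f}%")
# JavaPoet: 1.86%
# JavaVerbalExpressions: 0.00%
# Curator: 22.60%
# RxRelay: 10.45%
```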

Descriptions of benchmark applications

  • JavaPoet is a Java API for generating .java source files.
  • DiskLruCache is a library that provides a cache bounded by an amount of space on a file-system.
  • JavaVerbalExpressions is a Java library that helps in the construction of difficult regular expressions.
  • Curator is a set of Java libraries to improve Apache ZooKeeper.
  • JUnit4 is a framework to write repeatable tests for Java.
  • RxRelay is a Relay library for RxJava.

Results on other projects.

Benchmark Reduction
alibaba_TProfiler 10.17%
aragozin_jvm-tools 4.20%
Bukkit_Bukkit 18.54%
dieforfree_qart4j 46.82%
dubboclub_dubbokeeper 17.32%
eirslett_frontend-maven-plugin 22.44%
google_gson 5.52%
JakeWharton_DiskLruCache 1.65%
JakeWharton_retrofit1-okhttp3-client 11.46%
JakeWharton_RxRelay 17.47%
JakeWharton_RxReplayingShare 22.13%
junit-team_junit4 6.93%
kevinsawicki_http-request 6.55%
mabe02_lanterna 1.96%
notnoop_java-apns 18.88%
pagehelper_Mybatis-PageHelper 23.91%
pedrovgs_Algorithms 5.46%
qiujiayu_AutoLoadCache 20.19%
sockeqwe_fragmentargs 11.59%
square_moshi 0.22%
tomighty_tomighty 20.10%
zeroturnaround_zt-zip 11.32%

Method wiping

In our tool, the default behavior is to wipe the body of each uninvoked method. Below is an example of a Java method, shown as a Jasmin-style bytecode listing:

.method public static staticShortMethodNoParams()Ljava/lang/Short;
    .limit stack 2
    .limit locals 1
    getstatic java/lang/System/out Ljava/io/PrintStream;
    astore_0
    aload_0
    ldc "staticShortMethodNoParams touched"
    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
    iconst_3
    invokestatic java/lang/Short/valueOf(S)Ljava/lang/Short;
    astore_0
    aload_0
    areturn
.end method

After the body is wiped, the method header remains and the body is reduced to the minimum that the JVM permits, as shown below:

.method public static staticShortMethodNoParams()Ljava/lang/Short;
    .limit stack 1
    .limit locals 0
    aconst_null
    areturn
.end method
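
The example above returns an object reference, so the minimal body is aconst_null followed by areturn. For other return types, a minimal body needs a different constant and return instruction. The sketch below shows one way to choose them from a JVM method descriptor. It is an illustration under our own assumptions. Only the reference-type case is confirmed by the listing above, and JShrink's actual choice for the other types may differ.

```python
def wiped_body(descriptor):
    """Return a minimal instruction sequence for a JVM method descriptor.

    The descriptor uses standard JVM syntax, e.g. "()Ljava/lang/Short;".
    """
    return_type = descriptor[descriptor.rindex(")") + 1:]
    if return_type == "V":
        return ["return"]                   # void: nothing to push
    if return_type in ("Z", "B", "C", "S", "I"):
        return ["iconst_0", "ireturn"]      # int-like types share ireturn
    if return_type == "J":
        return ["lconst_0", "lreturn"]      # long
    if return_type == "F":
        return ["fconst_0", "freturn"]      # float
    if return_type == "D":
        return ["dconst_0", "dreturn"]      # double
    # Object (L...;) and array ([...) types return a null reference.
    return ["aconst_null", "areturn"]


print(wiped_body("()Ljava/lang/Short;"))
# ['aconst_null', 'areturn']
```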

JReduce



JReduce is a tool that uses a variant of delta debugging to reduce the classes of a project while preserving a given property. The tool was originally built to reduce bytecode that caused bugs in decompilers. We defined a bug as any set of classes with no external dependencies that could not be decompiled and then compiled again. The goal was therefore to find the smallest set of classes that still had all their dependencies.

In this case, we use the test suite as the property that we want to preserve. We then reduce the set of classes to the smallest one for which the tests still succeed.

JReduce is the focus of a paper accepted at FSE'19, which showed a 12x faster reduction of Java bytecode than previous techniques. JReduce is in active development, and its progress can be followed in the open source repository.

Technical Details

JReduce works by calculating a dependency graph from classes to other classes. We build the graph by adding an edge from one class to another if the first class mentions the second.

Using the graph, we calculate all the strongly connected components (SCCs). If we include one of the classes in an SCC, all classes in that SCC need to be included. This means that we can reduce the program by reducing this list of SCCs.

It is rare that a program runs classes outside the SCC that contains the main class, but it can happen if the program uses reflection. We have therefore developed a new reduction technique called Binary Reduction, which can quickly search the list for the few SCCs needed to satisfy the predicate.

In our tool, it is also possible to provide a set of core classes. The core classes should not be removed. If an SCC contains a class from the core, it will not be removed. In our case, we set the test cases as the core.
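
The SCC step and the handling of core classes can be sketched as follows. This is a minimal illustration, not JReduce's code. It uses Tarjan's algorithm on a dictionary-based graph with hypothetical class names, and it only identifies which SCCs are candidates for removal. The Binary Reduction search itself is not shown.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps a class to the classes it mentions."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of an SCC: pop its members off the stack.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(frozenset(component))

    for node in list(graph):
        if node not in index:
            visit(node)
    return components


def removable_components(graph, core):
    """SCCs containing no core class; only these are candidates for removal."""
    core = set(core)
    return [c for c in strongly_connected_components(graph) if not (c & core)]


# Hypothetical classes: A and B mention each other, so they form one SCC.
deps = {"A": ["B"], "B": ["A"], "C": ["A"], "TestA": ["A"]}
candidates = removable_components(deps, core=["TestA"])
print(sorted(sorted(c) for c in candidates))
# [['A', 'B'], ['C']]
```

Whether a candidate SCC can actually be dropped is then decided by the predicate. Here, removing the SCC {A, B} would presumably make TestA fail, so a reducer would keep it.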

Usage

To run JReduce on the benchmarks of this project, first set up the tool by running ./jdebloat.py setup jreduce.

Then run ./jdebloat.py run jreduce. The output can be found in the output/jreduce folder.

You can also run the tool on your own benchmarks. Either use the scripts/runjreduce.sh script or run JReduce directly:

jreduce -v -o output --cp test.jar -t app.jar -c @classes-in-core.txt \
  <runpredicate> <args..>

Here <runpredicate> is a script (for example, runpredicate.sh) that takes a reduced app.jar and exits with code 0 if the predicate succeeded. In the runjreduce.sh script we use runtest.sh with the test.jar and test.classes.txt.

If you want to be adventurous, consult the help notes:
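
A predicate of this kind can be sketched in Python as follows. This is a hypothetical stand-in, not the runtest.sh script shipped in the repository. It assumes that the reduced jar path is available to the script, that the tests are JUnit 4 tests runnable with org.junit.runner.JUnitCore, and that the test classes are listed one per line in a text file. How JReduce passes the reduced jar to the command is not described here, so check the tool's README before relying on it.

```python
import os
import subprocess


def junit_command(app_jar, test_jar, test_classes):
    """Build the java command that runs the JUnit 4 tests against app_jar."""
    classpath = os.pathsep.join([test_jar, app_jar])
    return ["java", "-cp", classpath, "org.junit.runner.JUnitCore", *test_classes]


def read_test_classes(path):
    """Read one fully qualified test class name per non-empty line."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def run_predicate(app_jar, test_jar, classes_file):
    """Return 0 if all tests pass on the reduced jar, non-zero otherwise."""
    command = junit_command(app_jar, test_jar, read_test_classes(classes_file))
    return subprocess.run(command).returncode


# In a standalone script (assumed argument order), the result would be used
# as the exit code:
#   sys.exit(run_predicate(sys.argv[1], sys.argv[2], sys.argv[3]))
```

JUnitCore exits with a non-zero status when any test fails, which matches the exit code convention described above.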

Usage: jreduce [-v] [-q] [-D|--log-depth ARG] [-c|--core CORE] [--cp CLASSPATH]
               [--stdlib] [--jre JRE] (-t|--target FILE) (-o|--output FILE)
               [-R|--reducer ARG] [-W|--work-folder ARG] [-K|--keep-folders]
               [-E|--exit-code CODE] [--stdout] [--stderr] [-T|--timelimit SECS]
               CMD [ARG..]
  A command line tool for reducing java programs.

Available options:
  -v                       make it more verbose.
  -q                       make it more quiet.
  -D,--log-depth ARG       set the log depth. (default: -1)
  -c,--core CORE           the core classes to not reduce.
  --cp CLASSPATH           the library classpath, of things not reduced.
  --stdlib                 load the standard library.
  --jre JRE                the location of the stdlib.
  -t,--target FILE         the path to the jar or folder to reduce.
  -o,--output FILE         the path output folder.
  -R,--reducer ARG         the reducing algorithm to use. (default: Binary)
  -W,--work-folder ARG     the work folder.
  -K,--keep-folders        keep the work folders after use?
  -E,--exit-code CODE      preserve exit-code (default: 0)
  --stdout                 preserve stdout.
  --stderr                 preserve stderr.
  -T,--timelimit SECS      the maximum number of seconds to run the process,
                           negative means no timelimit. (default: -1.0)
  CMD                      the command to run
  ARG..                    arguments to the command.
  -h,--help                Show this help text

JInline



JInline takes a Java program and statically inlines methods read from a database.

Technical Details

We first provide aggressive inline parameters to the JVM. While these parameters are not suitable for running programs, they provide better inlining information. We extract the inlining decisions from the JVM into a database for later use.

We use our customized database to inform our static inliner. First, we filter out aggressive inlinings that would cause the Java program to miscompile. Using this information, our Inliner tool uses the Soot bytecode optimization framework to statically inline method calls without affecting the semantics of the program.

Our technique finds inline targets that might not otherwise be detectable by purely static approaches. Our tool produces a new JAR with our modified class files containing inlined methods. We successfully ran our modified JAR on the original test cases without errors.

Usage

To run JInline on the provided benchmarks, simply run ./jdebloat.py run jinline.

The output programs will be found in output/jinline as jars. If running the tool independently is required, please read the following usage notes:

usage: run-jinline.py [-h] [-o OUTPUT_JAR]
                      test_jar test_classes app_lib_jar output_dir

Run inliner tool.

positional arguments:
  test_jar       JAR containing the test suite
  test_classes   Text file of test classes
  app_lib_jar    JAR containing application and libraries
  output_dir     Output directory

optional arguments:
  -h, --help     show this help message and exit
  -o OUTPUT_JAR  Modified JAR file path
    

Contact Details