Debloat Project

Setup

There are two ways to run this tool; both are explained below.

Using a VM

The VM image contains both the tools and the subject programs. To use it, first install the following software:

VirtualBox 5.2.22
vagrant 2.2.2

To set up and enter the VM, run the following:


vagrant up
vagrant ssh
cd /vagrant

Running on a Linux Machine

To run the tools directly on a Linux machine, first install the following software:

Stack 1.9.3 (see the Stack installation instructions)
- After installation, run stack upgrade --binary-version 1.9.3
Java 8
Git
Python 3

Setup

Some of the tools require setup. You can run the setup for all the projects with the following command:

./jdebloat.py setup

Running the tools

The tools are run through the interface provided by the jdebloat.py script.

The script's usage can be listed with the help (-h) option as follows.


./jdebloat.py -h

usage: jdebloat.py [-h] {clean,setup,run}

positional arguments:
  {clean,setup,run}

optional arguments:
  -h, --help         show this help message and exit

The script accepts three positional arguments, each of which can be applied to every tool in the package:

setup - performs setup and compilation for the tool
run - executes the tool with the benchmark projects
clean - performs cleanup for the tool

Examples:

To run all three debloat tools in sequence, run:


./jdebloat.py setup
./jdebloat.py run
./jdebloat.py clean

To run the JReduce tool, run:


./jdebloat.py setup jreduce
./jdebloat.py run jreduce

To run the JShrink tool, run:


./jdebloat.py setup jshrink
./jdebloat.py run jshrink

To run the JInline tool, run:


./jdebloat.py setup jinline
./jdebloat.py run jinline
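
The per-tool sequences above can also be scripted. The following is a minimal sketch, assuming it is run from the directory that contains jdebloat.py. The helper names commands_for and run_tool are hypothetical and are not part of the project.

```python
import subprocess

# Tool names taken from the examples above.
TOOLS = ("jreduce", "jshrink", "jinline")

def commands_for(tool):
    """Return the setup and run invocations for one tool, in order."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    return [["./jdebloat.py", step, tool] for step in ("setup", "run")]

def run_tool(tool):
    """Run setup, then run, stopping at the first step that fails."""
    for command in commands_for(tool):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling run_tool("jshrink") then issues the same two commands as the JShrink example.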

Directory Structure

results [Directory containing the benchmark results]
data [Contains misc. data used by the tools]
jdebloat.py [The script which runs JDebloat]
output [The output directory]
README.mkd [The setup README]
scripts [Contains scripts used by jdebloat.py to run the tools]
tools [Contains the JShrink, JReduce, and JInline tools]

javaq [Contains the javaq tool, used for data collection]

jinline [Contains the JInline tool]
- README.md [The JInline tool README file]

jshrink [Contains the JShrink tool]
- README.md [The JShrink README file]

jreduce [Contains the JReduce tool]
- README.md [The JReduce README file]

Benchmark Results

We tested JDebloat on 25 benchmarks and found the following reductions:

Name	Reduction
aragozin/jvm-tools	64.20%
ata4/disunity	25.64%
Bukkit/Bukkit	66.49%
eirslett/frontend-maven-plugin	99.99%
google/gson	30.05%
JakeWharton/DiskLruCache	20.20%
JakeWharton/retrofit1-okhttp3-client	22.70%
JakeWharton/RxReplayingShare	47.70%
JCTools/JCTools	90.70%
junit-team/junit4	20.21%
kevinsawicki/http-request	19.80%
mabe02/lanterna	24.99%
pagehelper/Mybatis-PageHelper	30.25%
pedrovgs/Algorithms	36.74%
qiujiayu/AutoLoadCache	71.02%
square/javapoet	20.51%
square/moshi	99.56%
takari/maven-wrapper	74.45%
alibaba/TProfiler	97.15%
dieforfree/qart4j	100.00%
dubboclub/dubbokeeper	80.09%
JakeWharton/RxRelay	27.80%
sockeqwe/fragmentargs	23.73%
tomighty/tomighty	29.13%
zeroturnaround/zt-zip	26.61%

The links to all of these repositories, as well as the commits we used, are listed in data/benchmarks.csv.



  
JShrink

JShrink takes a Java project as input and removes uninvoked methods and
classes based on static and dynamic call graph analysis. While this
functionality is similar to JRed, it differs in three major ways. First,
to identify call targets invoked through Java reflection, JShrink uses
TamiFlex reflection call analysis, which improves the safety of method
removal. It also uses JMtrace, a native profiling agent built on the JVM TI
API, which captures the use of dynamic features in Java code and augments
the static reachability analysis in JShrink. Second, we remove the body of
each uninvoked method and can insert a custom warning message to indicate
where debloating has been applied. Third, we allow various options for
entry points: all main methods, all public methods (excluding tests),
and/or all JUnit tests.

Warning: The current release is a first prototype and is still in active development. Over the course of the ONR-TPCP project, we will make continuous improvements and release upgraded versions in a timely manner.

Technical Details

JShrink works by generating a static call graph of an input program. It
then removes the methods that the call graph shows to be unused. When
using JShrink, the user is required to specify entry points for
constructing the call graph. JShrink provides three pre-programmed
options: (1) all main methods, (2) all public methods (excluding tests),
and/or (3) all JUnit tests. The user may also specify custom entry points
if required.

Using the Soot bytecode optimization framework, we remove unused Java
bytecode methods. The user can choose to remove the method completely,
to remove only the method's body, or to replace the method's body with
code that throws a RuntimeException.

Because of Java's reflection functionality, standard call graph analysis
libraries alone cannot produce a complete call graph. To overcome this,
we use TamiFlex. TamiFlex observes the execution of a Java program under
a given test suite and records the reflective method invocations: where
the reflective calls are made within the Java application, and what their
call targets are.

JShrink runs TamiFlex with the target Java project's existing test
cases as input. We then extract all method invocations that were made
via reflection. JMtrace is used to extract any additional method invocations that result from the use of Java dynamic features such as dynamic class loading, dynamic proxies, and JNI.

We set these as additional entry points for the static
call graph analysis, which results in safer debloating.
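
The effect of adding dynamically observed targets as entry points can be illustrated with a small reachability sketch. This is not JShrink's implementation (which uses Soot call graphs); the call graph, the method names, and the function names reachable and unused_methods are all hypothetical.

```python
from collections import deque

def reachable(call_graph, entry_points):
    """Breadth-first search over a call graph given as {caller: [callees]}."""
    seen = set(entry_points)
    queue = deque(seen)
    while queue:
        method = queue.popleft()
        for callee in call_graph.get(method, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

def unused_methods(call_graph, static_entries, dynamic_entries=()):
    """Methods that no entry point reaches; these are candidates for removal."""
    all_methods = set(call_graph)
    for callees in call_graph.values():
        all_methods.update(callees)
    entries = set(static_entries) | set(dynamic_entries)
    return all_methods - reachable(call_graph, entries)

# Toy graph: A.helper is only invoked reflectively, so no static edge leads to it.
graph = {
    "A.main": ["A.run"],
    "A.run": [],
    "A.helper": ["A.log"],
    "A.log": [],
    "A.dead": [],
}
static_only = unused_methods(graph, ["A.main"])
with_dynamic = unused_methods(graph, ["A.main"], dynamic_entries=["A.helper"])
```

With static entry points alone, A.helper and A.log would be removed even though they are used at run time. Once the reflective target is added as an entry point, only A.dead remains a removal candidate.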

Current Restrictions and Limitations


JShrink works only with Java 1.8.

It requires a user to specify an entry point.

Handling of reflective calls and other dynamic features is enabled for Maven
projects only. In other words, the --tamiflex and --jmtrace options only work
when targeting a Maven project.

If the --tamiflex option is specified, the --test-entry option is
automatically set, since TamiFlex uses tests as entry points to analyze
reflective calls.

--use-spark enables the Spark call graph analysis. Spark is not as
conservative as the default call graph analysis (CHA) and may cause errors
(we know of instances where Spark does not produce a complete call graph).



Usage

To execute the JShrink tool with the benchmarks, simply run
./jdebloat.py run jshrink in the VM provided. The debloated programs can be found in
output/JShrink, along with a summary of the size reduction achieved
in output/JShrink/<BENCHMARK>/size_info.dat.

If you need to run the tool independently, read the following usage
notes:

usage: JShrink.jar [-a <arg>] [-c <arg>] [-ch <path>] [-d] [-e <Exception Message>]
       [-f <TamiFlex Jar>] [-h] [-i <arg>] [-jm <path>] [-k] [-l <arg>] [-m] [-n <arg>]
       [-o] [-p] [-r] [-s] [-t <arg>] [-u] [--usecache] [-v]
An application to get the call-graph analysis of an application and to
wipe unused methods
 -a,--app-classpath <arg>                     Specify the application
                                              classpath
 -c,--custom-entry <arg>                      Specify custom entry points
                                              in syntax of
                                              '<[classname]:[public?]
                                              [static?] [returnType]
                                              [methodName]([args...?])>'
 -ch,--checkpoint <path>                      Maintain and revert to
                                              checkpoints in case a
                                              transformation leads to test
                                              failure
 -d,--debug                                   Run JShrink in 'debug'
                                              mode. Used for testing
 -e,--include-exception <Exception Message>   Specify if an exception
                                              message should be included
                                              in a wiped method (Optional
                                              argument: the message)
 -f,--tamiflex <TamiFlex Jar>                 Enable TamiFlex
 -jm,--jmtrace <path/to/jmtrace/folder>       Enable Dynamic Profiling
 -h,--help                                    Help
 -i,--ignore-classes <arg>                    Specify classes that should
                                              not be delete or modified
 -k,--use-spark                               Use Spark call graph
                                              analysis (Uses CHA by
                                              default)
 -l,--lib-classpath <arg>                     Specify the classpath for
                                              libraries
 -m,--main-entry                              Include the main method as
                                              an entry point
 -n,--maven-project <arg>                     Instead of targeting using
                                              lib/app/test classpaths, a
                                              Maven project directory may
                                              be specified
 -o,--remove-classes                          Remove unused classes
 -p,--prune-app                               Prune the application
                                              classes as well
 -r,--remove-methods                          Remove methods header and
                                              body (by default, the bodies
                                              are wiped)
 -s,--test-entry                              Include the test methods as
                                              entry points
 -t,--test-classpath <arg>                    Specify the test classpath
 -u,--public-entry                            Include public methods as
                                              entry points
 --use-cache                                  Cache static analysis call
                                              graph of project
 -v,--verbose                                 Run JShrink in 'verbose'
                                              mode. Outputs analysed
                                              methods and touched methods




Example usage case 1: Use a Maven project as an application, specify entry points as all main methods, all public methods, and all existing test cases, and consider Java reflective calls using TamiFlex

java -jar jshrink.jar --maven-project <PROJECT_DIR> --public-entry
--main-entry --test-entry --prune-app --remove-methods --tamiflex
<TAMIFLEX_JAR>

--maven-project <PROJECT_DIR> specifies the Maven project to be debloated.

--public-entry --main-entry --test-entry specifies that all public
methods, all main methods, and all test methods should be used as entry
points when generating the call graph.

--prune-app specifies that the application code should be
debloated as well as the dependency code.

--remove-methods specifies that methods should be removed in their
entirety. By default, only their bodies are removed.

--tamiflex <TAMIFLEX_JAR> specifies that TamiFlex should be used to find
reflective calls. The argument is the location of the TamiFlex Jar.

Example usage case 2: Use a non-Maven project as an application, specify main methods as an entry point, and do not consider reflective calls using TamiFlex

java -jar jshrink.jar --app-classpath <APP_CLASSPATH> --lib-classpath
<LIBRARY_CLASSPATH> --test-classpath <TEST_CLASSPATH> --main-entry
--include-exception "ERROR, METHOD REMOVED"

--app-classpath <APP_CLASSPATH> --lib-classpath <LIBRARY_CLASSPATH>
--test-classpath <TEST_CLASSPATH> specifies the application, library,
and test classpaths of the target.

--include-exception "ERROR, METHOD REMOVED" specifies that when a
method's body is wiped it should be replaced with a RuntimeException
with the message "ERROR, METHOD REMOVED".

Example usage case 3: Use a Maven project as an application, perform call graph analysis with Spark, and remove unused classes

java -jar jshrink.jar --maven-project <PROJECT_DIR> --main-entry
--remove-classes --use-spark

--remove-classes specifies that classes whose methods have all been
removed, and which contain no accessible static methods, are to be removed
completely.

--use-spark specifies that Spark call graph analysis should be used.

Results

Running our tool on the benchmarks yields the following results:



	
Benchmark	Size Before Debloat (Bytes)	Size After Debloat (Bytes)	Reduction
JavaPoet	234746	230375	1.86%
JavaVerbalExpressions	14746	14746	0.00%
Curator	10427613	8071252	22.60%
RxRelay	5108491	4574410	10.45%
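
The Reduction column is consistent with the relative size difference, (before - after) / before. The sketch below recomputes it from the byte counts in the table. The function name reduction_percent is hypothetical, and the formula is inferred from the figures rather than taken from the tool's source.

```python
def reduction_percent(size_before, size_after):
    """Relative size reduction, as a percentage of the original size."""
    if size_before <= 0:
        raise ValueError("size_before must be positive")
    return 100.0 * (size_before - size_after) / size_before

# Sizes in bytes, copied from the table above.
sizes = {
    "JavaPoet": (234746, 230375),
    "JavaVerbalExpressions": (14746, 14746),
    "Curator": (10427613, 8071252),
    "RxRelay": (5108491, 4574410),
}

for name, (before, after) in sizes.items():
    print(f"{name}\t{reduction_percent(before, after):.2f}%")
```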
        

Descriptions of benchmark applications


JavaPoet is a Java API for generating .java source files.

DiskLruCache is a library that provides a cache bounded by an amount of
space on a file-system.

JavaVerbalExpressions is a Java library that helps in the construction of
difficult regular expressions.

Curator is a set of Java libraries to improve Apache ZooKeeper.

JUnit4 is a framework to write repeatable tests for Java.

RxRelay is a Relay library for RxJava.


Results on other projects:

Benchmark	Reduction
alibaba_TProfiler	10.17%
aragozin_jvm-tools	4.20%
Bukkit_Bukkit	18.54%
dieforfree_qart4j	46.82%
dubboclub_dubbokeeper	17.32%
eirslett_frontend-maven-plugin	22.44%
google_gson	5.52%
JakeWharton_DiskLruCache	1.65%
JakeWharton_retrofit1-okhttp3-client	11.46%
JakeWharton_RxRelay	17.47%
JakeWharton_RxReplayingShare	22.13%
junit-team_junit4	6.93%
kevinsawicki_http-request	6.55%
mabe02_lanterna	1.96%
notnoop_java-apns	18.88%
pagehelper_Mybatis-PageHelper	23.91%
pedrovgs_Algorithms	5.46%
qiujiayu_AutoLoadCache	20.19%
sockeqwe_fragmentargs	11.59%
square_moshi	0.22%
tomighty_tomighty	20.10%
zeroturnaround_zt-zip	11.32%




Method wiping

In our tool, the default behavior is to wipe the method body of each
uninvoked method. Below is an example of a Java method, shown in
Jasmin-style bytecode assembly:

.method public static staticShortMethodNoParams()Ljava/lang/Short;
    .limit stack 2
    .limit locals 1
    getstatic java/lang/System/out Ljava/io/PrintStream;
    astore_0
    aload_0
    ldc "staticShortMethodNoParams touched"
    invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
    iconst_3
    invokestatic java/lang/Short/valueOf(S)Ljava/lang/Short;
    astore_0
    aload_0
    areturn
.end method


Wiping this method's body leaves the method header intact while
removing as much of the body as the JVM permits. The result is shown
below:

.method public static staticShortMethodNoParams()Ljava/lang/Short;
    .limit stack 1
    .limit locals 0
    aconst_null
    areturn
.end method
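
The wiped body in this example is the shortest instruction sequence that returns a value of the declared type. As an illustration only, the sketch below maps a JVM method descriptor to such a sequence. The function name minimal_body is hypothetical, and the mapping reflects the standard JVM return instructions rather than JShrink's actual implementation.

```python
def minimal_body(descriptor):
    """Return a shortest instruction list that returns a default value.

    descriptor is a JVM method descriptor such as "()Ljava/lang/Short;".
    """
    return_type = descriptor[descriptor.rindex(")") + 1:]
    if return_type == "V":
        return ["return"]
    if return_type in ("Z", "B", "C", "S", "I"):
        # boolean, byte, char, short and int all return through ireturn.
        return ["iconst_0", "ireturn"]
    if return_type == "J":
        return ["lconst_0", "lreturn"]
    if return_type == "F":
        return ["fconst_0", "freturn"]
    if return_type == "D":
        return ["dconst_0", "dreturn"]
    if return_type.startswith(("L", "[")):
        # Object and array types: return a null reference.
        return ["aconst_null", "areturn"]
    raise ValueError(f"unrecognized return type: {return_type}")
```

For the descriptor of staticShortMethodNoParams, ()Ljava/lang/Short;, this yields aconst_null followed by areturn, matching the wiped method shown above.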


   




JReduce

JReduce is a tool that uses a variant of delta debugging to reduce the
classes of a project while preserving a given property. The tool was
originally built to reduce bytecode that caused bugs in decompilers. We
categorized as a bug any set of classes with no external dependencies that
could not be decompiled and then compiled again. The goal was therefore to
find the smallest such set of classes that still had all their dependencies.

In this case, we use the test suite as the property that we want to
preserve. Then we reduce the number of classes to the smallest possible
where the tests still succeed.

JReduce is the focus of a paper accepted at FSE'19, which showed a 12x
faster reduction of Java bytecode than previous techniques. JReduce is
currently in active development, and its progress can be followed on the
open source repository.


Technical Details

JReduce works by calculating a dependency graph from classes to other
classes. We create the graph by creating an edge from a class to another
if the first class mentions the second class.

Using the graph, we calculate all the strongly connected components
(SCCs). If we include one of the classes in an SCC, all
classes in that SCC need to be included. This means that we can reduce
the program by reducing this list of SCCs.
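
To make the grouping step concrete, here is a minimal sketch that computes the SCCs of a small class dependency graph with Tarjan's algorithm. The class names and the graph are invented for illustration, and JReduce itself is not written in Python, so this shows the idea rather than the tool's code.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm over {node: [successors]}; returns a list of frozensets."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    result = []

    def visit(node):
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            result.append(frozenset(component))

    for node in graph:
        if node not in index:
            visit(node)
    return result

# An edge X -> Y means that class X mentions class Y.
dependencies = {
    "Main": ["Parser", "Util"],
    "Parser": ["Lexer"],
    "Lexer": ["Parser"],
    "Util": [],
    "Plugin": ["Util"],
}
sccs = strongly_connected_components(dependencies)
```

Parser and Lexer mention each other, so they form one SCC and must be kept or removed together. Every other class is an SCC of its own.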

It is rare that a program runs classes outside the SCC that contains the
main class, but it can happen if the program uses reflection. We have
therefore developed a new reduction technique called Binary Reduction,
which can quickly search the list for the few SCCs needed to satisfy the
predicate.
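
The idea of searching a list for the few items that a predicate needs can be sketched as follows. This is a simplified illustration and not JReduce's Binary Reduction implementation. It assumes that the predicate holds for the full list and that it is monotone (adding items never makes a passing input fail). The name binary_reduce and the SCC labels are hypothetical.

```python
def binary_reduce(items, predicate):
    """Pick a small sublist of items that still satisfies predicate.

    Assumes predicate(items) is True and that predicate is monotone.
    """
    required = []
    candidates = list(items)
    while not predicate(required):
        # Binary search for the shortest prefix of candidates that,
        # together with the items already required, satisfies predicate.
        low, high = 1, len(candidates)
        while low < high:
            middle = (low + high) // 2
            if predicate(required + candidates[:middle]):
                high = middle
            else:
                low = middle + 1
        # The last item of that prefix is needed; the items after it are not.
        required.append(candidates[low - 1])
        candidates = candidates[:low - 1]
    return required

sccs = ["scc_a", "scc_b", "scc_c", "scc_d", "scc_e", "scc_f"]
needed = {"scc_b", "scc_e"}
result = binary_reduce(sccs, lambda chosen: needed <= set(chosen))
```

Each outer iteration finds one required item with a logarithmic number of predicate calls, which is why such a search is fast when only a few items are needed.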

In our tool, it is also possible to provide a set of core classes. The
core classes should not be removed. If an SCC contains a class from
the core, it will not be removed. In our case, we set the test cases as
the core.

Usage

To run JReduce on the benchmarks of this project, first set up the
tool by running ./jdebloat.py setup jreduce.

Then run ./jdebloat.py run jreduce. The output can be found in the
output/jreduce folder.

You can also run the tool on your own benchmarks. Either use
the scripts/runjreduce.sh script or run JReduce directly:

jreduce -v -o output --cp test.jar -t app.jar -c @classes-in-core.txt \
  <runpredicate> <args..>


Here, <runpredicate> is a script that takes a reduced app.jar and
exits with code 0 if the predicate succeeded. In the runjreduce.sh
script we use runtest.sh with test.jar and test.classes.txt.
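
A predicate of this kind can be quite small. The sketch below is a hypothetical Python equivalent, not the project's runtest.sh. It assumes that the reduced JAR, the test JAR, and the class list are passed as paths, that the class list holds one test class name per line, and that JUnit 4 is available on the classpath (for example, bundled in the test JAR).

```python
import os
import subprocess

def junit_command(app_jar, test_jar, test_classes):
    """Build a JUnit 4 invocation that runs test_classes against app_jar."""
    classpath = os.pathsep.join([app_jar, test_jar])
    return ["java", "-cp", classpath, "org.junit.runner.JUnitCore", *test_classes]

def predicate(app_jar, test_jar, classes_file):
    """Return the exit code of the test run: 0 means the tests still pass."""
    with open(classes_file) as handle:
        test_classes = [line.strip() for line in handle if line.strip()]
    return subprocess.run(junit_command(app_jar, test_jar, test_classes)).returncode
```

A wrapper script would pass the result of predicate to sys.exit so that the exit code reaches jreduce.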

If you are feeling adventurous, consult the help notes:

Usage: jreduce [-v] [-q] [-D|--log-depth ARG] [-c|--core CORE] [--cp CLASSPATH]
               [--stdlib] [--jre JRE] (-t|--target FILE) (-o|--output FILE)
               [-R|--reducer ARG] [-W|--work-folder ARG] [-K|--keep-folders]
               [-E|--exit-code CODE] [--stdout] [--stderr] [-T|--timelimit SECS]
               CMD [ARG..]
  A command line tool for reducing java programs.

Available options:
  -v                       make it more verbose.
  -q                       make it more quiet.
  -D,--log-depth ARG       set the log depth. (default: -1)
  -c,--core CORE           the core classes to not reduce.
  --cp CLASSPATH           the library classpath, of things not reduced.
  --stdlib                 load the standard library.
  --jre JRE                the location of the stdlib.
  -t,--target FILE         the path to the jar or folder to reduce.
  -o,--output FILE         the path output folder.
  -R,--reducer ARG         the reducing algorithm to use. (default: Binary)
  -W,--work-folder ARG     the work folder.
  -K,--keep-folders        keep the work folders after use?
  -E,--exit-code CODE      preserve exit-code (default: 0)
  --stdout                 preserve stdout.
  --stderr                 preserve stderr.
  -T,--timelimit SECS      the maximum number of seconds to run the process,
                           negative means no timelimit. (default: -1.0)
  CMD                      the command to run
  ARG..                    arguments to the command.
  -h,--help                Show this help text

	



JInline

JInline takes a Java program and statically inlines methods
read from a database.

Technical Details

We first provide aggressive inlining parameters to the JVM. While these parameters
are not suitable for running programs, they provide better inlining information. We
extract the inlining decisions from the JVM into a database for later use.

We use our customized database to inform our static inliner. First, we filter out
aggressive inlinings that would cause the Java program to miscompile. Using this information,
our inliner tool uses the Soot bytecode optimization framework to statically inline method calls
without affecting the semantics of the program.

Our technique finds inline targets that might not otherwise be detectable by purely
static approaches. Our tool produces a new JAR with our modified class files containing inlined
methods. We successfully ran our modified JAR on the original test cases without errors.

Usage

To run JInline on the provided benchmarks, simply run
./jdebloat.py run jinline.

The output programs can be found in output/jinline as JARs. If you need to run
the tool independently, read the following usage notes:

usage: run-jinline.py [-h] [-o OUTPUT_JAR]
                      test_jar test_classes app_lib_jar output_dir

Run inliner tool.

positional arguments:
  test_jar       JAR containing the test suite
  test_classes   Text file of test classes
  app_lib_jar    JAR containing application and libraries
  output_dir     Output directory

optional arguments:
  -h, --help     show this help message and exit
  -o OUTPUT_JAR  Modified JAR file path
    
    



  
Contact Details

Address: Engineering VI, 404 Westwood Plaza, Los Angeles, CA 90095

ONR TPCP Project

University of California, Los Angeles

