
README.md

*** 1,6 ***
! # Welcome to the JDK!
  
  For build instructions please see the
  [online documentation](https://openjdk.org/groups/build/doc/building.html),
  or either of these files:
  
--- 1,87 ---
! # Welcome to the Leyden Prototype Repository!
+ 
+ The purpose of the Leyden repository is to prototype improvements to the
+ startup time, time to peak performance, and footprint of Java programs, as a part of 
+ [Project Leyden](https://openjdk.org/projects/leyden). We solicit feedback from
+ the Java community, with the hope that some of these improvements can eventually be
+ incorporated into future JDK releases.
+ 
+ ## 0. Disclaimers
+ 
+ - *This repository contains experimental and unstable code. It is not intended to be used
+    in a production environment.*
+ - *This repository is intended for developers of the JDK, and advanced Java developers who
+    are familiar with building the JDK.*
+ - *The experimental features in this repository may be changed or removed without notice.
+    Command line flags and workflows will change.*
+ - *The benchmark results reported on this page are for illustrative purposes only. Your
+    applications may get better or worse results.*
+ 
+ ## 1. Overview
+ 
+ The Leyden "[premain](https://github.com/openjdk/leyden/blob/premain/)" prototype
+ includes many optimizations that shift work from run time to earlier
+ executions of the application, which are
+ called _training runs_. In a training run, we pre-compute various kinds of information.
+ Importantly, we pre-compile
+ bytecode to native code, guided by observations of the application's actual behavior
+ during the training run.
+ 
+ The Leyden repository closely tracks the JDK main line. We are typically only a few weeks behind
+ the [main-line JDK repo](https://github.com/openjdk/jdk).
+ 
+ We have implemented the following improvements:
+ 
+ - **[Ahead-of-Time Class Loading & Linking (JEP 483)](https://openjdk.org/jeps/483)**:
+   This gives
+   the JVM the ability to put classes in the _linked_ state as soon as the application starts up. As a result,
+   we can implement many other time-shifting optimizations with considerably simplified assumptions.
+   - Please refer to the [JEP 483 document](https://openjdk.org/jeps/483) for more details.
+ 
+ - **[Ahead-of-Time Method Profiling (JEP draft 8325147)](https://openjdk.org/jeps/8325147)**: We store method profiles
+   from training runs in the CDS archive, thereby enabling the JIT to begin compiling earlier during warmup.
+   As a result, Java applications can reach peak performance faster.
+   - This feature is enabled by the new diagnostic (`-XX:+UnlockDiagnosticVMOptions`) VM flags `-XX:+RecordTraining` and `-XX:+ReplayTraining`.
+ 
+ - **[Ahead-of-Time Code Compilation (JEP draft 8335368)](https://openjdk.org/jeps/8335368)**: Methods that are frequently used during the training run can be
+   compiled and stored along with the CDS archive. As a result, as soon as the application starts up
+   in the production run, its methods can be executed natively.
+   - This feature is enabled by the new VM flags `-XX:+StoreCachedCode`, `-XX:+LoadCachedCode`, and `-XX:CachedCodeFile`.
+   - Currently, the native code is stored in a separate file, but our plan is to eventually store the native code
+     inside the CDS archive file.
+ 
+ - **Ahead-of-time resolution of constant pool entries**: Many
+   constant pool entries are resolved during the assembly phase. This allows the application to start up faster. Also,
+   the existence of resolved constant pool entries allows the AOT compiler to generate better code.
+   For diagnostic purposes, you can use `-XX:+UnlockDiagnosticVMOptions -XX:-AOTInvokeDynamicLinking`
+   to disable the AOT linking of constant pool entries for the `invokedynamic` bytecode.
+ 
+ - **Ahead-of-time generation of [Dynamic Proxies](https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/reflect/Proxy.html)**:
+   Dynamic proxies are frequently used by popular application frameworks. We can improve start-up time by generating these proxies ahead of time.
+   - This feature is enabled by the new VM flag `-XX:+ArchiveDynamicProxies`.
+ 
+ - **Ahead-of-time generation of reflection data**: Reflection data (such as instances of
+   `java.lang.reflect.Method`) are generated by the JVM to support `java.lang.reflect` operations. We can
+   generate these ahead of time to improve start-up.
+   - This feature is enabled by the new VM flag `-XX:+ArchiveReflectionData`.
+ 
+ - **Class Not Found Cache**: Sometimes application frameworks repeatedly try to load classes that do not exist. This optimization allows such failing lookups to be done quickly without repeatedly scanning the class path.
+   - This feature is enabled by the new VM flag `-XX:+ArchiveLoaderLookupCache`.
+ 
+ By default, all optimizations listed above are enabled.  This simplifies testing of the whole
+ prototype.  If necessary for more detailed testing, each feature can
+ be individually disabled by negating its associated flag.
+ 
+ The names of all of these VM flags will change in a future EA build as we transition from the old “CDS” terminology to the new “AOT” terminology, as discussed [here](https://openjdk.org/jeps/483#History).
+ 
+ [CDS]: <https://docs.oracle.com/en/java/javase/22/vm/class-data-sharing.html>
+ 
+ ## 2. Building the Leyden Repository
+ 
+ The Leyden Repository can be built in the same way as the main-line JDK repository.
+ Please use the "premain" branch, i.e., [https://github.com/openjdk/leyden/tree/premain](https://github.com/openjdk/leyden/tree/premain).
  
  For build instructions please see the
  [online documentation](https://openjdk.org/groups/build/doc/building.html),
  or either of these files:
  

*** 8,5 ***
--- 89,407 ---
  - [doc/building.md](doc/building.md) (markdown version)
  
  See <https://openjdk.org/> for more information about the OpenJDK
  Community and the JDK and see <https://bugs.openjdk.org> for JDK issue
  tracking.
+ 
+ ## 3. Trying out Leyden Features
+ 
+ The easiest way to try out the Leyden optimizations is to build a JVM from the Leyden repository, and use it with your application with the `-XX:AOTCache` flag.
+ 
+ > Note: in an earlier version of the Leyden prototype, the optimizations were controlled by an experimental flag `-XX:CacheDataStore`. This flag has been deprecated and will be removed. For a reference to this flag, please see an [older version of this document](https://github.com/openjdk/leyden/blob/076c71f7cb9887ef3d64b752976610d19792203b/README.md).
+ 
+ 
+ Here's a small benchmark that uses the JDK's built-in
+ [`JavaCompiler`](https://docs.oracle.com/en/java/javase/21/docs/api/java.compiler/javax/tools/JavaCompiler.html)
+ class to compile some Java source files. This benchmark spends a significant amount of start-up time 
+ setting up the classes used by `JavaCompiler`, so it will benefit from the Leyden features.
+ 
+ First, download [JavacBenchApp.java](https://github.com/iklam/jdk/raw/f95f851aed3d2bf06edabab1e7c24e15f4145d0d/test/hotspot/jtreg/runtime/cds/appcds/applications/JavacBenchApp.java)
+ and compile it into a JAR file.
+ 
+ (Remember to use the `java` program that you built from the Leyden repository.)
+ 
+ ```
+ $ javac JavacBenchApp.java
+ $ jar cvf JavacBenchApp.jar JavacBenchApp*.class
+ added manifest
+ adding: JavacBenchApp$ClassFile.class(in = 1608) (out= 787)(deflated 51%)
+ adding: JavacBenchApp$FileManager.class(in = 2090) (out= 979)(deflated 53%)
+ adding: JavacBenchApp$SourceFile.class(in = 1351) (out= 671)(deflated 50%)
+ adding: JavacBenchApp.class(in = 7571) (out= 3302)(deflated 56%)
+ ```
+ 
+ We can run this benchmark without any Leyden features. It takes 893 ms:
+ 
+ ```
+ $ java -cp JavacBenchApp.jar JavacBenchApp 50
+ Generated source code for 51 classes and compiled them in 893 ms
+ ```
+ 
+ To use AOT optimizations for JavacBenchApp, we should first perform a _training run_ and
+ capture the profiling information into `JavacBenchApp.aotconfig`:
+ 
+ ```
+ $ java -XX:AOTMode=record -XX:AOTConfiguration=JavacBenchApp.aotconfig \
+        -cp JavacBenchApp.jar JavacBenchApp 50
+ $ ls -l JavacBenchApp.aotconfig
+ -rw-rw-r-- 1 iklam iklam 27652096 Mar  3 16:23 JavacBenchApp.aotconfig
+ ```
+ 
+ With the `JavacBenchApp.aotconfig` file, we can create the AOT cache. This is called the _assembly phase_:
+ 
+ ```
+ $ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
+        -cp JavacBenchApp.jar -XX:AOTCache=JavacBenchApp.aot
+ $ ls -l JavacBenchApp.aot
+ -r--r--r-- 1 iklam iklam 42332160 Mar  3 16:58 JavacBenchApp.aot
+ ```
+ 
+ Now, we can make a _production run_ of the program using the AOT cache `JavacBenchApp.aot`. It finishes in 423 ms, or more than twice as fast as
+ before.
+ 
+ ```
+ $ java -XX:AOTCache=JavacBenchApp.aot -cp JavacBenchApp.jar JavacBenchApp 50
+ Generated source code for 51 classes and compiled them in 423 ms
+ ```
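+ 
+ The three steps above (training run, assembly phase, production run) can be summarized in a small
+ dry-run helper. This is only a sketch: the function name `aot_workflow` is hypothetical and is not
+ part of the Leyden repository, and it merely prints the command lines (using the flags shown above)
+ instead of executing them.
+ 
+ ```shell
+ # Hypothetical helper: print the three commands of the AOT workflow for one application.
+ # Usage: aot_workflow NAME JAR MAIN_CLASS [ARGS...]
+ aot_workflow() {
+     name=$1; jar=$2; main=$3
+     shift 3
+     # 1. Training run: record the AOT configuration.
+     echo "java -XX:AOTMode=record -XX:AOTConfiguration=$name.aotconfig -cp $jar $main $*"
+     # 2. Assembly phase: create the AOT cache from the configuration.
+     echo "java -XX:AOTMode=create -XX:AOTConfiguration=$name.aotconfig -cp $jar -XX:AOTCache=$name.aot"
+     # 3. Production run: start the application with the AOT cache.
+     echo "java -XX:AOTCache=$name.aot -cp $jar $main $*"
+ }
+ 
+ aot_workflow JavacBenchApp JavacBenchApp.jar JavacBenchApp 50
+ ```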
+ 
+ By default, training runs end when the application terminates.  You have two other options to end training runs:
+ 
+ - `-XX:AOTEndTrainingOnMethodEntry=<method1,method2,...>[,count=100]`
+ - `jcmd <pid> AOT.end_training`
+ 
+ Note that `-XX:AOTEndTrainingOnMethodEntry` uses the same format as `-XX:CompileOnly` and the default count is 1.
+ 
+ See [EndTrainingOnMethodEntry.java](test/hotspot/jtreg/runtime/cds/appcds/leyden/EndTrainingOnMethodEntry.java) for a test case.
+ 
+ ### Diagnostic VM Flags
+ 
+ All of the optimizations described
+ in the [Overview](#1-overview) section above are enabled by default. This ensures that you get all the optimizations
+ without specifying them individually.
+ 
+ For diagnostic purposes, you can selectively disable some of the options:
+ 
+ - The `-XX:+LoadCachedCode` and `-XX:+ReplayTraining` flags affect only the production run.
+ - The `-XX:+RecordTraining` option affects only the training run and the assembly phase.
+ - All other options affect only the assembly phase.
+ 
+ For example, you can disable the loading of AOT-compiled methods during the production run. Notice that the benchmark now
+ starts more slowly than it did when AOT-compiled methods were loaded.
+ 
+ ```
+ $ java -XX:AOTCache=JavacBenchApp.aot -Xlog:cds=error -XX:-LoadCachedCode \
+        -cp JavacBenchApp.jar JavacBenchApp 50
+ Generated source code for 51 classes and compiled them in 647 ms
+ ```
+ 
+ You can also disable AOT compilation in the assembly phase. Note that the size of the AOT
+ cache is smaller because it no longer has AOT-compiled methods.
+ 
+ ```
+ $ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
+        -cp JavacBenchApp.jar \
+        -XX:AOTCache=JavacBenchApp.aot -XX:-StoreCachedCode
+ $ ls -l JavacBenchApp.aot
+ -r--r--r-- 1 iklam iklam 29990912 Mar  3 16:34 JavacBenchApp.aot
+ ```
+ 
+ 
+ ## 4. Limitations of the Leyden Prototype
+ 
+ When trying out the Leyden prototype, please pay attention to the following limitations.
+ 
+ ### The Same Garbage Collector Must be Used between Assembly Phase and Production Runs
+ 
+ The CDS archive generated by the Leyden prototype includes machine instructions that are specific to
+ the garbage collector. We recommend that you explicitly specify the same collector during both
+ the assembly phase and production runs. For example:
+ 
+ ```
+ # assembly phase.
+ $ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
+        -cp JavacBenchApp.jar \
+        -XX:AOTCache=JavacBenchApp.aot -XX:+UseSerialGC
+ 
+ # production run
+ $ java -XX:AOTCache=JavacBenchApp.aot -XX:+UseSerialGC -cp JavacBenchApp.jar \
+        JavacBenchApp 50
+ ```
+ 
+ Otherwise, the CDS archive may not be usable for the production run, leading to suboptimal performance.
+ For example, sometimes you may perform the assembly phase on a large development host, and then use
+ a container to run the application in a small production node. In the following scenario, as the collector
+ is not explicitly specified, the VM will automatically pick G1 for the assembly phase, and SerialGC for the
+ production run (due to its limited amount of memory):
+ 
+ ```
+ # Assembly phase (uses G1 by default)
+ $ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
+        -cp JavacBenchApp.jar -XX:AOTCache=JavacBenchApp.aot
+ 
+ # Production run (uses SerialGC)
+ $ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
+     --memory=1024m \
+     container-registry.oracle.com/java/openjdk \
+     bash -c 'cd /test; ' \
+             '/jdk/bin/java -XX:AOTCache=JavacBenchApp.aot ' \
+             '    -cp JavacBenchApp.jar JavacBenchApp 50'
+ [0.001s][error][cds] CDS archive has aot-linked classes. It cannot be used because
+                      GC used during dump time (G1) is not the same as runtime (Serial)
+ [0.001s][error][cds] An error has occurred while processing the AOT cache.
+ [0.001s][error][cds] Unable to map shared spaces
+ Error occurred during initialization of VM
+ Unable to use AOT cache.
+ ```
+ 
+ ### Only G1GC, SerialGC, ParallelGC, EpsilonGC, ShenandoahGC are Supported
+ 
+ Currently, if you use any other garbage collector in combination with `-XX:AOTMode` or `-XX:AOTCache`, the VM will
+ exit with an error.
+ 
+ ```
+ $ java -XX:AOTMode=record -XX:AOTConfiguration=JavacBenchApp.aotconfig \
+        -cp JavacBenchApp.jar -XX:+UseZGC JavacBenchApp 50
+ Error occurred during initialization of VM
+ Cannot create the AOT configuration file: UseCompressedClassPointers must be enabled,
+ and collector must be G1, Parallel, Serial, Epsilon, or Shenandoah
+ ```
+ 
+ ### -XX:AOTMode=on is Enabled by Default
+ 
+ As seen in the Docker example above, in the production run, if the CDS archive cannot be
+ used for any reason, the JVM will report an error and exit. This happens as if `-XX:AOTMode=on` were
+ specified on the command line.
+ 
+ In the standard JDK, when the CDS archive cannot be used for any reason (for example, the
+ archive was created for a different version of the JDK), the application will
+ continue to run without using CDS.
+ This fall-back strategy ensures that the application will function correctly, though at a lower level of performance.
+ 
+ With the Leyden prototype, we have changed this fall-back behavior to make it easier to diagnose
+ performance issues. For example, when the start-up time is not as good as one would expect, we
+ want to know whether it's caused by a misconfiguration that prevents the CDS archive
+ from being used, or it's caused by a deficiency in the implementation of the Leyden optimizations.
+ 
+ To revert to the behavior of the standard JDK, you can explicitly add `-XX:AOTMode=auto` to the command line.
+ 
+ ```
+ $ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
+     --memory=1024m \
+     container-registry.oracle.com/java/openjdk \
+     bash -c 'cd /test; ' \
+             '/jdk/bin/java -XX:AOTMode=auto -XX:AOTCache=JavacBenchApp.aot ' \
+             '    -cp JavacBenchApp.jar JavacBenchApp 50'
+ [0.001s][error][cds] CDS archive has aot-linked classes. It cannot be used because
+                      GC used during dump time (G1) is not the same as runtime (Serial)
+ Generated source code for 51 classes and compiled them in 831 ms
+ ```
+ 
+ See [JEP 483](https://openjdk.org/jeps/483) for a discussion of `-XX:AOTMode=on` vs  `-XX:AOTMode=auto`.
+ 
+ 
+ ## 5. Benchmarking
+ 
+ We use a small set of benchmarks to demonstrate the performance of the optimizations in the Leyden repo.
+ 
+ | Benchmark  | Source |
+ | ------------- | ------------- |
+ |[helidon-quickstart-se](test/hotspot/jtreg/premain/helidon-quickstart-se) | https://helidon.io/docs/v4/se/guides/quickstart|
+ |[micronaut-first-app](test/hotspot/jtreg/premain/micronaut-first-app) | https://guides.micronaut.io/latest/creating-your-first-micronaut-app-maven-java.html|
+ |[quarkus-getting-started](test/hotspot/jtreg/premain/quarkus-getting-started) | https://quarkus.io/guides/getting-started|
+ |[spring-boot-getting-started](test/hotspot/jtreg/premain/spring-boot-getting-started) | https://spring.io/guides/gs/spring-boot|
+ |[spring-petclinic](test/hotspot/jtreg/premain/spring-petclinic) | https://github.com/spring-projects/spring-petclinic|
+ 
+ *(FIXME: add a benchmark for javac)*
+ 
+ ### Benchmarking Against JDK Main-line
+ 
+ To compare the performance of Leyden vs the main-line JDK, you need:
+ 
+ - An official build of JDK 21
+ - An up-to-date build of the JDK main-line
+ - The latest Leyden build
+ - Maven (ideally 3.8 or later, as required by some of the demos). Note: if you are behind
+   a firewall, you may need to [set up proxies for Maven](https://maven.apache.org/guides/mini/guide-proxies.html)
+ 
+ The same steps are used for benchmarking all of the above demos. For example:
+ 
+ ```
+ $ cd helidon-quickstart-se
+ $ make PREMAIN_HOME=/repos/leyden/build/linux-x64/images/jdk \
+        MAINLINE_HOME=/repos/jdk/build/linux-x64/images/jdk \
+        BLDJDK_HOME=/usr/local/jdk21 \
+        bench
+ run,mainline default,mainline custom static cds,mainline aot cache,premain aot cache
+ 1,456,229,156,117
+ 2,453,227,157,117
+ 3,455,232,155,116
+ 4,448,230,154,114
+ 5,440,228,156,114
+ 6,446,228,156,114
+ 7,448,232,156,114
+ 8,465,261,159,114
+ 9,448,226,157,113
+ 10,442,233,154,114
+ Geomean,450.05,232.41,155.99,114.69
+ Stdev,6.98,9.72,1.41,1.35
+ Markdown snippets in mainline_vs_premain.md
+ ```
+ 
+ The above command runs each configuration 10 times, in interleaved order. This way
+ the noise of the system (background processes, thermal throttling, etc.) is more likely to
+ be spread across the different runs.
+ 
+ As is typical for benchmarking start-up performance, the numbers are not very steady.
+ It is best to plot
+ the results (as saved in the file `mainline_vs_premain.csv`) in a spreadsheet to check for
+ noise and other artifacts.
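+ 
+ For a quick check without a spreadsheet, the per-column geometric mean can be recomputed from the
+ CSV file. The sketch below assumes that `mainline_vs_premain.csv` has the same layout as the console
+ output above (a header row, one row per run that starts with the run number, then summary rows);
+ the function name `geomean_csv` is hypothetical and is not part of the benchmark scripts.
+ 
+ ```shell
+ # geomean_csv FILE: print the geometric mean of every data column in FILE.
+ # Rows whose first field is not a run number (header, Geomean, Stdev) are skipped.
+ geomean_csv() {
+     awk -F, '
+         $1 !~ /^[0-9]+$/ { next }
+         {
+             # Accumulate the sum of logarithms for each data column.
+             for (i = 2; i <= NF; i++) sum[i] += log($i)
+             n++
+             cols = NF
+         }
+         END {
+             if (n == 0) exit 1
+             printf "Geomean"
+             for (i = 2; i <= cols; i++) printf ",%.2f", exp(sum[i] / n)
+             printf "\n"
+         }' "$1"
+ }
+ 
+ # Example (assumes the file was produced by a previous "make bench"):
+ # geomean_csv mainline_vs_premain.csv
+ ```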
+ 
+ The "make bench" target also generates GitHub markdown snippets (in the file `mainline_vs_premain.md`) for creating the
+ graphs below.
+ 
+ ### Benchmarking Between Two Leyden Builds
+ 
+ This is useful for Leyden developers to measure the benefits of a particular optimization.
+ The steps are similar to above, but we use the "make compare_premain_builds" target:
+ 
+ ```
+ $ cd helidon-quickstart-se
+ $ make PM_OLD=/repos/leyden_old/build/linux-x64/images/jdk \
+        PM_NEW=/repos/leyden_new/build/linux-x64/images/jdk \
+        BLDJDK_HOME=/usr/local/jdk21 \
+        compare_premain_builds
+ Old build = /repos/leyden_old/build/linux-x64/images/jdk with options
+ New build = /repos/leyden_new/build/linux-x64/images/jdk with options
+ Run,Old CDS + AOT,New CDS + AOT
+ 1,110,109
+ 2,131,111
+ 3,118,115
+ 4,110,108
+ 5,117,110
+ 6,114,109
+ 7,110,109
+ 8,118,110
+ 9,110,110
+ 10,113,114
+ Geomean,114.94,110.48
+ Stdev,6.19,2.16
+ Markdown snippets in compare_premain_builds.md
+ ```
+ 
+ Please see [test/hotspot/jtreg/premain/lib/Bench.gmk](test/hotspot/jtreg/premain/lib/Bench.gmk) for more details.
+ 
+ Note: due to the variability of start-up time, the benefit of minor improvements may
+ be difficult to measure.
+ 
+ ### Preliminary Benchmark Results
+ 
+ The following charts show the relative start-up performance of the Leyden/Premain branch vs
+ the JDK main-line.
+ 
+ For example, a number of "premain aot cache: 255" indicates that if the application takes
+ 1000 ms to start up with the JDK main-line, it takes only 255 ms to start up when the
+ current set of Leyden optimizations is enabled.
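+ 
+ As a rough illustration of how such a normalized figure relates to raw timings, the sketch below
+ rescales a measurement so that the baseline equals 1000, and also computes the speedup factor. It
+ assumes the geomean values from the sample `make bench` output for `helidon-quickstart-se` shown
+ earlier (450.05 for mainline default, 114.69 for premain aot cache); the helper names `normalize`
+ and `speedup` are hypothetical and are not part of the benchmark scripts.
+ 
+ ```shell
+ # normalize BASE VALUE: scale VALUE so that BASE corresponds to 1000.
+ normalize() {
+     awk -v base="$1" -v val="$2" 'BEGIN { printf "%.0f\n", val / base * 1000 }'
+ }
+ 
+ # speedup BASE VALUE: how many times faster VALUE is than BASE.
+ speedup() {
+     awk -v base="$1" -v val="$2" 'BEGIN { printf "%.2f\n", base / val }'
+ }
+ 
+ normalize 450.05 114.69   # prints 255
+ speedup   450.05 114.69   # prints 3.92
+ ```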
+ 
+ The benchmark results are collected with `make bench` in the following directories:
+ 
+ - `helidon-quickstart-se`
+ - `micronaut-first-app`
+ - `quarkus-getting-started`
+ - `spring-boot-getting-started`
+ - `spring-petclinic`
+ 
+ The meaning of the four rows in the following charts:
+ 
+ | Row  | Meaning |
+ | ------------- | ------------- |
+ | **mainline default**            |Run benchmark with no optimizations|
+ | **mainline custom static cds**  |Run benchmark with a custom static CDS archive|
+ | **mainline aot cache**          |Run benchmark with a custom AOT cache (JEP 483)|
+ | **premain aot cache**           |Run benchmark with a custom AOT cache, plus all Leyden optimizations such as AOT profiles and AOT-compiled methods|
+ 
+ These JDK versions were used in the comparisons:
+ 
+ - JDK main-line: JDK 24, build 24+36-3646
+ - Leyden: https://github.com/openjdk/leyden/tree/bbac8f2d845aa6408182ca3ff9ce60b5ca6e0390
+ 
+ For detailed information about the hardware and raw numbers, see [bench.20250307.txt](test/hotspot/jtreg/premain/bench_data/bench.20250307.txt).
+ 
+ ### Helidon Quick Start (SE) Demo (3.92x improvement)
+ 
+ ```mermaid
+ ---
+ config:
+     xyChart:
+         chartOrientation: horizontal
+         height: 300
+ ---
+ xychart-beta
+     x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
+     y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
+     bar [1000, 516, 347, 255]
+ ```
+ 
+ ### Micronaut First App Demo (3.12x improvement)
+ 
+ ```mermaid
+ ---
+ config:
+     xyChart:
+         chartOrientation: horizontal
+         height: 300
+ ---
+ xychart-beta
+     x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
+     y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
+     bar [1000, 475, 366, 321]
+ ```
+ 
+ ### Quarkus Getting Started Demo (3.52x improvement)
+ 
+ ```mermaid
+ ---
+ config:
+     xyChart:
+         chartOrientation: horizontal
+         height: 300
+ ---
+ xychart-beta
+     x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
+     y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
+     bar [1000, 437, 380, 284]
+ ```
+ 
+ ### Spring-boot Getting Started Demo (3.48x improvement)
+ 
+ ```mermaid
+ ---
+ config:
+     xyChart:
+         chartOrientation: horizontal
+         height: 300
+ ---
+ xychart-beta
+     x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
+     y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
+     bar [1000, 502, 382, 287]
+ ```
+ 
+ ### Spring PetClinic Demo (2.65x improvement)
+ 
+ ```mermaid
+ ---
+ config:
+     xyChart:
+         chartOrientation: horizontal
+         height: 300
+ ---
+ xychart-beta
+     x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
+     y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
+     bar [1000, 625, 586, 376]
+ ```
+ 
+ ## 6. More Documentation
+ 
+ Please see [test/hotspot/jtreg/premain/](test/hotspot/jtreg/premain) for more information.