# Welcome to the Leyden Prototype Repository!

The purpose of the Leyden repository is to prototype improvements to the
startup time, time to peak performance, and footprint of Java programs, as a part of
[Project Leyden](https://openjdk.org/projects/leyden). We solicit feedback from
the Java community, with the hope that some of these improvements can eventually be
incorporated into future JDK releases.

## 0. Disclaimers

- *This repository contains experimental and unstable code. It is not intended to be used
   in a production environment.*
- *This repository is intended for developers of the JDK, and advanced Java developers who
   are familiar with building the JDK.*
- *The experimental features in this repository may be changed or removed without notice.
   Command line flags and workflows will change.*
- *The benchmark results reported on this page are for illustrative purposes only. Your
   applications may get better or worse results.*

## 1. Overview

The Leyden "[premain](https://github.com/openjdk/leyden/blob/premain/)" prototype
includes many optimizations that shift work from run time to earlier
executions of the application, which are
called _training runs_. In a training run, we pre-compute various kinds of information.
Importantly, we pre-compile
bytecode to native code, guided by observations of the application's actual behavior
during the training run.

The Leyden repository closely tracks the JDK main line. We are typically only a few weeks behind
the [main-line JDK repo](https://github.com/openjdk/jdk).

We have implemented the following improvements over the JDK main line:

- **[Ahead-of-Time Class Loading & Linking (JEP 483)](https://openjdk.org/jeps/483)**:
  This gives
  the JVM the ability to put classes in the _linked_ state as soon as the application starts up. As a result,
  we can implement many other time-shifting optimizations with considerably simplified assumptions.
  - This feature is accessed with the new VM flag `-XX:+PreloadSharedClasses`.

- **[Unified Ahead-of-Time Cache (JEP draft 8320264)](https://openjdk.org/jeps/8320264)**:
  This enhancement to [CDS] is foundational to the features that follow.
  - It enables [CDS] to store not only class metadata and heap objects (as before),
    but also profiling data and compiled code.
  - This feature is accessed with the new VM flag `-XX:CacheDataStore`.
  - This option simplifies the creation of the CDS archive, and also the testing
    of all the prototype features listed here.

- **[Ahead-of-Time Method Profiling (JEP draft 8325147)](https://openjdk.org/jeps/8325147)**: We store method profiles
  from training runs in the CDS archive, thereby enabling the JIT to begin compiling earlier during warmup.
  As a result, Java applications can reach peak performance faster.
  - This feature is enabled by the new VM flags `-XX:+RecordTraining` and `-XX:+ReplayTraining`.

- **Ahead-of-time resolution of constant pool entries**: Many
  constant pool entries are resolved during the assembly phase. This allows the application to start up faster. Also,
  the existence of resolved constant pool entries allows the AOT compiler to generate better code.
  For diagnostic purposes, you can use `-XX:+UnlockDiagnosticVMOptions -XX:-AOTInvokeDynamicLinking`
  to disable the AOT linking of constant pool entries for the `invokedynamic` bytecode.

- **[Ahead-of-Time Code Compilation (JEP draft 8335368)](https://openjdk.org/jeps/8335368)**: Methods that are frequently used during the training run can be
  compiled and stored along with the CDS archive. As a result, as soon as the application starts up
  in the production run, its methods can be executed natively.
  - This feature is enabled by the new VM flags `-XX:+StoreCachedCode`, `-XX:+LoadCachedCode`, and `-XX:CachedCodeFile`.
  - Currently, the native code is stored in a separate file, but our plan is to eventually store the native code
    inside the CDS archive file.

- **Ahead-of-time generation of [Dynamic Proxies](https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/reflect/Proxy.html)**:
  Dynamic proxies are frequently used by popular application frameworks. We can improve start-up time by generating these proxies ahead of time.
  - This feature is enabled by the new VM flag `-XX:+ArchiveDynamicProxies`.

- **Ahead-of-time generation of reflection data**: Reflection data (such as instances of
  `java.lang.reflect.Method`) are generated by the JVM to support `java.lang.reflect` operations. We can
  generate these ahead of time to improve start-up.
  - This feature is enabled by the new VM flag `-XX:+ArchiveReflectionData`.

- **Class Not Found Cache**: Sometimes application frameworks repeatedly try to load classes that do not exist. This optimization allows such failing lookups to be done quickly without repeatedly scanning the class path.
  - This feature is enabled by the new VM flag `-XX:+ArchiveLoaderLookupCache`.

The flag `-XX:CacheDataStore` automatically enables the whole bundle
of features listed above. This simplifies testing of the whole
prototype. If necessary for more detailed testing, each feature can
be individually disabled by negating its associated flag.
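
As a minimal sketch of the negation rule, the hypothetical helper `negate_flag` below (an illustration, not part of the JDK) rewrites a boolean flag of the form `-XX:+Name` into `-XX:-Name`. It assumes that only boolean flags are negated; a flag that takes a value, such as `-XX:CacheDataStore=<file>`, is rejected.

```shell
# negate_flag FLAG
# Hypothetical helper: turn an enabling boolean VM flag (-XX:+Name)
# into its disabling form (-XX:-Name). Anything else is rejected.
negate_flag() {
    case "$1" in
        -XX:+*) printf '%s\n' "-XX:-${1#-XX:+}" ;;
        *)      echo "not a boolean enable flag: $1" >&2; return 1 ;;
    esac
}

# Example (commented out because it needs a Leyden build of the JDK):
# a training run with ahead-of-time dynamic proxy generation switched off.
# java -XX:CacheDataStore=JavacBenchApp.cds \
#     "$(negate_flag -XX:+ArchiveDynamicProxies)" \
#     -cp JavacBenchApp.jar JavacBenchApp 50
```

For example, `negate_flag -XX:+ArchiveDynamicProxies` prints `-XX:-ArchiveDynamicProxies`.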

The names of all of these VM flags will change in a future EA build as we transition from the old “CDS” terminology to the new “AOT” terminology, as discussed [here](https://openjdk.org/jeps/483#History).

[CDS]: <https://docs.oracle.com/en/java/javase/22/vm/class-data-sharing.html>

## 2. Building the Leyden Repository

The Leyden repository can be built in the same way as the main-line JDK repository.
Please use the "premain" branch, i.e., [https://github.com/openjdk/leyden/tree/premain](https://github.com/openjdk/leyden/tree/premain).

For build instructions, please see the
[online documentation](https://openjdk.org/groups/build/doc/building.html),
or either of these files:

- [doc/building.html](doc/building.html) (html version)
- [doc/building.md](doc/building.md) (markdown version)

See <https://openjdk.org/> for more information about the OpenJDK
Community and the JDK and see <https://bugs.openjdk.org> for JDK issue
tracking.

## 3. Trying out Leyden Features

The easiest way to try out the Leyden features is to build a JVM from the Leyden repository, and run your application on it with the `-XX:CacheDataStore` flag.

Here's a small benchmark that uses the JDK's built-in
[`JavaCompiler`](https://docs.oracle.com/en/java/javase/21/docs/api/java.compiler/javax/tools/JavaCompiler.html)
class to compile some Java source files. This benchmark spends a significant amount of start-up time
setting up the classes used by `JavaCompiler`, so it will benefit from the Leyden features.

First, download [JavacBenchApp.java](https://github.com/iklam/jdk/blob/f95f851aed3d2bf06edabab1e7c24e15f4145d0d/test/hotspot/jtreg/runtime/cds/appcds/applications/JavacBenchApp.java)
and compile it into a JAR file.

(Remember to use the `java` program that you built from the Leyden repository.)

```
$ javac JavacBenchApp.java
$ jar cvf JavacBenchApp.jar JavacBenchApp*.class
added manifest
adding: JavacBenchApp$ClassFile.class(in = 1608) (out= 787)(deflated 51%)
adding: JavacBenchApp$FileManager.class(in = 2090) (out= 979)(deflated 53%)
adding: JavacBenchApp$SourceFile.class(in = 1351) (out= 671)(deflated 50%)
adding: JavacBenchApp.class(in = 7571) (out= 3302)(deflated 56%)
```

We can run this benchmark without any Leyden features. It takes 893 ms:

```
$ java -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 893 ms
```

Now, we can perform a _training run_ and create the Leyden cache files.

<b>Note: Any files `JavacBenchApp.cds*` created by previous tests must
be deleted before new ones are created.</b>

```
$ rm -fv JavacBenchApp.cds*
$ java -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50
$ ls -l JavacBenchApp.cds*
-r--r--r-- 1 iklam iklam 30900224 May 20 19:21 JavacBenchApp.cds
-r--r--r-- 1 iklam iklam 16895736 May 20 19:21 JavacBenchApp.cds.code
```

Two files are created:

- `JavacBenchApp.cds`: This file contains classes, heap objects and profiling data harvested from the training run.
- `JavacBenchApp.cds.code`: This file contains AOT-compiled methods, optimized for the execution behaviors observed during the training run.
  (Data in this file will be merged into `JavacBenchApp.cds` in a future release.)

Now, we can make a _production run_ of the program with the cache files. It finishes in 423 ms, which is more than twice as fast as
before.

```
$ java -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 423 ms
```
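
The speedup quoted above is simply the ratio of the two elapsed times. As a small worked example, the hypothetical helper `speedup` below (an illustration, not part of the repository) computes it with `awk`. The timings are the ones from the sample runs above and will differ on your machine.

```shell
# speedup BASELINE_MS OPTIMIZED_MS
# Prints how many times faster the optimized run is, to two decimals.
speedup() {
    awk -v base="$1" -v opt="$2" 'BEGIN {
        if (opt <= 0) exit 1          # guard against division by zero
        printf "%.2f\n", base / opt
    }'
}

# Baseline run (893 ms) vs. production run with the cache (423 ms).
speedup 893 423
```

With the sample timings, this prints `2.11`.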

By default, training runs end when the application terminates. You have two other options to end training runs:

- `-XX:AOTEndTrainingOnMethodEntry=<method1,method2,...>[,count=100]`
- `jcmd <pid> AOT.end_training`

Note that `AOTEndTrainingOnMethodEntry` uses the same format as `CompileOnly`, and the default count is 1.
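
As a rough sketch of the `jcmd` option, the hypothetical helper `end_training_after` below starts a training run in the background, waits for a warm-up period, and then asks the JVM to end training. The helper name, the `JCMD` override variable, and the fixed warm-up delay are assumptions made for illustration; only `jcmd <pid> AOT.end_training` comes from the text above.

```shell
# end_training_after DELAY_SECONDS COMMAND [ARGS...]
# Starts COMMAND in the background, sleeps DELAY_SECONDS, then sends
# AOT.end_training to it. JCMD can be overridden (for example, for a dry run).
end_training_after() {
    delay="$1"
    shift
    "$@" &                      # launch the training run
    pid=$!
    sleep "$delay"              # let the application warm up
    "${JCMD:-jcmd}" "$pid" AOT.end_training
}

# Example (commented out because it needs a Leyden build of the JDK):
# end_training_after 30 java -XX:CacheDataStore=JavacBenchApp.cds \
#     -cp JavacBenchApp.jar JavacBenchApp 50
```

How long to wait before ending training depends on the application; the fixed delay here is only a placeholder.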

### Optional VM Flags

When you create the file `JavacBenchApp.cds` with the flag `-XX:CacheDataStore`,
all of the other options described
in the [Overview](#1-overview) section above are enabled by default. This ensures that you can get all the optimizations
without specifying them individually.

For diagnostic purposes, you can selectively disable some of the options:

- The `-XX:+LoadCachedCode` and `-XX:+ReplayTraining` flags affect only the production run.
- All other options affect only the training run.

For example, you can disable the loading of the AOT code during the production run. Notice that the benchmark now
starts more slowly than it did when AOT code was loaded.

```
$ java -XX:CacheDataStore=JavacBenchApp.cds -XX:-LoadCachedCode -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 647 ms
```

You can also disable AOT compilation in the training run:

```
$ rm -fv JavacBenchApp.cds*
$ java -XX:CacheDataStore=JavacBenchApp.cds -XX:-StoreCachedCode -cp JavacBenchApp.jar JavacBenchApp 50
$ ls -l JavacBenchApp.cds*
-r--r--r-- 1 iklam iklam 30277632 May 20 20:05 JavacBenchApp.cds
```

Note that the file `JavacBenchApp.cds.code` is no longer created.

## 4. Limitations of the Leyden Prototype

When trying out the Leyden prototype, please pay attention to the following limitations.

### The Same Garbage Collector Must be Used between Training and Production Runs

The CDS archive generated by the Leyden prototype includes machine instructions that are specific to
the garbage collector. We recommend that you explicitly specify the same collector during both
training and production runs. For example:

```
# training run
$ rm -fv JavacBenchApp.cds*
$ java -XX:CacheDataStore=JavacBenchApp.cds -XX:+UseSerialGC -cp JavacBenchApp.jar JavacBenchApp 50

# production run
$ java -XX:CacheDataStore=JavacBenchApp.cds -XX:+UseSerialGC -cp JavacBenchApp.jar JavacBenchApp 50
```

Otherwise, the CDS archive may not be loaded for the production run, leading to suboptimal performance.
For example, sometimes you may perform the training run on a large development host, and then use
a container to run the application in a small production node. In the following scenario, as the collector
is not explicitly specified, the VM will automatically pick G1 for the training run, and SerialGC for the
production run (due to its limited amount of memory):

```
# training run (uses G1 by default)
$ rm -fv JavacBenchApp.cds*
$ java -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50

# production run (uses SerialGC)
$ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
    --memory=1024m \
    container-registry.oracle.com/java/openjdk \
    bash -c 'cd /test; /jdk/bin/java -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50'
[0.001s][error][cds] CDS archive has preloaded classes. It cannot be used because GC used during dump time (G1)
                     is not the same as runtime (Serial)
[0.001s][error][cds] An error has occurred while processing the shared archive file.
[0.001s][error][cds] Unable to map shared spaces
Error occurred during initialization of VM
Unable to use shared archive.
```

### Only G1GC, SerialGC, ParallelGC, EpsilonGC, ShenandoahGC are Supported

Currently, if you use any other garbage collector in combination with `-XX:CacheDataStore`, the VM will
exit with an error.

```
$ java -XX:+UseZGC -XX:CacheDataStore=foo --version
Error occurred during initialization of VM
Cannot create the CacheDataStore: UseCompressedClassPointers must be enabled, and collector
must be G1, Parallel, Serial, Epsilon, or Shenandoah
```

### -Xshare:on is Enabled by Default

As seen in the example immediately above, in the production run, if the CDS archive cannot be
used for any reason, the JVM will report an error and exit. This happens as if `-Xshare:on` were
specified on the command line.

In the standard JDK, when the CDS archive cannot be used for any reason (for example, the
archive was created for a different version of the JDK), the application will
continue to run without using CDS.
This fall-back strategy ensures that the application will function correctly, though at a lower level of performance.

With the Leyden prototype, we have changed this fall-back behavior to make it easier to diagnose
performance issues. For example, when the start-up time is not as good as one would expect, we
want to know whether it's caused by a misconfiguration that prevents the CDS archive
from being used, or by a deficiency in the implementation of the Leyden optimizations.

To revert to the behavior of the standard JDK, you can explicitly add `-Xshare:auto` to the command line.

```
$ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
    --memory=1024m \
    container-registry.oracle.com/java/openjdk \
    bash -c 'cd /test; /jdk/bin/java -Xshare:auto -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50'
[0.001s][error][cds] CDS archive has preloaded classes. It cannot be used because GC used during dump time (G1)
                     is not the same as runtime (Serial)
Generated source code for 51 classes and compiled them in 831 ms
```

See [here](https://docs.oracle.com/en/java/javase/21/vm/class-data-sharing.html) for a discussion of `-Xshare:on` vs `-Xshare:auto`.

## 5. Benchmarking

We use a small set of benchmarks to demonstrate the performance of the optimizations in the Leyden repo.

| Benchmark  | Source |
| ------------- | ------------- |
|[helidon-quickstart-se](test/hotspot/jtreg/premain/helidon-quickstart-se) | https://helidon.io/docs/v4/se/guides/quickstart|
|[micronaut-first-app](test/hotspot/jtreg/premain/micronaut-first-app) | https://guides.micronaut.io/latest/creating-your-first-micronaut-app-maven-java.html|
|[quarkus-getting-started](test/hotspot/jtreg/premain/quarkus-getting-started) | https://quarkus.io/guides/getting-started|
|[spring-boot-getting-started](test/hotspot/jtreg/premain/spring-boot-getting-started) | https://spring.io/guides/gs/spring-boot|
|[spring-petclinic](test/hotspot/jtreg/premain/spring-petclinic) | https://github.com/spring-projects/spring-petclinic|

*(FIXME: add a benchmark for javac)*

### Benchmarking Against JDK Main-line

To compare the performance of Leyden with that of the main-line JDK, you need:

- An official build of JDK 21
- An up-to-date build of the JDK main-line
- The latest Leyden build
- Maven (ideally 3.8 or later, as required by some of the demos). Note: if you are behind
  a firewall, you may need to [set up proxies for Maven](https://maven.apache.org/guides/mini/guide-proxies.html)

The same steps are used for benchmarking all of the above demos. For example:

```
$ cd helidon-quickstart-se
$ make PREMAIN_HOME=/repos/leyden/build/linux-x64/images/jdk \
       MAINLINE_HOME=/repos/jdk/build/linux-x64/images/jdk \
       BLDJDK_HOME=/usr/local/jdk21 \
       bench
run,mainline default,mainline custom static CDS,premain custom static CDS only,premain CDS + AOT
1,398,244,144,107
2,387,247,142,108
3,428,238,143,107
4,391,252,142,111
5,417,247,141,107
6,390,239,139,127
7,387,247,145,111
8,387,240,147,110
9,388,242,147,108
10,400,242,167,108
Geomean,397.08,243.76,145.52,110.26
Stdev,13.55,4.19,7.50,5.73
Markdown snippets in mainline_vs_premain.md
```

The above command runs each configuration 10 times, in interleaved order. This way
the noise of the system (background processes, thermal throttling, etc.) is more likely to
be spread evenly across the configurations.

As is typical for benchmarking start-up performance, the numbers are not very steady.
It is best to plot
the results (as saved in the file `mainline_vs_premain.csv`) in a spreadsheet to check for
noise and other artifacts.
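
The `Geomean` and `Stdev` rows can be recomputed from the per-run rows as a quick sanity check. The sketch below assumes that the CSV file has the same layout as the console output above (a header row, numbered run rows, then summary rows) and uses the population standard deviation, which reproduces the sample figures above. The helper name `column_stats` is hypothetical, and the actual computation in the benchmark makefiles may differ.

```shell
# column_stats COL
# Reads benchmark CSV on stdin and prints "geomean stdev" for column COL,
# using only the numbered run rows (header and summary rows are skipped).
column_stats() {
    awk -F, -v col="$1" '
        $1 ~ /^[0-9]+$/ {
            n++
            sum    += $col
            sumsq  += $col * $col
            logsum += log($col)
        }
        END {
            if (n == 0) exit 1
            mean = sum / n
            var  = sumsq / n - mean * mean     # population variance
            if (var < 0) var = 0               # guard against rounding error
            printf "%.2f %.2f\n", exp(logsum / n), sqrt(var)
        }'
}

# Example (assumes the CSV file produced by "make bench" exists):
# column_stats 5 < mainline_vs_premain.csv
```

For the sample output above, column 5 ("premain CDS + AOT") yields `110.26 5.73`.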

The "make bench" target also generates GitHub markdown snippets (in the file `mainline_vs_premain.md`) for creating the
graphs below.

### Benchmarking Between Two Leyden Builds

This is useful for Leyden developers to measure the benefits of a particular optimization.
The steps are similar to those above, but we use the "make compare_premain_builds" target:

```
$ cd helidon-quickstart-se
$ make PM_OLD=/repos/leyden_old/build/linux-x64/images/jdk \
       PM_NEW=/repos/leyden_new/build/linux-x64/images/jdk \
       BLDJDK_HOME=/usr/local/jdk21 \
       compare_premain_builds
Old build = /repos/leyden_old/build/linux-x64/images/jdk with options
New build = /repos/leyden_new/build/linux-x64/images/jdk with options
Run,Old CDS + AOT,New CDS + AOT
1,110,109
2,131,111
3,118,115
4,110,108
5,117,110
6,114,109
7,110,109
8,118,110
9,110,110
10,113,114
Geomean,114.94,110.48
Stdev,6.19,2.16
Markdown snippets in compare_premain_builds.md
```

Please see [test/hotspot/jtreg/premain/lib/Bench.gmk](test/hotspot/jtreg/premain/lib/Bench.gmk) for more details.

Note: due to the variability of start-up time, the benefit of minor improvements may
be difficult to measure.

### Preliminary Benchmark Results

The following charts show the relative start-up performance of the Leyden/Premain branch vs
the JDK main-line.

For example, a value of "premain CDS + AOT : 291" indicates that if the application takes
1000 ms to start up with the JDK main-line, it takes only 291 ms to start up when the
current set of Leyden optimizations for CDS and AOT is enabled.
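
The normalization and the improvement factors in the chart headings follow from simple arithmetic. The sketch below uses two hypothetical helpers, `normalize` and `improvement`, which are illustrations rather than part of the benchmark scripts.

```shell
# normalize BASELINE VALUE
# Scales VALUE so that BASELINE maps to 1000, rounded to an integer.
normalize() {
    awk -v b="$1" -v v="$2" 'BEGIN { printf "%.0f\n", v / b * 1000 }'
}

# improvement NORMALIZED
# Speedup factor relative to a baseline of 1000, to two decimals.
improvement() {
    awk -v n="$1" 'BEGIN { printf "%.2fx\n", 1000 / n }'
}

normalize 1000 291     # a 291 ms start-up against a 1000 ms baseline
improvement 291        # the corresponding improvement factor
```

With these inputs, the two calls print `291` and `3.44x`.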

The benchmark results are collected with `make bench` in the following directories:

- `helidon-quickstart-se`
- `micronaut-first-app`
- `quarkus-getting-started`
- `spring-petclinic`

These JDK versions were used in the comparisons:

- JDK main-line: https://github.com/openjdk/jdk/commit/70944ca54ad0090c734bb5b3082beb33450c4877
- Leyden: https://github.com/openjdk/leyden/commit/9fa972214934d30f67db5fd4d1b8007636ac1428

The benchmarks were executed on an 8-core Intel i7-10700 CPU @ 2.90GHz with 32GB RAM running Ubuntu 22.04.3 LTS.

### Helidon Quick Start (SE) Demo (3.44x improvement)

```mermaid
---
config:
    xyChart:
        chartOrientation: horizontal
        height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 632, 376, 291]
```

### Micronaut First App Demo (2.83x improvement)

```mermaid
---
config:
    xyChart:
        chartOrientation: horizontal
        height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 558, 410, 353]
```

### Quarkus Getting Started Demo (3.15x improvement)

```mermaid
---
config:
    xyChart:
        chartOrientation: horizontal
        height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 568, 395, 317]
```

### Spring-boot Getting Started Demo (3.53x improvement)

```mermaid
---
config:
    xyChart:
        chartOrientation: horizontal
        height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 560, 394, 283]
```

### Spring PetClinic Demo (2.72x improvement)

```mermaid
---
config:
    xyChart:
        chartOrientation: horizontal
        height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 695, 563, 368]
```

## 6. More Documentation

Please see [test/hotspot/jtreg/premain/](test/hotspot/jtreg/premain) for more information.