1 # Welcome to the JDK!
2
3 For build instructions please see the
4 [online documentation](https://openjdk.org/groups/build/doc/building.html),
5 or either of these files:
6
7 - [doc/building.html](doc/building.html) (html version)
8 - [doc/building.md](doc/building.md) (markdown version)
9
10 See <https://openjdk.org/> for more information about the OpenJDK
11 Community and the JDK and see <https://bugs.openjdk.org> for JDK issue
12 tracking.
|
1 # Welcome to the Leyden Prototype Repository!
2
3 The purpose of the Leyden repository is to prototype improvements to the
4 startup time, time to peak performance, and footprint of Java programs, as a part of
5 [Project Leyden](https://openjdk.org/projects/leyden). We solicit feedback from
6 the Java community, with the hope that some of these improvements can eventually be
7 incorporated into future JDK releases.
8
9 ## 0. Disclaimers
10
11 - *This repository contains experimental and unstable code. It is not intended to be used
12 in a production environment.*
13 - *This repository is intended for developers of the JDK, and advanced Java developers who
14 are familiar with building the JDK.*
15 - *The experimental features in this repository may be changed or removed without notice.
16 Command line flags and workflows will change.*
17 - *The benchmark results reported on this page are for illustrative purposes only. Your
18 applications may get better or worse results.*
19
20 ## 1. Overview
21
22 The Leyden "[premain](https://github.com/openjdk/leyden/blob/premain/)" prototype
23 includes many optimizations that shift work from run time to earlier
24 executions of the application, which are
25 called _training runs_. In a training run, we pre-compute various kinds of information.
26 Importantly, we pre-compile
27 bytecode to native code, guided by observations of the application's actual behavior
28 during the training run.
29
30 The Leyden repository closely tracks the JDK main line. We are typically only a few weeks behind
31 the [main-line JDK repo](https://github.com/openjdk/jdk).
32
33 We have implemented the following improvements:
34
35 - **[Ahead-of-Time Class Loading & Linking (JEP 483)](https://openjdk.org/jeps/483)**:
36 This gives
37 the JVM the ability to put classes in the _linked_ state as soon as the application starts up. As a result,
38 we can implement many other time-shifting optimizations with considerably simplified assumptions.
39 - Please refer to the [JEP 483 document](https://openjdk.org/jeps/483) for more details.
40
41 - **[Ahead-of-Time Method Profiling (JEP draft 8325147)](https://openjdk.org/jeps/8325147)**: We store method profiles
42 from training runs in the CDS archive, thereby enabling the JIT to begin compiling earlier during warmup.
43 As a result, Java applications can reach peak performance faster.
44 - This feature is enabled by the new diagnostic (`-XX:+UnlockDiagnosticVMOptions`) VM flags `-XX:+RecordTraining` and `-XX:+ReplayTraining`.
45
46 - **[Ahead-of-Time Code Compilation (JEP draft 8335368)](https://openjdk.org/jeps/8335368)**: Methods that are frequently used during the training run can be
47 compiled and stored along with the CDS archive. As a result, as soon as the application starts up
48 in the production run, its methods can be natively executed.
49 - This feature is enabled by the new VM flags `-XX:+StoreCachedCode`, `-XX:+LoadCachedCode`, and `-XX:CachedCodeFile`.
50 - Currently, the native code is stored in a separate file, but our plan is to eventually store the native code
51 inside the CDS archive file.
52
53 - **Ahead-of-time resolution of constant pool entries**: many
54 constant pool entries are resolved during the assembly phase. This allows the application to start up faster. Also,
55 the existence of resolved constant pool entries allows the AOT compiler to generate better code.
56 For diagnostic purposes, you can use `-XX:+UnlockDiagnosticVMOptions -XX:-AOTInvokeDynamicLinking`
57 to disable the AOT linking of constant pool entries for the `invokedynamic` bytecode.
58
59 - **Ahead-of-time generation of [Dynamic Proxies](https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/reflect/Proxy.html)**:
60 Dynamic proxies are frequently used by popular application frameworks. We can improve start-up time by generating these proxies ahead of time.
61 - This feature is enabled by the new VM flag `-XX:+ArchiveDynamicProxies`.
62
63 - **Ahead-of-time generation of reflection data**: Reflection data (such as instances of
64 `java.lang.reflect.Method`) are generated by the JVM to support `java.lang.reflect` operations. We can
65 generate these ahead of time to improve start-up.
66 - This feature is enabled by the new VM flag `-XX:+ArchiveReflectionData`.
67
68 - **Class Not Found Cache**: Sometimes application frameworks repeatedly try to load classes that do not exist. This optimization allows such failing lookups to be done quickly without repeatedly scanning the class path.
69 - This feature is enabled by the new VM flag `-XX:+ArchiveLoaderLookupCache`.
70
71 By default, all optimizations listed above are enabled. This simplifies testing of the whole
72 prototype. If necessary for more detailed testing, each feature can
73 be individually disabled by negating its associated flag.
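
As a small illustration of "negating its associated flag", the following shell helper rewrites an enabling flag into its disabling form. The helper name `negate_flag` is made up for this sketch and is not part of the repository.

```shell
# Turn an enabling flag such as -XX:+Name into its disabling form -XX:-Name.
# Anything that does not start with -XX:+ is printed unchanged.
# (negate_flag is a hypothetical helper name used only in this sketch.)
negate_flag() {
    printf '%s\n' "$1" | sed 's/^-XX:+/-XX:-/'
}
```

For example, `negate_flag -XX:+ArchiveDynamicProxies` prints `-XX:-ArchiveDynamicProxies`, which can then be added to the command line of the phase that the flag affects.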
74
75 The names of all of these VM flags will change in a future EA build as we transition from the old “CDS” terminology to the new “AOT” terminology, as discussed [here](https://openjdk.org/jeps/483#History).
76
77 [CDS]: <https://docs.oracle.com/en/java/javase/22/vm/class-data-sharing.html>
78
79 ## 2. Building the Leyden Repository
80
81 The Leyden Repository can be built in the same way as the main-line JDK repository.
82 Please use the "premain" branch, i.e., [https://github.com/openjdk/leyden/tree/premain](https://github.com/openjdk/leyden/tree/premain).
83
84 For build instructions please see the
85 [online documentation](https://openjdk.org/groups/build/doc/building.html),
86 or either of these files:
87
88 - [doc/building.html](doc/building.html) (html version)
89 - [doc/building.md](doc/building.md) (markdown version)
90
91 See <https://openjdk.org/> for more information about the OpenJDK
92 Community and the JDK and see <https://bugs.openjdk.org> for JDK issue
93 tracking.
94
95 ## 3. Trying out Leyden Features
96
97 The easiest way to try out the Leyden optimizations is to build a JVM from the Leyden repository, and use it with your application with the `-XX:AOTCache` flag.
98
99 > Note: in an earlier version of the Leyden prototype, the optimizations were controlled by an experimental flag `-XX:CacheDataStore`. This flag has been deprecated and will be removed. For a reference to this flag, please see an [older version of this document](https://github.com/openjdk/leyden/blob/076c71f7cb9887ef3d64b752976610d19792203b/README.md).
100
101
102 Here's a small benchmark that uses the JDK's built-in
103 [`JavaCompiler`](https://docs.oracle.com/en/java/javase/21/docs/api/java.compiler/javax/tools/JavaCompiler.html)
104 class to compile some Java source files. This benchmark spends a significant amount of start-up time
105 setting up the classes used by `JavaCompiler`, so it will benefit from the Leyden features.
106
107 First, download [JavacBenchApp.java](https://github.com/iklam/jdk/raw/f95f851aed3d2bf06edabab1e7c24e15f4145d0d/test/hotspot/jtreg/runtime/cds/appcds/applications/JavacBenchApp.java)
108 and compile it into a JAR file.
109
110 (Remember to use the `java` program that you built from the Leyden repository.)
111
112 ```
113 $ javac JavacBenchApp.java
114 $ jar cvf JavacBenchApp.jar JavacBenchApp*.class
115 added manifest
116 adding: JavacBenchApp$ClassFile.class(in = 1608) (out= 787)(deflated 51%)
117 adding: JavacBenchApp$FileManager.class(in = 2090) (out= 979)(deflated 53%)
118 adding: JavacBenchApp$SourceFile.class(in = 1351) (out= 671)(deflated 50%)
119 adding: JavacBenchApp.class(in = 7571) (out= 3302)(deflated 56%)
120 ```
121
122 We can run this benchmark without any Leyden features. It takes 893 ms:
123
124 ```
125 $ java -cp JavacBenchApp.jar JavacBenchApp 50
126 Generated source code for 51 classes and compiled them in 893 ms
127 ```
128
129 To use AOT optimizations for JavacBenchApp, we should first perform a _training run_ and
130 capture the profiling information into `JavacBenchApp.aotconfig`:
131
132 ```
133 $ java -XX:AOTMode=record -XX:AOTConfiguration=JavacBenchApp.aotconfig \
134 -cp JavacBenchApp.jar JavacBenchApp 50
135 $ ls -l JavacBenchApp.aotconfig
136 -rw-rw-r-- 1 iklam iklam 27652096 Mar 3 16:23 JavacBenchApp.aotconfig
137 ```
138
139 With the `JavacBenchApp.aotconfig` file, we can create the AOT cache. This is called the _assembly phase_:
140
141 ```
142 $ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
143 -cp JavacBenchApp.jar -XX:AOTCache=JavacBenchApp.aot
144 $ ls -l JavacBenchApp.aot
145 -r--r--r-- 1 iklam iklam 42332160 Mar 3 16:58 JavacBenchApp.aot
146 ```
147
148 Now, we can make a _production run_ of the program using the AOT cache `JavacBenchApp.aot`. It finishes in 423 ms, or more than twice as fast as
149 before.
150
151 ```
152 $ java -XX:AOTCache=JavacBenchApp.aot -cp JavacBenchApp.jar JavacBenchApp 50
153 Generated source code for 51 classes and compiled them in 423 ms
154 ```
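
The three steps above (training run, assembly phase, production run) differ only in a few flags, so they can be generated from one place. The following is a minimal sketch that only prints the command for a given phase. The function name `aot_command` and its parameters are assumptions made for this example; the flags are the ones shown above.

```shell
# Print the java command line for one phase of the workflow.
#   $1 = phase: record | create | run
#   $2 = base name used for the .aotconfig and .aot files
#   $3 = class path
#   remaining arguments = main class and application arguments
# (aot_command is a hypothetical helper, not something shipped with the JDK.)
aot_command() {
    phase=$1; name=$2; cp=$3
    shift 3
    case "$phase" in
        record) echo "java -XX:AOTMode=record -XX:AOTConfiguration=$name.aotconfig -cp $cp $*" ;;
        create) echo "java -XX:AOTMode=create -XX:AOTConfiguration=$name.aotconfig -cp $cp -XX:AOTCache=$name.aot" ;;
        run)    echo "java -XX:AOTCache=$name.aot -cp $cp $*" ;;
        *)      echo "unknown phase: $phase" >&2; return 1 ;;
    esac
}
```

For instance, `aot_command record JavacBenchApp JavacBenchApp.jar JavacBenchApp 50` prints the training-run command used above. Printing instead of executing keeps the sketch side-effect free; for simple arguments without spaces, the output can be piped to `sh` to run it.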
155
156 By default, training runs end when the application terminates. You have two other options to end training runs:
157
158 - `-XX:AOTEndTrainingOnMethodEntry=<method1,method2,...>[,count=100]`
159 - `jcmd <pid> AOT.end_training`
160
161 Note that `-XX:AOTEndTrainingOnMethodEntry` uses the same format as `-XX:CompileOnly` and the default count is 1.
162
163 See [EndTrainingOnMethodEntry.java](test/hotspot/jtreg/runtime/cds/appcds/leyden/EndTrainingOnMethodEntry.java) for a test case.
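
To make the option syntax above concrete, here is a small sketch that assembles the flag from a count and a list of method patterns. The helper name `end_training_flag` and the method pattern in the usage note are hypothetical; the test case linked above shows patterns that are known to work.

```shell
# Build -XX:AOTEndTrainingOnMethodEntry=<method1,method2,...>,count=<n>.
#   $1 = count; remaining arguments = one or more method patterns
# (end_training_flag is a hypothetical helper used only in this sketch.)
end_training_flag() {
    [ "$#" -ge 2 ] || { echo "usage: end_training_flag <count> <method>..." >&2; return 1; }
    count=$1
    shift
    methods=$(printf '%s,' "$@")    # joins the patterns as "m1,m2,"
    echo "-XX:AOTEndTrainingOnMethodEntry=${methods}count=$count"
}
```

For example, `end_training_flag 3 'com.example.Main::onReady'` prints `-XX:AOTEndTrainingOnMethodEntry=com.example.Main::onReady,count=3`, where `com.example.Main::onReady` stands in for a method of your own application.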
164
165 ### Diagnostic VM Flags
166
167 All of the optimizations described
168 in the [Overview](#1-overview) section above are enabled by default. This ensures that you can get all the optimizations
169 without specifying them individually.
170
171 For diagnostic purposes, you can selectively disable some of the options:
172
173 - The `-XX:+LoadCachedCode` and `-XX:+ReplayTraining` flags affect only the production run.
174 - The `-XX:+RecordTraining` option affects only the training run and the assembly phase.
175 - All other options affect only the assembly phase.
176
177 For example, you can disable the loading of AOT-compiled methods during the production run. Notice that the benchmark now
178 starts more slowly than it did when AOT-compiled methods were loaded.
179
180 ```
181 $ java -XX:AOTCache=JavacBenchApp.aot -Xlog:cds=error -XX:-LoadCachedCode \
182 -cp JavacBenchApp.jar JavacBenchApp 50
183 Generated source code for 51 classes and compiled them in 647 ms
184 ```
185
186 You can also disable AOT compilation in the assembly phase. Note that the size of the AOT
187 cache is smaller because it no longer has AOT-compiled methods.
188
189 ```
190 $ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
191 -cp JavacBenchApp.jar \
192 -XX:AOTCache=JavacBenchApp.aot -XX:-StoreCachedCode
193 $ ls -l JavacBenchApp.aot
194 -r--r--r-- 1 iklam iklam 29990912 Mar 3 16:34 JavacBenchApp.aot
195 ```
196
197
198 ## 4. Limitations of the Leyden Prototype
199
200 When trying out the Leyden prototype, please pay attention to the following limitations.
201
202 ### The Same Garbage Collector Must be Used between Assembly Phase and Production Runs
203
204 The CDS archive generated by the Leyden prototype includes machine instructions that are specific to
205 the garbage collector. We recommend that you explicitly specify the same collector during both
206 the assembly phase and production runs. For example:
207
208 ```
209 # assembly phase.
210 $ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
211 -cp JavacBenchApp.jar \
212 -XX:AOTCache=JavacBenchApp.aot -XX:+UseSerialGC
213
214 # production run
215 $ java -XX:AOTCache=JavacBenchApp.aot -XX:+UseSerialGC -cp JavacBenchApp.jar \
216 JavacBenchApp 50
217 ```
218
219 Otherwise, the CDS archive may not be useable for the production run, leading to suboptimal performance.
220 For example, sometimes you may perform the assembly phase run on a large development host, and then use
221 a container to run the application in a small production node. In the following scenario, as the collector
222 is not explicitly specified, the VM will automatically pick G1 for the assembly phase, and SerialGC for the
223 production run (due to its limited amount of memory):
224
225 ```
226 # Assembly phase (uses G1 by default)
227 $ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
228 -cp JavacBenchApp.jar -XX:AOTCache=JavacBenchApp.aot
229
230 # Production run (uses SerialGC)
231 $ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
232 --memory=1024m \
233 container-registry.oracle.com/java/openjdk \
234 bash -c 'cd /test; ' \
235 '/jdk/bin/java -XX:AOTCache=JavacBenchApp.aot ' \
236 ' -cp JavacBenchApp.jar JavacBenchApp 50'
237 [0.001s][error][cds] CDS archive has aot-linked classes. It cannot be used because
238 GC used during dump time (G1) is not the same as runtime (Serial)
239 [0.001s][error][cds] An error has occurred while processing the AOT cache.
240 [0.001s][error][cds] Unable to map shared spaces
241 Error occurred during initialization of VM
242 Unable to use AOT cache.
243 ```
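
One way to catch this kind of mismatch before deploying is to check, in your own build scripts, that the assembly-phase and production command lines name the same collector explicitly. The following is a minimal sketch; the function names `gc_flag_of` and `same_gc` are made up for this example, and the check only looks at flags of the form `-XX:+Use...GC` written on the command line.

```shell
# Print the first -XX:+Use...GC flag found in a command line, or "default" if none.
# (gc_flag_of and same_gc are hypothetical helpers used only in this sketch.)
gc_flag_of() {
    flag=$(printf '%s\n' "$1" | grep -o -e '-XX:+Use[A-Za-z0-9]*GC' | head -n 1)
    echo "${flag:-default}"
}

# Succeed only if both command lines explicitly select the same collector.
same_gc() {
    [ "$(gc_flag_of "$1")" != default ] && [ "$(gc_flag_of "$1")" = "$(gc_flag_of "$2")" ]
}
```

With the two commands from the first example in this section, `same_gc` succeeds because both contain `-XX:+UseSerialGC`. It fails for the container scenario, where neither command names a collector and the VM chooses one on its own.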
244
245 ### Only G1GC, SerialGC, ParallelGC, EpsilonGC, and ShenandoahGC are Supported
246
247 Currently, if you use any other garbage collector in combination with `-XX:AOTMode` or `-XX:AOTCache`, the VM will
248 exit with an error.
249
250 ```
251 $ java -XX:AOTMode=record -XX:AOTConfiguration=JavacBenchApp.aotconfig \
252 -cp JavacBenchApp.jar -XX:+UseZGC JavacBenchApp 50
253 Error occurred during initialization of VM
254 Cannot create the AOT configuration file: UseCompressedClassPointers must be enabled,
255 and collector must be G1, Parallel, Serial, Epsilon, or Shenandoah
256 ```
257
258 ### -XX:AOTMode=on is Enabled by default
259
260 As seen in the example immediately above, in the production run, if the CDS archive cannot be
261 used for any reason, the JVM will report an error and exit. This happens as if `-XX:AOTMode=on` had been
262 specified on the command line.
263
264 In the standard JDK, when the CDS archive cannot be used for any reason (for example, the
265 archive was created for a different version of the JDK), the application will
266 continue to run without using CDS.
267 This fall-back strategy ensures that the application will function correctly, though at a lower level of performance.
268
269 With the Leyden prototype, we have changed this fall-back behavior to make it easier to diagnose
270 performance issues. For example, when the start-up time is not as good as one would expect, we
271 want to know whether it's caused by a misconfiguration that prevents the CDS archive
272 from being used, or it's caused by a deficiency in the implementation of the Leyden optimizations.
273
274 To revert to the behavior of the standard JDK, you can explicitly add `-XX:AOTMode=auto` to the command line.
275
276 ```
277 $ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
278 --memory=1024m \
279 container-registry.oracle.com/java/openjdk \
280 bash -c 'cd /test; ' \
281 '/jdk/bin/java -XX:AOTMode=auto -XX:AOTCache=JavacBenchApp.aot ' \
282 ' -cp JavacBenchApp.jar JavacBenchApp 50'
283 [0.001s][error][cds] CDS archive has aot-linked classes. It cannot be used because
284 GC used during dump time (G1) is not the same as runtime (Serial)
285 Generated source code for 51 classes and compiled them in 831 ms
286 ```
287
288 See [JEP 483](https://openjdk.org/jeps/483) for a discussion of `-XX:AOTMode=on` vs `-XX:AOTMode=auto`.
289
290
291 ## 5. Benchmarking
292
293 We use a small set of benchmarks to demonstrate the performance of the optimizations in the Leyden repo.
294
295 | Benchmark | Source |
296 | ------------- | ------------- |
297 |[helidon-quickstart-se](test/hotspot/jtreg/premain/helidon-quickstart-se) | https://helidon.io/docs/v4/se/guides/quickstart|
298 |[micronaut-first-app](test/hotspot/jtreg/premain/micronaut-first-app) | https://guides.micronaut.io/latest/creating-your-first-micronaut-app-maven-java.html|
299 |[quarkus-getting-started](test/hotspot/jtreg/premain/quarkus-getting-started) | https://quarkus.io/guides/getting-started|
300 |[spring-boot-getting-started](test/hotspot/jtreg/premain/spring-boot-getting-started) | https://spring.io/guides/gs/spring-boot|
301 |[spring-petclinic](test/hotspot/jtreg/premain/spring-petclinic) | https://github.com/spring-projects/spring-petclinic|
302
303 *(FIXME: add a benchmark for javac)*
304
305 ### Benchmarking Against JDK Main-line
306
307 To compare the performance of Leyden vs the main-line JDK, you need:
308
309 - An official build of JDK 21
310 - An up-to-date build of the JDK main-line
311 - The latest Leyden build
312 - Maven (ideally 3.8 or later, as required by some of the demos). Note: if you are behind
313 a firewall, you may need to [set up proxies for Maven](https://maven.apache.org/guides/mini/guide-proxies.html)
314
315 The same steps are used for benchmarking all of the above demos. For example:
316
317 ```
318 $ cd helidon-quickstart-se
319 $ make PREMAIN_HOME=/repos/leyden/build/linux-x64/images/jdk \
320 MAINLINE_HOME=/repos/jdk/build/linux-x64/images/jdk \
321 BLDJDK_HOME=/usr/local/jdk21 \
322 bench
323 run,mainline default,mainline custom static cds,mainline aot cache,premain aot cache
324 1,456,229,156,117
325 2,453,227,157,117
326 3,455,232,155,116
327 4,448,230,154,114
328 5,440,228,156,114
329 6,446,228,156,114
330 7,448,232,156,114
331 8,465,261,159,114
332 9,448,226,157,113
333 10,442,233,154,114
334 Geomean,450.05,232.41,155.99,114.69
335 Stdev,6.98,9.72,1.41,1.35
336 Markdown snippets in mainline_vs_premain.md
337 ```
338
339 The above command runs each configuration 10 times, in interleaved order. This way
340 the noise of the system (background processes, thermal throttling, etc.) is more likely to
341 be spread across the different runs.
342
343 As is typical for benchmarking start-up performance, the numbers are not very steady.
344 It is best to plot
345 the results (as saved in the file `mainline_vs_premain.csv`) in a spreadsheet to check for
346 noise and other artifacts.
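
If a spreadsheet is not at hand, the per-configuration geometric mean can also be recomputed on the command line. The sketch below assumes that the CSV file has the same layout as the output shown above: a header row, then one row per run starting with the run number. The function name `csv_geomean` is made up for this example.

```shell
# Read CSV on stdin and print the geometric mean of each timing column (two decimals).
# Only rows whose first field is a run number are used, so the header row and
# any summary rows (Geomean, Stdev) are skipped.
# (csv_geomean is a hypothetical helper; the CSV layout is an assumption.)
csv_geomean() {
    awk -F, '
        $1 ~ /^[0-9]+$/ {
            n++
            for (i = 2; i <= NF; i++) sumlog[i] += log($i)
            cols = NF
        }
        END {
            if (n == 0) exit 1
            for (i = 2; i <= cols; i++)
                printf "%s%.2f", (i > 2 ? "," : ""), exp(sumlog[i] / n)
            print ""
        }'
}
```

A typical invocation would be `csv_geomean < mainline_vs_premain.csv`. The geometric mean is computed as the exponential of the average logarithm, which avoids overflow from multiplying many timings together.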
347
348 The "make bench" target also generates GitHub markdown snippets (in the file `mainline_vs_premain.md`) for creating the
349 graphs below.
350
351 ### Benchmarking Between Two Leyden Builds
352
353 This is useful for Leyden developers to measure the benefits of a particular optimization.
354 The steps are similar to above, but we use the "make compare_premain_builds" target:
355
356 ```
357 $ cd helidon-quickstart-se
358 $ make PM_OLD=/repos/leyden_old/build/linux-x64/images/jdk \
359 PM_NEW=/repos/leyden_new/build/linux-x64/images/jdk \
360 BLDJDK_HOME=/usr/local/jdk21 \
361 compare_premain_builds
362 Old build = /repos/leyden_old/build/linux-x64/images/jdk with options
363 New build = /repos/leyden_new/build/linux-x64/images/jdk with options
364 Run,Old CDS + AOT,New CDS + AOT
365 1,110,109
366 2,131,111
367 3,118,115
368 4,110,108
369 5,117,110
370 6,114,109
371 7,110,109
372 8,118,110
373 9,110,110
374 10,113,114
375 Geomean,114.94,110.48
376 Stdev,6.19,2.16
377 Markdown snippets in compare_premain_builds.md
378 ```
379
380 Please see [test/hotspot/jtreg/premain/lib/Bench.gmk](test/hotspot/jtreg/premain/lib/Bench.gmk) for more details.
381
382 Note: due to the variability of start-up time, the benefit of minor improvements may
383 be difficult to measure.
384
385 ### Preliminary Benchmark Results
386
387 The following charts show the relative start-up performance of the Leyden/Premain branch vs
388 the JDK main-line.
389
390 For example, a number of "premain aot cache: 255" indicates that if the application takes
391 1000 ms to start up with the JDK main-line, it takes only 255 ms to start up when the
392 current set of Leyden optimizations is enabled.
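
The normalization is simple arithmetic: each timing is divided by the "mainline default" timing and multiplied by 1000, and the improvement factor is the ratio of the first timing to the last. The sketch below reproduces this with the geomean values from the sample `make bench` output in "Benchmarking Against JDK Main-line". The function names are made up for this example.

```shell
# Scale a list of timings so that the first one becomes 1000.
# (normalize_to_1000 and speedup are hypothetical helpers used only in this sketch.)
normalize_to_1000() {
    echo "$@" | awk '{
        for (i = 1; i <= NF; i++)
            printf "%s%.0f", (i > 1 ? " " : ""), 1000 * $i / $1
        print ""
    }'
}

# Ratio of the first timing to the last one, with two decimals.
speedup() {
    echo "$@" | awk '{ printf "%.2f\n", $1 / $NF }'
}
```

With the sample geomeans, `normalize_to_1000 450.05 232.41 155.99 114.69` prints `1000 516 347 255`, and `speedup 450.05 114.69` prints `3.92`.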
393
394 The benchmark results are collected with `make bench` in the following directories:
395
396 - `helidon-quickstart-se`
397 - `micronaut-first-app`
398 - `quarkus-getting-started`
399 - `spring-boot-getting-started`
400 - `spring-petclinic`
401
402 The meaning of the four rows in the following charts:
403
404 | Row | Meaning |
405 | ------------- | ------------- |
406 | **mainline default** |Run benchmark with no optimizations|
407 | **mainline custom static cds** |Run benchmark with a custom static CDS archive|
408 | **mainline aot cache** |Run benchmark with a custom AOT cache (JEP 483)|
409 | **premain aot cache** |Run benchmark with a custom AOT cache, plus all Leyden optimizations such as AOT profiles and AOT-compiled methods|
410
411 These JDK versions were used in the comparisons:
412
413 - JDK main-line: JDK 24, build 24+36-3646
414 - Leyden: https://github.com/openjdk/leyden/tree/bbac8f2d845aa6408182ca3ff9ce60b5ca6e0390
415
416 For detailed information about the hardware and raw numbers, see [bench.20250307.txt](test/hotspot/jtreg/premain/bench_data/bench.20250307.txt).
417
418 ### Helidon Quick Start (SE) Demo (3.92x improvement)
419
420 ```mermaid
421 ---
422 config:
423 xyChart:
424 chartOrientation: horizontal
425 height: 300
426 ---
427 xychart-beta
428 x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
429 y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
430 bar [1000, 516, 347, 255]
431 ```
432
433 ### Micronaut First App Demo (3.12x improvement)
434
435 ```mermaid
436 ---
437 config:
438 xyChart:
439 chartOrientation: horizontal
440 height: 300
441 ---
442 xychart-beta
443 x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
444 y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
445 bar [1000, 475, 366, 321]
446 ```
447
448 ### Quarkus Getting Started Demo (3.52x improvement)
449
450 ```mermaid
451 ---
452 config:
453 xyChart:
454 chartOrientation: horizontal
455 height: 300
456 ---
457 xychart-beta
458 x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
459 y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
460 bar [1000, 437, 380, 284]
461 ```
462
463 ### Spring-boot Getting Started Demo (3.48x improvement)
464
465 ```mermaid
466 ---
467 config:
468 xyChart:
469 chartOrientation: horizontal
470 height: 300
471 ---
472 xychart-beta
473 x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
474 y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
475 bar [1000, 502, 382, 287]
476 ```
477
478 ### Spring PetClinic Demo (2.65x improvement)
479
480 ```mermaid
481 ---
482 config:
483 xyChart:
484 chartOrientation: horizontal
485 height: 300
486 ---
487 xychart-beta
488 x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
489 y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
490 bar [1000, 625, 586, 376]
491 ```
492
493 ## 6. More Documentation
494
495 Please see [test/hotspot/jtreg/premain/](test/hotspot/jtreg/premain) for more information.
|