# Welcome to the Leyden Prototype Repository!

The purpose of the Leyden repository is to prototype improvements to the
startup time, time to peak performance, and footprint of Java programs, as a part of
[Project Leyden](https://openjdk.org/projects/leyden). We solicit feedback from
the Java community, with the hope that some of these improvements can eventually be
incorporated in future JDK releases.

## 0. Disclaimers

- *This repository contains experimental and unstable code. It is not intended to be used
  in a production environment.*
- *This repository is intended for developers of the JDK, and advanced Java developers who
  are familiar with building the JDK.*
- *The experimental features in this repository may be changed or removed without notice.
  Command-line flags and workflows will change.*
- *The benchmark results reported on this page are for illustrative purposes only. Your
  applications may get better or worse results.*

## 1. Overview

The Leyden "[premain](https://github.com/openjdk/leyden/blob/premain/)" prototype
includes many optimizations that shift work from run time to earlier
executions of the application, which are called _training runs_. In a training run,
we pre-compute various kinds of information. Importantly, we pre-compile
bytecode to native code, guided by observations of the application's actual behavior
during the training run.

The Leyden repository closely tracks the JDK main line. We are typically only a few weeks behind
the [main-line JDK repo](https://github.com/openjdk/jdk).

We have implemented the following improvements:

- **[Ahead-of-Time Class Loading & Linking (JEP 483)](https://openjdk.org/jeps/483)**:
  This gives the JVM the ability to put classes in the _linked_ state as soon as the application starts up.
  As a result, we can implement many other time-shifting optimizations with considerably simplified assumptions.
  - Please refer to the [JEP 483 document](https://openjdk.org/jeps/483) for more details.

- **[Ahead-of-Time Method Profiling (JEP draft 8325147)](https://openjdk.org/jeps/8325147)**: We store method profiles
  from training runs in the CDS archive, thereby enabling the JIT to begin compiling earlier during warmup.
  As a result, Java applications can reach peak performance faster.
  - This feature is enabled by the new diagnostic (`-XX:+UnlockDiagnosticVMOptions`) VM flags `-XX:+RecordTraining` and `-XX:+ReplayTraining`.

- **[Ahead-of-Time Code Compilation (JEP draft 8335368)](https://openjdk.org/jeps/8335368)**: Methods that are frequently used during the training run can be
  compiled and stored along with the CDS archive. As a result, as soon as the application starts up
  in the production run, its methods can be natively executed.
  - This feature is enabled by the new VM flags `-XX:+StoreCachedCode`, `-XX:+LoadCachedCode`, and `-XX:CachedCodeFile`.
  - Currently, the native code is stored in a separate file, but our plan is to eventually store the native code
    inside the CDS archive file.

- **Ahead-of-time resolution of constant pool entries**: Many
  constant pool entries are resolved during the assembly phase. This allows the application to start up faster. Also,
  the existence of resolved constant pool entries allows the AOT compiler to generate better code.
  For diagnostic purposes, you can use `-XX:+UnlockDiagnosticVMOptions -XX:-AOTInvokeDynamicLinking`
  to disable the AOT linking of constant pool entries for the `invokedynamic` bytecode.

- **Ahead-of-time generation of [Dynamic Proxies](https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/reflect/Proxy.html)**:
  Dynamic proxies are frequently used by popular application frameworks. We can improve start-up time by generating these proxies ahead of time.
  - This feature is enabled by the new VM flag `-XX:+ArchiveDynamicProxies`.

- **Ahead-of-time generation of reflection data**: Reflection data (such as instances of
  `java.lang.reflect.Method`) are generated by the JVM to support `java.lang.reflect` operations. We can
  generate these ahead of time to improve start-up.
  - This feature is enabled by the new VM flag `-XX:+ArchiveReflectionData`.

- **Class Not Found Cache**: Sometimes application frameworks repeatedly try to load classes that do not exist. This optimization allows such failing lookups to be done quickly without repeatedly scanning the class path.
  - This feature is enabled by the new VM flag `-XX:+ArchiveLoaderLookupCache`.

By default, all optimizations listed above are enabled. This simplifies testing of the whole
prototype. If necessary for more detailed testing, each feature can
be individually disabled by negating its associated flag.

The names of all of these VM flags will change in a future EA build as we transition from the old “CDS” terminology to the new “AOT” terminology, as discussed [here](https://openjdk.org/jeps/483#History).

[CDS]: <https://docs.oracle.com/en/java/javase/22/vm/class-data-sharing.html>

## 2. Building the Leyden Repository

The Leyden repository can be built in the same way as the main-line JDK repository.
Please use the "premain" branch, i.e., [https://github.com/openjdk/leyden/tree/premain](https://github.com/openjdk/leyden/tree/premain).

For build instructions, please see the
[online documentation](https://openjdk.org/groups/build/doc/building.html)
or either of these files:

- [doc/building.html](doc/building.html) (HTML version)
- [doc/building.md](doc/building.md) (Markdown version)

See <https://openjdk.org/> for more information about the OpenJDK
Community and the JDK, and see <https://bugs.openjdk.org> for JDK issue
tracking.

## 3. Trying out Leyden Features

The easiest way to try out the Leyden optimizations is to build a JVM from the Leyden repository and use it with your application with the `-XX:AOTCache` flag.

> Note: in an earlier version of the Leyden prototype, the optimizations were controlled by an experimental flag `-XX:CacheDataStore`. This flag has been deprecated and will be removed. For a reference to this flag, please see an [older version of this document](https://github.com/openjdk/leyden/blob/076c71f7cb9887ef3d64b752976610d19792203b/README.md).

Here's a small benchmark that uses the JDK's built-in
[`JavaCompiler`](https://docs.oracle.com/en/java/javase/21/docs/api/java.compiler/javax/tools/JavaCompiler.html)
API to compile some Java source files. This benchmark spends a significant amount of start-up time
setting up the classes used by `JavaCompiler`, so it will benefit from the Leyden features.

First, download [JavacBenchApp.java](https://github.com/iklam/jdk/raw/f95f851aed3d2bf06edabab1e7c24e15f4145d0d/test/hotspot/jtreg/runtime/cds/appcds/applications/JavacBenchApp.java)
and compile it into a JAR file.

(Remember to use the `java` program that you built from the Leyden repository.)

```
$ javac JavacBenchApp.java
$ jar cvf JavacBenchApp.jar JavacBenchApp*.class
added manifest
adding: JavacBenchApp$ClassFile.class(in = 1608) (out= 787)(deflated 51%)
adding: JavacBenchApp$FileManager.class(in = 2090) (out= 979)(deflated 53%)
adding: JavacBenchApp$SourceFile.class(in = 1351) (out= 671)(deflated 50%)
adding: JavacBenchApp.class(in = 7571) (out= 3302)(deflated 56%)
```

We can run this benchmark without any Leyden features.
It takes 893 ms:

```
$ java -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 893 ms
```

To use AOT optimizations for JavacBenchApp, we should first perform a _training run_ and
capture the profiling information into `JavacBenchApp.aotconfig`:

```
$ java -XX:AOTMode=record -XX:AOTConfiguration=JavacBenchApp.aotconfig \
       -cp JavacBenchApp.jar JavacBenchApp 50
$ ls -l JavacBenchApp.aotconfig
-rw-rw-r-- 1 iklam iklam 27652096 Mar 3 16:23 JavacBenchApp.aotconfig
```

With the `JavacBenchApp.aotconfig` file, we can create the AOT cache. This is called the _assembly phase_:

```
$ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
       -cp JavacBenchApp.jar -XX:AOTCache=JavacBenchApp.aot
$ ls -l JavacBenchApp.aot
-r--r--r-- 1 iklam iklam 42332160 Mar 3 16:58 JavacBenchApp.aot
```

Now, we can make a _production run_ of the program using the AOT cache `JavacBenchApp.aot`. It finishes in 423 ms,
more than twice as fast as before.

```
$ java -XX:AOTCache=JavacBenchApp.aot -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 423 ms
```

By default, training runs end when the application terminates. You have two other options to end training runs:

- `-XX:AOTEndTrainingOnMethodEntry=<method1,method2,...>[,count=100]`
- `jcmd <pid> AOT.end_training`

Note that `-XX:AOTEndTrainingOnMethodEntry` uses the same format as `-XX:CompileOnly`, and the default count is 1.

See [EndTrainingOnMethodEntry.java](test/hotspot/jtreg/runtime/cds/appcds/leyden/EndTrainingOnMethodEntry.java) for a test case.

### Diagnostic VM Flags

All of the optimizations described
in the [Overview](#1-overview) section above are enabled by default.
This ensures that you can get all the optimizations
without specifying them individually.

For diagnostic purposes, you can selectively disable some of the options:

- The `-XX:+LoadCachedCode` and `-XX:+ReplayTraining` flags affect only the production run.
- The `-XX:+RecordTraining` option affects only the training run and the assembly phase.
- All other options affect only the assembly phase.

For example, you can disable the loading of AOT-compiled methods during the production run. Notice that the benchmark now
starts more slowly (647 ms) than it did when AOT-compiled methods were loaded (423 ms).

```
$ java -XX:AOTCache=JavacBenchApp.aot -Xlog:cds=error -XX:-LoadCachedCode \
       -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 647 ms
```

You can also disable AOT compilation in the assembly phase. Note that the AOT
cache is smaller because it no longer contains AOT-compiled methods.

```
$ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
       -cp JavacBenchApp.jar \
       -XX:AOTCache=JavacBenchApp.aot -XX:-StoreCachedCode
$ ls -l JavacBenchApp.aot
-r--r--r-- 1 iklam iklam 29990912 Mar 3 16:34 JavacBenchApp.aot
```


## 4. Limitations of the Leyden Prototype

When trying out Leyden, please pay attention to the following limitations.

### The Same Garbage Collector Must be Used between Assembly Phase and Production Runs

The CDS archive generated by the Leyden prototype includes machine instructions that are specific to
the garbage collector. We recommend that you explicitly specify the same collector during both
the assembly phase and production runs. For example:

```
# assembly phase
$ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
       -cp JavacBenchApp.jar \
       -XX:AOTCache=JavacBenchApp.aot -XX:+UseSerialGC

# production run
$ java -XX:AOTCache=JavacBenchApp.aot -XX:+UseSerialGC -cp JavacBenchApp.jar \
       JavacBenchApp 50
```

Otherwise, the CDS archive may not be usable for the production run, leading to a start-up error or suboptimal performance.
For example, you may perform the assembly phase on a large development host, and then use
a container to run the application on a small production node. In the following scenario, as the collector
is not explicitly specified, the VM will automatically pick G1 for the assembly phase, and SerialGC for the
production run (due to the container's limited amount of memory):

```
# Assembly phase (uses G1 by default)
$ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
       -cp JavacBenchApp.jar -XX:AOTCache=JavacBenchApp.aot

# Production run (uses SerialGC)
$ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
    --memory=1024m \
    container-registry.oracle.com/java/openjdk \
    bash -c 'cd /test; \
             /jdk/bin/java -XX:AOTCache=JavacBenchApp.aot \
             -cp JavacBenchApp.jar JavacBenchApp 50'
[0.001s][error][cds] CDS archive has aot-linked classes. It cannot be used because
GC used during dump time (G1) is not the same as runtime (Serial)
[0.001s][error][cds] An error has occurred while processing the AOT cache.
[0.001s][error][cds] Unable to map shared spaces
Error occurred during initialization of VM
Unable to use AOT cache.
```

### Only G1GC, SerialGC, ParallelGC, EpsilonGC, ShenandoahGC are Supported

Currently, if you use any other garbage collector in combination with `-XX:AOTMode` or `-XX:AOTCache`, the VM will
exit with an error.
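When no collector is specified, the JVM picks one ergonomically, so the same command line can get G1 on a development host and SerialGC in a memory-limited container, as in the container example above. One way to check which collector a given environment actually selects (and whether it is one of the supported collectors) is a tiny program like the following. This is a minimal sketch under stated assumptions: the class name `WhichGC` is hypothetical and not part of the Leyden repository, it uses only the standard `java.lang.management` API, and the bean names it prints are implementation-specific.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Leyden): prints the garbage collector(s)
// that the current JVM selected, so that the environment used for the assembly
// phase can be compared with the one used for the production run.
public class WhichGC {
    // Names of the GC MXBeans registered by the running JVM. The names are
    // implementation-specific; for example, G1 registers a bean named
    // "G1 Young Generation", while SerialGC registers "Copy" and "MarkSweepCompact".
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("GC MXBeans: " + collectorNames());
    }
}
```

For example, run it with `java WhichGC.java` (source-file mode) using the same flags and environment as the assembly phase, then again inside the production container, and compare the printed names before creating the AOT cache.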
Here is what happens when a training run uses ZGC:

```
$ java -XX:AOTMode=record -XX:AOTConfiguration=JavacBenchApp.aotconfig \
       -cp JavacBenchApp.jar -XX:+UseZGC JavacBenchApp 50
Error occurred during initialization of VM
Cannot create the AOT configuration file: UseCompressedClassPointers must be enabled,
and collector must be G1, Parallel, Serial, Epsilon, or Shenandoah
```

### -XX:AOTMode=on is Enabled by default

As seen in the container example above, in the production run, if the CDS archive cannot be
used for any reason, the JVM will report an error and exit. This happens as if `-XX:AOTMode=on` were
specified on the command line.

In the standard JDK, when the CDS archive cannot be used for any reason (for example, the
archive was created for a different version of the JDK), the application will
continue to run without using CDS.
This fall-back strategy ensures that the application will function correctly, though at a lower level of performance.

With the Leyden prototype, we have changed this fall-back behavior to make it easier to diagnose
performance issues. For example, when the start-up time is not as good as one would expect, we
want to know whether it's caused by a misconfiguration that prevents the CDS archive
from being used, or by a deficiency in the implementation of the Leyden optimizations.

To revert to the behavior of the standard JDK, you can explicitly add `-XX:AOTMode=auto` to the command line.

```
$ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
    --memory=1024m \
    container-registry.oracle.com/java/openjdk \
    bash -c 'cd /test; \
             /jdk/bin/java -XX:AOTMode=auto -XX:AOTCache=JavacBenchApp.aot \
             -cp JavacBenchApp.jar JavacBenchApp 50'
[0.001s][error][cds] CDS archive has aot-linked classes. It cannot be used because
GC used during dump time (G1) is not the same as runtime (Serial)
Generated source code for 51 classes and compiled them in 831 ms
```

See [JEP 483](https://openjdk.org/jeps/483) for a discussion of `-XX:AOTMode=on` vs `-XX:AOTMode=auto`.


## 5. Benchmarking

We use a small set of benchmarks to demonstrate the performance of the optimizations in the Leyden repo.

| Benchmark | Source |
| ------------- | ------------- |
| [helidon-quickstart-se](test/hotspot/jtreg/premain/helidon-quickstart-se) | https://helidon.io/docs/v4/se/guides/quickstart |
| [micronaut-first-app](test/hotspot/jtreg/premain/micronaut-first-app) | https://guides.micronaut.io/latest/creating-your-first-micronaut-app-maven-java.html |
| [quarkus-getting-started](test/hotspot/jtreg/premain/quarkus-getting-started) | https://quarkus.io/guides/getting-started |
| [spring-boot-getting-started](test/hotspot/jtreg/premain/spring-boot-getting-started) | https://spring.io/guides/gs/spring-boot |
| [spring-petclinic](test/hotspot/jtreg/premain/spring-petclinic) | https://github.com/spring-projects/spring-petclinic |

*(FIXME: add a benchmark for javac)*

### Benchmarking Against JDK Main-line

To compare the performance of Leyden vs the main-line JDK, you need:

- An official build of JDK 21
- An up-to-date build of the JDK main-line
- The latest Leyden build
- Maven (ideally 3.8 or later, as required by some of the demos). Note: if you are behind
  a firewall, you may need to [set up proxies for Maven](https://maven.apache.org/guides/mini/guide-proxies.html)

The same steps are used for benchmarking all of the above demos.
For example:

```
$ cd helidon-quickstart-se
$ make PREMAIN_HOME=/repos/leyden/build/linux-x64/images/jdk \
       MAINLINE_HOME=/repos/jdk/build/linux-x64/images/jdk \
       BLDJDK_HOME=/usr/local/jdk21 \
       bench
run,mainline default,mainline custom static cds,mainline aot cache,premain aot cache
1,456,229,156,117
2,453,227,157,117
3,455,232,155,116
4,448,230,154,114
5,440,228,156,114
6,446,228,156,114
7,448,232,156,114
8,465,261,159,114
9,448,226,157,113
10,442,233,154,114
Geomean,450.05,232.41,155.99,114.69
Stdev,6.98,9.72,1.41,1.35
Markdown snippets in mainline_vs_premain.md
```

The above command runs each configuration 10 times, in an interleaved order. This way
the noise of the system (background processes, thermal throttling, etc.) is more likely to
be spread across the different runs.

As is typical for benchmarking start-up performance, the numbers are not very steady.
It is best to plot
the results (as saved in the file `mainline_vs_premain.csv`) in a spreadsheet to check for
noise and other artifacts.

The "make bench" target also generates GitHub markdown snippets (in the file `mainline_vs_premain.md`) for creating the
graphs below.

### Benchmarking Between Two Leyden Builds

This is useful for Leyden developers to measure the benefits of a particular optimization.
The steps are similar to those above, but we use the "make compare_premain_builds" target:

```
$ cd helidon-quickstart-se
$ make PM_OLD=/repos/leyden_old/build/linux-x64/images/jdk \
       PM_NEW=/repos/leyden_new/build/linux-x64/images/jdk \
       BLDJDK_HOME=/usr/local/jdk21 \
       compare_premain_builds
Old build = /repos/leyden_old/build/linux-x64/images/jdk with options
New build = /repos/leyden_new/build/linux-x64/images/jdk with options
Run,Old CDS + AOT,New CDS + AOT
1,110,109
2,131,111
3,118,115
4,110,108
5,117,110
6,114,109
7,110,109
8,118,110
9,110,110
10,113,114
Geomean,114.94,110.48
Stdev,6.19,2.16
Markdown snippets in compare_premain_builds.md
```

Please see [test/hotspot/jtreg/premain/lib/Bench.gmk](test/hotspot/jtreg/premain/lib/Bench.gmk) for more details.

Note: due to the variability of start-up time, the benefit of minor improvements may
be difficult to measure.

### Preliminary Benchmark Results

The following charts show the relative start-up performance of the Leyden/Premain branch vs
the JDK main-line.

For example, a value of "premain aot cache: 255" indicates that if the application takes
1000 ms to start up with the JDK main-line, it takes only 255 ms to start up when all of the
current Leyden optimizations are enabled.
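The normalization can be reproduced by hand from the per-run times printed by `make bench`. The sketch below assumes that each chart value is the geometric mean of a variant's runs divided by the geometric mean of the "mainline default" runs, scaled to 1000; `Bench.gmk` is the authority on the actual computation, and the class name `NormalizeBench` is hypothetical. It uses the helidon-quickstart-se sample output shown earlier, not the data behind the charts.

```java
import java.util.Locale;

// Hypothetical helper (not part of Bench.gmk). Assumption: a chart value is
// geomean(variant) / geomean(mainline default) * 1000, so smaller is better.
public class NormalizeBench {
    // Geometric mean: exp of the arithmetic mean of the natural logarithms.
    static double geomean(double[] timesMs) {
        double sumLog = 0.0;
        for (double t : timesMs) {
            sumLog += Math.log(t);
        }
        return Math.exp(sumLog / timesMs.length);
    }

    // Scale a time so that the baseline maps to 1000.
    static double normalize(double timeMs, double baselineMs) {
        return timeMs / baselineMs * 1000.0;
    }

    public static void main(String[] args) {
        // Per-run start-up times (ms) from the helidon-quickstart-se "make bench" sample output.
        double[] mainlineDefault = {456, 453, 455, 448, 440, 446, 448, 465, 448, 442};
        double[] premainAotCache = {117, 117, 116, 114, 114, 114, 114, 114, 113, 114};

        double base = geomean(mainlineDefault);
        double premain = geomean(premainAotCache);
        System.out.printf(Locale.ROOT, "geomean: mainline default %.2f ms, premain aot cache %.2f ms%n", base, premain);
        System.out.printf(Locale.ROOT, "normalized premain aot cache: %.0f%n", normalize(premain, base));
        System.out.printf(Locale.ROOT, "improvement: %.2fx%n", base / premain);
    }
}
```

Running it with `java NormalizeBench.java` prints geomeans of 450.05 ms and 114.69 ms (the same as the Geomean row of the sample output), a normalized value of 255, and an improvement factor of 3.92x.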
The benchmark results are collected with `make bench` in the following directories:

- `helidon-quickstart-se`
- `micronaut-first-app`
- `quarkus-getting-started`
- `spring-boot-getting-started`
- `spring-petclinic`

The meaning of the four rows in the following charts:

| Row | Meaning |
| ------------- | ------------- |
| **mainline default** | Run benchmark with no optimizations |
| **mainline custom static cds** | Run benchmark with a custom static CDS archive |
| **mainline aot cache** | Run benchmark with a custom AOT cache (JEP 483) |
| **premain aot cache** | Run benchmark with a custom AOT cache, plus all Leyden optimizations such as AOT profiles and AOT-compiled methods |

These JDK versions were used in the comparisons:

- JDK main-line: JDK 24, build 24+36-3646
- Leyden: https://github.com/openjdk/leyden/tree/bbac8f2d845aa6408182ca3ff9ce60b5ca6e0390

For detailed information about the hardware and raw numbers, see [bench.20250307.txt](test/hotspot/jtreg/premain/bench_data/bench.20250307.txt).

### Helidon Quick Start (SE) Demo (3.92x improvement)

```mermaid
---
config:
  xyChart:
    chartOrientation: horizontal
    height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 516, 347, 255]
```

### Micronaut First App Demo (3.12x improvement)

```mermaid
---
config:
  xyChart:
    chartOrientation: horizontal
    height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 475, 366, 321]
```

### Quarkus Getting Started Demo (3.52x improvement)

```mermaid
---
config:
  xyChart:
    chartOrientation: horizontal
    height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 437, 380, 284]
```

### Spring-boot Getting Started Demo (3.48x improvement)

```mermaid
---
config:
  xyChart:
    chartOrientation: horizontal
    height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 502, 382, 287]
```

### Spring PetClinic Demo (2.65x improvement)

```mermaid
---
config:
  xyChart:
    chartOrientation: horizontal
    height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 625, 586, 376]
```

## 6. More Documentation

Please see [test/hotspot/jtreg/premain/](test/hotspot/jtreg/premain) for more information.