# Welcome to the Leyden Prototype Repository!

The purpose of the Leyden repository is to prototype improvements to the
startup time, time to peak performance, and footprint of Java programs, as a part of
[Project Leyden](https://openjdk.org/projects/leyden). We solicit feedback from
the Java community, with the hope that some of these improvements can eventually be
incorporated in future JDK releases.

## 0. Disclaimers

- *This repository contains experimental and unstable code. It is not intended to be used
  in a production environment.*
- *This repository is intended for developers of the JDK, and advanced Java developers who
  are familiar with building the JDK.*
- *The experimental features in this repository may be changed or removed without notice.
  Command line flags and workflows will change.*
- *The benchmark results reported on this page are for illustrative purposes only. Your
  applications may get better or worse results.*

## 1. Overview

The Leyden "[premain](https://github.com/openjdk/leyden/blob/premain/)" prototype
includes many optimizations that shift work from run time to earlier
executions of the application, which are
called _training runs_. In a training run, we pre-compute various kinds of information.
Importantly, we pre-compile
bytecode to native code, guided by observations of the application's actual behavior
during the training run.

The Leyden repository closely tracks the JDK main line. We are typically only a few weeks behind
the [main-line JDK repo](https://github.com/openjdk/jdk).

We have implemented the following improvements over the JDK main line:

- **[Ahead-of-Time Class Loading & Linking (JEP 483)](https://openjdk.org/jeps/483)**:
  This gives
  the JVM the ability to put classes in the _linked_ state as soon as the application starts up.
  As a result,
  we can implement many other time-shifting optimizations with considerably simplified assumptions.
  - This feature is accessed with the new VM flag `-XX:+PreloadSharedClasses`.

- **[Unified Ahead-of-Time Cache (JEP draft 8320264)](https://openjdk.org/jeps/8320264)**:
  This enhancement to [CDS] is foundational to the features that follow.
  - It enables [CDS] to store not only class metadata and heap objects (as before),
    but also profiling data and compiled code.
  - This feature is accessed with the new VM flag `-XX:CacheDataStore`.
  - This option simplifies the creation of the CDS archive, and also the testing
    of all the prototype features listed here.

- **[Ahead-of-Time Method Profiling (JEP draft 8325147)](https://openjdk.org/jeps/8325147)**: We store method profiles
  from training runs in the CDS archive, thereby enabling the JIT to begin compiling earlier during warmup.
  As a result, Java applications can reach peak performance faster.
  - This feature is enabled by the new VM flags `-XX:+RecordTraining` and `-XX:+ReplayTraining`.

- **Ahead-of-time resolution of constant pool entries**: Many
  constant pool entries are resolved during the assembly phase. This allows the application to start up faster. Also,
  the existence of resolved constant pool entries allows the AOT compiler to generate better code.
  For diagnostic purposes, you can use `-XX:+UnlockDiagnosticVMOptions -XX:-AOTInvokeDynamicLinking`
  to disable the AOT linking of constant pool entries for the `invokedynamic` bytecode.

- **[Ahead-of-Time Code Compilation (JEP draft 8335368)](https://openjdk.org/jeps/8335368)**: Methods that are frequently used during the training run can be
  compiled and stored along with the CDS archive. As a result, as soon as the application starts up
  in the production run, its methods can be natively executed.
  - This feature is enabled by the new VM flags `-XX:+StoreCachedCode`, `-XX:+LoadCachedCode`, and `-XX:CachedCodeFile`.
  - Currently, the native code is stored in a separate file, but our plan is to eventually store the native code
    inside the CDS archive file.

- **Ahead-of-time generation of [Dynamic Proxies](https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/reflect/Proxy.html)**:
  Dynamic proxies are frequently used by popular application frameworks. We can improve start-up time by generating these proxies ahead of time.
  - This feature is enabled by the new VM flag `-XX:+ArchiveDynamicProxies`.

- **Ahead-of-time generation of reflection data**: Reflection data (such as instances of
  `java.lang.reflect.Method`) are generated by the JVM to support `java.lang.reflect` operations. We can
  generate these ahead of time to improve start-up.
  - This feature is enabled by the new VM flag `-XX:+ArchiveReflectionData`.

- **Class Not Found Cache**: Sometimes application frameworks repeatedly try to load classes that do not exist. This optimization allows such failing lookups to be done quickly without repeatedly scanning the class path.
  - This feature is enabled by the new VM flag `-XX:+ArchiveLoaderLookupCache`.

The flag `-XX:CacheDataStore` automatically enables the whole bundle
of features listed above. This simplifies testing of the whole
prototype. If necessary for more detailed testing, each feature can
be individually disabled by negating its associated flag.

The names of all of these VM flags will change in a future EA build as we transition from the old “CDS” terminology to the new “AOT” terminology, as discussed in the [History section of JEP 483](https://openjdk.org/jeps/483#History).

[CDS]: <https://docs.oracle.com/en/java/javase/22/vm/class-data-sharing.html>

## 2. Building the Leyden Repository

The Leyden Repository can be built in the same way as the main-line JDK repository.
Please use the "premain" branch, i.e., [https://github.com/openjdk/leyden/tree/premain](https://github.com/openjdk/leyden/tree/premain).

For build instructions please see the
[online documentation](https://openjdk.org/groups/build/doc/building.html),
or either of these files:

- [doc/building.html](doc/building.html) (html version)
- [doc/building.md](doc/building.md) (markdown version)

See <https://openjdk.org/> for more information about the OpenJDK
Community and the JDK and see <https://bugs.openjdk.org> for JDK issue
tracking.

## 3. Trying out Leyden Features

The easiest way to try out the Leyden features is to build a JVM from the Leyden repository, and run your application on it with the `-XX:CacheDataStore` flag.

Here's a small benchmark that uses the JDK's built-in
[`JavaCompiler`](https://docs.oracle.com/en/java/javase/21/docs/api/java.compiler/javax/tools/JavaCompiler.html)
class to compile some Java source files. This benchmark spends a significant amount of start-up time
setting up the classes used by `JavaCompiler`, so it will benefit from the Leyden features.

First, download [JavacBenchApp.java](https://github.com/iklam/jdk/blob/f95f851aed3d2bf06edabab1e7c24e15f4145d0d/test/hotspot/jtreg/runtime/cds/appcds/applications/JavacBenchApp.java)
and compile it into a JAR file.

(Remember to use the `java` program that you built from the Leyden repository.)

```
$ javac JavacBenchApp.java
$ jar cvf JavacBenchApp.jar JavacBenchApp*.class
added manifest
adding: JavacBenchApp$ClassFile.class(in = 1608) (out= 787)(deflated 51%)
adding: JavacBenchApp$FileManager.class(in = 2090) (out= 979)(deflated 53%)
adding: JavacBenchApp$SourceFile.class(in = 1351) (out= 671)(deflated 50%)
adding: JavacBenchApp.class(in = 7571) (out= 3302)(deflated 56%)
```

We can run this benchmark without any Leyden features.
It takes 893 ms:

```
$ java -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 893 ms
```

Now, we can perform a _training run_ and create the Leyden cache files.

<b>Note: Any `JavacBenchApp.cds*` files created by previous tests must
be deleted before new ones are created:</b>

```
$ rm -fv JavacBenchApp.cds*
$ java -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50
$ ls -l JavacBenchApp.cds*
-r--r--r-- 1 iklam iklam 30900224 May 20 19:21 JavacBenchApp.cds
-r--r--r-- 1 iklam iklam 16895736 May 20 19:21 JavacBenchApp.cds.code
```

Two files are created:

- `JavacBenchApp.cds`: This file contains classes, heap objects and profiling data harvested from the training run.
- `JavacBenchApp.cds.code`: This file contains AOT-compiled methods, optimized for the execution behaviors observed during the training run.
  (Data in this file will be merged into `JavacBenchApp.cds` in a future release.)

Now, we can make a _production run_ of the program with the cache files. It finishes in 423 ms, or more than twice as fast as
before.

```
$ java -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 423 ms
```

### Optional VM Flags

When you create the file `JavacBenchApp.cds` with the flag `-XX:CacheDataStore`,
all of the other options described
in the [Overview](#1-overview) section above are enabled by default. This ensures that you can get all the optimizations
without specifying them individually.

For diagnostic purposes, you can selectively disable some of the options:

- The `-XX:+LoadCachedCode` and `-XX:+ReplayTraining` flags affect only the production run.
- All other options affect only the training run.

For example, you can disable the loading of the AOT code during the production run. Notice that the benchmark now
starts more slowly than it did when AOT code was loaded.

```
$ java -XX:CacheDataStore=JavacBenchApp.cds -XX:-LoadCachedCode -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 647 ms
```

You can also disable AOT compilation in the training run:

```
$ rm -fv JavacBenchApp.cds*
$ java -XX:CacheDataStore=JavacBenchApp.cds -XX:-StoreCachedCode -cp JavacBenchApp.jar JavacBenchApp 50
$ ls -l JavacBenchApp.cds*
-r--r--r-- 1 iklam iklam 30277632 May 20 20:05 JavacBenchApp.cds
```

Note that the file `JavacBenchApp.cds.code` is no longer created.

## 4. Limitations of the Leyden Prototype

When trying out Leyden, please pay attention to the following limitations.

### The Same Garbage Collector Must be Used between Training and Production Runs

The CDS archive generated by the Leyden prototype includes machine instructions that are specific to
the garbage collector. We recommend that you explicitly specify the same collector during both
training and production runs. For example:

```
# training run
$ rm -fv JavacBenchApp.cds*
$ java -XX:CacheDataStore=JavacBenchApp.cds -XX:+UseSerialGC -cp JavacBenchApp.jar JavacBenchApp 50

# production run
$ java -XX:CacheDataStore=JavacBenchApp.cds -XX:+UseSerialGC -cp JavacBenchApp.jar JavacBenchApp 50
```

Otherwise, the CDS archive may not be loaded for the production run, leading to suboptimal performance.
For example, sometimes you may perform the training run on a large development host, and then use
a container to run the application in a small production node.
In the following scenario, as the collector
is not explicitly specified, the VM will automatically pick G1 for the training run, and SerialGC for the
production run (because the container has a limited amount of memory):

```
# training run (uses G1 by default)
$ rm -fv JavacBenchApp.cds*
$ java -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50

# production run (uses SerialGC)
$ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
    --memory=1024m \
    container-registry.oracle.com/java/openjdk \
    bash -c 'cd /test; /jdk/bin/java -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50'
[0.001s][error][cds] CDS archive has preloaded classes. It cannot be used because GC used during dump time (G1)
                     is not the same as runtime (Serial)
[0.001s][error][cds] An error has occurred while processing the shared archive file.
[0.001s][error][cds] Unable to map shared spaces
Error occurred during initialization of VM
Unable to use shared archive.
```

### Only G1GC, SerialGC, ParallelGC, EpsilonGC are Supported

Currently, if you use any other garbage collector in combination with `-XX:CacheDataStore`, the VM will
exit with an error.

```
$ java -XX:+UseZGC -XX:CacheDataStore=foo --version
Error occurred during initialization of VM
Cannot create the CacheDataStore: UseCompressedClassPointers must be enabled, and collector
must be G1, Parallel, Serial, or Epsilon
```


### -Xshare:on is Enabled by Default

As seen in the container example above, in the production run, if the CDS archive cannot be
used for any reason, the JVM will report an error and exit. This happens as if `-Xshare:on` were
specified on the command line.

In the standard JDK, when the CDS archive cannot be used for any reason (for example, the
archive was created for a different version of the JDK), the application will
continue to run without using CDS.
This fall-back strategy ensures that the application will function correctly, though at a lower level of performance.

With the Leyden prototype, we have changed this fall-back behavior to make it easier to diagnose
performance issues. For example, when the start-up time is not as good as one would expect, we
want to know whether it's caused by a misconfiguration that prevents the CDS archive
from being used, or by a deficiency in the implementation of the Leyden optimizations.

To revert to the behavior of the standard JDK, you can explicitly add `-Xshare:auto` to the command line.

```
$ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
    --memory=1024m \
    container-registry.oracle.com/java/openjdk \
    bash -c 'cd /test; /jdk/bin/java -Xshare:auto -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50'
[0.001s][error][cds] CDS archive has preloaded classes. It cannot be used because GC used during dump time (G1)
                     is not the same as runtime (Serial)
Generated source code for 51 classes and compiled them in 831 ms
```

See the [Class Data Sharing documentation](https://docs.oracle.com/en/java/javase/21/vm/class-data-sharing.html) for a discussion of `-Xshare:on` vs `-Xshare:auto`.


## 5. Benchmarking

We use a small set of benchmarks to demonstrate the performance of the optimizations in the Leyden repo.

| Benchmark     | Source        |
| ------------- | ------------- |
|[helidon-quickstart-se](test/hotspot/jtreg/premain/helidon-quickstart-se) | https://helidon.io/docs/v4/se/guides/quickstart|
|[micronaut-first-app](test/hotspot/jtreg/premain/micronaut-first-app) | https://guides.micronaut.io/latest/creating-your-first-micronaut-app-maven-java.html|
|[quarkus-getting-started](test/hotspot/jtreg/premain/quarkus-getting-started) | https://quarkus.io/guides/getting-started|
|[spring-boot-getting-started](test/hotspot/jtreg/premain/spring-boot-getting-started) | https://spring.io/guides/gs/spring-boot|
|[spring-petclinic](test/hotspot/jtreg/premain/spring-petclinic) | https://github.com/spring-projects/spring-petclinic|

*(FIXME: add a benchmark for javac)*

### Benchmarking Against JDK Main-line

To compare the performance of Leyden vs the main-line JDK, you need:

- An official build of JDK 21
- An up-to-date build of the JDK main-line
- The latest Leyden build
- Maven (ideally 3.8 or later, as required by some of the demos). Note: if you are behind
  a firewall, you may need to [set up proxies for Maven](https://maven.apache.org/guides/mini/guide-proxies.html)

The same steps are used for benchmarking all of the above demos.
For example:

```
$ cd helidon-quickstart-se
$ make PREMAIN_HOME=/repos/leyden/build/linux-x64/images/jdk \
       MAINLINE_HOME=/repos/jdk/build/linux-x64/images/jdk \
       BLDJDK_HOME=/usr/local/jdk21 \
       bench
run,mainline default,mainline custom static CDS,premain custom static CDS only,premain CDS + AOT
1,398,244,144,107
2,387,247,142,108
3,428,238,143,107
4,391,252,142,111
5,417,247,141,107
6,390,239,139,127
7,387,247,145,111
8,387,240,147,110
9,388,242,147,108
10,400,242,167,108
Geomean,397.08,243.76,145.52,110.26
Stdev,13.55,4.19,7.50,5.73
Markdown snippets in mainline_vs_premain.md
```

The above command runs each configuration 10 times, in interleaved order. This way
the noise of the system (background processes, thermal throttling, etc.) is more likely to
be spread across the different runs.

As is typical for benchmarking start-up performance, the numbers are not very steady.
It is best to plot
the results (as saved in the file `mainline_vs_premain.csv`) in a spreadsheet to check for
noise and other artifacts.

The "make bench" target also generates GitHub markdown snippets (in the file `mainline_vs_premain.md`) for creating the
graphs below.

### Benchmarking Between Two Leyden Builds

This is useful for Leyden developers to measure the benefits of a particular optimization.
The steps are similar to those above, but we use the "make compare_premain_builds" target:

```
$ cd helidon-quickstart-se
$ make PM_OLD=/repos/leyden_old/build/linux-x64/images/jdk \
       PM_NEW=/repos/leyden_new/build/linux-x64/images/jdk \
       BLDJDK_HOME=/usr/local/jdk21 \
       compare_premain_builds
Old build = /repos/leyden_old/build/linux-x64/images/jdk with options
New build = /repos/leyden_new/build/linux-x64/images/jdk with options
Run,Old CDS + AOT,New CDS + AOT
1,110,109
2,131,111
3,118,115
4,110,108
5,117,110
6,114,109
7,110,109
8,118,110
9,110,110
10,113,114
Geomean,114.94,110.48
Stdev,6.19,2.16
Markdown snippets in compare_premain_builds.md
```

Please see [test/hotspot/jtreg/premain/lib/Bench.gmk](test/hotspot/jtreg/premain/lib/Bench.gmk) for more details.

Note: due to the variability of start-up time, the benefit of minor improvements may
be difficult to measure.

### Preliminary Benchmark Results

The following charts show the relative start-up performance of the Leyden/Premain branch vs
the JDK main-line.

For example, a number of "premain CDS + AOT : 291" indicates that if the application takes
1000 ms to start up with the JDK main-line, it takes only 291 ms to start up when the full
current set of Leyden optimizations for CDS and AOT is enabled.
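
The arithmetic behind these normalized numbers can be sketched in a few lines of Java. This is only an illustration and not part of the Leyden benchmark scripts: the class `NormalizedStartup` and its methods are hypothetical names, and the formula (scale the baseline to 1000, then take 1000 divided by the normalized value as the speedup) is an assumption inferred from the chart axis label and the "Nx improvement" figures in the chart headings. The inputs are the geomean values from the sample `make bench` output shown earlier, which do not reproduce the chart values and so were presumably collected in a separate run.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Leyden repository.
public class NormalizedStartup {
    // Scale a measured time so that the baseline time maps to 1000.
    public static long normalize(double variantMs, double baselineMs) {
        return Math.round(1000.0 * variantMs / baselineMs);
    }

    // Speedup implied by a normalized value, where the baseline is 1000.
    public static double speedup(long normalized) {
        return 1000.0 / normalized;
    }

    public static void main(String[] args) {
        // Geomeans from the sample output: mainline default = 397.08 ms,
        // premain CDS + AOT = 110.26 ms.
        long n = normalize(110.26, 397.08);
        System.out.println("normalized = " + n);                                    // normalized = 278
        System.out.println(String.format(Locale.ROOT, "%.2fx", speedup(n)));        // 3.60x

        // The Helidon chart value of 291 gives the figure quoted in its heading.
        System.out.println(String.format(Locale.ROOT, "%.2fx", speedup(291)));      // 3.44x
    }
}
```

Rounding to a whole number before computing the speedup mirrors how the charts present the data; computing the ratio directly from the raw timings would give a slightly different second decimal.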

The benchmark results are collected with `make bench` in the following directories:

- `helidon-quickstart-se`
- `micronaut-first-app`
- `quarkus-getting-started`
- `spring-petclinic`

These JDK versions were used in the comparisons:

- JDK main-line: https://github.com/openjdk/jdk/commit/70944ca54ad0090c734bb5b3082beb33450c4877
- Leyden: https://github.com/openjdk/leyden/commit/9fa972214934d30f67db5fd4d1b8007636ac1428

The benchmarks were executed on an 8-core Intel i7-10700 CPU @ 2.90GHz with 32GB RAM running Ubuntu 22.04.3 LTS.

### Helidon Quick Start (SE) Demo (3.44x improvement)

```mermaid
---
config:
    xyChart:
        chartOrientation: horizontal
        height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 632, 376, 291]
```

### Micronaut First App Demo (2.83x improvement)

```mermaid
---
config:
    xyChart:
        chartOrientation: horizontal
        height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 558, 410, 353]
```

### Quarkus Getting Started Demo (3.15x improvement)

```mermaid
---
config:
    xyChart:
        chartOrientation: horizontal
        height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 568, 395, 317]
```

### Spring-boot Getting Started Demo (3.53x improvement)

```mermaid
---
config:
    xyChart:
        chartOrientation: horizontal
        height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 560, 394, 283]
```

### Spring PetClinic Demo (2.72x improvement)

```mermaid
---
config:
    xyChart:
        chartOrientation: horizontal
        height: 300
---
xychart-beta
    x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
    y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
    bar [1000, 695, 563, 368]
```

## 6. More Documentation

Please see [test/hotspot/jtreg/premain/](test/hotspot/jtreg/premain) for more information.