1 # Welcome to the Leyden Prototype Repository!
  2 
  3 The purpose of the Leyden repository is to prototype improvements in 
  4 startup time, in time to peak performance, and in footprint of Java programs, as a part of 
  5 [Project Leyden](https://openjdk.org/projects/leyden). We would like to solicit feedback from
the Java community, with the hope that some of these improvements can eventually be
incorporated into future Java releases.
  8 
  9 ## 0. Disclaimers
 10 
 11 - *This repository contains experimental and unstable code. It is not intended to be used
 12    in a production environment.*
 13 - *This repository is intended for developers of the JDK, and advanced Java developers who
 14    are familiar with building the JDK.*
 15 - *The experimental features in this repository may be changed or removed without notice.
 16    Command line flags and workflows are likely to change.*
- *The benchmark results reported on this page are for illustrative purposes only. Your
 18    applications may get better or worse results.*
 19 
 20 ## 1. Overview
 21 
 22 The Leyden "[premain](https://github.com/openjdk/leyden/blob/premain/)" prototype
 23 includes many optimizations that shift work from run time to earlier
 24 experimental executions of the application, which are
 25 called <i>training runs</i>. In a training run, we pre-compute various kinds of information.
 26 Importantly, we pre-compile
 27 bytecode to native code, guided by observations of the application's actual behavior
 28 during the training run.
 29 
 30 The Leyden repository is closely tracking the JDK main-line development. We are typically only a few weeks behind
 31 the [main-line JDK repo](https://github.com/openjdk/jdk).
 32 
 33 We have implemented the following improvements over the JDK main-line:
 34 
 35 - <b>[Unified Cache Data Storage (JDK-8320264)](https://openjdk.org/jeps/8320264)</b>:
 36   This enhancement to [CDS] is foundational to the other features.
 37   - It enables [CDS] to store not only class metadata and heap objects (as before),
 38   but also profiling data and compiled code.
 39   - This feature is accessed with the new VM flag `-XX:CacheDataStore`.
 40   - This option simplifies the creation of the CDS archive, and also the testing
 41   of all the prototype features listed here.
 42 - <b>[Loaded Classes in CDS Archives (JDK-8315737)](https://openjdk.org/jeps/8315737)</b>:
 43   This gives
  the JVM the ability to put classes in the <i>loaded</i> state as soon as the application starts up. As a result,
 45   we can implement many other time shifting optimizations with considerably simplified assumptions.
 46   - This feature is accessed with the new VM flag `-XX:+PreloadSharedClasses`.
 47   (Note that this flag will be renamed when JDK-8315737
 48     is integrated into the JDK main-line).
 49 - <b>[Method Profiles in CDS Archives (JDK-8325147)](https://openjdk.org/jeps/8325147)</b>: We store method profiles
 50   from training runs in the CDS archive, thereby enabling the JIT to begin compiling earlier during warmup.
  As a result, Java applications can reach peak performance faster.
 52   - This feature is enabled by the new VM flags `-XX:+RecordTraining` and `-XX:+ReplayTraining`.
- <b>Ahead-of-time resolution of constant pool entries</b>: The new VM flags `-XX:+ArchiveFieldReferences`,
  `-XX:+ArchiveMethodReferences` and `-XX:+ArchiveInvokeDynamic` make it possible to resolve many
 55   constant pool entries during the training run. This allows the application to start up faster. Also,
 56   the existence of resolved constant pool entries allows the AOT compiler to generate better code.
 57 - <b>Ahead-of-time compilation of Java methods</b>: Methods that are frequently used during the training run can be
 58   compiled and stored along with the CDS archive. As a result, as soon as the application starts up
  in the production run, its methods can be natively executed.
 60   - This feature is enabled by the new VM flags `-XX:+StoreCachedCode`, `-XX:+LoadCachedCode`, and `-XX:CachedCodeFile`.
  - Currently, the native code is stored in a separate file, but our plan is to eventually store the native code
 62     inside the CDS archive file.
 63 - <b>Ahead-of-time generation of [Dynamic Proxies](https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/reflect/Proxy.html)</b>:
 64   Dynamic proxies are frequently used by popular application frameworks. We can improve start-up time by generating these proxies ahead of time.
 65   - This feature is enabled by the new VM flag `-XX:+ArchiveDynamicProxies`.
 66 - <b>Ahead-of-time generation of reflection data</b>: Reflection data (such as instances of
 67   `java.lang.reflect.Method`) are generated by the JVM to support `java.lang.reflect` operations. We can
 68   generate these ahead of time to improve start-up.
 69   - This feature is enabled by the new VM flag `-XX:+ArchiveReflectionData`.
 70 - <b>Class Loader Lookup Cache</b>: Sometimes application frameworks may perform many repeated lookups of classes by name (with `Class.forName()`,
 71   etc.). This optimization allows such lookups to be done quickly without repeatedly scanning the classpath.
 72   - This feature is enabled by the new VM flag `-XX:+ArchiveLoaderLookupCache`.
 73 
 74 The flag `-XX:CacheDataStore` automatically enables the whole bundle
 75 of features listed above.  This simplifies testing of the whole
 76 prototype.  If necessary for more detailed testing, each feature can
 77 be individually disabled by negating its associated flag.
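
The last three features in the list target start-up work that application frameworks commonly perform: creating dynamic proxies, using reflection data, and looking up classes by name. The following minimal sketch shows what that kind of work looks like in plain Java. It does not exercise any Leyden flag by itself, and the names `FrameworkStyleStartup` and `Greeter` are hypothetical, chosen only for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class FrameworkStyleStartup {
    /** Hypothetical service interface, used only for this illustration. */
    public interface Greeter {
        String greet(String name);
    }

    /** Creates a dynamic proxy, as a framework might do during start-up. */
    public static Greeter newGreeterProxy() {
        // The handler only supports greet(String); other methods are not handled here.
        InvocationHandler handler = (proxy, method, args) -> "hello, " + args[0];
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    /** Looks up a class by name, fetches a Method object, and invokes it reflectively. */
    public static Object reflectiveLength(String text) {
        try {
            Class<?> stringClass = Class.forName("java.lang.String");
            Method length = stringClass.getMethod("length");
            return length.invoke(text);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newGreeterProxy().greet("leyden"));
        System.out.println(reflectiveLength("premain"));
    }
}
```

An application that performs many such operations during start-up is the kind of workload where the proxy, reflection, and lookup-cache features would be expected to help.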
 78 
 79 [CDS]: <https://docs.oracle.com/en/java/javase/22/vm/class-data-sharing.html>
 80 
 81 ## 2. Building the Leyden Repository
 82 
 83 The Leyden Repository can be built in the same way as the main-line JDK repository.
Please use the "premain" branch: [https://github.com/openjdk/leyden/tree/premain](https://github.com/openjdk/leyden/tree/premain).
 85 
 86 For build instructions please see the
 87 [online documentation](https://openjdk.org/groups/build/doc/building.html),
 88 or either of these files:
 89 
 90 - [doc/building.html](doc/building.html) (html version)
 91 - [doc/building.md](doc/building.md) (markdown version)
 92 
 93 See <https://openjdk.org/> for more information about the OpenJDK
 94 Community and the JDK and see <https://bugs.openjdk.org> for JDK issue
 95 tracking.
 96 
 97 ## 3. Trying out Leyden Features
 98 
 99 The easiest way to try out the Leyden features is to build a JVM from the Leyden repository, and use it with your application with the `-XX:CacheDataStore` flag.
100 
101 Here's a small benchmark that uses the JDK's built-in
102 [`JavaCompiler`](https://docs.oracle.com/en/java/javase/21/docs/api/java.compiler/javax/tools/JavaCompiler.html)
103 class to compile some Java source files. This benchmark spends a significant amount of start-up time 
104 setting up the classes used by `JavaCompiler`, so it will benefit from the Leyden features.
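
To give a feel for the kind of work such a benchmark does, here is a minimal sketch of driving `JavaCompiler` programmatically. It is not the source of `JavacBenchApp`. The class `TinyJavacDemo` and its helper `StringSource` are hypothetical names, and writing class files to a temporary directory is an assumption made to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class TinyJavacDemo {
    /** A source file whose text lives in memory instead of on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles one class from a string; returns true if compilation succeeded. */
    public static boolean compile(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler (is this a JRE-only runtime?)");
        }
        try {
            // Class files go to a throwaway directory so the working directory stays clean.
            Path out = Files.createTempDirectory("tinyjavac");
            List<String> options = List.of("-d", out.toString());
            List<JavaFileObject> units = List.of(new StringSource(className, code));
            return compiler.getTask(null, null, null, options, null, units).call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("compiled: " + compile("Hello", "public class Hello { }"));
    }
}
```

The first call to `compile` loads and initializes a large number of compiler classes, which is the start-up cost that the Leyden features aim to reduce.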
105 
106 First, download [JavacBenchApp.java](https://github.com/iklam/jdk/blob/f95f851aed3d2bf06edabab1e7c24e15f4145d0d/test/hotspot/jtreg/runtime/cds/appcds/applications/JavacBenchApp.java)
107 and compile it into a JAR file.
108 
109 (Remember to use the `java` program that you built from the Leyden repository.)
110 
111 ```
112 $ javac JavacBenchApp.java
113 $ jar cvf JavacBenchApp.jar JavacBenchApp*.class
114 added manifest
115 adding: JavacBenchApp$ClassFile.class(in = 1608) (out= 787)(deflated 51%)
116 adding: JavacBenchApp$FileManager.class(in = 2090) (out= 979)(deflated 53%)
117 adding: JavacBenchApp$SourceFile.class(in = 1351) (out= 671)(deflated 50%)
118 adding: JavacBenchApp.class(in = 7571) (out= 3302)(deflated 56%)
119 ```
120 
121 We can run this benchmark without any Leyden features. It takes 893 ms:
122 
123 ```
124 $ java -cp JavacBenchApp.jar JavacBenchApp 50
125 Generated source code for 51 classes and compiled them in 893 ms
126 ```
127 
128 Now, we can perform a <b>training run</b> and create the Leyden cache files.
129 
<b>Note: Any `JavacBenchApp.cds*` files created by previous tests must
be deleted before new ones are created:</b>
132 
133 ```
134 $ rm -fv JavacBenchApp.cds*
135 $ java -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50
136 $ ls -l JavacBenchApp.cds*
137 -r--r--r-- 1 iklam iklam 30900224 May 20 19:21 JavacBenchApp.cds
138 -r--r--r-- 1 iklam iklam 16895736 May 20 19:21 JavacBenchApp.cds.code
139 ```
140 
141 Two files are created:
142 
143 - `JavacBenchApp.cds`: This file contains classes, heap objects and profiling data harvested from the training run.
144 - `JavacBenchApp.cds.code`: This file contains AOT-compiled methods, optimized for the execution behaviors observed during the training run.
145   (Data in this file will be merged into `JavacBenchApp.cds` in a future release.)
146 
147 Now, we can make a <b>production run</b> of the program with the cache files. It finishes in 423 ms, or more than twice as fast as
148 before.
149 
150 ```
151 $ java -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50
152 Generated source code for 51 classes and compiled them in 423 ms
153 ```
154 
155 ### Optional VM Flags
156 
157 When you create the file `JavacBenchApp.cds` with the flag `-XX:CacheDataStore`,
158 all of the other options described
159 in the [Overview](#1-overview) section above are enabled by default. This ensures that you can get all the optimizations
160 without specifying them individually.
161 
162 For diagnostic purposes, you can selectively disable some of the options:
163 
164 - The `-XX:+LoadCachedCode` and `-XX:+ReplayTraining` flags affect only the production run.
165 - All other options affect only the training run.
166 
167 For example, you can disable the loading of the AOT code during the production run. Notice that the benchmark now
168 starts more slowly than it did when AOT code was loaded.
169 
170 ```
171 $ java -XX:CacheDataStore=JavacBenchApp.cds -XX:-LoadCachedCode -cp JavacBenchApp.jar JavacBenchApp 50
172 Generated source code for 51 classes and compiled them in 647 ms
173 ```
174 
175 You can also disable AOT compilation in the training run:
176 
177 ```
178 $ rm -fv JavacBenchApp.cds*
179 $ java -XX:CacheDataStore=JavacBenchApp.cds -XX:-StoreCachedCode -cp JavacBenchApp.jar JavacBenchApp 50
180 $ ls -l JavacBenchApp.cds*
181 -r--r--r-- 1 iklam iklam 30277632 May 20 20:05 JavacBenchApp.cds
182 ```
183 
184 Note that the file `JavacBenchApp.cds.code` is no longer created.
185 
186 ## 4. Limitations of the Leyden Prototype
187 
When trying out the Leyden prototype, please pay attention to the following limitations.
189 
190 ### The Same Garbage Collector Must be Used between Training and Production Runs
191 
192 The CDS archive generated by the Leyden prototype includes machine instructions that are specific to
193 the garbage collector. We recommend that you explicitly specify the same collector during both
194 training and production runs. For example:
195 
196 ```
197 # training run
198 $ rm -fv JavacBenchApp.cds*
199 $ java -XX:CacheDataStore=JavacBenchApp.cds -XX:+UseSerialGC -cp JavacBenchApp.jar JavacBenchApp 50
200 
201 # production run
202 $ java -XX:CacheDataStore=JavacBenchApp.cds -XX:+UseSerialGC -cp JavacBenchApp.jar JavacBenchApp 50
203 ```
204 
205 Otherwise, the CDS archive may not be loaded for the production run, leading to suboptimal performance.
206 For example, sometimes you may perform the training run on a large development host, and then use
207 a container to run the application in a small production node. In the following scenario, as the collector
208 is not explicitly specified, the VM will automatically pick G1 for the training run, and SerialGC for the
209 production run (due to its limited amount of memory):
210 
211 ```
212 # training run (uses G1 by default)
213 $ rm -fv JavacBenchApp.cds*
214 $ java -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50
215 
216 # production run (uses SerialGC)
217 $ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
218     --memory=1024m \
219     container-registry.oracle.com/java/openjdk \
220     bash -c 'cd /test; /jdk/bin/java -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50'
221 [0.001s][error][cds] CDS archive has preloaded classes. It cannot be used because GC used during dump time (G1)
222                      is not the same as runtime (Serial)
223 [0.001s][error][cds] An error has occurred while processing the shared archive file.
224 [0.001s][error][cds] Unable to map shared spaces
225 Error occurred during initialization of VM
226 Unable to use shared archive.
227 ```
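
When it is unclear which collector the VM picked in a given environment, one option is to ask the running JVM through the standard `java.lang.management` API. The sketch below is a hypothetical helper (`ShowCollector` is not part of the Leyden repository) that prints the names of the active garbage collector beans. On HotSpot, names such as `G1 Young Generation` typically indicate G1, while `Copy` and `MarkSweepCompact` typically indicate SerialGC, but treat the exact strings as an assumption that may vary between JDK versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ShowCollector {
    /** Returns the names of the garbage collector beans of the running JVM. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run this once on the training host and once inside the production container.
        System.out.println("Collectors in use: " + collectorNames());
    }
}
```

Running the same class on the training host and in the production container, without any GC flag, shows whether the two environments default to different collectors.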

### Only G1GC, SerialGC and ParallelGC are Supported
229 
230 Currently, if you use any other garbage collector in combination with `-XX:CacheDataStore`, the VM will
231 exit with an error.
232 
233 ```
234 $ java -XX:+UseZGC -XX:CacheDataStore=foo --version
235 Error occurred during initialization of VM
236 Cannot create the CacheDataStore: UseCompressedClassPointers must be enabled, and collector
237 must be G1, Parallel, or Serial
238 ```
239 
240 
### -Xshare:on is Enabled by Default
242 
As seen in the example immediately above, in the production run, if the CDS archive cannot be
used for any reason, the JVM will report an error and exit. This happens as if `-Xshare:on` were
specified on the command line.
246 
247 In the standard JDK, when the CDS archive cannot be used for any reason (for example, the
248 archive was created for a different version of the JDK), the application will
249 continue to run without using CDS.
250 This fall-back strategy ensures that the application will function correctly, though at a lower level of performance.
251 
252 With the Leyden prototype, we have changed this fall-back behavior to make it easier to diagnose
253 performance issues. For example, when the start-up time is not as good as one would expect, we
want to know whether it's caused by a misconfiguration that prevents the CDS archive
255 from being used, or it's caused by a deficiency in the implementation of the Leyden optimizations.
256 
To revert to the behavior of the standard JDK, you can explicitly add `-Xshare:auto` to the command line.
258 
259 ```
260 $ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
261     --memory=1024m \
262     container-registry.oracle.com/java/openjdk \
263     bash -c 'cd /test; /jdk/bin/java -Xshare:auto -XX:CacheDataStore=JavacBenchApp.cds -cp JavacBenchApp.jar JavacBenchApp 50'
264 [0.001s][error][cds] CDS archive has preloaded classes. It cannot be used because GC used during dump time (G1)
265                      is not the same as runtime (Serial)
266 Generated source code for 51 classes and compiled them in 831 ms
267 ```
268 
See the [Class Data Sharing documentation](https://docs.oracle.com/en/java/javase/21/vm/class-data-sharing.html) for a discussion of `-Xshare:on` vs `-Xshare:auto`.
270 
271 
272 ## 5. Benchmarking
273 
274 We use a small set of benchmarks to demonstrate the performance of the optimizations in the Leyden repo.
275 
276 - [helidon-quickstart-se](test/hotspot/jtreg/premain/helidon-quickstart-se): from https://helidon.io/docs/v4/se/guides/quickstart
277 - [micronaut-first-app](test/hotspot/jtreg/premain/micronaut-first-app): from https://guides.micronaut.io/latest/creating-your-first-micronaut-app-maven-java.html
278 - [quarkus-getting-started](test/hotspot/jtreg/premain/quarkus-getting-started): from https://quarkus.io/guides/getting-started
279 - [spring-petclinic](test/hotspot/jtreg/premain/spring-petclinic): from https://github.com/spring-projects/spring-petclinic
280 - *(FIXME: add a benchmark for javac)*
281 
282 ### Benchmarking Against JDK Main-line
283 
To compare the performance of Leyden vs the main-line JDK, you need:
285 
286 - An official build of JDK 21
287 - An up-to-date build of the JDK main-line
288 - The latest Leyden build
289 - Maven (ideally 3.8 or later, as required by some of the demos). Note: if you are behind
290   a firewall, you may need to [set up proxies for Maven](https://maven.apache.org/guides/mini/guide-proxies.html)
291 
292 The same steps are used for benchmarking all of the above demos. For example:
293 
294 ```
295 $ cd helidon-quickstart-se
296 $ make PREMAIN_HOME=/repos/leyden/build/linux-x64/images/jdk \
297        MAINLINE_HOME=/repos/jdk/build/linux-x64/images/jdk \
298        BLDJDK_HOME=/usr/local/jdk21 \
299        bench
300 run,mainline default,mainline custom static CDS,premain custom static CDS only,premain CDS + AOT
301 1,398,244,144,107
302 2,387,247,142,108
303 3,428,238,143,107
304 4,391,252,142,111
305 5,417,247,141,107
306 6,390,239,139,127
307 7,387,247,145,111
308 8,387,240,147,110
309 9,388,242,147,108
310 10,400,242,167,108
311 Geomean,397.08,243.76,145.52,110.26
312 Stdev,13.55,4.19,7.50,5.73
313 Markdown snippets in mainline_vs_premain.md
314 ```
315 
The above command runs each configuration 10 times, in an interleaving order. This way
the noise of the system (background processes, thermal throttling, etc.) is more likely to
be spread across the different runs.
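
As an illustration of what an interleaving order means, the sketch below generates a schedule in which every configuration is run once per round, rather than running one configuration 10 times before moving on to the next. How `Bench.gmk` actually orders the runs is not shown in this document, so the exact ordering here is an assumption, and `InterleavedSchedule` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

public class InterleavedSchedule {
    /** Builds a run order where each round visits every configuration once. */
    public static List<String> schedule(List<String> configs, int rounds) {
        List<String> order = new ArrayList<>();
        for (int round = 1; round <= rounds; round++) {
            for (String config : configs) {
                // Consecutive entries belong to different configurations,
                // so a burst of system noise is shared between them.
                order.add(round + ":" + config);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> configs = List.of("mainline default", "premain CDS + AOT");
        schedule(configs, 3).forEach(System.out::println);
    }
}
```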
319 
320 As is typical for benchmarking start-up performance, the numbers are not very steady.
321 It is best to plot
322 the results (as saved in the file `mainline_vs_premain.csv`) in a spreadsheet to check for
323 noise and other artifacts.
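
For a quick sanity check without a spreadsheet, the summary rows can be recomputed from the raw run times. The sketch below uses the "mainline default" column from the sample output above. It assumes that the `Stdev` row is a population standard deviation (dividing by the number of runs), which appears to be consistent with the reported numbers but is not confirmed by this document. `BenchStats` is a hypothetical helper name.

```java
import java.util.Locale;

public class BenchStats {
    /** Geometric mean, computed via logarithms to avoid overflow. */
    public static double geomean(double[] values) {
        double logSum = 0.0;
        for (double v : values) {
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }

    /** Population standard deviation (divides by n, not n - 1). */
    public static double populationStdev(double[] values) {
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;
        double squaredDiffs = 0.0;
        for (double v : values) {
            squaredDiffs += (v - mean) * (v - mean);
        }
        return Math.sqrt(squaredDiffs / values.length);
    }

    public static void main(String[] args) {
        // "mainline default" column of the sample output, in ms.
        double[] mainlineDefault = {398, 387, 428, 391, 417, 390, 387, 387, 388, 400};
        System.out.printf(Locale.ROOT, "Geomean,%.2f%n", geomean(mainlineDefault));
        System.out.printf(Locale.ROOT, "Stdev,%.2f%n", populationStdev(mainlineDefault));
    }
}
```

The results should come out close to the `397.08` and `13.55` reported in the sample output. A single outlier, such as the 428 ms run, has a visible effect on the standard deviation, which is one reason to inspect the individual runs.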
324 
325 The "make bench" target also generates GitHub markdown snippets (in the file `mainline_vs_premain.md`) for creating the
326 graphs below.
327 
328 ### Benchmarking Between Two Leyden Builds
329 
330 This is useful for Leyden developers to measure the benefits of a particular optimization.
331 The steps are similar to above, but we use the "make compare_premain_builds" target:
332 
333 ```
334 $ cd helidon-quickstart-se
335 $ make PM_OLD=/repos/leyden_old/build/linux-x64/images/jdk \
336        PM_NEW=/repos/leyden_new/build/linux-x64/images/jdk \
337        BLDJDK_HOME=/usr/local/jdk21 \
338        compare_premain_builds
339 Old build = /repos/leyden_old/build/linux-x64/images/jdk with options
340 New build = /repos/leyden_new/build/linux-x64/images/jdk with options
341 Run,Old CDS + AOT,New CDS + AOT
342 1,110,109
343 2,131,111
344 3,118,115
345 4,110,108
346 5,117,110
347 6,114,109
348 7,110,109
349 8,118,110
350 9,110,110
351 10,113,114
352 Geomean,114.94,110.48
353 Stdev,6.19,2.16
354 Markdown snippets in compare_premain_builds.md
355 ```
356 
357 Please see [test/hotspot/jtreg/premain/lib/Bench.gmk](test/hotspot/jtreg/premain/lib/Bench.gmk) for more details.
358 
359 Note: due to the variability of start-up time, the benefit of minor improvements may
360 be difficult to measure.
361 
362 ### Preliminary Benchmark Results
363 
364 The following charts show the relative start-up performance of the Leyden/Premain branch vs
365 the JDK main-line.
366 
For example, a number of "premain CDS + AOT : 291" indicates that if the application takes
1000 ms to start up with the JDK main-line, it takes only 291 ms to start up when all of the
current Leyden optimizations for CDS and AOT are enabled.
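
The arithmetic behind these numbers can be sketched as follows: a measured time is scaled so that the main-line default maps to 1000, and the improvement factor is 1000 divided by the normalized score. The rounding choices and the name `NormalizedScore` are assumptions made for illustration. The factors in the chart headings below are consistent with this calculation.

```java
import java.util.Locale;

public class NormalizedScore {
    /** Scales a measured time so that the baseline maps to 1000. */
    public static long normalize(double baselineMs, double measuredMs) {
        return Math.round(1000.0 * measuredMs / baselineMs);
    }

    /** Speed-up factor implied by a normalized score, formatted like "3.44x". */
    public static String improvement(long normalizedScore) {
        return String.format(Locale.ROOT, "%.2fx", 1000.0 / normalizedScore);
    }

    public static void main(String[] args) {
        // A score of 291 means 291 ms for every 1000 ms of main-line start-up time.
        System.out.println(improvement(291)); // prints "3.44x"
    }
}
```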
370 
371 The benchmark results are collected with `make bench` in the following directories:
372 
373 - `helidon-quickstart-se`
374 - `micronaut-first-app`
375 - `quarkus-getting-started`
376 - `spring-petclinic`
377 
378 These JDK versions were used in the comparisons:
379 
380 - JDK main-line: https://github.com/openjdk/jdk/commit/70944ca54ad0090c734bb5b3082beb33450c4877
381 - Leyden: https://github.com/openjdk/leyden/commit/9fa972214934d30f67db5fd4d1b8007636ac1428
382 
383 The benchmarks were executed on an 8-core Intel i7-10700 CPU @ 2.90GHz with 32GB RAM running Ubuntu 22.04.3 LTS.
384 
385 ### Helidon Quick Start (SE) Demo (3.44x improvement)
386 
387 ```mermaid
388 ---
389 config:
390     xyChart:
391         chartOrientation: horizontal
392         height: 300
393 ---
394 xychart-beta
395     x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
396     y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
397     bar [1000, 632, 376, 291]
398 ```
399 
400 ### Micronaut First App Demo (2.83x improvement)
401 
402 ```mermaid
403 ---
404 config:
405     xyChart:
406         chartOrientation: horizontal
407         height: 300
408 ---
409 xychart-beta
410     x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
411     y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
412     bar [1000, 558, 410, 353]
413 ```
414 
415 ### Quarkus Getting Started Demo (3.15x improvement)
416 
417 ```mermaid
418 ---
419 config:
420     xyChart:
421         chartOrientation: horizontal
422         height: 300
423 ---
424 xychart-beta
425     x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
426     y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
427     bar [1000, 568, 395, 317]
428 ```
429 
430 ### Spring PetClinic Demo (2.72x improvement)
431 
432 ```mermaid
433 ---
434 config:
435     xyChart:
436         chartOrientation: horizontal
437         height: 300
438 ---
439 xychart-beta
440     x-axis "variant" ["mainline default", "mainline custom static CDS", "premain custom static CDS only", "premain CDS + AOT"]
441     y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
442     bar [1000, 695, 563, 368]
443 ```
444 
445 ## 6. More Documentation
446 
447 Please see [test/hotspot/jtreg/premain/](test/hotspot/jtreg/premain) for more information.