# Proposed Leyden Terminal Stage Workflow

This is a proposed workflow for the "terminal stage" of the [Leyden
condenser pipeline](https://openjdk.org/projects/leyden/notes/03-toward-condensers):

- The CDS and AOT caches are automatically generated with a single `java` command.

    - The caches are stored in the file specified by the `-XX:CacheDataStore=<app>.cds` option.
    - The implementation is still a work in progress. AOT integration is not done yet.
    - As an intermediate step, the AOT cache may be stored in a separate file.

- The `-XX:CacheDataStore` option is intended to be a replacement for the existing
  `-XX:SharedArchiveFile` option.

- We no longer need a separate "training run". Instead, the `-XX:CacheDataStore=<app>.cds`
  option should be added to the command line of the production run of your application. For example:

  ```
  java -Xlog:cds -XX:CacheDataStore=javac.cds com.sun.tools.javac.Main ~/tmp/HelloWorld.java
  ```

- If the specified file doesn't exist, it will be created automatically when the JVM process exits:

    - The loaded classes and their compiler profiles are dumped into a temporary file with a `.preimage`
      suffix, e.g., `javac.cds.preimage`.
    - A JVM subprocess is launched to convert `javac.cds.preimage` to the final CDS image, `javac.cds`.
    - See the end of `MetaspaceShared::preload_and_dump_impl()` in
      [metaspaceShared.cpp](../../../../../src/hotspot/share/cds/metaspaceShared.cpp).

- In the next run of your application, the `javac.cds` file will be automatically loaded at start-up. Your
  application will see the benefit of CDS (and soon, AOT).

- By default, the following VM options are used when `-XX:CacheDataStore=<app>.cds` is specified. This way, you
  can automatically use all the Leyden-premain optimizations without specifying any extra flags.

    - `AOTRecordTraining` is set to `true` when the VM is *writing* the `<app>.cds.preimage` file.
    - `AOTRecordTraining`, `AOTReplayTraining` and `StoreCachedCode` are set to `true` when the VM is *writing* the final CDS image file.
    - `AOTReplayTraining` and `LoadCachedCode` are set to `true` when the VM is *loading* the final CDS image file.
    - `CachedCodeFile` is set to `<app>.cds.code`.

  However, you can explicitly disable some of these flags for diagnostic purposes. For example, the
  following command line automatically generates `app.cds` and `app.cds.code` on its first run, but
  loads only `app.cds`, not `app.cds.code`, on subsequent runs:

  ```
  java -XX:CacheDataStore=app.cds -XX:-LoadCachedCode -cp app.jar MyApp
  ```

- See [run.sh](run.sh) in this directory for an example of using `-XX:CacheDataStore=<app>.cds`.
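
The file naming and the create-or-load behavior described above can be sketched in shell. This is only an illustration: the helper names `cache_files` and `cache_phase` are hypothetical, and the existence check mirrors the description in this README rather than the JVM's actual logic.

```shell
#!/bin/sh
# Hypothetical helpers; they only mirror the file names described in this README.

# Print the files associated with a given CacheDataStore path.
cache_files() {
    store="$1"
    echo "$store.preimage"   # temporary file written at JVM exit
    echo "$store"            # final CDS image
    echo "$store.code"       # default CachedCodeFile
}

# Print "create" if the next run would generate the store,
# or "load" if the store already exists and would be loaded.
cache_phase() {
    if [ -f "$1" ]; then
        echo "load"
    else
        echo "create"
    fi
}
```

For example, `cache_phase javac.cds` prints `create` before the first run of the `javac` command shown earlier, and `load` once `javac.cds` exists.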

## Notes

- For applications that do not exit automatically, you may need to hand-craft a training run like this, so that
  your app exits voluntarily and the subprocess can be launched to complete the generation of `app.cds`:

  ```
  rm -f app.cds
  java -XX:CacheDataStore=app.cds -cp app.jar MyApp -exit-after-start
  ```

- In the future, we may add a `jcmd` option to connect to a long-running JVM and trigger the creation of
  the CacheDataStore.

- By default, the subprocess is automatically forked at JVM exit. For debugging purposes, you can use the
  `-XX:+CDSManualFinalImage` option to disable the automatic forking. This allows you to debug the
  subprocess more easily.
    - When `-XX:+CDSManualFinalImage` is specified, the JVM will create only the `<app>.cds.preimage`
      file at exit. It will then print out a command line that you can execute manually to create the
      final `<app>.cds` file.
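
The hand-crafted training run above can be wrapped in a small script that also checks the result. This is a minimal sketch, assuming the parent JVM does not exit until the subprocess has written the final image. `train_cache` is a hypothetical helper, and `-exit-after-start` stands for whatever option makes your application exit on its own.

```shell
#!/bin/sh
# train_cache STORE CMD...  (hypothetical helper)
# Removes a stale STORE, runs CMD, then confirms that STORE was produced.
train_cache() {
    store="$1"
    shift
    rm -f "$store"            # force regeneration, as in the example above
    "$@" || return 1          # run the training command
    if [ -f "$store" ]; then
        echo "created $store"
    else
        echo "missing $store" >&2
        return 1
    fi
}
```

A possible invocation is `train_cache app.cds java -XX:CacheDataStore=app.cds -cp app.jar MyApp -exit-after-start`. With `-XX:+CDSManualFinalImage`, only the `.preimage` file is written, so this check would report the store as missing until the printed command has been run manually.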

## AOT Code Generation

AOT support is not fully implemented yet. As of Sep 18, 2023, at the end of `MetaspaceShared::preload_and_dump()`,
the compiler is executed to compile a single method, `String::charAt`. The resulting nmethod is stored inside the
`CachedCodeFile`.

The intended design is to compile, at this point, all methods that were recorded in the training data during the
training run. This is TBD.

## Benchmark

(Sep 11, 2023)

- Without `-XX:CacheDataStore`

```
$ perf stat -r 20 java com.sun.tools.javac.Main HelloWorld.java

 Performance counter stats for 'java com.sun.tools.javac.Main HelloWorld.java' (20 runs):

            643.10 msec task-clock              #   2.374 CPUs utilized         ( +- 0.24% )
             4,318      context-switches        #   6.800 K/sec                 ( +- 1.84% )
                29      cpu-migrations          #  45.666 /sec                  ( +- 5.89% )
            15,003      page-faults             #  23.625 K/sec                 ( +- 0.20% )
     2,936,972,438      cycles                  #   4.625 GHz                   ( +- 0.24% )
     3,262,915,553      instructions            #    1.12 insn per cycle        ( +- 0.10% )
       644,286,520      branches                #   1.015 G/sec                 ( +- 0.11% )
        29,099,407      branch-misses           #   4.57% of all branches       ( +- 0.15% )

           0.27091 +- 0.00107 seconds time elapsed  ( +- 0.40% )
```

- With `-XX:CacheDataStore` (note: AOT is not yet supported)

```
$ perf stat -r 20 java -XX:+AOTReplayTraining -XX:CacheDataStore=javac.cds com.sun.tools.javac.Main HelloWorld.java

 Performance counter stats for 'java -XX:+AOTReplayTraining -XX:CacheDataStore=javac.cds com.sun.tools.javac.Main HelloWorld.java' (20 runs):

            234.72 msec task-clock              #   2.165 CPUs utilized         ( +- 0.29% )
             1,839      context-switches        #   7.735 K/sec                 ( +- 1.22% )
                14      cpu-migrations          #  58.883 /sec                  ( +- 4.13% )
             9,003      page-faults             #  37.866 K/sec                 ( +- 0.22% )
     1,070,819,957      cycles                  #   4.504 GHz                   ( +- 0.30% )
     1,170,776,369      instructions            #    1.08 insn per cycle        ( +- 0.35% )
       229,314,097      branches                # 964.471 M/sec                 ( +- 0.36% )
         9,544,981      branch-misses           #   4.09% of all branches       ( +- 0.38% )

          0.108406 +- 0.000844 seconds time elapsed  ( +- 0.78% )
```
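
For a quick comparison of the two runs, the ratios can be computed directly from the figures above. The `speedup` helper below is hypothetical. Its inputs are the elapsed times and task-clock values reported by `perf stat`.

```shell
#!/bin/sh
# speedup BASELINE OPTIMIZED  (hypothetical helper)
# Prints BASELINE / OPTIMIZED with two decimals.
speedup() {
    awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.2f\n", base / opt }'
}

speedup 0.27091 0.108406   # elapsed seconds    → 2.50
speedup 643.10 234.72      # task-clock (msec)  → 2.74
```

With these numbers, the run with `-XX:CacheDataStore` is about 2.5 times faster in elapsed time and uses about 2.7 times less CPU time.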