# Proposed Leyden Terminal Stage Workflow

This is a new proposed workflow for the "terminal stage" of the [Leyden
condenser pipeline](https://openjdk.org/projects/leyden/notes/03-toward-condensers).

- The CDS and AOT caches are automatically generated with a single `java` command.

- The caches are stored in the file specified by the `-XX:CacheDataStore=<app>.cds` option.
    - The implementation is still a work in progress. AOT integration is not done yet.
    - As an intermediate step, the AOT cache may be stored in a separate file.

- The `-XX:CacheDataStore` option is intended to be a replacement for the existing
  `-XX:SharedArchiveFile` option.

- We no longer need a separate "training run". Instead, the `-XX:CacheDataStore=<app>.cds`
  option should be added to the command-line of the production run of your application. For example:

    ```
    java -Xlog:cds -XX:CacheDataStore=javac.cds com.sun.tools.javac.Main ~/tmp/HelloWorld.java
    ```

- If the specified file doesn't exist, it will be created automatically when the JVM process exits:

    - The loaded classes and their compiler profile are dumped into a temporary file with a `.preimage`
      suffix, e.g., `javac.cds.preimage`.
    - A JVM subprocess is launched to convert `javac.cds.preimage` to the final CDS image, `javac.cds`.
    - See the end of `MetaspaceShared::preload_and_dump_impl()` in
      [metaspaceShared.cpp](../../../../../src/hotspot/share/cds/metaspaceShared.cpp).

- In the next run of your application, the `javac.cds` file will be automatically loaded at start-up. Your
  application will see the benefit of CDS (and soon, AOT).

- By default, the following VM options are used when `-XX:CacheDataStore=<app>.cds` is specified. This way, you
  can automatically use all the Leyden-premain optimizations without specifying any extra flags.

    - `RecordTraining` is set to `true` when the VM is *writing* the `<app>.cds.preimage` file.
    - `RecordTraining`, `ReplayTraining` and `StoreCachedCode` are set to `true` when the VM is *writing* the final CDS image file.
    - `ReplayTraining` and `LoadCachedCode` are set to `true` when the VM is *loading* the final CDS image file.
    - `CachedCodeFile` is set to `<app>.cds.code`.

  However, you can explicitly disable some of these flags for diagnostic purposes. For example, the
  following command-line will automatically generate `app.cds` and `app.cds.code` on its first run. On
  subsequent runs, it will load only `app.cds`, not `app.cds.code`.

    ```
    java -XX:CacheDataStore=app.cds -XX:-LoadCachedCode -cp app.jar MyApp
    ```

- See [run.sh](run.sh) in this directory for an example of using `-XX:CacheDataStore=<app>.cds`.

## Notes

- For applications that do not exit automatically, you may need to hand-craft a training run like this, so that
  your app exits voluntarily and the subprocess can be launched to complete the generation of `app.cds`.

    ```
    rm -f app.cds
    java -XX:CacheDataStore=app.cds -cp app.jar MyApp -exit-after-start
    ```

- In the future, we may add a `jcmd` option to connect to a long-running JVM and trigger the creation of
  the CacheDataStore.

- By default, the subprocess is automatically forked at JVM exit. For debugging purposes, you can use the
  `-XX:+CDSManualFinalImage` option to disable the automatic forking. This allows you to debug the
  subprocess more easily.
    - When `-XX:+CDSManualFinalImage` is specified, the JVM will create only the `<app>.cds.preimage`
      file at exit. It will then print out a command-line that you can execute manually to create the
      final `<app>.cds` file.

## AOT Code Generation

AOT support is not fully implemented yet. As of Sep 18, 2023, at the end of `MetaspaceShared::preload_and_dump()`,
the compiler will be executed to compile a single method, `String::charAt`.
The nmethod will be stored inside the `CachedCodeFile`.

The intended design is to compile, at this point, all methods that were recorded in the training data during the
training run. This is TBD.

## Benchmark

(Sep 11, 2023)

- Without `-XX:CacheDataStore`

```
$ perf stat -r 20 java com.sun.tools.javac.Main HelloWorld.java

 Performance counter stats for 'java com.sun.tools.javac.Main HelloWorld.java' (20 runs):

            643.10 msec task-clock                #    2.374 CPUs utilized            ( +-  0.24% )
             4,318      context-switches          #    6.800 K/sec                    ( +-  1.84% )
                29      cpu-migrations            #   45.666 /sec                     ( +-  5.89% )
            15,003      page-faults               #   23.625 K/sec                    ( +-  0.20% )
     2,936,972,438      cycles                    #    4.625 GHz                      ( +-  0.24% )
     3,262,915,553      instructions              #    1.12  insn per cycle           ( +-  0.10% )
       644,286,520      branches                  #    1.015 G/sec                    ( +-  0.11% )
        29,099,407      branch-misses             #    4.57% of all branches          ( +-  0.15% )

           0.27091 +- 0.00107 seconds time elapsed  ( +-  0.40% )
```

- With `-XX:CacheDataStore` (note: AOT is not yet supported)

```
$ perf stat -r 20 java -XX:+ReplayTraining -XX:CacheDataStore=javac.cds com.sun.tools.javac.Main HelloWorld.java

 Performance counter stats for 'java -XX:+ReplayTraining -XX:CacheDataStore=javac.cds com.sun.tools.javac.Main HelloWorld.java' (20 runs):

            234.72 msec task-clock                #    2.165 CPUs utilized            ( +-  0.29% )
             1,839      context-switches          #    7.735 K/sec                    ( +-  1.22% )
                14      cpu-migrations            #   58.883 /sec                     ( +-  4.13% )
             9,003      page-faults               #   37.866 K/sec                    ( +-  0.22% )
     1,070,819,957      cycles                    #    4.504 GHz                      ( +-  0.30% )
     1,170,776,369      instructions              #    1.08  insn per cycle           ( +-  0.35% )
       229,314,097      branches                  #  964.471 M/sec                    ( +-  0.36% )
         9,544,981      branch-misses             #    4.09% of all branches          ( +-  0.38% )

          0.108406 +- 0.000844 seconds time elapsed  ( +-  0.78% )
```
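
To make the comparison between the two `perf stat` runs above easier to read, the ratios can be computed directly from the reported figures. The following is a minimal sketch; the `speedup` helper is a hypothetical name introduced here for illustration and is not part of the Leyden tooling or of `run.sh`.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): prints baseline / cached
# as a ratio with two decimal places.
speedup() {
    awk -v base="$1" -v cached="$2" 'BEGIN { printf "%.2f\n", base / cached }'
}

# Figures copied from the two perf stat runs above.
speedup 0.27091 0.108406        # elapsed time, seconds
speedup 643.10 234.72           # task-clock, msec
speedup 3262915553 1170776369   # instructions retired
```

With the numbers above, elapsed time improves by roughly 2.5x, while task-clock and retired instructions drop by slightly larger factors (about 2.7x and 2.8x). These are single-machine measurements of one small `javac` invocation, so they should be read as indicative rather than general.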