# Heterogeneous Accelerator Toolkit (HAT)

[HAT on GitHub](https://github.com/openjdk/babylon/tree/code-reflection/hat)

HAT is a toolkit that allows developers to express data-parallel applications in Java and to optimize, offload, and execute them on hardware accelerators.

- **Heterogeneous**: a variety of devices and their corresponding programming languages.
- **Accelerator**: GPUs, FPGAs, CPUs, etc.
- **Toolkit**: a set of libraries for Java developers.

HAT uses the code reflection API from [Project Babylon](https://github.com/openjdk/babylon).

The toolkit offers:

- An API for kernel programming on accelerators from Java.
- An API for combining multiple kernels into a compute graph.
- An API for mapping Java objects to hardware accelerators using the Panama FFM API.
- An extensible backend system for multiple accelerators:
  - OpenCL
  - CUDA
  - Java

## Prerequisites

- HAT currently requires the Babylon JDK, which contains the code reflection APIs.
- A base JDK >= 25. We currently use OpenJDK 26 for development.
- A GPU SDK (one or more of the SDKs below) to be able to run on GPUs:
  - An OpenCL implementation (e.g., Intel, Apple Silicon, or the CUDA SDK), with OpenCL >= 1.2
  - CUDA SDK >= 12.9
- `cmake` >= `3.22.1`
- `gcc` >= 12.0, or `clang` >= 17.0
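
To check whether your toolchain meets these minimums, a small shell helper can compare dotted version strings. This is a sketch, not part of HAT: `version_ge` is a hypothetical helper name, and the `cmake` probe assumes the command is on your `PATH`.

```bash
# Sketch: a POSIX-shell helper to compare dotted version strings.
# version_ge A B succeeds when version A >= version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: check the installed cmake against the minimum listed above.
cmake_version=$(cmake --version 2>/dev/null | head -n 1 | awk '{print $3}')
if [ -n "$cmake_version" ] && version_ge "$cmake_version" "3.22.1"; then
    echo "cmake $cmake_version is new enough"
fi
```

The same helper can be reused for `gcc --version` or `clang --version` output.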
## Compatible systems

We actively develop and run tests on the following systems:

- Apple Silicon M1-M4
- Linux Fedora >= 42
- Oracle Linux 10
- Ubuntu >= 22.04

## Quick Start

### 1. Build Babylon JDK

```bash
git clone https://github.com/openjdk/babylon
cd babylon
bash configure --with-boot-jdk=${JAVA_HOME}
make clean
make images
```

### 2. Update JAVA_HOME and PATH

The image path below is for macOS on Apple Silicon; adjust it to match your OS and architecture (e.g., `linux-x86_64-server-release`).

```bash
export JAVA_HOME=<BABYLON-DIR>/build/macosx-aarch64-server-release/images/jdk
export PATH=$JAVA_HOME/bin:$PATH
```
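
To confirm that the shell now picks up the Babylon build, a quick sanity check can compare the `java` found on `PATH` against `JAVA_HOME`. This is a sketch assuming a POSIX shell; `check_java_on_path` is a hypothetical helper name.

```bash
# Sanity check: the java found on PATH should be the one under JAVA_HOME.
check_java_on_path() {
    [ "$(command -v java)" = "${JAVA_HOME}/bin/java" ]
}

if check_java_on_path; then
    echo "PATH resolves to the JDK under JAVA_HOME"
else
    echo "WARNING: java on PATH is not ${JAVA_HOME}/bin/java"
fi
```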

### 3. Build HAT

```bash
sdk install jextract # if needed (installs jextract via SDKMAN!)
cd hat
java @.bld
```

Done!

## Run Examples

For instance, matrix multiplication:

```bash
java @.run ffi-opencl matmul --size=1024
```
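
To compare runs across problem sizes, the same launcher can be driven from a small loop. This is a sketch: the `echo` only prints the commands so you can review them first; remove it to actually execute them.

```bash
# Sketch: print a small sweep of matmul launches; drop `echo` to run them.
for size in 256 512 1024; do
    echo java @.run ffi-opencl matmul --size=${size}
done
```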

Some examples have a GUI implementation:

```bash
java @.run ffi-opencl mandel
```

The full list of examples is available in the [examples](https://github.com/openjdk/babylon/tree/code-reflection/hat/examples) directory.

## Run Unit Tests

OpenCL backend:

```bash
java @.test-suite ffi-opencl
```

CUDA backend:

```bash
java @.test-suite ffi-cuda
```

## Full Example Explained

The following example computes the square of each element of an input vector.
The example is self-contained and can be run directly with the `java` command.

Place the following code, as `ExampleHAT.java`, in the `hat` directory.

```java
import hat.*;
import hat.Accelerator.Compute;
import hat.backend.*;
import hat.buffer.*;
import optkl.ifacemapper.MappableIface.*;
import jdk.incubator.code.Reflect;
import java.lang.invoke.MethodHandles;

public class ExampleHAT {

    // Kernel code: this is the function to be offloaded to the accelerator
    // (e.g., a GPU). The kernel will be executed by many GPU threads, in this
    // case as many threads as there are elements in `array`.
    // The `kc` object can be used to obtain the thread identifier and map
    // the data element to process.
    // HAT kernels follow the SIMT (Single Instruction, Multiple Threads)
    // programming model.
    // Kernel code is reflectable. Thus, the HAT runtime and HAT compiler can
    // build and optimize the code model. Once the code model is optimized,
    // HAT generates OpenCL/CUDA C99 code.
    @Reflect
    public static void squareKernel(@RO KernelContext kc, @RW S32Array array) {
        // HAT kernels support a reduced subset of Java.
        // Kernels express the work to be done per thread (GPU/accelerator thread).
        if (kc.gix < array.length()) {
            int value = array.array(kc.gix);
            array.array(kc.gix, (value * value));
        }
    }

    // The following method represents the compute layer, in which we specify
    // the number of threads to be deployed on the accelerator. The number of
    // threads is specified in an ND-Range. An ND-Range can be 1D, 2D, or 3D.
    // In this example, we launch a 1D range with the number of threads equal
    // to the input array size.
    @Reflect
    public static void square(@RO ComputeContext cc, @RW S32Array array) {
        var ndRange = NDRange.of1D(array.length());

        // Dispatch the kernel. The HAT runtime will offload the kernels
        // reached from this point and run the generated GPU kernels on the
        // target accelerator.
        // Furthermore, HAT automatically transfers data to the accelerator.
        // This is a blocking call, and when it returns control to the main
        // Java thread, results (outputs) are available to be consumed.
        cc.dispatchKernel(ndRange, kc -> squareKernel(kc, array));
    }

    static void main(String[] args) {
        final int size = 4096;

        // Create a new accelerator object
        var accelerator = new Accelerator(MethodHandles.lookup(), Backend.FIRST);

        // Instantiate an array on the target accelerator.
        // Data is stored off-heap using the Panama FFM API.
        var array = S32Array.create(accelerator, size);

        // Data initialization
        for (int i = 0; i < array.length(); i++) {
            array.array(i, i);
        }

        // Offload and dispatch the compute graph on the target accelerator.
        // This is a blocking call. Once this call completes, the results
        // (outputs) will be available to be consumed by the current Java thread.
        accelerator.compute((@Reflect Compute) cc -> ExampleHAT.square(cc, array));

        // Test result
        boolean isCorrect = true;
        for (int i = 0; i < size; i++) {
            if (array.array(i) != i * i) {
                isCorrect = false;
            }
        }
        if (isCorrect) {
            IO.println("Result is correct");
        } else {
            IO.println("Result is NOT correct");
        }
    }
}
```

Run this example in the `babylon/hat` directory.
If you run from another directory, update the `--class-path` parameter accordingly.
Use the `java` binary from the Babylon JDK built above.

```bash
java --enable-preview \
     --add-modules=jdk.incubator.code \
     --enable-native-access=ALL-UNNAMED \
     --class-path build/hat-optkl-1.0.jar:build/hat-core-1.0.jar:build/hat-backend-ffi-shared-1.0.jar:build/hat-backend-ffi-opencl-1.0.jar \
     -Djava.library.path=<BABYLON-DIR>/hat/build \
     ExampleHAT.java
```

If you run with `HAT=INFO` you can see which accelerator was used:

```bash
$ HAT=INFO java --enable-preview ... ExampleHAT.java

[INFO] Config Bits = 8000
[INFO] Platform :"Apple"
[INFO] Version :"OpenCL 1.2 (Jan 16 2026 07:22:26)"
[INFO] Name :"Apple"
[INFO] Device Type : GPU 4
[INFO] OpenCLBackend::OpenCLQueue::dispatch
[INFO] numDimensions: 1
[INFO] GLOBAL [4096,1,1]
[INFO] LOCAL [ nullptr ] // The driver will setup a default value

Result is correct
```

## Documentation

Visit the [docs](docs/) folder.

## Contributing

Contributions are welcome. Please see the [OpenJDK Developers' Guide](https://openjdk.org/guide/).

## Development Workflow

1. Fork the repository
2. Create a feature branch: `git checkout -b <branch>`
3. Commit with clear messages
4. Run formatting and tests:
   1. For OpenCL: `java @.test-suite ffi-opencl`
   2. For CUDA: `java @.test-suite ffi-cuda`
5. Submit a pull request

## Contacts/Questions

You can interact, provide feedback, and ask questions using the [babylon-dev](https://mail.openjdk.org/pipermail/babylon-dev/) mailing list.