# Interface Mapping
[Back to Index ../](../index.md)

## or ... HAT from a Data POV

### Or ... what is this `S32Array` thing and why can't I just pass `int[]` to my kernel

Again, here is the canonical HAT 'hello world' kernel. We'll use it to describe interface mapping.

We implement this in HAT by collecting the kernel(s) and compute method(s) in a `Compute` class.

```java
public class SquareCompute {
    @Reflect
    public static int square(int v) {
        return v * v;
    }

    @Reflect
    public static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);     // arr[kc.x]
        s32Array.array(kc.x, square(value));  // arr[kc.x] = value * value
    }

    @Reflect
    public static void square(ComputeContext cc, S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(),
                kc -> squareKernel(kc, s32Array)
        );
    }
}
```
We dispatch this by creating the appropriate data buffer and then asking an `Accelerator` (bound to a typical vendor backend) to execute the compute method, which in turn coordinates the dispatch of the various kernels.

```java
// Create an accelerator bound to a particular backend
var accelerator = new Accelerator(MethodHandles.lookup(), Backend.FIRST);

// Ask the accelerator/backend to allocate an S32Array
var s32Array = S32Array.create(accelerator, 32);

// Fill it with data
for (int i = 0; i < s32Array.length(); i++) {
    s32Array.array(i, i);
}

// Tell the accelerator to execute the square() compute entrypoint
accelerator.compute(
        cc -> SquareCompute.square(cc, s32Array)
);

// Check the data
for (int i = 0; i < s32Array.length(); i++) {
    System.out.println(i + " " + s32Array.array(i));
}
```
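
As a mental model only, the dispatch can be pictured as running the kernel once for every index in the range. The sketch below is plain Java and does not use HAT at all: `Kc`, `IntData`, `heapData` and this `dispatchKernel` are hypothetical stand-ins invented for illustration, and a real backend would run the iterations in parallel on the device rather than in a loop.

```java
import java.util.function.Consumer;

final class SequentialDispatchModel {
    // Hypothetical stand-in for HAT's KernelContext: just the index and the range.
    static final class Kc {
        final int x;
        final int maxX;
        Kc(int x, int maxX) { this.x = x; this.maxX = maxX; }
    }

    // Hypothetical stand-in for an S32Array-like buffer (heap backed, for illustration only).
    interface IntData {
        int length();
        int array(long idx);
        void array(long idx, int value);
    }

    static IntData heapData(int len) {
        int[] backing = new int[len];
        return new IntData() {
            public int length() { return backing.length; }
            public int array(long idx) { return backing[(int) idx]; }
            public void array(long idx, int value) { backing[(int) idx] = value; }
        };
    }

    // Model of a kernel dispatch: invoke the kernel once for every x in [0, range).
    static void dispatchKernel(int range, Consumer<Kc> kernel) {
        for (int x = 0; x < range; x++) {
            kernel.accept(new Kc(x, range));
        }
    }

    static int square(int v) { return v * v; }

    static void squareKernel(Kc kc, IntData data) {
        data.array(kc.x, square(data.array(kc.x)));
    }

    public static void main(String[] args) {
        IntData data = heapData(8);
        for (int i = 0; i < data.length(); i++) {
            data.array(i, i);
        }
        dispatchKernel(data.length(), kc -> squareKernel(kc, data));
        System.out.println(data.array(7)); // prints 49
    }
}
```

The kernel body only ever touches the element at `kc.x`, which is what makes it safe to run every index independently.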

HAT kernels only accept Java primitives and HAT buffers as parameters.

We don't directly support heap allocated data (such as `int[]`).

From Java's point of view `S32Array` is a `hat.Buffer` and is defined as an interface.

```java
public interface S32Array extends Buffer {
    int length();
    void length(int i);
    int array(long idx);
    void array(long idx, int i);
}
```

From a C99-style OpenCL/CUDA POV this will eventually be mapped to a typedef.

```C++
typedef struct S32Array_s{
    int length;
    int array[];  //<-- ?
}S32Array_t;
```

Our Java implementations should treat the interface as `data`. Generally, the only methods that we include in a `hat.Buffer` should be

```java
T name();                    // getter for a field called name with type T (T may be a primitive or an inner interface)
void name(T name);           // setter for a field called name with type T (T must be a primitive)
T name(long idx);            // get array element [idx] of an array called name (T is either a primitive or an inner interface)
void name(long idx, T name); // set array element [idx] of an array called name (T must be a primitive)
```

Algorithms can assume that an interface is 'bound' to 'some' concrete data layout.

We could for example implement `S32Array` like this.

```java
class JavaS32Array implements S32Array {
    int[] arr;
    public int length() { return arr.length; }
    public int array(long idx) { return arr[(int) idx]; }              // Java arrays are int-indexed
    public void array(long idx, int value) { arr[(int) idx] = value; }
    public void length(int len) { /* we'll come back to this ;) */ }
}
```

But for HAT to access native memory allocated by the appropriate backend, we need interfaces bound to MemorySegments.

HAT includes an API which allows us to take an interface which extends `hat.Buffer`, and 'bind' it to a Panama FFM `MemorySegment`.

This binding process automatically maps the accessors (for example `length()`, `array(long idx, int v)`) to the low level MethodHandle and VarHandle trickery underlying MemorySegments.

Conceptually we might imagine that HAT creates something like this

```java
class PanamaS32Array implements S32Array {
    MemorySegment segment;
    final int SIZEOFINT = 4;
    final long lenOffset = 0;
    final long arrayOffset = lenOffset + SIZEOFINT;
    public int length() { return segment.get(JAVA_INT, lenOffset); }
    public int array(long idx) { return segment.get(JAVA_INT, arrayOffset + idx * SIZEOFINT); }
    public void array(long idx, int value) { segment.set(JAVA_INT, arrayOffset + idx * SIZEOFINT, value); }
    public void length(int len) { /* we'll come back to this ;) */ }
}
```

Much like Java's `Proxy` class, the iface mapper creates an implementation of the interface 'on the fly'. The new ClassFile API is used to 'spin up' the new class, and the accessors are composed using Var/Method Handles and offsets derived from the size and order of fields.
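
To make the `Proxy` analogy concrete, here is a minimal sketch that binds an interface to off-heap bytes using only `java.lang.reflect.Proxy` and a direct `ByteBuffer`. It is not how HAT does it (HAT spins up a class with the ClassFile API and uses Var/Method Handles); `IntArrayView` and `ProxyBinding` are hypothetical names, and the fixed offsets assume the simple `int length; int array[]` shape.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for S32Array; it does not extend hat.Buffer.
interface IntArrayView {
    int length();
    int array(long idx);
    void array(long idx, int value);
}

final class ProxyBinding {
    static final int LENGTH_OFFSET = 0;
    static final int ARRAY_OFFSET = Integer.BYTES;

    // Allocates [int length][int array[len]] off heap and binds the interface to it.
    static IntArrayView bind(int len) {
        ByteBuffer buf = ByteBuffer.allocateDirect(Integer.BYTES * (1 + len))
                .order(ByteOrder.nativeOrder());
        buf.putInt(LENGTH_OFFSET, len);
        InvocationHandler handler = (proxy, method, args) -> {
            int argCount = (args == null) ? 0 : args.length;
            switch (method.getName()) {
                case "length":
                    return buf.getInt(LENGTH_OFFSET);
                case "array": {
                    int offset = ARRAY_OFFSET + Math.toIntExact((Long) args[0]) * Integer.BYTES;
                    if (argCount == 1) {
                        return buf.getInt(offset);          // int array(long idx)
                    }
                    buf.putInt(offset, (Integer) args[1]);  // void array(long idx, int value)
                    return null;
                }
                default:
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (IntArrayView) Proxy.newProxyInstance(
                IntArrayView.class.getClassLoader(),
                new Class<?>[]{IntArrayView.class},
                handler);
    }

    public static void main(String[] args) {
        IntArrayView view = bind(4);
        for (int i = 0; i < view.length(); i++) {
            view.array(i, i * i);
        }
        System.out.println(view.array(3)); // prints 9
    }
}
```

The caller only ever sees the interface; where the bytes live and how offsets are computed stays hidden behind the generated implementation.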

Sadly an interface is not quite enough to establish exactly what is needed to complete the mapping. We need to tell the `iface mapper` the order and size of fields and possibly some padding information.

We do this by providing a 'layout description' using Panama's Layout API.

```java
MemoryLayout s32ArrayLayout = MemoryLayout.structLayout(
        JAVA_INT.withName("length"),
        MemoryLayout.sequenceLayout(N, JAVA_INT).withName("array")
).withName(S32Array.class.getSimpleName());
```
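
As a rough illustration of why the layout is enough to derive every accessor, the sketch below asks a layout of the same shape for its size and field offsets and then reads and writes through them. It assumes a JDK where the FFM API is final (Java 22 or later); `LayoutOffsets`, `layoutFor` and `elementOffset` are hypothetical helper names, not part of HAT.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class LayoutOffsets {
    // Same shape as the S32Array layout: int length; int array[n];
    static MemoryLayout layoutFor(int n) {
        return MemoryLayout.structLayout(
                ValueLayout.JAVA_INT.withName("length"),
                MemoryLayout.sequenceLayout(n, ValueLayout.JAVA_INT).withName("array")
        ).withName("S32Array");
    }

    // Byte offset of array[idx] inside the struct.
    static long elementOffset(MemoryLayout layout, long idx) {
        return layout.byteOffset(
                PathElement.groupElement("array"),
                PathElement.sequenceElement(idx));
    }

    public static void main(String[] args) {
        MemoryLayout layout = layoutFor(8);
        long lengthOffset = layout.byteOffset(PathElement.groupElement("length"));
        long elem3Offset = elementOffset(layout, 3);
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(layout);
            segment.set(ValueLayout.JAVA_INT, lengthOffset, 8);
            segment.set(ValueLayout.JAVA_INT, elem3Offset, 42);
            // 36 bytes in total, length at offset 0, array[3] at offset 16
            System.out.println(layout.byteSize() + " " + lengthOffset + " " + elem3Offset);
            System.out.println(segment.get(ValueLayout.JAVA_INT, elem3Offset)); // prints 42
        }
    }
}
```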

Eventually we arrived at a common pattern for describing HAT buffers: add a `create` method to the interface which hides the mapping detail.

So the complete `S32Array` looks a bit like this. (....ish)

```java
public interface S32Array extends Buffer {
    int length();

    void length(int i);

    int array(long idx);

    void array(long idx, int i);

    // A method with a body in an interface must be static (or default)
    static S32Array create(Accelerator a, int len) {
        MemoryLayout s32ArrayLayout = MemoryLayout.structLayout(
                JAVA_INT.withName("length"),
                MemoryLayout.sequenceLayout(len, JAVA_INT).withName("array")
        ).withName(S32Array.class.getSimpleName());

        S32Array s32Array = a.allocate(
                SegmentMapper.of(MethodHandles.lookup(), S32Array.class, s32ArrayLayout, len)
        );

        return s32Array;
    }
}
```

So now hopefully this code makes more sense.

```java
var s32Array = S32Array.create(accelerator, 32);
```

Whilst this code is much nicer than hand mapping each method to offsets, it is still quite verbose.

In the last few weeks we have been migrating to a `Schema` builder which makes this code easier to express.

```java
public interface S32Array extends Buffer {
    int length();
    void length(int i);
    int array(long idx);
    void array(long idx, int i);
    Schema<S32Array> schema = Schema.of(S32Array.class, s -> s
            .arrayLen("length")
            .array("array")
    );
}
```
The schema is embedded inside the interface and defines the order of fields. It also allows us to bind fields to each other: above we are telling the schema that we have an `int length` field followed by an `int array[]` field, and that the first defines the size of the second. We can also describe useful 'HAT' information for fields, such as whether a field is 'atomic' ;)
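
To show how a fluent description like this can determine offsets and total size, here is a toy builder. `MiniSchema` is a hypothetical and much simplified stand-in for HAT's `Schema` (int fields only, no nesting, no atomics); it only demonstrates that field order plus the length-to-array binding is enough to size the buffer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for HAT's Schema builder (int fields only).
final class MiniSchema {
    private final Map<String, Long> offsets = new LinkedHashMap<>();
    private String lengthField;  // field whose value sizes the trailing array
    private String arrayField;   // trailing int array bound to lengthField
    private long fixedSize;      // bytes used by the fixed-size fields so far

    MiniSchema field(String name) {
        offsets.put(name, fixedSize);
        fixedSize += Integer.BYTES;
        return this;
    }

    MiniSchema arrayLen(String name) {
        lengthField = name;
        return field(name);
    }

    MiniSchema array(String name) {
        if (lengthField == null) {
            throw new IllegalStateException("array needs a preceding arrayLen");
        }
        arrayField = name;
        offsets.put(name, fixedSize); // the array starts right after the fixed fields
        return this;
    }

    long offsetOf(String name) {
        return offsets.get(name);
    }

    // Total bytes needed once the bound length is known.
    long byteSize(int length) {
        return fixedSize + (arrayField == null ? 0 : (long) length * Integer.BYTES);
    }

    public static void main(String[] args) {
        MiniSchema s = new MiniSchema().arrayLen("length").array("array");
        System.out.println(s.offsetOf("array") + " " + s.byteSize(32)); // prints 4 132
    }
}
```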

Here is an example of a table of Results for the face detector.

```java
public interface ResultTable extends Buffer {
    interface Result extends Buffer.StructChild {
        float x();
        void x(float x);
        float y();
        void y(float y);
    }
    void count(int count);
    int count();
    int length();
    Result result(long idx);

    Schema<ResultTable> schema = Schema.of(ResultTable.class, s -> s
            .atomic("count")
            .arrayLen("length")
            .array("result", r -> r
                    .field("x")
                    .field("y")
            )
    );
}
```

Which in C99 OpenCL code will manifest as

```C++
typedef struct Result_s{
    float x, y;
} Result_t;

typedef struct ResultTable_s{
    int count;
    int length;
    Result_t result[0];
} ResultTable_t;
```

In our Java code this interface makes access to MemorySegments much cleaner

```java
ResultTable resultTable = ResultTable.create(acc, 100);
for (int i = 0; i < resultTable.length(); i++) {
    Result result = resultTable.result(i);
    result.x(0);
    result.y(0);
}
```

The OpenCL/C99 code generated from Java kernel code is also quite clean.

We might use a kernel to initialize the location of a bunch of Results

```java
@Reflect
public static void init(KernelContext kc, ResultTable resultTable) {
    if (kc.x < kc.maxX) {
        Result result = resultTable.result(kc.x);
        result.x(kc.x);
        result.y(100);
    }
}
```

The generated kernel code will look like this.

```
typedef struct KernelContext_s{
    int x;
    int maxX;
}KernelContext_t;

typedef struct Result_s{
    float x, y;
} Result_t;

typedef struct ResultTable_s{
    int count;
    int length;
    Result_t result[0];
} ResultTable_t;

__kernel void init(
    __global KernelContext_t *empty,
    __global ResultTable_t* resultTable
){
    KernelContext_t kernelContext;
    KernelContext_t *kc = &kernelContext;
    kc->x = get_global_id(0);
    kc->maxX = get_global_size(0);

    if(kc->x < kc->maxX){
        __global Result_t *result = &resultTable->result[kc->x];
        result->x = kc->x;
        result->y = 100;
    }
    return;
}
```

A few notes from this generated code...

* `KernelContext` is itself just an iface mapped segment.
    - But we don't pass `kc.x` or `kc.maxX` in the segment.
    - Instead we initialize them using the appropriate vendor calls.

So for OpenCL all kernels start like this

```
__kernel void init(__global KernelContext_t *empty , ....){
    KernelContext_t kernelContext;
    KernelContext_t *kc = &kernelContext;
    kc->x = get_global_id(0);
    kc->maxX = get_global_size(0);
    ....
}
```

Whereas CUDA ;)

```
__kernel void init(__global KernelContext_t *empty , ....){
    KernelContext_t kernelContext;
    KernelContext_t *kc = &kernelContext;
    kc->x = blockIdx.x*blockDim.x+threadIdx.x;
    kc->maxX = gridDim.x*blockDim.x;
    ....
}
```

This simplifies code gen. Generally the CUDA code and OpenCL code look identical.
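
As a quick sanity check of the CUDA arithmetic, the plain Java sketch below enumerates every (block, thread) pair and confirms that `blockIdx.x*blockDim.x+threadIdx.x` visits each index in `[0, gridDim.x*blockDim.x)` exactly once, which is the same range an OpenCL work-item's global id covers. `GlobalIndex` and its method names are hypothetical, chosen only for this illustration.

```java
final class GlobalIndex {
    // CUDA-style global index, as in kc->x = blockIdx.x * blockDim.x + threadIdx.x
    static int globalX(int blockIdx, int blockDim, int threadIdx) {
        return blockIdx * blockDim + threadIdx;
    }

    // CUDA-style global size, as in kc->maxX = gridDim.x * blockDim.x
    static int maxX(int gridDim, int blockDim) {
        return gridDim * blockDim;
    }

    // Counts how many distinct x values a gridDim x blockDim launch produces.
    static int distinctIndices(int gridDim, int blockDim) {
        boolean[] seen = new boolean[maxX(gridDim, blockDim)];
        int distinct = 0;
        for (int b = 0; b < gridDim; b++) {
            for (int t = 0; t < blockDim; t++) {
                int x = globalX(b, blockDim, t);
                if (!seen[x]) {
                    seen[x] = true;
                    distinct++;
                }
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        System.out.println(maxX(4, 8) + " " + distinctIndices(4, 8)); // prints 32 32
    }
}
```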

----

The iface mapping code in HAT is a modified copy of Per's segment mapping code from

https://github.com/minborg/panama-foreign/blob/segment-mapper/src/java.base/share/classes