# Interface Mapping

----

* [Contents](hat-00.md)
* House Keeping
    * [Project Layout](hat-01-01-project-layout.md)
    * [Building Babylon](hat-01-02-building-babylon.md)
    * [Maven and CMake](hat-01-03-maven-cmake.md)
* Programming Model
    * [Programming Model](hat-03-programming-model.md)
* Interface Mapping
    * [Interface Mapping Overview](hat-04-01-interface-mapping.md)
    * [Cascade Interface Mapping](hat-04-02-cascade-interface-mapping.md)
* Implementation Detail
    * [Walkthrough Of Accelerator.compute()](hat-accelerator-compute.md)

----
# Interface Mapping

## or ... HAT from a Data POV

### Or ... what is this `S32Array` thing and why can't I just pass `int[]` to my kernel

Again, here is the canonical HAT 'hello world' kernel; we will use it to describe interface mapping.

We implement this in HAT by collecting the kernel(s) and compute method(s) in a `Compute` class.

```java
public class SquareCompute {
    @CodeReflection
    public static int square(int v) {
        return v * v;
    }

    @CodeReflection
    public static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);     // arr[kc.x]
        s32Array.array(kc.x, square(value));  // arr[kc.x] = value * value
    }

    @CodeReflection
    public static void square(ComputeContext cc, S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(),
                kc -> squareKernel(kc, s32Array)
        );
    }
}
```
Which we dispatch by creating the appropriate data buffer and then asking an `Accelerator` (bound to a typical vendor backend) to execute the compute method, which in turn coordinates the dispatch of the various kernels.

```java
  // Create an accelerator bound to a particular backend
  var accelerator = new Accelerator(
      java.lang.invoke.MethodHandles.lookup(),
      Backend.FIRST  // Predicate<Backend>
  );

  // Ask the accelerator/backend to allocate an S32Array
  var s32Array = S32Array.create(accelerator, 32);

  // Fill it with data
  for (int i = 0; i < s32Array.length(); i++) {
      s32Array.array(i, i);
  }

  // Tell the accelerator to execute the square() compute entrypoint
  accelerator.compute(
      cc -> SquareCompute.square(cc, s32Array)
  );

  // Check the data
  for (int i = 0; i < s32Array.length(); i++) {
      System.out.println(i + " " + s32Array.array(i));
  }
```

HAT kernels only accept Java primitives and HAT buffers as parameters.

We don't directly support heap-allocated data (such as `int[]`).

From Java's point of view `S32Array` is a `hat.Buffer` and is defined as an interface.

```java
public interface S32Array extends Buffer {
    int length();
    void length(int i);
    int array(long idx);
    void array(long idx, int i);
}
```

From a C99-style OpenCL/CUDA point of view this will eventually be mapped to a typedef.

```C++
typedef struct S32Array_s {
    int length;
    int array[];  // <-- a flexible array member; the real size is only known at runtime
} S32Array_t;
```

Our Java implementations should treat the interface as `data`; generally the only methods that we include in a `hat.Buffer` should be

```java
T name();                    // getter for a field called name with type T, where T may be primitive or an inner interface
void name(T name);           // setter for a field called name with type T, where T must be primitive
T name(long idx);            // get element [idx] of an array field called name, where T is primitive or an inner interface
void name(long idx, T name); // set element [idx] of an array field called name, where T is primitive
```
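
As a purely hypothetical illustration (the `F32Vec` name below is invented, not a HAT type), an interface that sticks to exactly these four accessor shapes might look like this:

```java
// Hypothetical illustration of the four accessor shapes; F32Vec is an
// invented name, not part of HAT.
public interface F32Vec extends Buffer {
    int length();                       // T name()            - scalar getter
    void length(int length);            // void name(T)        - scalar setter
    float value(long idx);              // T name(long idx)    - array element getter
    void value(long idx, float value);  // void name(long, T)  - array element setter
}
```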

Algorithms can assume that an interface is 'bound' to 'some' concrete data layout.

We could, for example, implement `S32Array` like this.

```java
class JavaS32Array implements S32Array {
    int[] arr;
    public int length() { return arr.length; }
    public int array(long idx) { return arr[(int) idx]; }
    public void array(long idx, int value) { arr[(int) idx] = value; }
    public void length(int len) { } // we'll come back to this ;)
}
```
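
Because algorithms are written purely against the interface, a (hypothetical) helper like the one below works regardless of which concrete binding sits underneath, whether the `int[]`-backed sketch above or the segment-backed one that follows:

```java
// Hypothetical helper, not part of HAT: it only touches the S32Array
// interface, so it is oblivious to whether the data lives in an int[]
// or in a MemorySegment.
static int sum(S32Array s32Array) {
    int total = 0;
    for (int i = 0; i < s32Array.length(); i++) {
        total += s32Array.array(i);
    }
    return total;
}
```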

But for HAT to access native memory allocated by the appropriate backend, we need interfaces bound to MemorySegments.

HAT includes an API which allows us to take an interface which extends `hat.Buffer` and 'bind' it to a Panama FFM MemorySegment.

This binding process automatically maps the accessors (for example `length()`, `array(long idx, int v)`) to the low-level MethodHandle and VarHandle trickery underlying MemorySegments.

Conceptually we might imagine that HAT creates something like this.

```java
class PanamaS32Array implements S32Array {
    MemorySegment segment;
    static final long SIZEOFINT = 4;
    final long lenOffset = 0;
    final long arrayOffset = lenOffset + SIZEOFINT;
    public int length() { return segment.get(JAVA_INT, lenOffset); }
    public int array(long idx) { return segment.get(JAVA_INT, arrayOffset + idx * SIZEOFINT); }
    public void array(long idx, int value) { segment.set(JAVA_INT, arrayOffset + idx * SIZEOFINT, value); }
    public void length(int len) { } // we'll come back to this ;)
}
```


Much like Java's `Proxy` class, the iface mapper creates an implementation of the interface 'on the fly': the new ClassFile API is used to 'spin up' the new class, and the accessors are composed using Var/Method Handles and offsets derived from the size and order of fields.

Sadly an interface alone is not quite enough to establish exactly what is needed to complete the mapping. We need to tell the 'iface mapper' the order and size of fields, and possibly some padding information.

We do this by providing a 'layout description' using Panama's Layout API.

```java
MemoryLayout s32ArrayLayout = MemoryLayout.structLayout(
        JAVA_INT.withName("length"),
        MemoryLayout.sequenceLayout(N, JAVA_INT).withName("array")
).withName(S32Array.class.getSimpleName());
```
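
To make the 'binding' concrete, here is a small stand-alone sketch using only plain Panama FFM (no HAT APIs): it derives field offsets from the layout above and uses them to read and write a freshly allocated `MemorySegment`. This is roughly the bookkeeping the iface mapper automates for us; the class name, `N`, and the element index are invented for the example.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;

import static java.lang.foreign.MemoryLayout.PathElement.groupElement;
import static java.lang.foreign.MemoryLayout.PathElement.sequenceElement;
import static java.lang.foreign.ValueLayout.JAVA_INT;

public class S32ArrayLayoutSketch {
    public static void main(String[] args) {
        int N = 32;
        MemoryLayout s32ArrayLayout = MemoryLayout.structLayout(
                JAVA_INT.withName("length"),
                MemoryLayout.sequenceLayout(N, JAVA_INT).withName("array")
        ).withName("S32Array");

        // Offsets the iface mapper would derive from the layout
        long lengthOffset = s32ArrayLayout.byteOffset(groupElement("length"));
        long arrayOffset = s32ArrayLayout.byteOffset(groupElement("array"), sequenceElement(0));

        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(s32ArrayLayout);

            // Equivalent to s32Array.length(32)
            segment.set(JAVA_INT, lengthOffset, N);

            // Equivalent to s32Array.array(5, 42)
            segment.set(JAVA_INT, arrayOffset + 5 * JAVA_INT.byteSize(), 42);

            // Equivalent to s32Array.array(5)
            int value = segment.get(JAVA_INT, arrayOffset + 5 * JAVA_INT.byteSize());
            System.out.println(value); // 42
        }
    }
}
```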

Eventually we came to a common pattern for describing HAT buffers: we add a `create` method to the interface which hides the mapping detail.

So the complete `S32Array` looks like this. (....ish)

```java
public interface S32Array extends Buffer {
    int length();

    void length(int i);

    int array(long idx);

    void array(long idx, int i);

    static S32Array create(Accelerator a, int len) {
        MemoryLayout s32ArrayLayout = MemoryLayout.structLayout(
                JAVA_INT.withName("length"),
                MemoryLayout.sequenceLayout(len, JAVA_INT).withName("array")
        ).withName(S32Array.class.getSimpleName());

        S32Array s32Array = a.allocate(
                SegmentMapper.of(MethodHandles.lookup(), S32Array.class, s32ArrayLayout, len)
        );

        return s32Array;
    }
}
```

So now hopefully this code makes more sense.

```java
var s32Array = S32Array.create(accelerator, 32);
```

Whilst this code is much nicer than hand mapping each method to offsets, it is still quite verbose.

In the last few weeks we have been migrating to a Schema builder which makes this code easier to express.

```java
public interface S32Array extends Buffer {
    int length();
    void length(int i);
    int array(long idx);
    void array(long idx, int i);
    Schema<S32Array> schema = Schema.of(S32Array.class, s->s
        .arrayLen("length")
        .array("array")
    );
}
```
The schema is embedded inside the interface and defines the order of fields. It also allows us to bind fields to each other: above we are telling the schema that we have an `int length` field followed by an `int array[]` field, and that the first defines the size of the second. We can also describe useful 'HAT' information for fields, such as whether a field is 'atomic' ;)

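For reference, the schema above describes essentially the same shape as the hand-written layout shown earlier; roughly this sketch, where `len` stands for the runtime array length:

```java
// A sketch of the layout implied by the schema above, for a given len;
// it is the same shape as the hand-written s32ArrayLayout from earlier.
MemoryLayout s32ArrayLayout = MemoryLayout.structLayout(
        JAVA_INT.withName("length"),                                  // .arrayLen("length")
        MemoryLayout.sequenceLayout(len, JAVA_INT).withName("array")  // .array("array")
).withName("S32Array");
```
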
Here is an example of a table of Results for the face detector.

```java
public interface ResultTable extends Buffer {
    interface Result extends Buffer.StructChild {
        float x();
        void x(float x);
        float y();
        void y(float y);
    }
    void count(int count);
    int count();
    int length();
    Result result(long idx);

    Schema<ResultTable> schema = Schema.of(ResultTable.class, s->s
            .atomic("count")
            .arrayLen("length")
            .array("result", r->r
               .field("x")
               .field("y")
            )
    );
}
```

Which in C99 OpenCL code will manifest as

```C++
typedef struct Result_s {
    float x, y;
} Result_t;

typedef struct ResultTable_s {
    int count;
    int length;
    Result_t result[0];
} ResultTable_t;
```
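
Again for reference, the layout implied by the `ResultTable` schema is roughly the following sketch (plain FFM only; `n` is a hypothetical row count, and `JAVA_FLOAT`/`JAVA_INT` come from `java.lang.foreign.ValueLayout`), matching the C struct above:

```java
// A sketch of the layout implied by the ResultTable schema, for a
// hypothetical row count n; each Result row is a nested struct element.
MemoryLayout resultTableLayout = MemoryLayout.structLayout(
        JAVA_INT.withName("count"),
        JAVA_INT.withName("length"),
        MemoryLayout.sequenceLayout(n, MemoryLayout.structLayout(
                JAVA_FLOAT.withName("x"),
                JAVA_FLOAT.withName("y")
        ).withName("Result")).withName("result")
).withName("ResultTable");
```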

In our Java code this interface makes access to MemorySegments much cleaner.

```java
    ResultTable resultTable = ResultTable.create(acc, 100);
    for (int i = 0; i < resultTable.length(); i++) {
        Result result = resultTable.result(i);
        result.x(0);
        result.y(0);
    }
```

The OpenCL/C99 code generated from the Java kernel code is also quite clean.

We might use a kernel to initialize the location of a bunch of Results.

```java
    @CodeReflection public static void init(KernelContext kc, ResultTable resultTable) {
        if (kc.x < kc.maxX) {
           Result result = resultTable.result(kc.x);
           result.x(kc.x);
           result.y(100);
        }
    }
```

Whose kernel code will look like this.

```
typedef struct KernelContext_s {
    int x;
    int maxX;
} KernelContext_t;

typedef struct Result_s {
    float x, y;
} Result_t;

typedef struct ResultTable_s {
    int count;
    int length;
    Result_t result[0];
} ResultTable_t;

__kernel void init(
    __global KernelContext_t *empty,
    __global ResultTable_t *resultTable
){
    KernelContext_t kernelContext;
    KernelContext_t *kc = &kernelContext;
    kc->x = get_global_id(0);
    kc->maxX = get_global_size(0);

    if (kc->x < kc->maxX) {
        __global Result_t *result = &resultTable->result[kc->x];
        result->x = kc->x;
        result->y = 100;
    }
    return;
}
```

A few notes from this generated code...

* `KernelContext` is itself just an iface mapped segment.
    -  But we don't pass `kc.x` or `kc.maxX` in the segment.
        -  Instead we initialize them using the appropriate vendor calls.

So for OpenCL all kernels start like this

```
__kernel void init(__global KernelContext_t *empty , ....){
    KernelContext_t kernelContext;
    KernelContext_t *kc = &kernelContext;
    kc->x = get_global_id(0);
    kc->maxX = get_global_size(0);
     ....
}
```

Whereas CUDA ;)

```
__global__ void init(KernelContext_t *empty , ....){
    KernelContext_t kernelContext;
    KernelContext_t *kc = &kernelContext;
    kc->x = blockIdx.x*blockDim.x+threadIdx.x;
    kc->maxX = gridDim.x*blockDim.x;
    ....
}
```

This simplifies code gen. Generally the CUDA code and the OpenCL code look identical.

----

The iface mapping code in HAT is a modified form of Per Minborg's segment mapping code; we keep a copy of it from

https://github.com/minborg/panama-foreign/blob/segment-mapper/src/java.base/share/classes