# HAT's Programming Model
----

* [Contents](hat-00.md)
* House Keeping
    * [Project Layout](hat-01-01-project-layout.md)
    * [Building Babylon](hat-01-02-building-babylon.md)
    * [Building HAT](hat-01-03-building-hat.md)
* Programming Model
    * [Programming Model](hat-03-programming-model.md)
* Interface Mapping
    * [Interface Mapping Overview](hat-04-01-interface-mapping.md)
    * [Cascade Interface Mapping](hat-04-02-cascade-interface-mapping.md)
* Implementation Detail
    * [Walkthrough Of Accelerator.compute()](hat-accelerator-compute.md)
    * [How we minimize buffer transfers](hat-minimizing-buffer-transfers.md)

---

# HAT's Programming Model

Let's consider a trivial OpenCL kernel that squares each element of an int buffer:

```c
// Helper callable from the kernel
int square(int value){
    return value*value;
}

// Each work item squares the element at its own global index
__kernel void squareKernel( __global int* s32Array){
    int value = s32Array[get_global_id(0)];
    s32Array[get_global_id(0)]=square(value);
    return;
}
```

We implement this in HAT by collecting the kernel(s) and compute method(s) in a `Compute` class.

```java
public class SquareCompute {
    // Kernel reachable helper
    @CodeReflection
    public static int square(int v) {
        return v * v;
    }

    // Kernel entrypoint: reads and writes the element selected by kc.x
    @CodeReflection
    public static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);     // arr[kc.x]
        s32Array.array(kc.x, square(value));  // arr[kc.x]=value*value
    }

    // Compute entrypoint: dispatches the kernel over the whole array
    @CodeReflection
    public static void square(ComputeContext cc, S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(),
                kc -> squareKernel(kc, s32Array)
        );
    }
}
```
We dispatch by creating the appropriate data buffer and then asking an `Accelerator` (bound to a typical vendor backend) to execute the compute method, which in turn coordinates the dispatch of the various kernels.

```java
// Create an accelerator bound to a particular backend
var accelerator = new Accelerator(
    java.lang.invoke.MethodHandles.lookup(),
    Backend.FIRST  // Predicate<Backend>
);

// Ask the accelerator/backend to allocate an S32Array
var s32Array = S32Array.create(accelerator, 32);

// Fill it with data
for (int i = 0; i < s32Array.length(); i++) {
    s32Array.array(i, i);
}

// Tell the accelerator to execute the square() compute entrypoint
accelerator.compute(
    cc -> SquareCompute.square(cc, s32Array)
);

// Check the data
for (int i = 0; i < s32Array.length(); i++) {
    System.out.println(i + " " + s32Array.array(i));
}
```

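As a mental model only, the effect of a 1D `dispatchKernel` can be sketched in plain Java as one kernel invocation per index in the range. The `KernelContext`, `ComputeContext`, and `S32Array` types below are hypothetical stand-ins written for this sketch, not HAT's real classes. A real backend is free to translate the kernel and run the invocations in parallel on a device, so none of this should be read as HAT's implementation.

```java
import java.util.function.Consumer;

// Sequential mental model of a 1D kernel dispatch.
// Every type here is a hypothetical stand-in, NOT one of HAT's real classes.
class SquareDispatchSketch {
    // Stand-in thread identity: x is this invocation's index, maxX the range size.
    static final class KernelContext {
        final int x;
        final int maxX;
        KernelContext(int x, int maxX) { this.x = x; this.maxX = maxX; }
    }

    // Stand-in int buffer mirroring the accessor/mutator shape used in the text.
    static final class S32Array {
        private final int[] data;
        S32Array(int length) { data = new int[length]; }
        int length() { return data.length; }
        int array(int i) { return data[i]; }
        void array(int i, int v) { data[i] = v; }
    }

    // Stand-in compute context: runs the kernel once per index, in order.
    static final class ComputeContext {
        void dispatchKernel(int range, Consumer<KernelContext> kernel) {
            for (int x = 0; x < range; x++) {
                kernel.accept(new KernelContext(x, range));
            }
        }
    }

    static int square(int v) { return v * v; }

    static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);
        s32Array.array(kc.x, square(value));
    }

    static void square(ComputeContext cc, S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(), kc -> squareKernel(kc, s32Array));
    }

    // Builds [0, 1, ..., n-1], runs the compute method, and returns the result.
    static int[] run(int n) {
        S32Array s32Array = new S32Array(n);
        for (int i = 0; i < n; i++) {
            s32Array.array(i, i);
        }
        square(new ComputeContext(), s32Array);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = s32Array.array(i);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] out = run(4);
        for (int i = 0; i < out.length; i++) {
            System.out.println(i + " " + out[i]);
        }
    }
}
```

Because each invocation touches only the element at its own `kc.x`, the result does not depend on the order in which invocations run, which is what lets a backend execute them concurrently.
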
## Programming model notes

The most important concept here is that we separate `normal java` code
from `compute` code and from `kernel` code.

We must not assume that compute or kernel code is ever executed by the JVM.

### Kernel Code (kernel entrypoints and kernel reachable methods)
Kernels and any kernel reachable methods are naturally restricted to a subset of Java.

* No exceptions (no exceptions! :) )
* No heap access (no `new`)
* No access to static or instance fields from this or any other class
    * Except `final static primitives` (which generally get constant pooled)
    * Except fields of `KernelContext` (thread identity `.x`, `.maxX`, `.groups`... )
        - We may even decide to access these via methods (`.x()`)
* The only methods that can be called are:
   * Kernel reachable methods
      - Technically you can call a kernel entrypoint, but you must pass your `KernelContext`
   * `ifaceMappedSegment` accessor/mutators (see later)
   * Calls on `KernelContext` (backend kernel features)
     - `KernelContext.barrier()`
     - `kernelContext.I32.hypot(x,y)`

#### Kernel Entrypoints
* Declared `@CodeReflection static public void`
    * Later we may allow reductions to return data...
* Parameters
    * 0 is always a `KernelContext` (`KernelContext2D`, `KernelContext3D` logically follow)
    * 1..n are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`

#### Kernel Reachable Methods
* Declared `@CodeReflection static public`
* All parameters are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`

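To make the kernel restrictions concrete, here is a sketch of a kernel entrypoint and a kernel reachable helper written in the restricted style: `static` methods, primitive parameters, a `final static` primitive constant, no `new`, and out-of-range input handled by clamping rather than by throwing. The `KernelContext` and `S32Array` types are hypothetical stand-ins defined only so the sketch compiles on a plain JDK (in HAT both kernel methods would also carry `@CodeReflection`), and the loop in `run` merely imitates a dispatch.

```java
// Kernel-style code written within the restrictions listed above.
// KernelContext and S32Array are hypothetical stand-ins, NOT HAT's real classes.
class ClampKernelSketch {
    static final class KernelContext {
        final int x;     // index of this invocation
        final int maxX;  // size of the dispatched range
        KernelContext(int x, int maxX) { this.x = x; this.maxX = maxX; }
    }

    interface S32Array {
        int length();
        int array(int i);          // accessor
        void array(int i, int v);  // mutator
    }

    // Allowed in kernel code: a final static primitive, usable as a constant.
    static final int LIMIT = 100;

    // Kernel reachable helper: static, primitives in and out, no exceptions.
    static int clampToLimit(int v) {
        return v > LIMIT ? LIMIT : v;
    }

    // Kernel entrypoint shape: static void, parameter 0 is the KernelContext.
    static void clampKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);
        s32Array.array(kc.x, clampToLimit(value));
    }

    // Host-side driver for the sketch only; this part is ordinary Java
    // and is not bound by the kernel restrictions.
    static int[] run(int[] input) {
        int[] data = input.clone();
        S32Array buffer = new S32Array() {
            public int length() { return data.length; }
            public int array(int i) { return data[i]; }
            public void array(int i, int v) { data[i] = v; }
        };
        for (int x = 0; x < buffer.length(); x++) {
            clampKernel(new KernelContext(x, buffer.length()), buffer);
        }
        return data;
    }

    public static void main(String[] args) {
        for (int v : run(new int[] {5, 250, 100, 101})) {
            System.out.println(v);
        }
    }
}
```
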
### Compute Code (Compute entry points and compute reachable methods)
Code within the `compute entrypoint` and `compute reachable
methods` has far fewer Java restrictions than kernel code, but generally:

* Exceptions are discouraged
* Java synchronization is discouraged
* Don't assume that local `ifaceMappedSegment` allocations are actually performed
* Java accesses/mutations to an `ifaceMappedSegment` will likely impact performance
* Code should ideally just contain simple control flow and kernel dispatches.
* Data movements (to and from the backend) will automatically be derived from control flow and `ifaceMappedSegment` accesses
   - We hope to never have to add `cc.moveToDevice(hatBuffer)`
* All methods reachable from a `compute entrypoint` are either:
  * Compute reachable methods
      - Technically methods can be both compute reachable and kernel reachable.
  * `ifaceMappedSegment` accessor/mutators (see later)
  * Calls on the `ComputeContext` to generate ranges or dispatch kernels.

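As an illustration of compute code that contains only simple control flow and kernel dispatches, the sketch below shapes a compute method as a plain loop around a dispatch, with element accesses confined to the kernel. `ComputeContext`, `KernelContext`, and `S32Array` are hypothetical stand-ins for this sketch rather than HAT's real classes, and the `passes` parameter is an invented argument used only to drive the loop.

```java
import java.util.function.Consumer;

// Compute-style code: only control flow plus kernel dispatches.
// All types are hypothetical stand-ins, NOT HAT's real classes.
class RepeatedSquareSketch {
    static final class KernelContext {
        final int x;  // index of this invocation
        KernelContext(int x) { this.x = x; }
    }

    static final class S32Array {
        private final int[] data;
        S32Array(int[] initial) { data = initial.clone(); }
        int length() { return data.length; }
        int array(int i) { return data[i]; }
        void array(int i, int v) { data[i] = v; }
    }

    // Stand-in compute context: runs the kernel once per index, in order.
    static final class ComputeContext {
        void dispatchKernel(int range, Consumer<KernelContext> kernel) {
            for (int x = 0; x < range; x++) {
                kernel.accept(new KernelContext(x));
            }
        }
    }

    static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);
        s32Array.array(kc.x, value * value);
    }

    // Compute entrypoint shape: static void, parameter 0 is the ComputeContext.
    // Element accesses happen only inside the kernel, never in this method.
    static void squareRepeatedly(ComputeContext cc, S32Array s32Array, int passes) {
        for (int pass = 0; pass < passes; pass++) {
            cc.dispatchKernel(s32Array.length(), kc -> squareKernel(kc, s32Array));
        }
    }

    // Ordinary Java: set up the buffer, run the compute method, copy results out.
    static int[] run(int[] input, int passes) {
        S32Array s32Array = new S32Array(input);
        squareRepeatedly(new ComputeContext(), s32Array, passes);
        int[] out = new int[s32Array.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = s32Array.array(i);
        }
        return out;
    }

    public static void main(String[] args) {
        for (int v : run(new int[] {1, 2, 3}, 2)) {
            System.out.println(v);
        }
    }
}
```

Keeping host-side element accesses out of the loop is the point of the shape: since data movement is derived from control flow and buffer accesses, a loop like this may give a backend the chance to leave the buffer on the device between passes.
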
#### Compute Entry Points
* Declared `@CodeReflection static public void`
* Parameter 0 is `ComputeContext`

#### Compute Reachable Methods
* Declared `@CodeReflection static public`