# HAT's Programming Model

----
* [Contents](hat-00.md)
* Build Babylon and HAT
  * [Quick Install](hat-01-quick-install.md)
  * [Building Babylon with jtreg](hat-01-02-building-babylon.md)
  * [Building HAT with jtreg](hat-01-03-building-hat.md)
  * [Enabling the NVIDIA CUDA Backend](hat-01-05-building-hat-for-cuda.md)
* [Testing Framework](hat-02-testing-framework.md)
* [Running Examples](hat-03-examples.md)
* [HAT Programming Model](hat-03-programming-model.md)
* Interface Mapping
  * [Interface Mapping Overview](hat-04-01-interface-mapping.md)
  * [Cascade Interface Mapping](hat-04-02-cascade-interface-mapping.md)
* Development
  * [Project Layout](hat-01-01-project-layout.md)
  * [IntelliJ Code Formatter](hat-development.md)
* Implementation Details
  * [Walkthrough Of Accelerator.compute()](hat-accelerator-compute.md)
  * [How we minimize buffer transfers](hat-minimizing-buffer-transfers.md)
  * [Running HAT with Docker on NVIDIA GPUs](hat-07-docker-build-nvidia.md)
---

# HAT's Programming Model

Let's consider a trivial OpenCL kernel which squares each element in an int buffer:

```c
int square(int value) {
    return value * value;
}

__kernel void squareKernel(__global int *s32Array) {
    int value = s32Array[get_global_id(0)];
    s32Array[get_global_id(0)] = square(value);
    return;
}
```

We implement this in HAT by collecting the kernel(s) and compute method(s) in a `Compute` class.

```java
public class SquareCompute {
    @Reflect
    public static int square(int v) {
        return v * v;
    }

    @Reflect
    public static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);     // arr[kc.x]
        s32Array.array(kc.x, square(value));  // arr[kc.x] = value * value
    }

    @Reflect
    public static void square(ComputeContext cc, S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(),
            kc -> squareKernel(kc, s32Array)
        );
    }
}
```
And we dispatch by creating the appropriate data buffer and then asking an `Accelerator` (bound to a typical vendor backend) to execute the compute method, which in turn coordinates the dispatch of the various kernels.

```java
// Create an accelerator bound to a particular backend
var accelerator = new Accelerator(
    MethodHandles.lookup(), Backend.FIRST // Predicate<Backend>
);

// Ask the accelerator/backend to allocate an S32Array
var s32Array = S32Array.create(accelerator, 32);

// Fill it with data
for (int i = 0; i < s32Array.length(); i++) {
    s32Array.array(i, i);
}

// Tell the accelerator to execute the square() compute entrypoint
accelerator.compute(
    cc -> SquareCompute.square(cc, s32Array)
);

// Check the data
for (int i = 0; i < s32Array.length(); i++) {
    System.out.println(i + " " + s32Array.array(i));
}
```

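For reference, the whole round trip above computes the same result as the plain Java below; this is a useful sanity check of the expected semantics when no accelerator backend is at hand. `SquareReference` is an illustrative name, not part of HAT.

```java
// Plain-Java reference for the HAT example above: fill an int array
// with 0..31, square each element in place, then print the results.
public class SquareReference {
    static int square(int v) {
        return v * v;
    }

    public static void main(String[] args) {
        int[] s32Array = new int[32];
        for (int i = 0; i < s32Array.length; i++) {
            s32Array[i] = i;
        }
        for (int i = 0; i < s32Array.length; i++) {
            s32Array[i] = square(s32Array[i]);
        }
        for (int i = 0; i < s32Array.length; i++) {
            System.out.println(i + " " + s32Array[i]);
        }
    }
}
```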
## Programming model notes

The most important concept here is that we separate `normal java` code from `compute` code and from `kernel` code.

We must not assume that compute or kernel code is ever executed by the JVM.

### Kernel Code (kernel entrypoints and kernel reachable methods)
Kernels and any kernel reachable methods are naturally restricted to a subset of Java:

* No exceptions (no exceptions! :) )
* No heap access (no `new`)
* No access to static or instance fields from this or any other class
  * Except `static final` primitives (which generally get constant pooled)
  * Except fields of `KernelContext` (thread identity `.x`, `.maxX`, `.groups`, ...)
    - We may even decide to access these via methods (`.x()`)
* The only methods that can be called are:
  * Kernel reachable methods
    - Technically you can call a kernel entrypoint, but you must pass your `KernelContext`
  * `ifaceMappedSegment` accessors/mutators (see later)
  * Calls on `KernelContext` (backend kernel features)
    - `KernelContext.barrier()`
    - `kernelContext.I32.hypot(x, y)`
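As a sketch of these restrictions, a kernel entrypoint and its kernel reachable helper might look like the following; `ScaleCompute`, `scale`, and `SCALE` are illustrative names (not part of HAT), following the same shape as `SquareCompute` above.

```java
public class ScaleCompute {
    // Allowed: a static final primitive, which generally gets constant pooled.
    static final int SCALE = 2;

    // Kernel reachable method: @Reflect public static, primitive parameters only.
    @Reflect
    public static int scale(int v) {
        return v * SCALE; // no `new`, no exceptions, no mutable static state
    }

    // Kernel entrypoint: parameter 0 is the KernelContext (thread identity via kc.x).
    @Reflect
    public static void scaleKernel(KernelContext kc, S32Array s32Array) {
        s32Array.array(kc.x, scale(s32Array.array(kc.x)));
    }
}
```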

#### Kernel Entrypoints
* Declared `@Reflect public static void`
  * Later we may allow reductions to return data...
* Parameters
  * Parameter 0 is always a `KernelContext` (`KernelContext2D` and `KernelContext3D` logically follow)
  * Parameters 1..n are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`

#### Kernel Reachable Methods
* Declared `@Reflect public static`
* All parameters are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`

### Compute Code (compute entrypoints and compute reachable methods)
Code within the `compute entrypoint` and `compute reachable methods` has far fewer Java restrictions than kernel code, but generally:

* Exceptions are discouraged
* Java synchronization is discouraged
* Don't assume that local `ifaceMappedSegments` are actually allocated
* Java accesses/mutations of an `ifaceMappedSegment` will likely impact performance
* Code should ideally contain just simple control flow and kernel dispatches
* Data movements (to and from the backend) will automatically be derived from control flow and `ifaceMappedSegment` accesses
  - We hope to never have to add `cc.moveToDevice(hatBuffer)`
* All methods reachable from a `compute entrypoint` are either:
  * Compute reachable methods
    - Technically, methods can be both compute reachable and kernel reachable.
  * `ifaceMappedSegment` accessors/mutators (see later)
  * Calls on the `ComputeContext` to generate ranges or dispatch kernels
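To illustrate, a compute reachable method with simple control flow might dispatch a kernel repeatedly; because there are no Java-side buffer accesses between dispatches, HAT can infer that the buffer only needs to move to the device once and back once. This is a sketch reusing `SquareCompute.squareKernel` from above; the method name and `times` parameter are illustrative.

```java
@Reflect
public static void squareRepeatedly(ComputeContext cc, S32Array s32Array, int times) {
    // Simple control flow: no Java-side reads of s32Array between dispatches,
    // so no intermediate device-to-host copies should be needed.
    for (int i = 0; i < times; i++) {
        cc.dispatchKernel(s32Array.length(),
            kc -> SquareCompute.squareKernel(kc, s32Array)
        );
    }
}
```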

#### Compute Entry Points
* Declared `@Reflect public static void`
* Parameter 0 is a `ComputeContext`

#### Compute Reachable Methods
* Declared `@Reflect public static`