New hat/docs/ProgrammingModel/programming-model.md

  1 # HAT's Programming Model
  2 [Back to Index ../](../index.md)
  3 
  4 #  HAT's Programming model
  5 
  6 Let's consider a trivial OpenCL kernel that squares each element in an int buffer.
  7 
  8 ```c
  9 int square(int value){
 10     return value*value;
 11 }
 12 
 13 __kernel void squareKernel( __global int* s32Array){
 14     int value = s32Array[get_global_id(0)];
 15     s32Array[get_global_id(0)]=square(value);
 16     return;
 17 }
 18 
 19 ```
 20 
 21 We implement this in HAT by collecting the kernel(s) and compute method(s) in a `Compute` class.
 22 
 23 ```java
 24 public class SquareCompute {
 25     @Reflect
 26     public static int square(int v) {
 27         return v * v;
 28     }
 29 
 30     @Reflect
 31     public static void squareKernel(KernelContext kc, S32Array s32Array) {
 32         int value = s32Array.array(kc.x);     // value = s32Array[kc.x]
 33         s32Array.array(kc.x, square(value));  // s32Array[kc.x] = value*value
 34     }
 35 
 36     @Reflect
 37     public static void square(ComputeContext cc, S32Array s32Array) {
 38         cc.dispatchKernel(s32Array.length(),
 39                 kc -> squareKernel(kc, s32Array)
 40         );
 41     }
 42 }
 43 ```
 44 And we dispatch by creating the appropriate data buffer and then asking an `Accelerator` (bound to a typical vendor backend) to execute the compute method, which in turn coordinates the dispatch of the various kernels.
 45 
 46 ```java
 47   // Create an accelerator bound to a particular backend
 48 
 49   var accelerator = new Accelerator(
 50       MethodHandles.lookup(), Backend.FIRST  // Predicate<Backend>
 51   );
 52 
 53   // Ask the accelerator/backend to allocate an S32Array
 54   var s32Array = S32Array.create(accelerator, 32);
 55 
 56   // Fill it with data
 57   for (int i = 0; i < s32Array.length(); i++) {
 58       s32Array.array(i, i);
 59   }
 60 
 61   // Tell the accelerator to execute the square() compute entrypoint
 62 
 63   accelerator.compute(
 64      cc -> SquareCompute.square(cc, s32Array)
 65   );
 66 
 67   // Check the data
 68   for (int i = 0; i < s32Array.length(); i++) {
 69       System.out.println(i + " " + s32Array.array(i));
 70   }
 71 ```
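
As a mental model only, the dispatch above can be read as "run the kernel body once for every index in the range". The sketch below emulates that in plain Java, without HAT. `DispatchSketch`, `Kc`, `dispatch` and `run` are hypothetical names invented for illustration; they are not HAT APIs, and a real backend is free to run the kernel in parallel on a device without the JVM ever executing it.

```java
import java.util.function.Consumer;

public class DispatchSketch {
    // Hypothetical stand-in for KernelContext: just the thread identity fields.
    record Kc(int x, int maxX) {}

    // Hypothetical sequential stand-in for a kernel dispatch over [0, range).
    // A real backend would typically run these invocations in parallel.
    static void dispatch(int range, Consumer<Kc> kernel) {
        for (int x = 0; x < range; x++) {
            kernel.accept(new Kc(x, range));
        }
    }

    // Kernel reachable helper: static, primitives only, no allocation.
    static int square(int v) {
        return v * v;
    }

    // Kernel entrypoint analogue: parameter 0 is the context, then the buffer.
    static void squareKernel(Kc kc, int[] data) {
        int value = data[kc.x()];
        data[kc.x()] = square(value);
    }

    // Compute entrypoint analogue: simple control flow plus one dispatch.
    static int[] run(int n) {
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = i;
        }
        dispatch(data.length, kc -> squareKernel(kc, data));
        return data;
    }

    public static void main(String[] args) {
        int[] out = run(8);
        for (int i = 0; i < out.length; i++) {
            System.out.println(i + " " + out[i]);
        }
    }
}
```

Each invocation touches only the element at its own `x`, which is why the iterations are independent and a backend can map them onto device threads.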
 72 
 73 ## Programming model notes
 74 
 75 The most important concept here is that we separate `normal java` code
 76 from `compute` code and from `kernel` code.
 77 
 78 We must not assume that compute or kernel code is ever executed by the JVM.
 79 
 80 ### Kernel Code (kernel entrypoints and kernel reachable methods)
 81 Kernels and any kernel reachable methods are naturally restricted to a subset of Java.
 82 
 83 * No exceptions (no exceptions! :) )
 84 * No heap access (no `new`)
 85 * No access to static or instance fields from this or any other class
 86     * Except `final static primitives` (which generally get constant pooled)
 87     * Except fields of `KernelContext` (thread identity `.x`, `.maxX`, `.groups`... )
 88         - We may even decide to access these via methods (`.x()`);
 89 * The only methods that can be called are either :-
 90    * Kernel reachable methods
 91       - Technically you can call a kernel entrypoint, but must pass your KernelContext
 92    * `ifaceMappedSegment` accessor/mutators (see later)
 93    * Calls on `KernelContext` (backend kernel features)
 94      - `KernelContext.barrier()`
 95      - `kernelContext.I32.hypot(x,y)`
 96 #### Kernel Entrypoints
 97 * Declared `@Reflect static public void`
 98     * Later we may allow reductions to return data...
 99 * Parameters
100     * 0 is always a `KernelContext` (KernelContext2D, KernelContext3D logically follow)
101     * 1..n are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`
102 
103 #### Kernel Reachable Methods
104 * Declared `@Reflect static public`
105 * All parameters are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`
106 
107 ### Compute Code (Compute entry points and compute reachable methods)
108 Code within the `compute entrypoint` and `compute reachable
109 methods` has far fewer Java restrictions than kernel code, but generally...
110 
111 * Exceptions are discouraged
112 * Java Synchronization is discouraged
113 * Don't assume that local `ifaceMappedSegment` allocations actually take place
114 * Java accesses/mutations to `ifaceMappedSegment` will likely impact performance
115 * Code should ideally just contain simple control flow and kernel dispatches.
116 * Data movements (to and from backend) will automatically be derived from control flow and `ifaceMappedSegment` accesses
117    - We hope to never have to add `cc.moveToDevice(hatBuffer)`
118 * All methods reachable from a `compute entrypoint` are either :-
119   * Compute Reachable Methods
120       - Technically methods can be compute reachable and kernel reachable.
121   * `ifaceMappedSegment` accessor/mutators (see later)
122   * Calls on the `ComputeContext` to generate ranges, or dispatch kernels.
123 
124 #### Compute Entry Points
125 * Declared `@Reflect static public void`
126 * Parameter 0 is `ComputeContext`
127 
128 
129 #### Compute Reachable Methods
130 * Declared `@Reflect static public`