# HAT's Programming Model

----
* [Contents](hat-00.md)
* Build Babylon and HAT
    * [Quick Install](hat-01-quick-install.md)
    * [Building Babylon with jtreg](hat-01-02-building-babylon.md)
    * [Building HAT with jtreg](hat-01-03-building-hat.md)
        * [Enabling the NVIDIA CUDA Backend](hat-01-05-building-hat-for-cuda.md)
* [Testing Framework](hat-02-testing-framework.md)
* [Running Examples](hat-03-examples.md)
* [HAT Programming Model](hat-03-programming-model.md)
* Interface Mapping
    * [Interface Mapping Overview](hat-04-01-interface-mapping.md)
    * [Cascade Interface Mapping](hat-04-02-cascade-interface-mapping.md)
* Development
    * [Project Layout](hat-01-01-project-layout.md)
    * [IntelliJ Code Formatter](hat-development.md)
* Implementation Details
    * [Walkthrough Of Accelerator.compute()](hat-accelerator-compute.md)
    * [How we minimize buffer transfers](hat-minimizing-buffer-transfers.md)
* [Running HAT with Docker on NVIDIA GPUs](hat-07-docker-build-nvidia.md)
---

Let's consider a trivial OpenCL kernel that squares each element in an `int` buffer:

```c
int square(int value) {
    return value * value;
}

__kernel void squareKernel(__global int* s32Array) {
    int gid = get_global_id(0);     // this work-item's index
    int value = s32Array[gid];
    s32Array[gid] = square(value);
}
```
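
For readers less used to OpenCL, the plain-Java sketch below emulates what this kernel computes: each loop iteration plays the role of one work-item, and the loop index stands in for `get_global_id(0)`. It is only an illustration, not HAT code; a real device runs the work-items in parallel and in no guaranteed order, and the class and method names are made up.

```java
import java.util.Arrays;

// Hypothetical CPU emulation of the OpenCL squareKernel above (not HAT code).
final class SquareKernelEmulation {

    // Mirrors the OpenCL helper: a pure function of its argument.
    static int square(int value) {
        return value * value;
    }

    // Mirrors squareKernel for the single work-item whose global id is gid.
    static void squareKernel(int gid, int[] s32Array) {
        int value = s32Array[gid];
        s32Array[gid] = square(value);
    }

    // Sequential stand-in for launching globalSize work-items.
    static void dispatch(int globalSize, int[] s32Array) {
        for (int gid = 0; gid < globalSize; gid++) {
            squareKernel(gid, s32Array);
        }
    }

    public static void main(String[] args) {
        int[] data = {0, 1, 2, 3, 4};
        dispatch(data.length, data);
        System.out.println(Arrays.toString(data)); // [0, 1, 4, 9, 16]
    }
}
```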

In HAT, we implement this by collecting the kernel(s) and compute method(s) in a compute class, here `SquareCompute`:

```java
public class SquareCompute {
    // Kernel-reachable helper
    @Reflect
    public static int square(int v) {
        return v * v;
    }

    // Kernel entrypoint: parameter 0 is the KernelContext
    @Reflect
    public static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);     // value = s32Array[kc.x]
        s32Array.array(kc.x, square(value));  // s32Array[kc.x] = value * value
    }

    // Compute entrypoint: parameter 0 is the ComputeContext
    @Reflect
    public static void square(ComputeContext cc, S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(),
                kc -> squareKernel(kc, s32Array)
        );
    }
}
```
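
In HAT, the `Accelerator`'s backend decides how the compute and kernel code actually run. Purely to illustrate the control flow of `cc.dispatchKernel(range, kc -> ...)`, the sketch below uses hypothetical stand-ins (`MiniComputeContext`, `MiniKernelContext`, and an `int[]` in place of `S32Array`) that run the kernel lambda once per index on the calling thread. It is a model under those assumptions, not HAT's implementation; it also reads the index through a record accessor (`kc.x()`) rather than HAT's `kc.x` field.

```java
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical stand-in for HAT's KernelContext: the identity of one work-item.
record MiniKernelContext(int x, int maxX) {}

// Hypothetical stand-in for HAT's ComputeContext.
final class MiniComputeContext {
    // Runs the kernel once for each x in [0, range), sequentially on this thread.
    void dispatchKernel(int range, Consumer<MiniKernelContext> kernel) {
        for (int x = 0; x < range; x++) {
            kernel.accept(new MiniKernelContext(x, range));
        }
    }
}

// Mirrors SquareCompute, with int[] in place of S32Array.
final class SquareComputeSketch {
    static int square(int v) {
        return v * v;
    }

    // Kernel entrypoint: parameter 0 is the kernel context.
    static void squareKernel(MiniKernelContext kc, int[] s32Array) {
        int value = s32Array[kc.x()];
        s32Array[kc.x()] = square(value);
    }

    // Compute entrypoint: parameter 0 is the compute context.
    static void square(MiniComputeContext cc, int[] s32Array) {
        cc.dispatchKernel(s32Array.length, kc -> squareKernel(kc, s32Array));
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        square(new MiniComputeContext(), data);
        System.out.println(Arrays.toString(data)); // [1, 4, 9]
    }
}
```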

To dispatch it, we create the appropriate data buffer and then ask an `Accelerator` (bound to a vendor backend) to execute the compute method, which in turn coordinates the dispatch of the various kernels.

```java
  // Create an accelerator bound to a particular backend
  var accelerator = new Accelerator(
      MethodHandles.lookup(), Backend.FIRST  // Predicate<Backend>
  );

  // Ask the accelerator/backend to allocate an S32Array of 32 ints
  var s32Array = S32Array.create(accelerator, 32);

  // Fill it with data: s32Array[i] = i
  for (int i = 0; i < s32Array.length(); i++) {
      s32Array.array(i, i);
  }

  // Tell the accelerator to execute the square() compute entrypoint
  accelerator.compute(
      cc -> SquareCompute.square(cc, s32Array)
  );

  // Check the data: element i should now hold i * i
  for (int i = 0; i < s32Array.length(); i++) {
      System.out.println(i + " " + s32Array.array(i));
  }
```
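
Printing is fine for a first run, but a check against a plain-Java reference catches errors more reliably. In the sketch below, an `int[]` stands in for the values read back with `s32Array.array(i)`, and `SquareCheck.firstMismatch` is a hypothetical helper; it assumes the buffer was filled with `s32Array.array(i, i)` as above, so element `i` should end up as `i * i`.

```java
// Hypothetical result check; int[] stands in for values read via s32Array.array(i).
final class SquareCheck {

    // Returns the index of the first element that is not i * i, or -1 if all match.
    static int firstMismatch(int[] results) {
        for (int i = 0; i < results.length; i++) {
            if (results[i] != i * i) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] results = new int[32];
        for (int i = 0; i < results.length; i++) {
            results[i] = i * i; // what a correct square() run should produce
        }
        int bad = firstMismatch(results);
        System.out.println(bad < 0 ? "all 32 values OK" : "mismatch at index " + bad);
    }
}
```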

## Programming model notes

The most important concept here is that we separate three kinds of code: normal Java code, `compute` code, and `kernel` code.

We must not assume that compute or kernel code is ever executed by the JVM; the backend is free to run it by other means, for example as generated GPU code.

### Kernel Code (kernel entrypoints and kernel reachable methods)
Kernels and any kernel-reachable methods are naturally restricted to a subset of Java:

* No exceptions (no exceptions! :) )
* No heap access (no `new`)
* No access to static or instance fields of this or any other class
    * Except `final static` primitives (which generally get constant-pooled)
    * Except fields of `KernelContext` (thread identity: `.x`, `.maxX`, `.groups`, ...)
        - We may even decide to access these via methods (`.x()`)
* The only methods that can be called are:
    * Kernel-reachable methods
        - Technically you can call a kernel entrypoint, but you must pass your `KernelContext`
    * `ifaceMappedSegment` accessors/mutators (see later)
    * Calls on `KernelContext` (backend kernel features)
        - `KernelContext.barrier()`
        - `kernelContext.I32.hypot(x,y)`
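
As an illustration, the plain-Java helpers below stay within these restrictions: they are `public static`, take and return only primitives, allocate nothing, throw nothing, and read only a `final static` primitive constant. The class and method names are hypothetical, and in HAT the methods would also be annotated `@Reflect` (omitted so the sketch compiles without Babylon). The commented-out lines show patterns the rules exclude.

```java
// Hypothetical kernel-style helpers (plain Java; @Reflect omitted).
final class KernelSafeMath {
    // Allowed: a final static primitive, typically constant-pooled.
    // 46340 is the largest int whose square still fits in an int.
    static final int MAX_ROOT = 46340;

    // Excluded by the kernel rules (shown for contrast only):
    //   int[] tmp = new int[4];            // heap allocation (`new`)
    //   throw new IllegalStateException(); // exceptions
    //   counter++;                         // non-final static field access

    // Kernel-reachable style: primitives in and out, no allocation, no exceptions.
    public static int clamp(int v, int lo, int hi) {
        return v < lo ? lo : (v > hi ? hi : v);
    }

    // Squares v after clamping it, so the result cannot overflow an int.
    public static int saturatingSquare(int v) {
        int c = clamp(v, -MAX_ROOT, MAX_ROOT);
        return c * c;
    }

    public static void main(String[] args) {
        System.out.println(saturatingSquare(3));      // 9
        System.out.println(saturatingSquare(100000)); // 2147395600
    }
}
```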

#### Kernel Entrypoints
* Declared `@Reflect static public void`
    * Later we may allow reductions to return data...
* Parameters
    * Parameter 0 is always a `KernelContext` (`KernelContext2D` and `KernelContext3D` logically follow)
    * Parameters 1..n are restricted to uniform primitive values (the same value for every work-item) and Panama FFM `ifaceMappedSegments`

#### Kernel Reachable Methods
* Declared `@Reflect static public`
* All parameters are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`
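
The entrypoint and kernel-reachable signature rules are mechanical enough to check with plain reflection. The sketch below is a hypothetical validator, not part of HAT: `Ctx` stands in for `KernelContext`, and the `MappedSegment` marker interface stands in for `ifaceMappedSegment` types. It checks only signatures; it does not look for `@Reflect`, inspect method bodies, or verify that primitive arguments are uniform.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical signature validator (not part of HAT).
final class KernelShapeCheck {
    interface MappedSegment {}     // hypothetical marker for mapped segment types
    static final class Ctx {}      // hypothetical stand-in for KernelContext

    // Parameters must be primitives or interface-mapped segments.
    static boolean isAllowedParam(Class<?> type) {
        return type.isPrimitive() || MappedSegment.class.isAssignableFrom(type);
    }

    static boolean isPublicStatic(Method m) {
        int mods = m.getModifiers();
        return Modifier.isPublic(mods) && Modifier.isStatic(mods);
    }

    // Entrypoint: public static void, param 0 is the context, params 1..n allowed.
    static boolean isKernelEntrypoint(Method m, Class<?> contextType) {
        Class<?>[] params = m.getParameterTypes();
        if (!isPublicStatic(m) || m.getReturnType() != void.class) return false;
        if (params.length == 0 || params[0] != contextType) return false;
        for (int i = 1; i < params.length; i++) {
            if (!isAllowedParam(params[i])) return false;
        }
        return true;
    }

    // Kernel-reachable method: public static, every parameter allowed.
    static boolean isKernelReachable(Method m) {
        if (!isPublicStatic(m)) return false;
        for (Class<?> p : m.getParameterTypes()) {
            if (!isAllowedParam(p)) return false;
        }
        return true;
    }

    // Sample methods to classify (bodies are irrelevant here).
    interface IntSegment extends MappedSegment {}
    public static void goodKernel(Ctx kc, IntSegment data, int scale) {}
    public static void heapKernel(Ctx kc, int[] data) {}
    public static int goodHelper(int a, int b) { return a + b; }
    public static int boxedHelper(Integer a) { return a; }

    // Looks up a declared method by name (the sample names are unique).
    static Method byName(String name) {
        for (Method m : KernelShapeCheck.class.getDeclaredMethods()) {
            if (m.getName().equals(name)) return m;
        }
        throw new IllegalArgumentException("no method " + name);
    }

    public static void main(String[] args) {
        System.out.println(isKernelEntrypoint(byName("goodKernel"), Ctx.class)); // true
        System.out.println(isKernelEntrypoint(byName("heapKernel"), Ctx.class)); // false
        System.out.println(isKernelReachable(byName("goodHelper")));             // true
        System.out.println(isKernelReachable(byName("boxedHelper")));            // false
    }
}
```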

### Compute Code (compute entrypoints and compute reachable methods)
Code within the compute entrypoint and compute-reachable methods has far fewer Java restrictions than kernel code, but generally:

* Exceptions are discouraged
* Java synchronization is discouraged
* Don't assume that local `ifaceMappedSegment` allocations actually take place
* Java accesses to, and mutations of, an `ifaceMappedSegment` will likely impact performance, since they may require data to be copied between the JVM and the backend
* Code should ideally contain just simple control flow and kernel dispatches
* Data movements (to and from the backend) will automatically be derived from control flow and `ifaceMappedSegment` accesses
    - We hope never to have to add explicit calls such as `cc.moveToDevice(hatBuffer)`
* All methods reachable from a compute entrypoint are one of:
    * Compute-reachable methods
        - Technically, a method can be both compute-reachable and kernel-reachable
    * `ifaceMappedSegment` accessors/mutators (see later)
    * Calls on the `ComputeContext` to generate ranges or dispatch kernels
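
To see why Java-side accesses can hurt performance, here is a deliberately simplified, hypothetical model of deriving transfers from accesses; it is not HAT's actual mechanism. Each buffer remembers which side holds the newest data: dispatching a kernel copies to the device only if Java wrote since the last copy, and a Java access copies back only if a kernel wrote since. `TrackedBuffer` and its methods are invented for illustration and simply count transfers instead of moving data.

```java
// Hypothetical transfer-tracking model (not HAT's implementation).
final class TrackedBuffer {
    private boolean hostNewer = true;    // Java side holds the newest data
    private boolean deviceNewer = false; // device side holds the newest data
    int toDevice = 0;                    // counted host-to-device copies
    int toHost = 0;                      // counted device-to-host copies

    // Before dispatching a kernel that reads this buffer.
    void beforeKernel() {
        if (hostNewer) {
            toDevice++;
            hostNewer = false;
        }
    }

    // After a kernel that wrote this buffer completes.
    void afterKernelWrite() {
        deviceNewer = true;
    }

    // A Java read, e.g. s32Array.array(i).
    void javaRead() {
        if (deviceNewer) {
            toHost++;
            deviceNewer = false;
        }
    }

    // A Java write, e.g. s32Array.array(i, v).
    void javaWrite() {
        javaRead();       // bring the host copy up to date first
        hostNewer = true;
    }

    public static void main(String[] args) {
        TrackedBuffer buf = new TrackedBuffer();
        for (int i = 0; i < 32; i++) buf.javaWrite();  // fill on the Java side
        for (int pass = 0; pass < 2; pass++) {         // two kernel dispatches
            buf.beforeKernel();
            buf.afterKernelWrite();
        }
        for (int i = 0; i < 32; i++) buf.javaRead();   // check results in Java
        System.out.println(buf.toDevice + " to device, " + buf.toHost + " to host");
    }
}
```

In `main`, all Java writes happen before the kernels and all reads after, so the model needs one transfer in each direction; a Java write between the two dispatches would raise both counts to two.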

#### Compute Entrypoints
* Declared `@Reflect static public void`
* Parameter 0 is a `ComputeContext`

#### Compute Reachable Methods
* Declared `@Reflect static public`