# HAT's Programming Model

Let's consider a trivial OpenCL kernel which squares each element in an int buffer.

```c
int square(int value){
    return value*value;
}

__kernel void squareKernel( __global int* s32Array){
    int value = s32Array[get_global_id(0)];
    s32Array[get_global_id(0)]=square(value);
    return;
}
```

We implement this in HAT by collecting the kernel(s) and compute method(s) in a `Compute` class.

```java
public class SquareCompute {
    @CodeReflection
    public static int square(int v) {
        return v * v;
    }

    @CodeReflection
    public static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);     // arr[kc.x]
        s32Array.array(kc.x, square(value));  // arr[kc.x]=value*value
    }

    @CodeReflection
    public static void square(ComputeContext cc, S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(),
                kc -> squareKernel(kc, s32Array)
        );
    }
}
```
We dispatch by creating the appropriate data buffer and then asking an `Accelerator` (bound to a typical vendor backend) to execute the compute method, which in turn coordinates the dispatch of the various kernels.

```java
// Create an accelerator bound to a particular backend
var accelerator = new Accelerator(
    java.lang.invoke.MethodHandles.lookup(),
    Backend.FIRST  // Predicate<Backend>
);

// Ask the accelerator/backend to allocate an S32Array
var s32Array = S32Array.create(accelerator, 32);

// Fill it with data
for (int i = 0; i < s32Array.length(); i++) {
    s32Array.array(i, i);
}

// Tell the accelerator to execute the square() compute entrypoint
accelerator.compute(
    cc -> SquareCompute.square(cc, s32Array)
);

// Check the data
for (int i = 0; i < s32Array.length(); i++) {
    System.out.println(i + " " + s32Array.array(i));
}
```

## Programming model notes

The most important concept here is that we separate `normal java` code from `compute` code, and `compute` code from `kernel` code.

We must not assume that Compute or Kernel code is ever executed by the JVM.

### Kernel Code (kernel entrypoints and kernel reachable methods)
Kernels and any kernel reachable methods will naturally be restricted to a subset of Java.

* No exceptions (no exceptions! :) )
* No heap access (no `new`)
* No access to static or instance fields from this or any other classes
    * Except `final static primitives` (which generally get constant pooled)
    * Except fields of `KernelContext` (thread identity `.x`, `.maxX`, `.groups`... )
        - We may even decide to access these via methods (`.x()`)
* The only methods that can be called are:
    * Kernel reachable methods
        - Technically you can call a kernel entrypoint, but must pass your KernelContext
    * `ifaceMappedSegment` accessor/mutators (see later)
    * Calls on `KernelContext` (backend kernel features)
        - `KernelContext.barrier()`
        - `kernelContext.I32.hypot(x,y)`

#### Kernel Entrypoints
* Declared `@CodeReflection static public void`
    * Later we may allow reductions to return data...
* Parameters
    * 0 is always a `KernelContext` (KernelContext2D, KernelContext3D logically follow)
    * 1..n are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`

#### Kernel Reachable Methods
* Declared `@CodeReflection static public`
* All parameters are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`

### Compute Code (Compute entry points and compute reachable methods)
Code within the `compute entrypoint` and `compute reachable methods` has far fewer Java restrictions than kernel code, but generally...

* Exceptions are discouraged
* Java synchronization is discouraged
* Don't assume that any local `ifaceMappedSegment` allocations are actually performed
* Java accesses/mutations to `ifaceMappedSegment` will likely impact performance
* Code should ideally just contain simple control flow and kernel dispatches.
* Data movements (to and from backend) will automatically be derived from control flow and `ifaceMappedSegment` accesses
    - We hope to never have to add `cc.moveToDevice(hatBuffer)`
* All methods reachable from a `compute entrypoint` are one of:
    * Compute reachable methods
        - Technically methods can be both compute reachable and kernel reachable.
    * `ifaceMappedSegment` accessor/mutators (see later)
    * Calls on the `ComputeContext` to generate ranges, or dispatch kernels.

#### Compute Entry Points
* Declared `@CodeReflection static public void`
* Parameter 0 is `ComputeContext`

#### Compute Reachable Methods
* Declared `@CodeReflection static public`
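
To make the split between kernel code and compute code concrete, here is a minimal plain-Java sketch of what a one-dimensional dispatch logically does. It is not HAT code: `MiniKernelContext` and the local `dispatchKernel` helper are hypothetical stand-ins invented for this illustration, a plain `int[]` takes the place of `S32Array`, and the loop runs the work items sequentially on the JVM, whereas a real backend would be free to run them in parallel on a device without the JVM executing them at all.

```java
import java.util.function.Consumer;

class SquareSketch {
    /** Hypothetical stand-in for KernelContext: carries only the thread identity. */
    static final class MiniKernelContext {
        final int x;     // index of this work item
        final int maxX;  // total number of work items in the dispatch

        MiniKernelContext(int x, int maxX) {
            this.x = x;
            this.maxX = maxX;
        }
    }

    // Kernel reachable method: static, primitives only, no allocation, no exceptions.
    static int square(int v) {
        return v * v;
    }

    // Kernel entrypoint: parameter 0 is the context, and it only touches element kc.x.
    static void squareKernel(MiniKernelContext kc, int[] s32Array) {
        int value = s32Array[kc.x];
        s32Array[kc.x] = square(value);
    }

    // Assumption: a sequential emulation of a 1D dispatch, one kernel call per index.
    static void dispatchKernel(int range, Consumer<MiniKernelContext> kernel) {
        for (int x = 0; x < range; x++) {
            kernel.accept(new MiniKernelContext(x, range));
        }
    }

    // Compute entrypoint: simple control flow plus a kernel dispatch, nothing else.
    static void square(int[] s32Array) {
        dispatchKernel(s32Array.length, kc -> squareKernel(kc, s32Array));
    }

    public static void main(String[] args) {
        int[] data = new int[8];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        square(data);
        for (int i = 0; i < data.length; i++) {
            System.out.println(i + " " + data[i]);
        }
    }
}
```

Because each work item reads and writes only its own index, the result does not depend on the order in which the work items run, which is the property that lets a backend execute them in parallel.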