# HAT's Programming Model

Let's consider a trivial OpenCL kernel that squares each element of an int buffer:

```c
int square(int value){
    return value*value;
}

__kernel void squareKernel(__global int* s32Array){
    // Each work item reads and writes the element at its own global id
    int value = s32Array[get_global_id(0)];
    s32Array[get_global_id(0)] = square(value);
    return;
}
```
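
For readers more at home in Java than in OpenCL, the kernel behaves roughly like the body of a loop over work-item ids. The following is only a plain-Java sketch of that idea: the class name `SequentialSquare` and the method `squareAll` are hypothetical and are not part of HAT or OpenCL, and a real device would typically run the work items in parallel rather than in a loop.

```java
import java.util.Arrays;

// Plain-Java illustration only; SequentialSquare is a hypothetical name, not a HAT class.
final class SequentialSquare {
    static int square(int value) {
        return value * value;
    }

    // Each loop iteration plays the role of one work item;
    // gid stands in for get_global_id(0).
    static void squareAll(int[] s32Array) {
        for (int gid = 0; gid < s32Array.length; gid++) {
            s32Array[gid] = square(s32Array[gid]);
        }
    }

    public static void main(String[] args) {
        int[] data = {0, 1, 2, 3};
        squareAll(data);
        System.out.println(Arrays.toString(data)); // prints [0, 1, 4, 9]
    }
}
```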

We implement this in HAT by collecting the kernel(s) and compute method(s) in a `Compute` class.

```java
public class SquareCompute {
    @CodeReflection
    public static int square(int v) {
        return v * v;
    }

    @CodeReflection
    public static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);     // value = s32Array[kc.x]
        s32Array.array(kc.x, square(value));  // s32Array[kc.x] = value * value
    }

    @CodeReflection
    public static void square(ComputeContext cc, S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(),
                kc -> squareKernel(kc, s32Array)
        );
    }
}
```

We then dispatch by creating the appropriate data buffer and asking an `Accelerator` (bound to a typical vendor backend) to execute the compute method, which in turn coordinates the dispatch of the various kernels.

```java
// Create an accelerator bound to a particular backend
var accelerator = new Accelerator(
        java.lang.invoke.MethodHandles.lookup(),
        Backend.FIRST // Predicate<Backend>
);

// Ask the accelerator/backend to allocate an S32Array
var s32Array = S32Array.create(accelerator, 32);

// Fill it with data
for (int i = 0; i < s32Array.length(); i++) {
    s32Array.array(i, i);
}

// Tell the accelerator to execute the square() compute entrypoint
accelerator.compute(
        cc -> SquareCompute.square(cc, s32Array)
);

// Check the data
for (int i = 0; i < s32Array.length(); i++) {
    System.out.println(i + " " + s32Array.array(i));
}
```
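
As a mental model for the compute-to-kernel relationship, the sketch below imitates the shape of the dispatch in plain Java. Everything in it is an assumption made for illustration: `MiniKernelContext`, `MiniComputeContext` and `MiniSquareCompute` are hypothetical stand-ins rather than HAT classes, an `int[]` replaces `S32Array`, and the kernel is invoked sequentially on the JVM, whereas a real backend is free to run the work items on a device.

```java
import java.util.function.Consumer;

// Hypothetical stand-ins for illustration; these are NOT the HAT classes.
final class MiniKernelContext {
    final int x;     // id of this work item
    final int maxX;  // total number of work items in the dispatch

    MiniKernelContext(int x, int maxX) {
        this.x = x;
        this.maxX = maxX;
    }
}

final class MiniComputeContext {
    // Sequential stand-in for a kernel dispatch: invoke the kernel once per work item.
    void dispatchKernel(int range, Consumer<MiniKernelContext> kernel) {
        for (int x = 0; x < range; x++) {
            kernel.accept(new MiniKernelContext(x, range));
        }
    }
}

final class MiniSquareCompute {
    static int square(int v) {
        return v * v;
    }

    // Kernel: handles exactly one element, selected by kc.x.
    static void squareKernel(MiniKernelContext kc, int[] s32Array) {
        s32Array[kc.x] = square(s32Array[kc.x]);
    }

    // Compute method: decides the range and dispatches the kernel over it.
    static void square(MiniComputeContext cc, int[] s32Array) {
        cc.dispatchKernel(s32Array.length, kc -> squareKernel(kc, s32Array));
    }
}
```

Calling `MiniSquareCompute.square(new MiniComputeContext(), data)` squares every element of `data` in place, which mirrors what the `accelerator.compute(...)` call asks the backend to do.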

## Programming model notes

The most important concept here is that we separate `normal java` code from `compute` code and from `kernel` code.

We must not assume that compute or kernel code is ever executed by the JVM.

### Kernel Code (kernel entrypoints and kernel reachable methods)
Kernels and any kernel reachable methods are naturally restricted to a subset of Java.

* No exceptions (no exceptions! :) )
* No heap access (no `new`)
* No access to static or instance fields from this or any other classes
    * Except `final static primitives` (which generally get constant pooled)
    * Except fields of `KernelContext` (thread identity `.x`, `.maxX`, `.groups`... )
        - We may even decide to access these via methods (`.x()`);
* The only methods that can be called are either :-
    * Kernel reachable methods
        - Technically you can call a kernel entrypoint, but you must pass your KernelContext
    * `ifaceMappedSegment` accessor/mutators (see later)
    * Calls on `KernelContext` (backend kernel features)
        - `KernelContext.barrier()`
        - `kernelContext.I32.hypot(x,y)`
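
To make the restrictions above concrete, here is a small sketch of kernel-style code that stays within them. All names are hypothetical stand-ins and not HAT types: `SketchS32Array` only imitates the accessor/mutator shape of `S32Array` used earlier, and its heap-backed implementation exists solely so the sketch can run on a plain JVM. The kernel method itself uses no `new`, no exceptions and no mutable fields.

```java
// Hypothetical stand-ins for illustration only; none of these are HAT types.
interface SketchS32Array {
    int length();
    int array(int idx);              // accessor
    void array(int idx, int value);  // mutator

    // Heap-backed implementation so the sketch can run on a plain JVM.
    static SketchS32Array of(int... values) {
        int[] copy = values.clone();
        return new SketchS32Array() {
            public int length() { return copy.length; }
            public int array(int idx) { return copy[idx]; }
            public void array(int idx, int value) { copy[idx] = value; }
        };
    }
}

final class SketchKernelContext {
    final int x; // thread identity
    SketchKernelContext(int x) { this.x = x; }
}

final class ScaleKernelSketch {
    // A final static primitive, the kind of field the list above permits.
    static final int SCALE = 3;

    // Kernel reachable helper: static, primitive in, primitive out.
    static int scale(int v) {
        return v * SCALE;
    }

    // Kernel-style entrypoint: only calls a kernel reachable method and the
    // buffer accessor/mutator, and reads the thread identity from kc.x.
    static void scaleKernel(SketchKernelContext kc, SketchS32Array data) {
        data.array(kc.x, scale(data.array(kc.x)));
    }
}
```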

#### Kernel Entrypoints
* Declared `@CodeReflection static public void`
    * Later we may allow reductions to return data...
* Parameters
    * 0 is always a `KernelContext` (KernelContext2D, KernelContext3D logically follow)
    * 1..n are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`
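
The declaration rules for kernel entrypoints can be expressed as a simple shape check using standard Java reflection. This is only an illustrative sketch, not something HAT is known to provide: `DemoKernelContext`, `EntrypointShape` and `DemoKernels` are hypothetical names, the check ignores the `@CodeReflection` annotation, and it does not validate parameters 1..n.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in type; the real HAT KernelContext is not used here.
final class DemoKernelContext { }

final class EntrypointShape {
    // True when m is `public static void` and its first parameter is DemoKernelContext.
    static boolean looksLikeKernelEntrypoint(Method m) {
        int mods = m.getModifiers();
        Class<?>[] params = m.getParameterTypes();
        return Modifier.isPublic(mods)
                && Modifier.isStatic(mods)
                && m.getReturnType() == void.class
                && params.length >= 1
                && params[0] == DemoKernelContext.class;
    }

    // Convenience lookup by name; returns false when no such method is declared.
    static boolean check(Class<?> owner, String name) {
        for (Method m : owner.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                return looksLikeKernelEntrypoint(m);
            }
        }
        return false;
    }
}

// Sample methods used to exercise the check.
final class DemoKernels {
    public static void good(DemoKernelContext kc, int scale) { }
    public static int returnsValue(DemoKernelContext kc) { return 0; }
    public static void contextNotFirst(int scale, DemoKernelContext kc) { }
}
```

With these definitions, `EntrypointShape.check(DemoKernels.class, "good")` is true, while the other two sample methods are rejected because one returns a value and the other does not take the context as parameter 0.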

#### Kernel Reachable Methods
* Declared `@CodeReflection static public`
* All parameters are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`

### Compute Code (Compute entry points and compute reachable methods)
Code within the `compute entrypoint` and `compute reachable methods` has far fewer Java restrictions than kernel code, but generally...

* Exceptions are discouraged
* Java synchronization is discouraged
* Don't assume that allocations of local `ifaceMappedSegments` are actually performed
* Java accesses/mutations to an `ifaceMappedSegment` will likely impact performance
* Code should ideally just contain simple control flow and kernel dispatches.
* Data movements (to and from the backend) will automatically be derived from control flow and `ifaceMappedSegment` accesses
    - We hope to never have to add `cc.moveToDevice(hatBuffer)`
* All methods reachable from a `compute entrypoint` are either :-
    * Compute reachable methods
        - Technically methods can be both compute reachable and kernel reachable.
    * `ifaceMappedSegment` accessor/mutators (see later)
    * Calls on the `ComputeContext` to generate ranges, or dispatch kernels.
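
A compute method in the recommended style contains little more than control flow and dispatches. The sketch below illustrates that shape in plain Java under stated assumptions: `LoopComputeContext` and `RepeatedDoubleCompute` are hypothetical names, an `int[]` stands in for an `ifaceMappedSegment`, the kernel receives its work-item id as a plain `int`, and dispatch is a sequential loop rather than a real backend call.

```java
import java.util.function.IntConsumer;

// Hypothetical stand-in for a compute context; NOT the HAT ComputeContext.
final class LoopComputeContext {
    int dispatches = 0; // counts how many kernel dispatches were requested

    // Sequential stand-in for a kernel dispatch over [0, range).
    void dispatchKernel(int range, IntConsumer kernel) {
        dispatches++;
        for (int x = 0; x < range; x++) {
            kernel.accept(x);
        }
    }
}

final class RepeatedDoubleCompute {
    // Kernel: doubles the single element owned by work item x.
    static void doubleKernel(int x, int[] data) {
        data[x] = data[x] * 2;
    }

    // Compute-style entrypoint: a plain loop plus kernel dispatches.
    // It never reads or writes the buffer itself between dispatches.
    static void doubleNTimes(LoopComputeContext cc, int[] data, int passes) {
        for (int pass = 0; pass < passes; pass++) {
            cc.dispatchKernel(data.length, x -> doubleKernel(x, data));
        }
    }
}
```

Keeping the compute method free of direct buffer accesses is what leaves room for the data-movement inference described in the list above, since the buffer would not need to return to the JVM between passes.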

#### Compute Entry Points
* Declared `@CodeReflection static public void`
* Parameter 0 is `ComputeContext`

#### Compute Reachable Methods
* Declared `@CodeReflection static public`