diff a/hat/docs/ProgrammingModel/programming-model.md b/hat/docs/ProgrammingModel/programming-model.md
--- /dev/null
+++ b/hat/docs/ProgrammingModel/programming-model.md
@@ -0,0 +1,130 @@
+# HAT's Programming Model
+[Back to Index ../](../index.md)
+
+#  HAT's Programming model
+
+Let's consider a trivial opencl kernel which squares each element in an int buffer
+
+```c
+int square(int value){
+    return value*value;
+}
+
+__kernel void squareKernel( __global int* s32Array){
+    int value = s32Array[get_global_id(0)];
+    s32Array[get_global_id(0)]=square(value);
+    return;
+}
+
+```
+
+We implement this in HAT by collecting the kernel(s) and compute method(s) in a `Compute` class.
+
+```java
+public class SquareCompute {
+    @Reflect
+    public static int square(int v) {
+        return v * v;
+    }
+
+    @Reflect
+    public static void squareKernel(KernelContext kc, S32Array s32Array) {
+        int value = s32Array.array(kc.x);     // arr[cc.x]
+        s32Array.array(kc.x, square(value));  // arr[cc.x]=value*value
+    }
+
+    @Reflect
+    public static void square(ComputeContext cc, S32Array s32Array) {
+        cc.dispatchKernel(s32Array.length(),
+                kc -> squareKernel(kc, s32Array)
+        );
+    }
+}
+```
+And we dispatch by creating the appropriate data buffer and then asking an `Accelerator` (bound to a typical vendor backend) to execute the compute method.. which in turn coordinates the dispatch of the various kernels.
+
+```java
+  // Create an accelerator bound to a particular backend
+
+  var accelerator = new Accelerator(
+      MethodHandles.lookup(), Backend.FIRST  // Predicate<Backend>
+  );
+
+  // Ask the accelerator/backend to allocate an S32Array
+  var s32Array = S32Array.create(accelerator, 32);
+
+  // Fill it with data
+  for (int i = 0; i < s32Array.length(); i++) {
+      s32Array.array(i, i);
+  }
+
+  // Tell the accelerator to execute the square() compute entrypoint
+
+  accelerator.compute(
+     cc -> SquareCompute.square(cc, s32Array)
+  );
+
+  // Check the data
+  for (int i = 0; i < arr.length(); i++) {
+      System.out.println(i + " " + arr.array(i));
+  }
+```
+
+## Programming model notes
+
+The most important concept here is that we separate `normal java` code,
+from `compute` code from `kernel` code
+
+We must not assume that Compute or Kernel code are ever executed by the JVM
+
+### Kernel Code (kernel entrypoints and kernel reachable methods)
+Kernel's and any kernel reachable methods will naturally be restricted to subset of Java.
+
+* No exceptions (no exceptions! :) )
+* No heap access (no `new`)
+* No access to static or instance fields from this or any other classes )
+    * Except `final static primitives` (which generally get constant pooled)
+    * Except fields of `KernelContext` (thread identity `.x`, `.maxX`, `.groups`... )
+        - We may even decide to access these via methods (`.x()`);
+* The only methods that can be called are either :-
+   * Kernel reachable methods
+      - Technically you can call a kernel entrypoint, but must pass your KernelContext
+   * `ifaceMappedSegment` accessor/mutators (see later)
+   * Calls on `KernelContext` (backend kernel features)
+     - `KernelContext.barrier()`
+     - `kernelContext.I32.hypot(x,y)`
+#### Kernel Entrypoints
+* Declared `@Reflect static public void`
+    * Later we may allow reductions to return data...
+* Parameters
+    * 0 is always a `KernelContext` (KernelContext2D, KernelContext3D logically follow)
+    * 1..n are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`
+
+#### Kernel Reachable Methods
+* Declared `@Reflect static public`
+* All Parameters are restricted to uniform primitive values and Panama FFM `ifaceMappedSegments`
+
+### Compute Code (Compute entry points and compute reachable methods)
+Code within the `compute entrypoint` and `compute reachable
+methods` have much fewer Java restrictions than kernels but generally...
+
+* Exceptions are discouraged
+* Java Synchronization is discouraged
+* Don't assume any allocation of local `ifaceMappedSegmants` are allocated
+* Java accesses/mutations to `ifaceMappedSegment` will likely impact performance
+* Code should ideally just contain simple plyTable flow and kernel dispatches.
+* Data movements (to and from backend) will automatically be derived from plyTable flow and `ifaceMappedSegment` accesses
+   - We hope to never have to add `cc.moveToDevice(hatBuffer)`
+* All methods reachable from a `compute entrypoint` are either :-
+  * Compute Reachable Methods
+      - Technically methods can be compute reachable and kernel reachable.
+  * `ifaceMappedSegment` accessor/mutators (see later)
+  * Calls on the `ComputeContext` to generate ranges, or dispatch kernels.
+
+#### Compute Entry Points
+* Declared `@Reflect static public void`
+* Parameter 0 is `ComputeContext`
+
+
+#### Compute Reachable Methods
+* Declared `@Reflect static public `