New hat/docs/Implementation/accelerator-compute.md

# What happens when we call accelerator.compute(lambda)
[Back to Index ../](../index.md)

# Back to our Squares example.

So what is going on here?

```java
  accelerator.compute(
     cc -> SquareCompute.square(cc, s32Array)
  );
```

Recall that we have two types of code in our compute class: kernels (and kernel-reachable methods) and
compute entrypoints (and compute-reachable methods).

```java
public class SquareCompute{
    @Reflect public static int square(int v) {
        return v * v;
    }

    @Reflect public static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);     // value = arr[kc.x]
        s32Array.array(kc.x, square(value));  // arr[kc.x] = value * value
    }

    @Reflect public static void square(ComputeContext cc, S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(),
                kc -> squareKernel(kc, s32Array)
        );
    }
}
```

Note, again, that we cannot just call the compute entrypoint or the kernel directly.

```java
  SquareCompute.square(????, s32Array);  // We can't do this!
```

We deliberately make it inconvenient to call the compute entrypoint directly by mistake (`ComputeContext` and
`KernelContext` construction is embedded in the framework). Doing so is akin to calling `Thread.run()` directly, rather than
calling `Thread.start()` on a class that extends `Thread` and provides an implementation of `Thread.run()`.
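
To make the `Thread` analogy concrete, here is a small plain-Java sketch (it does not use HAT at all, and `RunVersusStart` and `executingThread` are names invented for this illustration). It records which thread actually executes the body of `run()` in each case.

```java
final class RunVersusStart {
    /** Returns the name of the thread that executed the body of run(). */
    static String executingThread(boolean useStart) {
        String[] seen = new String[1];
        Thread worker = new Thread(() -> seen[0] = Thread.currentThread().getName(), "worker");
        if (useStart) {
            worker.start();      // the runtime creates a new thread, which then calls run()
            try {
                worker.join();   // wait for it so that seen[0] is visible to us
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        } else {
            worker.run();        // plain method call: the body runs on the caller's thread
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("start(): body ran on " + executingThread(true));
        System.out.println("run():   body ran on " + executingThread(false));
    }
}
```

Calling `run()` still "works", but it bypasses the machinery that gives `Thread` its meaning. Calling a compute entrypoint directly would bypass the accelerator in the same way.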

Instead we use this pattern:

```java
  accelerator.compute(
     cc -> SquareCompute.square(cc, s32Array)
  );
```

We pass a lambda to `accelerator.compute()`, which is used to determine which compute method to invoke.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
    +--------> accelerator.compute(lambda)

```

Incidentally, this lambda is never executed by the JVM. Instead, the accelerator uses Babylon's Code Reflection
capabilities to extract the model of this lambda and determine the compute entrypoint and its captured args.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
    +--------> accelerator.compute( cc -> SquareCompute.square(cc, s32Array) )
                ------------------------->
                    getModelOf(lambda)
                <------------------------
```

This model describes the call that we want the accelerator to
execute or interpret (`SquareCompute.square()`) and the args that were captured from the call site (the `s32Array` buffer).

The accelerator uses Babylon again to get the
code model of `SquareCompute.square()` and builds a `ComputeReachableGraph` with this method at the root.
To do so, the accelerator walks the code model and collects all methods (and their code models)
reachable from the entrypoint.

In our trivial case, the `ComputeReachableGraph` has a single root node representing `SquareCompute.square()`.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
    +--------> accelerator.compute( cc -> SquareCompute.square(cc, s32Array) )
                ------------------------->
                     getModelOf(lambda)
                <------------------------
                ------------------------->
                     getModelOf(SquareCompute.square())
                <------------------------
          forEachReachable method in SquareCompute.square() {
                ------------------------->
                     getModelOf(method)
                <------------------------
                add to ComputeReachableGraph
          }
```

The accelerator then walks through the `ComputeReachableGraph` to determine which kernels are referenced.

For each kernel we extract the kernel's entrypoint (again as a Babylon
code model) and create a `KernelReachableGraph`, again by starting
at the kernel entrypoint and closing over all reachable methods (and their code models).

We combine the compute and kernel reachable graphs and place them in a `ComputeContext`.

This is the first arg that is 'seemingly' passed to the compute class. Remember, the compute
entrypoint is just a model of the code we expect to
execute. It may never be executed by the JVM.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+

          forEachReachable kernel in ComputeReachableGraph {
                ------------------------->
                      getModelOf(kernel)
                <------------------------
                add to KernelReachableGraph
          }
          ComputeContext = {ComputeReachableGraph + KernelReachableGraph}

```
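
The "closing over all reachable methods" step can be pictured as an ordinary worklist walk over a call graph. The sketch below is only an illustration of that idea, not HAT's implementation: `ReachableSketch`, `reachableFrom`, and `squaresCallGraph` are hypothetical names, methods are represented by plain strings, and the call graph is written by hand rather than derived from Babylon code models.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReachableSketch {
    /** Collects the root plus every method transitively called from it, in discovery order. */
    static Set<String> reachableFrom(String root, Map<String, List<String>> calls) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            String method = work.pop();
            if (seen.add(method)) { // first visit; this is where getModelOf(method) would happen
                for (String callee : calls.getOrDefault(method, List.of())) {
                    work.push(callee);
                }
            }
        }
        return seen;
    }

    /** Hand-written (assumed) call graph for the Squares example. */
    static Map<String, List<String>> squaresCallGraph() {
        return Map.of(
                "square(ComputeContext, S32Array)", List.of(),    // calls no other compute methods
                "squareKernel", List.of("square(int)"),
                "square(int)", List.of());
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = squaresCallGraph();
        System.out.println("compute graph: " + reachableFrom("square(ComputeContext, S32Array)", calls));
        System.out.println("kernel graph:  " + reachableFrom("squareKernel", calls));
    }
}
```

In this model the compute graph contains only its root, matching the trivial case described above, while the kernel graph rooted at `squareKernel` also picks up `square(int)`. The `seen` set is what keeps the walk from looping on recursive or mutually recursive methods.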

The accelerator passes the `ComputeContext` to the backend (`computeContextHandoff()`), which will typically take
the opportunity to inspect or mutate the compute and kernel models, and possibly build backend-specific representations of
the kernels and compile them.

The `ComputeContext` and the captured args are then passed to the backend for execution.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+


                ----------------------------------->
                    computeContextHandoff(CLWrapComputeContext)
                                                    ------->
                                                             ------->
                                                         compileKernels()
                                                             <------
                                                      mutateComputeModels
                                                    <-------
                    dispatchCompute(CLWrapComputeContext, args)
                                                    ------->
                                                        dispatchCompute(...)
                                                            --------->
                                                               {
                                                               dispatchKernel()
                                                               ...
                                                               }
                                                            <--------
                                                    <------
                <----------------------------------

```

----
### Notes

In reality, the accelerator receives a `Compute`:

```java
public interface Compute extends Consumer<ComputeContext> {
}
```

Here is how we extract the 'target' from such a lambda:

```java
public void compute(Compute compute) {
    Quoted<JavaOp.LambdaOp> quoted = Op.ofLambda(compute).orElseThrow();
    JavaOp.LambdaOp lambda = quoted.op();
    Method method = getTargetInvoke(this.lookup, lambda, ComputeContext.class).resolveMethodOrThrow();
    // Create (or get cached) a compute context which closes over the compute entrypoint and reachable kernels.
    // The models of all compute and kernel methods are passed to the backend during creation.
    // The backend may well mutate the models.
    // It will also use this opportunity to generate ISA-specific code for the kernels.
    ComputeContext computeContext = cache.computeIfAbsent(method, (_) -> new ComputeContext(this, method));
    // Here we get the captured values from the lambda.
    Object[] args = lambda(lookup, lambda).getQuotedCapturedValues(quoted, method);
    args[0] = computeContext;
    // Now ask the backend to execute.
    backend.dispatchCompute(computeContext, args);
}
```
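
One consequence of the `cache.computeIfAbsent(...)` call is that the expensive work (building the context and handing the models to the backend) happens once per entrypoint, while dispatch happens on every call. The following plain-Java sketch models just that behaviour under simplifying assumptions: `DispatchSketch`, `Context`, and `LoggingBackend` are hypothetical stand-ins, the entrypoint is identified by a string instead of a `Method`, and the captured values are passed in directly rather than extracted from a quoted lambda.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DispatchSketch {
    /** Hypothetical stand-in for ComputeContext: only remembers its entrypoint. */
    record Context(String entrypoint) {}

    /** Hypothetical stand-in for a backend: it only logs what it is asked to do. */
    static final class LoggingBackend {
        final List<String> log = new ArrayList<>();

        void computeContextHandoff(Context context) {
            log.add("handoff " + context.entrypoint());
        }

        void dispatchCompute(Context context, Object[] args) {
            log.add("dispatch " + context.entrypoint() + " with " + args.length + " args");
        }
    }

    final LoggingBackend backend = new LoggingBackend();
    private final Map<String, Context> cache = new HashMap<>();

    /** 'entrypoint' and 'captured' stand in for what the real code extracts from the lambda's model. */
    void compute(String entrypoint, Object... captured) {
        Context context = cache.computeIfAbsent(entrypoint, name -> {
            Context created = new Context(name);
            backend.computeContextHandoff(created); // only on first use of this entrypoint
            return created;
        });
        Object[] args = new Object[captured.length + 1];
        args[0] = context;                          // slot 0 plays the role of the 'cc' parameter
        System.arraycopy(captured, 0, args, 1, captured.length);
        backend.dispatchCompute(context, args);
    }

    public static void main(String[] args) {
        DispatchSketch accelerator = new DispatchSketch();
        accelerator.compute("SquareCompute.square", "s32Array");
        accelerator.compute("SquareCompute.square", "s32Array");
        accelerator.backend.log.forEach(System.out::println);
    }
}
```

Running `main` logs one handoff followed by two dispatches, which is the pattern the diagram above describes: the handoff (and any kernel compilation) is paid once, and later calls with the same entrypoint reuse the cached context.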