# What happens when we call accelerator.compute(lambda)

----

* [Contents](hat-00.md)
* House Keeping
    * [Project Layout](hat-01-01-project-layout.md)
    * [Building Babylon](hat-01-02-building-babylon.md)
    * [Building HAT](hat-01-03-building-hat.md)
* Programming Model
    * [Programming Model](hat-03-programming-model.md)
* Interface Mapping
    * [Interface Mapping Overview](hat-04-01-interface-mapping.md)
    * [Cascade Interface Mapping](hat-04-02-cascade-interface-mapping.md)
* Implementation Detail
    * [Walkthrough Of Accelerator.compute()](hat-accelerator-compute.md)

----

# What happens when we call accelerator.compute(lambda)

## Back to our Squares example

So what is going on here?

```java
  accelerator.compute(
     cc -> SquareCompute.square(cc, s32Array)
  );
```

Recall that our compute class contains two kinds of code: kernels (and kernel-reachable methods) and
compute entrypoints (and compute-reachable methods).

```java
public class SquareCompute {
    @CodeReflection public static int square(int v) {
        return v * v;
    }

    @CodeReflection public static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);     // arr[kc.x]
        s32Array.array(kc.x, square(value));  // arr[kc.x] = value * value
    }

    @CodeReflection public static void square(ComputeContext cc, S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(),
                kc -> squareKernel(kc, s32Array)
        );
    }
}
```

Note (again) that we cannot just call the compute entrypoint or the kernel directly.

```java
  SquareCompute.square(????, s32Array);  // We can't do this!
```

We purposely make it inconvenient to call the compute entrypoint directly by mistake (ComputeContext and
KernelContext construction is embedded in the framework). Doing so is akin to calling `Thread.run()` directly,
rather than calling `Thread.start()` on a class extending `Thread` that provides an implementation of `Thread.run()`.
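
For readers who have not hit that `Thread` pitfall, the sketch below is plain JDK code (nothing HAT-specific)
showing why calling the "body" method directly defeats the framework that is supposed to schedule it.

```java
Thread t = new Thread(() -> System.out.println("running on: " + Thread.currentThread().getName()));

t.run();    // Runs the body synchronously on the *current* thread; the threading machinery never gets involved.
t.start();  // Hands the body to the threading machinery, which runs it on a new thread.
```

Calling `SquareCompute.square(...)` yourself would be the equivalent of `t.run()`; `accelerator.compute(...)` is the
equivalent of `t.start()`.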

Instead we use this pattern:

```java
  accelerator.compute(
     cc -> SquareCompute.square(cc, s32Array)
  );
```
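
For context, a fuller call site looks something like the sketch below. This is a sketch only: the `Accelerator`
constructor and the `S32Array.create(...)` factory follow the HAT examples, but names and signatures may differ
between HAT versions.

```java
// Sketch: assumes HAT's Accelerator, Backend, and S32Array types are available.
var accelerator = new Accelerator(java.lang.invoke.MethodHandles.lookup(), Backend.FIRST);

// Allocate and fill an iface-mapped S32Array buffer of 32 ints.
var s32Array = S32Array.create(accelerator, 32);
for (int i = 0; i < s32Array.length(); i++) {
    s32Array.array(i, i);
}

// Hand the compute entrypoint (as a lambda) to the accelerator.
accelerator.compute(cc -> SquareCompute.square(cc, s32Array));
```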

We pass a lambda to `accelerator.compute()`, which the accelerator uses to determine which compute method to invoke.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
    +--------> accelerator.compute(lambda)

```

Incidentally, this lambda is never executed by the JVM. Instead, the accelerator uses Babylon's Code Reflection
capabilities to extract the model of the lambda and so determine the compute entrypoint and its captured args.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
    +--------> accelerator.compute( cc -> SquareCompute.square(cc, s32Array) )
                ------------------------->
                    getModelOf(lambda)
                <------------------------
```

This model describes the call that we want the accelerator to execute or interpret (`SquareCompute.square()`) and
the args that were captured from the call site (the `s32Array` buffer).
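
The mechanism is shown in more detail in the notes at the end of this page; in miniature, the lambda implements
`Quotable`, so the accelerator can ask it for its code model instead of invoking it. A sketch using the same
`Quoted`/`CoreOps.LambdaOp` types that appear in the notes below:

```java
// qccc is the cc -> SquareCompute.square(cc, s32Array) lambda, viewed as a Quotable.
Quoted quoted = qccc.quoted();
CoreOps.LambdaOp lambdaOp = (CoreOps.LambdaOp) quoted.op();  // the lambda's code model
// From this op the accelerator can recover the invoked target (SquareCompute.square)
// and the values captured at the call site (the s32Array buffer).
```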

The accelerator uses Babylon again to get the code model of `SquareCompute.square()` and builds a
ComputeReachableGraph with this method at the root. It does this by walking the code model and collecting the
methods (and their code models) reachable from the entrypoint.

In our trivial case, the ComputeReachableGraph has a single root node representing `SquareCompute.square()`.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
    +--------> accelerator.compute( cc -> SquareCompute.square(cc, s32Array) )
                ------------------------->
                     getModelOf(lambda)
                <------------------------
                ------------------------->
                     getModelOf(SquareCompute.square())
                <-------------------------
          forEachReachable method in SquareCompute.square() {
                ------------------------->
                     getModelOf(method)
                <------------------------
                add to ComputeReachableGraph
          }
```
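
Conceptually the graph is just a closure over reachable methods and their models. The sketch below is illustrative
only: `getModelOf` and `invokedMethodsOf` are hypothetical helpers standing in for the Code Reflection calls, and
the real ComputeReachableGraph holds richer nodes than a plain map.

```java
// Illustrative sketch, not HAT's actual implementation.
// Uses java.lang.reflect.Method and java.util collections.
Map<Method, Object /* code model */> reachable = new LinkedHashMap<>();
Deque<Method> work = new ArrayDeque<>();
work.push(computeEntrypoint);                       // e.g. SquareCompute.square(ComputeContext, S32Array)
while (!work.isEmpty()) {
    Method m = work.pop();
    Object model = getModelOf(m);                   // hypothetical: fetch the method's code model
    reachable.put(m, model);
    for (Method callee : invokedMethodsOf(model)) { // hypothetical: walk the model's invoke ops
        if (!reachable.containsKey(callee)) {
            work.push(callee);
        }
    }
}
```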

The accelerator then walks the ComputeReachableGraph to determine which kernels are referenced.

For each referenced kernel we extract the kernel's entrypoint (again as a Babylon Code Model) and create a
KernelReachableGraph, starting at the kernel entrypoint and closing over all reachable methods (and their Code
Models).

We combine the compute and kernel reachable graphs and place them in a `ComputeContext`.

This is the first arg that is 'seemingly' passed to the compute entrypoint. Remember, the compute entrypoint is
just a model of the code we expect to execute; it may never be executed by the JVM.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+

          forEachReachable kernel in ComputeReachableGraph {
                ------------------------->
                      getModelOf(kernel)
                <------------------------
                add to KernelReachableGraph
          }
          ComputeContext = {ComputeReachableGraph + KernelReachableGraph}

```
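
Viewed purely as data, then, the ComputeContext pairs the compute reachable graph with the kernel reachable graphs
(plus a reference back to the accelerator). The shape below is illustrative only; HAT's real `ComputeContext`
carries considerably more state.

```java
// Illustrative shape, not HAT's actual class.
record ComputeContext(Accelerator accelerator,
                      ComputeReachableGraph computeReachableGraph,
                      List<KernelReachableGraph> kernelReachableGraphs) {
}
```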

The accelerator hands the ComputeContext to the backend (`computeContextHandoff()`), which will typically take the
opportunity to inspect/mutate the compute and kernel models and possibly build backend-specific representations of
the kernels and compile them.

The ComputeContext and the captured args are then passed to the backend for execution.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+


                ----------------------------------->
                    computeContextHandoff(computeContext)
                                                    ------->
                                                             ------->
                                                         compileKernels()
                                                             <------
                                                      mutateComputeModels
                                                    <-------
                    dispatchCompute(computeContext, args)
                                                    ------->
                                                        dispatchCompute(...)
                                                            --------->
                                                               {
                                                               dispatchKernel()
                                                               ...
                                                               }
                                                            <--------
                                                    <------
                <----------------------------------

```
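
The two calls in that diagram outline the contract between the accelerator and a backend. A minimal sketch of that
contract is shown below; treat the names and signatures as a summary of the diagram rather than the actual backend
API, which has more to it.

```java
// Sketch of the accelerator/backend contract implied by the diagram above.
public interface Backend {
    // Receive the ComputeContext once; inspect/mutate the compute and kernel
    // models and (typically) compile backend-specific kernel code.
    void computeContextHandoff(ComputeContext computeContext);

    // Execute (or interpret) the compute entrypoint with the args captured
    // at the original call site.
    void dispatchCompute(ComputeContext computeContext, Object... args);
}
```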

----
### Notes

In reality the accelerator receives a `QuotableComputeContextConsumer`:

```java
public interface QuotableComputeContextConsumer
        extends Quotable,
                Consumer<ComputeContext> {
}
```

Because this interface extends `Quotable`, the lambda carries its own code model, which we can obtain via `quoted()`.
Here is how we extract the 'target' method and captured args from such a lambda:

```java
public void compute(QuotableComputeContextConsumer qccc) {
    Quoted quoted = qccc.quoted();
    LambdaOpWrapper lambda = OpTools.wrap((CoreOps.LambdaOp) quoted.op());

    // The compute entrypoint that the lambda targets (e.g. SquareCompute.square).
    Method method = lambda.getQuotableComputeContextTargetMethod();

    // Get from the cache, or create, a ComputeContext which closes over the compute entrypoint
    // and its reachable kernels.
    // The models of all compute and kernel methods are passed to the backend during creation.
    // The backend may well mutate the models.
    // It will also use this opportunity to generate ISA-specific code for the kernels.
    ComputeContext computeContext = this.cache.computeIfAbsent(method, (_) ->
            new ComputeContext(this /* Accelerator */, method)
    );

    // Here we get the captured args from the Quotable and 'jam' the computeContext into slot[0].
    Object[] args = lambda.getQuotableComputeContextArgs(quoted, method, computeContext);
    this.compute(computeContext, args);
}
```
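
Note that the cache is keyed by the target `Method`, so repeated `accelerator.compute(...)` calls that resolve to
the same compute entrypoint reuse the existing ComputeContext (and whatever kernel code the backend compiled for
it); only the captured args are extracted afresh on each call.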