# What happens when we call accelerator.compute(lambda)

----
* [Contents](hat-00.md)
* Build Babylon and HAT
    * [Quick Install](hat-01-quick-install.md)
    * [Building Babylon with jtreg](hat-01-02-building-babylon.md)
    * [Building HAT with jtreg](hat-01-03-building-hat.md)
        * [Enabling the NVIDIA CUDA Backend](hat-01-05-building-hat-for-cuda.md)
* [Testing Framework](hat-02-testing-framework.md)
* [Running Examples](hat-03-examples.md)
* [HAT Programming Model](hat-03-programming-model.md)
* Interface Mapping
    * [Interface Mapping Overview](hat-04-01-interface-mapping.md)
    * [Cascade Interface Mapping](hat-04-02-cascade-interface-mapping.md)
* Development
    * [Project Layout](hat-01-01-project-layout.md)
* Implementation Details
    * [Walkthrough Of Accelerator.compute()](hat-accelerator-compute.md)
    * [How we minimize buffer transfers](hat-minimizing-buffer-transfers.md)
* [Running HAT with Docker on NVIDIA GPUs](hat-07-docker-build-nvidia.md)
---

## Back to our Squares example

So what is going on here?

```java
  accelerator.compute(
     cc -> SquareCompute.square(cc, s32Array)
  );
```

Recall that we have two types of code in our compute class: kernels (and kernel-reachable methods), and
compute entrypoints (and compute-reachable methods).

```java
public class SquareCompute {
    @Reflect public static int square(int v) {
        return v * v;
    }

    @Reflect public static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);     // arr[kc.x]
        s32Array.array(kc.x, square(value));  // arr[kc.x] = value*value
    }

    @Reflect public static void square(ComputeContext cc, S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(),
                kc -> squareKernel(kc, s32Array)
        );
    }
}
```
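On the accelerator the kernel invocations may run in parallel, but semantically the dispatch above behaves like a
loop over the index range `0..length-1` with `kc.x` as the index. Here is a plain-Java sketch of what it computes,
modelling the buffer as an `int[]` rather than HAT's `S32Array`:

```java
import java.util.Arrays;

// A plain-Java sketch of what the dispatch above computes: conceptually,
// dispatchKernel runs squareKernel once per index 0..length-1, with kc.x as
// the index. The buffer is modelled here as an int[] rather than HAT's S32Array.
public class SquareSketch {
    static int square(int v) {
        return v * v;
    }

    static int[] squareAll(int[] arr) {
        for (int x = 0; x < arr.length; x++) { // one "kernel invocation" per index kc.x
            arr[x] = square(arr[x]);           // arr[kc.x] = arr[kc.x] * arr[kc.x]
        }
        return arr;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(squareAll(new int[]{1, 2, 3, 4}))); // [1, 4, 9, 16]
    }
}
```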

Again, note that we cannot just call the compute entrypoint or the kernel directly.

```java
  SquareCompute.square(????, s32Array);  // We can't do this!!!!
```

We purposely make it inconvenient (ComputeContext and KernelContext construction is embedded in the framework) to
mistakenly call the compute entrypoint directly. Doing so is akin to calling `Thread.run()` directly, rather than
calling `Thread.start()` on a class extending `Thread` and providing an implementation of `Thread.run()`.
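To make the analogy concrete, here is a small plain-Java illustration (nothing HAT-specific): `run()` executes the
body inline on the calling thread, while `start()` hands the body to the threading framework to schedule.

```java
// Plain-Java illustration of the Thread analogy above (nothing HAT-specific):
// run() executes the body inline on the calling thread, while start() hands
// it to the threading framework to run on the new thread.
public class RunVsStart {
    static String runOn() {
        final String[] name = new String[1];
        Thread t = new Thread(() -> name[0] = Thread.currentThread().getName());
        t.run();   // wrong: executes inline, like calling the compute entrypoint directly
        return name[0];
    }

    static String startOn() {
        final String[] name = new String[1];
        Thread t = new Thread(() -> name[0] = Thread.currentThread().getName());
        t.start(); // right: the framework dispatches it on the new thread
        try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        return name[0];
    }

    public static void main(String[] args) {
        System.out.println("run()   executed on: " + runOn());   // the calling thread
        System.out.println("start() executed on: " + startOn()); // a new thread
    }
}
```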

Instead we use this pattern:

```java
  accelerator.compute(
     cc -> SquareCompute.square(cc, s32Array)
  );
```

We pass a lambda to `accelerator.compute()`, which is used to determine which compute method to invoke.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
    +--------> accelerator.compute(lambda)

```

Incidentally, this lambda is never executed by the JVM ;) Instead, the accelerator uses Babylon's Code Reflection
capabilities to extract the model of this lambda to determine the compute entrypoint and its captured args.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
    +--------> accelerator.compute( cc -> SquareCompute.square(cc, s32Array) )
                ------------------------->
                    getModelOf(lambda)
                <------------------------
```

This model describes the call that we want the accelerator to
execute or interpret (`SquareCompute.square()`) and the args that were captured from the call site (the `s32Array` buffer).

The accelerator uses Babylon again to get the code model of `SquareCompute.square()` and builds a
ComputeReachableGraph with this method at the root: it walks the code model and collects the methods
(and code models) of all methods reachable from the entrypoint.

In our trivial case, the ComputeReachableGraph has a single root node representing `SquareCompute.square()`.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
    +--------> accelerator.compute( cc -> SquareCompute.square(cc, s32Array) )
                ------------------------->
                     getModelOf(lambda)
                <------------------------
                ------------------------->
                     getModelOf(SquareCompute.square())
                <-------------------------
          forEachReachable method in SquareCompute.square() {
                ------------------------->
                     getModelOf(method)
                <------------------------
                add to ComputeReachableGraph
          }
```

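The closure step in the sequence above can be sketched as a simple worklist traversal. The `String` method names
and the callee map here are hypothetical stand-ins for the code models HAT actually walks, not HAT's real classes:

```java
import java.util.*;

// A sketch of closing over reachable methods: start at the entrypoint and
// repeatedly pull a method off a worklist, adding any callees not yet seen.
// The String names and the callee map are hypothetical stand-ins for the
// code models HAT walks; they are not HAT's actual classes.
public class ReachableGraph {
    static Set<String> reachableFrom(String root, Map<String, List<String>> callees) {
        Set<String> seen = new LinkedHashSet<>(); // the "reachable graph" node set
        Deque<String> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            String m = work.pop();
            if (seen.add(m)) {                    // first visit: enqueue its callees
                work.addAll(callees.getOrDefault(m, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // squareKernel() calls the scalar square(v) helper; square(v) calls nothing
        Map<String, List<String>> callees = Map.of(
                "squareKernel", List.of("square(v)"),
                "square(v)", List.of()
        );
        System.out.println(reachableFrom("squareKernel", callees));
    }
}
```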
The Accelerator then walks through the ComputeReachableGraph to determine which kernels are referenced.

For each kernel we extract the kernel's entrypoint (again as a Babylon Code Model) and create a
KernelReachableGraph, again by starting at the kernel entrypoint and closing over all reachable
methods (and Code Models).

We combine the compute and kernel reachable graphs and place them in a `ComputeContext`.

This is the first arg that is 'seemingly' passed to the Compute class. Remember, the compute
entrypoint is just a model of the code we expect to execute; it may never be executed by the JVM.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+

          forEachReachable kernel in ComputeReachableGraph {
                ------------------------->
                      getModelOf(kernel)
                <------------------------
                add to KernelReachableGraph
          }
          ComputeContext = {ComputeReachableGraph + KernelReachableGraph}

```

The accelerator passes the ComputeContext to the backend (`computeContextHandoff()`), which will typically take
the opportunity to inspect/mutate the compute and kernel models and possibly build backend-specific representations of
the kernels and compile them.

The ComputeContext and the captured args are then passed to the backend for execution.

```
 User  |  Accelerator  |  Compute  |  Babylon  |        Backend            |
                          Context                 Java     C++     Vendor
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+
|    |   |           |   |       |   |       |   |    |   |   |   |      |
+----+   +-----------+   +-------+   +-------+   +----+   +---+   +------+


                ----------------------------------->
                    computeContextHandoff(CLWrapComputeContext)
                                                    ------->
                                                             ------->
                                                         compileKernels()
                                                             <------
                                                      mutateComputeModels
                                                    <-------
                    dispatchCompute(CLWrapComputeContext, args)
                                                    ------->
                                                        dispatchCompute(...)
                                                            --------->
                                                               {
                                                               dispatchKernel()
                                                               ...
                                                               }
                                                            <--------
                                                    <------
                <----------------------------------

```

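The hand-off sequence above can be sketched as a minimal backend interface. The two method names
(`computeContextHandoff`, `dispatchCompute`) come from the diagrams; the interface shape, the `ComputeContext`
record, and the `LoggingBackend` are illustrative stand-ins, not HAT's actual Backend API:

```java
import java.util.*;

// A sketch of the backend hand-off in the diagram above. The two method names
// come from the text; the interface shape, ComputeContext record and
// LoggingBackend are illustrative, not HAT's actual Backend API.
public class BackendHandoff {
    record ComputeContext(String computeEntrypoint) {}

    interface Backend {
        void computeContextHandoff(ComputeContext cc);            // inspect/mutate models, compile kernels
        void dispatchCompute(ComputeContext cc, Object... args);  // execute the compute entrypoint
    }

    static final List<String> trace = new ArrayList<>();

    static class LoggingBackend implements Backend {
        public void computeContextHandoff(ComputeContext cc) {
            trace.add("handoff:" + cc.computeEntrypoint());
        }
        public void dispatchCompute(ComputeContext cc, Object... args) {
            trace.add("dispatch:" + args.length + " args");
        }
    }

    public static void main(String[] args) {
        Backend backend = new LoggingBackend();
        ComputeContext cc = new ComputeContext("SquareCompute.square");
        backend.computeContextHandoff(cc);                    // once per compute method (cached)
        backend.dispatchCompute(cc, cc, new int[]{1, 2, 3});  // per call; args[0] is the context
        System.out.println(trace);
    }
}
```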
----
### Notes

In reality, the Accelerator receives a `Compute`:

```java
public interface Compute extends Consumer<ComputeContext> {
}
```

Here is how we extract the 'target' from such a lambda:

```java
public void compute(Compute compute) {
    Quoted<JavaOp.LambdaOp> quoted = Op.ofLambda(compute).orElseThrow();
    JavaOp.LambdaOp lambda = quoted.op();
    Method method = getTargetInvoke(this.lookup, lambda, ComputeContext.class).resolveMethodOrThrow();
    // Create (or get cached) a compute context which closes over the compute entrypoint and reachable kernels.
    // The models of all compute and kernel methods are passed to the backend during creation.
    // The backend may well mutate the models.
    // It will also use this opportunity to generate ISA-specific code for the kernels.
    ComputeContext computeContext = cache.computeIfAbsent(method, (_) -> new ComputeContext(this, method));
    // Here we get the captured values from the lambda
    Object[] args = lambda(lookup, lambda).getQuotedCapturedValues(quoted, method);
    args[0] = computeContext;
    // now ask the backend to execute
    backend.dispatchCompute(computeContext, args);
}
```