# What happens when we call accelerator.compute(lambda)

----

* [Contents](hat-00.md)
* House Keeping
    * [Project Layout](hat-01-01-project-layout.md)
    * [Building Babylon](hat-01-02-building-babylon.md)
    * [Building HAT](hat-01-03-building-hat.md)
* Programming Model
    * [Programming Model](hat-03-programming-model.md)
* Interface Mapping
    * [Interface Mapping Overview](hat-04-01-interface-mapping.md)
    * [Cascade Interface Mapping](hat-04-02-cascade-interface-mapping.md)
* Implementation Detail
    * [Walkthrough Of Accelerator.compute()](hat-accelerator-compute.md)

----

# Back to our Squares example

So what is going on here?

```java
accelerator.compute(
    cc -> SquareCompute.square(cc, s32Array)
);
```

Recall that we have two types of code in our compute class: kernels (and kernel-reachable methods) and compute entrypoints (and compute-reachable methods).

```java
public class SquareCompute {
    @CodeReflection
    public static int square(int v) {
        return v * v;
    }

    @CodeReflection
    public static void squareKernel(KernelContext kc, S32Array s32Array) {
        int value = s32Array.array(kc.x);     // value = arr[kc.x]
        s32Array.array(kc.x, square(value));  // arr[kc.x] = value * value
    }

    @CodeReflection
    public static void square(ComputeContext cc, S32Array s32Array) {
        cc.dispatchKernel(s32Array.length(),
            kc -> squareKernel(kc, s32Array)
        );
    }
}
```

AGAIN.... NOTE that we cannot just call the compute entrypoint or the kernel directly.

```java
SquareCompute.square(????, s32Array); // We can't do this!!!!
```

We purposely make it inconvenient (ComputeContext and KernelContext construction is embedded in the framework) to mistakenly call the compute entrypoint directly.
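As a point of comparison only (plain Java, nothing HAT-specific), the work that `squareKernel` performs across the whole range is element-wise squaring, with `kc.x` selecting one element per kernel invocation. A sequential equivalent looks like this:

```java
import java.util.Arrays;

public class SquareReference {
    // Sequential stand-in for the kernel dispatch: the loop index x plays
    // the role of kc.x, and each iteration squares one element in place.
    static void square(int[] arr) {
        for (int x = 0; x < arr.length; x++) {
            arr[x] = arr[x] * arr[x];
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        square(data);
        System.out.println(Arrays.toString(data)); // prints [1, 4, 9, 16]
    }
}
```

On an accelerator the iterations carry no ordering guarantee, which is why each kernel invocation touches only its own element.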
Calling the compute entrypoint directly would be akin to calling `Thread.run()` directly, rather than calling `Thread.start()` on a class that extends `Thread` and provides an implementation of `Thread.run()`.

Instead we use this pattern:

```java
accelerator.compute(
    cc -> SquareCompute.square(cc, s32Array)
);
```

We pass a lambda to `accelerator.compute()`, which is used to determine which compute method to invoke.

```
User  | Accelerator | Compute | Babylon |       Backend       |
                      Context             Java    C++   Vendor
+----+ +-----------+ +-------+ +-------+ +----+ +---+ +------+
|    | |           | |       | |       | |    | |   | |      |
+----+ +-----------+ +-------+ +-------+ +----+ +---+ +------+
  +--------> accelerator.compute(lambda)

```

Incidentally, this lambda is never executed by the JVM ;) Instead, the accelerator uses Babylon's Code Reflection capabilities to extract the code model of the lambda and from it determine the compute entrypoint and its captured args.

```
User  | Accelerator | Compute | Babylon |       Backend       |
                      Context             Java    C++   Vendor
+----+ +-----------+ +-------+ +-------+ +----+ +---+ +------+
|    | |           | |       | |       | |    | |   | |      |
+----+ +-----------+ +-------+ +-------+ +----+ +---+ +------+
  +--------> accelerator.compute( cc -> SquareCompute.square(cc, s32Array) )
             ------------------------->
                 getModelOf(lambda)
             <------------------------
```

This model describes the call that we want the accelerator to execute or interpret (`SquareCompute.square()`) and the args that were captured from the call site (the `s32Array` buffer).

The accelerator uses Babylon again to get the code model of `SquareCompute.square()` and builds a ComputeReachableGraph with this method at the root: it walks the code model and collects the methods (and code models) of all methods reachable from the entrypoint.

In our trivial case, the ComputeReachableGraph has a single root node representing `SquareCompute.square()`.
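The "closing over all reachable methods" step is essentially a transitive closure over a call graph. As a sketch only: here the call graph is a plain `Map` from method name to callee names, whereas in HAT the edges come from walking Babylon code models:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReachableSketch {
    // Worklist-based transitive closure: start at the entrypoint and keep
    // pulling in callees until no new methods appear.
    static Set<String> reachableFrom(String entrypoint, Map<String, List<String>> callees) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>(List.of(entrypoint));
        while (!worklist.isEmpty()) {
            String method = worklist.pop();
            if (seen.add(method)) {                                  // first visit only
                worklist.addAll(callees.getOrDefault(method, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical edges for the Squares example.
        Map<String, List<String>> callees = Map.of(
                "SquareCompute.square(cc,arr)", List.of("ComputeContext.dispatchKernel"),
                "SquareCompute.squareKernel", List.of("SquareCompute.square(int)")
        );
        System.out.println(reachableFrom("SquareCompute.square(cc,arr)", callees));
    }
}
```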
```
User  | Accelerator | Compute | Babylon |       Backend       |
                      Context             Java    C++   Vendor
+----+ +-----------+ +-------+ +-------+ +----+ +---+ +------+
|    | |           | |       | |       | |    | |   | |      |
+----+ +-----------+ +-------+ +-------+ +----+ +---+ +------+
  +--------> accelerator.compute( cc -> SquareCompute.square(cc, s32Array) )
             ------------------------->
                 getModelOf(lambda)
             <------------------------
             ------------------------->
                 getModelOf(SquareCompute.square())
             <-------------------------
             forEachReachable method in SquareCompute.square() {
             ------------------------->
                 getModelOf(method)
             <------------------------
             add to ComputeReachableGraph
             }
```

The accelerator then walks the ComputeReachableGraph to determine which kernels are referenced.

For each referenced kernel we extract the kernel's entrypoint (again as a Babylon code model) and create a KernelReachableGraph, again by starting at the kernel entrypoint and closing over all reachable methods (and code models).

We combine the compute and kernel reachable graphs and place them in a `ComputeContext`.

This ComputeContext is the first arg that is 'seemingly' passed to the compute entrypoint. Remember, the compute entrypoint is just a model of the code we expect to execute; it may never be executed by the JVM.
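The shapes involved can be pictured as plain data types. This is purely illustrative; the real HAT types carry Babylon code models rather than just method names:

```java
import java.util.List;

public class GraphShapes {
    // Illustrative stand-ins for the graphs described above.
    record MethodNode(String name, List<MethodNode> calls) {}
    record ComputeReachableGraph(MethodNode root) {}
    record KernelReachableGraph(MethodNode root) {}
    // ComputeContext = {ComputeReachableGraph + KernelReachableGraph(s)}
    record ComputeContext(ComputeReachableGraph compute, List<KernelReachableGraph> kernels) {}

    public static void main(String[] args) {
        MethodNode squareInt = new MethodNode("SquareCompute.square(int)", List.of());
        MethodNode kernelRoot = new MethodNode("SquareCompute.squareKernel", List.of(squareInt));
        MethodNode computeRoot = new MethodNode("SquareCompute.square(cc,arr)", List.of());

        ComputeContext cc = new ComputeContext(
                new ComputeReachableGraph(computeRoot),
                List.of(new KernelReachableGraph(kernelRoot))
        );
        System.out.println(cc.kernels().size() + " kernel graph(s)"); // prints 1 kernel graph(s)
    }
}
```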
```
User  | Accelerator | Compute | Babylon |       Backend       |
                      Context             Java    C++   Vendor
+----+ +-----------+ +-------+ +-------+ +----+ +---+ +------+
|    | |           | |       | |       | |    | |   | |      |
+----+ +-----------+ +-------+ +-------+ +----+ +---+ +------+

             forEachReachable kernel in ComputeReachableGraph {
             ------------------------->
                 getModelOf(kernel)
             <------------------------
             add to KernelReachableGraph
             }
             ComputeContext = {ComputeReachableGraph + KernelReachableGraph}

```

The accelerator passes the ComputeContext to the backend (`computeContextHandoff()`), which will typically take the opportunity to inspect/mutate the compute and kernel models and possibly build backend-specific representations of the kernels and compile them.

The ComputeContext and the captured args are then passed to the backend for execution.

```
User  | Accelerator | Compute | Babylon |       Backend       |
                      Context             Java    C++   Vendor
+----+ +-----------+ +-------+ +-------+ +----+ +---+ +------+
|    | |           | |       | |       | |    | |   | |      |
+----+ +-----------+ +-------+ +-------+ +----+ +---+ +------+

  ----------------------------------->
      computeContextHandoff(computeContext)
                                      ------->
                                              ------->
                                                  compileKernels()
                                              <------
                                          mutateComputeModels
                                      <-------
  ----------------------------------->
      dispatchCompute(computeContext, args)
                                      ------->
                                          dispatchCompute(...)
                                              --------->
                                                  {
                                                      dispatchKernel()
                                                      ...
                                                  }
                                              <--------
                                      <------
  <----------------------------------

```

----
### Notes

In reality, things are a little more involved.
The Accelerator receives a `QuotableComputeContextConsumer`:

```java
public interface QuotableComputeContextConsumer
        extends Quotable,
                Consumer<ComputeContext> {
}
```

Here is how we extract the 'target' method from such a lambda:

```java
public void compute(QuotableComputeContextConsumer qccc) {
    Quoted quoted = qccc.quoted();
    LambdaOpWrapper lambda = OpTools.wrap((CoreOps.LambdaOp) quoted.op());

    Method method = lambda.getQuotableComputeContextTargetMethod();

    // Get from the cache (or create) a compute context which closes over the compute entrypoint
    // and its reachable kernels.
    // The models of all compute and kernel methods are passed to the backend during creation.
    // The backend may well mutate the models.
    // It will also use this opportunity to generate ISA-specific code for the kernels.
    ComputeContext computeContext = this.cache.computeIfAbsent(method, (_) ->
            new ComputeContext(this /* Accelerator */, method)
    );

    // Here we get the captured args from the Quotable and 'jam' the computeContext into slot[0]
    Object[] args = lambda.getQuotableComputeContextArgs(quoted, method, computeContext);
    this.compute(computeContext, args);
}
```
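The `computeIfAbsent` step above is worth calling out: the expensive work (building the reachable graphs and handing the models to the backend) happens once per compute entrypoint, and later calls reuse the cached context. A stripped-down sketch of just that caching pattern, with all type shapes as stand-ins and a plain `String` key in place of a `Method`:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ContextCacheSketch {
    // Stand-in for HAT's ComputeContext; construction is the "expensive" step
    // (graph building plus backend handoff in the real implementation).
    static final class ComputeContext {
        static int builds = 0;      // counts constructions, for demonstration
        final String method;
        ComputeContext(String method) { this.method = method; builds++; }
    }

    static final Map<String, ComputeContext> cache = new ConcurrentHashMap<>();

    static ComputeContext contextFor(String method) {
        // Build on first use of this entrypoint, then reuse on every later call.
        return cache.computeIfAbsent(method, ComputeContext::new);
    }

    public static void main(String[] args) {
        contextFor("SquareCompute.square");
        contextFor("SquareCompute.square"); // cache hit, no second build
        System.out.println(ComputeContext.builds); // prints 1
    }
}
```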