# Using the OpenCL Intercept Layer for HAT
[Back to Index](../index.md)

The [OpenCL Intercept Layer](https://github.com/intel/opencl-intercept-layer) is a tool that intercepts OpenCL API calls
for debugging and performance analysis. We can use it with several OpenCL platforms, including Intel, NVIDIA, and Apple (macOS).

## How to install the OpenCL Intercept Layer

```bash
git clone https://github.com/intel/opencl-intercept-layer.git
cd opencl-intercept-layer
mkdir build
cd build
# cliprof is optional; we mainly use cliloader
cmake .. -DENABLE_CLIPROF=1
cmake --build .
```

Then add the `opencl-intercept-layer/build/cliloader` directory to your `PATH`:

```bash
export PATH=/path/to/opencl-intercept-layer/build/cliloader:$PATH
```
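
If the `export` line lives in a shell startup file, a small guard can stop the directory from being added more than once. The sketch below is a hypothetical helper: `add_cliloader_to_path` is not part of the Intercept Layer or HAT, and its default directory is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prepend the cliloader build directory to PATH once.
# The default directory below is an assumption; pass your own build path.
add_cliloader_to_path() {
  local dir="${1:-$HOME/opencl-intercept-layer/build/cliloader}"
  if [ ! -d "$dir" ]; then
    echo "cliloader directory not found: $dir" >&2
    return 1
  fi
  # Only prepend if the directory is not already one of the PATH entries.
  case ":$PATH:" in
    *":$dir:"*) ;;
    *) export PATH="$dir:$PATH" ;;
  esac
  # Print the cliloader binary the shell now resolves (fails if none is found).
  command -v cliloader
}
```

Calling `add_cliloader_to_path /path/to/opencl-intercept-layer/build/cliloader` prints the resolved `cliloader` path. This also quickly confirms that the build step produced the binary.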
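
## How to use with HAT

Prefix the `java` command that runs a HAT example with `cliloader`. The `-h` option reports host API timing and `-d` reports device execution timing:
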
```bash
cliloader -d -h \
  java @.ffi-opencl-example tensors.Main --iterations=10 --verbose
```
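
To keep the timing report for later comparison, you can wrap the command so that its output is also written to a file. This is a sketch, not a HAT script. `profile_hat` is a hypothetical name, and the sketch assumes that `cliloader` is on `PATH` and that you run it from the same directory as the command above, where `.ffi-opencl-example` is found.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of HAT): run a HAT example under cliloader with
# device (-d) and host (-h) timing, and save everything it prints to a log file.
profile_hat() (
  # The body runs in a subshell, so pipefail does not leak into the caller.
  set -o pipefail
  log="${1:?usage: profile_hat LOG_FILE MAIN_CLASS [ARGS...]}"
  shift
  # 2>&1 captures the report whether it goes to stdout or stderr;
  # pipefail makes a cliloader failure visible in the exit status.
  cliloader -d -h java @.ffi-opencl-example "$@" 2>&1 | tee "$log"
)
```

For example, `profile_hat hat-profile.log tensors.Main --iterations=10 --verbose` runs the same example as above and leaves a copy of the report in `hat-profile.log`.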

Example output on an Apple M4 Max:

```text
Host Performance Timing Results:

Total Time (ns): 374760223

Function Name, Calls, Time (ns), Time (%), Average (ns), Min (ns), Max (ns)
(device timing overhead), 60, 57423, 0.02%, 957, 0, 4459
iclBuildProgram, 3, 70517666, 18.82%, 23505888, 677208, 61181833
iclCreateBuffer, 10, 20667, 0.01%, 2066, 375, 5666
iclCreateCommandQueue, 1, 45291, 0.01%, 45291, 45291, 45291
iclCreateContext, 1, 448625, 0.12%, 448625, 448625, 448625
iclCreateKernel, 3, 133957582, 35.74%, 44652527, 292833, 133316541
iclCreateProgramWithSource, 3, 43709, 0.01%, 14569, 12667, 16208
iclEnqueueMarkerWithWaitList, 120, 300161, 0.08%, 2501, 125, 11166
iclEnqueueNDRangeKernel( mxmNaiveF16 ), 10, 9917, 0.00%, 991, 542, 1750
iclEnqueueNDRangeKernel( mxmNaiveF32 ), 10, 13252, 0.00%, 1325, 500, 2583
iclEnqueueNDRangeKernel( mxmTensorsCM ), 10, 9624, 0.00%, 962, 583, 1625
iclEnqueueReadBuffer, 30, 163125, 0.04%, 5437, 3834, 9667
iclEnqueueWriteBuffer, 90, 683671, 0.18%, 7596, 542, 70291
iclGetDeviceIDs, 2, 24621167, 6.57%, 12310583, 250, 24620917
iclGetDeviceInfo, 660, 38704, 0.01%, 58, 0, 875
iclGetPlatformIDs, 2, 83, 0.00%, 41, 41, 42
iclGetPlatformInfo, 180, 338296, 0.09%, 1879, 0, 330709
iclGetProgramBuildInfo, 9, 5959, 0.00%, 662, 42, 2333
iclReleaseEvent, 270, 45167, 0.01%, 167, 41, 1125
iclSetKernelArg, 150, 25224, 0.01%, 168, 41, 750
iclWaitForEvents, 60, 143414910, 38.27%, 2390248, 15166, 20305042

Device Performance Timing Results for Apple M4 Max (40CUs, 1000MHz):

Total Time (ns): 3174486

Function Name, Calls, Time (ns), Time (%), Average (ns), Min (ns), Max (ns)
iclEnqueueReadBuffer, 30, 48206, 1.52%, 1606, 758, 6029
iclEnqueueWriteBuffer, 90, 90860, 2.86%, 1009, 53, 9861
mxmNaiveF16, 10, 906729, 28.56%, 90672, 89916, 96693
mxmNaiveF32, 10, 1675136, 52.77%, 167513, 98113, 382520
mxmTensorsCM, 10, 453555, 14.29%, 45355, 38614, 46225
```
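
Each table is comma-separated. `Time (%)` is a row's share of that table's total, and `Average` is `Time / Calls` rounded down. For instance, the five device rows add up to the reported 3174486 ns, and mxmNaiveF32 accounts for 1675136 / 3174486 ≈ 52.77% of it. Because the format is this regular, you can pull the kernel rows out of a saved report with `awk`. The sketch below makes two assumptions. First, it expects the layout shown above. Second, it treats rows named like OpenCL API calls (`clXxx`, or `iclXxx` as in this output) as API or transfer rows rather than kernels. `kernel_summary` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: summarize the kernel rows of a saved cliloader report read from stdin.
# Assumes the ", "-separated layout shown above; rows named like OpenCL API
# calls (clXxx or iclXxx) are treated as API/transfer rows and skipped.
kernel_summary() {
  local tab
  tab=$(printf '\t')
  printf 'kernel\tcalls\ttotal_us\tavg_us\n'
  LC_ALL=C awk -F', ' '
    # Start matching only after the device timing header.
    /^Device Performance Timing Results/ { in_device = 1; next }
    in_device && NF == 7 && $1 != "Function Name" && $1 !~ /^i?cl[A-Z]/ {
      # $2 = calls, $3 = total time (ns), $5 = average time (ns)
      printf "%s\t%d\t%.1f\t%.1f\n", $1, $2, $3 / 1000, $5 / 1000
    }
  ' | LC_ALL=C sort -t "$tab" -k3,3nr
}
```

Suppose the sample output above is saved to `hat-profile.log`, for example by appending `2>&1 | tee hat-profile.log` to the `cliloader` command. Then `kernel_summary < hat-profile.log` lists mxmNaiveF32, mxmNaiveF16, and mxmTensorsCM in that order, with average times of 167.5, 90.7, and 45.4 µs.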
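
## How to use with Chrome Tracing

Adding the `--chrome-*` options makes `cliloader` also record the OpenCL calls and device activity in a JSON trace that Chrome can display:
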
```bash
cliloader -d -h \
  --chrome-call-logging \
  --chrome-device-timeline \
  --chrome-kernel-timeline \
  --chrome-device-stages \
  java @.ffi-opencl-example tensors.Main --iterations=10 --verbose
```

The same functionality can be achieved by invoking the `scripts/cliloader-chrome-opencl.bash` script:

```bash
bash scripts/cliloader-opencl.bash tensors.Main --iterations=10 --verbose
```

Then open Chrome and go to `chrome://tracing`.

Load the trace file (usually named `CLIntercept_Trace.json`), which `cliloader` stores in its default dump directory.

To find the default dump directory, run `cliloader | grep dump-dir -A 3`.
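
The trace may sit in a subdirectory of the dump directory, so searching for it can be quicker than browsing. The helper below is a hypothetical sketch, not part of `cliloader`. It searches case-insensitively in case the file name's capitalization differs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list Chrome trace files written by cliloader under a
# dump directory, searching subdirectories and ignoring filename case.
find_chrome_traces() {
  local dir="${1:?usage: find_chrome_traces DUMP_DIR}"
  local found
  found=$(find "$dir" -type f -iname 'CLIntercept_Trace.json')
  if [ -z "$found" ]; then
    echo "no CLIntercept_Trace.json found under $dir" >&2
    return 1
  fi
  printf '%s\n' "$found"
}
```

For example, `find_chrome_traces /path/to/dump-dir` prints the path to pass to the **Load** button in `chrome://tracing`. Here `/path/to/dump-dir` is the location reported by the `grep` command above.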
## Documentation
- [OpenCL Intercept Layer documentation](https://github.com/intel/opencl-intercept-layer/tree/main/docs)