# Running Examples on the GPU

----
* [Contents](hat-00.md)
* Build Babylon and HAT
    * [Quick Install](hat-01-quick-install.md)
    * [Building Babylon with jtreg](hat-01-02-building-babylon.md)
    * [Building HAT with jtreg](hat-01-03-building-hat.md)
        * [Enabling the NVIDIA CUDA Backend](hat-01-05-building-hat-for-cuda.md)
* [Testing Framework](hat-02-testing-framework.md)
* [Running Examples](hat-03-examples.md)
* [HAT Programming Model](hat-03-programming-model.md)
* Interface Mapping
    * [Interface Mapping Overview](hat-04-01-interface-mapping.md)
    * [Cascade Interface Mapping](hat-04-02-cascade-interface-mapping.md)
* Development
    * [Project Layout](hat-01-01-project-layout.md)
    * [IntelliJ Code Formatter](hat-development.md)
* Implementation Details
    * [Walkthrough Of Accelerator.compute()](hat-accelerator-compute.md)
    * [How we minimize buffer transfers](hat-minimizing-buffer-transfers.md)
* [Running HAT with Docker on NVIDIA GPUs](hat-07-docker-build-nvidia.md)
---

The [examples-package]() in HAT contains examples ranging from matrix operations, DFT, and Flash-Attention to nbody, shaders, and image detection.

To run an example:

```bash
java @hat/run ffi-<backend> <example>
```

For instance, to run the `flashattention` example with the OpenCL backend:

```bash
java @hat/run ffi-opencl flashattention
```

For the CUDA backend:

```bash
java @hat/run ffi-cuda flashattention
```

## Options for Examples

Some of the examples accept command-line options to specify the input size, the kernel version, and so on.

For example, `flashattention`:

```bash
java @hat/run ffi-opencl flashattention --size=2048 --iterations=10 --verbose
```

You can see the full list of options by using `--help`:

```bash
    --size=<size>                   Specify an input size
    --iterations=<numIterations>    Specify the number of iterations to perform
    --skip-sequential               Flag to bypass the sequential execution in Java
    --check                         Flag to check the results. This implies the Java sequential version runs.
    --verbose                       Flag to print information between runs (e.g., total time).
    --help                          Print this help.
```
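
Options of the `--key=value` and `--flag` form shown above can be consumed with a simple argument loop. The following is an illustrative sketch of such a parser, not HAT's actual implementation (the class and method names are ours):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative parser for --key=value and bare --flag options.
// This is NOT HAT's actual option parser; it only sketches the convention.
public class ExampleOptions {
    static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("--")) continue;       // ignore non-option arguments
            int eq = arg.indexOf('=');
            if (eq >= 0) {
                // e.g. --size=2048 -> ("size", "2048")
                opts.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                // e.g. --verbose -> ("verbose", "true")
                opts.put(arg.substring(2), "true");
            }
        }
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parse(new String[]{"--size=2048", "--iterations=10", "--verbose"});
        System.out.println(opts.get("size") + " " + opts.get("verbose"));
    }
}
```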

### Obtaining information about the accelerator used to launch the application

You can set the `HAT` environment variable to `INFO` to instruct the HAT runtime to dump the device name and device version used to launch the generated GPU kernel:

```bash
$ HAT=INFO java @hat/run ffi-cuda matmul

[INFO] Input Size     : 1024x1024
[INFO] Check Result:  : false
[INFO] Num Iterations : 100
[INFO] NDRangeConfiguration: 2DTILING

[INFO] Using NVIDIA GPU: NVIDIA GeForce RTX 5060   << an NVIDIA 5060 was used
[INFO] Dispatching the CUDA kernel
        \_ BlocksPerGrid   = [64,64,1]    << Num blocks dispatched
        \_ ThreadsPerBlock = [16,16,1]    << threads-per-block dispatched
```
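
The dispatch shape in this log follows the usual ceiling-division relationship between the global problem size and the threads per block: with a 1024x1024 input and 16x16 threads per block, each dimension needs 1024/16 = 64 blocks. A minimal sketch of that arithmetic (the `blocksPerGrid` helper is illustrative, not part of HAT's API):

```java
// Sketch of the standard grid-shape arithmetic behind the log above:
// blocksPerGrid = ceil(globalSize / threadsPerBlock) in each dimension.
public class GridShape {
    static int blocksPerGrid(int globalSize, int threadsPerBlock) {
        // integer ceiling division, so sizes not divisible by the block
        // size still get enough blocks to cover every element
        return (globalSize + threadsPerBlock - 1) / threadsPerBlock;
    }

    public static void main(String[] args) {
        int size = 1024;  // 1024x1024 input, as reported by [INFO] Input Size
        int tile = 16;    // 16x16 ThreadsPerBlock (2D tiling)
        // prints 64, matching BlocksPerGrid = [64,64,1] in the log
        System.out.println(blocksPerGrid(size, tile));
    }
}
```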

## Running Headless Versions

GUI applications also provide a `headless` version, which can be enabled by passing the `headless` argument:

```bash
java @hat/run headless ffi-opencl mandel
```

This sets `-Dheadless=true` and passes `--headless` to the example.
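
An example can check either signal when deciding whether to open a window. A minimal sketch, assuming only the `-Dheadless=true` property and `--headless` flag described above (the class and method names are ours, not HAT's):

```java
import java.util.Arrays;

// Illustrative headless check; the property and flag names come from the
// launcher convention described above, everything else is hypothetical.
public class HeadlessCheck {
    static boolean isHeadless(String[] args) {
        return Boolean.getBoolean("headless")              // true when -Dheadless=true is set
            || Arrays.asList(args).contains("--headless"); // or when the flag is passed through
    }

    public static void main(String[] args) {
        if (isHeadless(args)) {
            System.out.println("running without a GUI");
        } else {
            System.out.println("launching viewer window");
        }
    }
}
```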
95