# Running HAT with Docker on NVIDIA GPUs

----
* [Contents](hat-00.md)
* Build Babylon and HAT
    * [Quick Install](hat-01-quick-install.md)
    * [Building Babylon with jtreg](hat-01-02-building-babylon.md)
    * [Building HAT with jtreg](hat-01-03-building-hat.md)
        * [Enabling the NVIDIA CUDA Backend](hat-01-05-building-hat-for-cuda.md)
* [Testing Framework](hat-02-testing-framework.md)
* [Running Examples](hat-03-examples.md)
* [HAT Programming Model](hat-03-programming-model.md)
* Interface Mapping
    * [Interface Mapping Overview](hat-04-01-interface-mapping.md)
    * [Cascade Interface Mapping](hat-04-02-cascade-interface-mapping.md)
* Development
    * [Project Layout](hat-01-01-project-layout.md)
* Implementation Details
    * [Walkthrough Of Accelerator.compute()](hat-accelerator-compute.md)
    * [How we minimize buffer transfers](hat-minimizing-buffer-transfers.md)
* [Running HAT with Docker on NVIDIA GPUs](hat-07-docker-build-nvidia.md)
---
 23 
## Setting Up Docker Containers for NVIDIA GPUs

To run Docker containers on NVIDIA GPUs, you first need to install the NVIDIA Container Toolkit.
Follow the instructions [here](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
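After installation, the toolkit typically needs to be registered with Docker as a runtime before `--runtime=nvidia` works. The install guide documents the `nvidia-ctk` commands below; this sketch assumes a systemd-based host where Docker runs as the `docker` service:

```shell
# Register the NVIDIA runtime with Docker (updates /etc/docker/daemon.json)
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker so it picks up the new runtime
sudo systemctl restart docker
```

If Docker is managed differently on your system (e.g. rootless mode), consult the install guide linked above for the matching configuration step.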
 28 
Once the NVIDIA Container Toolkit is installed, you can check access to the GPU by running the `nvidia-smi` tool from an Ubuntu image:

```bash
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```

You should see output similar to this:
 36 
 37 ```bash
 38 +-----------------------------------------------------------------------------------------+
 39 | NVIDIA-SMI 590.48.01              Driver Version: 590.48.01      CUDA Version: 13.1     |
 40 +-----------------------------------------+------------------------+----------------------+
 41 | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
 42 | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
 43 |                                         |                        |               MIG M. |
 44 |=========================================+========================+======================|
 45 |   0  NVIDIA GeForce RTX 5060        Off |   00000000:01:00.0  On |                  N/A |
 46 |  0%   46C    P8             12W /  145W |     578MiB /   8151MiB |      0%      Default |
 47 |                                         |                        |                  N/A |
 48 +-----------------------------------------+------------------------+----------------------+
 49 
 50 +-----------------------------------------------------------------------------------------+
 51 | Processes:                                                                              |
 52 |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
 53 |        ID   ID                                                               Usage      |
 54 |=========================================================================================|
 55 |  No running processes found                                                             |
 56 +-----------------------------------------------------------------------------------------+
 57 ```
 58 
## Dockerfile for HAT Using the NVIDIA SDK

You can use the following `Dockerfile` to build a new Docker image:

```dockerfile
FROM nvidia/cuda:13.0.1-devel-ubuntu24.04

# Build tools and general utilities
RUN apt-get update -q && apt-get install -qy \
        build-essential git cmake vim maven curl bash unzip zip wget

# Install a boot JDK for building Babylon
WORKDIR /opt/babylon/
RUN wget https://download.java.net/java/early_access/jdk26/22/GPL/openjdk-26-ea+22_linux-x64_bin.tar.gz
RUN tar xvzf openjdk-26-ea+22_linux-x64_bin.tar.gz
ENV JAVA_HOME=/opt/babylon/jdk-26/
ENV PATH=$JAVA_HOME/bin:$PATH
RUN java --version

## Build Babylon from source
RUN git clone https://github.com/openjdk/babylon.git
WORKDIR /opt/babylon/babylon

# Native build dependencies for the JDK
RUN apt-get update -y && apt-get install -y \
        autoconf libfreetype6-dev file libasound2-dev libcups2-dev \
        libfontconfig1-dev \
        libx11-dev libxext-dev libxrender-dev libxrandr-dev libxtst-dev libxt-dev

RUN bash configure --with-boot-jdk=${JAVA_HOME}
RUN make clean
RUN make images

# Configure HAT
WORKDIR /opt/babylon/babylon/hat
RUN wget https://download.java.net/java/early_access/jextract/22/6/openjdk-22-jextract+6-47_linux-x64_bin.tar.gz
RUN tar xvzf openjdk-22-jextract+6-47_linux-x64_bin.tar.gz > /dev/null
ENV PATH=/opt/babylon/babylon/hat/jextract-22/bin:$PATH
ENV PATH=/opt/babylon/babylon/build/linux-x86_64-server-release/jdk/bin/:$PATH
ENV JAVA_HOME=/opt/babylon/babylon/build/linux-x86_64-server-release/jdk
# Note: sourcing env.bash in its own RUN layer does not persist into later
# layers; the ENV lines above provide the equivalent settings.
RUN /bin/bash -c "source env.bash"

# Build HAT
RUN java @hat/clean
RUN java @hat/bld

## Expose a volume to pass in files from the local directory
WORKDIR /opt/babylon/babylon/hat/
VOLUME ["/data"]
```
108 
## Build the Image

Run the following command from the directory containing the `Dockerfile` above:

```bash
docker build . -t babylon
```
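Once the build finishes, you can sanity-check the result and explore the container interactively. This is a sketch assuming the `babylon` tag from the build command above:

```shell
# List the freshly built image
docker images babylon

# Open an interactive shell inside the container, with the GPU attached
docker run -it --rm --runtime=nvidia --gpus all babylon bash
```

An interactive shell is useful for inspecting the HAT build under `/opt/babylon/babylon/hat/` before running examples.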
116 
## Running Examples on the NVIDIA GPU

Run the `nvidia-smi` tool from the new image to confirm that the container can reach the GPU:

```bash
docker run -it --rm --runtime=nvidia --gpus all babylon nvidia-smi
```

All set! Now you can run HAT on NVIDIA GPUs.

Run the matrix-multiply example:

```bash
docker run -it --rm --runtime=nvidia --gpus all babylon java -cp hat/job.jar hat.java run ffi-cuda matmul --size=1024 --kernel=2DREGISTERTILING_FP16
```
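The `Dockerfile` above declares a `/data` volume. One way to use it is to bind-mount a host directory onto `/data`, so files exchanged with an example survive the container. This is a sketch; the bind-mount target is the volume declared in the `Dockerfile`, and which files an example reads or writes there is an assumption to adapt to your use case:

```shell
# Bind-mount the current host directory onto the /data volume
docker run -it --rm --runtime=nvidia --gpus all \
    -v "$PWD":/data \
    babylon java -cp hat/job.jar hat.java run ffi-cuda matmul --size=1024 --kernel=2DREGISTERTILING_FP16
```

Because the container is started with `--rm`, anything not written under the bind mount is discarded when the run finishes.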
132 
## Enable Debug Info

Pass `-DHAT=INFO` to enable HAT's informational logging:

```bash
docker run -it --rm --runtime=nvidia --gpus all babylon java -cp hat/job.jar hat.java run ffi-cuda -DHAT=INFO matmul --size=1024 --kernel=2DREGISTERTILING_FP16
```

Expected output:

```bash
[INFO] Input Size     : 1024x1024
[INFO] Check Result:  : false
[INFO] Num Iterations : 100
[INFO] NDRangeConfiguration: 2DREGISTER_TILING_FP16

[INFO] Using NVIDIA GPU: NVIDIA GeForce RTX 5060
[INFO] Dispatching the CUDA kernel
        \_ BlocksPerGrid   = [16,16,1]
        \_ ThreadsPerBlock = [16,16,1]
```