## State of foreign memory support

**December 2023**

**Maurizio Cimadamore**

A crucial part of any native interop story lies in the ability to access off-heap memory efficiently and safely. Java achieves this goal through the Foreign Function & Memory API (FFM API in short), parts of which have been available as an [incubating](https://openjdk.java.net/jeps/11) API since Java [14](https://openjdk.java.net/jeps/370). The FFM API introduces abstractions to allocate and access flat memory regions (whether on- or off-heap), to manage the lifecycle of memory resources and to model native memory addresses.

### Memory segments and arenas

Memory segments are abstractions which can be used to model contiguous memory regions, located either on the Java heap (i.e. *heap segments*) or off the Java heap (i.e. *native segments*). Memory segments provide *strong* spatial, temporal and thread-confinement guarantees which make memory dereference operations *safe* (more on that later), although in most simple cases some properties of memory segments can safely be ignored.

For instance, the following snippet allocates 100 bytes off-heap:

```java
MemorySegment segment = Arena.global().allocate(100);
```

The above code allocates a 100-byte memory segment, using an *arena*. The FFM API provides several kinds of arenas, which can be used to control the lifecycle of the allocated native segments in different ways. In this example, the segment is allocated with the *global* arena. Memory segments allocated with this arena are always *alive* and their backing regions of memory are never deallocated. In other words, we say that the above segment has an *unbounded* lifetime.

> Note: the lifetime of a memory segment is modelled by a *scope* (see `MemorySegment.Scope`). A memory segment can be accessed as long as its associated scope is *alive* (see `Scope::isAlive`). In most cases, the scope of a memory segment is the scope of the arena which allocated that segment. Accessing the scope of a segment can be useful to perform lifetime queries (e.g. asking whether a segment has the same lifetime as that of another segment), to create custom arenas, and to unsafely assign new temporal bounds to an existing native memory segment (these topics are explored in more detail below).

Most programs, though, require off-heap memory to be deallocated while the program is running, and thus need memory segments with *bounded* lifetimes. The simplest way to obtain a segment with a bounded lifetime is to use an *automatic arena*:

```java
MemorySegment segment = Arena.ofAuto().allocate(100);
```

Segments allocated with an automatic arena are alive as long as they are determined to be reachable by the garbage collector. In other words, the above snippet creates a native segment whose behavior closely matches that of a `ByteBuffer` allocated with the `allocateDirect` factory.

There are cases, however, where automatic deallocation is not enough: consider the case where a large memory segment is mapped from a file (this is possible using `FileChannel::map`); in this case, an application would probably prefer to release (e.g. `unmap`) the memory associated with this segment in a *deterministic* fashion, to ensure that the memory doesn't remain available for longer than it needs to.

A *confined* arena allocates segments featuring a bounded *and* deterministic lifetime. A memory segment allocated with a confined arena is alive from the time when the arena is opened, until the time when the arena is closed (at which point the segments become inaccessible). Multiple segments allocated with the same arena enjoy the *same* bounded lifetime and can safely contain mutual references. For example, this code opens an arena and uses it to allocate several native segments:

```java
try (Arena arena = Arena.ofConfined()) {
    MemorySegment segment1 = arena.allocate(100);
    MemorySegment segment2 = arena.allocate(100);
    ...
    MemorySegment segmentN = arena.allocate(100);
} // all segments are deallocated here
```

When the arena is closed (above, this is done with the *try-with-resources* construct) the arena is no longer alive, all the segments associated with it are invalidated atomically, and the regions of memory backing the segments are deallocated.
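To make the temporal guarantees concrete, here is a minimal illustrative sketch (not part of the original example) showing what happens when a segment is accessed after its confined arena has been closed; it uses `Scope::isAlive`, mentioned in the note above, and a value layout (`ValueLayout.JAVA_INT`), which is covered in a later section:

```java
MemorySegment segment;
try (Arena arena = Arena.ofConfined()) {
    segment = arena.allocate(100);
    // ... use the segment while the arena is alive ...
}
// the arena has been closed: the segment's scope is no longer alive
boolean alive = segment.scope().isAlive(); // false
segment.get(ValueLayout.JAVA_INT, 0);      // throws IllegalStateException
```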
A confined arena's deterministic lifetime comes at a price: only one thread can access the memory segments allocated in a confined arena. If multiple threads need access to a segment, then a *shared* arena can be used (`Arena::ofShared`). The memory segments allocated in a shared arena can be accessed by multiple threads, and any thread (regardless of whether it was involved in access) can close the shared arena to deallocate the segments. Closing the arena atomically invalidates the segments, though deallocation of the regions of memory backing the segments might not occur immediately: an expensive synchronization operation<a href="#1"><sup>1</sup></a> is needed to detect and cancel pending concurrent access operations on the segments.

In summary, an arena controls *which* threads can access a memory segment and *when*, in order to provide both strong temporal safety and a predictable performance model. The FFM API offers a choice of arenas so that a client can trade off breadth-of-access against timeliness of deallocation.

### Slicing segments

Memory segments support *slicing* — that is, given a segment, it is possible to create a new segment whose spatial bounds are stricter than those of the original segment:

```java
MemorySegment segment = Arena.ofAuto().allocate(10);
MemorySegment slice = segment.asSlice(4, 4);
```

The above code creates a slice that starts at offset 4 and has a length of 4 bytes. Slices have the *same* temporal bounds (i.e. segment scope) as the parent segment. In the above example, the memory associated with the parent segment will not be released as long as there is at least one *reachable* slice derived from that segment.

To process the contents of a memory segment in bulk, a memory segment can be turned into a stream of slices, using the `MemorySegment::elements` method:

```java
SequenceLayout seq = MemoryLayout.sequenceLayout(1_000_000, JAVA_INT);
SequenceLayout bulk_element = MemoryLayout.sequenceLayout(100, JAVA_INT);

try (Arena arena = Arena.ofShared()) {
    MemorySegment segment = arena.allocate(seq);
    int sum = segment.elements(bulk_element).parallel()
                     .mapToInt(slice -> {
                         int res = 0;
                         for (int i = 0; i < 100 ; i++) {
                             res += slice.getAtIndex(JAVA_INT, i);
                         }
                         return res;
                     }).sum();
}
```

The `MemorySegment::elements` method takes an element layout and returns a new stream. The stream is built on top of a spliterator instance (see `MemorySegment::spliterator`) which splits the segment into chunks whose size matches that of the provided layout. Here, we want to sum the elements of an array which contains a million elements; doing a parallel sum where each computation processes *exactly* one element would be inefficient, so instead we use a *bulk* element layout. The bulk element layout is a sequence layout containing a group of 100 elements — which should make it more amenable to parallel processing. Since we are using `Stream::parallel` to work on disjoint slices in parallel, we use a *shared* arena here, to ensure that the resulting segment can be accessed by multiple threads.
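For comparison, here is a hedged sketch of the simpler sequential alternative (assuming the same `segment` as above, still within the arena's scope): each computation processes exactly one element, so no bulk layout is needed, but no parallelism is gained either:

```java
// sequential variant: one slice per element; simpler, but no parallelism
int sum = segment.elements(JAVA_INT)
                 .mapToInt(slice -> slice.get(JAVA_INT, 0))
                 .sum();
```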
### Accessing segments

Memory segments can be dereferenced easily, using *value layouts* (layouts are covered in greater detail in the next section). A value layout captures information such as:

- The number of bytes to be dereferenced;
- The alignment constraints of the address at which dereference occurs;
- The endianness with which bytes are stored in said memory region;
- The Java type to be used in the dereference operation (e.g. `int` vs `float`).

For instance, the layout constant `ValueLayout.JAVA_INT` is four bytes wide, is aligned on 4-byte boundaries, uses the native platform endianness (e.g. little-endian on Linux/x64) and is associated with the Java type `int`. The following example reads pairs of 32-bit values (as Java ints) and uses them to construct an array of points:

```java
record Point(int x, int y) {}
MemorySegment segment = Arena.ofAuto().allocate(10 * 4 * 2);
Point[] values = new Point[10];
for (int i = 0 ; i < values.length ; i++) {
    int x = segment.getAtIndex(JAVA_INT, i * 2);
    int y = segment.getAtIndex(JAVA_INT, (i * 2) + 1);
    values[i] = new Point(x, y);
}
```

The above snippet allocates a flat array of 80 bytes using an automatic arena. Then, inside the loop, elements in the array are accessed using the `MemorySegment::getAtIndex` method, which accesses `int` elements in a segment at a certain *logical* index (under the hood, the segment offset being accessed is obtained by multiplying the logical index by 4, which is the stride of a Java `int` array). Thus, all the `x` and `y` coordinates are collected into instances of a `Point` record.

### Structured access

Expressing byte offsets (as in the example above) can lead to code that is hard to read and very fragile — as memory layout invariants are captured, implicitly, in the constants used to scale offsets. To address this issue, clients can use a `MemoryLayout` to describe the contents of a memory segment *programmatically*. For instance, the layout of the array used in the above example can be expressed using the following code <a href="#2"><sup>2</sup></a>:

```java
MemoryLayout points = MemoryLayout.sequenceLayout(10,
    MemoryLayout.structLayout(
        JAVA_INT.withName("x"),
        JAVA_INT.withName("y")
    )
);
```

That is, our layout is a repetition of 10 *struct* elements, each containing two 32-bit values.
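As a quick illustrative check (not part of the original example), the size and alignment described by this layout can be queried directly; the total size matches the 80 bytes we allocated by hand above:

```java
long size = points.byteSize();           // 80: 10 elements, 8 bytes per (x, y) pair
long alignment = points.byteAlignment(); // 4: the alignment constraint of the int members
```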
Once defined, a memory layout can be queried — for instance, we can compute the offset of the `y` coordinate in the 4th element of the `points` array:

```java
long y3 = points.byteOffset(PathElement.sequenceElement(3), PathElement.groupElement("y")); // 28
```

To specify which nested layout element should be used for the offset calculation we use a *layout path* — a selection expression that navigates the layout, from the *root* layout, down to the leaf layout we wish to select; in this case we need to select the 4th layout element in the sequence, and then select the layout named `y` inside the selected group layout.

One of the things that can be derived from a layout is a *memory access var handle*. A memory access var handle is a special kind of var handle which takes a memory segment as an access coordinate, together with a byte offset — the offset, relative to the segment's base address, at which the dereference operation should occur. With memory access var handles we can rewrite our example above as follows:

```java
MemorySegment segment = Arena.ofAuto().allocate(points);
VarHandle xHandle = points.varHandle(PathElement.sequenceElement(), PathElement.groupElement("x"));
VarHandle yHandle = points.varHandle(PathElement.sequenceElement(), PathElement.groupElement("y"));
Point[] values = new Point[10];
for (int i = 0 ; i < values.length ; i++) {
    int x = (int)xHandle.get(segment, 0L /* base offset */, (long)i /* index */);
    int y = (int)yHandle.get(segment, 0L /* base offset */, (long)i /* index */);
    values[i] = new Point(x, y);
}
```

In the above, `xHandle` and `yHandle` are two var handle instances whose type is `int` and which take three access coordinates:

1. a `MemorySegment` instance: the segment whose memory should be dereferenced;
2. a *base offset*, which indicates the portion of the memory segment to be accessed; this is typically set to zero (as above), but can be useful when combining memory access var handles (see below);
3. a *logical* index, which is used to select the element of the sequence we want to access (as the layout path used to construct these var handles contains one free dimension).

In other words, the offsets at which `xHandle` and `yHandle` access memory can be expressed as follows (each point struct is 8 bytes long, and `y` is located 4 bytes into it):

```java
xOffset = baseOffset + (index * 8);      // 8 = byte size of one point struct
yOffset = baseOffset + (index * 8) + 4;  // 4 = offset of y within the struct
```

Or, equivalently, using the `MemoryLayout::scale` method — which computes `offset + (index * byteSize())` for the layout it is invoked on — where `pointLayout` denotes the nested struct layout describing a single point:

```java
xOffset = pointLayout.scale(baseOffset, index);
yOffset = pointLayout.scale(baseOffset, index) + 4;
```

Note that memory access var handles (as any other var handle) are *strongly* typed; to get maximum efficiency, it is generally necessary to introduce casts to make sure that the access coordinates match the expected types — in this case we have to cast `i` into a `long`; similarly, since the signature-polymorphic method `VarHandle::get` notionally returns `Object`, a cast is necessary to force the right return type for the var handle operation <a href="#3"><sup>3</sup></a>.

In other words, manual offset computation is no longer needed — offsets and strides can in fact be derived from the layout object; note how `yHandle` is able to compute the required offset of the `y` coordinate in the flat array without the need for any error-prone arithmetic computation.
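The same layout-derived var handles can of course also be used to write memory; the following illustrative fragment (not in the original text) fills the flat array through `xHandle` and `yHandle`, reusing the `segment` allocated above:

```java
for (int i = 0 ; i < 10 ; i++) {
    xHandle.set(segment, 0L, (long) i, i);     // sets the x coordinate of the i-th point
    yHandle.set(segment, 0L, (long) i, i * 2); // sets the y coordinate of the i-th point
}
```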
### Combining memory access handles

We have seen in the previous sections how memory access var handles dramatically simplify user code when structured access is involved. While deriving memory access var handles from a layout is the most convenient option, the FFM API also allows clients to create such memory access var handles in a standalone fashion, as demonstrated in the following code:

```java
VarHandle intHandle = JAVA_INT.varHandle(); // (MS, J) -> I
```

The above code creates a memory access var handle which reads/writes `int` values at a certain byte offset in a segment. To create this var handle we have to specify a carrier type — the type we want to use e.g. to extract values from memory — as well as whether any byte swapping should be applied when contents are read from or stored to memory. Additionally, the user might want to impose extra constraints on how memory dereferences should occur; for instance, a client might want to prevent access to misaligned 32-bit values. Of course, all this information can be succinctly derived from the provided value layout (`JAVA_INT` in the above example).

The attentive reader might have noted how the var handles obtained from the sequence layout in the previous section can in fact be derived from the simple memory access var handle we have constructed here. That is, var handles can be adapted and turned into more complex var handles, using var handle *combinators*. Developers familiar with the method handle API know how simpler method handles can be combined into more complex ones using the various combinator methods in the `MethodHandles` class. These methods allow, for instance, inserting (or binding) arguments into a target method handle, filtering return values, permuting arguments and much more.

The FFM API adds a rich set of var handle combinators in the `MethodHandles` class; with these tools, developers can express var handle transformations such as:

* mapping a var handle carrier type into a different one, using an embedding/projection method handle pair
* filtering one or more var handle access coordinates using unary filters
* permuting var handle access coordinates
* binding concrete access coordinates to an existing var handle

Without diving too deep, let's consider how we might want to take a basic memory access handle and turn it into a var handle which dereferences a segment at a specific offset (again using the `points` layout defined previously):

```java
VarHandle intHandle = JAVA_INT.varHandle(); // (MS, J) -> I
long offsetOfY = points.byteOffset(PathElement.sequenceElement(3), PathElement.groupElement("y"));
VarHandle valueHandle = MethodHandles.insertCoordinates(intHandle, 1, offsetOfY); // (MS) -> I
```

We have been able to derive, from a basic memory access var handle, a new var handle that dereferences a segment at a given fixed offset. It is easy to see how other, richer var handles obtained using a memory layout can also be constructed manually using the var handle combinators provided by the FFM API.
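To see the adapted handle in action, here is a small illustrative sketch (assuming a segment allocated with the `points` layout, as in the earlier example): since the byte offset has been bound by the combinator, the segment is the only remaining access coordinate:

```java
MemorySegment segment = Arena.ofAuto().allocate(points); // same layout as before
valueHandle.set(segment, 42);                            // writes the y coordinate of the 4th point (offset 28)
int y = (int) valueHandle.get(segment);                  // 42
```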
### Segment allocators and custom arenas

Memory allocation is often a bottleneck when clients use off-heap memory. The FFM API therefore includes a `SegmentAllocator` interface which defines operations to allocate and initialize memory segments. As a convenience, the `Arena` interface extends the `SegmentAllocator` interface, so that arenas can be used to allocate native segments. In other words, `Arena` is a "one-stop shop" for flexible allocation and timely deallocation of off-heap memory:

```java
FileChannel channel = ...
try (Arena offHeap = Arena.ofConfined()) {
    MemorySegment nativeArray = offHeap.allocateFrom(ValueLayout.JAVA_INT, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
    MemorySegment nativeString = offHeap.allocateFrom("Hello!");

    MemorySegment mappedSegment = channel.map(MapMode.READ_WRITE, 0, 1000, offHeap);
    ...
} // memory released here
```

Segment allocators can also be obtained via factories in the `SegmentAllocator` interface. For example, one factory creates a *slicing allocator* that responds to allocation requests by returning memory segments which are part of a previously allocated segment; thus, many requests can be satisfied without physically allocating more memory. The following code obtains a slicing allocator over an existing segment, then uses it to allocate a segment initialized from a Java array:

```java
MemorySegment segment = ...
SegmentAllocator allocator = SegmentAllocator.slicingAllocator(segment);
for (int i = 0 ; i < 10 ; i++) {
    MemorySegment s = allocator.allocateFrom(JAVA_INT, 1, 2, 3, 4, 5);
    ...
}
```

A segment allocator can be used as a building block to create an arena that supports a custom allocation strategy. For example, if many segments share the same bounded lifetime, then an arena could use a slicing allocator to allocate the segments efficiently. This lets clients enjoy both scalable allocation (thanks to slicing) and deterministic deallocation (thanks to the arena).

As an example, the following code defines a *slicing arena* that behaves like a confined arena (i.e., single-threaded access), but internally uses a slicing allocator to respond to allocation requests. When the slicing arena is closed, the underlying confined arena is also closed; this will invalidate all segments allocated with the slicing arena:

```java
class SlicingArena implements Arena {
    final Arena arena = Arena.ofConfined();
    final SegmentAllocator slicingAllocator;

    SlicingArena(long size) {
        slicingAllocator = SegmentAllocator.slicingAllocator(arena.allocate(size));
    }

    public MemorySegment allocate(long byteSize, long byteAlignment) {
        return slicingAllocator.allocate(byteSize, byteAlignment);
    }

    public MemorySegment.Scope scope() {
        return arena.scope();
    }

    public void close() {
        arena.close();
    }
}
```

The earlier code which used a slicing allocator directly can now be written more succinctly, as follows:

```java
try (Arena slicingArena = new SlicingArena(1000)) {
    for (int i = 0 ; i < 10 ; i++) {
        MemorySegment s = slicingArena.allocateFrom(JAVA_INT, 1, 2, 3, 4, 5);
        ...
    }
} // all memory allocated is released here
```
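As with any arena, segments allocated with the slicing arena share its lifetime; the following minimal sketch (assuming the `SlicingArena` class above) illustrates that closing the slicing arena, and hence the backing confined arena, invalidates them:

```java
MemorySegment s;
try (Arena slicingArena = new SlicingArena(1000)) {
    s = slicingArena.allocateFrom(JAVA_INT, 1, 2, 3, 4, 5);
} // the backing confined arena is closed here
s.get(JAVA_INT, 0); // throws IllegalStateException: the segment is no longer alive
```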
* <a id="1"/>(<sup>1</sup>):<small> Shared arenas rely on VM thread-local handshakes (JEP [312](https://openjdk.java.net/jeps/312)) to implement lock-free, safe, shared memory access; that is, when it comes to memory access, there should be no difference in performance between a shared segment and a confined segment. On the other hand, `Arena::close` might be slower on shared arenas than on confined ones.</small>
* <a id="2"/>(<sup>2</sup>):<small> In general, deriving a complete layout from a C `struct` declaration is no trivial matter, and it's one of those areas where tooling can help greatly.</small>
* <a id="3"/>(<sup>3</sup>):<small> Clients can enforce stricter type checking when interacting with `VarHandle` instances by obtaining an *exact* var handle, using the `VarHandle::withInvokeExactBehavior` method.</small>