## State of foreign memory support

**January 2022**

**Maurizio Cimadamore**

A crucial part of any native interop story lies in the ability to access off-heap memory efficiently and safely. Panama achieves this goal through the Foreign Memory Access API, which has been available as an [incubating](https://openjdk.java.net/jeps/11) API since Java [14](https://openjdk.java.net/jeps/370). The Foreign Memory Access API introduces abstractions to allocate and access flat memory regions (whether on- or off-heap), to manage the lifecycle of memory resources, and to model native memory addresses.

### Segments

Memory segments are abstractions which can be used to model contiguous memory regions, located either on or off the Java heap. Segments can be allocated from native memory (e.g. as with `malloc`), or can be wrapped around existing memory sources (e.g. a Java array or a `ByteBuffer`). Memory segments provide *strong* spatial, temporal and thread-confinement guarantees which make memory dereference operations *safe* (more on that later), although in most simple cases some of these properties can safely be ignored.

For instance, the following snippet allocates 100 bytes off-heap:

```java
MemorySegment segment = MemorySegment.allocateNative(100, ResourceScope.newImplicitScope());
```

The above code allocates a 100-byte memory segment. The lifecycle of a memory segment is controlled by an abstraction called `ResourceScope`. In this example, the segment memory will not be *freed* as long as the segment instance is deemed *reachable*, as specified by the `newImplicitScope()` parameter. In other words, the above factory creates a segment whose behavior closely matches that of a `ByteBuffer` allocated with the `allocateDirect` factory. Of course, the memory access API also supports deterministic memory release; we will cover that in a later section of this document.

Memory segments support *slicing* — that is, given a segment, it is possible to create a new segment whose spatial bounds are stricter than those of the original segment:

```java
MemorySegment segment = MemorySegment.allocateNative(10, ResourceScope.newImplicitScope());
MemorySegment slice = segment.asSlice(4, 4);
```

The above code creates a slice that starts at offset 4 and has a length of 4 bytes. Generally speaking, slices have the *same* temporal bounds as the parent segment (we will refine this concept later in this document). In this example, the memory associated with the parent segment will not be released as long as there is at least one *reachable* slice derived from that segment.

Memory segments can be dereferenced easily by using *value layouts* (layouts are covered in greater detail in the next section). A value layout captures information such as:

- The number of bytes to be dereferenced;
- The alignment constraints of the address at which dereference occurs;
- The endianness with which bytes are stored in said memory region;
- The Java type to be used in the dereference operation (e.g. `int` vs `float`).

For instance, the layout constant `ValueLayout.JAVA_INT` is four bytes wide, has no alignment constraints, uses the native platform endianness (e.g. little-endian on Linux/x64) and is associated with the Java type `int`. The following example reads pairs of 32-bit values (as Java ints) and uses them to construct an array of points:

```java
// JAVA_INT is statically imported from ValueLayout
record Point(int x, int y) { }
MemorySegment segment = MemorySegment.allocateNative(10 * 4 * 2, ResourceScope.newImplicitScope());
Point[] values = new Point[10];
for (int i = 0 ; i < values.length ; i++) {
    int x = segment.getAtIndex(JAVA_INT, i * 2);
    int y = segment.getAtIndex(JAVA_INT, (i * 2) + 1);
    values[i] = new Point(x, y);
}
```

The above snippet allocates a flat array of 80 bytes using `MemorySegment::allocateNative`. Then, inside the loop, elements in the array are accessed using the `MemorySegment::getAtIndex` method, which accesses `int` elements in a segment at a certain *logical* index (under the hood, the segment offset being accessed is obtained by multiplying the logical index by 4, which is the stride of a Java `int` array). Thus, all coordinates `x` and `y` are collected into instances of a `Point` record.
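
To make the relation between logical indices and byte offsets explicit, the loop above can also be written with `MemorySegment::get` and explicit byte offsets — a minimal sketch, reusing the `segment` and `values` variables from the previous snippet:

```java
for (int i = 0 ; i < values.length ; i++) {
    // getAtIndex(JAVA_INT, k) is equivalent to get(JAVA_INT, k * 4), where 4 is the byte size of JAVA_INT
    int x = segment.get(JAVA_INT, (long) (i * 2) * 4);
    int y = segment.get(JAVA_INT, (long) (i * 2 + 1) * 4);
    values[i] = new Point(x, y);
}
```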

Memory segments are pretty flexible when it comes to interacting with existing memory sources and APIs. For instance, it is possible to create a `ByteBuffer` *view* out of an existing memory segment, as follows:

```java
IntBuffer intBuffer = segment.asByteBuffer().asIntBuffer();
Point[] values = new Point[10];
for (int i = 0 ; i < values.length ; i++) {
    int x = intBuffer.get(i * 2);
    int y = intBuffer.get((i * 2) + 1);
    values[i] = new Point(x, y);
}
```

Creating buffer views out of existing segments is a crucial tool enabling interoperability with existing APIs (especially those dealing with I/O) which might be expressed in terms of the `ByteBuffer` API.
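
As an illustration, the following sketch passes a buffer view of a segment to the `FileChannel` API (the file name is made up, and the enclosing method is assumed to declare `throws IOException`):

```java
try (FileChannel channel = FileChannel.open(Path.of("points.bin"),
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
    channel.write(segment.asByteBuffer()); // the buffer view shares its memory with the segment
}
```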

### Layouts and structured access

Expressing byte offsets (as in the example above) can lead to code that is hard to read, and very fragile — as memory layout invariants are captured, implicitly, in the constants used to scale offsets. To address this issue, we add a *memory layout* API which allows clients to define memory layouts *programmatically*. For instance, the layout of the array used in the above example can be expressed using the following code <a href="#1"><sup>1</sup></a>:

```java
MemoryLayout points = MemoryLayout.sequenceLayout(10,
    MemoryLayout.structLayout(
        ValueLayout.JAVA_INT.withName("x"),
        ValueLayout.JAVA_INT.withName("y")
    )
);
```

That is, our layout is a repetition of 10 *struct* elements, each containing two 32-bit values. The advantage of defining a memory layout upfront, using an API, is that we can then query the layout — for instance we can compute the offset of the `y` coordinate in the 4th element of the `points` array:

```java
long y3 = points.byteOffset(PathElement.sequenceElement(3), PathElement.groupElement("y")); // 28
```

To specify which nested layout element should be used for the offset calculation we use a *layout path*, a selection expression that navigates the layout, from the *root* layout, down to the leaf layout we wish to select; in this case we need to select the 4th layout element in the sequence, and then select the layout named `y` inside the selected group layout.

One of the things that can be derived from a layout is a *memory access var handle*. A memory access var handle is a special kind of var handle which takes a memory segment as an access coordinate, together with a byte offset — the offset, relative to the segment's base address, at which the dereference operation should occur. With memory access var handles we can rewrite our example above as follows:

```java
MemorySegment segment = MemorySegment.allocateNative(points, ResourceScope.newImplicitScope());
VarHandle xHandle = points.varHandle(PathElement.sequenceElement(), PathElement.groupElement("x"));
VarHandle yHandle = points.varHandle(PathElement.sequenceElement(), PathElement.groupElement("y"));
Point[] values = new Point[10];
for (int i = 0 ; i < values.length ; i++) {
    int x = (int)xHandle.get(segment, (long)i);
    int y = (int)yHandle.get(segment, (long)i);
    values[i] = new Point(x, y); // collect the coordinates, as in the previous examples
}
```

In the above, `xHandle` and `yHandle` are two var handle instances whose carrier type is `int` and which take two access coordinates:

1. a `MemorySegment` instance; the segment whose memory should be dereferenced
2. a *logical* index, which is used to select the element of the sequence we want to access (as the layout path used to construct these var handles contains one free dimension)

Note that memory access var handles (like any other var handle) are *strongly* typed; to get maximum efficiency, it is generally necessary to introduce casts to make sure that the access coordinates match the expected types — in this case we have to cast `i` into a `long`; similarly, since the signature polymorphic method `VarHandle::get` notionally returns `Object`, a cast is necessary to force the right return type for the var handle operation <a href="#2"><sup>2</sup></a>.

In other words, manual offset computation is no longer needed — offsets and strides can in fact be derived from the layout object; note how `yHandle` is able to compute the required offset of the `y` coordinate in the flat array without the need for any error-prone arithmetic.
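
As footnote 2 below mentions, clients who want stricter type checking can obtain an *exact* var handle; the following sketch shows how, with exact invocation behavior, the casts above become mandatory rather than merely advisable:

```java
VarHandle exactX = xHandle.withInvokeExactBehavior();
int x0 = (int) exactX.get(segment, 0L);    // ok: coordinate types and return type match exactly
// int bad = (int) exactX.get(segment, 0); // fails: 0 is an int literal, but a long coordinate is expected
```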

### Deterministic deallocation

In addition to spatial bounds, memory segments also feature temporal bounds as well as thread-confinement. In the examples shown so far, we have always used the API in its simpler form, leaving the runtime to handle details such as whether it was safe or not to reclaim memory associated with a given memory segment. But there are cases where this behavior is not desirable: consider the case where a large memory segment is mapped from a file (this is possible using `MemorySegment::mapFile`); in this case, an application would probably prefer to deterministically release (e.g. unmap) the memory associated with this segment, to ensure that the memory doesn't remain available for longer than it needs to (which could otherwise impact the performance of the application).

Memory segments support deterministic deallocation, through an abstraction called `ResourceScope`. A resource scope models the lifecycle associated with one or more resources (in this document, by resources we mean mostly memory segments); a resource scope has a state: it starts off in the *alive* state, which means that all the resources it manages can be safely accessed — and, at the user's request, it can be *closed*. After a resource scope is closed, access to resources managed by that scope is no longer allowed. Resource scopes implement the `AutoCloseable` interface, and can therefore be used with the *try-with-resources* construct, as demonstrated in the following code:

```java
try (ResourceScope scope = ResourceScope.newConfinedScope()) {
    MemorySegment mapped = MemorySegment.mapFile(Path.of("someFile"), 0, 100000, MapMode.READ_WRITE, scope);
} // segment is unmapped here
```

Here, we create a new *confined* resource scope, which is then used when creating a mapped segment; this means that the lifecycle of the `mapped` segment is tied to that of the resource scope, and that accessing the segment (e.g. dereferencing it) *after* `scope` has been closed will not be possible.

As this example alludes to, resource scopes come in many flavors: they can be *confined* (where access is restricted to the thread which created the scope), *shared* <a href="#3"><sup>3</sup></a> (where access can occur in any thread), and can optionally be associated with a `Cleaner` object (as in the case of `newImplicitScope`), which performs *implicit* deallocation when the resource scope becomes *unreachable* (if the `close` method has not been called by the user). Resource scopes are very handy when managing the lifecycle of multiple resources:

```java
try (ResourceScope scope = ResourceScope.newConfinedScope()) {
    MemorySegment segment1 = MemorySegment.allocateNative(100, scope);
    MemorySegment segment2 = MemorySegment.allocateNative(100, scope);
    ...
    MemorySegment segmentN = MemorySegment.allocateNative(100, scope);
} // all segments are deallocated here

Here we create another confined scope and then, inside the *try-with-resources*, we use the scope to create many segments; all such segments share the *same* resource scope — meaning that when that scope is closed, the memory associated with all these segments will be reclaimed at once.
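
As mentioned above, a resource scope can also be associated with a `Cleaner`, which acts as a safety net in case the scope is never closed explicitly — a minimal sketch (the allocation size is arbitrary):

```java
ResourceScope cleanedScope = ResourceScope.newSharedScope(Cleaner.create());
MemorySegment segment = MemorySegment.allocateNative(100, cleanedScope);
// the client may still call cleanedScope.close() deterministically; if it never does,
// the memory is reclaimed by the cleaner once the scope becomes unreachable
```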

Dealing with shared access *and* deterministic deallocation at the same time is tricky, and poses new problems for the user code; consider the case where a method receives a segment and has to write two values in that segment (e.g. two point coordinates):

```java
void writePoint(MemorySegment segment, int x, int y) {
    segment.setAtIndex(JAVA_INT, 0, x);
    segment.setAtIndex(JAVA_INT, 1, y);
}
```

If the segment is associated with a confined scope, no problem arises: the thread that created the segment is the same thread that performs the dereference operation — as such, when `writePoint` is called, the segment's scope is either alive (and will remain so for the duration of the call), or already closed (in which case some exception will be thrown, and no value will be written).

But, if the segment is associated with a shared scope, there is a new problem: the segment's scope might be closed (concurrently, by another thread) in between the two accesses! This means that the method might end up writing only one value instead of two; in other words, the behavior of the method is no longer atomic.

To avoid this problem, clients can temporarily prevent a scope from being closed, by creating a temporal dependency between that scope and another scope under their control. Let's illustrate how that works in practice:

```java
void writePointSafe(MemorySegment segment, int x, int y) {
    try (ResourceScope scope = ResourceScope.newConfinedScope()) {
        scope.keepAlive(segment.scope());
        segment.setAtIndex(JAVA_INT, 0, x);
        segment.setAtIndex(JAVA_INT, 1, y);
    }
}
```

Here, the client creates a *fresh* confined scope, and then sets up a dependency between this new scope and the segment's scope, using `ResourceScope::keepAlive`. This means that the segment cannot be released until the local scope is closed. The attentive user might have noticed that this idiom acts as a more restricted version <a href="#4"><sup>4</sup></a> of an *atomic reference count*; each time a target scope is kept alive by a new local scope, its *acquired count* goes up; conversely, the count goes down each time a local scope associated with the target scope is released. A target scope can only be closed if its acquired count is exactly zero. In our example above, the semantics of `ResourceScope::keepAlive` guarantee that the method will either set up the temporal dependency successfully, and write both values, or fail, and write no value.
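
To see the acquired count at work, consider the following sketch (assuming that closing a scope which is still being kept alive fails with an exception):

```java
ResourceScope sharedScope = ResourceScope.newSharedScope();
MemorySegment segment = MemorySegment.allocateNative(8, sharedScope);

try (ResourceScope localScope = ResourceScope.newConfinedScope()) {
    localScope.keepAlive(sharedScope); // acquired count of sharedScope goes up
    // sharedScope.close();            // would fail here: the scope is still kept alive
    segment.setAtIndex(JAVA_INT, 0, 42);
}                                      // localScope closed: acquired count goes back down

sharedScope.close();                   // now succeeds
```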

### Parallel processing

The contents of a memory segment can be processed in *parallel* (e.g. using a framework such as Fork/Join), by obtaining a `Spliterator` instance out of a memory segment. For instance, to sum all the 32-bit values of a memory segment in parallel, we can use the following code:

```java
SequenceLayout seq = MemoryLayout.sequenceLayout(1_000_000, ValueLayout.JAVA_INT);
SequenceLayout bulkElement = MemoryLayout.sequenceLayout(100, ValueLayout.JAVA_INT);

try (ResourceScope scope = ResourceScope.newSharedScope()) {
    MemorySegment segment = MemorySegment.allocateNative(seq, scope);
    int sum = segment.elements(bulkElement).parallel()
                     .mapToInt(slice -> {
                         int res = 0;
                         for (int i = 0; i < 100 ; i++) {
                             res += slice.getAtIndex(JAVA_INT, i);
                         }
                         return res;
                     }).sum();
}
```

The `MemorySegment::elements` method takes an element layout and returns a new stream. The stream is built on top of a spliterator instance (see `MemorySegment::spliterator`) which splits the segment into chunks which correspond to the elements in the provided layout. Here, we want to sum the elements of an array which contains a million elements; now, doing a parallel sum where each computation processes *exactly* one element would be inefficient, so instead we use a *bulk* element layout. The bulk element layout is a sequence layout containing 100 elements — which should make it more amenable to parallel processing.

Since the segment operated upon by the spliterator is associated with a shared scope, the segment can be accessed from multiple threads concurrently; the spliterator API ensures that the access occurs in a disjoint fashion: a slice is created from the original segment, and given to a thread to perform some computation — thus ensuring that no two threads can ever operate concurrently on the same memory region.

### Combining memory access handles

We have seen in the previous sections how memory access var handles dramatically simplify user code when structured access is involved. While deriving memory access var handles from layouts is the most convenient option, the Foreign Memory Access API also makes it possible to create such memory access var handles in a standalone fashion, as demonstrated in the following code:

```java
VarHandle intHandle = MemoryHandles.varHandle(JAVA_INT); // (MS, J) -> I
```

The above code creates a memory access var handle which reads/writes `int` values at a certain byte offset in a segment. To create this var handle we have to specify a carrier type — the type we want to use, e.g. to extract values from memory — as well as whether any byte swapping should be applied when contents are read from or stored to memory. Additionally, the user might want to impose extra constraints on how memory dereferences should occur; for instance, a client might want to prevent access to misaligned 32-bit values. Of course, all this information can be succinctly derived from the provided value layout (`JAVA_INT` in the above example).
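
For instance, a sketch of a standalone var handle which reads big-endian `int` values and rejects misaligned accesses might look as follows (assuming the `withOrder` and `withBitAlignment` methods on the value layout):

```java
VarHandle beAlignedIntHandle = MemoryHandles.varHandle(
        JAVA_INT.withOrder(ByteOrder.BIG_ENDIAN)  // byte swapping on little-endian platforms
                .withBitAlignment(32));           // misaligned accesses are rejected
```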

The attentive reader might have noted how rich the var handles returned by the layout API are, compared to the simple memory access var handle we have constructed here. How do we go from a simple access var handle that takes a byte offset to a var handle that can dereference a complex layout path? The answer is: by using var handle *combinators*. Developers familiar with the method handle API know how simpler method handles can be combined into more complex ones using the various combinator methods in the `MethodHandles` API. These methods make it possible, for instance, to insert (or bind) arguments into a target method handle, filter return values, permute arguments, and much more.

Sadly, none of these features are available when working with var handles. The Foreign Memory Access API rectifies this, by adding a rich set of var handle combinators in the `MemoryHandles` class; with these tools, developers can express var handle transformations such as:

* map a var handle's carrier type into a different one, using a pair of embedding/projection method handles
* filter one or more var handle access coordinates using unary filters
* permute var handle access coordinates
* bind concrete access coordinates to an existing var handle

Without diving too deep, let's consider how we might take a basic memory access var handle and turn it into a var handle which dereferences a segment at a specific offset (again using the `points` layout defined previously):

```java
VarHandle intHandle = MemoryHandles.varHandle(JAVA_INT); // (MS, J) -> I
long offsetOfY = points.byteOffset(PathElement.sequenceElement(3), PathElement.groupElement("y"));
VarHandle valueHandle = MemoryHandles.insertCoordinates(intHandle, 1, offsetOfY); // (MS) -> I
```

We have been able to derive, from a basic memory access var handle, a new var handle that dereferences a segment at a given fixed offset. It is easy to see how other, richer, var handles obtained using the layout API can be constructed manually using the var handle combinator API.
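
As a further illustration, here is a sketch (assuming `MemoryHandles::permuteCoordinates` follows the same conventions as `MethodHandles::permuteArguments`) which swaps the order of the two access coordinates of the basic var handle:

```java
// original coordinates: (MemorySegment, long) -> int
VarHandle swappedHandle = MemoryHandles.permuteCoordinates(intHandle,
        List.of(long.class, MemorySegment.class), // new coordinate order: (long, MemorySegment)
        1, 0);                                    // the i-th target coordinate is taken from new coordinate reorder[i]
int value = (int) swappedHandle.get(0L, segment); // offset first, segment second
```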

### Unsafe segments

The memory access API provides basic safety guarantees for all memory segments created using the API. More specifically, a memory dereference operation should either succeed, or result in a runtime exception — but, crucially, it should never result in a VM crash, or, more subtly, in memory corruption occurring *outside* the region of memory associated with a memory segment. This is indeed the case, as all memory segments feature immutable *spatial bounds*, and, as we have seen, are associated with a resource scope which makes sure that segments cannot be dereferenced after their scope has been closed, or, in the case of a confined scope, from a thread other than the one which created the scope.

That said, it is sometimes necessary to create a segment out of an existing memory source, which might be managed by native code. This is the case, for instance, if we want to create a segment out of a memory region managed by a *custom allocator*.

The ByteBuffer API allows such a move, through a JNI [method](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#NewDirectByteBuffer), namely `NewDirectByteBuffer`. This native method can be used to wrap a long address in a fresh direct byte buffer instance which is then returned to unsuspecting Java code.

Memory segments provide a similar capability — that is, given an address (which might have been obtained through some native calls), it is possible to wrap a segment around it, with given spatial bounds and resource scope, as follows:

```java
try (ResourceScope scope = ResourceScope.newSharedScope()) {
    MemoryAddress addr = MemoryAddress.ofLong(someLongAddr);
    var unsafeSegment = MemorySegment.ofAddress(addr, 10, scope);
    ...
}
```

The above code creates a shared scope and then, inside the *try-with-resources* it creates a *new* unsafe segment from a given address; the size of the segment is 10 bytes, and the unsafe segment is associated with the current shared scope. This means that the unsafe segment cannot be dereferenced after the shared scope has been closed.

Of course, segments created this way are completely *unsafe*. There is no way for the runtime to verify that the provided address indeed points to a valid memory location, or that the size of the memory region pointed to by `addr` is indeed 10 bytes. Similarly, there are no guarantees that the underlying memory region associated with `addr` will not be deallocated *prior* to the call to `ResourceScope::close`.

For these reasons, creating unsafe segments is a *restricted* operation in the Foreign Memory Access API. Restricted operations can only be performed from selected modules. To grant a given module `M` the permission to execute restricted methods, the option `--enable-native-access=M` must be specified on the command line. Multiple module names can be specified in a comma-separated list, where the special name `ALL-UNNAMED` is used to enable restricted access for all code on the class path. Any attempt to call restricted operations from a module not listed in the above flag will fail with a runtime exception.

* <a id="1"/>(<sup>1</sup>):<small> In general, deriving a complete layout from a C `struct` declaration is no trivial matter, and it's one of those areas where tooling can help greatly.</small>
* <a id="2"/>(<sup>2</sup>):<small> Clients can enforce stricter type checking when interacting with `VarHandle` instances, by obtaining an *exact* var handle, using the `VarHandle::withInvokeExactBehavior` method.</small>
* <a id="3"/>(<sup>3</sup>):<small> Shared segments rely on VM thread-local handshakes (JEP [312](https://openjdk.java.net/jeps/312)) to implement lock-free, safe, shared memory access; that is, when it comes to memory access, there should be no difference in performance between a shared segment and a confined segment. On the other hand, `ResourceScope::close` might be slower on shared scopes than on confined ones.</small>
* <a id="4"/>(<sup>4</sup>):<small> The main difference between reference counting and the mechanism proposed here is that reference counting is *symmetric* — meaning that any client is able to both increment and decrement the reference count at will. The resource scope handle mechanism is *asymmetric*, since only the client acquiring a handle has the capability to release that handle. This avoids situations where a client might be tempted to e.g. decrement the reference count multiple times in order to perform some task which would otherwise be forbidden.</small>