1 # Summary Plan of Record: GenShen Prototype (2020)
  2 
  3 ## Planned Milestones
  4 
  5 ### Overview of Initial Milestones
  6 
  7 1. Pure young collection. (Young gen size unlimited. Old gen untouched.)
  8 2. Size-restricted young gen.
  9 3. Tenuring and promotion.
 10 4. Global collection after young collection.
 11 5. Young collection after global collection, repeat alternations.
 12 6. Concurrent old marking.
 13 7. Concurrent old and young collections.
 14 
 15 Young collection reclaims garbage only from heap regions that are
 16 identified as belonging to the young generation.
 17 
 18 Old collection reclaims garbage only from heap regions that are
 19 identified as belonging to the old generation, which holds objects
 20 that have been promoted from young generation.
 21 
 22 Global collection reclaims garbage from heap regions belonging to
 23 either the young generation or the old generation.
 24 
 25 ### Milestone 1: GC of Young Gen Only
 26 
 27 This demonstration proves successful implementation of:
 28 
 29 1. Separating out that certain heap regions comprise young gen and
 30    other heap regions are considered to represent old gen. 
 31 2. Confirming that card marking does not crash.  (Does not prove that
 32    card marking works because we have no objects in old gen.) 
 33 3. Confirming that remembered set scanning does not crash.  (Does not
 34    prove that remembered set scanning works because we have no objects in
 35    old gen.) 
 36 4. Demonstrating a simplified form of young collection.
 37 
 38 Tasks
 39 
 40 1. Integrate various completed code patches into a common branch
 41 2. Test and debug existing code
 42 
 43 ### Milestone 2: Restrict Size of Young Gen
 44 
 45 This milestone constrains the young generation to a certain number of
 46 regions.  After that number is reached, allocation fails.  For now, the
 47 number can be fixed, say 25% of all regions, and we can simply crash
 48 thereafter. 
 49 
 50 Tasks
 51 
 52 1. Establish a mechanism to specify and enforce a limit on the number
 53    of heap regions that may comprise young-gen memory.
 54 2. Adjust the GC triggering mechanism so that the size of young gen
 55    does not have reason to exceed the young-gen size during young-gen
 56    GC, where the size of young-gen is affected by GC in the
 57    following ways:
 58    1. Certain free heap regions may be added to young gen in order to
 59       support allocation requests that are made by concurrent mutator
 60       threads while GC is being performed.
 61    2. At the end of GC, all heap regions that were part of the
 62       collection set are removed from young-gen memory and placed
 63       in the free pool.
 64 
 65 
 66 ### Milestone 3: Young Collection with Promotion of Tenured Objects
 67 
 68 Add to Milestone 2 the capability of promoting young gen objects.
 69 Don’t worry about odd objects or humongous objects at this time.
 70 Demonstrate: 
 71 
 72 1. That we can promote objects into old gen.
 73 2. That card-marking works.
 74 3. That remembered set scanning works (correctly, but perhaps not with
 75    full desired efficiency). 
 76 
 77 The following tasks must be implemented:
 78 
 79 1. The collection set is selected as a subset of all young collections
 80    heap regions using TBD heuristics. 
 81 2. During young collection, each time an object is evacuated to a
 82    young consolidation region, the object’s age is incremented. 
 83     1. For simplicity, don’t try to implement the complete region
 84        aging or promotion at this time. 
 85     2. Also for simplicity, don’t try to implement the TenureCycle
 86        optimization. 
 87 3. During young collection, each GC thread maintains both a young
 88    GCLAB and and old GCLAB.
 89     1. When evacuating an object whose incremented age is less than
 90        TenureAge, allocate memory for the object’s copy from within the
 91        young GCLAB. 
 92     2. When evacuating an object whose incremented age is >=
 93        TenureAge, allocate memory for the object’s copy from within the
 94        old GCLAB. 
 95     3. Don’t support Odd or Humongous objects for simplicity.
 96 4. During young collection, each mutator thread maintains both a TLAB
 97    and a GCLAB.  The GCLAB is associated with old gen. 
 98     1. When evacuating an object whose incremented age is less than
 99        TenureAge, allocate memory for the object’s copy from within the
100        TLAB. 
101     2. When evacuating an object whose incremented age is >=
102        TenureAge, allocate memory for the object’s copy from within the
103        GCLAB. 
104 5. Perform maintenance on the contents of each retired old GCLAB,
105    where this maintenance consists of: 
106     1. Registering each object’s starting location and length with the
107        remembered set abstraction so that remembered set scanning can
108        quickly find the first object within each DIRTY card region, 
109     2. Updating the card table to reflect all references from this
110        object into young gen,  
111     3. Evacuating all objects directly referenced from this object
112        which reside within a collection set and have not yet been
113        evacuated, and 
114     4. Healing all pointers from within newly evacuated old objects
115        that refer to objects residing within the collection set. 
116     5. The last two items listed above are already performed by
117        traditional Shenandoah but can be merged with the implementation of
118        the other maintenance activities in order to perform all this work
119        in a consolidated pass. 
120 
121 ### Milestone 4: Sequential Execution of Multiple Young Collections Followed by Multiple Global Collections
122 
123 __Demonstrated Concurrency Between Young-Gen and Old-Gen Activities__
124 
125    ✓ denotes that this combination of activities is allowed.
126 
127    ✗ denotes that this combination of activities is disallowed.
128 
129 |                |  Old-Gen Mark  | Old-Gen Evac  | Old-Gen Idle |
130 |:--------------:|:--------------:|:-------------:|:------------:|
131 | Young Gen Mark |      ✓         |     ✗         |     ✓        |
132 | Young Gen Evac |      ✗         |     ✓         |     ✓        |
133 | Young Gen Idle |      ✗         |     ✗         |     ✓        |
134 
135 Add to Milestone 3 the ability to switch to global collection after a
136 series of young Collections.
137 
138 1. The switch is triggered by a simple TBD test, such as when the
139    space available within old gen is less than the size of young gen. 
140 2. The global collection continues to run with the card-marking
141    barrier enabled though the card values will not be consulted further.
142 3. For this demonstration, the switch to global collections is
143    one-way.  Following this switch, we can no longer perform young
144    collections. 
145 4. For this demonstration, the global collection does not distinguish
146    between young regions and old regions. 
147     1. Evacuation of objects that resided in an old region is handled
148     the same as evacuation of objects that resided in a young heap
149     region. 
150 5. This demonstrates that:
151     1. Promotion of objects works.  The objects that have been promoted
152     into old gen maintain whatever protocols and invariants are
153     assumed by Shenandoah GC. 
154     2. That we can manage the transition from young GC to global GC.
155     3. That global collection, insofar as we characterize it, works.
156     4. That Young collections do not corrupt the heap.
157 
158 Tasks:
159 
160 1. Fix bugs in existing code enhancements.
161 2. Implement the transition from young collection to global collection
162 3. Implement global collection with write barrier for card-table marking
163 
164 ### Milestone 5: Interleaved Execution of Young and Global Collections
165 
166 __Demonstrated Concurrency Between Young-Gen and Old-Gen Activities__
167 
168    ✓ denotes that this combination of activities is allowed.
169 
170    ✗ denotes that this combination of activities is disallowed.
171 
172 |                |  Old-Gen Mark  | Old-Gen Evac  | Old-Gen Idle |
173 |:--------------:|:--------------:|:-------------:|:------------:|
174 | Young Gen Mark |      ✓         |     ✗         |     ✓        |
175 | Young Gen Evac |      ✗         |     ✓         |     ✓        |
176 | Young Gen Idle |      ✗         |     ✗         |     ✓        |
177 
178 Add to Milestone 4 the ability to switch back to Young Collection upon
179 completion of a global collection. 
180 
181 1. Assume that the global collection is successful in reclaiming
182    necessary memory.  No heuristics to resize oldgen or young gen at
183    this time. Specify sizes of each on command line. 
184 2. The switch to old collection is triggered by exhaustion of old gen.
185    At least one young collection is assumed to execute following
186    completion of each global collection. 
187 3. For this demonstration, the global collection does distinguish
188    between young collections and old regions. 
189     1. Objects in the old collection set are evacuated to old
190        consolidation regions. 
191     2. Objects in the young collection set are evacuated
192        to young collections consolidation regions. 
193     3. There is no tenuring of objects during a global collection (for
194        simplicity). 
195 
196 This demonstrates that:
197 
198 1. Promotion of objects works.  The objects that have been promoted
199    into old gen maintain whatever protocols and invariants are
200    assumed by Shenandoah GC. 
201 2. That we can manage the transition from young GC to global GC.
202 3. That we can manage the transition from global GC to young GC.
203 4. That the transition from global GC to young GC
204    establishes all invariants required for correct operation of young
205    GC. 
206 
207 Tasks:
208 
209 1. Distinguish between “global collection” for purposes of
210    maintaining support for non-generational GC and “global collection”
211    for purposes of supporting sequential interleaving of young GC
212    and global GC.
213 2. Implement the transition from global GC to young GC.
214 3. Initialize the remembered set for each consolidation heap region
215    of old gen to all CLEAN before allocating any GCLAB buffers within
216    the consolidation heap region.
217 4. At the start of global evacuation, select the collection set as
218    some combination of existing young regions and
219    old regions based on heuristics TBD.
220 5. During global collection, maintain old-gen GCLABs for all GC
221    threads and mutator threads.
222 6. During global collection, distinguish evacuation behavior
223    depending on whether an object to be evacuated resides within the
224    young collection set or the old collection set since
225    young objects are evacuated into young 
226    consolidation regions and old objects are evacuated into old
227    consolidation regions;
228 7. Add minimal logging reports to describe behavior of young-gen and global
229    GC.
230 8. During global collection, perform maintenance on the contents of
231    each retired old GCLAB, where this maintenance consists of:
232    1. Registering each object’s starting location and length with the
233       remembered set abstraction so that remembered set scanning can
234       quickly find the first object within each DIRTY card region,
235    2. Updating the card table to reflect all references from this
236       object into young gen,
237    3. Evacuating all objects directly referenced from this object
238       that reside within a collection set and that have not already been
239       evacuated, and
240    4. Healing all pointers from within new replica objects residing in old
241       gen that refer to objects residing within the collection set.
242    5. The last two items listed above are already performed by
243       traditional Shenandoah but can be merged with the implementation of
244       the other maintenance activities in order to perform all this work
245       in a consolidated pass.
246 
247 ### Milestone 6: GC of young collections with Concurrent Marking (but not Collecting) of Old Gen
248 
249 __Demonstrated Concurrency Between Young-Gen and Old-Gen Activities__
250 
251    ✓ denotes that this combination of activities is allowed.
252 
253    ✗ denotes that this combination of activities is disallowed.
254 
255 |                |  Old-Gen Mark  | Old-Gen Evac  | Old-Gen Idle |
256 |:--------------:|:--------------:|:-------------:|:------------:|
257 | Young Gen Mark |      ✓         |     NA         |     ✓        |
258 | Young Gen Evac |      ✓         |     NA         |     ✓        |
259 | Young Gen Idle |      ✓         |     NA         |     ✓        |
260 
261 This demonstration relies on GC log reports to show that marking of
262 old gen runs concurrently with marking of young gen.
263 Since the results of old-gen marking are not used to support old-gen
264 evacuation, this demonstration does not prove that old-gen marking
265 produces correct results.
266 
267 All pointers to old-gen memory that are discovered during scan of
268 young-gen memory are communicated to the old-gen concurrent mark
269 threads by inserting these pointer values into a SATB buffer as
270 keep-alive values.  Every SATB buffer is post-processed both by a
271 young-gen GC thread and by an old-gen GC thread.
272 
273 Pointers from old-gen memory to young-gen memory that are discovered
274 during the marking of old-gen are ignored.
275 
276 At the start of young-gen concurrent marking, the remembered set is
277 scanned to detect all inter-generational references.
278 
279 The SATB write barrier remains enabled as long as either young-gen or
280 old-gen concurrent marking is active.
281 
282 Tasks:
283 
284 1. Each young GC thread has a dedicated SATB buffer into which it places
285    discovered references to old-gen memory.
286 2. SATB write barrier is left enabled as long as either young
287    or old marking is active.
288 3. SATB buffer is enlarged to 4096 entries.
289 4. SATB buffer compression is enhanced to deal with the mix of old
290    and young pointers.
291 5. SATB buffer processing is performed by both a young collection
292    thread and an old collection thread.  Pointers to
293    old gen within the SATB buffer are marked and added to the old-gen
294    closure queues so that they can be subsequently scanned.
295 6. Certain GC threads (or work items) are dedicated to old GC
296    and others are dedicated to young GC.
297 7. Old-gen GC threads process the closure of previously marked
298    old-gen objects, scanning all references contained therein.
299 8. Old-gen GC threads add to the old-gen closures all old-gen objects
300    referenced by SATB buffers if those objects were not previously marked.
301 
302 ### Milestone 7: Concurrent Young and Old Collections
303 
304 
305 __Demonstrated Concurrency Between Young-Gen and Old-Gen Activities__
306 
307    ✓ denotes that this combination of activities is allowed.
308 
309    ✗ denotes that this combination of activities is disallowed.
310 
311 |                |  Old-Gen Mark  | Old-Gen Evac  | Old-Gen Idle |
312 |:--------------:|:--------------:|:-------------:|:------------:|
313 | Young Gen Mark |      ✓         |     ✗         |     ✓        |
314 | Young Gen Evac |      ✓         |     ✓         |     ✓        |
315 | Young Gen Idle |      ✓         |     ✗         |     ✓        |
316 
317 In this demonstration, old-gen concurrent marking runs concurrently with all
318 phases of young-gen GC.  Old-gen evacuation only runs while young-gen
319 evacuation is running.  In the case that old-gen needs to evacuate so
320 much memory that doing so in a single uninterruptible batch would
321 significantly extend the duration of the young-gen evacuation phase,
322 the total old-gen evacuation workload is divided into multiple smaller
323 batches of evacuation work, each batch being processed concurrently
324 with a different young-gen evacuation cycle. 
325 
326 The demonstration:
327 
328 1. Uses logging reports to describe the results of young
329    collection and old collection.
330 2. Shows for some “simple” workload (Heapothysis or
331    Extremem?) that generational GC provides performance benefits over
332    non-generational GC.
333 
334 Tasks
335 
336 1. Add minimal logging reports to describe behavior of old-gen
337    GC.
338 2. Decide which old regions comprise the old collection set.
339 3. Divide the old collection set into multiple collection subsets.
340 4. For each of the collection subsets
341    1. Communicate the subset to young GC tasks to process these
342       evacuations when it begins its next evacuation cycle.
343    2. Wait for young GC tasks to signal completion of the evacuation
344       cycle.
345 
346 ## Proposed Future Milestones Not Yet Fully Planned
347 
348 ### Milestone 8: Performance Improvements
349 
350 1. Remembered Set scanning sets cards to CLEAN if they are no longer
351    DIRTY.
352 2. Remembered Set scanning maintains and utilizes start-offset data
353    structure to quickly find the first object to be scanned within each
354    DIRTY card.
355 3. Remembered set scanning refrains from scanning the portions of
356    large objects and arrays that overlap card regions that are not
357    DIRTY.
358 
359 ### Milestone 9: Fix Known Bugs
360 
361 We are aware of bugs in our existing card-marking implementation.
362 
363 ### Milestone 10: Multiple Young-Gen Evacuations Process Old Collection Set
364 
365 ### Milestone 11: Odd Objects (larger than 50% of TLAB/GCLAB size)
366 
367 By default, the promotion of such objects is handled by a
368 slower-than-normal path.  Instead of allocating old gen from the
369 GCLAB, the mutator thread obtains memory for the copy by directly
370 accessing free lists. See existing code that does that already. 
371 
372 ### Micro Milestone 12: Collect and Report New Metrics
373 
374 ### Micro Milestone 13: SATB-Based Remembered Set
375 
376 ### Micro Milestone 14: Heuristic Pacing of Young Collection Frequency
377 
378 ### Micro Milestone 15: Heuristic Pacing of Old Collection Frequency
379 
380 ### Micro Milestone 16: Heuristic Sizing of Young and Old Sizes
381 
382 ### Micro Milestone 17: Heuristic Adjustments of Tenuring Strategies
383 
384 ### Micro Milestone 18: Overlap Evacuation of Cycle N with Marking of Cycle N+1
385 
386 ### Micro Milestone 19: Humongous Objects
387 
388 ### Micro Milestone 20: Reference Processing
389 
390 ### Micro Milestone 21: String Dedup
391 
392 ### Micro Milestone 22: Degenerated GC
393 
394 ### Micro Milesones TBD: Various Performance Improvements
395