
src/jdk.scripting.nashorn/share/classes/jdk/nashorn/internal/runtime/linker/NameCodec.java





  22  * or visit www.oracle.com if you need additional information or have any
  23  * questions.
  24  */
  25 
  26 package jdk.nashorn.internal.runtime.linker;
  27 
  28 /**
  29  * <p>
  30  * Implements the name mangling and demangling as specified by John Rose's
  31  * <a href="https://blogs.oracle.com/jrose/entry/symbolic_freedom_in_the_vm"
  32  * target="_blank">"Symbolic Freedom in the VM"</a> article. Normally, you would
  33  * mangle the names in the call sites as you're generating bytecode, and then
  34  * demangle them when you receive them in bootstrap methods.
  35  * </p>
  36  * <p>
  37  * This code is derived from sun.invoke.util.BytecodeName. Apart from subsetting that
  38  * class, we don't want to create dependency between non-exported package from java.base
  39  * to nashorn module.
  40  * </p>
  41  *
  42  * <h2>Comment from BytecodeName class reproduced here:</h2>
  43  *
  44  * Includes universal mangling rules for the JVM.
  45  *
  46  * <h2>Avoiding Dangerous Characters </h2>
  47  *
  48  * <p>
  49  * The JVM defines a very small set of characters which are illegal
  50  * in name spellings.  We will slightly extend and regularize this set
  51  * into a group of <cite>dangerous characters</cite>.
  52  * These characters will then be replaced, in mangled names, by escape sequences.
  53  * In addition, accidental escape sequences must be further escaped.
  54  * Finally, a special prefix will be applied if and only if
  55  * the mangling would otherwise fail to begin with the escape character.
  56  * This happens to cover the corner case of the null string,
  57  * and also clearly marks symbols which need demangling.
  58  * </p>
  59  * <p>
  60  * Dangerous characters are the union of all characters forbidden
  61  * or otherwise restricted by the JVM specification,
  62  * plus their mates, if they are brackets
  63  * (<code><b>[</b></code> and <code><b>]</b></code>,
  64  * <code><b>&lt;</b></code> and <code><b>&gt;</b></code>),
  65  * plus, arbitrarily, the colon character <code><b>:</b></code>.
  66  * There is no distinction between type, method, and field names.
  67  * This makes it easier to convert between mangled names of different
  68  * types, since they do not need to be decoded (demangled).
  69  * </p>
  70  * <p>
  71  * The escape character is backslash <code><b>\</b></code>
  72  * (also known as reverse solidus).
  73  * This character is, until now, unheard of in bytecode names,
  74  * but traditional in the proposed role.
  75  *
  76  * </p>
  77  * <h2> Replacement Characters </h2>
  78  *
  79  *
  80  * <p>
  81  * Every escape sequence is two characters
  82  * (in fact, two UTF8 bytes) beginning with
  83  * the escape character and followed by a
  84  * <cite>replacement character</cite>.
  85  * (Since the replacement character is never a backslash,
  86  * iterated manglings do not double in size.)
  87  * </p>
  88  * <p>
  89  * Each dangerous character has some rough visual similarity
  90  * to its corresponding replacement character.
  91  * This makes mangled symbols easier to recognize by sight.
  92  * </p>
  93  * <p>
  94  * The dangerous characters are
  95  * <code><b>/</b></code> (forward slash, used to delimit package components),
  96  * <code><b>.</b></code> (dot, also a package delimiter),
  97  * <code><b>;</b></code> (semicolon, used in signatures),


 142  *   <li>In each accidental escape, replace the backslash with an escape sequence
 143  * (<code><b>\-</b></code>).</li>
 144  *   <li>Replace each dangerous character with an escape sequence
 145  * (<code><b>\|</b></code> for <code><b>/</b></code>, etc.).</li>
 146  *   <li>If the first two steps introduced any change, <em>and</em>
 147  * if the string does not already begin with a backslash, prepend a null prefix (<code><b>\=</b></code>).</li>
 148  * </ol>
 149  *
 150  * To demangle a mangled string that begins with an escape,
 151  * remove any null prefix, and then replace (in parallel)
 152  * each escape sequence by its original character.
 153  * <p>Spelling strings which contain accidental
 154  * escapes <em>must</em> have them replaced, even if those
 155  * strings do not contain dangerous characters.
 156  * This restriction means that mangling a string always
 157  * requires a scan of the string for escapes.
 158  * But then, a scan would be required anyway,
 159  * to check for dangerous characters.
 160  *
 161  * </p>
 162  * <h2> Nice Properties </h2>
 163  *
 164  * <p>
 165  * If a bytecode name does not contain any escape sequence,
 166  * demangling is a no-op:  The string demangles to itself.
 167  * Such a string is called <cite>self-mangling</cite>.
 168  * Almost all strings are self-mangling.
 169  * In practice, to demangle almost any name &ldquo;found in nature&rdquo;,
 170  * simply verify that it does not begin with a backslash.
 171  * </p>
 172  * <p>
 173  * Mangling is a one-to-one function, while demangling
 174  * is a many-to-one function.
 175  * A mangled string is defined as <cite>validly mangled</cite> if
 176  * it is in fact the unique mangling of its spelling string.
 177  * Three examples of invalidly mangled strings are <code><b>\=foo</b></code>,
 178  * <code><b>\-bar</b></code>, and <code><b>baz\!</b></code>, which demangle to <code><b>foo</b></code>, <code><b>\bar</b></code>, and
 179  * <code><b>baz\!</b></code>, but then remangle to <code><b>foo</b></code>, <code><b>\bar</b></code>, and <code><b>\=baz\-!</b></code>.
 180  * If a language back-end or runtime is using mangled names,
 181  * it should never present an invalidly mangled bytecode
 182  * name to the JVM.  If the runtime encounters one,


 205  * if it would participate in an accidental escape when followed
 206  * by the first character of the second string.</li>
 207  * </ul>
 208  * <p>If languages that include non-Java symbol spellings use this
 209  * mangling convention, they will enjoy the following advantages:
 210  * </p>
 211  * <ul>
 212  *   <li>They can interoperate via symbols they share in common.</li>
 213  *   <li>Low-level tools, such as backtrace printers, will have readable displays.</li>
 214  *   <li>Future JVM and language extensions can safely use the dangerous characters
 215  * for structuring symbols, but will never interfere with valid spellings.</li>
 216  *   <li>Runtimes and compilers can use standard libraries for mangling and demangling.</li>
 217  *   <li>Occasional transliterations and name composition will be simple and regular,
 218  * for classes, methods, and fields.</li>
 219  *   <li>Bytecode names will continue to be compact.
 220  * When mangled, spellings will at most double in length, either in
 221  * UTF8 or UTF16 format, and most will not change at all.</li>
 222  * </ul>
 223  *
 224  *
 225  * <h2> Suggestions for Human Readable Presentations </h2>
 226  *
 227  *
 228  * <p>
 229  * For human readable displays of symbols,
 230  * it will be better to present a string-like quoted
 231  * representation of the spelling, because JVM users
 232  * are generally familiar with such tokens.
 233  * We suggest using single or double quotes before and after
 234  * mangled symbols which are not valid Java identifiers,
 235  * with quotes, backslashes, and non-printing characters
 236  * escaped as if for literals in the Java language.
 237  * </p>
 238  * <p>
 239  * For example, an HTML-like spelling
 240  * <code><b>&lt;pre&gt;</b></code> mangles to
 241  * <code><b>\^pre\_</b></code> and could
 242  * display more cleanly as
 243  * <code><b>'&lt;pre&gt;'</b></code>,
 244  * with the quotes included.
 245  * Such string-like conventions are <em>not</em> suitable




  22  * or visit www.oracle.com if you need additional information or have any
  23  * questions.
  24  */
  25 
  26 package jdk.nashorn.internal.runtime.linker;
  27 
  28 /**
  29  * <p>
  30  * Implements the name mangling and demangling as specified by John Rose's
  31  * <a href="https://blogs.oracle.com/jrose/entry/symbolic_freedom_in_the_vm"
  32  * target="_blank">"Symbolic Freedom in the VM"</a> article. Normally, you would
  33  * mangle the names in the call sites as you're generating bytecode, and then
  34  * demangle them when you receive them in bootstrap methods.
  35  * </p>
  36  * <p>
  37  * This code is derived from sun.invoke.util.BytecodeName. Apart from subsetting that
  38  * class, we don't want to create dependency between non-exported package from java.base
  39  * to nashorn module.
  40  * </p>
  41  *
  42  * <h3>Comment from BytecodeName class reproduced here:</h3>
  43  *
  44  * Includes universal mangling rules for the JVM.
  45  *
  46  * <h3>Avoiding Dangerous Characters </h3>
  47  *
  48  * <p>
  49  * The JVM defines a very small set of characters which are illegal
  50  * in name spellings.  We will slightly extend and regularize this set
  51  * into a group of <cite>dangerous characters</cite>.
  52  * These characters will then be replaced, in mangled names, by escape sequences.
  53  * In addition, accidental escape sequences must be further escaped.
  54  * Finally, a special prefix will be applied if and only if
  55  * the mangling would otherwise fail to begin with the escape character.
  56  * This happens to cover the corner case of the null string,
  57  * and also clearly marks symbols which need demangling.
  58  * </p>
  59  * <p>
  60  * Dangerous characters are the union of all characters forbidden
  61  * or otherwise restricted by the JVM specification,
  62  * plus their mates, if they are brackets
  63  * (<code><b>[</b></code> and <code><b>]</b></code>,
  64  * <code><b>&lt;</b></code> and <code><b>&gt;</b></code>),
  65  * plus, arbitrarily, the colon character <code><b>:</b></code>.
  66  * There is no distinction between type, method, and field names.
  67  * This makes it easier to convert between mangled names of different
  68  * types, since they do not need to be decoded (demangled).
  69  * </p>
  70  * <p>
  71  * The escape character is backslash <code><b>\</b></code>
  72  * (also known as reverse solidus).
  73  * This character is, until now, unheard of in bytecode names,
  74  * but traditional in the proposed role.
  75  *
  76  * </p>
  77  * <h3> Replacement Characters </h3>
  78  *
  79  *
  80  * <p>
  81  * Every escape sequence is two characters
  82  * (in fact, two UTF8 bytes) beginning with
  83  * the escape character and followed by a
  84  * <cite>replacement character</cite>.
  85  * (Since the replacement character is never a backslash,
  86  * iterated manglings do not double in size.)
  87  * </p>
  88  * <p>
  89  * Each dangerous character has some rough visual similarity
  90  * to its corresponding replacement character.
  91  * This makes mangled symbols easier to recognize by sight.
  92  * </p>
  93  * <p>
  94  * The dangerous characters are
  95  * <code><b>/</b></code> (forward slash, used to delimit package components),
  96  * <code><b>.</b></code> (dot, also a package delimiter),
  97  * <code><b>;</b></code> (semicolon, used in signatures),


 142  *   <li>In each accidental escape, replace the backslash with an escape sequence
 143  * (<code><b>\-</b></code>).</li>
 144  *   <li>Replace each dangerous character with an escape sequence
 145  * (<code><b>\|</b></code> for <code><b>/</b></code>, etc.).</li>
 146  *   <li>If the first two steps introduced any change, <em>and</em>
 147  * if the string does not already begin with a backslash, prepend a null prefix (<code><b>\=</b></code>).</li>
 148  * </ol>
 149  *
 150  * To demangle a mangled string that begins with an escape,
 151  * remove any null prefix, and then replace (in parallel)
 152  * each escape sequence by its original character.
 153  * <p>Spelling strings which contain accidental
 154  * escapes <em>must</em> have them replaced, even if those
 155  * strings do not contain dangerous characters.
 156  * This restriction means that mangling a string always
 157  * requires a scan of the string for escapes.
 158  * But then, a scan would be required anyway,
 159  * to check for dangerous characters.
 160  *
 161  * </p>
 162  * <h3> Nice Properties </h3>
 163  *
 164  * <p>
 165  * If a bytecode name does not contain any escape sequence,
 166  * demangling is a no-op:  The string demangles to itself.
 167  * Such a string is called <cite>self-mangling</cite>.
 168  * Almost all strings are self-mangling.
 169  * In practice, to demangle almost any name &ldquo;found in nature&rdquo;,
 170  * simply verify that it does not begin with a backslash.
 171  * </p>
 172  * <p>
 173  * Mangling is a one-to-one function, while demangling
 174  * is a many-to-one function.
 175  * A mangled string is defined as <cite>validly mangled</cite> if
 176  * it is in fact the unique mangling of its spelling string.
 177  * Three examples of invalidly mangled strings are <code><b>\=foo</b></code>,
 178  * <code><b>\-bar</b></code>, and <code><b>baz\!</b></code>, which demangle to <code><b>foo</b></code>, <code><b>\bar</b></code>, and
 179  * <code><b>baz\!</b></code>, but then remangle to <code><b>foo</b></code>, <code><b>\bar</b></code>, and <code><b>\=baz\-!</b></code>.
 180  * If a language back-end or runtime is using mangled names,
 181  * it should never present an invalidly mangled bytecode
 182  * name to the JVM.  If the runtime encounters one,


 205  * if it would participate in an accidental escape when followed
 206  * by the first character of the second string.</li>
 207  * </ul>
 208  * <p>If languages that include non-Java symbol spellings use this
 209  * mangling convention, they will enjoy the following advantages:
 210  * </p>
 211  * <ul>
 212  *   <li>They can interoperate via symbols they share in common.</li>
 213  *   <li>Low-level tools, such as backtrace printers, will have readable displays.</li>
 214  *   <li>Future JVM and language extensions can safely use the dangerous characters
 215  * for structuring symbols, but will never interfere with valid spellings.</li>
 216  *   <li>Runtimes and compilers can use standard libraries for mangling and demangling.</li>
 217  *   <li>Occasional transliterations and name composition will be simple and regular,
 218  * for classes, methods, and fields.</li>
 219  *   <li>Bytecode names will continue to be compact.
 220  * When mangled, spellings will at most double in length, either in
 221  * UTF8 or UTF16 format, and most will not change at all.</li>
 222  * </ul>
 223  *
 224  *
 225  * <h3> Suggestions for Human Readable Presentations </h3>
 226  *
 227  *
 228  * <p>
 229  * For human readable displays of symbols,
 230  * it will be better to present a string-like quoted
 231  * representation of the spelling, because JVM users
 232  * are generally familiar with such tokens.
 233  * We suggest using single or double quotes before and after
 234  * mangled symbols which are not valid Java identifiers,
 235  * with quotes, backslashes, and non-printing characters
 236  * escaped as if for literals in the Java language.
 237  * </p>
 238  * <p>
 239  * For example, an HTML-like spelling
 240  * <code><b>&lt;pre&gt;</b></code> mangles to
 241  * <code><b>\^pre\_</b></code> and could
 242  * display more cleanly as
 243  * <code><b>'&lt;pre&gt;'</b></code>,
 244  * with the quotes included.
 245  * Such string-like conventions are <em>not</em> suitable
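
Reviewer's sketch (not part of the change): the three mangling steps and the demangling rule described in the comment above can be illustrated with a minimal, hypothetical class. The name MangleSketch and its methods are invented for illustration and are not the NameCodec or BytecodeName source. The replacement table here is deliberately partial: it holds only the pairs visible in this excerpt ("/" to "|", "<" to "^", ">" to "_"), plus ":" to "!", which is an assumption inferred from the "baz\!" example. The authoritative table sits in the elided lines of the file. Mangling the empty string to the null prefix is likewise inferred from the comment's remark about the null string.

```java
/** Hypothetical sketch of the mangling rules; not the actual NameCodec code. */
final class MangleSketch {
    private static final char ESCAPE = '\\';
    private static final String NULL_PREFIX = "\\=";
    // Partial table (assumption): only pairs inferable from the excerpt.
    private static final String DANGEROUS   = "/<>:";
    private static final String REPLACEMENT = "|^_!";

    /** True if position i holds a backslash that would be read as an escape. */
    private static boolean isAccidentalEscape(String s, int i) {
        if (s.charAt(i) != ESCAPE || i + 1 >= s.length()) return false;
        char next = s.charAt(i + 1);
        return next == '-' || REPLACEMENT.indexOf(next) >= 0
                || (i == 0 && next == '=');   // accidental null prefix
    }

    static String mangle(String spelling) {
        if (spelling.isEmpty()) return NULL_PREFIX;   // null-string corner case
        StringBuilder sb = new StringBuilder(spelling.length() + 4);
        boolean changed = false;
        for (int i = 0; i < spelling.length(); i++) {
            char c = spelling.charAt(i);
            int d = DANGEROUS.indexOf(c);
            if (isAccidentalEscape(spelling, i)) {
                sb.append(ESCAPE).append('-');                    // step 1
                changed = true;
            } else if (d >= 0) {
                sb.append(ESCAPE).append(REPLACEMENT.charAt(d));  // step 2
                changed = true;
            } else {
                sb.append(c);
            }
        }
        if (!changed) return spelling;                            // self-mangling
        if (sb.charAt(0) != ESCAPE) sb.insert(0, NULL_PREFIX);    // step 3
        return sb.toString();
    }

    static String demangle(String name) {
        // Names that do not begin with a backslash demangle to themselves.
        if (name.isEmpty() || name.charAt(0) != ESCAPE) return name;
        int start = name.startsWith(NULL_PREFIX) ? 2 : 0;
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = start; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == ESCAPE && i + 1 < name.length()) {
                char next = name.charAt(i + 1);
                int r = REPLACEMENT.indexOf(next);
                if (next == '-') { sb.append(ESCAPE); i++; continue; }
                if (r >= 0) { sb.append(DANGEROUS.charAt(r)); i++; continue; }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** A name is validly mangled if it is the unique mangling of its spelling. */
    static boolean isValidlyMangled(String name) {
        return mangle(demangle(name)).equals(name);
    }

    public static void main(String[] args) {
        System.out.println(mangle("<pre>"));            // prints \^pre\_
        System.out.println(demangle("\\=foo"));         // prints foo
        System.out.println(isValidlyMangled("\\=foo")); // prints false
    }
}
```

With this sketch, the three invalidly mangled examples from the comment ("\=foo", "\-bar", "baz\!") demangle and remangle as the comment describes, and isValidlyMangled returns false for each of them.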

