Optimizing Errors Away

Volker Simonis, SAP / volker.simonis@gmail.com

Outline

  • Escape Analysis
  • Intrinsics
  • System.arraycopy()
  • Specification..
  • ..and too much optimization
  • Fixing the bug

Escape Analysis

  • -XX:+DoEscapeAnalysis (enabled by default)
  • Detects NoEscape, ArgEscape or GlobalEscape state for locals
  • -XX:+EliminateAllocations,
    -XX:+OptimizePtrCompare, -XX:+EliminateLocks, ...
  • Heavily depends on inlining (especially EliminateAllocations)!
  • Idea: J. Shoi, M. Gupta, M. Seffano, V. Sreedhar, S. Midkiff,
    "Escape Analysis for Java"
  • Implementation: opt/escape.{cpp,hpp} and ci/bcEscapeAnaylzer.{cpp,hpp}

Scalar Replacement

              

              
            
              
$ java -XX:CompileCommand=\
  "compileonly org.simonis.ScalarReplacement::scalarReplace" \
  org.simonis.ScalarReplacement
...
movl    RBP, RSI                          // RBP = x
movl    RDX, #3 # int                     // arg1 = 3
movq    RSI, precise klass [I             // arg2 = '[I'
call,static  wrapper for: _new_array_Java // RAX = a[]

movl    [RAX + #32 (8-bit)], RBP          // a[0] = x
movl    [RAX + #24 (8-bit)], RBP          // a[1] = x
movl    [RAX + #28 (8-bit)], RBP          // a[2] = x

movq    RSI, RAX                          // arg1 = a[]
movq    RDX, RAX                          // arg2 = a[]
call,static  org.simonis.ScalarReplacement::dot
...
              
            
              
$ java -XX:-DoEscapeAnalysis \
  -XX:CompileCommand="compileonly org.simonis.ScalarReplacement::scalarReplace" \
  -XX:CompileCommand="compileonly org.simonis.ScalarReplacement::dot" \
  org.simonis.ScalarReplacement
...
movl    RBP, RSI                          // RBP = x
movl    RDX, #3 # int                     // arg1 = 3
movq    RSI, precise klass [I             // arg2 = '[I'
call,static  wrapper for: _new_array_Java

movl    [RAX + #32 (8-bit)], RBP          // a[0] = x
movl    [RAX + #28 (8-bit)], RBP          // a[1] = x
movl    [RAX + #24 (8-bit)], RBP          // a[2] = x

imull   RBP, RBP                          // RBP  = x * x
movl    RAX, RBP                          // RAX  = x * x
addl    RAX, RBP                          // RAX += x * x
addl    RAX, RBP                          // RAX += x * x
...
              
            
              
$ java -XX:+DoEscapeAnalysis -XX:+EliminateAllocations \
  -XX:CompileCommand="compileonly org.simonis.ScalarReplacement::scalarReplace" \
  -XX:CompileCommand="compileonly org.simonis.ScalarReplacement::dot" \
  org.simonis.ScalarReplacement
...
imull   RSI, RSI        // RSI  = x * x
movl    RAX, RSI        // RAX  = x * x
addl    RAX, RSI        // RAX += x * x
addl    RAX, RSI        // RAX += x * x
...
              
            
              
$ java -XX:+DoEscapeAnalysis -XX:+EliminateAllocations \
  -XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations ... \
  org.simonis.ScalarReplacement
...
EA: 2 iterations to build connection graph with
    115 nodes and worklist size 10

Connection graph for org.simonis.ScalarReplacement::scalarReplace
JavaObject NoEscape(NoEscape) 41 AllocateArray
  = org.simonis.ScalarReplacement::scalarReplace@bci:1

Scalar 41 AllocateArray
  = org.simonis.ScalarReplacement::scalarReplace@bci:1
++++ Eliminated: 41 AllocateArray
...
imull   RSI, RSI        // RSI  = x * x
movl    RAX, RSI        // RAX  = x * x
addl    RAX, RSI        // RAX += x * x
addl    RAX, RSI        // RAX += x * x
...
              
            

Intrinsics

“ ..an intrinsic function (a.k.a. builtin function) is a function available for use in a given programming language whose implementation is handled specially by the compiler. Typically, it substitutes a sequence of automatically generated instructions for the original function call.. ”

“ ..the compiler has an intimate knowledge of the intrinsic function and can therefore better integrate it and optimize it for the situation.. ”

HotSpot Intrinsics

  • Only for system classes!
  • Must be annotated with jdk.internal.HotSpotIntrinsicCandidate (since Java 9)
    • Checked by -XX:+CheckIntrinsics
  • See: src/share/vm/classfile/vmSymbols.hpp for a full list
  • Library intrinsics (~260)
    • Library methods replaced by assembly, IR or both
  • Bytecode intrinsics (~40)
    • Late/always inlining candidates (e.g. StringBuffer::*, boxing/unboxing methods)
    • See: Compile::should_delay_string_inlining() in doCall.cpp
    • See: InlineTree::should_inline() in bytecodeInfo.cpp

System.arraycopy()

              

              
            
              
klass()->copy_array(s, src_p, d, dst_p, length);
}
]]>
              
            
              

              
            
              
$ java -XX:DisableIntrinsic=_arraycopy \
  org.simonis.ScalarReplacement arrayCopy1

...                            // arg0 already contains src
00c   	movq    RCX, RDX       // arg2 = dst
00f   	movl    R9, #8         // arg4 = 8
015   	xorl    RDX, RDX       // arg1 = 0
017   	xorl    R8, R8         // arg3 = 0
01b   	call,static  java.lang.System::arraycopy
...




.
              
            
              

              
            
              
$ java \
  org.simonis.ScalarReplacement arrayCopy1
...
movl   R10, [RSI + #16 (8-bit)]  // R10 = src.length
movl   R8, [RDX + #16 (8-bit)]   // R8  = dst.length
cmpl   R10, #8                   // if src.length < 8 throw
jb,us  _throw_exception
cmpl   R8, #8                    // if dst.length < 8 throw
jb,us  _throw_exception
movl   R10, [RSI + #24 (8-bit)]  // R10 = src[0]
movl   [RDX + #24 (8-bit)], R10  // dst[0] = R10
...
movl   R11, [RSI + #52 (8-bit)]  // R11 = src[7]
movl   [RDX + #52 (8-bit)], R11  // dst[7] = R11
...
              
            

System.arraycopy() - Specification

System.arraycopy() API Definition

see http://docs.oracle.com/javase/8/docs/api/java/lang/System.html#arraycopy

System.arraycopy() - Test (C2)

              

              
            

Violating the Specification (C2)

              
$ java \
  org.simonis.ScalarReplacement arrayCopy4
Error in iteration 5376
Expected IndexOutOfBoundsException

$ java \
  -XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations \
  org.simonis.ScalarReplacement arrayCopy4
JavaObject NoEscape 43 AllocateArray arrayCopy4@bci:4
Eliminated: 43 AllocateArray
Error in iteration 5376
Expected IndexOutOfBoundsException

$ java -XX:DisableIntrinsic=_arraycopy \
  org.simonis.ScalarReplacement arrayCopy4
$ java -XX:-EliminateAllocations \
  org.simonis.ScalarReplacement arrayCopy4
$ 
              
            
              

              
            
              
$ java -XX:+PrintOptoAssembly \
  org.simonis.ScalarReplacement arrayCopy4
...
movq    R11, RSI          // R11 = src (arg0)
movl    R10, [RSI + #12]  // R10 = src.length
        NullCheck RSI
movl    R8, RDX           // R8  = length (arg1)
addl    R8, #3            // R8 += 3 (src_pos)
cmpl    R10, R8           // if src.length < src_pos+length
jb,us   _throw_exception  // throw exception
movl    R10, RDX          // R10 = length (arg1)
addl    R10, #5           // R8 += 3 (dst_pos)
cmpl    R10, #8           // if dst_pos+length > 8
jnbe,us _throw_exception  // throw exception
...
              
            
Compile::Compile(compiler=0x7ffff01a0a60, target=0x7fffc4299130)
  ParseGenerator::generate()
  Parse::Parse(caller=0x7fffc440de00, parse_method=0x7fffc4299130)
  Parse::do_all_blocks()
  Parse::do_one_block()
  Parse::do_one_bytecode()
  Parse::do_call()
  LibraryIntrinsic::generate() <-- Intrinsification
  LibraryCallKit::try_to_inline(id=vmIntrinsics::_arraycopy)
  LibraryCallKit::inline_arraycopy()
Compile::optimze()
  ConnectionGraph::do_analysis() <-- Escape Analysiy
PhaseMacroExpand::eliminate_macro_nodes()
  PhaseMacroExpand::eliminate_allocate_node(AllocateNode*, ...)
  PhaseMacroExpand::process_users_of_allocation()
PhaseMacroExpand::expand_macro_nodes()
  PhaseMacroExpand::expand_arraycopy_node(ArrayCopyNode*)
  PhaseMacroExpand::generate_arraycopy(ArrayCopyNode*, ...)

src/share/vm/opto/library_call.cpp

              

              
            

src/share/vm/opto/macro.cpp

              
is_ArrayCopy()) {
    // Disconnect ArrayCopy node
    ArrayCopyNode* ac = use->as_ArrayCopy();
    ...
    // Disconnect src: it can help find new opportunities
    Node* src = ac->in(ArrayCopyNode::Src);
]]>
              
            

src/share/vm/opto/macroArrayCopy.cpp

              

              
            

System.arraycopy() - Specification

System.arraycopy() API Definition

see http://docs.oracle.com/javase/8/docs/api/java/lang/System.html#arraycopy

System.arraycopy() - Test (C1)

              

              
            

Violating the Specification (C1)

              
$ java \
  org.simonis.ScalarReplacement arrayCopy5                 
Error in iteration 256
Expected ArrayStoreException

$ java -Xint \
  org.simonis.ScalarReplacement arrayCopy5                 

$ java -XX:-TieredCompilation \
  org.simonis.ScalarReplacement arrayCopy5

$ 
              
            

Violating the Specification (C1)

src/cpu/x86/vm/c1_LIRAssembler_x86.cpp

              
entry());
    __ jcc(Assembler::zero, *stub->continuation()); // <---
  }
  // check if array element types are compatible
  if (flags & LIR_OpArrayCopy::type_check) {
  ...
  address entry = StubRoutines::select_arraycopy_function()
  __ call_VM_leaf(entry, 0);
  __ bind(*stub->continuation());                   // <---
]]>
              
            

https://bugs.openjdk.java.net/browse/JDK-8159611

JDK-8159611: C2: ArrayCopy elimination skips required parameter checks

https://github.com/simonis/OptimizingErrorsAway

https://rawgit.com/simonis/OptimizingErrorsAway/master/optimizinErrorsAway.xhtml