Benchmarks

NetAsmDemo provides two micro-benchmarks that illustrate the performance gains of using NetAsm:
  • Simple Add Benchmark demonstrates the overhead of calling methods, comparing several managed and interop techniques against NetAsm.
  • Matrix Multiplication Benchmark SSE2 demonstrates the use of SSE2 for optimized methods and compares different calling techniques (interop, mixed, managed and NetAsm).
The main results are:
  • For calling overhead, NetAsm performs as fast as pure managed calls and is twice as fast as fast interop (with security checks disabled).
  • When using optimized instructions not available in .NET (like SSE2 in the matrix benchmark), NetAsm was 50-60% faster than managed code.

Be aware, however, that micro-benchmarks are always subject to measurement issues, and these results should not be taken as proof of how any other benchmark will behave. All the benchmarks here use a warm-up loop to exclude the cost of compilation at startup, and each test is run several times.
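The warm-up and repeated-run approach described above can be sketched as follows. This is a minimal illustration in C++, not the harness used by NetAsmDemo: the helper name time_best_ns and its parameters are hypothetical, and keeping the fastest of several runs is only one possible way to summarize repeated measurements.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical helper (not part of NetAsmDemo): calls `fn` a few times
// without timing it, then times `runs` batches of `iters_per_run` calls
// and returns the duration of the fastest batch in nanoseconds.
template <typename Fn>
double time_best_ns(Fn&& fn, int warmup_iters, int runs, int iters_per_run) {
    // Warm-up phase: lets one-time costs (compilation, caches) settle.
    for (int i = 0; i < warmup_iters; ++i) fn();

    std::vector<double> samples;
    samples.reserve(runs);
    for (int r = 0; r < runs; ++r) {
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iters_per_run; ++i) fn();
        auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::nano>(stop - start).count());
    }
    // Assumption: the minimum is used as the least noisy sample.
    return *std::min_element(samples.begin(), samples.end());
}
```

A caller would pass the operation under test as a lambda, for example time_best_ns([&] { result = add(x, y); }, 1000, 5, 1000000), and compare the values returned for each implementation.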

Simple Add benchmark

This benchmark measures the time taken to add two integers. It compares several implementations:
• Managed and Managed Inline: we test managed code both with inlining disabled (using the attribute MethodImpl(MethodImplOptions.NoInlining)) and with the default inlining.
• Interop and Interop NoSecurity: we test the default Interop technique and Interop with the no-security attribute (SuppressUnmanagedCodeSecurity).
• Mixed Cpp/CLI: we test an external mixed Cpp/CLI library that uses native code.
• NetAsm: we use NetAsm with static native code injection.

For example, the Managed Inline implementation looks like this:
        public int ManagedAddInlined(int x, int y)
        {
            return x + y;
        }

The NetAsm code of the Add method is:
[CodeInjection(new byte[] { 0x8b, 0x44, 0x24, 0x04, 0x03, 0xc2, 0xc2, 0x04, 0x00 }), MethodImpl(MethodImplOptions.NoInlining)]
public int NetAsmAdd(int x, int y)
{
    // This method is compiled using the native code from the CodeInjection Attribute
    // The IL code of this method is never compiled by the JIT 
    // The IL body adds 1 so that a wrong result would reveal that the IL was used instead of the injected code.
    return x + y + 1;
}

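It was generated from the C function below, and the machine code was copied from the *.cod assembly listing file produced by Microsoft Visual C++. Under __fastcall, pThis is passed in ECX, x in EDX and y on the stack, which is why the listing loads y from the stack, adds EDX and returns with ret 4: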
extern "C" int __fastcall NetAsmAddInC(void* pThis, int x, int y) {
	return x + y;
}
// The generated native code of this method is :
  00000	8b 44 24 04	 mov	 eax, DWORD PTR _y$[esp-4]
  00004	03 c2		 add	 eax, edx
  00006	c2 04 00	 ret	 4
Results are indexed to a base of 100, where 100 is the time of the baseline implementation (Managed Inline by default). Lower is better.
SimpleAddBenchmark.png
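The base-100 indexing used in the chart can be sketched as below. This is only an illustration of the arithmetic; the function name index_to_base_100 is hypothetical and the timings in the usage note are made-up inputs, not measured results.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: rescales raw timings so that the entry at
// `baseline` becomes exactly 100 and every other entry is expressed
// relative to it (lower is better).
std::vector<double> index_to_base_100(const std::vector<double>& times,
                                      std::size_t baseline) {
    std::vector<double> indexed;
    indexed.reserve(times.size());
    for (double t : times) {
        indexed.push_back(100.0 * t / times[baseline]);
    }
    return indexed;
}
```

With illustrative timings {2.0, 4.0, 1.0} and baseline index 0, the indexed values are {100, 200, 50}: the second entry takes twice as long as the baseline and the third takes half as long.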

As we can see, NetAsm performs almost as fast as Managed Inline and is twice as fast as Interop (with no security checks).
This result was expected, as the CLR VM treats NetAsm native code as pure managed code compiled by the JIT. Therefore, its performance should be the same as that of managed code.

See TestSimpleAddBenchmark.cs in NetAsmDemo for the code of this benchmark.

Matrix Multiplication benchmark using SSE2

This benchmark is based on the work of scapecode in the article "Playing with the .NET JIT Part3": it compares the performance of different matrix multiplication implementations using standard CLR code, standard C and SSE2 instructions.
As with SimpleAddBenchmark, this benchmark compares several implementations:
  • Managed Std and Managed Unsafe: we test both standard managed code and unsafe managed code (using pointers into the matrix array instead of CLR arrays).
  • Interop Std, Interop SSE2 and Interop SSE2 NoSecurity: we test the default Interop technique with a C implementation (without any SSE2 instructions), and two interop variants using SSE2 (one of them with the no-security attribute SuppressUnmanagedCodeSecurity).
  • Mixed SSE2 Cpp/CLI: we test an external mixed Cpp/CLI library that uses native code.
  • NetAsm Std and NetAsm SSE2: we use NetAsm with a standard C matrix multiplication and with an SSE2 implementation.
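To make the difference between the "Std" and "SSE2" variants concrete, here is a minimal sketch of both styles. It is not the code from NetAsmDemoCLib.dll: the matrix size (4x4), the row-major float layout and the function names mat4_mul_std and mat4_mul_simd are assumptions made for illustration. The SIMD path uses packed single-precision intrinsics, which are available on any SSE2-capable CPU, and falls back to the scalar loop on other targets.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define DEMO_HAS_SSE2 1
#endif

// Scalar reference: out = a * b for 4x4 row-major float matrices.
// `out` must not alias `a` or `b`.
void mat4_mul_std(const float* a, const float* b, float* out) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) sum += a[i * 4 + k] * b[k * 4 + j];
            out[i * 4 + j] = sum;
        }
    }
}

// SIMD variant: row i of the result is a linear combination of the rows
// of b, weighted by the elements of row i of a. Each inner step handles
// four output columns at once.
void mat4_mul_simd(const float* a, const float* b, float* out) {
#ifdef DEMO_HAS_SSE2
    for (int i = 0; i < 4; ++i) {
        __m128 row = _mm_setzero_ps();
        for (int k = 0; k < 4; ++k) {
            __m128 scale = _mm_set1_ps(a[i * 4 + k]);   // broadcast a[i][k]
            __m128 b_row = _mm_loadu_ps(b + k * 4);     // load row k of b
            row = _mm_add_ps(row, _mm_mul_ps(scale, b_row));
        }
        _mm_storeu_ps(out + i * 4, row);
    }
#else
    mat4_mul_std(a, b, out);  // no SSE2 on this target: scalar fallback
#endif
}
```

Unaligned loads and stores are used so the sketch works with any float buffer; an implementation tuned for speed would more likely require 16-byte alignment and use the aligned variants.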

Results are indexed to a base of 100, where 100 is the time of the baseline implementation. The default base is managed inline. Lower is better.
MatrixMultiplicationBenchmark.png

A few remarks on the results:
  • First, NetAsm SSE2 outperforms every other implementation, with speed gains ranging from 10% to 300% (70% faster than the default managed code).
  • Managed Unsafe and NetAsm Std are often equivalent: this means that the default JIT compiler does a very good job of optimizing the C# code (as fast as C code). Consequently, you should always test whether it is worthwhile to replace managed code with C code, as the JIT compiler can generate fast native code.
  • NetAsm SSE2 is only 10% faster than Interop SSE2: this result differs from SimpleAddBenchmark. The reason is that the cost of calling the interop code is negligible compared to the time spent computing the result.

See TestMatrixMulBenchmark.cs, NetAsmDemoCLib.dll and NetAsmDemoMixedLib.dll in NetAsmDemo for the code of this benchmark.

Last edited Jul 25, 2008 at 11:02 PM by alexandre_mutel, version 5
