
GCD in Compiler Generated Cortex-M4 Assembly

posted Apr 23, 2018, 7:44 AM by MUHAMMAD MUN`IM AHMAD ZABIDI   [ updated Apr 23, 2018, 7:48 AM ]
This post is a continuation of GCD in ARM Assembly.

Referring to the same GCD code:

int gcd(int x, int y)
{
    while( x != y )
    {
        if( x > y )
            x -= y;
        else
            y -= x;
    }
    return x;
}

Let's recompile it to Cortex-M4 Thumb-2 assembly, unoptimized:

$ arm-none-eabi-gcc -mcpu=cortex-m4 --specs=nosys.specs -S gcd.c

The resulting code:

    .cpu cortex-m4
gcd:
    push    {r7}
    sub    sp, sp, #12
    add    r7, sp, #0
    str    r0, [r7, #4]
    str    r1, [r7]
    b      .L3
.L4:
    ldr    r2, [r7, #4]
    ldr    r3, [r7]
    cmp    r2, r3
    ble    .L2
    ldr    r2, [r7, #4]
    ldr    r3, [r7]
    subs    r3, r2, r3
    str    r3, [r7, #4]
    b     .L3
.L2:
    ldr    r2, [r7]
    ldr    r3, [r7, #4]
    subs   r3, r2, r3
    str    r3, [r7]
.L3:
    ldr    r2, [r7, #4]
    ldr    r3, [r7]
    cmp    r2, r3
    bne    .L4
    ldr    r3, [r7, #4]
    mov    r0, r3
    adds   r7, r7, #12
    mov    sp, r7
    pop    {r7}
    bx    lr

Man, that's a lot of junk! It is that long because unoptimized code keeps every variable in the stack frame, so each use of x or y costs a load and each update costs a store.
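
To get a feel for what the stack frame costs, here is a rough C sketch of the unoptimized listing. Declaring the locals volatile forces a load before every use and a store after every update, which is roughly what the unoptimized build does through r7. The name gcd_frame_model and the assert guard are our own additions for illustration, not something the compiler produced.

```c
#include <assert.h>

/* Hypothetical model of the unoptimized listing: x and y live in memory. */
int gcd_frame_model(int x_in, int y_in)
{
    /* Added guard (assumption): the subtraction method is only meant
       for positive inputs. */
    assert(x_in > 0 && y_in > 0);

    volatile int x = x_in;   /* str r0, [r7, #4] */
    volatile int y = y_in;   /* str r1, [r7]     */

    while (x != y) {         /* .L3: ldr, ldr, cmp, bne .L4 */
        if (x > y)           /* .L4: ldr, ldr, cmp, ble .L2 */
            x = x - y;       /* ldr, ldr, subs, str         */
        else
            y = y - x;       /* .L2: ldr, ldr, subs, str    */
    }
    return x;                /* ldr r3, [r7, #4] ; mov r0, r3 */
}
```

Every line of the loop touches memory two or three times, which is where most of the extra instructions come from.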

Let's recompile it to Cortex-M4 Thumb-2 assembly using the -O1 switch (lowest optimization level):

$ arm-none-eabi-gcc -mcpu=cortex-m4 --specs=nosys.specs -O1 -S gcd.c

The resulting code:

    .cpu cortex-m4
gcd:
    cmp    r0, r1
    beq    .L6
.L7:
    cmp    r0, r1
    ite    gt
    subgt  r0, r0, r1
    suble  r1, r1, r0
    cmp    r0, r1
    bne    .L7
    bx     lr
.L6:
    mov    r0, r1
    bx     lr

Much shorter code now. We see the ITE instruction being used. ITE stands for if-then-else: if the condition is true, the next instruction executes; otherwise the one after that does. The ITE is not optional in Thumb-2. Apart from branches, instructions such as SUBGT and SUBLE can only be conditional inside an IT block. Only in the classic 32-bit ARM instruction set can they be conditional on their own.
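
Here is a rough C model of the -O1 listing, with variables named after the registers, to show how one compare drives both subtractions. CMP sets the flags once, and the plain SUB forms inside the IT block leave them alone, so the "then" and "else" slots test the same result. The function name gcd_ite_model and the assert guard are our own additions for illustration.

```c
#include <assert.h>

/* Hypothetical model of the -O1 listing; r0 and r1 stand for the registers. */
int gcd_ite_model(int r0, int r1)
{
    /* Added guard (assumption): the subtraction method is only meant
       for positive inputs. */
    assert(r0 > 0 && r1 > 0);

    if (r0 == r1)              /* cmp r0, r1 ; beq .L6      */
        return r1;             /* .L6: mov r0, r1 ; bx lr   */

    do {
        int gt = (r0 > r1);    /* .L7: cmp r0, r1 ; ite gt  */
        if (gt)
            r0 = r0 - r1;      /* subgt r0, r0, r1 ("then") */
        else
            r1 = r1 - r0;      /* suble r1, r1, r0 ("else") */
    } while (r0 != r1);        /* cmp r0, r1 ; bne .L7      */

    return r0;                 /* bx lr */
}
```

Exactly one of the two subtractions takes effect on each pass, and no branch is needed to choose between them.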

Recompile it using the -O2 switch (Optimize even more):

$ arm-none-eabi-gcc -mcpu=cortex-m4 --specs=nosys.specs -O2 -S gcd.c


Resulting code:

gcd:
.L8:
    cmp    r0, r1
    beq    .L1
    cmp    r1, r0
    ite    lt
    sublt  r0, r0, r1
    subge  r1, r1, r0
    b      .L8
.L1:
    bx     lr

Recompile it using the -O3 switch (Optimize yet more):

$ arm-none-eabi-gcc -mcpu=cortex-m4 --specs=nosys.specs -O3 -S gcd.c


Resulting code:

gcd:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    cmp    r0, r1
    mov    r3, r0
    beq    .L4
.L3:
    cmp    r1, r3
    sub    r0, r3, r1
    it     ge
    movge  r0, r3
    sub    r3, r1, r3
    it     ge
    movge  r1, r3
    cmp    r0, r1
    mov    r3, r0
    bne    .L3
    bx     lr   
.L4:
    bx     lr

Surprise! Among the optimized builds, the highest optimization level produced the most instructions.
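
Reading the -O3 loop as C makes the extra instructions easier to see: instead of choosing one subtraction, the loop computes both differences and then uses conditional moves to keep the right values. The sketch below follows the listing register by register. The name gcd_select_model and the assert guard are our own additions, not compiler output.

```c
#include <assert.h>

/* Hypothetical model of the -O3 listing; r0, r1, r3 stand for the registers. */
int gcd_select_model(int r0, int r1)
{
    /* Added guard (assumption): the subtraction method is only meant
       for positive inputs. */
    assert(r0 > 0 && r1 > 0);

    int r3 = r0;               /* cmp r0, r1 ; mov r3, r0 */
    if (r0 == r1)              /* beq .L4                 */
        return r0;             /* .L4: bx lr              */

    do {
        int ge = (r1 >= r3);   /* .L3: cmp r1, r3         */
        r0 = r3 - r1;          /* sub r0, r3, r1          */
        if (ge)
            r0 = r3;           /* it ge ; movge r0, r3    */
        r3 = r1 - r3;          /* sub r3, r1, r3          */
        if (ge)
            r1 = r3;           /* it ge ; movge r1, r3    */
        r3 = r0;               /* mov r3, r0              */
    } while (r0 != r1);        /* cmp r0, r1 ; bne .L3    */

    return r0;                 /* bx lr */
}
```

Both x - y and y - x are computed on every pass, and one of them is thrown away. That costs instructions, but the loop body contains no conditional subtraction at all.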


Finally, recompile it using the -Os switch (Optimize for size):

$ arm-none-eabi-gcc -mcpu=cortex-m4 --specs=nosys.specs -Os -S gcd.c


Resulting code:

gcd:
.L2:
    cmp    r0, r1
    bne    .L5
    bx     lr
.L5:
    ite    gt
    subgt  r0, r0, r1
    suble  r1, r1, r0
    b      .L2


Finally, -Os gives us the shortest code of all the compiler settings. It is still longer than the hand-written code, though.

Even at its best setting, the compiler still could not beat human-written code. That's why we still need to know a little assembly language.

Find out more:

Condition Codes 3: Conditional Execution in Thumb-2