Post date: Apr 23, 2018 2:44:37 PM
This post is a continuation of GCD in ARM Assembly.
Referring to the same GCD code:
int gcd(int x, int y)
{
while( x != y )
{
if( x > y )
x -= y;
else
y -= x;
}
return x;
}
Let's recompile it to Cortex-M4 Thumb-2 assembly, unoptimized:
$ arm-none-eabi-gcc -mcpu=cortex-m4 --specs=nosys.specs -S gcd.c
The resulting code:
.cpu cortex-m4
gcd:
push {r7}
sub sp, sp, #12
add r7, sp, #0
str r0, [r7, #4]
str r1, [r7]
b .L3
.L4:
ldr r2, [r7, #4]
ldr r3, [r7]
cmp r2, r3
ble .L2
ldr r2, [r7, #4]
ldr r3, [r7]
subs r3, r2, r3
str r3, [r7, #4]
b .L3
.L2:
ldr r2, [r7]
ldr r3, [r7, #4]
subs r3, r2, r3
str r3, [r7]
.L3:
ldr r2, [r7, #4]
ldr r3, [r7]
cmp r2, r3
bne .L4
ldr r3, [r7, #4]
mov r0, r3
adds r7, r7, #12
mov sp, r7
pop {r7}
bx lr
Man, that's a lot of junk! It has to be that long because unoptimized code keeps every variable in the stack frame, reloading it with LDR before each use and writing it back with STR after each change.
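As a rough model of that behaviour in C, the sketch below marks the two locals volatile, which forces a memory access on every read and write, much like the unoptimized listing does. The name gcd_stack_model is hypothetical, the assert is an added assumption (the subtraction loop only makes sense for positive inputs), and the instruction comments are an approximate mapping rather than exact compiler output.

```c
#include <assert.h>

/* Hypothetical model of the unoptimized build: every access to x and y
   goes through memory, as the ldr/str pairs in the listing do. */
int gcd_stack_model(int x_arg, int y_arg)
{
    /* Assumption for this sketch: inputs are positive. */
    assert(x_arg > 0 && y_arg > 0);

    volatile int x = x_arg;   /* str r0, [r7, #4] */
    volatile int y = y_arg;   /* str r1, [r7]     */

    while (x != y) {          /* .L3: ldr, ldr, cmp, bne .L4 */
        if (x > y)            /* .L4: ldr, ldr, cmp, ble .L2 */
            x = x - y;        /* ldr, ldr, subs, str [r7, #4] */
        else
            y = y - x;        /* .L2: ldr, ldr, subs, str [r7] */
    }
    return x;                 /* ldr r3, [r7, #4] ; mov r0, r3 */
}
```

Calling gcd_stack_model(48, 18) returns 6, the same result as the original gcd.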
Let's recompile it to Cortex-M4 Thumb-2 assembly using the -O1 switch (lowest optimization level):
$ arm-none-eabi-gcc -mcpu=cortex-m4 --specs=nosys.specs -O1 -S gcd.c
The resulting code:
.cpu cortex-m4
gcd:
cmp r0, r1
beq .L6
.L7:
cmp r0, r1
ite gt
subgt r0, r0, r1
suble r1, r1, r0
cmp r0, r1
bne .L7
bx lr
.L6:
mov r0, r1
bx lr
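Much shorter code now. We see the ITE instruction being used. ITE stands for if-then-else: if the condition (GT here) is true, the next instruction executes; otherwise the one after that does. The ITE is not redundant in Thumb-2. Unlike the classic 32-bit ARM instruction set, where every instruction carries its own condition field, Thumb-2 instructions such as SUBGT and SUBLE can only execute conditionally inside an IT block.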
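To make the mapping concrete, here is a rough C rendering of the -O1 listing, with r0 as x and r1 as y. The name gcd_o1 is hypothetical and the assert is an added assumption (positive inputs); treat it as a reading aid, not as what the compiler works from.

```c
#include <assert.h>

/* Hypothetical C rendering of the -O1 listing (r0 = x, r1 = y). */
int gcd_o1(int x, int y)
{
    /* Assumption for this sketch: inputs are positive. */
    assert(x > 0 && y > 0);

    if (x == y)          /* cmp r0, r1 ; beq .L6        */
        return y;        /* .L6: mov r0, r1 ; bx lr     */

    do {
        if (x > y)       /* cmp r0, r1 ; ite gt         */
            x = x - y;   /* subgt r0, r0, r1  ("then")  */
        else
            y = y - x;   /* suble r1, r1, r0  ("else")  */
    } while (x != y);    /* cmp r0, r1 ; bne .L7        */

    return x;            /* bx lr                       */
}
```

The single ITE covers both subtractions, so the if/else costs no branch inside the loop body. gcd_o1(48, 18) returns 6.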
Recompile it using the -O2 switch (Optimize even more):
$ arm-none-eabi-gcc -mcpu=cortex-m4 --specs=nosys.specs -O2 -S gcd.c
Resulting code:
gcd:
.L8:
cmp r0, r1
beq .L1
cmp r1, r0
ite lt
sublt r0, r0, r1
subge r1, r1, r0
b .L8
.L1:
bx lr
Recompile it using the -O3 switch (Optimize yet more):
$ arm-none-eabi-gcc -mcpu=cortex-m4 --specs=nosys.specs -O3 -S gcd.c
Resulting code:
gcd:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
cmp r0, r1
mov r3, r0
beq .L4
.L3:
cmp r1, r3
sub r0, r3, r1
it ge
movge r0, r3
sub r3, r1, r3
it ge
movge r1, r3
cmp r0, r1
mov r3, r0
bne .L3
bx lr
.L4:
bx lr
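Surprise! The highest optimization level produced the most instructions of any optimized build: 15, against 11 for -O1 and 8 for -O2. -O3 optimizes for speed rather than size, and here it computes both differences on every pass and picks the results with conditional moves.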
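One way to read the -O3 listing is as a "compute both, then select" loop. The sketch below is a hypothetical C rendering of it: the name gcd_o3, the local variable names, and the assert (positive inputs) are assumptions made for illustration only.

```c
#include <assert.h>

/* Hypothetical C rendering of the -O3 listing: both differences are
   computed every iteration and conditional moves select the results. */
int gcd_o3(int x, int y)
{
    /* Assumption for this sketch: inputs are positive. */
    assert(x > 0 && y > 0);

    if (x == y)                        /* cmp r0, r1 ; beq .L4  */
        return x;                      /* .L4: bx lr            */

    do {
        int x_minus_y = x - y;         /* sub r0, r3, r1        */
        int y_minus_x = y - x;         /* sub r3, r1, r3        */
        int keep_x = (y >= x);         /* cmp r1, r3            */

        x = keep_x ? x : x_minus_y;    /* it ge ; movge r0, r3  */
        y = keep_x ? y_minus_x : y;    /* it ge ; movge r1, r3  */
    } while (x != y);                  /* cmp r0, r1 ; bne .L3  */

    return x;                          /* bx lr                 */
}
```

Both subtractions run on every pass even though only one result is kept, which is where the extra instructions come from. gcd_o3(48, 18) returns 6.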
Finally, recompile it using the -Os switch (Optimize for size):
$ arm-none-eabi-gcc -mcpu=cortex-m4 --specs=nosys.specs -Os -S gcd.c
Resulting code:
gcd:
.L2:
cmp r0, r1
bne .L5
bx lr
.L5:
ite gt
subgt r0, r0, r1
suble r1, r1, r0
b .L2
Finally, -Os gives the shortest compiler output: 7 instructions. It is still longer than hand-written code, though.
Even at its best setting, the compiler could not beat human-written code for this function. That's why we still need to know a little assembly language.
Find out more: