Note: There are no branch delay slots with the x86 family: it’s too hard to do with variable length instructions
So in x86 and x86-64, branches typically just stall the pipeline.
Compilers try hard to remove branches.
Look at MIPS case
first:
Our cross-compiler setup: mips-gcc and mips-gcc-O2 are just shell scripts:
sf06.cs.umb.edu$
more mips-gcc
mips-gcc-compiler
-march=r2000 $*
sf06.cs.umb.edu$
more mips-gcc-O2
#
-O2 optimization turns on -fdelayed-branch, which
#
allows instruction reordering to fill in branch delay slots
#
to start, disallow delayed branch slot use with -O2
mips-gcc-compiler
-march=r2000 -O2 -fno-delayed-branch $*
Example
showing use of “if” to avoid division: good idea?
//
do x%N without division: cheap mod, for x below 2*N
//
used in Weiss, DS & Problem Solving using Java (CS310 text), pg. 794
//
note: on some processors, x%N may be faster, but not our MIPS
int
mod1(int x, int N)
{
if (x >= N) // conditional subtraction instead of x%N
x -= N;
return x;
}
Compile
it, allowing branch-delay slot use by MIPS C compiler:
mips-gcc-compiler
-march=r2000 -O2
...
slt
$2,$4,$5
bne
$2,$0,$L34
nop <--branch delay slot with
nop
subu
$4,$4,$5
$L34:
j
$31
move
$2,$4 <--branch delay
slot with out-of-order instruction
Same
code, compile with x86: 6 instructions either way, no branch (gcc –O2 compile
on sf06)
The trick:
with the conditional subtraction, the compiler uses the conditional move instruction
“cmovge”, which tests condition codes set by the cmpl and copies conditionally
from EAX to EDI. This is a fairly new instruction, not in most x86 textbooks.
with conditional subtraction:
movl
%edi, %eax
subl
%esi, %eax
cmpl
%esi, %edi
cmovge
%eax, %edi
movl
%edi, %eax
ret
with %:
movl
%edi, %edx
movl
%edi, %eax
sarl
$31, %edx <--sign extension by right shift 31
idivl
%esi <--integer division: divide edx:eax by esi,
remainder in edx
movl
%edx, %eax
ret
One place this matters in compare routine for sorting. We need to return negative, 0, or positive from the function
Ex. (note it is not correct to subtract and then check sign)
int
intcompare(int *x, int *y)
{
if (*x < *y)
return -1;
else if (*x > *y)
return 1;
else return 0;
}
Looks like 2 branches, but MIPS and x86-64 can do it with one (maybe zero?)
MIPS: -O2
lw
$4,0($4)
lw
$5,0($5)
nop
slt
$2,$4,$5
bne
$2,$0,$L33
li
$3,-1 # 0xffffffffffffffff
slt
$3,$5,$4
$L33:
j
$31
move
$2,$3
gcc -S -O2 : x86-64 can do
intcompare:
.LFB3:
movl
(%rsi), %eax
cmpl
%eax, (%rdi)
movl
$-1, %edx
jl
.L3
setg
%al
movzbl
%al, %edx
.L3:
movl
%edx, %eax
ret
.LF