CS641 Class 2
Handout: screen shot of PCSPIM
Let’s review C, especially how it views memory. Here is the layout of a C program on a 32-bit machine.
The code lies in the lower memory addresses, and the stack at the higher memory addresses, usually growing down. The static data is just above the code area, and the heap just above that. There are usually big gaps before the code, between the heap and the stack, and above the stack. These gap areas have no real memory associated with them, or belong to the OS. The stack may be located below the code on some systems.
Hex numbers of use with addresses:
0x1000 = 4K = 4096
0x10000 = 16*4K = 64K
0x100000 = 16*64K = 1024K = 1M
0x1000000 = 16M
0x1000 0000 = 256M = 1/4G
whole 32-bit address space = 4GB
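The pattern in the list above is that each extra hex zero multiplies the value by 16, i.e. adds four bits. Here is a minimal C sketch of that arithmetic; the helper name hex_one_followed_by is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Value of the hex number 1 followed by n zeros, e.g. n = 3 gives 0x1000.
   Each hex digit is 4 bits, so the value is 2^(4*n).
   (hex_one_followed_by is a hypothetical helper name, not a library call.) */
uint64_t hex_one_followed_by(unsigned n)
{
    assert(n < 16);                 /* keep the shift inside 64 bits */
    return UINT64_C(1) << (4 * n);
}
```

For example, hex_one_followed_by(3) is 4096 (4K), and hex_one_followed_by(8) is 4294967296, the size of the whole 32-bit address space.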
C program showing types of memory
#include <stdlib.h>

int x = 10;          /* external variable, in static data */
char msg[] = "hi";   /* ditto */

int main()
{
    int y = 5;       /* local variables, on stack */
    int z;
    char *p;

    z = x + y;
    p = malloc(10);  /* allocates 10 bytes of heap */
    p = msg;         /* repoint pointer to point to static data area */
    return 0;
}
You see that all the types of memory are equally usable. You do need to be aware that stack memory comes and goes with the execution of the function. The variable y does not exist until main is called. Then it comes into life with value 5, and lives until main returns. Static data and heap data survive return from a function. Static data is valid for the life of the program. Heap data becomes valid with the malloc, and survives until free(p) is called on its pointer.
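The three lifetimes described above can be seen in a small sketch. The function names count_call and make_greeting are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int calls = 0;        /* static data: lives for the whole program */

/* count_call is a hypothetical name used only for this sketch. */
int count_call(void)
{
    int local = 0;           /* stack: a fresh copy on every call */
    local = local + 1;       /* always 1 here; lost when we return */
    calls = calls + local;   /* remembered across calls */
    return calls;
}

/* Heap data outlives the function that allocated it; the caller must
   call free() on the returned pointer when done. */
char *make_greeting(void)
{
    char *p = malloc(10);    /* 10 bytes of heap */
    assert(p != NULL);       /* sketch only: real code should handle failure */
    strcpy(p, "hi");
    return p;
}
```

Calling count_call() twice returns 1 and then 2, because calls is static while local starts over each time. The string returned by make_greeting() stays valid after the function returns, until the caller frees it.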
A C pointer variable like p here can point to the heap, static data, or even to a local variable (or even to a place in code, but that’s rare). The value of a pointer variable is a memory address. That’s how C points at things. Every byte has a unique address, so this works very well.
For example, suppose the heap starts at address 0x300000. The malloc above will return 0x300000, or a little above that, and this value will be put in the pointer variable p, itself a 32-bit variable on the stack.
How can we find the location of variable z? If we had a pointer to this variable, its address would answer the question. Note that C allows us to construct a pointer to z by the expression &z, so this expression’s value is the address. It might be at 0xffffabc0, way up at the top of the 32-bit address space, or on another machine, at 0x7fffabc0, near the half-way point.
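As a small sketch of the & operator at work (the function names are made up for illustration), the code below takes the address of a local variable, writes through the pointer, and prints the address. The printed address depends on the machine and the run, so no particular value should be expected.

```c
#include <assert.h>
#include <stdio.h>

/* Write v into whatever int p points at (set_through is a made-up name). */
void set_through(int *p, int v)
{
    assert(p != NULL);
    *p = v;
}

/* Print where a local variable lives; the address varies by machine and run. */
void show_local_address(void)
{
    int z = 0;
    int *pz = &z;            /* &z is the address of z, on the stack */
    set_through(pz, 7);      /* z is now 7 */
    printf("z = %d at address %p\n", z, (void *)pz);
}
```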
The code above is responsible in that it only computes with variables belonging to the program. But nothing stops us in C from using memory outside the areas owned by the program, for example:
    p = (char *)0x4000000;   /* an address inside a gap above */
    *p = 0;                  /* try to write to that spot: fails with segmentation fault or equivalent */
C programmers need to keep track of the memory layout as they program, or work very conservatively. They are responsible for keeping all pointers pointing to real data.
So we see that C is not a very high level language. At least C hides the registers!
Ex. z = x + y;
on a typical processor, need to use a register, say reg1 to do this:
load x in reg1
add y to reg1
store reg1 to z
So we can thank the C compiler for figuring this out for us.
MIPS Registers and Memory (as set up for SPIM)
MIPS has 32 registers: see top part of SPIM screenshot.
MIPS memory layout: like the C model above, with
code starts at 0x400000 = 4M
data starts at 0x1000 0000 = 256M, 1/16 of the way up the address space
stack grows down from 0x7fff f6d0, just below 0x8000 0000, the half-way point in the 32-bit address space
There is no heap here.
Back to MIPS registers
For now, we’ll stick to a subset:
$16 - $23, known as $s0, $s1, ..., $s7. These hold C variables and equivalent.
$8 - $15, known as $t0, ..., $t7. These hold temporaries.
Sample instruction: add $s0, $s1, $s2, where s0 corresponds to C variable a, s1 to b, s2 to c.
This instruction adds contents of s1 and s2, and puts the result in s0, i.e., in effect a = b + c.
sub $s3, $s4, $s5: subtracts s5 from s4, puts result in s3
The following is from slides from Norm Rubin, who taught cs641 some time ago--
How do we do the following C statement?
a = b + c + d - e;
° Break into multiple instructions
add $t0, $s1, $s2 # temp = b + c
add $t0, $t0, $s3 # temp = temp + d
sub $s0, $t0, $s4 # a = temp - e
° Notice: A single line of C may break up into several lines of MIPS.
° Notice: Everything after the hash mark on each line is ignored (comments)
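To see the three-instruction sequence as a computation, here is a rough C model of it. This is a sketch under assumptions, not how SPIM works internally: the reg array and the add, sub, and compute_a functions are invented for illustration, and overflow simply wraps around instead of trapping.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers as given in these notes: $t0 = $8, $s0..$s4 = $16..$20. */
enum { T0 = 8, S0 = 16, S1, S2, S3, S4 };

static int32_t reg[32];      /* a toy register file */

/* Toy versions of add and sub. Simplification: results wrap around,
   whereas the real instructions trap on signed overflow. */
static void add(int rd, int rs, int rt)
{
    assert(rd >= 0 && rd < 32);
    reg[rd] = (int32_t)((uint32_t)reg[rs] + (uint32_t)reg[rt]);
}

static void sub(int rd, int rs, int rt)
{
    assert(rd >= 0 && rd < 32);
    reg[rd] = (int32_t)((uint32_t)reg[rs] - (uint32_t)reg[rt]);
}

/* a = b + c + d - e, using the same three steps as the MIPS code. */
int32_t compute_a(int32_t b, int32_t c, int32_t d, int32_t e)
{
    reg[S1] = b; reg[S2] = c; reg[S3] = d; reg[S4] = e;
    add(T0, S1, S2);         /* temp = b + c    */
    add(T0, T0, S3);         /* temp = temp + d */
    sub(S0, T0, S4);         /* a = temp - e    */
    return reg[S0];
}
```

With b = 10, c = 20, d = 30, e = 5, compute_a returns 55, the same value the C statement would give.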
How do we do this?
f = (g + h) - (i + j);
° Use intermediate temporary registers
add $t0,$s1,$s2 # temp1 = g + h
add $t1,$s3,$s4 # temp2 = i + j
sub $s0,$t0,$t1 # f = temp1 - temp2 = (g+h)-(i+j)
Immediates are numerical constants.
° They appear often in code, so there are special instructions for them.
° Add Immediate:
addi $s0,$s1,10 (in MIPS)
f = g + 10 (in C)
where MIPS registers $s0,$s1 are associated with C variables f, g
° Syntax similar to add instruction, except that last argument is a number instead of a register.
° Note: this works only for immediate values that fit in 16 bits, signed, so between -2^15 = -32K and 2^15 - 1 = 32K-1.
° There is no Subtract Immediate in MIPS: Why?
° Limit types of operations that can be done to absolute minimum
° if an operation can be decomposed into a simpler operation, don’t include it
° addi …, -X = subi …, X => so no subi
° addi $s0,$s1,-10 (in MIPS)
° f = g - 10 (in C)
° where MIPS registers $s0,$s1 are associated with C variables f, g
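The 16-bit signed range and the "addi with a negative constant replaces subi" idea can be checked with a short C sketch. The helper names fits_imm16, sign_extend16, and addi are illustrative only, and the overflow trap of the real instruction is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does v fit in a signed 16-bit immediate field? */
bool fits_imm16(int32_t v)
{
    return v >= -32768 && v <= 32767;
}

/* Sign-extend the 16-bit field to 32 bits, as addi does with its immediate. */
int32_t sign_extend16(uint16_t field)
{
    return (field & 0x8000) ? (int32_t)field - 0x10000 : (int32_t)field;
}

/* Toy addi: rs plus a sign-extended immediate (overflow trap not modelled). */
int32_t addi(int32_t rs, int32_t imm)
{
    assert(fits_imm16(imm));
    return (int32_t)((uint32_t)rs + (uint32_t)sign_extend16((uint16_t)imm));
}
```

Here addi(25, -10) gives 15, which is what a subtract-immediate of 10 would have produced. The constant -10 is stored in the instruction as the 16-bit pattern 0xFFF6 and sign-extended back to -10 before the add.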
C variables map onto registers; what about large data structures like arrays?
° Memory contains such data structures.
° But MIPS arithmetic instructions only operate on registers, never directly on memory.
° Data transfer instructions transfer data between registers and memory:
• Memory to register, Register to memory
° To transfer a word of data, we need to specify two things:
• Register: specify this by number (0 - 31) or symbolic name ($s0,…, $t0, …)
• Memory address: more difficult
To specify a memory address to copy from, specify two things:
• A register which contains a pointer to memory (a memory address)
• A numerical offset (in bytes)
° The desired memory address is the sum of these two values.
° Example: 8($t0)
• specifies the memory address pointed to by the value in $t0, plus 8 bytes
• so if $t0 contains 0x1000 0100, this specifies address 0x1000 0108
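The base-plus-offset calculation is just an addition, as this small C sketch shows. The name word_address is made up; the offset is taken as a signed 16-bit value, which is how MIPS encodes it, and the alignment check reflects the rule that word transfers use addresses that are multiples of 4.

```c
#include <assert.h>
#include <stdint.h>

/* Address named by offset(base): the base register's value plus a byte offset.
   (word_address is a made-up name for this sketch.) */
uint32_t word_address(uint32_t base, int16_t offset)
{
    uint32_t addr = base + (uint32_t)(int32_t)offset;
    assert(addr % 4 == 0);   /* word transfers need a 4-byte-aligned address */
    return addr;
}
```

With base 0x10000100 and offset 8, the result is 0x10000108, matching the example above.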
° Load Instruction Syntax:
1 2,3(4)
• where
1) operation name
2) register that will receive value
3) numerical offset in bytes
4) register containing pointer to memory
° MIPS Instruction Name:
lw (meaning Load Word, so 32 bits or one word are loaded at a time)
° Example: lw $t0,12($s0)
This instruction will take the pointer in $s0, add 12 bytes to it, and then load the value from the memory pointed to by this calculated sum into register $t0
° Notes:
• $s0 is called the base register
• 12 is called the offset
• offset is generally used in accessing elements of array or structure: base reg points to beginning of array or structure
• if s0 contains 0x1000 0100 as above, this load accesses address 0x1000 010c. Note that 12 decimal is c in hex. Numbers default to decimal in assembler. Need to write 0x in front of hex numbers, just as in C.
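In C terms, a load with offset 12 from the start of an int array fetches element 3, since each int takes 4 bytes. The sketch below imitates lw with a hypothetical load_word function; it assumes a 32-bit int, as on MIPS.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for lw: read the 32-bit word at base + offset bytes. */
int32_t load_word(const void *base, int offset)
{
    int32_t value;
    assert(offset % 4 == 0);     /* words sit at multiples of 4 bytes */
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}
```

If A is an array holding 10, 20, 30, 40, 50, then load_word(A, 12) returns 40, the same as A[3] in C.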
But how do we get 0x1000 0100 into s0?? Using lui and la
We saw how to use immediate operands to construct constants, but they can only handle 16 bits.
Another instruction: lui, load upper immediate, loads 16 bits into the upper half of the target register, and clears the lower half.
So to get 0x1000 0100 into s0:
lui $s0, 0x1000
addi $s0, $s0, 0x0100
Note that the assembler supports a pseudo-instruction la (load address) that knows how to do this, so we can write:
la $s0, 0x10000100
and the assembler will compose the two needed instructions (perhaps using ori instead of addi) for us.
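The two-step construction can be modelled in C. This is a sketch with made-up function names, not the assembler's actual code. It also suggests one reason an assembler might prefer ori: addi sign-extends its immediate, so a lower half of 0x8000 or more would subtract from the upper half instead of filling in the low bits.

```c
#include <assert.h>
#include <stdint.h>

/* lui: put imm in the upper 16 bits, clear the lower 16. */
uint32_t lui(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* ori: OR in a zero-extended 16-bit immediate. */
uint32_t ori(uint32_t rs, uint16_t imm)
{
    return rs | imm;
}

/* addi: add a sign-extended 16-bit immediate (overflow trap not modelled). */
uint32_t addi(uint32_t rs, uint16_t imm)
{
    uint32_t ext = (imm & 0x8000) ? (0xFFFF0000u | imm) : imm;
    return rs + ext;
}

/* What la might expand to: lui for the top half, ori for the bottom half.
   (load_address is a hypothetical name for this sketch.) */
uint32_t load_address(uint32_t addr)
{
    uint32_t r = lui((uint16_t)(addr >> 16));
    r = ori(r, (uint16_t)(addr & 0xFFFF));
    assert(r == addr);
    return r;
}
```

For 0x10000100 both addi and ori give the right answer. For an address such as 0x1000ABCD, lui followed by ori still works, while lui followed by addi produces 0x0FFFABCD.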
° Store: Also want to store value from a register into memory
° Store instruction syntax is identical to Load instruction syntax
° MIPS Instruction Name:
° sw (meaning Store Word, so 32 bits or one word are stored at a time)
° Example: sw $t0,12($s0)
° This instruction will take the pointer in $s0, add 12 bytes to it, and then store the value from register $t0 into the memory address pointed to by the calculated sum
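The store direction can be sketched the same way in C. The function store_word is a hypothetical stand-in for sw, assuming 32-bit words; storing at byte offset 12 from the start of an int array has the effect of the C assignment A[3] = value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for sw: write value at base + offset bytes. */
void store_word(void *base, int offset, int32_t value)
{
    assert(offset % 4 == 0);     /* words sit at multiples of 4 bytes */
    memcpy((unsigned char *)base + offset, &value, sizeof value);
}
```

After store_word(A, 12, 99) on a zeroed array A, element A[3] holds 99 and the other elements are unchanged.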
First full program: hello.s, the hello world program linked to the class web page.
# hello world from MIPS assembler
# Note that the first instruction (la) is a pseudo-instruction,
# and assembles to two real instructions, an lui and an ori
# to load the upper half and lower half of the 32-bit address
# for msg into register $a0
        .text
        .globl main
main:                       # program starts at main
        la   $a0, msg       # load address of msg into a0
        ori  $v0, $0, 4     # syscall 4 for print string with addr in a0
        syscall
        jr   $ra            # return from main
        .data
msg:    .asciiz "Hello, world!"
The assembler reads the program, in order, and while .text is in effect, it puts the indicated instructions into a growing text segment (i.e. code), and when .data is in effect, it puts the indicated data areas into the data segment. These are loaded into SPIM along with some startup code, and we end up with what you see in the screenshot handout.
Try it out!