CS641 Class 2

Handout: screen shot of PCSPIM

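Let’s review C, especially how it views memory.  Here is the layout of a C program on a 32-bit machine.

[Figure: memory layout of a C program, from low to high addresses: code, static data, heap, gap, stack]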

The code lies in the lower memory addresses, and the stack at the higher memory addresses, usually growing down. The static data is just above the code area, and the heap just above that. There are usually big gaps before the code, between the heap and the stack, and above the stack. These gap areas have no real memory associated with them, or belong to the OS.  The stack may be located below the code on some systems.

Hex numbers useful for working with addresses:

0x1000 = 4K = 4096

0x10000 = 16*4K = 64K

0x100000 = 16*64K = 1024K = 1M

0x1000000 = 16M

0x1000 0000 = 256M = 1/4G

whole 32-bit address space = 4GB

C program showing types of memory

#include <stdlib.h>  /* for malloc */

int x = 10;          /* external variable, in static data */
char msg[] = "hi";   /* ditto */

int main()
{
      int y = 5;   /* local variables, on stack */
      int z;
      char *p;

      z = x + y;
      p = malloc(10);  /* allocates 10 bytes of heap */
      p = msg;         /* repoint pointer to point to static data area
                          (the 10 heap bytes are now unreachable) */
      return 0;
}

 

You see that all the types of memory are equally usable.  You do need to be aware that stack memory comes and goes with the execution of the function.  The variable y does not exist until main is called. Then it comes into life with value 5, and lives until main returns.  Static data and heap data survive return from a function. Static data is valid for the life of the program. Heap data becomes valid with the malloc, and survives until free(p) is called on its pointer.

A C pointer variable like p here can point to the heap, static data, or even to a local variable (or even to a place in code, but that’s rare). The value of a pointer variable is a memory address.  That’s how C points at things.  Every byte has a unique address, so this works very well.

For example, suppose the heap starts at address 0x300000. The malloc above will return 0x300000, or a little above that, and this value will be put in the pointer variable p, itself a 32-bit variable on the stack.

How can we find the location of variable z?  If we had a pointer to this variable, its address would answer the question.  Note that C allows us to construct a pointer to z by the expression &z, so this expression’s value is the address.  It might be at 0xffffabc0, way up at the top of the 32-bit address space, or on another machine, at 0x7fffabc0, near the half-way point.

The code above is responsible in that it only computes with variables belonging to the program. But nothing stops us in C from using memory outside the areas owned by the program, for example:

p = (char *)0x4000000;  /* an address inside a gap above */

*p = 0;    /* try to write to that spot: fails with segmentation fault or equivalent */

 

C programmers need to keep track of the memory layout as they program, or work very conservatively. They are responsible for keeping all pointers pointing to real data.

So we see that C is not a very high level language.  At least C hides the registers!

Example:   z = x + y;

On a typical processor, we need to use a register, say reg1, to do this:

load x in reg1

add y to reg1

store reg1 to z

So we can thank the C compiler for figuring this out for us.

MIPS Registers and Memory (as set up for SPIM)

MIPS has 32 registers: see top part of SPIM screenshot.

MIPS memory layout: like the C model above, with

code starts at 0x400000 = 4M

data starts at 0x1000 0000 = 256M, 1/16 of the way up the address space

stack grows down from 0x7fff f6d0, just below 0x8000 0000, the half-way point in the 32-bit address space

There is no heap here.

Back to MIPS registers

For now, we’ll stick to a subset:

$16-$23, known as $s0, $s1, ..., $s7: these hold C variables and equivalent

$8-$15, known as $t0, ..., $t7: these hold temporaries

Sample instruction:  add $s0, $s1, $s2,    where s0 corresponds to C variable a, s1 to b, s2 to c.

This instruction adds contents of s1 and s2, and puts the result in s0, i.e., in effect a = b + c.

sub $s3, $s4, $s5: subtracts s5 from s4, puts result in s3

The following is from slides by Norm Rubin, who taught cs641 some time ago:

How do we do the following C statement?
                 a = b + c + d - e;

°         Break into multiple instructions

add $t0, $s1, $s2 # temp = b + c

add $t0, $t0, $s3 # temp = temp + d

sub $s0, $t0, $s4 # a = temp - e

°         Notice: A single line of C may break up into several lines of MIPS.

°         Notice: Everything after the hash mark on each line is ignored (comments)

How do we do this?

f = (g + h) - (i + j);

°         Use intermediate temporary registers

add $t0,$s1,$s2 # temp1 = g + h

add $t1,$s3,$s4 # temp2 = i + j

sub $s0,$t0,$t1  # f=(g+h)-(i+j)

Immediates are numerical constants.

°         They appear often in code, so there are special instructions for them.

°         Add Immediate:

                addi $s0,$s1,10 (in MIPS)

                f = g + 10 (in C)

where MIPS registers $s0,$s1 are associated with C variables f, g

°         Syntax similar to add instruction, except that last argument is a number instead of a register.

°         Note: this works only for immediate values that fit in 16 bits, signed, so between -2^15 = -32K and 2^15 - 1 = 32K-1.

°         There is no Subtract Immediate in MIPS: Why?

°         Limit types of operations that can be done to absolute minimum

°         if an operation can be decomposed into a simpler operation, don’t include it

°         addi …, -X = subi …, X => so no subi

°                         addi $s0,$s1,-10 (in MIPS)

°                         f = g - 10 (in C)

°         where MIPS registers $s0,$s1 are associated with C variables f, g

C variables map onto registers; what about large data structures like arrays?

°         memory contains such data structures

°         But MIPS arithmetic instructions only operate on registers, never directly on memory.

°         Data transfer instructions transfer data between registers and memory:

          Memory to register, Register to memory

°         To transfer a word of data, we need to specify two things:

          Register: specify this by number (0 - 31) or symbolic name ($s0,…, $t0, …)

          Memory address: more difficult

To specify a memory address to copy from, specify two things:

          A register which contains a pointer to memory (a memory address)

          A numerical offset (in bytes)

°         The desired memory address is the sum of these two values.

°         Example:             8($t0)

          specifies the memory address pointed to by the value in $t0, plus 8 bytes

          so if $t0 contains 0x1000 0100, this specifies address 0x1000 0108

°         Load Instruction Syntax:

                1    2,3(4)

          where

                                1) operation name

                                2) register that will receive value

                                3) numerical offset in bytes

                                4) register containing pointer to memory

°         MIPS Instruction Name:

lw (meaning Load Word, so 32 bits, or one word, are loaded at a time)

°         Example:             lw $t0,12($s0)

                This instruction will take the pointer in $s0, add 12 bytes to it, and then load the value from the memory pointed to by this calculated sum into register $t0

°         Notes:

          $s0 is called the base register

          12 is called the offset

          offset is generally used in accessing elements of array or structure: base reg points to beginning of array or structure

          if s0 contains 0x1000 0100 as above, this load accesses address 0x1000 010c.  Note that 12 decimal is c in hex. Numbers default to decimal in assembler. Need to write 0x in front of hex numbers, just as in C.

But how do we get 0x1000 0100 into s0?  Using lui and la

We saw how to use immediate operands to construct constants, but they can only handle 16 bits.

Another instruction: lui, load upper immediate, loads 16 bits into the upper half of the target register, and clears the lower half.

So to get 0x1000 0100 into s0:

      lui $s0, 0x1000

      addi $s0, $s0, 0x0100

 

Note that the assembler supports a pseudo-instruction la (load address) that knows how to do this, so we can write:

                la $s0, 0x10000100

and the assembler will compose the two needed instructions (perhaps using ori instead of addi) for us.

 

°         Store: Also want to store value from a register into memory

°         Store instruction syntax is identical to Load instruction syntax

°         MIPS Instruction Name:

sw (meaning Store Word, so 32 bits, or one word, are stored at a time)

°         Example:             sw $t0,12($s0)

°                         This instruction will take the pointer in $s0, add 12 bytes to it, and then store the value from register $t0 into the memory address pointed to by the calculated sum

First full program: hello.s, the hello world program linked to the class web page.

# hello world from MIPS assembler

# Note that the first instruction (la) is a pseudo-instruction,

# and assembles to two real instructions, an lui and an ori

# to load the upper half and lower half of the 32-bit address

# for msg into register $a0

.text

        .globl  main

main:                     #program starts at main

        la      $a0, msg  # load address of msg into a0

        ori     $v0,$0,4  # syscall 4 for print string with addr in a0

        syscall

        jr      $ra       # return from main

 

.data

msg:    .asciiz "Hello, world!"

 

The assembler reads the program, in order, and while .text is in effect, it puts the indicated instructions into a growing text segment (i.e. code), and when .data is in effect, it puts the indicated data areas into the data segment.  These are loaded into SPIM along with some startup code, and we end up with what you see in the screenshot handout.

Try it out!