09: Assembly and MIPS

May 26, 2021

This article is a part of a series where I go through teachyourselfcs. If you would like to start at the beginning start here.

LECTURES

L5

Assembly language below high level language (MIPS)
Assebler turns assembly code into machine code
job of cpu: execute lost of instructions (primitive operations that the cpu can execute)
Example instruction set architectures:

ARM
x86
mips
risc-v
ibm/motorola powerPC
intel IA64

Assembly Variables: Registers
since registers are in hardware there is a limited number, 32 registers in MIPS

32 bits in a register, groups of 32 bits called a word in mips.

Number refererences: ($0, $1, $2)

name references: ($s0-$s7, $t0-\$t7) c variables and temporary variables (preffered because easier to debug)

add in assembly
add $s0, $s1, $s2 = a = b + c

subtract in assembly
sub $s3, $s4, $s5 = d = e - f

how to do a = b + c + d - e

  add $t0, $s1, $s2 # temp = b + c
  add $t0, $t0, $s3 # temp = temp + d
  sub $s0, $t0, $s4 # a = temp - e

Immediates:
Immediates are numerical constants, they appear often in code, so there are special instructions for them.

  addi $s0, $s1, -10 #mips
  f =  g - 10 #c

you have a special register for zero cases

  add $s0, $s1, $zero #mips
  f = g #c

Overflow handling in mips:
Detect overflow:

add
add immediate
subtract

do not detect overflow:

add unsigned
add immediate unsigned
subtract unsigned

only 32 registers, if you dont have space in registers you have to go to memory.

transfer from memory to register:

C code:

  int A[100];
  g = h + A[3];

mips:

lw = load word

  lw $t0, 12 ($s3) #Temp reg $t0 gets A[3]
  add $s1, $s2, $t0 # g = h + A[3]

s3 is a pointer to the beginning of the A array

offset is 12 (offset is in bytes) so it fast fowards 12 bytes in array or 96 bits 96/32 = 3 (32 is size of int) , have to increment base by the size of the datatype.

transfer from register to memory:

C code:

  int A[100];
  A[10] = h + A[3];

mips: sw = store word

  lw $t0, 12 ($s3) #Temp reg $t0 gets A[3]
  add $t0, $s2, $t0 # temp reg $t0 gets h + A[3]
  sw $t0, 40 ($s3) # A[10] = h + A[3]

In addition to word data transfers mips has byte data transfers:

-load byte: lb -store byte: sb

same format as lw, sw

lbu for unsigned.

speed of registers vs. memory

smaller is faster with registers. registers are 100-500 times faster than memory.

mips if statement:

  beq register1, register2, L1

means: go to statement L1 if value in register1 == value in register2

otherwise got to next statement

beq = branch if equal

branch - change of control flow

conditional branch - change control flow depending on outcome of comparison

branch if equal (beq) or branch if not equal (bne)

unconditional branch - always branch

jump (j)

L6

Machine code is lowest level of software.

words and vocabulary are called instructions and instruction sets respectively

mips is example RISC instruction set

rigid format 1 operation, 2 operands, 1 destination

example operations:

operations to move data around registers and memory:

operations for decision/ flow control:

beq
bne
j
slt
slti

program is stored as a bunch of bytes

mips instruction 32 bits or 4 bytes

Assembler converts assembly to machine code.

assembly code has .S extension machine code object has .o extension

machine code executable file has a .out extension

if a branch is false you go to next instruction else you jump to the given instruction (conditional branch).

unconditional branch: Always go to the “label”

C code

  if (i == j)
    f = g + h;

mips code

  bne $s3, $s4, Exit # notice this is not equal, if false this goes to next line which is add
  add $s0, $s1, $s2

C code

if (i == j)
  f = g + h;
else
  f = g - h;

mips code

  bne $s3, $s4, Else
  add $s0, $s1, $s2
  j Exit
  Else: sub $s0, $s1, $s2
  Exit:

The jump is needed to jump over the else code, and go to whatever the next line of code is.

interpreted: compiled at run time, so as a line of code is being executed it compiles to assembly.

C code

  if (g < h)
    goto Less;

mips code

  slt $t0, $s0, $s1
  bne $t0, $zero, Less

slt = set on less than

set means change to 1.

reset means change to 0.

sltu = treats registers as unsigned

slti = immediates

Six Fundamental Steps in Calling a Function:

Put parameters in a place where function can access them
Transfer control to function
Acquire (local) storage resources needed for function
Perform desired task of the function
Put result value in a place where calling code can access it and restore any registers you used
Return control to point of origin since a function can be called from several points in a program.

Function call conventions

Registers faster than memory, so use them
$a0 - $a3: four argument registers to pass parameters
$v0-$v1: two value registers to return values
$ra: one return address register to return to the point of origin

C code

  int sum (int x, int y) {
    return x + y;
  }

mips code

  address (shown in decimal)
  1000 add $a0, $s0, $zero          # x = a
  1004 add $a1, $s1, $zero          # y = b
  1008 addi $ra, $zero, 1016        # $ra = 1016
  1012 j sum                        # jump to sum
  2000 sum: add $v0, $a0, $a1
  2004 jr $ra #new instruction

jal = jump and link

laj = link and jump

link = form an address or link that points to calling site to allow function to return to proper address.

jr = jump register

$sp = stack pointer

Where are old register values saved to restore them after function call?

ideally they are in the stack.

push: placing data onto stack

pop: removing data from stack

stack in memory so need register to point to it ($sp register 29 in mips)

when adding $sp decrements to add more space

when removing $sp increases to decrease space.

C code

  int leaf_example (int g, int h, int i, int j)
  {
    int f;
    f = (g + h) - (i + j);
    return f;
  }

paramerter variables g,h,i,j in $a0, $a1, $a2, $a3

f in $s0

1 temp register $t0

mips code

  addi $sp, $sp, -8      # adjust stack for 2 items
  sw $t0, 4($sp)         # save $t0
  sw $s0, 0($sp)         # save $s0

  add $s0, $a0, $a1      # f = g + h
  add $t0, $a2, $a3      # t0 = i + j
  sub $v0, $s0, $t0      # return value (g+h)-(i+j)

  lw $s0, 0($sp)         # restore register $s0 for caller
  lw $t0, 4($sp)         # restore register $t0 for caller
  addi $sp, $sp, 8       # adjust stack to delete 2 items
  jr $ra                 #jump back to calling routine

What if function calls another function?

have to save the outer functions return address so the inner function can return there.

in C there are 3 import memory areas:

static
heap
stack

mips divides registers into 2 categories:

Preserved across function call ($ra, $sp, $gp, $ , “saved registers” $s0-$s7)
Not preserved ($v0, $v1), argument registers $a0-$a3, temp $t0-$t9

c has 2 storage classes:

automatic: variables are local to function and discarded staic: variables exist across exits from and entries to procedures

use stack for automatic (local) variables that dont fit registers

procedure frame or activation record: segment of stack with saved registers and local variables.

L7

C has 2 storage classes: automatic and static

Automatic: variables are local to function and discared when function exits
Static: vairables exist across exits from and entries to procedures

use stack for automatic (local) variables taht dont fit in registers

procedure frame or activation record: segment of stack with saved registers and local variables

some mips compilers use a frame pointer ($fp) to point to first word of frame.

$gp: global pointer points to start of static data.

heap: dynamic data, grows and shrinks depending on when you allocate and free memory.

26 and 27 used by the operating system.

stored-program computer

instructions are represented as bit patterns - can think of these as numbers
therfore, entire programs can be stored in memory to be read or written just like data.
can reprogram quickly (seconds), dont have to rewire computer (days)
know as the “von neumann” computers after widely distributed tech remort on EDVAC project

Everything addressed

since all instructions are stored in memory everything has a memory address: instructions, data words
c pointers are just memory addresses: they can point to anything in memory
one register keeps address of instruction being executed: “Program Counter” (PC)

L8

MARS - mips simulator, allows you to run mips code on x86 or other architecture machine.

Running a program (compiling, assembling, linking, loading)

Interpreter: program that executes other programs.

Language translation gives us another option.

In general we interpret a high level language when efficiency is not critical and translate to a lower-level language to up performance.

interpreter executes program in the source langauge

translator converts a program from source language to an equivalent program in another langauge

example of interpreted language: python, javascript

Can be useful to interpret machine language for learning/debuging.

Translated/compiled code almost always more efficient.

compiler takes high level code and returns assembly

assembler turns assembly code to object code

multiplication: 32 bit * 32 bit = 64 bit value (in mips)

mult register1, register2

puts result in special result register

puts upper half in hi, lower half in lo

hi and lo are 2 registers separate from 32 general purpose registers.

use mfhi register and mflo register to move from hi, lo

C code

a = b * c

mips code

  mult $s2, $s3
  mfhi $s0
  mflo $s1

division:

div register1, register2

C code

  a = c / d;

mips code

  div $s2, $s3
  mflo $s0
  mfhi $s1

assembler goes through assembly code twice, first to get labels, second to setup code

HW 1

https://github.com/BrooksPoltl/cs61c/tree/master/hw1

HW 2

https://github.com/BrooksPoltl/cs61c/tree/master/hw2

READINGS

2.1 Introduction

instruction sets are very similar to each other. Similar to SQL in that if you learn one instruction set it is very easy to pick up another.

computer designers are focused on writing instruction sets that are easy as well as cost and energy efficient.

stored-program concept: many data types can be stored as a number.

2.2 Operations of the Computer Hardware

Every computer must be able to perform arithmetic. You can chain aritmetic operations for example since add is stored to a variable you can set that variable again with the sum of the previous operation plus the next value.

Design principle 1: simplicity favors regularity.

Java compilation done later than C compilation (just in time compilation or JIT). s

2.3 Operands of the Computer Hardware

Operands are limited in instructions, they must be a part of the register which is 32 bits also known as a “word”. There are 32 32 bit registers.

Design principle 2: smaller is faster.

The farther electronic symbols have to travel the longer it takes.

There is also a need to move data from registers to memory, these are called data transfer instructions. Memory is just a large sigle dimensional array with the address being the index.

Words must start at addresses that are multiples of 4. This is common and called an alignment restriction.

Many programs use more variables than there are registers so more frequently used variables are stored in registers and less frequent are stored in memory. This is called “spilling registers”.

2.5 Representing Instructions in the Computer

Format for instructions in decimal:

add $t0, $s1, $s2
#decimal
0 | 17 | 18 | 8 | 0 | 32
#binary
000000 | 10001 | 10010 | 01000 | 00000 | 100000

the left and right indicate the instruction to perform.

Notice that the left and right are both 6 bits instead of 5. The sum of all of these bits is 32 to represent the size of a single word. Each one of these columns in a single row represents a “field”. The second and third field represents the operands which are $s1 and $s2 respectively. The fourth represents the register that receives the output. The fifth is unused here which is why it is set to 0.

To distinguish from assembly the numeric representation of the instruction is called “machine language”.

Hexadecimal is very easy to convert to binary. Since hexadecimal is base 16 and binary is base 2. You convert by grouping together 4 bits and converting it to its hexadecimal equivalent.

11100001 = E1
1110 = E # first four
0001 = 1 # last four

C and Java use 0xnnnn notation for hexadecimal.

Further breakdown of instructions:

# register format or r-type or r-format

op | rs | rt | rd | shamt | funct

# op: Basic operation of the instruction aka opcode.
# rs: first register source operand
# rt: second register source operand
# rd: register destination operand
# shamt: shift amount
# funct: function or function code, selects the specific variant of the operation in the op field

Idealy we would like to keep all instructions the same size, but some require more than 32 bits given the above format like lw (load word) since it includes constants.

Design Principle 3: Good Design demands good compromises.

This issue with constants has led us to the next instruction format:

# immediate format or i-type or i-format
# op, rs and rt remain the same size of bits.

op | rs | rt | constant or address (16 bits)

in a load word instruction the rt meaning changes to the destination field.

2.6 Logical Operators

A shift moves all of the bits in a word to the left or right leaving 0 in its place. These are sll and srl in mips (shift left logical and shift right logical).

There is also AND, OR, NOT, and NOR.

2.7 Instructions for Making Decisions

beq and bne (branch equal and branch not equal) goes to a statement if something is true. These are conditional branches there are also unconditional branches like jump, which will jump to a certain statement no matter what.

Most programming languages have a case/switch statement. This requires a jump address table to jump to whichever instructions are true.

2.8 Supporting Procedures in Computer Hardware

A procedure has 6 steps:

Put parameters in a place where the procedure can access them.
Transfer control to the procedure.
Acquire the storage resources needed for the procedure.
Perform the desired task.
Put the result value in a place where calling the program can access it.
Return control to the point of origin, since a procedure can be called from several points in a program.

Registers are fastest place to hold data. There are 4 argument registers a0-a3, 2 value registers to return v0-v1, and 1 return address (ra). For procedures there is a specific instruction jal (jump and link) this jumps to the address and saves it in the return address.

2.9 Communicating with People

Computers originally created for numbers, but when they were commercially viably there was a need to be able to process text. Computers today offer an 8 bit byte to represet characters with the American Standard Code for Information Interchange (ASCII) being the standard.

Characters are normally combined into strings.

There are 3 choices for string representation:

The first position is reserved to indicate length
An accompanying variable has the length
The last position of the string indicates the length

Unicode is a unversal encoding of the alphabets of most human langauges. Unicode uses 16 bits.

2.12 Translating and Starting a Program

Compiler

The compiler transforms the C program into an assembly language program. Compiling is good because it allows the programmer to write programs in less lines of code. This also makes it easier to understand for humans.

Compilers use to be inefficient so some low level stuff was writter in assembly, but now compilers can output optimized code nearly as well as assembly experts.

Assembler

Assembler takes assembly code and turns it into machine code. The assembler outputs an object file that consist of language instructions, data, and information needed to place instructions properly into memory.

To produce binary, the assembler must determine the address corresponding to the labels, assemblers do this by using a symbol table. Which is an object containing a key value pair of addresses and symbols.

Object file usually contains 6 things:

object file header giving the size and position of stuff
the text segment contains the machine language code
the static data segment contains data allocated to life of program.
relocation information identifies instructions and data words that depend on absolute addresses
symbol table containing previously mentioned key value store
debugging information contains information regarding compilation

Linker

Linkers allow you to not have to recompile everything when a line changes within the application. Also known as a link editor. It contains things at a procedure level so it only has to recompile the procedures that are changed.

Steps for linking:

Place code and data modules symbolically in memory.
Determine the addresses of data and instruction labels.
Patch both the internal and external references.

The linker produces an executable file that can be run on the computer.

Loader

Loader in UNIX follows these steps:

Reads the executable file header to determine size of the text and data segments
Creates an address space large enough for the text and data
Copies the instructions and data from the executable file into memory
Copies the parameters (if any) to the main program onto the stack
Initializes the machine registers and sets the stack pointer to the first free location.

Dynamically Linked Libraries

Library routines are not linked and loaded until the program is run. In initial implementations the loader ran a dynamic linker for libraries. Using the file to find library references.

Java Virtual Machine

Java code is compiled down to java bytecode instruction set. This byte code is used by the interpreter called the Java Virtual Machine (JVM). An interpreter is a program that simulates an instruction set architecture.

The downside with interpreters is the hits to performance. To combat this Java development allowed for compilers to translate while the program was running. This is known as “Just in Time compilers” (JIT).