    # addem.s
    # Add five numbers
    # C. Vickery
    # CS-343

    # Entry point
            .globl  main
    # Five data values to sum
            .data
    alpha:  .word   5, -5, 10, -10, -123
    # main()
    # ------------------------------------------
            .text
    main:   li      $t0, 0              # i = 0
            li      $a0, 0              # sum = 0
            li      $t1, 0x14           # lim = 20

    loop:   lw      $a1, alpha($t0)     # x = alpha[i]
            add     $a0, $a0, $a1       # sum += x
            addi    $t0, $t0, 4         # i++
            blt     $t0, $t1, loop      # i < lim

            li      $v0, 1              # print_int()
            syscall
            li      $v0, 10             # exit()
            syscall
            .end
To begin, here is what the SPIM assembler generated for addem.s:
    [0x00400024]  0x34080000  ori $8, $0, 0        ; 14: li $t0, 0 # i = 0
    [0x00400028]  0x34040000  ori $4, $0, 0        ; 15: li $a0, 0 # sum = 0
    [0x0040002c]  0x34090014  ori $9, $0, 20       ; 16: li $t1, 0x14 # lim = 20
    [0x00400030]  0x3c011001  lui $1, 4097         ; 18: lw $a1, alpha($t0) # x = alpha[i]
    [0x00400034]  0x00280821  addu $1, $1, $8
    [0x00400038]  0x8c250000  lw $5, 0($1)
    [0x0040003c]  0x00852020  add $4, $4, $5       ; 19: add $a0, $a0, $a1 # sum += x
    [0x00400040]  0x21080004  addi $8, $8, 4       ; 20: addi $t0, $t0, 4 # i++
    [0x00400044]  0x0109082a  slt $1, $8, $9       ; 21: blt $t0, $t1, loop # i < lim
    [0x00400048]  0x1420fffa  bne $1, $0, -24 [loop-0x00400048]
    [0x0040004c]  0x34020001  ori $2, $0, 1        ; 23: li $v0, 1 # print_int()
    [0x00400050]  0x0000000c  syscall              ; 24: syscall
    [0x00400054]  0x3402000a  ori $2, $0, 10       ; 25: li $v0, 10 # exit()
    [0x00400058]  0x0000000c  syscall              ; 26: syscall
Question 1. The lw instruction specifies alpha($t0) as the effective address, but alpha was assigned to memory address 0x10010000, which does not fit in the 16-bit Address field of a lw instruction.
So the assembler generated three instructions to compute the effective
address in register $1 (also known as $at, the "assembler temporary"
register). The lui instruction has the leftmost 16 bits of the
address 0x10010000 as its immediate operand (the rightmost 16 bits of
the lui instruction); this 16-bit value gets loaded into the
leftmost 16 bits of $at, and the rightmost 16 bits of $at are set to
zero. Since the address that alpha represents ends with 0x0000, the
complete address of alpha has been loaded into $at by the lui
instruction. If the rightmost 16 bits of the address alpha represents
were not all zeros, the assembler would have generated an ori
instruction to put the correct value into $at bits 0:15.
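The split of a 32-bit address into the two 16-bit halves described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SPIM's code; the helper names `split_address` and `lui` are made up for this example.

```python
def split_address(addr):
    """Split a 32-bit address into the 16-bit halves used by lui and ori."""
    upper = (addr >> 16) & 0xFFFF   # immediate operand for lui
    lower = addr & 0xFFFF           # immediate for a following ori, if nonzero
    return upper, lower


def lui(imm16):
    """Value left in the destination register by lui: imm16 in the
    leftmost 16 bits, zeros in the rightmost 16 bits."""
    return (imm16 & 0xFFFF) << 16


alpha = 0x10010000
upper, lower = split_address(alpha)
print(upper, hex(lower))    # 4097 0x0
print(hex(lui(upper)))      # 0x10010000
```

The value 4097 is the decimal form of 0x1001, which is why the listing shows `lui $1, 4097`. Because the lower half is zero, lui alone produces the complete address.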
The second instruction is an addu (add unsigned), which adds the
contents of $8 (that is, $t0, the register I used as the index into
the alpha array) to $at. The assembler generated an unsigned add
because that's what lw instructions do: the value in the
Address field of the instruction (bits 0:15 of the instruction) is a
signed value, but the contents of rs ($t0 in this case) are
unsigned. What's signed and what's unsigned is backwards from what
you might intuit from an expression like "alpha($t0)",
where you would think the array address (alpha) would be unsigned and
the subscript (register $t0) would be signed. But it makes sense when
you think of the "Address" field of the instruction actually being a
(signed) offset rather than the base address of the array.
Finally, the assembler generated a conventional lw instruction using the complete effective address in $at for the rs register and zero for the offset.
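As a rough check on the three-instruction expansion, the sketch below mimics each step in Python. The function names are hypothetical, and the model assumes, as in this program, that the label's lower 16 bits are zero so that lui alone loads the whole address.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """What lw computes: register contents plus the sign-extended offset field."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


def expanded_lw_address(label_addr, index_reg):
    """Follow the lui / addu / lw sequence generated for 'lw $a1, alpha($t0)'."""
    at = (label_addr >> 16) << 16          # lui  $1, upper half of label
    at = (at + index_reg) & 0xFFFFFFFF     # addu $1, $1, $8
    return effective_address(at, 0)        # lw   $5, 0($1)


for i in range(0, 20, 4):
    print(hex(expanded_lw_address(0x10010000, i)))
```

The loop prints the five word addresses 0x10010000 through 0x10010010, one for each element of alpha.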
Question 2. The li instructions were turned into ori instructions with $zero (register $0, the pseudo-register that can't be changed and which always provides the value 0x00000000) as the rs register and the destination as the rt register. The assembler could have generated addi instructions, also with $zero for the rs register, with the same effect for the small non-negative constants used here.
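To see which field holds register 0, the generated machine words can be pulled apart with a small decoder. This is a sketch using the standard MIPS I-format layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate); the function names are invented for the example. It also compares what ori and addi would leave in the register for a given immediate.

```python
def decode_i_format(word):
    """Split a 32-bit I-format instruction into its four fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }


def ori_from_zero(imm16):
    """Result of 'ori rt, $zero, imm': the immediate is zero-extended."""
    return imm16 & 0xFFFF


def addi_from_zero(imm16):
    """Result of 'addi rt, $zero, imm': the immediate is sign-extended."""
    imm16 &= 0xFFFF
    signed = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return signed & 0xFFFFFFFF


print(decode_i_format(0x34090014))
# {'opcode': 13, 'rs': 0, 'rt': 9, 'imm': 20}
```

For the constants in this program (0, 20, 1, and 10) the two helper functions agree. They differ only when bit 15 of the immediate is set, because addi sign-extends and ori does not.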
Question 3. The blt instruction generated an slt instruction followed by a bne. That is, the assembler used an slt instruction to compare the two registers, leaving the value 0 (false) or 1 (true) in register $at to indicate the result of the comparison. Then the bne conditionally branched back to the top of the loop if the comparison was true, that is, if $at was not equal to $zero.
The SPIM assembler and simulator calculated the branch offset by subtracting the address of the bne instruction (0x00400048) from the address of the target instruction (0x00400030), a difference of negative 18 in hexadecimal or -24 in decimal. But the difference between the addresses of two instructions will always end with two binary zeros, so the architecture specifies that the address field of a branch instruction drops those two bits, leaving the difference in word addresses rather than the difference in byte addresses in the instruction. Thus, the hexadecimal code for the instruction is 0x1420fffa, the rightmost sixteen bits are 0xFFFA, and the decimal value of this is -6, which is -24 divided by 4.
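The offset arithmetic can be reproduced as follows. This sketch follows the SPIM convention described above (offset measured from the branch instruction itself) and assembles the bne word from the standard field layout; `branch_offset_field` is a name chosen for the example.

```python
def branch_offset_field(branch_addr, target_addr):
    """16-bit address field for a branch, measured from the branch itself."""
    byte_diff = target_addr - branch_addr   # -24 for this program
    word_diff = byte_diff >> 2              # drop the two zero bits: -6
    return word_diff & 0xFFFF               # two's complement in 16 bits


field = branch_offset_field(0x00400048, 0x00400030)
print(hex(field))                           # 0xfffa

# bne $1, $0, loop: opcode 5, rs = 1, rt = 0, then the offset field
word = (5 << 26) | (1 << 21) | (0 << 16) | field
print(hex(word))                            # 0x1420fffa
```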
As discussed in class, there is a discrepancy between how the SPIM software handles branches and how they are done in Chapter 5. Since the address of an instruction is provided by the PC register, the hardware calculates the branch target address by adding the address field of the instruction (sign extended and shifted left two places) to the PC. The datapath in Chapter 5 of the book calculates the branch target address using the value of the PC after it has been incremented by 4 and thus points to the instruction after the branch instruction, but the simulator and assembler use the address of the branch instruction itself. In class I used the terms "PC+4" and "PC" to talk about this difference.
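The two conventions can be compared side by side. In this sketch, `target_pc` models the behavior described for SPIM and `target_pc_plus_4` models the Chapter 5 datapath; both names are made up for the illustration.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def target_pc(branch_addr, field):
    """'PC' convention: offset is relative to the branch instruction."""
    return (branch_addr + (sign_extend16(field) << 2)) & 0xFFFFFFFF


def target_pc_plus_4(branch_addr, field):
    """'PC+4' convention: offset is relative to the next instruction."""
    return (branch_addr + 4 + (sign_extend16(field) << 2)) & 0xFFFFFFFF


bne_addr = 0x00400048
print(hex(target_pc(bne_addr, 0xFFFA)))         # 0x400030, the top of the loop
print(hex(target_pc_plus_4(bne_addr, 0xFFFA)))  # 0x400034, one instruction too far
print(hex(target_pc_plus_4(bne_addr, 0xFFF9)))  # 0x400030, needs a field of -7
```

Under the PC+4 convention the same branch would need an address field of -7 (0xFFF9) rather than -6 to reach the loop label.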
Question 4. 38 clock cycles. Each instruction takes exactly one clock cycle to execute. There are three li instructions before the loop, and seven instructions inside the loop (three for the lw, an add, an addi, and two for the blt). The instructions inside the loop get executed 5 times each, so the total is 3 + 7 × 5 = 38.
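The count can be double-checked by stepping through the program in a small Python model. This is a sketch, not a MIPS simulator: it charges one cycle per generated machine instruction and, like the count above, stops when the loop exits, so the li and syscall instructions after the loop are not included.

```python
def run_addem():
    """Model addem.s up to the end of the loop; return (sum, cycles)."""
    alpha = [5, -5, 10, -10, -123]
    cycles = 0

    t0 = 0;    cycles += 1        # li $t0, 0     -> ori
    a0 = 0;    cycles += 1        # li $a0, 0     -> ori
    t1 = 0x14; cycles += 1        # li $t1, 0x14  -> ori

    while True:
        a1 = alpha[t0 // 4]; cycles += 3        # lw -> lui, addu, lw
        a0 += a1;            cycles += 1        # add
        t0 += 4;             cycles += 1        # addi
        at = 1 if t0 < t1 else 0; cycles += 1   # blt -> slt ...
        cycles += 1                             # ... and bne
        if at == 0:
            break

    return a0, cycles


print(run_addem())   # (-123, 38)
```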
Question 5. Assuming the ALU control logic is extended so that an ALUOp input value of binary 11 causes the ALU function code to be based on the opcode field of the instruction instead of the func field, as discussed in class, the following will work:
| Input or output | Signal name | addi | andi | ori | slti |
|---|---|---|---|---|---|
| Inputs | Op5 | 0 | 0 | 0 | 0 |
| | Op4 | 0 | 0 | 0 | 0 |
| | Op3 | 1 | 1 | 1 | 1 |
| | Op2 | 0 | 1 | 1 | 0 |
| | Op1 | 0 | 0 | 0 | 1 |
| | Op0 | 0 | 0 | 1 | 0 |
| Outputs | RegDst | 0 | 0 | 0 | 0 |
| | ALUSrc | 1 | 1 | 1 | 1 |
| | MemtoReg | 0 | 0 | 0 | 0 |
| | RegWrite | 1 | 1 | 1 | 1 |
| | MemRead | 0 | 0 | 0 | 0 |
| | MemWrite | 0 | 0 | 0 | 0 |
| | Branch | 0 | 0 | 0 | 0 |
| | ALUOp1 | 1 | 1 | 1 | 1 |
| | ALUOp0 | 1 | 1 | 1 | 1 |
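The input rows of the table are just the six opcode bits of each instruction, which a short script can confirm. The opcode values are the standard MIPS encodings; the output dictionary simply restates the table, and the names `OPCODES`, `opcode_bits`, and `CONTROL_OUTPUTS` are invented for this sketch.

```python
# Standard MIPS opcodes for the four immediate instructions in the table.
OPCODES = {"addi": 0b001000, "andi": 0b001100, "ori": 0b001101, "slti": 0b001010}

# Output signals from the table; they are the same for all four instructions.
CONTROL_OUTPUTS = {
    "RegDst": 0, "ALUSrc": 1, "MemtoReg": 0, "RegWrite": 1,
    "MemRead": 0, "MemWrite": 0, "Branch": 0, "ALUOp1": 1, "ALUOp0": 1,
}


def opcode_bits(opcode):
    """Return the opcode as the list [Op5, Op4, Op3, Op2, Op1, Op0]."""
    return [(opcode >> bit) & 1 for bit in range(5, -1, -1)]


for name, opcode in OPCODES.items():
    print(name, opcode_bits(opcode))
```

Each printed list reads down the corresponding column of the Inputs section of the table.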
Question 6.
    Function generate_controlword receives instruction, returns control word
      Decode instruction, giving opcode
      case opcode in
        R-format: return RegDst | RegWrite | ALUOp1
        lw:       return ALUSrc | MemtoReg | RegWrite | MemRead
        sw:       return ALUSrc | MemWrite
        beq:      return Branch | ALUOp0
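The pseudocode translates almost line for line into runnable code. The Python sketch below is one possible rendering, not the course's Java listing: the bit positions assigned to each signal are an arbitrary assumption, and the opcode values are the standard MIPS encodings for R-format (0), lw (0x23), sw (0x2b), and beq (4).

```python
# Assumed bit positions for the nine control signals (hypothetical ordering).
REG_DST    = 1 << 8
ALU_SRC    = 1 << 7
MEM_TO_REG = 1 << 6
REG_WRITE  = 1 << 5
MEM_READ   = 1 << 4
MEM_WRITE  = 1 << 3
BRANCH     = 1 << 2
ALU_OP1    = 1 << 1
ALU_OP0    = 1 << 0


def generate_controlword(instruction):
    """Return the control word for a 32-bit instruction."""
    opcode = (instruction >> 26) & 0x3F     # decode: opcode is bits 31:26
    if opcode == 0x00:                      # R-format
        return REG_DST | REG_WRITE | ALU_OP1
    if opcode == 0x23:                      # lw
        return ALU_SRC | MEM_TO_REG | REG_WRITE | MEM_READ
    if opcode == 0x2B:                      # sw
        return ALU_SRC | MEM_WRITE
    if opcode == 0x04:                      # beq
        return BRANCH | ALU_OP0
    raise ValueError(f"unsupported opcode {opcode:#04x}")


# Two words taken from the SPIM listing for addem.s
print(f"{generate_controlword(0x00852020):09b}")   # add: 100100010
print(f"{generate_controlword(0x8c250000):09b}")   # lw:  011110000
```

Any opcode outside the four cases raises an error here; the pseudocode does not say what should happen in that case, so this is a design choice made for the sketch.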
Question 7. There are two listings: ControlWord.java, which includes the function to generate the control words, and ControlUnit.java, which provides a wrapper for reading the file containing the instructions, passing them to the control word generator, and showing the results.