CS-343 Assignment 7 Solutions

Answers

Figure 4.1 shows two adders. Which one (the left one or the right one) is used to compute Branch Target Addresses?
The right one. The left one has 4 as one input because it computes PC+4. The right one computes PC+4 + part of the instruction (namely the immediate field shifted left two positions and sign extended), which is the Branch Target Address.
Explain why the output of the ALU is connected to both the Data Memory and the Register File.
The ALU is used to compute memory addresses for lw and sw instructions, so its output has to go to the address input of data memory. It also computes arithmetic results for R-type instructions, so its output has to go to the WD input of the register file also. At most one of these two destinations would be used for any particular instruction.
Where do the rs and rt instruction fields connect in Figure 4.7?
To the RR1 and RR2 inputs of the register file.
Where do the R[rs] and R[rt] values connect in Figure 4.7?
They come from the RD1 and RD2 outputs of the register file.
Construct a table with three rows, labelled R-type, lw, and sw, and three columns labelled RegWrite, MemWrite, and MemRead. Put a “1” in each cell where the indicated control signal would be true for the given op code, and a “0” in all the other cells.

RegWrite MemWrite MemRead

R-type 1 0 0

lw 1 0 1

sw 0 1 0
Why is there no “RegRead” control signal?
Because it is always safe to read two registers from the register file even if their values are not used. As long as their values are not used to alter the state of the registers and/or the data memory, no harm is done by reading them out.
For the Data Memory unit (Figure 4.8), assume the MemRead control signal signal is true, and answer the following questions:
1. What is the op code of the instruction being executed?
  lw: that is the only instruction that reads from main memory.
2. Which inputs and/or outputs are irrelevant when MemRead is true?
  DataIn is ignored when reading from memory.
3. Which inputs and/or outputs must be set to zero when MemRead is true?
  The MemWrite control signal must be false when reading.
For the Data Memory unit (Figure 4.8), assume the MemWrite control signal signal is true, and answer the following questions:
1. What is the op code of the instruction being executed?
  sw: that is the only instruction that writes to main memory.
2. Which inputs and/or outputs are irrelevant when MemWrite is true?
  DataOut is not used when writing to memory
3. Which inputs and/or outputs must be set to zero when MemWrite is true?
  MemRead must be false when writing.
What is the ALU used for during the execution of beq instructions?
It subtracts R[rt] from R[rs] to see if they are equal.
What is the purpose of the MemToReg control signal?
If it is true, the data read from memory will be connected to the WD input of the register file. Otherwise, the output of the ALU will be connected to the WD input of the register file.
For which op code(s) will the MemToReg control signal be true?
Only the lw instruction.
What is the purpose of the AND gate in Figure 4.17?
To tell if the op code is beq AND the ALU result is zero.
What must the value of RegDst be for R-type instructions?
For R-type instructions, the rd field of the instruction should be connected to the WR (write register number) input of the register file. Since the rd field occupies bits 15-11 of R-type instructions, RegDst should be 1 for R-type instructions.
What must the value of RegDst be for lw instructions?
For lw instructions, the immediate field “covers” the rd field, and the rt field, bits 20-16, must be used instead. Thus, RegDst has to be set to 0 for lw instructions.
What must the value of RegDst be for sw instructions, and why?
It doesn’t matter: sw instructions don’t cause anything to be written into the register file, so it doesn’t matter what bits are connected to the WR input to the register file.
What must the value of RegDst be for beq instructions, and why?
It doesn't matter: beq instructions don’t cause anything to be written into the register file, so it doesn’t matter what bits are connected to the WR input to the register file.
Define instruction latency.
The amount of time it takes to process (fetch and execute) an instruction.
Define instruction throughput.
The rate at which instructions are processed (completed). Instructions per second.
What is the relationship between latency and throughput for the single-cycle datapath?
For the single-cycle datapath, they are simple reciprocals of each other. It takes exactly one clock period to fetch and execute an instruction using the single-cycle datapath, and one instruction is completed for each clock period. In this case, the instruction rate (throughput) is the reciprocal of the clock period.
What is the equation for the time it takes to execute a program?
Seconds/Program = Instructions/Program × Clock cycles/Instruction × Seconds/Clock cycle
What determines the minimum clock period for the single-cycle datapath? (General answer.)
The largest number of propagations delays necessary to process the slowest instruction.
Which one instruction determines the minimum clock period for the single-cycle datapath, and why?
lw, because it requires the longest sequence of dependencies: instruction fetch, register read, compute address, read from memory, write to register file.
What is a perfectly balanced pipeline?
One in which the number of propagation delays in each stage are exactly equal.
What is the relationship between latency and throughput for a pipelined datapath?
Latency is the sum of the propagation delays in all the stages for a perfectly balanced pipeline. But if the pipeline is not perfectly balanced, the latency is n times the number of propagation delays in the slowest stage, where n is the number of pipeline stages.
What is the maximum clock speed for a perfectly balanced n-stage pipelined datapath compared to a single-cycle datapath implemented using the same technology (same propogation delays for the gates)?
It will be n times faster.
What is the instruction latency for a perfectly balanced n-stage pipelined datapath compared to a single-cycle datapath implemented using the same technology?
The latency does not change.
What is the instruction throughput for a perfectly balanced n-stage pipelined datapath compared to a single-cycle datapath implemented using the same technology?
It will be n times greater.
Define structural, data, and control hazards.
Structural: when two different instructions need the same piece of computational hardware at the same time and one has to wait for the other. Data: When an instruction requires an operand value that is not available at the time it is needed. Control: when a branch instruction causes instructions already partially executed to be abandoned.
Give an example of a structural hazard and tell how it can be eliminated.
Trying to use the ALU to compute the BTA and the difference between R[rs] and R[rt] at the same time. The solution is to add a separate adder to the datapath for computing the BTA.
Give an example of a data hazard and give the name of the technique that can be used to minimize its effect.
When an operand from an instruction is a result produced by a previous instruction that is still working its way through the pipeline. Register Forwarding is the name of the technique that gets the needed result from a “downstream” pipeline register without waiting for it to work its way down to the last stage and written back to the register file.
How are unavoidable data hazards dealt with in the MIPS instruction set architecture?
Delayed loads is the technique that defines the MIPS instruction set so that the next instruction following a lw instruction is guaranteed not to receive the value fetched by the lw instruction; it gets the previous value.
Give an example of a control hazard and name a technique that can be used to minimize its effect.
All conditional branch instructions introduce control hazards. One solution is speculative execution in which part of the datapath hardware id duplicated so that both the instructions immediately following the branch instruction and those starting at the BTA start being processed while waiting for the the decision whether to branch or not has been resolved.
What type of hazard is register forwarding used for?
Data hazards: see above.
Summarize how register forwarding works.
When an instruction enters the ID stage, its rs and rt register numbers are compared to the destination register numbers of all instructions currently ahead of it in the pipeline. If there is a match, the data from the corresponding pipeline register is substituted for the R[rs] or R[rt] output of the register file as necessary.
What is the advantage of deep (many stages) pipelines?
The throughput increases proportionately to the number of stages, because the clock speed increases in proportion to the number of stages.
What is the disadvantage of deep pipelines?
The penalty for control hazards is greater than for shallower pipelines because more partially completed instructions have to be discarded.
Write a sentence that summarizes the contents of all pipeline registers.
Each pipeline register holds all the control and data information that will be needed in successive stages of the pipeline to complete execution of the instruction at the current stage of the pipeline.
If a pipeline has n stages, how many places does the system clock connect to?
n + 1: One for the PC, one for the register file, plus n - 1 for the pipeline registers.

	RegWrite	MemWrite	MemRead
R-type	1	0	0
lw	1	0	1
sw	0	1	0