Yes.. It has happened again. Whoever said it happens only once in a lifetime was definitely lying. For a reader who has never been in the hearing distance of a Spanish speaker the title translates to “My Second love”. It is my feeble attempt to keep in touch with whatever little Spanish I have learnt so far.
So after heaps I wasn’t completely closed to the possibility that I will come along some idea that would leave me enamoured. Something so fundamentally beautiful that would raise the bar of what I consider an ingenious thought. And so the day came when in my Computer Architecture class at University of Michigan I was taught -
Lets dive a little into what is this idea all about. Modern day processors are out of order and pipelined. The first adjective essentially means that a processor, irrespective of what order the instructions are fed to it, will execute them in the order it likes to gain maximum performance and the programmer shall still see the results that he/she expects. The second adjective simply means that the logic for each instruction is broken down into smaller pieces which can be termed as different stages in the execution.This allows us to start an instruction before the preceding one completes as multiple instructions can be in different stages at the same time. If the longest stage of our execution pipeline took x time units then the best possible throughput we could get is at least one instruction every x time units.
But there are reasons why we don’t quiet get this performance. One of these is data hazards.
Consider this code snippet :
Instruction 1 : MULT R0,R0,R1 –> R1 = R0 * R0
Instruction 2 : ADD R1,R2,R3 –> R3 = R1 + R2
Instruction 3 : MULT R5,R6,R3 –> R3 = R5 * R6
Instruction 4 : ADD R7,R8,R6 –> R6 = R7 + R8
Now instruction 2 cannot begin execution till instruction 1 completes because it needs the value of register R1 which instruction 1 will compute. This is RAW (Read after Write) dependency. Also, instruction 2 and 3 write to the same register R3 and even though the processor could merrily execute them out of order given they don’t effect each other we need to make sure that it is instruction 3 that writes last to register R3. This is WAW dependency (Write after Write). Finally we need to make sure that instruction 3 reads the value of R6 before we let instruction 4 write to it. This is WAR(Write after Read) dependency. Given all this complications, our dear processor might as well execute them in order!!
But this is not what happens in real processors because of the amazing idea of register renaming! If we really think about it, the only true dependency is RAW. By true what I mean is for the program to be correct, it is essential that the value computed by instruction 1 is what should actually be used by instruction 2. But the other dependencies are just name dependencies.
Consider our processor had infinite registers to play around with. But to the programmers we expose only a certain set of registers. These two groups are called : physical registers and architectural registers respectively. It is clear that ,
physical registers >= architectural registers
What register renaming does is :
–Every time we encounter an instruction we give it a new physical register to write to. Essentially rename an architectural location to new physical register.
–We maintain and update the mapping of architecture register -> physical register accordingly.
Now lets assume we have 14 physical registers(P0 to P13) and 10 architectural registers (R0 to R9). Consider that the intial mapping is
R0 – P0, R1 – P1 and so on.
In this sunlight of register renaming (sunlight: a phenomenon which was omnipresent in my life which is now kindof extinct) our code snippet will now be:
Instruction 1 : MULT P0,P0,P10 –> mapping R1 -> P1 changes to R1 -> P10
Instruction 2 : ADD P10,P2,P11 –> mapping R3 -> P3 changes to R3 -> P11, notice that P10 is referred here instead of P1 for R1
Instruction 3 : MULT P5,P6,P12 –> mapping R3 -> P11 changes to R3 -> P12
Instruction 4 : ADD P7,P8,P13 –> mapping R6 -> P6 changes to R6 -> P13
So what we achieved by this is, our processor can now execute instruction 2, 3 and 4 in any order as :
– WAW hazard is removed as both instruction 1 and instruction 2 have different destination registers.
– WAR hazard is removed as instruction 3 will read the right value whether instruction 4 executes before it or not.
That is the amazing concept of register renaming!! If you want to dig deeper here’s a well written paper : Register Renaming Paper