In-order fetch. So, we have an in-order frontend, and we have out-of-order issue, writeback, and commit. What this is going to try to do is fix some of the problems we saw earlier, where we had, let's say, this add waiting to issue because our issue was in order. That instruction could issue: all of its inputs are ready. R11 was written right there, or something like that, so R11 is ready. It's ready to issue, but because we have in-order issue, by definition we can't issue out of order. So now let's look at a machine where we can issue out of order, where we move instructions to the execute units, and do our register fetch, out of order. To do that, we have to add another data structure, which we're going to call the issue queue. This is going to be something like an associative buffer, but it's not going to be a FIFO: we're going to put instructions into it in order, because our fetch, or frontend, is in order, but we're going to pull them out of it out of order. We have to be a little careful, though, because when we go to pull out of that data structure, we have to make sure that all of the appropriate registers are ready. So all of our dependencies are ready, or at least somewhere in the bypass, so we can pick the value off the bypass by the time the instruction actually needs it. Let's take a look here. We still have an architectural register file. We got rid of all that complex reorder buffer stuff in this example, so we don't have a reorder buffer, we don't have a store buffer, and we have out-of-order commit. We're going to have the same problems we had before with precise exceptions in this pipe, but it's going to give us an easy introduction to understanding what the issue queue looks like.
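To make "insert in order, pull out out of order" concrete, here is a minimal Python sketch of the issue queue's behavior. All the names (`IssueQueue`, the dictionary fields, the register names) are made up for illustration; the real structure is flip-flops and combinational select logic, not a Python list. The select here scans oldest-first and picks the first instruction whose sources are all ready, with a single issue per cycle.

```python
class IssueQueue:
    """Toy issue queue: in-order insert, out-of-order select."""

    def __init__(self):
        self.entries = []  # kept in program order

    def insert(self, instr):
        # The frontend is in order, so entries arrive in program order.
        self.entries.append(instr)

    def select(self, ready_regs):
        # Scan oldest-first; pick the first instruction whose source
        # registers are all ready (single issue per cycle).
        for instr in self.entries:
            if all(src in ready_regs for src in instr["srcs"]):
                self.entries.remove(instr)
                return instr
        return None  # nothing ready this cycle

iq = IssueQueue()
iq.insert({"op": "add", "srcs": ["r5", "r3"], "dst": "r6"})  # waits on r5
iq.insert({"op": "sub", "srcs": ["r7", "r8"], "dst": "r9"})  # independent

# r5 is still in flight, so the younger sub issues ahead of the add.
picked = iq.select(ready_regs={"r3", "r7", "r8"})
```

With in-order issue the sub would be stuck behind the waiting add; here it goes around it, which is exactly the case the lecture's earlier pipeline could not handle.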
So, the issue queue gets written when instructions enter it, at the decode stage. The queue itself is a register structure, so it's flip-flops plus a bunch of logic, and you can go from this stage to this stage in the pipe without any pipeline registers in between. It gets read there, and things update into it: when values actually end up in the architectural register file, we're going to mark bits in the issue queue saying, oh, that register you were waiting on is now ready. And if, let's say, you're dependent on two registers and both of them become ready, you can issue, and we can issue out of order. Okay. So, I have both read and write here in the issue stage. Why do I have that? Well, when you actually go to issue an instruction, you want to mark that you did issue it. Sometimes people build this as actually removing the instruction from the issue queue. In this basic case, we're going to leave it in the issue queue until the end of the pipe, because we need some way to track the liveness of the instruction. But in the next processor we're basically going to think of it as, I won't say a FIFO, because it's not strictly first-in first-out, but a data structure where you can put stuff into and then remove stuff out of: a multi-entry buffer. Okay, let's take a look inside the issue queue. This is assuming something like MIPS. In our issue queue, first of all, we're going to have the opcode, because we're going to have multiple instructions ganging up in here. It's possible there could be, in this case, one, two, three, four, five instructions just sitting in this data structure. It's effectively a buffer, a bit of slack, in the front of your pipe.
You might need the immediate values, because those are part of the instruction, and then we're going to have fields for the three different registers. What we're going to put in here are the register identifiers. Let's look at the sources first, because that makes a little more sense. The V bit says that this instruction needs that source. As we said, some instructions, like immediate instructions, don't read both source operands, so for an immediate only one of these valid bits is going to be set and the other is going to be zero. The P bit is pending. What that means is that somewhere later in the pipe there is an in-flight instruction that writes to that register, so we track with this bit that the value is still pending. Now, the reason we have to keep the pending information and the validity information separate, and why we need both of them, is that multiple instructions could light up simultaneously as being ready to go. Say you have two instructions that are both dependent only on register five, and register five is not ready because it's the destination of a multiply. When register five gets written, that clears the pending bits, and register five might appear in multiple places, so it might be here and here, or maybe even there. It clears the pending bit for whoever is trying to read register five. But because, let's say, we only have single issue on this processor, you can only pull one thing out at a time. So you pull one thing out, and you need some way to leave the other instruction in the issue queue with all of its operands marked ready. That's how you do that. Okay, so how do we figure out if an instruction is ready?
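The entry format described above can be sketched as a small record: opcode, optional immediate, a destination field with its own valid bit, and two sources each carrying a V (this instruction uses the operand) and P (an in-flight instruction still writes it) bit plus a register identifier. This is a Python sketch with assumed field names, not the actual bit layout of any real machine.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SrcOperand:
    v: bool        # V: this instruction actually uses this source
    p: bool        # P: an in-flight instruction still writes this register
    reg: int = 0   # architectural register identifier

@dataclass
class IQEntry:
    opcode: str
    imm: Optional[int]   # immediate value, if the instruction has one
    dst_v: bool          # does this instruction write a destination?
    dst_reg: int
    src1: SrcOperand
    src2: SrcOperand

# Hypothetical "addi r6, r5, 4": an immediate instruction reads only one
# source, so src2's V bit is clear; src1 is still waiting on an in-flight
# write to r5, so its P bit is set.
entry = IQEntry("addi", 4, True, 6,
                SrcOperand(v=True, p=True, reg=5),
                SrcOperand(v=False, p=False))
```

Keeping V and P as separate bits is the point made above: V is fixed at decode time, while P changes as producers complete.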
Well, for each operand we check whether the instruction actually uses that particular operand identifier, and whether it's pending; and we have to check both inputs. Then we also have to make sure there's no structural hazard somewhere else in the pipe, for example when we're trying to schedule write ports, and we're going to use the scoreboard to do that. The other thing is, if we want high performance, we probably don't want to have to wait for values to get to the end of the pipe. So in reality this is going to get even more complicated, because it's going to be these bits plus information coming from the scoreboard about when a particular register is ready. Let's talk about the destination here. This just tracks whether an instruction writes a destination or not. Like I said, in these basic pipelines we're going to leave instructions in the issue queue until they get to the end of the pipe, and when an instruction gets to the end of the pipe, this is where we check which other locations we need to clear out. So if there's an instruction sitting here, which has the valid bit set and register five as its destination, when that instruction commits we're going to clear it out of the issue queue and say, well, it wrote register five: let's scan through all these other entries and look for ones that say register five. If they say register five, we flip the pending bit from pending to not pending, and we remove the committing instruction from the issue queue. Okay, so, centralized versus distributed issue queues. In a logical sense, in a perfect world, it's probably nice to have one big centralized issue queue: you can scan over all the instructions.
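The ready check and the clear-on-writeback scan just described can be written out directly. A source blocks issue only if it is both used (V set) and still pending (P set); when a destination register is written, we broadcast its identifier to every entry and clear matching pending bits, possibly waking several instructions at once. This is a minimal sketch with assumed names, ignoring the scoreboard and structural hazard checks mentioned above.

```python
from dataclasses import dataclass

@dataclass
class Src:
    v: bool    # uses this operand
    p: bool    # an in-flight instruction still writes it
    reg: int = 0

@dataclass
class Entry:
    op: str
    src1: Src
    src2: Src

def src_ready(s):
    # A source blocks issue only if it is used (V) and still pending (P).
    return (not s.v) or (not s.p)

def ready(e):
    return src_ready(e.src1) and src_ready(e.src2)

def writeback(entries, written_reg):
    # Broadcast the completed destination register: every entry waiting
    # on it clears that source's pending bit.
    for e in entries:
        for s in (e.src1, e.src2):
            if s.v and s.p and s.reg == written_reg:
                s.p = False

# Two instructions both waiting on r5 (the multiply's destination):
a = Entry("add", Src(True, True, 5), Src(True, False, 3))
b = Entry("sub", Src(True, True, 5), Src(False, False))
writeback([a, b], 5)   # r5 completes: both pending bits clear at once
# Both are now ready; with single issue we pull one out and the other
# stays in the queue with its operands marked ready.
```

This is why P is a stored bit rather than a transient signal: the instruction that loses the single issue slot keeps its cleared pending bits and can issue on a later cycle.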
You don't have to look around, but this can sometimes be harder to implement, because you have to put all of your instructions in one location, and sometimes the units don't need to cross-communicate very much. Floating-point units, for example: floating-point registers and the integer register file are different types, and they don't need a whole lot of communication between them. So you could have distributed instruction queues, where you steer, say, the ALU and memory ops to one set of functional units, and have a different instruction queue just for multiplies or something like that. I'm going to put a paper on the website for you all to read: it's Tomasulo's algorithm. It's a very famous paper, and in it they talk about distributed instruction queues. Strangely enough, the first place in the literature that these instruction queues showed up was around that time, in that paper, and they go straight to the distributed version and skip the centralized version, which I always found a little odd, but lots of people build centralized ones today. One of the reasons, I think, that the Tomasulo people went for the distributed one first is that they were implementing it across multiple discrete chips, so they had an issue queue per chip: a floating-point chip and an integer chip, and they steered the instructions. They didn't want one data structure that crossed two chips. Today we have lots of integration, so we don't have to worry about that as much. Okay, I just wanted to briefly mention, since this is a question that came up last time, scoreboards, and when you have to worry about write-after-write hazards in the pipe. For the things we're talking about today, we didn't really have to worry about that, but next time we're going to have to think about better scoreboards.
Such a scoreboard looks the same as the previous scoreboard, but you might need to keep the functional unit number, or some bits that represent the functional unit, and then those march down the pipe instead of 1s marching down the pipe: you have different numbers, like one, two, three, marching down. But that's only if you have to track write-after-write hazards. Okay, that was a quick aside. Let's look at this in-order-fetch, out-of-order-issue, out-of-order-writeback, out-of-order-commit processor in a pipeline diagram. I'm going to show a new little thing in our pipeline diagram here: a lowercase i, which means the instruction enters the issue queue. Then, when it exits the issue queue, it starts going down into the issue stage of the pipe. Here I'm actually showing what is in the issue queue. I have three issue queue slots, because this is a relatively simplistic pipe. For this first instruction, which goes to execute, register two and register three are already ready, so we don't have to worry about it. An example of something more interesting: this instruction will enter the issue queue, but register twelve does not become ready until late. Once it becomes ready, we can start pulling things out of the issue queue and issue that instruction. Let's see if we can show that here: we're waiting for register twelve, for this add. This is interesting: that value actually becomes ready early.
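The richer scoreboard mentioned above can be sketched as a shift register per architectural register where, instead of a 1 marching down the pipe, the identifier of the producing functional unit marches down. The unit numbering, pipeline depth, and register count here are all assumptions for illustration.

```python
# Scoreboard sketch: one shift pipeline per architectural register.
# Instead of a 1 marching down the pipe, we march down the id of the
# functional unit (assume 1 = ALU, 2 = multiplier) that will write it;
# 0 means no pending write at that stage.
DEPTH = 4  # assumed number of stages between issue and writeback

scoreboard = {r: [0] * DEPTH for r in range(8)}

def issue_write(reg, fu):
    # Record a new in-flight write to `reg` by functional unit `fu`.
    scoreboard[reg][0] = fu

def tick():
    # One cycle passes: every pending write moves one stage closer to
    # writeback; the oldest entry falls off the end.
    for pipe in scoreboard.values():
        pipe.pop()
        pipe.insert(0, 0)

issue_write(5, 2)   # a multiply will write r5
tick()
issue_write(5, 1)   # a later ALU op also writes r5: a WAW hazard
# The scoreboard now distinguishes the two in-flight writers of r5,
# so the machine can keep their writebacks in the right order.
```

With plain 1s you could only see *that* r5 has pending writes; with unit numbers you can see *which* write is younger, which is what tracking write-after-write hazards requires.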
But what's actually happening here is that the value becomes ready at the end of this execute stage, and the instruction can't go to issue that cycle because something else is issuing. So it has to be pushed out, since we're only doing a single issue per cycle. If we had multiple issue, you could imagine something more interesting happening here. This is actually a really complicated case, because if we pull this back, let's say to here, we get a writeback hazard: the W would conflict with the other W. And if we try to issue here instead, we get a structural hazard on issue. So it actually gets pushed way out there, which is kind of a bummer. Sometimes you just lose. Okay, so here's an interesting case. Let's assume that all the instructions are preloaded into the issue queue. Showing this, we have fetch, decode, issue, issue, issue, and this thing is just sitting in the issue queue, and then we start looking: what happens? Does the performance get better than in the previous example? Interestingly enough, even if you issue everything into the pipe early and fill up your issue queue, pulling out of the issue queue can still be a limiter, and the number of ALUs can still be a limiter. There are other structural hazards. The performance of this code on this pipeline does not actually get any better, which is sort of interesting. You'd think, oh, well, I don't have to wait for things to dribble into my instruction queue, I can just preload it; but the performance really doesn't end up being better. So this motivates going to out-of-order execution mixed with duplication of structures: maybe we can issue two instructions at the same time, and maybe we can have two ALUs. So this motivates having a superscalar.