So, we're gonna look at two ways to go about implementing snoopy cache coherence: update protocols and invalidation protocols. And we're gonna be focusing today's lecture on what are called write-invalidate protocols; they're more common today. But I did want to bring up write-update, or what are called write-broadcast, protocols. In some sense these two kind of mirror something like a write-through cache versus a write-back cache. But it's not the same thing; you can actually mix and match write-back and write-through with these different ideas here. But let's look at what these two ideas in the snoopy protocols are. So in a write-update protocol, whenever a processor does a write, it broadcasts it across the bus to every other processor. And every other processor that has that data in its cache just updates the data in place, because the broadcast carries the data, so everyone's copy stays up to date. Invalidation, on the other hand, we're gonna look at next: when a write occurs, it'll actually invalidate the copies in the other caches. Okay, so take these two cases in an update protocol. The first case is the processor takes a write miss. Well, it's gonna broadcast the data onto the bus and update everybody. So it's just sort of writing through, but not just writing through to main memory; it's also writing through to everyone else's cache. What's nice is, when someone goes to do a read and takes a read miss, you just go out to main memory, and you're guaranteed that main memory always has the most up-to-date version. And if the line was in your cache before, other cores' broadcasts will have updated your value, so whatever's in your cache is guaranteed to be up to date too in these update protocols. That's not the case with invalidation protocols. So, for our write-invalidation protocols, let's say there's a write miss.
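To make that contrast concrete, here's a toy sketch in Python. This is purely illustrative: the dictionaries stand in for caches, and the function names `write_update` and `write_invalidate` are made up for this example, not real hardware interfaces.

```python
# Toy contrast between the two snoopy policies. A write-update protocol
# broadcasts the new value into every peer cache that holds the line;
# a write-invalidate protocol kills the peer copies instead.

def write_update(writer, peers, addr, value):
    writer[addr] = value
    for cache in peers:
        if addr in cache:          # peer already caches the line: update in place
            cache[addr] = value

def write_invalidate(writer, peers, addr, value):
    for cache in peers:
        cache.pop(addr, None)      # peer copies become invalid
    writer[addr] = value           # only the writer now holds the new value

# Two caches both holding address 0x10.
a, b = {0x10: 1}, {0x10: 1}
write_update(a, [b], 0x10, 2)      # b's copy is refreshed in place
a, b = {0x10: 1}, {0x10: 1}
write_invalidate(a, [b], 0x10, 2)  # b must re-fetch on its next read
```

The tradeoff this exposes: update spends bus bandwidth on every store, while invalidate spends a miss on the next read by a peer.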
Well, on a write miss, you actually go and invalidate all the other caches' copies of it before you can go get the data. So if another cache had the data dirty, it's going to cough up the data and put it back in main memory, and then you're gonna go off to main memory and get the data from there, in our naive invalidation protocol. So the first step is, if you want to do a write and it's not in your cache, or you don't have write access to it, you're going to yell out on the bus, "I want to write to this address." The other caches are going to snoop that. If someone responds and says, "I have that data dirty," you have to wait for them to put it back in main memory, and then once it's in main memory you can go get that data. So when you yell, "Does anyone have it?", the data's not in your cache, but it might be in someone else's cache. Someone might pipe up and say, "I have that data, and I have it dirty, so I'm going to write it back to main memory, and then you can go get it from main memory." But there are other cases that can happen in these invalidation protocols. Someone else might have that data, but they might not have a dirty copy of it, so they might not pipe up; they might not say, "I have a copy of it," and instead you can just get that data straight from main memory. So, let's go through a couple different cases, and a couple different protocols that drop out of this. The first one we're gonna look at is called MSI, and it just stands for the three states that each cache line is gonna be in. For these cache coherence protocols, we typically try to track what state lines are in, and whether they're shared or not, on a per-cache-line basis, not on a per-word basis. Similarly, it's like how we only track the tag bits and the valid bit and the dirty bit on a per-cache-line basis.
Because it's just less bookkeeping to keep around. So here we have our cache tag. And before, we had, let's say, a valid bit and a dirty bit, and then the tag portion of the address. And in this setup, what we're gonna do is, instead of having a valid bit and a dirty bit, we'll use those bits to encode a state. There's a little state machine in there, practically. And there are three different states here: modified, shared, and invalid. Before we go through the protocol, why don't we just say what these three states roughly mean. So, in the MSI protocol, invalid means that, for that particular line, you don't know anything about it; it's not in your cache. This is the equivalent of the valid bit not being set. And just because one processor's cache has a line invalid does not mean a different processor's cache can't have it in a different state. So I want to make the big point here that the states are not global. It's not like, per cache line, there's one global state that the line is in. No; instead, each cache keeps track of a line, and it knows whether the line is actually in that cache or not, and what state it's in there. And different caches can have the same line in different states. Now, there are some cross products of cases which are just not allowed. So we'll see, in this protocol, two caches cannot have the same cache line in the modified state at the same time, 'cause that would basically mean that two caches are trying to modify the same piece of data, and that's exactly what we're using the protocol to prevent. Okay. So, the three states here: I is invalid; it means you don't really know anything about that line. S is shared, which means that you have a read-only copy. Processor one here has a read-only copy of that data, and other caches can also have read-only copies of that data. Hence the term "shared."
It's shared, but only for reading. So if everyone only has read-only copies, we're not gonna have any coherence problems. We start to have problems when multiple people have writable copies. So in this protocol, we're gonna have this modified state here, meaning that that line is now writable by a particular cache. And we will actually have to transition to this state before we can do any writes. And, as I said before, if one cache has a line in the modified state, everyone else needs to have it in the invalid state, or just not in their cache at all. And this is gonna guarantee us, in hardware, that the cache line is effectively only being written by one cache, one processor, at a time. Okay, so let's look at some of the edges here in this state transition diagram. So the first thing we're gonna look at is, processor one, we'll say, starts out with a line not being in its cache. So our first state arrow here is that processor one wants to read. It doesn't wanna write the data, so it just does a read here, and the line was invalid before, or wasn't in the cache. So it'll do a read from main memory, pulling the line into the cache in the shared state. Now, we have other processors trying to read this. And in our protocol, what the other processors are gonna do when they read is actually issue a request for that line on the bus. And if someone else has something to say about it, they have to pipe up during that time; we'll see how that happens later. But right now, other processors try to read that line, and no one pipes up. Even the first guy, who has this data in the shared state, doesn't say anything. So the other processors are still gonna read it from main memory, and their transitions effectively look the same: they just load the line into their caches in the shared state. So pull it in, and it's shared. Okay. So now we're gonna look at this from the perspective of P1.
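The rule just described can be stated as an invariant over all the caches' states for one line. Here's a small sketch of that check in Python (the function name `coherent` is just for illustration):

```python
# Per-line MSI invariant: at most one cache may hold a line in M, and if
# any cache holds it in M, every other cache must hold it in I. Mixes
# like M+S (a writer alongside a reader) or M+M are never allowed.

def coherent(states):
    if states.count("M") > 1:
        return False                  # two simultaneous writers: forbidden
    if "M" in states and "S" in states:
        return False                  # a writer while someone can still read
    return True

# Legal combinations: one writer with everyone else invalid, or any
# number of read-only sharers, or everyone invalid.
ok = coherent(["M", "I", "I"]) and coherent(["S", "S", "I"])
bad = coherent(["M", "S", "I"])
```

The whole point of the bus transactions in the rest of the lecture is to maintain exactly this invariant in hardware.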
P1, say, had it in the shared state, and some other processor wants to do a write. So they do a request for exclusive access to that line. Because processor one's cache only had it in the shared state, it doesn't need to actually say anything in response to that request for exclusive access. But what it does need to do is self-invalidate its own data. So it has to transition from S to invalid, because the data in its cache is no longer good; someone else is about to go write it. So before that other processor can write, it issues its intent to write, and at that time processor one does this state transition in its own cache. Okay, so let's look at the flip side of that. Let's say processor one decides that it wants to do a write. Well, what processor one's gonna do is issue this request-to-write, or intent-to-write, transaction across the bus. And everyone else is gonna do that S-to-invalid transition. And after it does this intent-to-write transaction here, it's going to transition from the S state to the M state. And only at that time is it able to do the write, once it gets into the M state. And it can then also do further reads and writes without having to ask anybody. So this is pretty cool about the invalidation protocol: once it has the modified state, this processor can read the data and write the data, and we don't have to broadcast any information across the bus. Now, everyone else can't read or write or do anything to that line during this time. But that's okay, because typically in our shared memory applications, one processor's doing most of the writing, and sometimes other processors go read it. And typically everyone's not trying to fight for the same piece of data. Okay. So now let's look at the case where we're in the modified state and some other processor wants to write the data.
Well, this is a little bit more complex, because now the other processor is going to issue this intent to write across the bus, a request for exclusive access across the bus. And that means someone else wants to move to the modified state; but before they can move to the modified state, they need the most up-to-date data. So what's gonna happen is, P1 is actually going to write back the data at that point to main memory, in our basic MSI model here. So it writes it back to main memory, and then the other processor is going to be able to get it from main memory and transition into the M state from the invalid state. And we'll see that arc in a second. You could think about optimizations here, where you try to pass the data directly from one processor's write-back to the other processor's read; we're not gonna look at that optimization quite yet. Right now, if you're in the M state, you're gonna write it back to main memory, and the other processor's gonna read it from main memory. Okay, so this brings us to another important transition here: if you have it modified, and someone else wants to read the data, you don't actually have to transition to invalid. You can just transition to the shared state. And you probably want to do this, because otherwise, when some other processor tries to read the data that you just wrote, like in a producer-consumer style computation, you would just be effectively invalidating your own processor's cache. And that would be pretty bad, because you're effectively fighting locality at this point; you're fighting temporal locality. So instead, when another processor tries to read the data, we need to write back that data to main memory, and the other processor has to go get it from main memory. But you still have the most up-to-date version, and they only want read access to it, so you're gonna transition down to the shared state.
So someone else has it readable, and you have it readable also, at the same time. And finally, we have the edges going from invalid up to M. This edge and that edge basically go straight from invalid to modified. So we're gonna look at: if you have a cache line invalid, and you want to go do a write to it, you're going to shout on the bus, "I want exclusive access to this line." And in the meantime, the other caches might have to do the same things we saw before: they may need to do a write-back, and they may need to say, "I have that line dirty right now." So you sort of wait around a little bit, and then you go get it from main memory, and transition straight from I to M. And so, do we need an arbiter? Yes. You need to guarantee that, for any given line, the transactions on the bus are ordered, and the way you do that is by having an arbiter. So say two processors are trying to write, let's say, the same line at the same time. We'll take two caches that both have the data in S, shared, read-only copies, and at exactly the same time they both wanna do a write. This is why we need to issue this intent to write; we're not able to do a write until we successfully get into the M state here. So what's gonna happen is, two processors are gonna try to issue intent to write. But the arbiter is only going to allow one of those two to occur on the bus at a time. The arbiter says one thing can happen, and the bus only has one transaction happening at a time. Someone's gonna win. And that might be based on sort of priority encoding, or it might be round-robin, depending on how you go about doing that, or some fairness guarantee. One of the two processors is gonna win. And when that one processor wins, it's the one that's able to take the bus and issue this intent to write, and everyone else has to play by that.
So what's gonna happen is, if processor one wins and sends this intent to write, and processor two was about to send its own intent to write, then now that it sees someone else's intention to write, it has to basically abort its attempt to write and instead take the S-to-invalid transition. And then what it's gonna do is retry the write again in the future. And when it retries the write in the future, it's going to basically come in on this I-to-M arc here, which is gonna bump processor one from M to I after processor one's write has happened. But you have to be very careful about forward progress guarantees when you go to implement these things. If a processor goes to do a write, and you have sort of retries happening in the system, and let's say you get the data modified into your own cache, you have to guarantee that the pending store operation actually posts. Because a very subtle mistake that's very easy to have happen here is, you decouple the memory system from the main processor to some extent, and two people are trying to do writes at the same time, and you just have lines basically bouncing back and forth, modified in one cache and then modified in the other cache, back and forth forever, but no one's actually able to do its stores. So you have to make sure that when you pull the line into your cache and get it modified, you're able to post at least one store, atomically with acquiring the line, so that you have a forward progress guarantee. Oh, yes, the question is about the ordering of one line's state transitions relative to other state transitions on that same line. For right now, we're gonna say the bus only has one transaction happening at a time, and that's arbitrated by some arbiter. So we're not gonna see other cache lines transitioning concurrently.
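That race and retry can be sketched in a few lines of Python. This is purely illustrative: the state dictionary and the `intent_to_write` function are made-up stand-ins, and "posting a store" here is just appending to a list.

```python
# Both caches start with the line shared, and both want to write.
states = {"P1": "S", "P2": "S"}
stores_posted = []

def intent_to_write(winner):
    """The arbiter has granted `winner` the bus: all snoopers invalidate
    their copies, the winner takes M, and the winner's pending store
    posts before it can lose the line again (forward progress)."""
    for p in states:
        if p != winner:
            states[p] = "I"        # snoopers self-invalidate (write back if M)
    states[winner] = "M"
    stores_posted.append(winner)   # the store posts while the line is held in M

# The arbiter serializes the two simultaneous requests: P1 goes first.
intent_to_write("P1")   # P2 aborts its attempt and goes S -> I
intent_to_write("P2")   # P2 retries on the I -> M arc, bumping P1 from M to I
```

The key line is that the store is posted inside the same step that acquires M; if the retry loop could steal the line before the store posted, two writers could ping-pong the line forever with neither making progress.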
But you can think about cases where you can have, let's say, n transactions happening at the same time, some number of transactions happening at the same time, where the arbiter guarantees that no two of them are to the same address, or the same cache line, for instance. And that actually happens in some of these distributed protocols we'll look at in two lectures. We build those so that multiple caches can basically be transitioning their state diagrams simultaneously, at the same time, but on different addresses. Okay. We're running a little bit low on time here. I think I'm gonna... oh, we'll go through this example, yes, though this is basically reiterating what we said before. Okay, so we have processor one, and we'll say it does a read. We're gonna show two state diagrams here, for the same cache line. Processor one does a read: it's gonna get the data from memory and end up in the shared state. Processor one then does a write: it's going to issue an intent to write and end up in the modified state. Processor two now wants to do a read. So the first thing is, it's going to try to read this data, and that's going to cause processor one to transition. Processor two does the request for reading this data as it's trying to transition to S here. And it's not able to transition to S until it knows that the other processor has done the following transition: processor one is basically going to do a write-back to main memory, because processor two is reading, so it's going to transition from M down to S. And now this modified data is up to date in main memory, and processor two reads the updated data. Okay, so now say processor two tries to write. Well, it's gonna wanna transition from S to M. So it issues this intent to write, and at that point, processor one is going to transition down to the invalid state, and processor two gets it modifiable in its cache, and it's actually able to do the write. Okay. So we're gonna look at processor one now doing a read. It's in the invalid state right now.
So it's basically just going to come in on this arc here. But processor two has it in a modified state, so processor two has to transition, not down to I, but, excuse me, back to S. So processor one does the read, processor two does the write-back and transitions to the shared state here, while processor one transitions from I back to S. Then processor one does the write. We're sort of back to where we were before: it's going to issue its intent to write, and processor two here is going to transition from S to invalid at that point. So this is the basic MSI protocol. I think that's the end of this example. Okay, so now, in the last few minutes, we're gonna talk about more advanced cache coherence protocols. Okay. So, a quick observation here, and I think we already said this: this protocol enforces that no two processors can have a line in the M state simultaneously. Two processors can have a line in the S state simultaneously. And everyone can have a line invalid; it's just invalid. But we can't mix M with anything else. And this really helps us keep coherence of our memory, because we're never going to allow one processor to have a line exclusively writable while another processor is able to read that data in the meantime. But be careful: this does not necessarily imply sequential consistency. The memory model you end up with depends on the ordering of a lot of other things beyond coherence. Okay, so, we're gonna look at a more complex, higher-performance protocol. And the insight that this protocol grows out of is that you don't necessarily need to tell everybody that you want to do a write, in all cases. So we'll go through this, but the basic idea here is that we can pull data into our cache in this new state that we're adding.
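The whole example sequence above can be replayed with a small state-machine sketch. This is illustrative Python, not hardware: the class and function names are mine, and main memory and the bus itself are left implicit.

```python
# Minimal two-cache MSI model for a single line, tracking only the state.
M, S, I = "M", "S", "I"

class MSILine:
    """One cache's state for a single cache line under basic MSI."""
    def __init__(self):
        self.state = I

def read(requester, others):
    if requester.state == I:          # read miss: request the line on the bus
        for c in others:
            if c.state == M:
                c.state = S           # owner writes back, downgrades to S
        requester.state = S           # load from (now up-to-date) memory

def write(requester, others):
    if requester.state != M:          # must reach M before any store
        for c in others:              # intent-to-write snooped by everyone:
            if c.state == M:
                c.state = I           # owner writes back and invalidates
            elif c.state == S:
                c.state = I           # sharers self-invalidate
        requester.state = M

# Replay of the lecture's example sequence:
p1, p2 = MSILine(), MSILine()
read(p1, [p2])      # P1: I -> S
write(p1, [p2])     # P1: S -> M
read(p2, [p1])      # P1 writes back, M -> S;  P2: I -> S
write(p2, [p1])     # P1: S -> I;              P2: S -> M
read(p1, [p2])      # P2 writes back, M -> S;  P1: I -> S
write(p1, [p2])     # P2: S -> I;              P1: S -> M
```

At every step, the coherence invariant holds: never two M copies, and never an M copy alongside an S copy.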
We still have M, S, and I, but we add an E state here, which stands for exclusive. Exclusive is actually a read-only state, but it's a read-only state saying that no one else has a copy of it either. And why we add this is: if, let's say, we do a read and then try to do a write to that data sometime in the future, we can pull it in in this state and not have to ask anybody again. We can guarantee that we don't have to ask anybody in order to transition to do a write to that data. So that's the insight. Same idea here, and I'll point out that you'll sometimes see in books that they call this the Illinois protocol; it was developed at the University of Illinois at Urbana-Champaign, which is why they call it that. It's probably more widely known now as the MESI protocol. It's a four-state protocol. And one of the nice things about this is, we already had two state bits even though we only had three states; we needed two bits to encode three states anyway. So it doesn't actually add any extra storage space to have an extra state in this protocol. And as I said, this is effectively the same number of bits we had with a valid bit and a dirty bit, before we had cache coherence. We still have sort of notions of dirty and notions of invalid, and the other states are all valid states. So we can use those exact same two bits, and we don't need to add any more storage space to our cache to implement this coherence protocol. So, something to think about. So let's walk through all the different arcs in this diagram. But as I said, the main difference here is we took what was before the shared state, and we split it into two states: a new shared state, and an exclusive state, which means that it's a read-only copy, but no one else has a copy of it; you're guaranteed that no one else has a copy of it.
And, by default, when we bring in a cache line from main memory on a read, in this protocol we're most likely going to bring it into the E state. So if one cache does a read miss and no one else has the line, instead of putting it in the shared state, it's going to bring it into the exclusive state. So that same processor, if in the future it does a write, will transition over to the M state with no messages on the bus. So we've effectively added a sub-state. And it's pretty common for someone to do a read somewhere in a cache line and then do a write. Let's say you're trying to increment a variable: you're gonna read it, add one to it, and then write that same value right back. Okay. So, this is gonna look pretty similar to what we had before. And we're gonna say that a read miss can still happen where the data is in someone else's cache; then we transition into the shared state. So when you're reading in that case, you don't get it exclusive, because it's in someone else's cache; they pipe up and say, "Oh, this is already shared; pull it in as shared." So we're gonna transition from invalid to shared on that read miss. But we're also gonna have another arc here, which is a read miss where it's not shared; it just came in from main memory. That serves the idea that it's likely we're actually going to have a read and then a write, and we can pull it in without having to contact anybody later. But otherwise, this actually looks pretty similar to the MSI protocol we had before. If the read was shared, you start off with an S here; some other processor responded back and said they had it shared, so you know not to go straight to E. You just stay in this shared state. Now, when, let's say, some other processor signals intent to write, you're going to transition to invalid.
So, similarly to before, only one cache is ever gonna have a line in M. But unlike before, we're going to say that only one cache can have a line in either M or E. That's what E means: it's exclusive. So that cache can read in this state, and can upgrade to write. Now, in exclusive here, processor one can read it, but if any other processor tries to read it, we're going to actually transition from E to S, to our shared state here. But if we have a cache line in E and we want to do a write, we don't have to invalidate anybody, and we don't have to arbitrate on the bus. Instead, we can just transition from E to M directly and do the write, because we had it exclusive already. The rest looks pretty similar to MSI. If you're in shared and another processor wants to do a write, we transition from S to I. And likewise, if we're in the exclusive state and another processor wants to do a write, we're going to transition to I. And note the difference between this arc and that arc: if the other processor's going to read, we can just downgrade to shared; if the other processor's going to write, we need to make it invalid, from these read-only exclusive states. So, what have we got left? Other processors' reads and writes from M: this is the same thing we had down the side here in the MSI protocol. We had it modified, and another processor wants to do a read: we need to write back the data and transition to shared. And if another processor wants to do a write, we transition out of the modified state, write back the data, and transition to invalid. So it looks pretty similar to MSI, except we partitioned the shared state into two different states. Okay. So I'm going to stop here today and just say that next time we're gonna pick up from here. We have another state coming in, used in some real protocols, and that state actually allows us to start to use cache-to-cache transfers, without having to go out and read main memory to get the data all the time.
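The payoff of the E state shows up as bus traffic. Here's a sketch of the read-then-increment pattern with and without sharers; as before, this is illustrative Python (the class, functions, and the `bus` list are my own stand-ins, with "BusRd" and "BusRdX" as toy labels for a read request and an intent-to-write).

```python
# Minimal MESI model for a single line: E lets a later write proceed
# with no bus transaction at all.
M, E, S, I = "M", "E", "S", "I"

class MESILine:
    def __init__(self):
        self.state = I

def read(requester, peers, bus):
    if requester.state == I:
        bus.append("BusRd")                   # read miss goes on the bus
        shared = any(c.state != I for c in peers)
        for c in peers:
            if c.state in (M, E):
                c.state = S                   # owner downgrades (M writes back)
        requester.state = S if shared else E  # alone? take it exclusive

def write(requester, peers, bus):
    if requester.state == E:
        requester.state = M                   # silent upgrade: no bus traffic
    elif requester.state != M:
        bus.append("BusRdX")                  # intent to write: invalidate peers
        for c in peers:
            c.state = I
        requester.state = M

bus = []
p1, p2 = MESILine(), MESILine()
read(p1, [p2], bus)      # no sharers: I -> E, one bus transaction
write(p1, [p2], bus)     # E -> M with no additional bus transaction
```

Under MSI, that same read-then-write sequence would have needed two bus transactions; here the second one disappears, which is exactly the common increment-a-private-variable case the E state was added for.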
And then we'll also talk about what's used in the sort of standard Intel processors, which effectively adds another state that functions the same as our shared state, but allows us to forward data. Okay, let's stop here for today.