Okay. So let's move up from the physical implementation of buses and look at some more logical problems: sequential consistency mixed together with coherence protocols. In the picture here we have two processors, CPU1 and CPU2, and we're going to introduce caches into our SMP, or symmetric multiprocessor. So we have two caches, and we're looking at address A. Let's see what happens if we have a write-back cache. At the beginning of time, everyone agrees on address A: cache one, cache two, and main memory all say that address A has the value 100. Now let's say CPU1 wants to update A to 200. In a write-back scenario, it just does a store into its own cache line. But because it's write-back, that data doesn't show up anywhere else until the line actually gets written back. Now, if we don't have any sophisticated protocol and CPU2 does a read of address A, all it's going to get is 100, not 200. It gets the wrong, out-of-date value. That's not super great; in fact it easily violates our consistency model, and it can even look like a value rolled backwards in time; the caches are just completely out of sync with everything else. In a write-through cache you can have the same problem with slightly different mechanics: CPU1 does the store, which puts 200 in its own cache and 200 in main memory, but CPU2 never sees that value, because there's nothing telling CPU2 to update its copy. So in our coherence protocol, we have to think about how this works. There are two questions at the bottom of the slide: do these stale values matter, and what is our view of shared-memory programming?
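To make the stale-value problem concrete, here is a minimal Python sketch of the two scenarios. This is a toy model invented for illustration (a real cache tracks whole lines and coherence states in hardware; `ToyCache` and its methods are made-up names):

```python
class ToyCache:
    """Toy model of one CPU's cache over a shared main memory (a dict)."""

    def __init__(self, memory, write_through=False):
        self.memory = memory           # shared backing store: address -> value
        self.lines = {}                # this cache's local copies
        self.write_through = write_through

    def load(self, addr):
        if addr not in self.lines:     # miss: pull the value from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]        # hit: return the (possibly stale) copy

    def store(self, addr, value):
        self.lines[addr] = value       # write-back: the store stops here
        if self.write_through:
            self.memory[addr] = value  # write-through: memory is updated too,
                                       # but no other cache is told anything

# Write-back case: everyone starts out agreeing that A == 100.
memory = {"A": 100}
cache1, cache2 = ToyCache(memory), ToyCache(memory)
cache1.load("A")
cache2.load("A")

cache1.store("A", 200)                 # CPU1's store lands only in cache 1
print(cache2.load("A"))                # -> 100: CPU2 reads a stale value

# Write-through case: main memory gets 200, but cache 2 still holds 100.
memory2 = {"A": 100}
wt1, wt2 = ToyCache(memory2, True), ToyCache(memory2, True)
wt1.load("A")
wt2.load("A")

wt1.store("A", 200)
print(memory2["A"], wt2.load("A"))     # -> 200 100: memory fresh, cache 2 stale
```

Either way, CPU2 keeps reading 100 until some mechanism updates or invalidates its copy, which is exactly the job a coherence protocol takes on.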
On the first question: if you want to communicate from CPU1 to CPU2, at some point something needs a mechanism to actually move that data over into CPU2's cache; otherwise CPU2 will never see the new value. So yes, stale values really do matter. And that brings up an interesting question about our view of shared-memory programming: if you want full sequential consistency, this clearly isn't it. So what does the programmer expect? Do they expect full sequential consistency, or something weaker? Because you could imagine keeping a model like this, but adding special instructions which somehow invalidate CPU2's data so that it picks up the new value, or special instructions to push a store out to main memory. So depending on your programming model, you could end up with a very hard-to-use model in which the programmer has to make the data find its way to where it needs to go. But then you're effectively doing message passing instead of shared memory; you'd effectively be sending a message from CPU1 to CPU2 every time you pushed data back and forth. So something to think about there: if the programmer assumes that when they write a value, the other processors will see it, caches muck that all up. Okay, so let's walk through an example in a little more detail. This is actually the example we had from last class: two threads, T1 and T2. T1 stores 1 to x and 11 to y. T2 loads y and then x, and stores them right off into y' and x'. Let's look at a write-back cache first.
We'll start off with main memory holding initial values of 0 in x and 10 in y, and we execute T1, which writes 1 and 11 to x and y respectively. But it's write-back, so neither store actually has to get out to main memory yet. Now let's say the cache on processor one needs space for the cache line that y is in, but not the one x is in; the y line just happens to get evicted because the cache needs the space for the next piece of code that's going to execute. So y gets pushed out to main memory, as shown in red there, but x in main memory is still 0. Ooh, does anyone see the problem coming? The stores happened in one order, and main memory now effectively reflects them out of order. Okay, now we execute thread two. Thread two reads what's in main memory, pulls it into its own cache, and writes y' and x' into its own cache. So we get 11 and 0 from y and x. Then let's say cache one writes back x, so main memory now has 1 and 11. And finally, cache two writes back x' and y' of 0 and 11. A couple of things to note here. The values in x' and y' don't match any single point in the history of x and y, but more importantly, this is one of our sequential consistency litmus tests; this is exactly the outcome we said last lecture shouldn't happen. So we've just asked whether a machine with naive write-back caches is sequentially consistent, and the answer is no. And just to make the point, this can also happen with write-through caches, although you need to construct the case a bit more carefully. Same example, but because it's write-through, executing T1 actually pushes 1 and 11 up to main memory right away.
But notice an interesting little thing here: cache two already has x in it. Let's just say there was some false sharing, some earlier access that happened to pull that line into cache two early. The write-through pushed the store from cache one up to main memory, but there was no guarantee, no mechanism, to push it down into cache two, or to invalidate it or somehow notify cache two. So that stale value is still sitting in cache two in our basic write-through case. Now thread two executes, and it just copies its x to x' and its y to y', and since it's write-through, those show up in memory as well. As a result, main memory is effectively inconsistent now, and this is again one of the cases we had shown was not sequentially consistent. So without something that guarantees consistency, our hardware can very easily fall out of consistency once we introduce caches, because there are different places stale data can live: it can be in main memory, it can be in someone else's cache, or it can be in your own cache. You can have stale data in multiple places. I want to take a moment at this point in the lecture to differentiate cache coherence from a memory consistency model. There are a lot of words on this slide, but the main difference I'm trying to get across is this: a cache coherence protocol is an algorithm that tries to keep data coherent, so that when you read and write memory, everyone sees values they agree on. It's the protocol doing the enforcing. In contrast, a memory consistency model is the thing the coherence protocol is trying to enforce: an abstract set of rules about how memory should behave. Sequential consistency is one example. Other examples, as we talked about last lecture, are models like total store ordering and partial store ordering. Those are memory consistency models.
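The two failure cases in the walkthrough can be reproduced with the same kind of toy model. Again, this is just an illustrative sketch with made-up names: real hardware evicts whole cache lines, and the `evict` step below stands in for the "cache needed the space" event from the slide:

```python
class ToyCache:
    def __init__(self, memory, write_through=False):
        self.memory = memory                 # shared main memory (a dict)
        self.lines = {}                      # this cache's local copies
        self.write_through = write_through

    def load(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value

    def evict(self, addr):
        self.memory[addr] = self.lines.pop(addr)   # write back a dirty line

# Write-back case: initially x == 0 and y == 10 in main memory.
mem = {"x": 0, "y": 10}
c1, c2 = ToyCache(mem), ToyCache(mem)

c1.store("x", 1)      # T1 on CPU1: both stores land only in cache 1
c1.store("y", 11)
c1.evict("y")         # cache 1 needs y's line, so y reaches memory *first*

yp = c2.load("y")     # T2 on CPU2 loads y, then x, from main memory
xp = c2.load("x")
print(yp, xp)         # -> 11 0: y' == 11 with x' == 0

# Write-through case: memory is updated in order, but cache 2 already
# holds a stale copy of x (pulled in early, e.g. by false sharing).
mem2 = {"x": 0, "y": 10}
w1, w2 = ToyCache(mem2, True), ToyCache(mem2, True)
w2.load("x")          # the stale x == 0 now sits in cache 2

w1.store("x", 1)      # T1: both stores go straight to main memory
w1.store("y", 11)

print(w2.load("y"), w2.load("x"))   # -> 11 0: the same forbidden outcome
```

Under sequential consistency, seeing y == 11 implies the earlier store x = 1 must also be visible, so (y', x') == (11, 0) is exactly the outcome the litmus test rules out.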
And then you actually have different cache protocols which try to, excuse me, implement those respective consistency models. Last lecture I said that I didn't think there were any processors with sequentially consistent semantics, and someone proved me wrong, because there are no absolutes in life: the MIPS R10000 architecture, an out-of-order processor, actually guaranteed sequential consistency. I think also, before x86, some simpler processors probably were sequentially consistent, just because it was kind of easy for them to do. But once you start going out of order, it gets much harder, which is why the R10000 was actually interesting: it was a superscalar, with some limits to it, but it effectively had the ability to replay its loads. So if it found something that was not sequentially consistent, it could effectively go back, restart the program from a much earlier point, and replay the code. Interesting, interesting machine.
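To give a flavor of the replay idea, here is a loose single-threaded sketch, entirely my own invention and greatly simplified: the real R10000 tracks outstanding loads in hardware and squashes on cache-line invalidations, whereas here a callback stands in for a remote store arriving mid-sequence:

```python
def run_loads_with_replay(memory, addrs, interference=None, max_replays=10):
    """Speculatively load `addrs` in order. If the values we read no longer
    match memory at 'commit' time (a remote store slipped in), squash the
    results and replay the whole sequence from the start."""
    fired = False
    for _ in range(max_replays):
        seen = {}
        for i, addr in enumerate(addrs):
            seen[addr] = memory[addr]
            if interference and not fired and i == 0:
                interference(memory)     # remote stores land mid-sequence
                fired = True
        # Commit check: all loads must look as if they happened atomically.
        if all(memory[a] == v for a, v in seen.items()):
            return seen                  # consistent snapshot: commit it
        # Otherwise fall through and replay.
    raise RuntimeError("too many replays")

memory = {"x": 0, "y": 10}

def remote_store(mem):                   # another CPU stores x = 1, y = 11
    mem["x"] = 1
    mem["y"] = 11

result = run_loads_with_replay(memory, ["x", "y"], interference=remote_store)
print(result)   # -> {'x': 1, 'y': 11}: the first pass saw x == 0 but
                #    y == 11, failed the check, and was replayed
```

Without the replay, this run would have committed the inconsistent snapshot x == 0, y == 11.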