Okay, so let's move up from the physical implementation of buses, and move to some more logical problems: sequential consistency mixed together with coherence protocols. So we're going to look at them all here. We have two processors, CPU1 and CPU2, and we are going to introduce caches into our SMP, or Symmetric Multiprocessor. So we have two caches here, and we're looking at address A. Let's see what happens if we have a write-back cache. At the beginning of time, everyone has the value 100 for address A; so cache one, cache two, and main memory all agree that address A has value 100. Okay, so now let's say CPU1 wants to update A to 200. In a write-back scenario, it's just going to do a store into its own cache. But it's write-back, so this data doesn't show up anywhere else until the line actually gets written back. Now, all of a sudden, if we don't have any sophisticated protocols and CPU2 does a read of address A, all it's going to get is 100, and not 200. It's going to get the wrong, out-of-date value. Well, that's not super great; in fact, that easily violates a bunch of consistency rules, and you're just completely out of sync relative to everything.
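The write-back staleness problem just described can be sketched in a few lines. This is a toy model added for illustration only; the `Cache` class and its methods are made up, not any real API.

```python
# Toy model of the write-back staleness problem described above.
# All names here are illustrative.

class Cache:
    def __init__(self, memory):
        self.memory = memory   # shared backing store
        self.lines = {}        # address -> value held locally

    def read(self, addr):
        # On a miss, fill from main memory; on a hit, use the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write-back: update only the local line; memory is untouched
        # until the line is explicitly written back (e.g., on eviction).
        self.lines[addr] = value

    def write_back(self, addr):
        self.memory[addr] = self.lines[addr]

memory = {"A": 100}
cpu1, cpu2 = Cache(memory), Cache(memory)

cpu1.read("A")          # both CPUs start out agreeing that A == 100
cpu2.read("A")
cpu1.write("A", 200)    # CPU1 stores 200, but only into its own cache

print(cpu2.read("A"))   # prints 100: CPU2 sees the stale value
print(memory["A"])      # prints 100: main memory is stale too
```

Even after CPU1 eventually writes the line back, CPU2 keeps hitting its own stale copy until something invalidates it.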
In a write-through cache you can have the same problem, just with slightly different mechanics. So let's say you have a write-through cache: CPU1 does the store, and it puts 200 in its own cache and 200 in main memory. And CPU2 doesn't ever see that value, because there is nothing to tell CPU2 to update its copy. So in our coherence protocol, we have to think about how this works. We have two questions at the bottom of the slide here. Do these stale values matter? Well, if you want to communicate from CPU1 to CPU2, at some point something needs to have some mechanism to actually move that data over into CPU2's cache; otherwise it'll never see the new value. So stale values really, really do matter. And what is our view of shared-memory programming? This brings up an interesting question. If you want full sequential consistency, this clearly doesn't give it to you. Now, one of the questions that comes up is: what does the programmer expect? Do they expect full sequential consistency, or do they expect something weaker? Because you can think about having a model like this, but then having special instructions which somehow invalidate CPU2's data, such that it will pick up the new value. Or maybe a special instruction to push a store out to main memory.
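The write-through variant can be modeled the same way. Again, this is just an illustrative sketch with made-up names, not a real cache implementation.

```python
# Toy model of the write-through variant: the store reaches main memory
# immediately, but nothing invalidates the copy in the other cache.
# All names here are illustrative.

class WriteThroughCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]   # fill on a miss
        return self.lines[addr]                    # hits never re-check memory

    def write(self, addr, value):
        # Write-through: update the local line AND main memory...
        self.lines[addr] = value
        self.memory[addr] = value
        # ...but no invalidation is sent to any other cache.

memory = {"A": 100}
cpu1, cpu2 = WriteThroughCache(memory), WriteThroughCache(memory)

cpu2.read("A")          # CPU2 caches A == 100
cpu1.write("A", 200)    # main memory now holds 200

print(memory["A"])      # prints 200
print(cpu2.read("A"))   # prints 100: CPU2 still hits its stale line
```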
So depending on your programming model, you could think about having a very hard-to-use programming model which would require the programmer to make data find where it needs to go. But then you're effectively just doing message passing instead of shared memory; you'd effectively be sending a message from CPU1 to CPU2 if you were trying to push the data back and forth. So a little something to think about there: if the programmer assumes that when they write a value, the other processors will see it, caches muck that all up. Okay, so let's walk through an example in a little more detail. This is actually the example we had from last class. There are two threads, T1 and T2. T1 stores one to X and then stores eleven to Y. T2 loads X and Y, and then stores what it loaded into X' and Y'. Okay, so let's look at a write-back cache. We'll start off with memory here having initial values of zero and ten in X and Y, and we execute T1, which writes one and eleven to X and Y respectively. But it's write-back, so this doesn't actually have to get out to main memory. Now let's say the cache in processor one needs some space for the cache line that Y is in, but not the one that X is in.
That line just happens to get evicted; the cache needs the space for the next piece of code that's going to execute. So what's going to happen here is that Y gets pushed out to main memory, as we've shown in red there, but X is still zero. Ooh, does anyone see problems coming up here? We had stores that were in program order, and main memory has effectively seen them out of order. Okay, now we go and execute thread two. Thread two is going to read what is in main memory, pull it into its cache, and then write X' and Y' into its own cache. So we get eleven and zero from Y and X. And then, we'll say, cache one writes back X, so main memory now has one and eleven. And then finally cache two writes back X' and Y' of zero and eleven. So a couple of things to note here. The X and Y values reflected in X' and Y' don't match, but more importantly, this is one of our sequentially inconsistent test cases; this is the outcome we said shouldn't happen. So we've just tested whether a naive system with write-back caches preserves sequential consistency, and the answer is no, it's not sequentially consistent. This, just to make a point here, can also happen with write-through caches.
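The write-back walkthrough above can be replayed as a small script. The eviction order, with Y's line written back before X's, is what produces the forbidden result; all names here are illustrative.

```python
# Replaying the write-back interleaving from the walkthrough above.
# Names are illustrative; caches are plain dicts.

memory = {"X": 0, "Y": 10, "Xp": None, "Yp": None}
cache1, cache2 = {}, {}

# T1 executes: both stores land only in cache 1 (write-back).
cache1["X"], cache1["Y"] = 1, 11

# Cache 1 needs room and happens to evict only Y's line.
memory["Y"] = cache1.pop("Y")          # memory: X=0, Y=11 -- out of order!

# T2 executes: it misses, reads main memory, keeps results in cache 2.
cache2["Xp"] = memory["X"]             # Xp = 0
cache2["Yp"] = memory["Y"]             # Yp = 11

# Eventually both caches write back everything they hold.
memory["X"] = cache1.pop("X")
memory.update(cache2)

print(memory["Xp"], memory["Yp"])      # prints: 0 11
```

Under sequential consistency, T2 observing Y == 11 would imply it also observes X == 1, so (X', Y') == (0, 11) is exactly the outcome that shouldn't happen.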
You need to carefully construct a case where this happens, but you get the same outcome. Because it's write-through, when we execute T1 it actually pushes one and eleven up to main memory. What you'll notice is an interesting little thing here: cache two already has X in it. Let's just say there was some false sharing, or some load that happened, that pulled it into the cache too early. The write-through pushed from cache one to main memory, but there was no guarantee, nothing to invalidate or somehow notify cache two. So the stale value is still sitting in cache two in our basic write-through case. Now thread two executes, and it just copies its X to X' and its Y to Y'. And it's write-through, so that shows up in memory here also. Main memory is effectively inconsistent now, and this is also one of the cases that we had shown was not sequentially consistent. So without something to guarantee that we have consistency, our hardware can very easily fall out of consistency when we introduce caches, because there are different places to effectively store stale data: it can be in main memory, it can be in someone else's cache, or it can be in your own cache. So you can have stale data in multiple places.
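That write-through case can be replayed the same way, assuming a stale copy of X was already pulled into cache two before T1 ran. Again, this is a toy sketch with made-up names.

```python
# Replaying the write-through case: T1's stores reach main memory
# immediately, but cache 2 already holds a stale copy of X.
# All names are illustrative.

memory = {"X": 0, "Y": 10, "Xp": None, "Yp": None}
cache2 = {"X": 0}                      # stale line, filled too early

# T1 executes with a write-through cache: memory is updated right away.
memory["X"], memory["Y"] = 1, 11

# T2 executes: X hits the stale line in cache 2, Y misses to memory.
xp = cache2["X"]                       # 0, stale
yp = memory["Y"]                       # 11
memory["Xp"], memory["Yp"] = xp, yp    # write-through: results reach memory

print(memory["Xp"], memory["Yp"])      # prints: 0 11 -- forbidden under SC
```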
At this point in the lecture, I want to differentiate cache coherence from a memory consistency model. There are a lot of words on this slide, but the main difference I'm trying to get across is this: a cache coherence protocol is some algorithm that tries to keep data coherent, so that when you write memory, everyone sees something they agree on. It's the protocol that's trying to keep it coherent. In contrast, a memory consistency model is the thing that our coherence protocol is trying to enforce. It is an abstract set of rules about how memory should act. Sequential consistency is one example of that. Other examples, as we talked about last lecture, are total store ordering and processor store ordering. So those are memory consistency models, and then you have different cache coherence protocols which try to implement those respective consistency models. Last lecture, I said that I didn't think there were any processors with sequentially consistent semantics, and someone proved me wrong, because there are no absolutes in life: the MIPS R10000, an out-of-order processor, implements a sequentially consistent memory consistency model. I think some simpler, earlier processors probably were sequentially consistent too, just because it was kind of easy for them to do.
But once you start to execute out of order, it gets much harder, which is why the R10000 was actually interesting. It was a superscalar, which had some limits to it, but they effectively had the ability to replay their loads. And so if they found something that was not supposed to be consistent, they could effectively go back, restart one of the programs much earlier, and replay all the code. An interesting, interesting design.
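The invalidation mechanism that a bus-based coherence protocol uses to fix the stale-read examples above can be sketched roughly like this. It is a deliberate simplification for illustration, not any specific real protocol such as MESI, and all names are made up.

```python
# A minimal sketch of the invalidation idea a coherence protocol uses:
# every cache "snoops" writes on the shared bus and drops its own copy
# of the written line. Illustrative only; not a real protocol.

class SnoopingCache:
    peers = []                          # all caches on the shared bus

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        SnoopingCache.peers.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Broadcast an invalidate so no peer keeps a stale copy.
        for peer in SnoopingCache.peers:
            if peer is not self:
                peer.lines.pop(addr, None)
        self.lines[addr] = value
        self.memory[addr] = value       # write-through for simplicity

memory = {"A": 100}
cpu1, cpu2 = SnoopingCache(memory), SnoopingCache(memory)

cpu2.read("A")          # CPU2 caches A == 100
cpu1.write("A", 200)    # invalidates CPU2's copy, updates memory

print(cpu2.read("A"))   # prints 200: the stale-read problem is gone
```

With the invalidate broadcast, CPU2's next read misses and refetches the new value, which is exactly the missing mechanism the earlier examples pointed at.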