1
00:00:03,940 --> 00:00:10,888
So, we're gonna look at, two, ways to go
about implementing Snoopy cash coherence.

2
00:00:10,888 --> 00:00:16,679
Update protocols, and validation
protocols. And we're gonna be focusing

3
00:00:16,679 --> 00:00:22,884
today's lecture on what are called writes
invalidation or write invalidate

4
00:00:22,884 --> 00:00:29,502
protocols. They're, they're more common
today. but I did want to bring up right

5
00:00:29,502 --> 00:00:36,285
update or what are called right broadcast
protocols. In this other sense they kind

6
00:00:36,285 --> 00:00:42,968
of, mirror something like, a write through
cache, versus a, write back cache. But

7
00:00:42,968 --> 00:00:49,273
it's not, it's not the same thing. You can
actually mix and match write back and

8
00:00:49,273 --> 00:00:55,738
write through with these different ideas
here. But let's, let's look at what these,

9
00:00:55,738 --> 00:01:02,043
these two, ideas in the Snoopy protocols
are. So in a write update, whenever a

10
00:01:02,043 --> 00:01:08,508
processor does a write and broadcasts it
across the bus to every other processor.

11
00:01:08,508 --> 00:01:15,360
And every other processor can have data
in. Cache, it just updates the data in

12
00:01:15,360 --> 00:01:23,206
place, because the broadcast, it has all
the data, it has the data up, up to date.

13
00:01:24,413 --> 00:01:33,768
validation on the other hand, we're gonna
look at when a occurs it'll actually

14
00:01:33,768 --> 00:01:44,028
invalidate other cache copies, the other
caches. Okay so, these two cases update ,

15
00:01:44,028 --> 00:01:51,755
update . . The first case is the processor
takes a write miss. So, it takes a write

16
00:01:51,755 --> 00:01:56,676
miss. Well, it's gonna broadcast the data
onto the bus, and update everybody. So

17
00:01:56,676 --> 00:02:01,981
it's just sort of writing through. But not
just writing through to main memory. It's

18
00:02:01,981 --> 00:02:07,030
also writing through to everyone else's
cache. What's nice is, when someone goes

19
00:02:07,030 --> 00:02:11,887
to do a read. Well, there's no real. Like,
the, there, there's a read miss if you

20
00:02:11,887 --> 00:02:17,064
just go out to main memory. And you're
guaranteed that main memory always has the

21
00:02:17,064 --> 00:02:23,048
most up to date version. if it was in
your. The cache before. And other cores

22
00:02:23,048 --> 00:02:30,205
and other caches broadcast will update
your value. So it's in your cache. You

23
00:02:30,205 --> 00:02:39,906
have the to be a also in these update
protocols. Not the case with invalidation

24
00:02:39,906 --> 00:02:46,970
protocols. So, our right invalidation
protocols. We're gonna have a, let's say

25
00:02:46,970 --> 00:02:54,306
there's a right miss. Well, on a right
miss, you actually go in and validate. All

26
00:02:54,306 --> 00:03:01,729
the other caches, copies of it, before you
can go get the data. So if another cache

27
00:03:01,729 --> 00:03:09,061
had the data, it's going to cough up the
data and put it back in the main memory.

28
00:03:09,061 --> 00:03:15,934
And then you're gonna go off to main
memory and get the data from there, in,

29
00:03:16,209 --> 00:03:23,449
our, our naive invalidation protocol. So,
first step is if you want to do a write

30
00:03:23,449 --> 00:03:31,157
and it's not in your cache, and, you don't
have Let's say a certain has access to it.

31
00:03:31,157 --> 00:03:37,940
You're going to yell out to the bus, I
want to ride to this address. The caches

32
00:03:37,940 --> 00:03:45,244
are going to. If someone responds and says
I have that data already. You have to wait

33
00:03:45,244 --> 00:03:52,113
for them to put it on main memory, and
then once it's on main memory you can go

34
00:03:52,113 --> 00:03:58,113
get that data. By repeating this.
Something kind of similar. you are

35
00:03:58,113 --> 00:04:04,480
basically, when you yell does anyone have
it. Someone's data is not in your cache it

36
00:04:04,480 --> 00:04:09,951
might be in someone else's cache. Someone
might pipe up and say I have that data and

37
00:04:09,951 --> 00:04:15,487
I have it dirty. So I'm willing to take it
and put it back to main memory and you can

38
00:04:15,487 --> 00:04:20,372
go get it from main memory but there's
other cases that can happen in this

39
00:04:20,372 --> 00:04:26,495
protocol or in invalidated protocols. So
other things that can happen here are you

40
00:04:26,495 --> 00:04:31,641
can say that who has that data? Does
someone else might have that data but they

41
00:04:31,641 --> 00:04:37,119
might not have a legal copy of the data so
they might not pipe up. It might not say,

42
00:04:37,119 --> 00:04:42,462
I have a copy of it. And instead, you
might just be able to get that data from

43
00:04:42,462 --> 00:04:47,458
the other. So, let's go through a couple
different cases, another, a couple

44
00:04:47,458 --> 00:04:53,495
different protocols that drop out of this.
The first one we're gonna look at is

45
00:04:53,495 --> 00:04:59,115
called MSI, and it just stands for the
three states that each cache line is gonna

46
00:04:59,115 --> 00:05:04,597
be in. so for these cache coherence
protocols, we typically try to track what

47
00:05:04,597 --> 00:05:09,940
state they're in and whether they're
shared or not, on a per-cache-line basis.

48
00:05:09,940 --> 00:05:15,606
Not on a per word basis. similarly, it's
like how we, you know? We only track the

49
00:05:15,606 --> 00:05:21,553
tag bits and the valid bit and the dirty
bits, per a cache line basis. Because it's

50
00:05:21,553 --> 00:05:27,079
just less bookkeeping to keep around. So
here we have our cache l-, or our cache

51
00:05:27,079 --> 00:05:32,186
tag. And before, we had, let's say, a
valid bit and a dirty bit. And then the,

52
00:05:32,606 --> 00:05:38,413
address portion of the tag. . Or the tag
portion of the address, rather. And in

53
00:05:38,413 --> 00:05:43,660
this step, what we're gonna do, is,
instead of having a valid bit and a dirty

54
00:05:43,660 --> 00:05:50,242
bit. . We'll use these bits to encode a
state. There's a little state machine in

55
00:05:50,242 --> 00:05:56,155
there, practically. And, first off,
there's three different states here.

56
00:05:56,155 --> 00:06:02,457
Modified, shared, and . Before we go
through the protocol, why don't you just,

57
00:06:02,690 --> 00:06:08,058
say what these three different states,
roughly, roughly mean. So, in NSI

58
00:06:08,058 --> 00:06:15,061
protocol, means for that particular line.
You don't know anything about it. It's not

59
00:06:15,061 --> 00:06:21,347
in your cache. This is the equivalent.
Though it did not be valid you know, so a

60
00:06:21,347 --> 00:06:26,784
caption so about, not, not, not here. Just
because one coster is cash as it me

61
00:06:26,784 --> 00:06:32,946
bellasty does not mean a different mustard
can't have any different states. So I want

62
00:06:32,946 --> 00:06:39,108
to make the big point here that the states
are not vocal. It's not like per cash line

63
00:06:39,108 --> 00:06:45,018
the state is in the, the cash log is in
the end state we'll say. No, instead. Each

64
00:06:45,018 --> 00:06:50,269
cache keeps track of a line. And it might
say whether it's actually in the cache,

65
00:06:50,269 --> 00:06:55,585
not in the cache, and what state it's in.
And it can be different states. Now there

66
00:06:55,585 --> 00:07:01,164
are some cross, products of cases which
are just not allowed. So we'll see, In

67
00:07:01,164 --> 00:07:06,612
this protocol, two caches to not have the
same cache line and modified statement at

68
00:07:06,612 --> 00:07:11,798
the same time, the modified state. Cuz
that would basically mean that two caches

69
00:07:11,798 --> 00:07:16,589
are trying to modify the same piece of
data. And that's, that's what we're

70
00:07:16,589 --> 00:07:27,266
basically using the . protocol . Okay? .
Okay. So, three of the states here. I is

71
00:07:27,266 --> 00:07:35,026
invalid. It means you don't really know
anything about that line. S is shared,

72
00:07:35,026 --> 00:07:44,317
which means that you have a read only
copy, or processor one here. Processor one

73
00:07:44,317 --> 00:07:51,975
has a read only copy of that data. And
other caches can also have read only

74
00:07:51,975 --> 00:07:58,722
copies. That data. Hence the, the term
shared. Because shared. But only in a

75
00:07:59,662 --> 00:08:06,791
sense. So if everyone around here has
copies, we're not gonna have any coherence

76
00:08:06,791 --> 00:08:13,058
problems. We start to have problems when
multiple people have writable copies. So

77
00:08:13,058 --> 00:08:19,090
in this protocol, we're gonna have this
modified state here. Meaning that, that

78
00:08:19,090 --> 00:08:24,731
line is now writable by a particular
cache. And we will actually have to

79
00:08:24,731 --> 00:08:31,905
transition to this state. Before we can do
any rights. And everyone, as I said

80
00:08:31,905 --> 00:08:40,695
before. If one processor has, or one cache
has, a cache state. It's modified state.

81
00:08:40,695 --> 00:08:48,326
Everyone else needs to have an invalid
state. Or, have not in the cache. And this

82
00:08:48,326 --> 00:08:56,343
is gonna guarantee us in hardware, that
the cache line is effectively, only being

83
00:08:56,343 --> 00:09:03,530
read by one cache or one processor at a
time. Okay, so let's, let's look at some

84
00:09:03,530 --> 00:09:10,158
of the, the edges here in this stage
transition diagram here. So the first

85
00:09:10,158 --> 00:09:16,786
thing that we're going to look as is
cluster one, we'll say starts out with a

86
00:09:16,786 --> 00:09:24,619
line not being in its cache, or turns out
with a line being of its cache. So our

87
00:09:24,619 --> 00:09:31,849
first state arrow here is that cluster one
wants to read. It doesn't wanna write the

88
00:09:31,849 --> 00:09:37,954
data, so it doesn't read here. And.
Invalid before or wasn't in the cache. So

89
00:09:37,954 --> 00:09:43,846
it'll do a read from main memory. And
reading the cache and sharing. Now, we

90
00:09:43,846 --> 00:09:53,242
have other trying to read this. And in our
protocol. What the other are gonna do is

91
00:09:53,242 --> 00:09:59,851
when they. They're gonna actually issue a
request for that line. And if someone else

92
00:09:59,851 --> 00:10:06,221
has something to say about it, they have
to pipe up during that, during that time.

93
00:10:06,221 --> 00:10:12,212
And we'll see how that happens. Later. But
right now, other processors try to read

94
00:10:12,212 --> 00:10:17,747
the data. Read that, read that line. No
one pipes up. And even the first guy, who

95
00:10:17,747 --> 00:10:22,994
has this, this data in the shared state,
doesn't say anything. So the other

96
00:10:22,994 --> 00:10:28,960
processor's gonna still read it from main
memory. And their transitions effectively

97
00:10:28,960 --> 00:10:34,783
look the same. They're just going to load
it into their cache in the shared state.

98
00:10:34,783 --> 00:10:41,396
So pull it in, and share. Okay. So now,
we're gonna look at this from a

99
00:10:41,396 --> 00:10:47,703
perspective of P1. If you want, had it in
the shared state. And some other processor

100
00:10:47,703 --> 00:10:54,206
wants to do a write. So they do a request
for exclusive access of that line. Because

101
00:10:54,206 --> 00:11:00,630
processor one's cache only had it in the
shared state. It doesn't need to actually

102
00:11:00,630 --> 00:11:06,819
say anything in response to that request
for exclusive access. But what it does

103
00:11:06,819 --> 00:11:13,869
need to do is it needs to self invalidate
its own data. So . But instead, what it

104
00:11:13,869 --> 00:11:21,815
has to do is transition. From s to invalid
because the data ends' catch is no longer

105
00:11:21,815 --> 00:11:28,478
good because someone else is about to go
write it. So before it can write that

106
00:11:28,478 --> 00:11:35,142
other coster issues its intent to write
and at that time it actually does its

107
00:11:35,142 --> 00:11:42,318
state transition in its own catch. Okay so
let's look at the flip side of that let's

108
00:11:42,318 --> 00:11:49,323
say processor one thinks that it wants to
do a write. Well so what processor one's

109
00:11:49,323 --> 00:11:55,908
gonna do is its going to issue this
request to write or intent to write

110
00:11:55,908 --> 00:12:01,818
transaction across the bus. And everyone
else is gonna do that transition. And

111
00:12:01,818 --> 00:12:07,728
after it does this intent to write
transaction here, it's going to transition

112
00:12:07,728 --> 00:12:13,945
from the S state to the M state. And only
at that time is it able to do the write,

113
00:12:13,945 --> 00:12:19,854
once it gets into the M state. And it can
also do further reads and writes. It

114
00:12:19,854 --> 00:12:27,030
doesn't have to ask anybody. So this is
pretty cool . invalidation protocol, you

115
00:12:27,030 --> 00:12:32,081
can do reads and writes. We don't have to
broadcast any information across the bus.

116
00:12:32,081 --> 00:12:37,070
It has the modified state. This processor
can read the data. It can write the data.

117
00:12:37,255 --> 00:12:42,182
now everyone else can't read or write or
do anything to it during this time. But

118
00:12:42,182 --> 00:12:46,555
that's okay, because typically in our
shared memory applications, one

119
00:12:46,555 --> 00:12:52,345
processor's doing most of the writing. And
sometimes processors can go read it. And

120
00:12:52,345 --> 00:12:57,088
typically that, you know, everyone's not
trying to fight for the same piece of

121
00:12:57,088 --> 00:13:03,559
data. Okay. So now let's Look at the case
where in the, the modified state and some

122
00:13:03,559 --> 00:13:10,191
other processor wants to write the data.
Well, this is a little bit more complex

123
00:13:10,191 --> 00:13:17,075
because now the other processor is going
to issue this intent to write across the

124
00:13:17,075 --> 00:13:25,554
bus, a request for to cross the bus, . And
I think someone else wants to move to the

125
00:13:25,554 --> 00:13:33,135
modified state, so before they can move to
the modified state they need R, . Runs

126
00:13:33,135 --> 00:13:40,166
most up to date data. So what's gonna
happen is p1 is actually going right back

127
00:13:40,166 --> 00:13:47,464
to data at that point to main memory i n
our, in our basic model and assigned model

128
00:13:47,464 --> 00:13:54,763
here. So get it right back to main memory.
And then the other processor is going to

129
00:13:54,763 --> 00:14:01,527
be able to get it from main memory. And
transition into the m state from the

130
00:14:01,527 --> 00:14:08,348
invalid state. And we'll, we'll see that
arc in a second. You could think about

131
00:14:08,348 --> 00:14:14,530
optimizations here where you try to pass
the data from one processor's invalidate.

132
00:14:14,530 --> 00:14:20,862
Or, one processor's , and another
processor's read. We're not gonna look at

133
00:14:20,862 --> 00:14:27,572
that quite yet. optimization. But, right
now, if you're in the m state, you're

134
00:14:27,572 --> 00:14:33,754
gonna write it back to main memory. And
the other processor's gonna delete it from

135
00:14:33,754 --> 00:14:39,860
main memory, in this, in this state. Okay,
so this brings us to, another important

136
00:14:39,860 --> 00:14:45,068
transition. Here is that, if you have it
modified, and someone else wants to read

137
00:14:45,068 --> 00:14:49,898
the data, you don't actually have to
transition to validate it. You can just

138
00:14:49,898 --> 00:14:54,985
transition to the shared state. And you
probably want to do this, because you're,

139
00:14:54,985 --> 00:15:00,330
otherwise when some other processor tries
to read the data that you just wrote. You

140
00:15:00,330 --> 00:15:05,546
know, like a producer-consumer, style
computation. You would just be effectively

141
00:15:05,546 --> 00:15:10,826
invalidating your, processors cache. And
that would be pretty bad because, you're

142
00:15:10,826 --> 00:15:16,801
effectively fighting locality at this
point. You're fighting. temporal locality.

143
00:15:16,801 --> 00:15:22,292
And so instead when another processor
tries to read the data, we need to write

144
00:15:22,292 --> 00:15:27,571
back that data to main memory. Another
processor has to go get it from main

145
00:15:27,571 --> 00:15:33,907
memory. But you have the most up to date
version. And they only want read access to

146
00:15:33,907 --> 00:15:40,242
it. So you're gonna transition down to the
shared state. So someone else has it

147
00:15:40,242 --> 00:15:46,085
readable, and you have it readable also at
the same time. And finally we're going to

148
00:15:46,930 --> 00:15:52,128
validate back up to, to M. This edge, and
that edge, basically, look around trying

149
00:15:52,128 --> 00:15:58,293
to validate, invalid here. and we're gonna
look at. If you have , if you have a cash

150
00:15:58,293 --> 00:16:06,209
line invalid. And you want to go get, want
a brightness. You're going to scream on

151
00:16:06,209 --> 00:16:11,919
the bus. I want you to get this . And in
the meantime, the other cashiers might

152
00:16:11,919 --> 00:16:17,174
have to do what you had to do in the same
cas e, so they may need to go to a write

153
00:16:17,174 --> 00:16:23,940
back and they may need to say I have that
right now. . And the next Right. You sort

154
00:16:23,940 --> 00:16:31,606
of wait around a little bit and then you
go get it from . And so, do we, do we need

155
00:16:31,606 --> 00:16:39,356
an . yes. You need your own key that for
any given line, that the transactions,

156
00:16:39,356 --> 00:16:45,440
the, the transactions in the bus are
common. And the way you do that is by

157
00:16:45,440 --> 00:16:52,856
having . So people trying to invite lets
say the same line at the same time. So

158
00:16:52,856 --> 00:17:00,424
we'll take two and both have an S. Share
data, and, and copies. And exactly the

159
00:17:00,424 --> 00:17:06,254
same time they both wanna do a write. This
is why we need to issue this intent to

160
00:17:06,254 --> 00:17:11,626
write. And we're not able to do a write
until we successfully get into the M state

161
00:17:11,626 --> 00:17:16,605
here. So what's gonna happen is two
processors are gonna try to issue intent

162
00:17:16,605 --> 00:17:21,846
to write. But the arbiter is only allow
one of those two to occur bus, on the bus

163
00:17:21,846 --> 00:17:26,955
at the same time. It's just, you know,
arbiter says one thing can happen. And the

164
00:17:26,955 --> 00:17:32,209
bus only has, you know, one, one thing can
happen at a time. Someone's gonna win. And

165
00:17:32,209 --> 00:17:37,021
that might be based on sort of priority
encoding, or it might be round-robin,

166
00:17:37,021 --> 00:17:42,339
depending on how you go about doing that,
or make some fairness guarantee. Someone

167
00:17:42,339 --> 00:17:47,405
is going to, one of the two clusters is
gonna win. And when that one cluster wins,

168
00:17:47,405 --> 00:17:52,406
they're the one that is able to take the
bus and issue this request, inu2014,

169
00:17:52,406 --> 00:17:57,852
intent to write, and everyone else has to
play by that. So what's gonna happen is

170
00:17:57,852 --> 00:18:02,917
the, if cluster one wins and sends this
intent to write, and cluster two is about

171
00:18:02,917 --> 00:18:08,406
to send the intent to write. Now that it
sees someone else's intention to write, it

172
00:18:08,406 --> 00:18:13,843
has to basically abort its intention,
attention, attempt to write and instead go

173
00:18:13,843 --> 00:18:19,556
on this transition. And then what they're
gonna do is they're gonna retry the write

174
00:18:19,556 --> 00:18:25,131
again in the future. And when they retry
the write to the future, they're going to

175
00:18:25,131 --> 00:18:30,706
basically come in on this arc here, which
is gonna bump processor one from M to I

176
00:18:30,706 --> 00:18:36,074
after the write has happened. But,
typically if you're careful about, forward

177
00:18:36,074 --> 00:18:39,575
progress guarantees, so you have to be. Be
very caref ul.

178
00:18:39,578 --> 00:18:44,190
You go to implement these things that, if
a cluster goes to do a write and you have

179
00:18:44,190 --> 00:18:48,801
sort of retries happening in the system,
and let's say you get the data modified

180
00:18:48,801 --> 00:18:53,301
into your own cache, you have to guarantee
that, that lobe that's gonna be a store

181
00:18:53,301 --> 00:18:57,745
operation actually posts. Because a very
subtle mistake that's very easy to have

182
00:18:57,745 --> 00:19:02,078
happen here is, you decouple the memory
system from the main processor, to some

183
00:19:02,078 --> 00:19:06,578
extent, and two people are trying to do
writes at the same time, and you just have

184
00:19:06,578 --> 00:19:10,690
lines basically bounce back and forth
between modifying the one cache and

185
00:19:10,690 --> 00:19:15,895
modifying the other cache, modifying. Five
one cache five one daily cache is back and

186
00:19:15,895 --> 00:19:20,922
forth forever, but no one's actually able
to do it in stores. So you have to make

187
00:19:20,922 --> 00:19:25,573
sure you pull in your cache, atomically
though like everything else get it

188
00:19:25,573 --> 00:19:30,726
modified. You are able to post at least
something till at least you have a four or

189
00:19:30,726 --> 00:19:37,388
five year guarantee, and then you have a
processor . Oh. So yeah. Stick a of the

190
00:19:37,388 --> 00:19:42,604
stage transitions relative to the other
stage transitions before that same line

191
00:19:42,604 --> 00:19:47,940
there . Now, for right now we're gonna say
the bus only. Has one transaction

192
00:19:47,940 --> 00:19:54,065
happening at a time. And that's arbitrated
by, some . So, we're not gonna be able to

193
00:19:54,065 --> 00:19:59,940
see other cache lines transition . But you
can think about cases where you can have,

194
00:19:59,940 --> 00:20:05,377
let's say, n transactions happening at the
same time. Number of transactions

195
00:20:05,377 --> 00:20:10,502
happening at the same time. But, the
arbiter, guarantees that none of the same

196
00:20:10,502 --> 00:20:16,752
address or the same , for instance. And
that actually happens in some of these

197
00:20:17,502 --> 00:20:23,741
distributed protocols we'll look at in .
In two lectures, we're gonna look at. We

198
00:20:23,741 --> 00:20:29,117
build these for multiple people to
basically be transitioning their state

199
00:20:29,117 --> 00:20:34,856
diagrams simultaneously, at the same time,
but to different addresses. Okay, hm.

200
00:20:34,856 --> 00:20:40,305
Okay, we're running a little bit low on
time here. I think I'm gonnau2014 Oh,

201
00:20:40,305 --> 00:20:46,480
we'll, we'll go through this example, yes.
though this is basically reiterating what

202
00:20:46,480 --> 00:20:52,147
we said before. Okay, so we have Processor
one, we'll say it has a read. So we're

203
00:20:52,147 --> 00:20:59,363
gonna show two state d iagrams here. For
the same cache line. Officer one does a

204
00:20:59,363 --> 00:21:07,520
read, he's gonna get it . So get the data
from memory. Officer one is right. He's

205
00:21:07,520 --> 00:21:16,134
going to issue an intent to right, get it,
and then in a modified state . Officer two

206
00:21:16,134 --> 00:21:24,291
now must do a read. So the first thing is
it's going to try to read this data, and

207
00:21:24,291 --> 00:21:31,018
that's going to cause officer one to
transition. Does the request for reading

208
00:21:31,018 --> 00:21:35,942
this data as it's trying to transition to
S here. And, and, it's not able to

209
00:21:35,942 --> 00:21:41,198
transition to S until it knows that the
other processor has done the following

210
00:21:41,198 --> 00:21:46,189
transition. It is basically going to do a
write back to main memory because

211
00:21:46,189 --> 00:21:51,511
Processor two is reading. So it's going to
transition from here to there. And now

212
00:21:51,511 --> 00:21:57,034
this modified data is up-to-date in main
memory and Processor two is going to read

213
00:21:57,034 --> 00:22:01,891
the updated state. Okay, so now say
Processor two tries to write. Well, it's

214
00:22:01,891 --> 00:22:07,908
gonna wanna. Transition from here to m. So
issue this intent to write. And at that

215
00:22:07,908 --> 00:22:12,897
point, processor one is going to
transition to the down state. And

216
00:22:12,897 --> 00:22:18,807
processor two gets it modifiable in its
cache. And it's actually to the right.

217
00:22:18,807 --> 00:22:25,100
Okay. So we're gonna look at processor one
now doing a read. It's in the imbalanced

218
00:22:25,100 --> 00:22:31,317
state right now. So it's basically just
going to enter in on this arc here. Though

219
00:22:31,317 --> 00:22:38,087
processor two has it in a modified state.
So to transition. Here down to I. Or,

220
00:22:38,087 --> 00:22:44,530
excuse me. Actually, there back to S. I
guess it's, We're gonna have . So

221
00:22:44,530 --> 00:22:51,050
processor one does the read. Processor two
is gonna do the write back. And it's gonna

222
00:22:51,050 --> 00:22:57,105
transition to the shared state here.
While, processor one transitions from I

223
00:22:57,105 --> 00:23:02,927
back to S. Processor one does the write.
We're sort of back to where we were

224
00:23:02,927 --> 00:23:09,137
before. It's going to issue its intent to
write. And the, processor two here is

225
00:23:09,137 --> 00:23:16,165
going to transition. Transition from S to
invalid at that point. So this is a basic

226
00:23:16,165 --> 00:23:22,788
NSI protocol, at this point. I think
that's the end of this. Yeah. Okay, so now

227
00:23:23,053 --> 00:23:29,588
The last two minutes, we're gonna talk
about more advanced cache coherence

228
00:23:29,588 --> 00:23:37,890
protocols. . . Okay. So quick, quick
observation here. I think we already said

229
00:23:37,890 --> 00:23:44,690
this, is that, no two processors. This
protocol enforces that no two processors.

230
00:23:44,690 --> 00:23:49,593
You can have a line in the m state
simultaneously. Two processors can have a

231
00:23:49,593 --> 00:23:54,302
line in the S state simultaneously.
Everyone can have a line invalid. It's

232
00:23:54,302 --> 00:24:00,560
just invalid. But we can't, can't . And
this really helps us keep coherence of our

233
00:24:00,560 --> 00:24:05,722
memory. Because we're not going to allow
any processor to be able to have a line

234
00:24:05,722 --> 00:24:10,883
exclusive. Or, have a line, and another
processor be able to read that data in the

235
00:24:10,883 --> 00:24:15,980
meantime. But, be careful. This does not
necessarily imply sequential consistency.

236
00:24:15,980 --> 00:24:23,160
This might be an input model, depending
on. Sort of, ordering a lot of things that

237
00:24:23,160 --> 00:24:32,925
you're really not interested in. Okay, so,
we're gonna look at a more complex kind of

238
00:24:32,925 --> 00:24:42,454
performance protocol. And the, the insight
that this protocol is growing out of is

239
00:24:42,454 --> 00:24:52,219
that you don't necessarily need to issue,
how do I put this? in all cases, you may

240
00:24:52,219 --> 00:24:58,832
not. You need to tell everybody that you
want to do it right. So we'll, we'll,

241
00:24:58,832 --> 00:25:06,272
we'll go through this, But the, the basic
idea here is that we can pull in data into

242
00:25:06,272 --> 00:25:13,193
our cache, into this new state here that
we're adding. Because we, we still have M,

243
00:25:13,193 --> 00:25:20,374
S, and I, but we add an E state here,
which stands for exclusive. Exclusive is

244
00:25:20,374 --> 00:25:27,723
actually a read-only state. But it's a
read-only state saying that no one. else

245
00:25:27,723 --> 00:25:32,468
can read it either. And why we add this,
is. If, let's say we do a read, or trying

246
00:25:32,468 --> 00:25:37,455
to do a write to that data sometime in the
future. We can pull it in, in this state,

247
00:25:37,455 --> 00:25:41,774
and not have to ask anybody. We can
guarantee that we don't have to ask

248
00:25:41,774 --> 00:25:46,579
anybody in order to transition to go to a
write to that data. So that's the, the

249
00:25:46,579 --> 00:25:52,357
insight. So, same, same idea here. point
out that, This, you'll see some sometimes

250
00:25:52,357 --> 00:25:57,527
in books, they'll have, they'll call this
the Illinois protocol. It was, developed

251
00:25:57,527 --> 00:26:03,393
at the University of Illinois Champaign.
which is why they call it that. or

252
00:26:03,393 --> 00:26:09,139
sometimes people call it . It's probably
more widely known now as the MESI

253
00:26:09,139 --> 00:26:14,817
protrocol. It's a four-state protocol. It,
and one of the nice things about this is,

254
00:26:14,817 --> 00:26:20,833
if you hav e sort of two state bits, it
doesn't side have three states. Well, we

255
00:26:20,833 --> 00:26:26,308
need two bits to encode that anyway. So it
doesn't actually add any extra storage

256
00:26:26,308 --> 00:26:31,918
space to have an extra state bit here in,
in this protocol. And as I said, this is

257
00:26:31,918 --> 00:26:37,284
effectively the same load of bits we had.
Or we get a valid bit and a dirty bit

258
00:26:37,284 --> 00:26:43,019
before we get cache coherence. Well we now
have sort of notions of dirty and notions

259
00:26:43,019 --> 00:26:48,481
of invalid. And so these other states are
all valid. So we, we can use those exact

260
00:26:48,481 --> 00:26:53,806
same key bits, and we don't need to add
any more storage space to our cache to

261
00:26:53,806 --> 00:26:59,063
implement this actual variance protocol.
So something to think about. So let's,

262
00:26:59,063 --> 00:27:05,208
let's walk, let's walk through all the
different arcs in this diagram. But as I

263
00:27:05,208 --> 00:27:11,488
said the main difference here is we took
before we, we, we took what was before,

264
00:27:11,488 --> 00:27:17,847
the shared state, and we split it into two
states. A new shared state, and an

265
00:27:17,847 --> 00:27:24,290
exclusive state. Which means that it's a
read only copy. But, no one else, will

266
00:27:24,290 --> 00:27:32,176
have a copy of it. sort of guaranteed that
no one else will have a copy of it. And,

267
00:27:32,176 --> 00:27:38,959
by default, when we bring in a cache line
from main memory, and we do a read this,

268
00:27:38,959 --> 00:27:45,228
in this protocol, we're most likely going
to. Bring it into the E state, o let's see

269
00:27:45,228 --> 00:27:51,507
if I can this protocol. We're just gonna
bring it into the E state. So if one cache

270
00:27:51,507 --> 00:27:57,519
does a read of . Instead of having it in
the shared state. Instead, it's going to

271
00:27:57,519 --> 00:28:02,595
bring it into the exclusive state. So
that, that same processor, in the future,

272
00:28:03,397 --> 00:28:09,342
write. It will transition over to this
state with no messages on the . So we, we

273
00:28:09,342 --> 00:28:15,554
effectively in a few, a few sub states.
And it's pretty common for someone to do a

274
00:28:15,554 --> 00:28:23,620
read . Somewhere in a cache line, and then
do a write. . So let's say you're try to

275
00:28:23,620 --> 00:28:29,898
increment that, variable. You're gonna
read it, add one to it, and then write

276
00:28:29,898 --> 00:28:36,829
that same value right back. . Okay. So,
this is gonna look pretty similar to what

277
00:28:36,829 --> 00:28:43,678
we had before. And we're gonna say that a
read miss, can still happen. So, if you're

278
00:28:43,678 --> 00:28:49,793
processer one, and the read miss. And the
data is in someone else's cache, we

279
00:28:49,793 --> 00:28:57,086
transition. into the shared state. So when
you're reading this, you don't get

280
00:28:57,086 --> 00:29:03,061
exclusive for this, this state. But
instead, because since someone else's

281
00:29:03,061 --> 00:29:09,541
cache, they pipe up and say. Oh, this is
already shared. Pull it in as shared. So

282
00:29:09,541 --> 00:29:16,105
we're gonna transition from invalid to
shared out of read miss. But we're also

283
00:29:16,105 --> 00:29:22,586
gonna have another arc, here, which is a
read miss, but it's not shared, it just

284
00:29:22,586 --> 00:29:28,290
came in from main memory. Sort of given
that idea that it's likely that we're

285
00:29:28,290 --> 00:29:33,761
actually going to have a read and then a
write. And you just pull it in without

286
00:29:33,761 --> 00:29:41,101
making contact, anybody . But otherwise,
this actually looks pretty similar to the

287
00:29:41,101 --> 00:29:47,195
MSI that we had before. If it's read
shared, you start off with an s here. any

288
00:29:47,195 --> 00:29:52,734
other processor. Someone responded back,
and says they had it shared. So they know

289
00:29:52,734 --> 00:29:58,690
not to go straight to the E. . But you're
just gonna stay in this, this state. Now.

290
00:29:58,690 --> 00:30:05,272
When, let's say, you have attention right,
you're going to probably . Mm. And they're

291
00:30:05,272 --> 00:30:14,059
going to transition valid. So similarly,
like, before, only one cache is ever gonna

292
00:30:14,059 --> 00:30:21,031
happen in M. But, unlike before, we're
going to say that only one cache can

293
00:30:21,031 --> 00:30:30,104
happen either in M or V. That's what E
means, it's exclusive. . So can read and

294
00:30:30,104 --> 00:30:37,840
write in this, in this state. Now,
exclusive here.

295
00:30:37,840 --> 00:30:40,980
Exclusive.
Processor one can read it, but if anyone

296
00:30:40,980 --> 00:30:47,105
else tried to read it. Oops. Where'd it
go? There we go. If other processor tried

297
00:30:47,105 --> 00:30:52,759
to read it, we're going to actually
transition from E to S, to our shared

298
00:30:52,759 --> 00:30:59,119
state here. But if we have a cache line E,
and we want to do a write, we don't have

299
00:30:59,119 --> 00:31:05,401
to invalidate anybody. We don't have to
arbitrate on the bus. Instead we can just

300
00:31:05,401 --> 00:31:11,840
transition from E to M directly and do the
write. Because we had exclusive already.

301
00:31:12,151 --> 00:31:19,704
the rest looks pretty similar to NSI.
You're unsure, no person wants a fuel

302
00:31:19,704 --> 00:31:27,534
rise, we're in transition from S to I. And
likewise, if we were in the exclusive

303
00:31:27,534 --> 00:31:34,386
state of no possible rise in fuel price.
We're going to transition the I. And note

304
00:31:34,386 --> 00:31:40,335
the difference between this arc and this
arc. So, going to read, we can just

305
00:31:40,335 --> 00:31:44,486
downgrade to shar e.
Other processor's going to write, we need

306
00:31:44,486 --> 00:31:53,233
to make it invalid, from these read only
exclusive stage. . So, what are we . We

307
00:31:53,233 --> 00:31:59,524
added other processors. Reads, this is the
same thing we had down the side here. That

308
00:32:00,433 --> 00:32:06,117
protocol. We had a, we had it modified.
And the other processor wants to do a

309
00:32:06,117 --> 00:32:11,726
write. We need to write back the data, and
transition to shared. And another

310
00:32:11,726 --> 00:32:16,974
processor wants to do a write. We
transitioned from the modified stage to

311
00:32:16,974 --> 00:32:23,062
right back to data. We need to transition
to analyses. So, it looks pretty similar

312
00:32:23,062 --> 00:32:32,707
to SI Except we visioned the shares state
into two different states. , So far. Okay.

313
00:32:32,707 --> 00:32:40,639
So I'm going to stop here today and just
say next time we're gonna dish up, from a

314
00:32:40,639 --> 00:32:47,402
[??]. coming around here and we have
another stage [??]. What is actually used

315
00:32:47,402 --> 00:32:54,500
in the called [??]. And this actually
allows us to start to use cash transfers.

316
00:32:54,500 --> 00:32:59,971
Without having to go out and read memory
to get the data all the time. and then

317
00:32:59,971 --> 00:33:05,309
we'll also talk about what's used in the
sort of, standard Intel processor, which

318
00:33:05,309 --> 00:33:11,515
also allows us to . But effectively adds
another state which functions the same as

319
00:33:11,515 --> 00:33:16,720
our shared state. But allows us to sort
data. Okay, let's stop here for today.