1 00:00:03,940 --> 00:00:10,888 So, we're gonna look at, two, ways to go about implementing Snoopy cash coherence. 2 00:00:10,888 --> 00:00:16,679 Update protocols, and validation protocols. And we're gonna be focusing 3 00:00:16,679 --> 00:00:22,884 today's lecture on what are called writes invalidation or write invalidate 4 00:00:22,884 --> 00:00:29,502 protocols. They're, they're more common today. but I did want to bring up right 5 00:00:29,502 --> 00:00:36,285 update or what are called right broadcast protocols. In this other sense they kind 6 00:00:36,285 --> 00:00:42,968 of, mirror something like, a write through cache, versus a, write back cache. But 7 00:00:42,968 --> 00:00:49,273 it's not, it's not the same thing. You can actually mix and match write back and 8 00:00:49,273 --> 00:00:55,738 write through with these different ideas here. But let's, let's look at what these, 9 00:00:55,738 --> 00:01:02,043 these two, ideas in the Snoopy protocols are. So in a write update, whenever a 10 00:01:02,043 --> 00:01:08,508 processor does a write and broadcasts it across the bus to every other processor. 11 00:01:08,508 --> 00:01:15,360 And every other processor can have data in. Cache, it just updates the data in 12 00:01:15,360 --> 00:01:23,206 place, because the broadcast, it has all the data, it has the data up, up to date. 13 00:01:24,413 --> 00:01:33,768 validation on the other hand, we're gonna look at when a occurs it'll actually 14 00:01:33,768 --> 00:01:44,028 invalidate other cache copies, the other caches. Okay so, these two cases update , 15 00:01:44,028 --> 00:01:51,755 update . . The first case is the processor takes a write miss. So, it takes a write 16 00:01:51,755 --> 00:01:56,676 miss. Well, it's gonna broadcast the data onto the bus, and update everybody. So 17 00:01:56,676 --> 00:02:01,981 it's just sort of writing through. But not just writing through to main memory. It's 18 00:02:01,981 --> 00:02:07,030 also writing through to everyone else's cache. What's nice is, when someone goes 19 00:02:07,030 --> 00:02:11,887 to do a read. Well, there's no real. Like, the, there, there's a read miss if you 20 00:02:11,887 --> 00:02:17,064 just go out to main memory. And you're guaranteed that main memory always has the 21 00:02:17,064 --> 00:02:23,048 most up to date version. if it was in your. The cache before. And other cores 22 00:02:23,048 --> 00:02:30,205 and other caches broadcast will update your value. So it's in your cache. You 23 00:02:30,205 --> 00:02:39,906 have the to be a also in these update protocols. Not the case with invalidation 24 00:02:39,906 --> 00:02:46,970 protocols. So, our right invalidation protocols. We're gonna have a, let's say 25 00:02:46,970 --> 00:02:54,306 there's a right miss. Well, on a right miss, you actually go in and validate. All 26 00:02:54,306 --> 00:03:01,729 the other caches, copies of it, before you can go get the data. So if another cache 27 00:03:01,729 --> 00:03:09,061 had the data, it's going to cough up the data and put it back in the main memory. 28 00:03:09,061 --> 00:03:15,934 And then you're gonna go off to main memory and get the data from there, in, 29 00:03:16,209 --> 00:03:23,449 our, our naive invalidation protocol. So, first step is if you want to do a write 30 00:03:23,449 --> 00:03:31,157 and it's not in your cache, and, you don't have Let's say a certain has access to it. 31 00:03:31,157 --> 00:03:37,940 You're going to yell out to the bus, I want to ride to this address. The caches 32 00:03:37,940 --> 00:03:45,244 are going to. If someone responds and says I have that data already. You have to wait 33 00:03:45,244 --> 00:03:52,113 for them to put it on main memory, and then once it's on main memory you can go 34 00:03:52,113 --> 00:03:58,113 get that data. By repeating this. Something kind of similar. you are 35 00:03:58,113 --> 00:04:04,480 basically, when you yell does anyone have it. Someone's data is not in your cache it 36 00:04:04,480 --> 00:04:09,951 might be in someone else's cache. Someone might pipe up and say I have that data and 37 00:04:09,951 --> 00:04:15,487 I have it dirty. So I'm willing to take it and put it back to main memory and you can 38 00:04:15,487 --> 00:04:20,372 go get it from main memory but there's other cases that can happen in this 39 00:04:20,372 --> 00:04:26,495 protocol or in invalidated protocols. So other things that can happen here are you 40 00:04:26,495 --> 00:04:31,641 can say that who has that data? Does someone else might have that data but they 41 00:04:31,641 --> 00:04:37,119 might not have a legal copy of the data so they might not pipe up. It might not say, 42 00:04:37,119 --> 00:04:42,462 I have a copy of it. And instead, you might just be able to get that data from 43 00:04:42,462 --> 00:04:47,458 the other. So, let's go through a couple different cases, another, a couple 44 00:04:47,458 --> 00:04:53,495 different protocols that drop out of this. The first one we're gonna look at is 45 00:04:53,495 --> 00:04:59,115 called MSI, and it just stands for the three states that each cache line is gonna 46 00:04:59,115 --> 00:05:04,597 be in. so for these cache coherence protocols, we typically try to track what 47 00:05:04,597 --> 00:05:09,940 state they're in and whether they're shared or not, on a per-cache-line basis. 48 00:05:09,940 --> 00:05:15,606 Not on a per word basis. similarly, it's like how we, you know? We only track the 49 00:05:15,606 --> 00:05:21,553 tag bits and the valid bit and the dirty bits, per a cache line basis. Because it's 50 00:05:21,553 --> 00:05:27,079 just less bookkeeping to keep around. So here we have our cache l-, or our cache 51 00:05:27,079 --> 00:05:32,186 tag. And before, we had, let's say, a valid bit and a dirty bit. And then the, 52 00:05:32,606 --> 00:05:38,413 address portion of the tag. . Or the tag portion of the address, rather. And in 53 00:05:38,413 --> 00:05:43,660 this step, what we're gonna do, is, instead of having a valid bit and a dirty 54 00:05:43,660 --> 00:05:50,242 bit. . We'll use these bits to encode a state. There's a little state machine in 55 00:05:50,242 --> 00:05:56,155 there, practically. And, first off, there's three different states here. 56 00:05:56,155 --> 00:06:02,457 Modified, shared, and . Before we go through the protocol, why don't you just, 57 00:06:02,690 --> 00:06:08,058 say what these three different states, roughly, roughly mean. So, in NSI 58 00:06:08,058 --> 00:06:15,061 protocol, means for that particular line. You don't know anything about it. It's not 59 00:06:15,061 --> 00:06:21,347 in your cache. This is the equivalent. Though it did not be valid you know, so a 60 00:06:21,347 --> 00:06:26,784 caption so about, not, not, not here. Just because one coster is cash as it me 61 00:06:26,784 --> 00:06:32,946 bellasty does not mean a different mustard can't have any different states. So I want 62 00:06:32,946 --> 00:06:39,108 to make the big point here that the states are not vocal. It's not like per cash line 63 00:06:39,108 --> 00:06:45,018 the state is in the, the cash log is in the end state we'll say. No, instead. Each 64 00:06:45,018 --> 00:06:50,269 cache keeps track of a line. And it might say whether it's actually in the cache, 65 00:06:50,269 --> 00:06:55,585 not in the cache, and what state it's in. And it can be different states. Now there 66 00:06:55,585 --> 00:07:01,164 are some cross, products of cases which are just not allowed. So we'll see, In 67 00:07:01,164 --> 00:07:06,612 this protocol, two caches to not have the same cache line and modified statement at 68 00:07:06,612 --> 00:07:11,798 the same time, the modified state. Cuz that would basically mean that two caches 69 00:07:11,798 --> 00:07:16,589 are trying to modify the same piece of data. And that's, that's what we're 70 00:07:16,589 --> 00:07:27,266 basically using the . protocol . Okay? . Okay. So, three of the states here. I is 71 00:07:27,266 --> 00:07:35,026 invalid. It means you don't really know anything about that line. S is shared, 72 00:07:35,026 --> 00:07:44,317 which means that you have a read only copy, or processor one here. Processor one 73 00:07:44,317 --> 00:07:51,975 has a read only copy of that data. And other caches can also have read only 74 00:07:51,975 --> 00:07:58,722 copies. That data. Hence the, the term shared. Because shared. But only in a 75 00:07:59,662 --> 00:08:06,791 sense. So if everyone around here has copies, we're not gonna have any coherence 76 00:08:06,791 --> 00:08:13,058 problems. We start to have problems when multiple people have writable copies. So 77 00:08:13,058 --> 00:08:19,090 in this protocol, we're gonna have this modified state here. Meaning that, that 78 00:08:19,090 --> 00:08:24,731 line is now writable by a particular cache. And we will actually have to 79 00:08:24,731 --> 00:08:31,905 transition to this state. Before we can do any rights. And everyone, as I said 80 00:08:31,905 --> 00:08:40,695 before. If one processor has, or one cache has, a cache state. It's modified state. 81 00:08:40,695 --> 00:08:48,326 Everyone else needs to have an invalid state. Or, have not in the cache. And this 82 00:08:48,326 --> 00:08:56,343 is gonna guarantee us in hardware, that the cache line is effectively, only being 83 00:08:56,343 --> 00:09:03,530 read by one cache or one processor at a time. Okay, so let's, let's look at some 84 00:09:03,530 --> 00:09:10,158 of the, the edges here in this stage transition diagram here. So the first 85 00:09:10,158 --> 00:09:16,786 thing that we're going to look as is cluster one, we'll say starts out with a 86 00:09:16,786 --> 00:09:24,619 line not being in its cache, or turns out with a line being of its cache. So our 87 00:09:24,619 --> 00:09:31,849 first state arrow here is that cluster one wants to read. It doesn't wanna write the 88 00:09:31,849 --> 00:09:37,954 data, so it doesn't read here. And. Invalid before or wasn't in the cache. So 89 00:09:37,954 --> 00:09:43,846 it'll do a read from main memory. And reading the cache and sharing. Now, we 90 00:09:43,846 --> 00:09:53,242 have other trying to read this. And in our protocol. What the other are gonna do is 91 00:09:53,242 --> 00:09:59,851 when they. They're gonna actually issue a request for that line. And if someone else 92 00:09:59,851 --> 00:10:06,221 has something to say about it, they have to pipe up during that, during that time. 93 00:10:06,221 --> 00:10:12,212 And we'll see how that happens. Later. But right now, other processors try to read 94 00:10:12,212 --> 00:10:17,747 the data. Read that, read that line. No one pipes up. And even the first guy, who 95 00:10:17,747 --> 00:10:22,994 has this, this data in the shared state, doesn't say anything. So the other 96 00:10:22,994 --> 00:10:28,960 processor's gonna still read it from main memory. And their transitions effectively 97 00:10:28,960 --> 00:10:34,783 look the same. They're just going to load it into their cache in the shared state. 98 00:10:34,783 --> 00:10:41,396 So pull it in, and share. Okay. So now, we're gonna look at this from a 99 00:10:41,396 --> 00:10:47,703 perspective of P1. If you want, had it in the shared state. And some other processor 100 00:10:47,703 --> 00:10:54,206 wants to do a write. So they do a request for exclusive access of that line. Because 101 00:10:54,206 --> 00:11:00,630 processor one's cache only had it in the shared state. It doesn't need to actually 102 00:11:00,630 --> 00:11:06,819 say anything in response to that request for exclusive access. But what it does 103 00:11:06,819 --> 00:11:13,869 need to do is it needs to self invalidate its own data. So . But instead, what it 104 00:11:13,869 --> 00:11:21,815 has to do is transition. From s to invalid because the data ends' catch is no longer 105 00:11:21,815 --> 00:11:28,478 good because someone else is about to go write it. So before it can write that 106 00:11:28,478 --> 00:11:35,142 other coster issues its intent to write and at that time it actually does its 107 00:11:35,142 --> 00:11:42,318 state transition in its own catch. Okay so let's look at the flip side of that let's 108 00:11:42,318 --> 00:11:49,323 say processor one thinks that it wants to do a write. Well so what processor one's 109 00:11:49,323 --> 00:11:55,908 gonna do is its going to issue this request to write or intent to write 110 00:11:55,908 --> 00:12:01,818 transaction across the bus. And everyone else is gonna do that transition. And 111 00:12:01,818 --> 00:12:07,728 after it does this intent to write transaction here, it's going to transition 112 00:12:07,728 --> 00:12:13,945 from the S state to the M state. And only at that time is it able to do the write, 113 00:12:13,945 --> 00:12:19,854 once it gets into the M state. And it can also do further reads and writes. It 114 00:12:19,854 --> 00:12:27,030 doesn't have to ask anybody. So this is pretty cool . invalidation protocol, you 115 00:12:27,030 --> 00:12:32,081 can do reads and writes. We don't have to broadcast any information across the bus. 116 00:12:32,081 --> 00:12:37,070 It has the modified state. This processor can read the data. It can write the data. 117 00:12:37,255 --> 00:12:42,182 now everyone else can't read or write or do anything to it during this time. But 118 00:12:42,182 --> 00:12:46,555 that's okay, because typically in our shared memory applications, one 119 00:12:46,555 --> 00:12:52,345 processor's doing most of the writing. And sometimes processors can go read it. And 120 00:12:52,345 --> 00:12:57,088 typically that, you know, everyone's not trying to fight for the same piece of 121 00:12:57,088 --> 00:13:03,559 data. Okay. So now let's Look at the case where in the, the modified state and some 122 00:13:03,559 --> 00:13:10,191 other processor wants to write the data. Well, this is a little bit more complex 123 00:13:10,191 --> 00:13:17,075 because now the other processor is going to issue this intent to write across the 124 00:13:17,075 --> 00:13:25,554 bus, a request for to cross the bus, . And I think someone else wants to move to the 125 00:13:25,554 --> 00:13:33,135 modified state, so before they can move to the modified state they need R, . Runs 126 00:13:33,135 --> 00:13:40,166 most up to date data. So what's gonna happen is p1 is actually going right back 127 00:13:40,166 --> 00:13:47,464 to data at that point to main memory i n our, in our basic model and assigned model 128 00:13:47,464 --> 00:13:54,763 here. So get it right back to main memory. And then the other processor is going to 129 00:13:54,763 --> 00:14:01,527 be able to get it from main memory. And transition into the m state from the 130 00:14:01,527 --> 00:14:08,348 invalid state. And we'll, we'll see that arc in a second. You could think about 131 00:14:08,348 --> 00:14:14,530 optimizations here where you try to pass the data from one processor's invalidate. 132 00:14:14,530 --> 00:14:20,862 Or, one processor's , and another processor's read. We're not gonna look at 133 00:14:20,862 --> 00:14:27,572 that quite yet. optimization. But, right now, if you're in the m state, you're 134 00:14:27,572 --> 00:14:33,754 gonna write it back to main memory. And the other processor's gonna delete it from 135 00:14:33,754 --> 00:14:39,860 main memory, in this, in this state. Okay, so this brings us to, another important 136 00:14:39,860 --> 00:14:45,068 transition. Here is that, if you have it modified, and someone else wants to read 137 00:14:45,068 --> 00:14:49,898 the data, you don't actually have to transition to validate it. You can just 138 00:14:49,898 --> 00:14:54,985 transition to the shared state. And you probably want to do this, because you're, 139 00:14:54,985 --> 00:15:00,330 otherwise when some other processor tries to read the data that you just wrote. You 140 00:15:00,330 --> 00:15:05,546 know, like a producer-consumer, style computation. You would just be effectively 141 00:15:05,546 --> 00:15:10,826 invalidating your, processors cache. And that would be pretty bad because, you're 142 00:15:10,826 --> 00:15:16,801 effectively fighting locality at this point. You're fighting. temporal locality. 143 00:15:16,801 --> 00:15:22,292 And so instead when another processor tries to read the data, we need to write 144 00:15:22,292 --> 00:15:27,571 back that data to main memory. Another processor has to go get it from main 145 00:15:27,571 --> 00:15:33,907 memory. But you have the most up to date version. And they only want read access to 146 00:15:33,907 --> 00:15:40,242 it. So you're gonna transition down to the shared state. So someone else has it 147 00:15:40,242 --> 00:15:46,085 readable, and you have it readable also at the same time. And finally we're going to 148 00:15:46,930 --> 00:15:52,128 validate back up to, to M. This edge, and that edge, basically, look around trying 149 00:15:52,128 --> 00:15:58,293 to validate, invalid here. and we're gonna look at. If you have , if you have a cash 150 00:15:58,293 --> 00:16:06,209 line invalid. And you want to go get, want a brightness. You're going to scream on 151 00:16:06,209 --> 00:16:11,919 the bus. I want you to get this . And in the meantime, the other cashiers might 152 00:16:11,919 --> 00:16:17,174 have to do what you had to do in the same cas e, so they may need to go to a write 153 00:16:17,174 --> 00:16:23,940 back and they may need to say I have that right now. . And the next Right. You sort 154 00:16:23,940 --> 00:16:31,606 of wait around a little bit and then you go get it from . And so, do we, do we need 155 00:16:31,606 --> 00:16:39,356 an . yes. You need your own key that for any given line, that the transactions, 156 00:16:39,356 --> 00:16:45,440 the, the transactions in the bus are common. And the way you do that is by 157 00:16:45,440 --> 00:16:52,856 having . So people trying to invite lets say the same line at the same time. So 158 00:16:52,856 --> 00:17:00,424 we'll take two and both have an S. Share data, and, and copies. And exactly the 159 00:17:00,424 --> 00:17:06,254 same time they both wanna do a write. This is why we need to issue this intent to 160 00:17:06,254 --> 00:17:11,626 write. And we're not able to do a write until we successfully get into the M state 161 00:17:11,626 --> 00:17:16,605 here. So what's gonna happen is two processors are gonna try to issue intent 162 00:17:16,605 --> 00:17:21,846 to write. But the arbiter is only allow one of those two to occur bus, on the bus 163 00:17:21,846 --> 00:17:26,955 at the same time. It's just, you know, arbiter says one thing can happen. And the 164 00:17:26,955 --> 00:17:32,209 bus only has, you know, one, one thing can happen at a time. Someone's gonna win. And 165 00:17:32,209 --> 00:17:37,021 that might be based on sort of priority encoding, or it might be round-robin, 166 00:17:37,021 --> 00:17:42,339 depending on how you go about doing that, or make some fairness guarantee. Someone 167 00:17:42,339 --> 00:17:47,405 is going to, one of the two clusters is gonna win. And when that one cluster wins, 168 00:17:47,405 --> 00:17:52,406 they're the one that is able to take the bus and issue this request, inu2014, 169 00:17:52,406 --> 00:17:57,852 intent to write, and everyone else has to play by that. So what's gonna happen is 170 00:17:57,852 --> 00:18:02,917 the, if cluster one wins and sends this intent to write, and cluster two is about 171 00:18:02,917 --> 00:18:08,406 to send the intent to write. Now that it sees someone else's intention to write, it 172 00:18:08,406 --> 00:18:13,843 has to basically abort its intention, attention, attempt to write and instead go 173 00:18:13,843 --> 00:18:19,556 on this transition. And then what they're gonna do is they're gonna retry the write 174 00:18:19,556 --> 00:18:25,131 again in the future. And when they retry the write to the future, they're going to 175 00:18:25,131 --> 00:18:30,706 basically come in on this arc here, which is gonna bump processor one from M to I 176 00:18:30,706 --> 00:18:36,074 after the write has happened. But, typically if you're careful about, forward 177 00:18:36,074 --> 00:18:39,575 progress guarantees, so you have to be. Be very caref ul. 178 00:18:39,578 --> 00:18:44,190 You go to implement these things that, if a cluster goes to do a write and you have 179 00:18:44,190 --> 00:18:48,801 sort of retries happening in the system, and let's say you get the data modified 180 00:18:48,801 --> 00:18:53,301 into your own cache, you have to guarantee that, that lobe that's gonna be a store 181 00:18:53,301 --> 00:18:57,745 operation actually posts. Because a very subtle mistake that's very easy to have 182 00:18:57,745 --> 00:19:02,078 happen here is, you decouple the memory system from the main processor, to some 183 00:19:02,078 --> 00:19:06,578 extent, and two people are trying to do writes at the same time, and you just have 184 00:19:06,578 --> 00:19:10,690 lines basically bounce back and forth between modifying the one cache and 185 00:19:10,690 --> 00:19:15,895 modifying the other cache, modifying. Five one cache five one daily cache is back and 186 00:19:15,895 --> 00:19:20,922 forth forever, but no one's actually able to do it in stores. So you have to make 187 00:19:20,922 --> 00:19:25,573 sure you pull in your cache, atomically though like everything else get it 188 00:19:25,573 --> 00:19:30,726 modified. You are able to post at least something till at least you have a four or 189 00:19:30,726 --> 00:19:37,388 five year guarantee, and then you have a processor . Oh. So yeah. Stick a of the 190 00:19:37,388 --> 00:19:42,604 stage transitions relative to the other stage transitions before that same line 191 00:19:42,604 --> 00:19:47,940 there . Now, for right now we're gonna say the bus only. Has one transaction 192 00:19:47,940 --> 00:19:54,065 happening at a time. And that's arbitrated by, some . So, we're not gonna be able to 193 00:19:54,065 --> 00:19:59,940 see other cache lines transition . But you can think about cases where you can have, 194 00:19:59,940 --> 00:20:05,377 let's say, n transactions happening at the same time. Number of transactions 195 00:20:05,377 --> 00:20:10,502 happening at the same time. But, the arbiter, guarantees that none of the same 196 00:20:10,502 --> 00:20:16,752 address or the same , for instance. And that actually happens in some of these 197 00:20:17,502 --> 00:20:23,741 distributed protocols we'll look at in . In two lectures, we're gonna look at. We 198 00:20:23,741 --> 00:20:29,117 build these for multiple people to basically be transitioning their state 199 00:20:29,117 --> 00:20:34,856 diagrams simultaneously, at the same time, but to different addresses. Okay, hm. 200 00:20:34,856 --> 00:20:40,305 Okay, we're running a little bit low on time here. I think I'm gonnau2014 Oh, 201 00:20:40,305 --> 00:20:46,480 we'll, we'll go through this example, yes. though this is basically reiterating what 202 00:20:46,480 --> 00:20:52,147 we said before. Okay, so we have Processor one, we'll say it has a read. So we're 203 00:20:52,147 --> 00:20:59,363 gonna show two state d iagrams here. For the same cache line. Officer one does a 204 00:20:59,363 --> 00:21:07,520 read, he's gonna get it . So get the data from memory. Officer one is right. He's 205 00:21:07,520 --> 00:21:16,134 going to issue an intent to right, get it, and then in a modified state . Officer two 206 00:21:16,134 --> 00:21:24,291 now must do a read. So the first thing is it's going to try to read this data, and 207 00:21:24,291 --> 00:21:31,018 that's going to cause officer one to transition. Does the request for reading 208 00:21:31,018 --> 00:21:35,942 this data as it's trying to transition to S here. And, and, it's not able to 209 00:21:35,942 --> 00:21:41,198 transition to S until it knows that the other processor has done the following 210 00:21:41,198 --> 00:21:46,189 transition. It is basically going to do a write back to main memory because 211 00:21:46,189 --> 00:21:51,511 Processor two is reading. So it's going to transition from here to there. And now 212 00:21:51,511 --> 00:21:57,034 this modified data is up-to-date in main memory and Processor two is going to read 213 00:21:57,034 --> 00:22:01,891 the updated state. Okay, so now say Processor two tries to write. Well, it's 214 00:22:01,891 --> 00:22:07,908 gonna wanna. Transition from here to m. So issue this intent to write. And at that 215 00:22:07,908 --> 00:22:12,897 point, processor one is going to transition to the down state. And 216 00:22:12,897 --> 00:22:18,807 processor two gets it modifiable in its cache. And it's actually to the right. 217 00:22:18,807 --> 00:22:25,100 Okay. So we're gonna look at processor one now doing a read. It's in the imbalanced 218 00:22:25,100 --> 00:22:31,317 state right now. So it's basically just going to enter in on this arc here. Though 219 00:22:31,317 --> 00:22:38,087 processor two has it in a modified state. So to transition. Here down to I. Or, 220 00:22:38,087 --> 00:22:44,530 excuse me. Actually, there back to S. I guess it's, We're gonna have . So 221 00:22:44,530 --> 00:22:51,050 processor one does the read. Processor two is gonna do the write back. And it's gonna 222 00:22:51,050 --> 00:22:57,105 transition to the shared state here. While, processor one transitions from I 223 00:22:57,105 --> 00:23:02,927 back to S. Processor one does the write. We're sort of back to where we were 224 00:23:02,927 --> 00:23:09,137 before. It's going to issue its intent to write. And the, processor two here is 225 00:23:09,137 --> 00:23:16,165 going to transition. Transition from S to invalid at that point. So this is a basic 226 00:23:16,165 --> 00:23:22,788 NSI protocol, at this point. I think that's the end of this. Yeah. Okay, so now 227 00:23:23,053 --> 00:23:29,588 The last two minutes, we're gonna talk about more advanced cache coherence 228 00:23:29,588 --> 00:23:37,890 protocols. . . Okay. So quick, quick observation here. I think we already said 229 00:23:37,890 --> 00:23:44,690 this, is that, no two processors. This protocol enforces that no two processors. 230 00:23:44,690 --> 00:23:49,593 You can have a line in the m state simultaneously. Two processors can have a 231 00:23:49,593 --> 00:23:54,302 line in the S state simultaneously. Everyone can have a line invalid. It's 232 00:23:54,302 --> 00:24:00,560 just invalid. But we can't, can't . And this really helps us keep coherence of our 233 00:24:00,560 --> 00:24:05,722 memory. Because we're not going to allow any processor to be able to have a line 234 00:24:05,722 --> 00:24:10,883 exclusive. Or, have a line, and another processor be able to read that data in the 235 00:24:10,883 --> 00:24:15,980 meantime. But, be careful. This does not necessarily imply sequential consistency. 236 00:24:15,980 --> 00:24:23,160 This might be an input model, depending on. Sort of, ordering a lot of things that 237 00:24:23,160 --> 00:24:32,925 you're really not interested in. Okay, so, we're gonna look at a more complex kind of 238 00:24:32,925 --> 00:24:42,454 performance protocol. And the, the insight that this protocol is growing out of is 239 00:24:42,454 --> 00:24:52,219 that you don't necessarily need to issue, how do I put this? in all cases, you may 240 00:24:52,219 --> 00:24:58,832 not. You need to tell everybody that you want to do it right. So we'll, we'll, 241 00:24:58,832 --> 00:25:06,272 we'll go through this, But the, the basic idea here is that we can pull in data into 242 00:25:06,272 --> 00:25:13,193 our cache, into this new state here that we're adding. Because we, we still have M, 243 00:25:13,193 --> 00:25:20,374 S, and I, but we add an E state here, which stands for exclusive. Exclusive is 244 00:25:20,374 --> 00:25:27,723 actually a read-only state. But it's a read-only state saying that no one. else 245 00:25:27,723 --> 00:25:32,468 can read it either. And why we add this, is. If, let's say we do a read, or trying 246 00:25:32,468 --> 00:25:37,455 to do a write to that data sometime in the future. We can pull it in, in this state, 247 00:25:37,455 --> 00:25:41,774 and not have to ask anybody. We can guarantee that we don't have to ask 248 00:25:41,774 --> 00:25:46,579 anybody in order to transition to go to a write to that data. So that's the, the 249 00:25:46,579 --> 00:25:52,357 insight. So, same, same idea here. point out that, This, you'll see some sometimes 250 00:25:52,357 --> 00:25:57,527 in books, they'll have, they'll call this the Illinois protocol. It was, developed 251 00:25:57,527 --> 00:26:03,393 at the University of Illinois Champaign. which is why they call it that. or 252 00:26:03,393 --> 00:26:09,139 sometimes people call it . It's probably more widely known now as the MESI 253 00:26:09,139 --> 00:26:14,817 protrocol. It's a four-state protocol. It, and one of the nice things about this is, 254 00:26:14,817 --> 00:26:20,833 if you hav e sort of two state bits, it doesn't side have three states. Well, we 255 00:26:20,833 --> 00:26:26,308 need two bits to encode that anyway. So it doesn't actually add any extra storage 256 00:26:26,308 --> 00:26:31,918 space to have an extra state bit here in, in this protocol. And as I said, this is 257 00:26:31,918 --> 00:26:37,284 effectively the same load of bits we had. Or we get a valid bit and a dirty bit 258 00:26:37,284 --> 00:26:43,019 before we get cache coherence. Well we now have sort of notions of dirty and notions 259 00:26:43,019 --> 00:26:48,481 of invalid. And so these other states are all valid. So we, we can use those exact 260 00:26:48,481 --> 00:26:53,806 same key bits, and we don't need to add any more storage space to our cache to 261 00:26:53,806 --> 00:26:59,063 implement this actual variance protocol. So something to think about. So let's, 262 00:26:59,063 --> 00:27:05,208 let's walk, let's walk through all the different arcs in this diagram. But as I 263 00:27:05,208 --> 00:27:11,488 said the main difference here is we took before we, we, we took what was before, 264 00:27:11,488 --> 00:27:17,847 the shared state, and we split it into two states. A new shared state, and an 265 00:27:17,847 --> 00:27:24,290 exclusive state. Which means that it's a read only copy. But, no one else, will 266 00:27:24,290 --> 00:27:32,176 have a copy of it. sort of guaranteed that no one else will have a copy of it. And, 267 00:27:32,176 --> 00:27:38,959 by default, when we bring in a cache line from main memory, and we do a read this, 268 00:27:38,959 --> 00:27:45,228 in this protocol, we're most likely going to. Bring it into the E state, o let's see 269 00:27:45,228 --> 00:27:51,507 if I can this protocol. We're just gonna bring it into the E state. So if one cache 270 00:27:51,507 --> 00:27:57,519 does a read of . Instead of having it in the shared state. Instead, it's going to 271 00:27:57,519 --> 00:28:02,595 bring it into the exclusive state. So that, that same processor, in the future, 272 00:28:03,397 --> 00:28:09,342 write. It will transition over to this state with no messages on the . So we, we 273 00:28:09,342 --> 00:28:15,554 effectively in a few, a few sub states. And it's pretty common for someone to do a 274 00:28:15,554 --> 00:28:23,620 read . Somewhere in a cache line, and then do a write. . So let's say you're try to 275 00:28:23,620 --> 00:28:29,898 increment that, variable. You're gonna read it, add one to it, and then write 276 00:28:29,898 --> 00:28:36,829 that same value right back. . Okay. So, this is gonna look pretty similar to what 277 00:28:36,829 --> 00:28:43,678 we had before. And we're gonna say that a read miss, can still happen. So, if you're 278 00:28:43,678 --> 00:28:49,793 processer one, and the read miss. And the data is in someone else's cache, we 279 00:28:49,793 --> 00:28:57,086 transition. into the shared state. So when you're reading this, you don't get 280 00:28:57,086 --> 00:29:03,061 exclusive for this, this state. But instead, because since someone else's 281 00:29:03,061 --> 00:29:09,541 cache, they pipe up and say. Oh, this is already shared. Pull it in as shared. So 282 00:29:09,541 --> 00:29:16,105 we're gonna transition from invalid to shared out of read miss. But we're also 283 00:29:16,105 --> 00:29:22,586 gonna have another arc, here, which is a read miss, but it's not shared, it just 284 00:29:22,586 --> 00:29:28,290 came in from main memory. Sort of given that idea that it's likely that we're 285 00:29:28,290 --> 00:29:33,761 actually going to have a read and then a write. And you just pull it in without 286 00:29:33,761 --> 00:29:41,101 making contact, anybody . But otherwise, this actually looks pretty similar to the 287 00:29:41,101 --> 00:29:47,195 MSI that we had before. If it's read shared, you start off with an s here. any 288 00:29:47,195 --> 00:29:52,734 other processor. Someone responded back, and says they had it shared. So they know 289 00:29:52,734 --> 00:29:58,690 not to go straight to the E. . But you're just gonna stay in this, this state. Now. 290 00:29:58,690 --> 00:30:05,272 When, let's say, you have attention right, you're going to probably . Mm. And they're 291 00:30:05,272 --> 00:30:14,059 going to transition valid. So similarly, like, before, only one cache is ever gonna 292 00:30:14,059 --> 00:30:21,031 happen in M. But, unlike before, we're going to say that only one cache can 293 00:30:21,031 --> 00:30:30,104 happen either in M or V. That's what E means, it's exclusive. . So can read and 294 00:30:30,104 --> 00:30:37,840 write in this, in this state. Now, exclusive here. 295 00:30:37,840 --> 00:30:40,980 Exclusive. Processor one can read it, but if anyone 296 00:30:40,980 --> 00:30:47,105 else tried to read it. Oops. Where'd it go? There we go. If other processor tried 297 00:30:47,105 --> 00:30:52,759 to read it, we're going to actually transition from E to S, to our shared 298 00:30:52,759 --> 00:30:59,119 state here. But if we have a cache line E, and we want to do a write, we don't have 299 00:30:59,119 --> 00:31:05,401 to invalidate anybody. We don't have to arbitrate on the bus. Instead we can just 300 00:31:05,401 --> 00:31:11,840 transition from E to M directly and do the write. Because we had exclusive already. 301 00:31:12,151 --> 00:31:19,704 the rest looks pretty similar to NSI. You're unsure, no person wants a fuel 302 00:31:19,704 --> 00:31:27,534 rise, we're in transition from S to I. And likewise, if we were in the exclusive 303 00:31:27,534 --> 00:31:34,386 state of no possible rise in fuel price. We're going to transition the I. And note 304 00:31:34,386 --> 00:31:40,335 the difference between this arc and this arc. So, going to read, we can just 305 00:31:40,335 --> 00:31:44,486 downgrade to shar e. Other processor's going to write, we need 306 00:31:44,486 --> 00:31:53,233 to make it invalid, from these read only exclusive stage. . So, what are we . We 307 00:31:53,233 --> 00:31:59,524 added other processors. Reads, this is the same thing we had down the side here. That 308 00:32:00,433 --> 00:32:06,117 protocol. We had a, we had it modified. And the other processor wants to do a 309 00:32:06,117 --> 00:32:11,726 write. We need to write back the data, and transition to shared. And another 310 00:32:11,726 --> 00:32:16,974 processor wants to do a write. We transitioned from the modified stage to 311 00:32:16,974 --> 00:32:23,062 right back to data. We need to transition to analyses. So, it looks pretty similar 312 00:32:23,062 --> 00:32:32,707 to SI Except we visioned the shares state into two different states. , So far. Okay. 313 00:32:32,707 --> 00:32:40,639 So I'm going to stop here today and just say next time we're gonna dish up, from a 314 00:32:40,639 --> 00:32:47,402 [??]. coming around here and we have another stage [??]. What is actually used 315 00:32:47,402 --> 00:32:54,500 in the called [??]. And this actually allows us to start to use cash transfers. 316 00:32:54,500 --> 00:32:59,971 Without having to go out and read memory to get the data all the time. and then 317 00:32:59,971 --> 00:33:05,309 we'll also talk about what's used in the sort of, standard Intel processor, which 318 00:33:05,309 --> 00:33:11,515 also allows us to . But effectively adds another state which functions the same as 319 00:33:11,515 --> 00:33:16,720 our shared state. But allows us to sort data. Okay, let's stop here for today.