Okay, so let's move up from the physical implementation of buses, and move to some more logical problems: sequential consistency mixed together with coherence protocols. So we're going to look at them all here. We have two processors, CPU1 and CPU2, and we are going to introduce caches into our SMP, or Symmetric Multiprocessor. So we have two caches here, and we're looking at address A. Let's see what happens if we have a write-back cache. At the beginning of time, everyone has the value 100 for address A; so cache one, cache two, and main memory all agree that address A has value 100. Okay, so now let's say CPU1 wants to update A to 200. In a write-back scenario, it's just going to do a store into its own cache. But it's write-back, so this data doesn't show up anywhere else until the line actually gets written back. Now, all of a sudden, if we don't have any sophisticated protocols and CPU2 does a read of address A, all it's going to get is 100, and not 200. It's going to get the wrong, out-of-date value. Well, that's not super great; in fact, that easily violates a bunch of consistency rules, and you're just completely out of sync relative to everything.
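The write-back staleness problem just described can be sketched in a few lines. This is a toy model added for illustration only; the `Cache` class and its methods are made up, not any real API.

```python
# Toy model of the write-back staleness problem described above.
# All names here are illustrative.

class Cache:
    def __init__(self, memory):
        self.memory = memory   # shared backing store
        self.lines = {}        # address -> value held locally

    def read(self, addr):
        # On a miss, fill from main memory; on a hit, use the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write-back: update only the local line; memory is untouched
        # until the line is explicitly written back (e.g., on eviction).
        self.lines[addr] = value

    def write_back(self, addr):
        self.memory[addr] = self.lines[addr]

memory = {"A": 100}
cpu1, cpu2 = Cache(memory), Cache(memory)

cpu1.read("A")          # both CPUs start out agreeing that A == 100
cpu2.read("A")
cpu1.write("A", 200)    # CPU1 stores 200, but only into its own cache

print(cpu2.read("A"))   # prints 100: CPU2 sees the stale value
print(memory["A"])      # prints 100: main memory is stale too
```

Even after CPU1 eventually writes the line back, CPU2 keeps hitting its own stale copy until something invalidates it.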
In a write-through cache you can have the same problem, just with slightly different mechanics. So let's say you have a write-through cache: CPU1 does the store, and it puts 200 in its own cache and 200 in main memory. And CPU2 doesn't ever see that value, because there is nothing to tell CPU2 to update its copy. So in our coherence protocol, we have to think about how this works. We have two questions at the bottom of the slide here. Do these stale values matter? Well, if you want to communicate from CPU1 to CPU2, at some point something needs to have some mechanism to actually move that data over into CPU2's cache; otherwise it'll never see the new value. So stale values really, really do matter. And what is our view of shared-memory programming? This brings up an interesting question. If you want full sequential consistency, this clearly doesn't give it to you. Now, one of the questions that comes up is: what does the programmer expect? Do they expect full sequential consistency, or do they expect something weaker? Because you can think about having a model like this, but then having special instructions which somehow invalidate CPU2's data, such that it will pick up the new value. Or maybe a special instruction to push a store out to main memory.
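The write-through variant can be modeled the same way. Again, this is just an illustrative sketch with made-up names, not a real cache implementation.

```python
# Toy model of the write-through variant: the store reaches main memory
# immediately, but nothing invalidates the copy in the other cache.
# All names here are illustrative.

class WriteThroughCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]   # fill on a miss
        return self.lines[addr]                    # hits never re-check memory

    def write(self, addr, value):
        # Write-through: update the local line AND main memory...
        self.lines[addr] = value
        self.memory[addr] = value
        # ...but no invalidation is sent to any other cache.

memory = {"A": 100}
cpu1, cpu2 = WriteThroughCache(memory), WriteThroughCache(memory)

cpu2.read("A")          # CPU2 caches A == 100
cpu1.write("A", 200)    # main memory now holds 200

print(memory["A"])      # prints 200
print(cpu2.read("A"))   # prints 100: CPU2 still hits its stale line
```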
So depending on your programming model, you could think about having a very hard-to-use programming model which would require the programmer to make data find where it needs to go. But then you're effectively just doing message passing instead of shared memory; you'd effectively be sending a message from CPU1 to CPU2 if you were trying to push the data back and forth. So a little something to think about there: if the programmer assumes that when they write a value, the other processors will see it, caches muck that all up. Okay, so let's walk through an example in a little more detail. This is actually the example we had from last class. There are two threads, T1 and T2. T1 stores one to X and then stores eleven to Y. T2 loads X and Y, and then stores what it loaded into X' and Y'. Okay, so let's look at a write-back cache. We'll start off with memory here having initial values of zero and ten in X and Y, and we execute T1, which writes one and eleven to X and Y respectively. But it's write-back, so this doesn't actually have to get out to main memory. Now let's say the cache in processor one needs some space for the cache line that Y is in, but not the one that X is in.
That line just happens to get evicted; the cache needs the space for the next piece of code that's going to execute. So what's going to happen here is that Y gets pushed out to main memory, as we've shown in red there, but X is still zero. Ooh, does anyone see problems coming up here? We had stores that were in program order, and main memory has effectively seen them out of order. Okay, now we go and execute thread two. Thread two is going to read what is in main memory, pull it into its cache, and then write X' and Y' into its own cache. So we get eleven and zero from Y and X. And then, we'll say, cache one writes back X, so main memory now has one and eleven. And then finally cache two writes back X' and Y' of zero and eleven. So a couple of things to note here. The X and Y values reflected in X' and Y' don't match, but more importantly, this is one of our sequentially inconsistent test cases; this is the outcome we said shouldn't happen. So we've just tested whether a naive system with write-back caches preserves sequential consistency, and the answer is no, it's not sequentially consistent. This, just to make a point here, can also happen with write-through caches.
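The write-back walkthrough above can be replayed as a small script. The eviction order, with Y's line written back before X's, is what produces the forbidden result; all names here are illustrative.

```python
# Replaying the write-back interleaving from the walkthrough above.
# Names are illustrative; caches are plain dicts.

memory = {"X": 0, "Y": 10, "Xp": None, "Yp": None}
cache1, cache2 = {}, {}

# T1 executes: both stores land only in cache 1 (write-back).
cache1["X"], cache1["Y"] = 1, 11

# Cache 1 needs room and happens to evict only Y's line.
memory["Y"] = cache1.pop("Y")          # memory: X=0, Y=11 -- out of order!

# T2 executes: it misses, reads main memory, keeps results in cache 2.
cache2["Xp"] = memory["X"]             # Xp = 0
cache2["Yp"] = memory["Y"]             # Yp = 11

# Eventually both caches write back everything they hold.
memory["X"] = cache1.pop("X")
memory.update(cache2)

print(memory["Xp"], memory["Yp"])      # prints: 0 11
```

Under sequential consistency, T2 observing Y == 11 would imply it also observes X == 1, so (X', Y') == (0, 11) is exactly the outcome that shouldn't happen.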
You need to carefully construct a case where this happens, but you get the same outcome. Because it's write-through, when we execute T1 it actually pushes one and eleven up to main memory. What you'll notice is an interesting little thing here: cache two already has X in it. Let's just say there was some false sharing, or some load that happened, that pulled it into the cache too early. The write-through pushed from cache one to main memory, but there was no guarantee, nothing to invalidate or somehow notify cache two. So the stale value is still sitting in cache two in our basic write-through case. Now thread two executes, and it just copies its X to X' and its Y to Y'. And it's write-through, so that shows up in memory here also. Main memory is effectively inconsistent now, and this is also one of the cases that we had shown was not sequentially consistent. So without something to guarantee that we have consistency, our hardware can very easily fall out of consistency when we introduce caches, because there are different places to effectively store stale data: it can be in main memory, it can be in someone else's cache, or it can be in your own cache. So you can have stale data in multiple places.
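That write-through case can be replayed the same way, assuming a stale copy of X was already pulled into cache two before T1 ran. Again, this is a toy sketch with made-up names.

```python
# Replaying the write-through case: T1's stores reach main memory
# immediately, but cache 2 already holds a stale copy of X.
# All names are illustrative.

memory = {"X": 0, "Y": 10, "Xp": None, "Yp": None}
cache2 = {"X": 0}                      # stale line, filled too early

# T1 executes with a write-through cache: memory is updated right away.
memory["X"], memory["Y"] = 1, 11

# T2 executes: X hits the stale line in cache 2, Y misses to memory.
xp = cache2["X"]                       # 0, stale
yp = memory["Y"]                       # 11
memory["Xp"], memory["Yp"] = xp, yp    # write-through: results reach memory

print(memory["Xp"], memory["Yp"])      # prints: 0 11 -- forbidden under SC
```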
At this point in the lecture, I want to differentiate cache coherence from a memory consistency model. There are a lot of words on this slide, but the main difference I'm trying to get across is this: a cache coherence protocol is some algorithm that tries to keep data coherent, so that when you write memory, everyone sees something they agree on. It's the protocol that's trying to keep it coherent. In contrast, a memory consistency model is the thing that our coherence protocol is trying to enforce. It is an abstract set of rules about how memory should act. Sequential consistency is one example of that. Other examples, as we talked about last lecture, are total store ordering and processor store ordering. So those are memory consistency models, and then you have different cache coherence protocols which try to implement those respective consistency models. Last lecture, I said that I didn't think there were any processors with sequentially consistent semantics, and someone proved me wrong, because there are no absolutes in life: the MIPS R10000, an out-of-order processor, implements a sequentially consistent memory consistency model. I think some simpler, earlier processors probably were sequentially consistent too, just because it was kind of easy for them to do.
But once you start to execute out of order, it gets much harder, which is why the R10000 was actually interesting. It was a superscalar, which had some limits to it, but they effectively had the ability to replay their loads. And so if they found something that was not supposed to be consistent, they could effectively go back, restart one of the programs much earlier, and replay all the code. An interesting, interesting design.
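The invalidation mechanism that a bus-based coherence protocol uses to fix the stale-read examples above can be sketched roughly like this. It is a deliberate simplification for illustration, not any specific real protocol such as MESI, and all names are made up.

```python
# A minimal sketch of the invalidation idea a coherence protocol uses:
# every cache "snoops" writes on the shared bus and drops its own copy
# of the written line. Illustrative only; not a real protocol.

class SnoopingCache:
    peers = []                          # all caches on the shared bus

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        SnoopingCache.peers.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Broadcast an invalidate so no peer keeps a stale copy.
        for peer in SnoopingCache.peers:
            if peer is not self:
                peer.lines.pop(addr, None)
        self.lines[addr] = value
        self.memory[addr] = value       # write-through for simplicity

memory = {"A": 100}
cpu1, cpu2 = SnoopingCache(memory), SnoopingCache(memory)

cpu2.read("A")          # CPU2 caches A == 100
cpu1.write("A", 200)    # invalidates CPU2's copy, updates memory

print(cpu2.read("A"))   # prints 200: the stale-read problem is gone
```

With the invalidate broadcast, CPU2's next read misses and refetches the new value, which is exactly the missing mechanism the earlier examples pointed at.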