Okay. So let's move up from the physical implementation of buses and look at some more logical problems: sequential consistency mixed together with coherence protocols. In the picture here we have two processors, CPU1 and CPU2, and we're going to introduce caches into our SMP, or symmetric multiprocessor. So we have two caches, and we're looking at address A. Let's see what happens if we have a write-back cache. At the beginning of time, everyone agrees on address A: cache one, cache two, and main memory all say that address A has the value 100. Now let's say CPU1 wants to update A to 200. In a write-back scenario, it just does a store into its own cache line. But because it's write-back, that data doesn't show up anywhere else until the line actually gets written back. Now, if we don't have any sophisticated protocol and CPU2 does a read of address A, all it's going to get is 100, not 200. It gets the wrong, out-of-date value. That's not super great; in fact it easily violates our consistency model, and it can even look like a value rolled backwards in time; the caches are just completely out of sync with everything else. In a write-through cache you can have the same problem with slightly different mechanics: CPU1 does the store, which puts 200 in its own cache and 200 in main memory, but CPU2 never sees that value, because there's nothing telling CPU2 to update its copy. So in our coherence protocol, we have to think about how this works. There are two questions at the bottom of the slide: do these stale values matter, and what is our view of shared-memory programming?
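To make the stale-value problem concrete, here is a minimal Python sketch of the two scenarios. This is a toy model invented for illustration (a real cache tracks whole lines and coherence states in hardware; `ToyCache` and its methods are made-up names):

```python
class ToyCache:
    """Toy model of one CPU's cache over a shared main memory (a dict)."""

    def __init__(self, memory, write_through=False):
        self.memory = memory           # shared backing store: address -> value
        self.lines = {}                # this cache's local copies
        self.write_through = write_through

    def load(self, addr):
        if addr not in self.lines:     # miss: pull the value from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]        # hit: return the (possibly stale) copy

    def store(self, addr, value):
        self.lines[addr] = value       # write-back: the store stops here
        if self.write_through:
            self.memory[addr] = value  # write-through: memory is updated too,
                                       # but no other cache is told anything

# Write-back case: everyone starts out agreeing that A == 100.
memory = {"A": 100}
cache1, cache2 = ToyCache(memory), ToyCache(memory)
cache1.load("A")
cache2.load("A")

cache1.store("A", 200)                 # CPU1's store lands only in cache 1
print(cache2.load("A"))                # -> 100: CPU2 reads a stale value

# Write-through case: main memory gets 200, but cache 2 still holds 100.
memory2 = {"A": 100}
wt1, wt2 = ToyCache(memory2, True), ToyCache(memory2, True)
wt1.load("A")
wt2.load("A")

wt1.store("A", 200)
print(memory2["A"], wt2.load("A"))     # -> 200 100: memory fresh, cache 2 stale
```

Either way, CPU2 keeps reading 100 until some mechanism updates or invalidates its copy, which is exactly the job a coherence protocol takes on.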
On the first question: if you want to communicate from CPU1 to CPU2, at some point something needs a mechanism to actually move that data over into CPU2's cache; otherwise CPU2 will never see the new value. So yes, stale values really do matter. And that brings up an interesting question about our view of shared-memory programming: if you want full sequential consistency, this clearly isn't it. So what does the programmer expect? Do they expect full sequential consistency, or something weaker? Because you could imagine keeping a model like this, but adding special instructions which somehow invalidate CPU2's data so that it picks up the new value, or special instructions to push a store out to main memory. So depending on your programming model, you could end up with a very hard-to-use model in which the programmer has to make the data find its way to where it needs to go. But then you're effectively doing message passing instead of shared memory; you'd effectively be sending a message from CPU1 to CPU2 every time you pushed data back and forth. So something to think about there: if the programmer assumes that when they write a value, the other processors will see it, caches muck that all up. Okay, so let's walk through an example in a little more detail. This is actually the example we had from last class: two threads, T1 and T2. T1 stores 1 to x and 11 to y. T2 loads y and then x, and stores them right off into y' and x'. Let's look at a write-back cache first.
We'll start off with main memory holding initial values of 0 in x and 10 in y, and we execute T1, which writes 1 and 11 to x and y respectively. But it's write-back, so neither store actually has to get out to main memory yet. Now let's say the cache on processor one needs space for the cache line that y is in, but not the one x is in; the y line just happens to get evicted because the cache needs the space for the next piece of code that's going to execute. So y gets pushed out to main memory, as shown in red there, but x in main memory is still 0. Ooh, does anyone see the problem coming? The stores happened in one order, and main memory now effectively reflects them out of order. Okay, now we execute thread two. Thread two reads what's in main memory, pulls it into its own cache, and writes y' and x' into its own cache. So we get 11 and 0 from y and x. Then let's say cache one writes back x, so main memory now has 1 and 11. And finally, cache two writes back x' and y' of 0 and 11. A couple of things to note here. The values in x' and y' don't match any single point in the history of x and y, but more importantly, this is one of our sequential consistency litmus tests; this is exactly the outcome we said last lecture shouldn't happen. So we've just asked whether a machine with naive write-back caches is sequentially consistent, and the answer is no. And just to make the point, this can also happen with write-through caches, although you need to construct the case a bit more carefully. Same example, but because it's write-through, executing T1 actually pushes 1 and 11 up to main memory right away.
But notice an interesting little thing here: cache two already has x in it. Let's just say there was some false sharing, some earlier access that happened to pull that line into cache two early. The write-through pushed the store from cache one up to main memory, but there was no guarantee, no mechanism, to push it down into cache two, or to invalidate it or somehow notify cache two. So that stale value is still sitting in cache two in our basic write-through case. Now thread two executes, and it just copies its x to x' and its y to y', and since it's write-through, those show up in memory as well. As a result, main memory is effectively inconsistent now, and this is again one of the cases we had shown was not sequentially consistent. So without something that guarantees consistency, our hardware can very easily fall out of consistency once we introduce caches, because there are different places stale data can live: it can be in main memory, it can be in someone else's cache, or it can be in your own cache. You can have stale data in multiple places. I want to take a moment at this point in the lecture to differentiate cache coherence from a memory consistency model. There are a lot of words on this slide, but the main difference I'm trying to get across is this: a cache coherence protocol is an algorithm that tries to keep data coherent, so that when you read and write memory, everyone sees values they agree on. It's the protocol doing the enforcing. In contrast, a memory consistency model is the thing the coherence protocol is trying to enforce: an abstract set of rules about how memory should behave. Sequential consistency is one example. Other examples, as we talked about last lecture, are models like total store ordering and partial store ordering. Those are memory consistency models.
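The two failure cases in the walkthrough can be reproduced with the same kind of toy model. Again, this is just an illustrative sketch with made-up names: real hardware evicts whole cache lines, and the `evict` step below stands in for the "cache needed the space" event from the slide:

```python
class ToyCache:
    def __init__(self, memory, write_through=False):
        self.memory = memory                 # shared main memory (a dict)
        self.lines = {}                      # this cache's local copies
        self.write_through = write_through

    def load(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value

    def evict(self, addr):
        self.memory[addr] = self.lines.pop(addr)   # write back a dirty line

# Write-back case: initially x == 0 and y == 10 in main memory.
mem = {"x": 0, "y": 10}
c1, c2 = ToyCache(mem), ToyCache(mem)

c1.store("x", 1)      # T1 on CPU1: both stores land only in cache 1
c1.store("y", 11)
c1.evict("y")         # cache 1 needs y's line, so y reaches memory *first*

yp = c2.load("y")     # T2 on CPU2 loads y, then x, from main memory
xp = c2.load("x")
print(yp, xp)         # -> 11 0: y' == 11 with x' == 0

# Write-through case: memory is updated in order, but cache 2 already
# holds a stale copy of x (pulled in early, e.g. by false sharing).
mem2 = {"x": 0, "y": 10}
w1, w2 = ToyCache(mem2, True), ToyCache(mem2, True)
w2.load("x")          # the stale x == 0 now sits in cache 2

w1.store("x", 1)      # T1: both stores go straight to main memory
w1.store("y", 11)

print(w2.load("y"), w2.load("x"))   # -> 11 0: the same forbidden outcome
```

Under sequential consistency, seeing y == 11 implies the earlier store x = 1 must also be visible, so (y', x') == (11, 0) is exactly the outcome the litmus test rules out.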
And then you actually have different cache protocols which try to, excuse me, implement those respective consistency models. Last lecture I said that I didn't think there were any processors with sequentially consistent semantics, and someone proved me wrong, because there are no absolutes in life: the MIPS R10000 architecture, an out-of-order processor, actually guaranteed sequential consistency. I think also, before x86, some simpler processors probably were sequentially consistent, just because it was kind of easy for them to do. But once you start going out of order, it gets much harder, which is why the R10000 was actually interesting: it was a superscalar, with some limits to it, but it effectively had the ability to replay its loads. So if it found something that was not sequentially consistent, it could effectively go back, restart the program from a much earlier point, and replay the code. Interesting, interesting machine.
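To give a flavor of the replay idea, here is a loose single-threaded sketch, entirely my own invention and greatly simplified: the real R10000 tracks outstanding loads in hardware and squashes on cache-line invalidations, whereas here a callback stands in for a remote store arriving mid-sequence:

```python
def run_loads_with_replay(memory, addrs, interference=None, max_replays=10):
    """Speculatively load `addrs` in order. If the values we read no longer
    match memory at 'commit' time (a remote store slipped in), squash the
    results and replay the whole sequence from the start."""
    fired = False
    for _ in range(max_replays):
        seen = {}
        for i, addr in enumerate(addrs):
            seen[addr] = memory[addr]
            if interference and not fired and i == 0:
                interference(memory)     # remote stores land mid-sequence
                fired = True
        # Commit check: all loads must look as if they happened atomically.
        if all(memory[a] == v for a, v in seen.items()):
            return seen                  # consistent snapshot: commit it
        # Otherwise fall through and replay.
    raise RuntimeError("too many replays")

memory = {"x": 0, "y": 10}

def remote_store(mem):                   # another CPU stores x = 1, y = 11
    mem["x"] = 1
    mem["y"] = 11

result = run_loads_with_replay(memory, ["x", "y"], interference=remote_store)
print(result)   # -> {'x': 1, 'y': 11}: the first pass saw x == 0 but
                #    y == 11, failed the check, and was replayed
```

Without the replay, this run would have committed the inconsistent snapshot x == 0, y == 11.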