So, we're gonna look at two ways to go about implementing snoopy cache coherence: update protocols and invalidation protocols. And we're gonna be focusing today's lecture on what are called write-invalidate protocols; they're more common today. But I did want to bring up write-update, or what are called write-broadcast, protocols. In some sense these two kind of mirror something like a write-through cache versus a write-back cache. But it's not the same thing; you can actually mix and match write-back and write-through with these different ideas here. But let's look at what these two ideas in the snoopy protocols are. So in a write-update protocol, whenever a processor does a write, it broadcasts it across the bus to every other processor. And every other processor that has that data in its cache just updates the data in place, because the broadcast carries the data, so everyone's copy stays up to date. Invalidation, on the other hand, we're gonna look at next: when a write occurs, it'll actually invalidate the copies in the other caches. Okay, so take these two cases in an update protocol. The first case is the processor takes a write miss. Well, it's gonna broadcast the data onto the bus and update everybody. So it's just sort of writing through, but not just writing through to main memory; it's also writing through to everyone else's cache. What's nice is, when someone goes to do a read and takes a read miss, you just go out to main memory, and you're guaranteed that main memory always has the most up-to-date version. And if the line was in your cache before, other cores' broadcasts will have updated your value, so whatever's in your cache is guaranteed to be up to date too in these update protocols. That's not the case with invalidation protocols. So, for our write-invalidation protocols, let's say there's a write miss.
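To make that contrast concrete, here's a toy sketch in Python. This is purely illustrative: the dictionaries stand in for caches, and the function names `write_update` and `write_invalidate` are made up for this example, not real hardware interfaces.

```python
# Toy contrast between the two snoopy policies. A write-update protocol
# broadcasts the new value into every peer cache that holds the line;
# a write-invalidate protocol kills the peer copies instead.

def write_update(writer, peers, addr, value):
    writer[addr] = value
    for cache in peers:
        if addr in cache:          # peer already caches the line: update in place
            cache[addr] = value

def write_invalidate(writer, peers, addr, value):
    for cache in peers:
        cache.pop(addr, None)      # peer copies become invalid
    writer[addr] = value           # only the writer now holds the new value

# Two caches both holding address 0x10.
a, b = {0x10: 1}, {0x10: 1}
write_update(a, [b], 0x10, 2)      # b's copy is refreshed in place
a, b = {0x10: 1}, {0x10: 1}
write_invalidate(a, [b], 0x10, 2)  # b must re-fetch on its next read
```

The tradeoff this exposes: update spends bus bandwidth on every store, while invalidate spends a miss on the next read by a peer.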
Well, on a write miss, you actually go and invalidate all the other caches' copies of it before you can go get the data. So if another cache had the data dirty, it's going to cough up the data and put it back in main memory, and then you're gonna go off to main memory and get the data from there, in our naive invalidation protocol. So the first step is, if you want to do a write and it's not in your cache, or you don't have write access to it, you're going to yell out on the bus, "I want to write to this address." The other caches are going to snoop that. If someone responds and says, "I have that data dirty," you have to wait for them to put it back in main memory, and then once it's in main memory you can go get that data. So when you yell, "Does anyone have it?", the data's not in your cache, but it might be in someone else's cache. Someone might pipe up and say, "I have that data, and I have it dirty, so I'm going to write it back to main memory, and then you can go get it from main memory." But there are other cases that can happen in these invalidation protocols. Someone else might have that data, but they might not have a dirty copy of it, so they might not pipe up; they might not say, "I have a copy of it," and instead you can just get that data straight from main memory. So, let's go through a couple different cases, and a couple different protocols that drop out of this. The first one we're gonna look at is called MSI, and it just stands for the three states that each cache line is gonna be in. For these cache coherence protocols, we typically try to track what state lines are in, and whether they're shared or not, on a per-cache-line basis, not on a per-word basis. Similarly, it's like how we only track the tag bits and the valid bit and the dirty bit on a per-cache-line basis.
Because it's just less bookkeeping to keep around. So here we have our cache tag. And before, we had, let's say, a valid bit and a dirty bit, and then the tag portion of the address. And in this setup, what we're gonna do is, instead of having a valid bit and a dirty bit, we'll use those bits to encode a state. There's a little state machine in there, practically. And there are three different states here: modified, shared, and invalid. Before we go through the protocol, why don't we just say what these three states roughly mean. So, in the MSI protocol, invalid means that, for that particular line, you don't know anything about it; it's not in your cache. This is the equivalent of the valid bit not being set. And just because one processor's cache has a line invalid does not mean a different processor's cache can't have it in a different state. So I want to make the big point here that the states are not global. It's not like, per cache line, there's one global state that the line is in. No; instead, each cache keeps track of a line, and it knows whether the line is actually in that cache or not, and what state it's in there. And different caches can have the same line in different states. Now, there are some cross products of cases which are just not allowed. So we'll see, in this protocol, two caches cannot have the same cache line in the modified state at the same time, 'cause that would basically mean that two caches are trying to modify the same piece of data, and that's exactly what we're using the protocol to prevent. Okay. So, the three states here: I is invalid; it means you don't really know anything about that line. S is shared, which means that you have a read-only copy. Processor one here has a read-only copy of that data, and other caches can also have read-only copies of that data. Hence the term "shared."
It's shared, but only for reading. So if everyone only has read-only copies, we're not gonna have any coherence problems. We start to have problems when multiple people have writable copies. So in this protocol, we're gonna have this modified state here, meaning that that line is now writable by a particular cache. And we will actually have to transition to this state before we can do any writes. And, as I said before, if one cache has a line in the modified state, everyone else needs to have it in the invalid state, or just not in their cache at all. And this is gonna guarantee us, in hardware, that the cache line is effectively only being written by one cache, one processor, at a time. Okay, so let's look at some of the edges here in this state transition diagram. So the first thing we're gonna look at is, processor one, we'll say, starts out with a line not being in its cache. So our first state arrow here is that processor one wants to read. It doesn't wanna write the data, so it just does a read here, and the line was invalid before, or wasn't in the cache. So it'll do a read from main memory, pulling the line into the cache in the shared state. Now, we have other processors trying to read this. And in our protocol, what the other processors are gonna do when they read is actually issue a request for that line on the bus. And if someone else has something to say about it, they have to pipe up during that time; we'll see how that happens later. But right now, other processors try to read that line, and no one pipes up. Even the first guy, who has this data in the shared state, doesn't say anything. So the other processors are still gonna read it from main memory, and their transitions effectively look the same: they just load the line into their caches in the shared state. So pull it in, and it's shared. Okay. So now we're gonna look at this from the perspective of P1.
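The rule just described can be stated as an invariant over all the caches' states for one line. Here's a small sketch of that check in Python (the function name `coherent` is just for illustration):

```python
# Per-line MSI invariant: at most one cache may hold a line in M, and if
# any cache holds it in M, every other cache must hold it in I. Mixes
# like M+S (a writer alongside a reader) or M+M are never allowed.

def coherent(states):
    if states.count("M") > 1:
        return False                  # two simultaneous writers: forbidden
    if "M" in states and "S" in states:
        return False                  # a writer while someone can still read
    return True

# Legal combinations: one writer with everyone else invalid, or any
# number of read-only sharers, or everyone invalid.
ok = coherent(["M", "I", "I"]) and coherent(["S", "S", "I"])
bad = coherent(["M", "S", "I"])
```

The whole point of the bus transactions in the rest of the lecture is to maintain exactly this invariant in hardware.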
P1, say, had it in the shared state, and some other processor wants to do a write. So they do a request for exclusive access to that line. Because processor one's cache only had it in the shared state, it doesn't need to actually say anything in response to that request for exclusive access. But what it does need to do is self-invalidate its own data. So it has to transition from S to invalid, because the data in its cache is no longer good; someone else is about to go write it. So before that other processor can write, it issues its intent to write, and at that time processor one does this state transition in its own cache. Okay, so let's look at the flip side of that. Let's say processor one decides that it wants to do a write. Well, what processor one's gonna do is issue this request-to-write, or intent-to-write, transaction across the bus. And everyone else is gonna do that S-to-invalid transition. And after it does this intent-to-write transaction here, it's going to transition from the S state to the M state. And only at that time is it able to do the write, once it gets into the M state. And it can then also do further reads and writes without having to ask anybody. So this is pretty cool about the invalidation protocol: once it has the modified state, this processor can read the data and write the data, and we don't have to broadcast any information across the bus. Now, everyone else can't read or write or do anything to that line during this time. But that's okay, because typically in our shared memory applications, one processor's doing most of the writing, and sometimes other processors go read it. And typically everyone's not trying to fight for the same piece of data. Okay. So now let's look at the case where we're in the modified state and some other processor wants to write the data.
Well, this is a little bit more complex, because now the other processor is going to issue this intent to write across the bus, a request for exclusive access across the bus. And that means someone else wants to move to the modified state; but before they can move to the modified state, they need the most up-to-date data. So what's gonna happen is, P1 is actually going to write back the data at that point to main memory, in our basic MSI model here. So it writes it back to main memory, and then the other processor is going to be able to get it from main memory and transition into the M state from the invalid state. And we'll see that arc in a second. You could think about optimizations here, where you try to pass the data directly from one processor's write-back to the other processor's read; we're not gonna look at that optimization quite yet. Right now, if you're in the M state, you're gonna write it back to main memory, and the other processor's gonna read it from main memory. Okay, so this brings us to another important transition here: if you have it modified, and someone else wants to read the data, you don't actually have to transition to invalid. You can just transition to the shared state. And you probably want to do this, because otherwise, when some other processor tries to read the data that you just wrote, like in a producer-consumer style computation, you would just be effectively invalidating your own processor's cache. And that would be pretty bad, because you're effectively fighting locality at this point; you're fighting temporal locality. So instead, when another processor tries to read the data, we need to write back that data to main memory, and the other processor has to go get it from main memory. But you still have the most up-to-date version, and they only want read access to it, so you're gonna transition down to the shared state.
So someone else has it readable, and you have it readable also, at the same time. And finally, we have the edges going from invalid up to M. This edge and that edge basically go straight from invalid to modified. So we're gonna look at: if you have a cache line invalid, and you want to go do a write to it, you're going to shout on the bus, "I want exclusive access to this line." And in the meantime, the other caches might have to do the same things we saw before: they may need to do a write-back, and they may need to say, "I have that line dirty right now." So you sort of wait around a little bit, and then you go get it from main memory, and transition straight from I to M. And so, do we need an arbiter? Yes. You need to guarantee that, for any given line, the transactions on the bus are ordered, and the way you do that is by having an arbiter. So say two processors are trying to write, let's say, the same line at the same time. We'll take two caches that both have the data in S, shared, read-only copies, and at exactly the same time they both wanna do a write. This is why we need to issue this intent to write; we're not able to do a write until we successfully get into the M state here. So what's gonna happen is, two processors are gonna try to issue intent to write. But the arbiter is only going to allow one of those two to occur on the bus at a time. The arbiter says one thing can happen, and the bus only has one transaction happening at a time. Someone's gonna win. And that might be based on sort of priority encoding, or it might be round-robin, depending on how you go about doing that, or some fairness guarantee. One of the two processors is gonna win. And when that one processor wins, it's the one that's able to take the bus and issue this intent to write, and everyone else has to play by that.
So what's gonna happen is, if processor one wins and sends this intent to write, and processor two was about to send its own intent to write, then now that it sees someone else's intention to write, it has to basically abort its attempt to write and instead take the S-to-invalid transition. And then what it's gonna do is retry the write again in the future. And when it retries the write in the future, it's going to basically come in on this I-to-M arc here, which is gonna bump processor one from M to I after processor one's write has happened. But you have to be very careful about forward progress guarantees when you go to implement these things. If a processor goes to do a write, and you have sort of retries happening in the system, and let's say you get the data modified into your own cache, you have to guarantee that the pending store operation actually posts. Because a very subtle mistake that's very easy to have happen here is, you decouple the memory system from the main processor to some extent, and two people are trying to do writes at the same time, and you just have lines basically bouncing back and forth, modified in one cache and then modified in the other cache, back and forth forever, but no one's actually able to do its stores. So you have to make sure that when you pull the line into your cache and get it modified, you're able to post at least one store, atomically with acquiring the line, so that you have a forward progress guarantee. Oh, yes, the question is about the ordering of one line's state transitions relative to other state transitions on that same line. For right now, we're gonna say the bus only has one transaction happening at a time, and that's arbitrated by some arbiter. So we're not gonna see other cache lines transitioning concurrently.
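That race and retry can be sketched in a few lines of Python. This is purely illustrative: the state dictionary and the `intent_to_write` function are made-up stand-ins, and "posting a store" here is just appending to a list.

```python
# Both caches start with the line shared, and both want to write.
states = {"P1": "S", "P2": "S"}
stores_posted = []

def intent_to_write(winner):
    """The arbiter has granted `winner` the bus: all snoopers invalidate
    their copies, the winner takes M, and the winner's pending store
    posts before it can lose the line again (forward progress)."""
    for p in states:
        if p != winner:
            states[p] = "I"        # snoopers self-invalidate (write back if M)
    states[winner] = "M"
    stores_posted.append(winner)   # the store posts while the line is held in M

# The arbiter serializes the two simultaneous requests: P1 goes first.
intent_to_write("P1")   # P2 aborts its attempt and goes S -> I
intent_to_write("P2")   # P2 retries on the I -> M arc, bumping P1 from M to I
```

The key line is that the store is posted inside the same step that acquires M; if the retry loop could steal the line before the store posted, two writers could ping-pong the line forever with neither making progress.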
But you can think about cases where you can have, let's say, n transactions happening at the same time, some number of transactions happening at the same time, where the arbiter guarantees that no two of them are to the same address, or the same cache line, for instance. And that actually happens in some of these distributed protocols we'll look at in two lectures. We build those so that multiple caches can basically be transitioning their state diagrams simultaneously, at the same time, but on different addresses. Okay. We're running a little bit low on time here. I think I'm gonna... oh, we'll go through this example, yes, though this is basically reiterating what we said before. Okay, so we have processor one, and we'll say it does a read. We're gonna show two state diagrams here, for the same cache line. Processor one does a read: it's gonna get the data from memory and end up in the shared state. Processor one then does a write: it's going to issue an intent to write and end up in the modified state. Processor two now wants to do a read. So the first thing is, it's going to try to read this data, and that's going to cause processor one to transition. Processor two does the request for reading this data as it's trying to transition to S here. And it's not able to transition to S until it knows that the other processor has done the following transition: processor one is basically going to do a write-back to main memory, because processor two is reading, so it's going to transition from M down to S. And now this modified data is up to date in main memory, and processor two reads the updated data. Okay, so now say processor two tries to write. Well, it's gonna wanna transition from S to M. So it issues this intent to write, and at that point, processor one is going to transition down to the invalid state, and processor two gets it modifiable in its cache, and it's actually able to do the write. Okay. So we're gonna look at processor one now doing a read. It's in the invalid state right now.
So it's basically just going to come in on this arc here. But processor two has it in a modified state, so processor two has to transition, not down to I, but, excuse me, back to S. So processor one does the read, processor two does the write-back and transitions to the shared state here, while processor one transitions from I back to S. Then processor one does the write. We're sort of back to where we were before: it's going to issue its intent to write, and processor two here is going to transition from S to invalid at that point. So this is the basic MSI protocol. I think that's the end of this example. Okay, so now, in the last few minutes, we're gonna talk about more advanced cache coherence protocols. Okay. So, a quick observation here, and I think we already said this: this protocol enforces that no two processors can have a line in the M state simultaneously. Two processors can have a line in the S state simultaneously. And everyone can have a line invalid; it's just invalid. But we can't mix M with anything else. And this really helps us keep coherence of our memory, because we're never going to allow one processor to have a line exclusively writable while another processor is able to read that data in the meantime. But be careful: this does not necessarily imply sequential consistency. The memory model you end up with depends on the ordering of a lot of other things beyond coherence. Okay, so, we're gonna look at a more complex, higher-performance protocol. And the insight that this protocol grows out of is that you don't necessarily need to tell everybody that you want to do a write, in all cases. So we'll go through this, but the basic idea here is that we can pull data into our cache in this new state that we're adding.
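The whole example sequence above can be replayed with a small state-machine sketch. This is illustrative Python, not hardware: the class and function names are mine, and main memory and the bus itself are left implicit.

```python
# Minimal two-cache MSI model for a single line, tracking only the state.
M, S, I = "M", "S", "I"

class MSILine:
    """One cache's state for a single cache line under basic MSI."""
    def __init__(self):
        self.state = I

def read(requester, others):
    if requester.state == I:          # read miss: request the line on the bus
        for c in others:
            if c.state == M:
                c.state = S           # owner writes back, downgrades to S
        requester.state = S           # load from (now up-to-date) memory

def write(requester, others):
    if requester.state != M:          # must reach M before any store
        for c in others:              # intent-to-write snooped by everyone:
            if c.state == M:
                c.state = I           # owner writes back and invalidates
            elif c.state == S:
                c.state = I           # sharers self-invalidate
        requester.state = M

# Replay of the lecture's example sequence:
p1, p2 = MSILine(), MSILine()
read(p1, [p2])      # P1: I -> S
write(p1, [p2])     # P1: S -> M
read(p2, [p1])      # P1 writes back, M -> S;  P2: I -> S
write(p2, [p1])     # P1: S -> I;              P2: S -> M
read(p1, [p2])      # P2 writes back, M -> S;  P1: I -> S
write(p1, [p2])     # P2: S -> I;              P1: S -> M
```

At every step, the coherence invariant holds: never two M copies, and never an M copy alongside an S copy.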
We still have M, S, and I, but we add an E state here, which stands for exclusive. Exclusive is actually a read-only state, but it's a read-only state saying that no one else has a copy of it either. And why we add this is: if, let's say, we do a read and then try to do a write to that data sometime in the future, we can pull it in in this state and not have to ask anybody again. We can guarantee that we don't have to ask anybody in order to transition to do a write to that data. So that's the insight. Same idea here, and I'll point out that you'll sometimes see in books that they call this the Illinois protocol; it was developed at the University of Illinois at Urbana-Champaign, which is why they call it that. It's probably more widely known now as the MESI protocol. It's a four-state protocol. And one of the nice things about this is, we already had two state bits even though we only had three states; we needed two bits to encode three states anyway. So it doesn't actually add any extra storage space to have an extra state in this protocol. And as I said, this is effectively the same number of bits we had with a valid bit and a dirty bit, before we had cache coherence. We still have sort of notions of dirty and notions of invalid, and the other states are all valid states. So we can use those exact same two bits, and we don't need to add any more storage space to our cache to implement this coherence protocol. So, something to think about. So let's walk through all the different arcs in this diagram. But as I said, the main difference here is we took what was before the shared state, and we split it into two states: a new shared state, and an exclusive state, which means that it's a read-only copy, but no one else has a copy of it; you're guaranteed that no one else has a copy of it.
And, by default, when we bring in a cache line from main memory on a read, in this protocol we're most likely going to bring it into the E state. So if one cache does a read miss and no one else has the line, instead of putting it in the shared state, it's going to bring it into the exclusive state. So that same processor, if in the future it does a write, will transition over to the M state with no messages on the bus. So we've effectively added a sub-state. And it's pretty common for someone to do a read somewhere in a cache line and then do a write. Let's say you're trying to increment a variable: you're gonna read it, add one to it, and then write that same value right back. Okay. So, this is gonna look pretty similar to what we had before. And we're gonna say that a read miss can still happen where the data is in someone else's cache; then we transition into the shared state. So when you're reading in that case, you don't get it exclusive, because it's in someone else's cache; they pipe up and say, "Oh, this is already shared; pull it in as shared." So we're gonna transition from invalid to shared on that read miss. But we're also gonna have another arc here, which is a read miss where it's not shared; it just came in from main memory. That serves the idea that it's likely we're actually going to have a read and then a write, and we can pull it in without having to contact anybody later. But otherwise, this actually looks pretty similar to the MSI protocol we had before. If the read was shared, you start off with an S here; some other processor responded back and said they had it shared, so you know not to go straight to E. You just stay in this shared state. Now, when, let's say, some other processor signals intent to write, you're going to transition to invalid.
So, similarly to before, only one cache is ever gonna have a line in M. But unlike before, we're going to say that only one cache can have a line in either M or E. That's what E means: it's exclusive. So that cache can read in this state, and can upgrade to write. Now, in exclusive here, processor one can read it, but if any other processor tries to read it, we're going to actually transition from E to S, to our shared state here. But if we have a cache line in E and we want to do a write, we don't have to invalidate anybody, and we don't have to arbitrate on the bus. Instead, we can just transition from E to M directly and do the write, because we had it exclusive already. The rest looks pretty similar to MSI. If you're in shared and another processor wants to do a write, we transition from S to I. And likewise, if we're in the exclusive state and another processor wants to do a write, we're going to transition to I. And note the difference between this arc and that arc: if the other processor's going to read, we can just downgrade to shared; if the other processor's going to write, we need to make it invalid, from these read-only exclusive states. So, what have we got left? Other processors' reads and writes from M: this is the same thing we had down the side here in the MSI protocol. We had it modified, and another processor wants to do a read: we need to write back the data and transition to shared. And if another processor wants to do a write, we transition out of the modified state, write back the data, and transition to invalid. So it looks pretty similar to MSI, except we partitioned the shared state into two different states. Okay. So I'm going to stop here today and just say that next time we're gonna pick up from here. We have another state coming in, used in some real protocols, and that state actually allows us to start to use cache-to-cache transfers, without having to go out and read main memory to get the data all the time.
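The payoff of the E state shows up as bus traffic. Here's a sketch of the read-then-increment pattern with and without sharers; as before, this is illustrative Python (the class, functions, and the `bus` list are my own stand-ins, with "BusRd" and "BusRdX" as toy labels for a read request and an intent-to-write).

```python
# Minimal MESI model for a single line: E lets a later write proceed
# with no bus transaction at all.
M, E, S, I = "M", "E", "S", "I"

class MESILine:
    def __init__(self):
        self.state = I

def read(requester, peers, bus):
    if requester.state == I:
        bus.append("BusRd")                   # read miss goes on the bus
        shared = any(c.state != I for c in peers)
        for c in peers:
            if c.state in (M, E):
                c.state = S                   # owner downgrades (M writes back)
        requester.state = S if shared else E  # alone? take it exclusive

def write(requester, peers, bus):
    if requester.state == E:
        requester.state = M                   # silent upgrade: no bus traffic
    elif requester.state != M:
        bus.append("BusRdX")                  # intent to write: invalidate peers
        for c in peers:
            c.state = I
        requester.state = M

bus = []
p1, p2 = MESILine(), MESILine()
read(p1, [p2], bus)      # no sharers: I -> E, one bus transaction
write(p1, [p2], bus)     # E -> M with no additional bus transaction
```

Under MSI, that same read-then-write sequence would have needed two bus transactions; here the second one disappears, which is exactly the common increment-a-private-variable case the E state was added for.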
And then we'll also talk about what's used in the sort of standard Intel processors, which effectively adds another state that functions the same as our shared state, but allows us to forward data. Okay, let's stop here for today.