1 00:00:04,060 --> 00:00:10,883 Okay, so, now that we've gone through the beginning exercises of what a directory 2 00:00:10,883 --> 00:00:16,906 based distributed shared memory machine looks like, let's talk about how to 3 00:00:16,906 --> 00:00:24,204 actually figure out where the directory is. So you have an address. And usually in 4 00:00:24,204 --> 00:00:28,055 these systems you want to do it on the physical address space. You're not 5 00:00:28,055 --> 00:00:31,568 going to want to do this on virtual addresses. 6 00:00:31,568 --> 00:00:35,852 This is because you're sharing data between lots of different systems. At this 7 00:00:35,852 --> 00:00:39,462 point you're sort of out on the system bus. Your address is no longer virtual. 8 00:00:39,462 --> 00:00:43,217 You've gone through the translation lookaside buffer or the MMU 9 00:00:43,217 --> 00:00:51,360 and you've figured out what the physical address is. So, to figure out what the 10 00:00:51,620 --> 00:00:56,828 directory is, or, as it is sometimes called in a distributed memory machine, the 11 00:00:56,828 --> 00:01:04,260 home node, you need a number that says which one of these directories to go to. 12 00:01:07,840 --> 00:01:12,440 And there's a lot of different ways to do this. But one of the more common ones is 13 00:01:12,440 --> 00:01:19,160 to just use some bits out of the address. So you take the number of directories in 14 00:01:19,160 --> 00:01:25,086 the system, take the log base two of that, and then you take that number of bits to 15 00:01:25,086 --> 00:01:30,205 be the home node number. So when you take a cache miss, and it's not in your cache, 16 00:01:30,205 --> 00:01:35,450 and you need to go figure out and do the load of that data, we'll say, you send a 17 00:01:35,450 --> 00:01:40,316 message, and the message ID and the destination of that message will actually 18 00:01:40,316 --> 00:01:45,182 be the home node. 
And hopefully, your interconnect knows how to route the data 19 00:01:45,182 --> 00:01:52,604 to that directory. Now, taking the high order bits has some benefits. Let's 20 00:01:52,604 --> 00:01:57,411 take a look at that. As we discussed already, in a non-uniform memory 21 00:01:57,411 --> 00:02:03,089 access architecture, the OS can control the placement. It can do this because, based 22 00:02:03,089 --> 00:02:08,691 on these high order bits, you can actually determine which node in the system 23 00:02:08,691 --> 00:02:14,023 or which directory in the system you're going to. So you can actually basically 24 00:02:14,023 --> 00:02:19,220 allocate memory, allocate your stack, allocate your instruction space, based on 25 00:02:19,520 --> 00:02:24,627 the physical address, and the OS commands that, cuz the OS has absolute authority 26 00:02:24,627 --> 00:02:31,742 over where physical addresses get doled out to. The downside is a directory or a home 27 00:02:31,742 --> 00:02:39,029 node can become a hot spot. So let's say all of a sudden, all of the processors in 28 00:02:39,029 --> 00:02:46,326 your system try to access one page of memory. There's like a hot page which 29 00:02:46,326 --> 00:02:51,314 has all the locks in the system. And you're in some threaded program and you 30 00:02:51,314 --> 00:02:57,289 have to access those locks a lot. Well, if you look at that, that's all going to be 31 00:02:57,289 --> 00:03:02,164 down here. It's going to be sort of low order address bits. It might be from sort of 32 00:03:02,164 --> 00:03:07,230 here down, whatever your page size is, we'll say. So even if you're not having 33 00:03:07,230 --> 00:03:13,384 false sharing or anything like that, you typically would try to pack all the data 34 00:03:13,384 --> 00:03:17,733 onto a page or a structure or something like that, and it's pretty hard to 35 00:03:17,733 --> 00:03:22,200 interleave it based on the very high order bits of your address. 
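A minimal Python sketch of the high-order-bit mapping described above. The address width, node count, page size, and line size are assumptions for illustration, and `home_node_high` is a hypothetical name; none of them come from the lecture.

```python
# Assumed parameters for illustration only.
PHYS_ADDR_BITS = 40      # assumed physical address width
NUM_DIRECTORIES = 16     # assumed number of home nodes (a power of two)
PAGE_SIZE = 4096         # assumed page size in bytes
LINE_SIZE = 64           # assumed cache line size in bytes

# log base two of the number of directories (exact for a power of two)
NODE_BITS = NUM_DIRECTORIES.bit_length() - 1


def home_node_high(paddr: int) -> int:
    """Home node taken from the top NODE_BITS of the physical address."""
    return (paddr >> (PHYS_ADDR_BITS - NODE_BITS)) & (NUM_DIRECTORIES - 1)


# Every cache line of one "hot" page maps to the same home node,
# because the lines differ only in their low-order address bits.
page_base = 0x12_3456_7000
hot_page_nodes = {
    home_node_high(page_base + offset)
    for offset in range(0, PAGE_SIZE, LINE_SIZE)
}
print(hot_page_nodes)
```

Under these assumptions the set holds a single node, which is the hot-spot effect: all traffic for the page goes to one directory.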
And 36 00:03:22,200 --> 00:03:26,784 especially considering a program has effectively no control of the high order 37 00:03:26,784 --> 00:03:33,072 bits of a physical address; that's managed by the OS. So if you do this, one node can 38 00:03:33,072 --> 00:03:37,640 become a hot spot, because these all alias to the same directory. So all 39 00:03:37,640 --> 00:03:42,381 the messaging traffic goes to one node and this almost starts to turn back into a 40 00:03:42,381 --> 00:03:46,544 bus. Now we have one directory, and all traffic has to go there. It's a little 41 00:03:46,544 --> 00:03:51,054 better cause we don't necessarily need to invalidate all the other locations, but the 42 00:03:51,054 --> 00:03:57,168 directory and the bandwidth in the directory start to become critical. Hm, 43 00:03:57,168 --> 00:04:04,590 well that's a tough one. The flip side is you can try to have the low order 44 00:04:04,590 --> 00:04:11,001 bits determine where your directory is, or which home node you're using. So, you 45 00:04:11,001 --> 00:04:17,418 still have the offset within a cache line. But then the bits of the physical address 46 00:04:17,418 --> 00:04:21,554 that determine what home node you're going to are 47 00:04:21,554 --> 00:04:26,768 the low order bits just above that offset. Well, this ends up being very well load balanced, because 48 00:04:26,768 --> 00:04:31,616 you choose different home nodes effectively at random depending on which 49 00:04:31,616 --> 00:04:37,475 cache line it is. So, you know, the same cache line will always go to the 50 00:04:37,475 --> 00:04:43,266 same home node or the same directory. But it is pretty common to touch several different cache lines, 51 00:04:43,266 --> 00:04:48,989 because it is pretty hard to contend on only one cache 52 00:04:48,989 --> 00:04:55,325 line. There is just not that much data in one cache line. 
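A minimal Python sketch of the low-order-bit mapping, using the bits just above the cache line offset. The line size, page size, and node count are assumed values, and `home_node_low` is a hypothetical name.

```python
from collections import Counter

# Assumed parameters for illustration only.
LINE_SIZE = 64           # assumed cache line size in bytes
PAGE_SIZE = 4096         # assumed page size in bytes
NUM_DIRECTORIES = 16     # assumed number of home nodes (a power of two)

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # bits used for the offset in a line


def home_node_low(paddr: int) -> int:
    """Home node taken from the bits just above the cache line offset."""
    return (paddr >> OFFSET_BITS) & (NUM_DIRECTORIES - 1)


# The cache lines of a single page now spread over all the home nodes.
page_base = 0x12_3456_7000
spread = Counter(
    home_node_low(page_base + offset)
    for offset in range(0, PAGE_SIZE, LINE_SIZE)
)
print(sorted(spread.items()))
```

With these assumed sizes a page has 64 lines, so each of the 16 nodes ends up as home for 4 of them, while any two addresses inside one line still map to the same node.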
You'll spread across the different 53 00:04:55,325 --> 00:05:01,243 controllers and you'll effectively have some good distribution. The flip side, though, is the 54 00:05:01,550 --> 00:05:08,675 OS loses placement ability here. So it's a tricky trade-off here 55 00:05:08,675 --> 00:05:13,635 to think about. Some people have even built systems where it's configurable. 56 00:05:13,826 --> 00:05:19,040 This gets a little more advanced, and I touched on this in the last slide of 57 00:05:19,040 --> 00:05:24,508 today's lecture. But you could think about having some systems where, depending 58 00:05:24,508 --> 00:05:29,340 on the actual address and depending on what comes out of your page table, you're maybe 59 00:05:29,340 --> 00:05:34,031 making different choices of how to do the mapping. But everyone has to agree on the 60 00:05:34,031 --> 00:05:37,557 mapping, which gets a little bit tricky cuz the directory has to agree on the 61 00:05:37,557 --> 00:05:40,900 mapping, and all of the caches in the system have to agree on the mapping. 62 00:05:42,100 --> 00:05:45,914 Okay, so let's take a look at what is inside of a directory. So we added this 63 00:05:45,914 --> 00:05:49,829 new hardware structure, and whenever we add a new hardware structure I like to 64 00:05:49,829 --> 00:05:54,938 look at all the bits inside of the hardware structure. So we add a new hardware 65 00:05:54,938 --> 00:06:00,315 structure, and this hardware structure has an entry per cache line in that 66 00:06:00,315 --> 00:06:05,355 particular memory connected to the directory. So if you were to look across 67 00:06:05,355 --> 00:06:10,799 the entire system, there will actually be an extra piece of data for every single 68 00:06:10,799 --> 00:06:15,882 cache line in the system. And the naive approach to this will have it such that 69 00:06:15,882 --> 00:06:20,724 every single cache line in the system, whether it's... 
Sorry, not every single 70 00:06:20,724 --> 00:06:25,959 cache line, every single memory line in the system. So if you have ten terabytes of 71 00:06:25,959 --> 00:06:31,129 memory in the system, the naive approach is going to have a directory entry for 72 00:06:31,129 --> 00:06:37,738 every single block size chunk of memory, a cache block size chunk of memory, in the 73 00:06:37,738 --> 00:06:43,444 system. And these are held in big tables; typically they're held in SRAM. You might 74 00:06:43,444 --> 00:06:49,545 try to put them in DRAM. And what do we have here? Well, the directory needs to know 75 00:06:49,545 --> 00:06:55,473 what state the cache line is in, and we're going to look at three different states in 76 00:06:55,473 --> 00:07:01,772 our basic protocol here: shared, uncached, and exclusive. So everything starts out as 77 00:07:01,772 --> 00:07:09,909 uncached; it's out in main memory. When it gets pulled into a cache as read only, the 78 00:07:09,909 --> 00:07:18,689 directory is going to note that it is now shared. If it gets pulled into a cache 79 00:07:18,689 --> 00:07:26,097 read/write, the directory is going to note that as exclusive. Now, if it's in shared 80 00:07:26,097 --> 00:07:32,929 or exclusive, we need to know what node has it. Well, if it's exclusive, we need to know 81 00:07:32,929 --> 00:07:37,472 uniquely what node has it, so we can go message it when we need to go 82 00:07:37,472 --> 00:07:42,313 invalidate it. And if it's shared, we need to know the list of all possible places 83 00:07:42,313 --> 00:07:47,084 that it could be, that we're going to have to send messages to. And this is better 84 00:07:47,084 --> 00:07:51,541 than having to broadcast or send messages to all the nodes in the system. So we're 85 00:07:51,541 --> 00:07:57,167 going to have what's called a sharer list here. 
Which, in a naive full map 86 00:07:57,167 --> 00:08:03,427 directory, is going to have one bit per core in the system, or per cache in the 87 00:08:03,427 --> 00:08:09,927 system. And it's either just going to have a one or a zero in it. So if it's a one, that 88 00:08:09,927 --> 00:08:16,427 means that core has a shared or read only copy of the data. And when some other 89 00:08:16,427 --> 00:08:23,248 cache goes to get it as writable in its cache, the directory is going to have to invalidate, 90 00:08:23,248 --> 00:08:34,313 let's say, the copies in core one's or core zero's cache. Now if you're exclusive, you're not 91 00:08:34,313 --> 00:08:39,104 going to have multiple bits set here, cause this basically means that that core has a 92 00:08:39,104 --> 00:08:43,837 writable copy, and if we want to keep the data coherent 93 00:08:43,837 --> 00:08:48,690 we don't want multiple writable copies in the system. So as you can see 94 00:08:48,690 --> 00:08:53,486 here, there is only one one denoted here. And if it's uncached, we don't need to track 95 00:08:53,486 --> 00:09:00,804 anything there; we've just got don't cares. There's one other state here that I 96 00:09:00,804 --> 00:09:06,441 have, and it's pending. And this usually actually turns into a couple of sub-states; 97 00:09:06,626 --> 00:09:12,038 there's different ways to track this. At the directory, these transactions take 98 00:09:12,038 --> 00:09:17,040 multiple steps. You're going to send some data and start transitioning. Let's say 99 00:09:17,040 --> 00:09:21,327 you want to get data writable. Well, the directory's going to 100 00:09:21,717 --> 00:09:26,914 have to invalidate all the other copies. It can't do this instantaneously, but we 101 00:09:26,914 --> 00:09:32,175 want to provide the appearance of atomicity, or that the operations 102 00:09:32,175 --> 00:09:37,415 are atomic in some way. 
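A minimal Python sketch of one full-map directory entry as described above: a state plus one sharer bit per core. The class and method names are hypothetical, and the two state bits in the overhead estimate are an assumption (enough to encode U, S, E, and one pending state).

```python
from dataclasses import dataclass
from enum import Enum


class DirState(Enum):
    UNCACHED = "U"
    SHARED = "S"
    EXCLUSIVE = "E"
    PENDING = "P"     # a transition is in flight for this line


@dataclass
class DirectoryEntry:
    state: DirState = DirState.UNCACHED
    sharers: int = 0  # bit vector: bit i set means core i holds a copy

    def add_sharer(self, core: int) -> None:
        self.sharers |= 1 << core

    def remove_sharer(self, core: int) -> None:
        self.sharers &= ~(1 << core)

    def sharer_list(self, num_cores: int) -> list[int]:
        return [c for c in range(num_cores) if (self.sharers >> c) & 1]


def directory_overhead_bits(mem_bytes: int, line_size: int,
                            num_cores: int, state_bits: int = 2) -> int:
    """Total directory storage for the naive scheme: one entry per memory line."""
    entries = mem_bytes // line_size
    return entries * (num_cores + state_bits)


entry = DirectoryEntry(state=DirState.SHARED)
entry.add_sharer(0)
entry.add_sharer(3)
print(entry.sharer_list(num_cores=4))
```

The overhead function makes the cost of the naive approach visible: the entry count grows with total memory, and each entry grows with the number of cores.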
So typically, you'll actually have some sub-states that 103 00:09:37,415 --> 00:09:42,728 are stored here, which are something like, oh, this cache line is 104 00:09:42,728 --> 00:09:47,631 currently transitioning from, I don't know, U to E. Don't allow some other 105 00:09:47,631 --> 00:09:53,310 transaction to happen to it right now. Just kind of block that. So 106 00:09:53,310 --> 00:09:58,810 one way is to store it actually in the directory, as a state bit. Another 107 00:09:58,810 --> 00:10:03,853 way is you have some fully associative structure, a side structure, which just 108 00:10:03,853 --> 00:10:09,601 has all of the cache lines currently in flux. And the directory's smart enough to 109 00:10:09,601 --> 00:10:15,208 know that if some other request comes in for that line while it's in flux, just to 110 00:10:15,414 --> 00:10:21,089 NACK that request, or negative acknowledge that request, and tell the other cache to 111 00:10:21,089 --> 00:10:26,001 retry. So you can do it either way, but it gets pretty complicated. We're not going 112 00:10:26,001 --> 00:10:30,005 to talk about all the details of that, but we'll talk about the high level 113 00:10:30,168 --> 00:10:37,864 transitions assuming that they are somehow atomic. So here we're going to look at how 114 00:10:37,864 --> 00:10:44,916 MSI fits together with this. But you could actually think about doing this with 115 00:10:44,916 --> 00:10:50,301 MESI or some other protocol. MSI is a little bit 116 00:10:50,301 --> 00:10:55,756 simpler, so we're going to look at that. Also the benefit of something like a MESI 117 00:10:55,756 --> 00:11:01,071 protocol is lessened in a directory because if you pull something in, in the 118 00:11:01,071 --> 00:11:07,117 exclusive state, which is unmodified at the beginning, and someone else wants to 119 00:11:07,117 --> 00:11:11,458 get a read only copy. 
You're basically going to have to send a message to that 120 00:11:11,458 --> 00:11:16,147 core. And that was inexpensive on a bus, because it could just see the transaction 121 00:11:16,147 --> 00:11:20,777 going across. And it would just snoop it and would demote from E to shared or 122 00:11:20,777 --> 00:11:25,060 something like that, E to S. But now, it actually turns into actual work. The 123 00:11:25,060 --> 00:11:29,344 directory's going to have to generate messages. And you're going to have to wait 124 00:11:29,344 --> 00:11:34,565 for responses coming back from a cache which had it in exclusive. So full MESI is 125 00:11:34,565 --> 00:11:39,644 a little bit less common when you start to grow these distributed shared memory 126 00:11:39,837 --> 00:11:48,786 protocols. Okay, so this is a slide we had before. This is MSI on a bus. Well, things 127 00:11:48,786 --> 00:11:53,699 change a little bit when we go to MSI for directory coherence. And before we go 128 00:11:53,699 --> 00:11:58,549 through this, I wanted to point out that there are actually two different state 129 00:11:58,549 --> 00:12:05,060 machines going on here. There's one state machine that is happening in the cache 130 00:12:05,060 --> 00:12:10,466 controllers, so actually in the cache of a respective processor. And then there's a 131 00:12:10,466 --> 00:12:16,139 different state machine which is happening in the directory. And you'll see that they 132 00:12:16,139 --> 00:12:21,345 have different letters here. This is S, U, and E versus M, S, and I. And we label 133 00:12:21,345 --> 00:12:26,818 these differently on purpose just to not get totally confused. And these state 134 00:12:26,818 --> 00:12:32,024 machines interact by sending messages between each other, and as messages flow 135 00:12:32,024 --> 00:12:38,559 between the directory and the cache, they will both be going through different state 136 00:12:38,559 --> 00:12:47,216 transitions in these two tables. 
Okay, so let's jump into this. These 137 00:12:47,216 --> 00:12:53,022 are the same modified, shared and invalid states that we have in our bus-based 138 00:12:53,022 --> 00:12:58,677 snoopy MSI protocol. We didn't change anything here. And the rules, the 139 00:12:58,677 --> 00:13:04,859 rules are the same. If you have it in modified, you can do a write to this and not have to send 140 00:13:04,859 --> 00:13:11,516 any messages. If you have it in shared, you can read the data and not have to contact 141 00:13:11,516 --> 00:13:16,192 anybody. If you have it in invalid, and you want to do anything with it, you probably 142 00:13:16,192 --> 00:13:21,048 need to contact somebody. Or you probably need to contact the directory. Before, we 143 00:13:21,048 --> 00:13:26,144 would have to send the transaction on the bus. Likewise, the transitions from S to M 144 00:13:26,144 --> 00:13:31,060 or M to S, where you used to communicate, are the same. So think about this as 145 00:13:31,060 --> 00:13:36,599 the same state machine running, except that where before, on a bus, we would 146 00:13:36,599 --> 00:13:40,789 send transactions across the bus, now we're going to take those transactions and 147 00:13:40,789 --> 00:13:45,304 turn them into messages that we send to the directory and messages that we receive 148 00:13:45,304 --> 00:13:49,548 from the directory that we have to respond to. So before, we were snooping 149 00:13:49,548 --> 00:13:54,009 traffic across the bus, which caused us to make different transitions. So here 150 00:13:54,009 --> 00:13:58,144 another processor has intent to write and we saw that across the bus. So we had to 151 00:13:58,144 --> 00:14:02,914 transition ourselves to the invalid state. Now, we're actually going to get a message 152 00:14:02,914 --> 00:14:07,818 from the directory controller. So let's walk through this. But it's 153 00:14:07,818 --> 00:14:12,850 almost exactly the same as what we saw before. 
So this is the cache state for 154 00:14:12,850 --> 00:14:23,210 a particular line for processor P1. We'll start with the entry points. We start off 155 00:14:23,210 --> 00:14:29,990 in invalid and let's say we want to get a readable copy of this line. So 156 00:14:29,990 --> 00:14:35,648 we're going to take a read miss. So what we're going to do is processor one is 157 00:14:35,648 --> 00:14:40,633 actually going to send a read miss message to the directory controller. And 158 00:14:40,633 --> 00:14:45,493 during that time, it does not have a readable copy. It cannot go and access the 159 00:14:45,493 --> 00:14:51,274 data. It's effectively still in the I state. Sometimes people will 160 00:14:51,274 --> 00:14:55,451 actually have sort of a pending state here. How you go to implement this 161 00:14:55,598 --> 00:14:59,726 depends on whether you have a side structure, something like a miss handling register, 162 00:14:59,726 --> 00:15:03,560 where you'll track that. Or you can track that in the cache itself. 163 00:15:05,026 --> 00:15:11,530 So you're going to read miss. You send the read miss message, and you're waiting for 164 00:15:11,530 --> 00:15:18,033 a response. This response is going to have the data that you need. And it's going to 165 00:15:18,033 --> 00:15:24,698 be a synchronization point saying, okay, you're safe to transition to S. Okay, that 166 00:15:24,698 --> 00:15:29,550 seems pretty simple. Similar sort of thing here for a write miss: if we're in the 167 00:15:29,550 --> 00:15:34,522 invalid state and we do a write, we're going to send a write miss request to the 168 00:15:34,522 --> 00:15:39,187 directory controller. It's going to do something, and we may have to be 169 00:15:39,187 --> 00:15:43,419 waiting for a while here cause it may have to go invalidate all of the other lines in 170 00:15:43,419 --> 00:15:47,899 the system. 
And then it gets a response, and once it gets a response we have the data 171 00:15:47,899 --> 00:15:52,842 and we can transition to the modified state. So as we said, these arcs are 172 00:15:52,842 --> 00:15:59,161 pretty easy: P1 can read in the S state and nothing changes, or P1 can read or write in the M 173 00:15:59,161 --> 00:16:05,405 state and we also don't communicate with anybody. But now we have a few different 174 00:16:05,405 --> 00:16:12,273 messages coming in here. If we're in the shared state, we have to be responsive to 175 00:16:12,273 --> 00:16:19,446 an invalidation message, which is a little bit different than a bus snoop. So before, 176 00:16:19,446 --> 00:16:25,398 we saw another processor trying to write, and that's when we transitioned to I. But now 177 00:16:25,398 --> 00:16:30,623 the directory controller sends us a message which says, invalidate this line, 178 00:16:30,623 --> 00:16:35,734 and that will transition us to I here. Note, there will probably be a reply. We 179 00:16:35,734 --> 00:16:40,357 will probably have to send a reply, because the directory controller wants to 180 00:16:40,357 --> 00:16:45,184 know when all of the cache lines in the system have been invalidated, and it may take 181 00:16:45,184 --> 00:16:49,382 a variable amount of time, and it's sending messages, so it wants to wait for a reply 182 00:16:49,382 --> 00:16:55,306 to come back. So we're going to have to send a reply. So this arc here is similar, 183 00:16:55,306 --> 00:17:00,783 except we need to write back data, cause we had modified data. We had writable 184 00:17:00,783 --> 00:17:06,260 data. We get an invalidate message from the directory controller. So we need to 185 00:17:06,260 --> 00:17:11,527 write back the data, and then reply afterwards. Similar sort of idea 186 00:17:11,527 --> 00:17:18,116 here. Okay, so that leaves two arcs left here in the middle. We're in shared, and 187 00:17:18,116 --> 00:17:23,611 we want to do a write to that cache line. 
So, our cache has it in the shared 188 00:17:23,611 --> 00:17:30,629 state. We want to do a write to it. Before we can actually do a write we have to send 189 00:17:30,629 --> 00:17:36,388 a message to the directory saying, I'm doing a write miss here. I want to get 190 00:17:36,388 --> 00:17:43,200 this data writable. And we have to wait for a reply before we transition here, 191 00:17:44,151 --> 00:17:49,103 because we have to wait for the directory controller to communicate with all the other 192 00:17:49,103 --> 00:17:55,732 caches so that they don't have read-only copies and we can have a writable copy. So 193 00:17:55,732 --> 00:18:00,337 it's going to invalidate all the other readable copies in the meantime. And then 194 00:18:00,337 --> 00:18:05,842 finally, we have an edge coming this way which is from modified down to shared. And 195 00:18:05,842 --> 00:18:11,288 this is a little bit different. Well, it's the same idea here. Another processor is 196 00:18:11,288 --> 00:18:17,075 trying to do a read. So we have it in the modified state when another processor 197 00:18:17,075 --> 00:18:22,378 tries to do a read. So we receive a read miss message. We don't need to invalidate 198 00:18:22,378 --> 00:18:27,354 the data, but we need to write back the data, cuz we have the most up to date 199 00:18:27,354 --> 00:18:32,265 copy, cuz we had it modified. So we're going to go and write back the data and 200 00:18:32,265 --> 00:18:36,848 that's going to be the response, and then we're going to transition to the shared 201 00:18:36,848 --> 00:18:42,360 state. We can keep a read copy of this, because the other core 202 00:18:42,360 --> 00:18:49,484 is only getting a readable copy of it also. Okay, so that's that. Any questions 203 00:18:49,484 --> 00:18:58,984 about that so far? Okay, so two interesting arcs that we're going to add 204 00:18:58,984 --> 00:19:12,245 in here are this one and this one, which we didn't have in our base MSI protocol. 
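A minimal Python sketch of the cache-side MSI state machine walked through above, written as a lookup table. The event and message names (`cpu_read`, `remote_read_miss`, `invalidate_ack`, and so on) are hypothetical labels chosen for this sketch, and each arc is treated as atomic, so the wait for the directory's reply is not modeled.

```python
# (current state, event) -> (next state, messages sent to the directory)
# Events starting with "cpu_" come from the local processor; the others
# arrive as messages from the directory controller.
CACHE_ARCS = {
    ("I", "cpu_read"):         ("S", ["read_miss"]),
    ("I", "cpu_write"):        ("M", ["write_miss"]),
    ("S", "cpu_read"):         ("S", []),
    ("S", "cpu_write"):        ("M", ["write_miss"]),
    ("S", "invalidate"):       ("I", ["invalidate_ack"]),
    ("M", "cpu_read"):         ("M", []),
    ("M", "cpu_write"):        ("M", []),
    ("M", "invalidate"):       ("I", ["write_back", "invalidate_ack"]),
    ("M", "remote_read_miss"): ("S", ["write_back"]),
}


def cache_step(state: str, event: str) -> tuple[str, list[str]]:
    """Return the next cache line state and the messages to send."""
    try:
        next_state, messages = CACHE_ARCS[(state, event)]
    except KeyError:
        raise ValueError(f"no arc for {event!r} in state {state!r}") from None
    return next_state, list(messages)


state = "I"
for event in ["cpu_read", "cpu_write", "remote_read_miss", "invalidate"]:
    state, sent = cache_step(state, event)
    print(event, "->", state, sent)
```

The table form makes it easy to see which arcs are silent (reads in S, reads and writes in M) and which ones cost a message.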
And 205 00:19:12,245 --> 00:19:20,051 you know, you may not need these. But what these correspond to is, if our cache has 206 00:19:20,051 --> 00:19:27,858 the data in it and then, because of, let's say, a conflict miss or capacity miss, it 207 00:19:27,858 --> 00:19:35,101 gets bumped out, it might be a good idea to go update the directory, and tell the 208 00:19:35,101 --> 00:19:40,218 directory that in the future, if some other cache wants to go get that data, it 209 00:19:40,218 --> 00:19:46,408 doesn't need to go contact you again. So, if it's in the modified state we write 210 00:19:46,408 --> 00:19:51,289 back the data, because we have dirty data; we write that back to the directory and then 211 00:19:51,289 --> 00:19:55,410 we notify the directory saying we don't have a copy of this anymore, you can 212 00:19:55,410 --> 00:20:00,837 transition to having it uncached. Likewise here, if we have a read-only copy we may 213 00:20:00,837 --> 00:20:04,866 or may not want to do this. If there's, you know, extra bandwidth on 214 00:20:04,866 --> 00:20:09,318 the interconnect we might want to send a message when we do an invalidation here. 215 00:20:09,318 --> 00:20:13,665 And this is not an invalidation because of an invalidation message, but this is an 216 00:20:13,665 --> 00:20:18,422 invalidation because it just gets bumped out of the cache. We may want to notify 217 00:20:18,422 --> 00:20:24,511 the directory saying, please remove us from the sharer list. And if the sharer list is 218 00:20:24,511 --> 00:20:29,907 then empty, the directory might change the cache line from shared to being 219 00:20:29,907 --> 00:20:35,966 uncached completely. But I do want to point out that these are not strictly 220 00:20:35,966 --> 00:20:41,359 necessary. 
The reason they're not strictly necessary is, if we build the cache 221 00:20:41,359 --> 00:20:45,703 controller system such that if you're in the invalid state for a particular cache 222 00:20:45,703 --> 00:20:49,779 line, and you get some message coming in that would have been, let's say, this 223 00:20:49,779 --> 00:20:54,070 message, or that message, or some other arc, we can just reply back saying, yeah, 224 00:20:54,070 --> 00:20:58,521 we don't have it anymore. We're invalid, you know, we don't really care about that 225 00:20:58,521 --> 00:21:03,833 transition. So if you were here, the only message that's going to 226 00:21:03,833 --> 00:21:08,003 come really to you is an invalidation message that would just take you to this 227 00:21:08,003 --> 00:21:12,015 state anyway. So, we can just ignore the message or just reply the same as we 228 00:21:12,015 --> 00:21:20,951 would to the normal invalidation message. Okay, so the directory state transition diagram looks 229 00:21:20,951 --> 00:21:33,545 a little different here. We have uncached, shared, and exclusive. As we said, shared 230 00:21:33,545 --> 00:21:38,637 means there can be multiple read-only copies in the system. Exclusive means 231 00:21:38,637 --> 00:21:44,205 there's only one cache in the system with that data. What's interesting here is if 232 00:21:44,205 --> 00:21:49,962 you were to actually have a MESI protocol running, that would not change the 233 00:21:49,962 --> 00:21:56,264 protocol running in the directory. Because exclusive here is effectively the 234 00:21:56,264 --> 00:22:02,365 same state with respect to how the directory sees the line, you won't have to 235 00:22:02,365 --> 00:22:07,328 do anything different. Okay, so let's walk through a few transitions here of the state 236 00:22:07,328 --> 00:22:13,162 of the cache line in the directory, and this is not in the cache. 
Let's start off 237 00:22:13,162 --> 00:22:18,193 uncached and let's say we're getting a message which is a read miss from 238 00:22:18,193 --> 00:22:24,704 processor P. Well, we should transition to S now. We should give it a readable copy 239 00:22:24,704 --> 00:22:30,123 and we should reply with the actual data and we should put P on the sharer list, so 240 00:22:30,123 --> 00:22:35,607 that we know that if someone else needs to go invalidate that line we need to go 241 00:22:35,607 --> 00:22:40,577 contact P. Now that we're in the shared state, let's say there are other read misses 242 00:22:40,577 --> 00:22:44,879 from other P's, other processors here. Well, we're going to give it the data and 243 00:22:44,879 --> 00:22:49,942 we're going to add it to the sharer list, so we take sharers and add the processor to it. 244 00:22:49,942 --> 00:22:55,773 The sharer list is just going to grow. Okay, let's start here and go 245 00:22:55,773 --> 00:23:02,042 the other way, where we're in uncached and all of a sudden we get a write miss from processor 246 00:23:02,042 --> 00:23:07,720 P. Well, we give it the data, and the sharer list, or the owner, is going to get P 247 00:23:07,720 --> 00:23:13,841 uniquely on to it, and we're going to give it the exclusive state, and because we were uncached 248 00:23:13,841 --> 00:23:23,006 before, we don't need to contact anybody else. Let's look at this arc here before 249 00:23:23,006 --> 00:23:28,902 we go to these. So this is a little bit different. Quite a bit different than what 250 00:23:28,902 --> 00:23:34,944 we had in these slides, because it's doing something different. But in this state 251 00:23:34,944 --> 00:23:45,318 here, we know, let's say, processor P zero has the data exclusively. But all of a 252 00:23:45,318 --> 00:23:50,278 sudden, a different processor, let's say processor two, goes to access the data. 253 00:23:50,282 --> 00:23:55,627 Well, we already have the data in the exclusive state. 
So we're going to stay in 254 00:23:55,627 --> 00:23:59,655 this exclusive state cuz some other cache is going to want to get it exclusive, but 255 00:23:59,655 --> 00:24:05,315 it's a different cache. So what has to happen here is we need to go invalidate 256 00:24:05,315 --> 00:24:10,835 the data out of P zero. P zero is going to write back the data, and it's going to 257 00:24:10,835 --> 00:24:19,581 transition to the invalid state. We need to then provide the data to the new 258 00:24:19,581 --> 00:24:27,814 processor, P2 we'll say, and add that P2 to the sharer list. So we can 259 00:24:27,814 --> 00:24:33,219 transition to this state, and then finally let's look at the edges between these two 260 00:24:33,219 --> 00:24:40,531 points. Oh, actually let's go this way first, if you have data that gets written 261 00:24:40,531 --> 00:24:46,631 back. So this is that arc, which I said is similar to the arc here, which is 262 00:24:46,631 --> 00:24:53,659 optional. Let's say you have data that gets written back here. Actually 263 00:24:53,659 --> 00:24:58,570 this arc may not be optional, let's think about that for a second. This arc 264 00:24:58,570 --> 00:25:04,932 may not be optional. No, it's still optional, cuz you can just NACK the 265 00:25:04,932 --> 00:25:10,273 message effectively, and tell it it's in main memory. Okay, so you're here, and 266 00:25:10,273 --> 00:25:15,000 you see a data write back happening. So a message gets sent to you which is the 267 00:25:15,000 --> 00:25:19,849 equivalent of this arc here. The data was writable, was exclusive to some cache, 268 00:25:19,849 --> 00:25:24,331 and it's no longer writable. It's probably a good idea to go contact the 269 00:25:24,331 --> 00:25:29,303 directory, write back the data, and clear the sharer list. The sharer list is empty, 270 00:25:29,303 --> 00:25:36,925 so it knows that no one has a copy of it at that point. 
Okay, a few other final arcs 271 00:25:36,925 --> 00:25:44,361 here. Okay, we are in the shared state. So we have multiple read-only copies. And one 272 00:25:44,361 --> 00:25:50,546 cache comes along and says, "Oh, I need to do a write miss. I need to get a 273 00:25:50,546 --> 00:25:56,227 writable copy." Well, now we actually have to go through a pretty long process. We're 274 00:25:56,227 --> 00:26:00,372 going to walk through the entire sharer list and send messages to all the sharers 275 00:26:00,372 --> 00:26:05,253 in the sharer list saying, invalidate this copy and tell me when you're done. We're 276 00:26:05,253 --> 00:26:09,355 going to collect all the responses at the directory. And once all the responses have 277 00:26:09,355 --> 00:26:15,231 come back, we know no one else has a readable copy. We can give the data value 278 00:26:15,231 --> 00:26:26,648 to the requester and add it to the sharer or owner list. Okay, the last arc here is from 279 00:26:26,648 --> 00:26:32,666 E to S, this orange arc, and that happens if we have a particular line as writable 280 00:26:32,666 --> 00:26:37,918 in one cache, and another cache wants to go read it now. It will send a read miss; the 281 00:26:37,918 --> 00:26:44,218 other cache is going to downgrade from E to S, excuse me, from M to S in its local 282 00:26:44,218 --> 00:26:49,527 cache. But the directory is going to transition from E to S here, and we have to 283 00:26:49,527 --> 00:26:55,401 go get the most up to date data from the node. So, we're going to send a 284 00:26:55,401 --> 00:27:00,993 fetch request to the node that had it before in exclusive. Once you get the 285 00:27:00,993 --> 00:27:06,656 most up to date data you can forward that to the new reader, 286 00:27:06,656 --> 00:27:13,796 and we add that processor to the sharer list. Okay, so questions about that one so 287 00:27:13,796 --> 00:27:18,555 far? 
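A minimal Python sketch of the directory-side state machine walked through above. The message names and the `DirEntry` and `directory_step` names are hypothetical, and every transition is treated as atomic: invalidate and fetch replies are assumed to have arrived before the data reply goes out, so pending states are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    state: str = "U"                          # "U", "S", or "E"
    sharers: set = field(default_factory=set)  # in "E", holds just the owner


def directory_step(entry: DirEntry, msg: str, src: int) -> list:
    """Apply one request from node `src`; return (destination, message) pairs."""
    out = []
    if msg == "read_miss":
        if entry.state == "E":
            (owner,) = entry.sharers          # exactly one owner in E
            out.append((owner, "fetch"))      # owner writes back, M -> S
        entry.sharers.add(src)
        entry.state = "S"
        out.append((src, "data_reply"))
    elif msg == "write_miss":
        # Invalidate every other copy: all sharers in S, or the old owner in E.
        for node in sorted(entry.sharers - {src}):
            out.append((node, "invalidate"))
        entry.sharers = {src}
        entry.state = "E"
        out.append((src, "data_reply"))
    elif msg == "write_back":
        entry.sharers.clear()                 # nobody holds a copy now
        entry.state = "U"
    else:
        raise ValueError(f"unknown message {msg!r}")
    return out


line = DirEntry()
for msg, src in [("read_miss", 0), ("read_miss", 1),
                 ("write_miss", 2), ("read_miss", 3)]:
    print(msg, src, "->", directory_step(line, msg, src), line.state)
```

Note that a write miss in the E state keeps the entry in E and only swaps the owner, which matches the arc that loops back to exclusive.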
These do start to get a little complicated because you have multiple 288 00:27:18,555 --> 00:27:27,586 state machines interacting. Okay, so we're going to speed up a little bit here. I 289 00:27:27,586 --> 00:27:32,267 include this chart from your book just to give you an example. We went through, 290 00:27:32,267 --> 00:27:36,716 very quickly here, all the different messages. And this chart here sums up all 291 00:27:36,716 --> 00:27:41,454 the different message types, and who they could come from and who they could go 292 00:27:41,454 --> 00:27:46,423 to. And this is in your textbook. And sometimes messages need to communicate 293 00:27:46,423 --> 00:27:50,641 addresses. Sometimes they need to communicate data. Sometimes they need to 294 00:27:50,641 --> 00:27:55,091 communicate which node the message is coming from, to add it to the sharer 295 00:27:55,091 --> 00:27:59,616 list. But I'm not going to go through this in great detail. One thing I did 296 00:27:59,616 --> 00:28:06,179 want to say is, these message types here do not include acknowledgements. So, when you go to 297 00:28:06,179 --> 00:28:16,888 request something, there are replies that come back. Those replies are not 298 00:28:16,888 --> 00:28:23,887 drawn in this diagram. We see data value reply, but that's just 299 00:28:23,887 --> 00:28:28,994 the actual data. There's not, like, a response coming back from the sharer 300 00:28:28,994 --> 00:28:33,583 acking the invalidate or something like that. 301 00:28:33,583 --> 00:28:38,560 Another type of message that is pretty common, that is not drawn here, is a 302 00:28:38,560 --> 00:28:43,731 negative acknowledgement. So it's pretty common, if you have a cache line that is 303 00:28:43,731 --> 00:28:48,256 being transitioned, it's in a pending state at the directory, and you get a 304 00:28:48,256 --> 00:28:55,123 request coming in, that you might need to tell that cache to retry later. 
I can't handle this 305 00:28:55,123 --> 00:28:56,180 case right now.
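A minimal Python sketch of the negative acknowledgement idea, assuming the side-structure approach where the directory keeps a set of lines currently in flux. `PendingTable` and its method names are hypothetical, and the retry policy on the requesting cache is left out.

```python
class PendingTable:
    """Tracks cache lines whose directory transition is still in flight."""

    def __init__(self) -> None:
        self._in_flux: set[int] = set()

    def request(self, line_addr: int) -> str:
        """Accept a request, or NACK it if the line is mid-transition."""
        if line_addr in self._in_flux:
            return "NACK"            # requester is expected to retry later
        self._in_flux.add(line_addr)
        return "OK"

    def done(self, line_addr: int) -> None:
        """Mark the transition for this line as complete."""
        self._in_flux.discard(line_addr)


table = PendingTable()
print(table.request(0x40))   # first requester is accepted
print(table.request(0x40))   # second requester is told to retry
table.done(0x40)
print(table.request(0x40))   # retry succeeds once the line has settled
```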