1 00:00:03,200 --> 00:00:06,643 Okay. This brings us to our next optimization, 2 00:00:06,643 --> 00:00:11,665 which is the amount of [inaudible] optimizations and this is one that 3 00:00:11,665 --> 00:00:15,395 actually is no longer talked about in your textbook. 4 00:00:15,395 --> 00:00:20,346 The old versions of your textbook did talk about this optimization. 5 00:00:20,561 --> 00:00:25,870 I, I think it's still an important, interesting optimization to think about 6 00:00:25,870 --> 00:00:29,026 and this is something called a victim cache. 7 00:00:29,026 --> 00:00:33,402 The victim cache, let's take a look at the picture first. Here, 8 00:00:33,402 --> 00:00:36,990 Cpu, L1 cache, L2 cache, It's a multilevel cache. 9 00:00:36,990 --> 00:00:45,512 But now., If you're missing your L1, you put another 10 00:00:45,512 --> 00:00:51,860 little spot check. We put something we call a victim cache 11 00:00:51,860 --> 00:00:57,944 and a victim cache is a fully associative cache but it's usually very small. 12 00:00:57,944 --> 00:01:02,133 So, let's say, one, two, three, four entries, something like that. 13 00:01:02,133 --> 00:01:07,368 Seems very small cache structure here, but it's fully associative. 14 00:01:07,368 --> 00:01:13,302 But what's nice about this is sometimes in programs you have a problem where, I don't 15 00:01:13,302 --> 00:01:18,607 know, let's say, you have a two-way set associative L1, which you have three 16 00:01:18,607 --> 00:01:23,633 things that want to be in that one location that has one index to cache. 17 00:01:23,633 --> 00:01:28,660 This keep fighting and knocking each other out over and over and over again. 18 00:01:29,400 --> 00:01:35,222 Just actually one extra fully associative entry would solve this. 19 00:01:35,222 --> 00:01:41,402 And then you have the edit here in the way zero, way, way one, and now it's victim 20 00:01:41,402 --> 00:01:46,949 cache. So this is actually the, the first 21 00:01:46,949 --> 00:01:51,501 processor that showed up in was an HP PA-risc processor. 22 00:01:52,742 --> 00:01:57,541 Probably HP 7,000 per [inaudible] 7,000 series processor. 23 00:01:57,541 --> 00:02:02,920 But, you can actually have this, this little extra cache here. 24 00:02:03,168 --> 00:02:09,577 A couple extra little things you need to, to think about though, is, whether you 25 00:02:09,577 --> 00:02:14,865 check this cache in series or in parallel with the L1. 26 00:02:14,865 --> 00:02:19,680 If you're checking in parallel it's going to hurt your hit time. 27 00:02:19,680 --> 00:02:24,733 We've checked with the series, checked this first, then checked this, and then 28 00:02:24,733 --> 00:02:27,925 checked the L2. You know, checking these together so we'd 29 00:02:28,324 --> 00:02:33,377 have a worst hit time, like we need your L1 to sort of pass it along the way it 30 00:02:33,377 --> 00:02:37,700 was, so you don't have to put multiplexer and, after your L1 checked. 31 00:02:37,700 --> 00:02:44,081 It's a trade off. People, well, Well, checking both in parallel and in 32 00:02:44,081 --> 00:02:49,137 serial. Let's look at the mechanics of how this 33 00:02:49,137 --> 00:02:53,285 works. So on a this inner L1 we're we generate 34 00:02:53,285 --> 00:02:57,731 the victim, Gets your transition the victim to here 35 00:02:57,731 --> 00:03:03,020 and then we bring in the new data from here into the L1. 36 00:03:03,600 --> 00:03:08,679 Unless, the data is in the victim cache already and then all we do is basically 37 00:03:08,679 --> 00:03:12,215 swap the two, Take the lines in the victim and lines in 38 00:03:12,215 --> 00:03:17,229 the data, and swa, but you usually have sort of separate data paths in that. 39 00:03:17,229 --> 00:03:23,665 You can have going on simultaneously. So you could, after one big, we are one at 40 00:03:23,665 --> 00:03:30,323 the other, and vice versa. If you missing the victim cache, missing 41 00:03:30,323 --> 00:03:37,742 the L1, interesting question comes up here with your data that's in the victim cache. 42 00:03:37,742 --> 00:03:49,325 What happens to that? We put a victim in the victim cache. 43 00:03:49,325 --> 00:03:53,611 Do we need to do something with that victim first?question 44 00:03:54,382 --> 00:03:56,018 Mark here. Hm,, 45 00:03:56,216 --> 00:03:58,596 Unclear. It really depends on design. 46 00:03:58,596 --> 00:04:04,281 If the design is such that when we,, when we generate victim from the L1, we both 47 00:04:04,281 --> 00:04:08,050 put it in the victim cache and we flush it down to our L2. 48 00:04:08,050 --> 00:04:12,170 Then, when we have to, let's say, victimize from the victim cache, the 49 00:04:12,170 --> 00:04:17,027 victim cache would say only its two entries, and we need to put a new victim 50 00:04:17,027 --> 00:04:20,471 in the victim cache. We can just throw on to the ground. 51 00:04:20,471 --> 00:04:27,420 We don't actually need to deal with that victim because the data is [inaudible] If 52 00:04:27,420 --> 00:04:32,524 not, we'd actually have to take the upper victim cache that's possibly dirty and go 53 00:04:32,524 --> 00:04:38,141 over it out into the L2. But in reality what this is trying to do 54 00:04:38,141 --> 00:04:42,587 is it's trying to reduce our conflictnesses, by gaining us a pseudo 55 00:04:42,587 --> 00:04:46,304 higher associate cache with just a very small second. 56 00:04:46,304 --> 00:04:51,481 It's interesting to look sort of a [inaudible] and people that built on this, 57 00:04:51,481 --> 00:04:55,795 This victim cache, a very small victim cache, can go a long way. 58 00:04:55,994 --> 00:05:01,558 Because there's a lots of times in the program, there may be that one value, that 59 00:05:01,558 --> 00:05:08,486 one address which is what highly intended in your cache, or this L1 line of cache. 60 00:05:08,486 --> 00:05:13,996 But everything else in the cache is a sort of [inaudible] will say. 61 00:05:13,996 --> 00:05:18,170 And this can really help you with the [inaudible]. 62 00:05:18,170 --> 00:05:21,598 Okay, so going, going back to our victim cache here, 63 00:05:21,598 --> 00:05:26,191 If we look at the sorta our victim cache just by itself shouldn't be L1. 64 00:05:26,671 --> 00:05:32,431 This kind of looks like a multiple cache. The answer is going to be the same as what 65 00:05:32,431 --> 00:05:36,887 we had for a multiple cache. If you look at it just through the L1, 66 00:05:37,162 --> 00:05:44,878 it's going to make the Miss Penalty go down as we [inaudible] If you look at the 67 00:05:44,878 --> 00:05:50,970 L1 with the victim cache, both the Miss Penalty will be more than we had to go out 68 00:05:50,970 --> 00:05:52,680 to the next level cache.