So, today we're going to continue our adventure in computer architecture and talk more about parallel computer architecture. Last time we talked about coherence, memory coherence and cache coherence systems, and how to differentiate that from memory consistency models: a consistency model is a model of how memory is supposed to work, versus the underlying algorithms that try to keep memory consistent and implement those consistency models.

We left off last time talking about MESI, also known as the Illinois protocol, and we walked through all of the different arcs through here. If you recall, what we did was split the shared state from the MSI protocol into two states, shared and exclusive. The insight here is that it's very common for programs to read a memory address, which will pull it into your cache, and then go modify that same memory address. For instance, if you want to increment a number, you're going to do a load, which brings it into your register set but also into your cache; you're going to increment the number, and then you do a write back to the exact same location. That's pretty common in imperative programming languages. Declarative programming languages like Scheme and such may at times copy everything, but in imperative programming languages it's pretty common to actually change state in place. Because of that, you can bring the line right into this exclusive state, and then when you go to modify it, you don't have to broadcast on the bus. You don't have to talk to anybody, and you lose, effectively, this intent-to-write message that you would otherwise have to send across the bus, waiting for that address to be snooped on the bus, or be seen by all the other entities on the bus.
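To make the benefit of the exclusive state concrete, here is a minimal sketch of the store path in a MESI-style controller. This is not the protocol table from the slides; the state names are the standard MESI ones, and the function and return convention are invented for illustration. The point is simply that a store to a line held exclusive upgrades silently, with no bus traffic.

    /* Minimal sketch of MESI line states and the store path.
     * Hypothetical names; real controllers are table-driven hardware. */
    #include <stdio.h>

    enum mesi_state { INVALID, SHARED, EXCLUSIVE, MODIFIED };

    /* Returns 1 if the store had to broadcast an intent-to-write on the bus. */
    int handle_store(enum mesi_state *line)
    {
        switch (*line) {
        case MODIFIED:
            return 0;                 /* already writable, no bus traffic   */
        case EXCLUSIVE:
            *line = MODIFIED;         /* silent upgrade: no one else has it */
            return 0;
        case SHARED:
        case INVALID:
        default:
            /* must broadcast so other caches invalidate their copies */
            *line = MODIFIED;
            return 1;
        }
    }

    int main(void)
    {
        enum mesi_state line = EXCLUSIVE;   /* a load brought it in, no sharers */
        printf("bus broadcast needed? %d\n", handle_store(&line));  /* prints 0 */
        return 0;
    }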
Note that I say entities on the bus. We've been talking primarily about processors over the last lecture, but there can be other entities on the bus that want to snoop it. Examples sometimes include coherent I/O devices. This isn't very popular right now, but I think it will become much more popular as we start to have GPUs, graphics processing units, or general-purpose GPUs, which will be sitting effectively very close to our processor on the same bus and will want to take part in the coherence traffic of the processor. A GPU is going to want to read and write the same memory addresses that the processor is reading and writing, and take part in the cache coherence protocol. At a minimum, your I/O devices usually need to effectively tell the processor when they are doing a memory transaction that the processor should know about. So typically, when you are moving data from an I/O device to main memory, that transaction is going to have to go across the bus, everyone is going to have to invalidate their caches, and they will all have to snoop that memory traffic from the I/O device.

So we had talked about MESI as an enhancement to MSI. Where we left off last time, we were going to talk about two more enhancements that are pretty common. One has been used widely in AMD Opterons; my understanding is that they still use something similar to this in AMD processors. The idea is that you add an extra state here, which is called ownership, or the owned state. Effectively, it looks just like our MESI protocol from before, but now, when you have data in the modified state and, let's say, another processor needs to go access that data, instead of having to send all that data back to main memory, invalidate that line out to main memory, and go fetch it back from main memory, you can do a direct cache-to-cache transfer. This is basically an optimization: you don't have to write the data back to main memory, and in fact you can allow main memory to be stale. You can just transfer the data across the bus from the one cache to the cache which needs it.

So in this example here, we're going to look at this edge here. Another processor wants to read the data. We see that other processor's intent to read for a particular cache line, and our processor currently has it in the modified state. We're actually going to provide the data out of our cache, not write it back to main memory, and transition the line in our cache to this owned state. The other processors can now take it in the shared state, so they will have a read-only copy. Note this is only for reads; we'll talk about what happens if another processor wants to write to that line in a second.
So we have it in the owned state, and what we're trying to do here is that this processor is tracking that the data needs to be written back to main memory at some point. That's the whole purpose of this state: we've basically designated one processor which owns the data and owns the modified copy. The processors which take it read-only get it in the shared state, and if they need to invalidate the line, they don't need to contact anybody. Because they have it in the shared state, they only have a read-only copy, so they don't need to make any bus transactions. If you think about it, if you were to have one core read this dirty state from the other core, and then at some point the line just gets invalidated in that second core, and the data is not up to date in main memory, you would lose the changes. So by keeping it in the owned state, one processor keeps track that at some point, if that line gets evicted out of its cache, it needs to write the data out to main memory to keep memory up to date.

Now, there are a couple of other arcs here. You can transition from the owned state back to the modified state if the processor which has it in the owned state wants to do a write. It can't just do that while it's in the owned state, because while it's in the owned state other processors may have shared copies of the line. So when P1 wants to do a write here, it needs to invalidate everyone else's copies across the bus. It's going to have to send an intent to write for that line, everyone else will snoop that traffic and transition to the invalid state, and then this processor will be able to transition to the modified state, and now it's able to actually modify the data.

Okay. Then we've got this arc here, which we sort of already talked about: if you're in the owned state, anyone else can get read-only, shared copies of the line. They can't go get an exclusive copy, because that would basically violate this notion; they would then be able to upgrade to modified without telling anybody, and we don't want that. But they can get shared, read-only copies of the data. And then there's this arc here from owned to invalid, which is what happens if some other processor wants to write the data. Processor one, P1 here, will see the intent to write from that other processor, snoop that traffic, and at that point transition to the invalid state.
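As a rough illustration of the owned state, here is a sketch of what a snoop handler might do when it sees another processor's bus request while holding a line. The function names, the result structure, and the message encoding are all invented for this example; the point is just who supplies the data and who remains responsible for writing it back.

    /* Sketch of MOESI snoop handling for one cache line; names are illustrative. */
    #include <stdbool.h>
    #include <stdio.h>

    enum moesi_state { I, S, E, O, M };

    struct snoop_result {
        enum moesi_state next;   /* our new state for the line     */
        bool supply_data;        /* do we put the data on the bus? */
    };

    /* Another cache issued a read (is_write == false) or an
     * intent-to-write (is_write == true) for a line we hold.
     * Note: if a line in M or O is evicted locally (not shown here),
     * the dirty data must be written back to main memory. */
    struct snoop_result snoop(enum moesi_state cur, bool is_write)
    {
        struct snoop_result r = { cur, false };
        switch (cur) {
        case M:
        case O:
            r.supply_data = true;           /* memory may be stale; we provide it */
            r.next = is_write ? I : O;      /* stay the owner on a remote read    */
            break;
        case E:
        case S:
            r.next = is_write ? I : S;
            break;
        case I:
            break;
        }
        return r;
    }

    int main(void)
    {
        struct snoop_result r = snoop(M, false);
        printf("next=%d supply=%d\n", r.next, r.supply_data);  /* next=3 (O), supply=1 */
        return 0;
    }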
Note here that on this intent to write, we may need to provide data across the bus while we're in the owned state, because if we're the only cache that has the data, and the other processor is going straight into the modified state via a write miss, we're going to need to provide the data.

Okay, so, questions about MOESI? So far it's a basic extra optimization, because we don't have to go out to main memory: we can basically transfer data around, and one cache can have a cache line in the owned state, and later some other cache can have the exact same cache line in the owned state, and it can bounce around without ever having to go out to main memory. This decreases our bandwidth out to the main memory system.

Okay. Then we're going to talk about MESIF, which is actually used in the Core i7, the most up-to-date Intel processors. It looks very similar to MESI, except we're going to see an extra little letter in this one bubble here. Effectively, what's going on is we add an extra state called the forward state. This is similar to the sort of optimization we saw in MOESI, except it can't keep the data writable. What happens in this protocol is that the first cache which does a read miss on a line of widely shared data is going to be elected and is going to get the data in this forward state. Then, if other caches want to get read-only copies and bring the line in shared, instead of having to go out to main memory, the cache that has it in the forward state is going to provide that data across the bus. This is going to effectively decrease our bandwidth to main memory, by providing the data out of another processor's cache.

Now, this is a little bit of a simplification. There is a question here of, if you're in this forward state and you invalidate the data, who has it? Does anyone provide the data? There are sort of two choices. One choice is that no one has it in the forward state, so when there's a snoop request for the line, it just has to go out to main memory. That's kind of the easy case. The other case is that you could try to build a protocol where, when the one cache invalidates its forward copy, it just chooses another cache to be the forwarder. But probably the simplest thing to do is, when the forwarding core invalidates the data for whatever reason, you just go back out to main memory, because there's always a copy in main memory; effectively you're just keeping read-only copies.

[In response to a student question:] Yeah, you're right, you're probably going to enter into the exclusive state. That's a good question. I've read two different versions of this in different books, so I'm not quite sure Intel actually documents what they do here, but you're probably right: if you're the only one with a copy, you probably want to enter straight into the exclusive state. Then what's going to happen is that when you would transition from E to S here, you instead transition from E to F, so you end up in the F state; the first one that downgrades is always going to end up in the F state. But like I said, I saw other references where people implementing something similar to this have some election where they figure out who the forwarding node is. Probably the easiest thing to do, though, is to downgrade from E to F.
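Here is a similarly rough sketch of the forward-state idea, using the simple "fall back to memory" policy just described. The cache count, function name, and election rule (the cache that fetches from memory becomes the forwarder) are assumptions made for illustration, not Intel's documented behavior.

    /* Sketch of the MESIF forward state: among the read-only sharers of a
     * line, at most one is in F and supplies data to new readers, so the
     * others can sit quietly in S. Names and policy are illustrative. */
    #include <stdio.h>

    enum mesif_state { INV, SHR, EXC, MOD, FWD };

    #define NCACHES 4

    /* One line, tracked in every cache. Returns the index of whoever
     * supplied the data, or -1 if the request had to go to main memory. */
    int read_miss(enum mesif_state line[NCACHES], int requester)
    {
        for (int i = 0; i < NCACHES; i++) {
            if (i != requester && line[i] == FWD) {
                line[requester] = SHR;  /* new reader just takes a shared copy */
                return i;               /* forwarder supplies: cache-to-cache  */
            }
        }
        /* No forwarder (e.g. it was invalidated): go to main memory, and the
         * requester becomes the forwarder for later readers. */
        line[requester] = FWD;
        return -1;
    }

    int main(void)
    {
        enum mesif_state line[NCACHES] = { FWD, SHR, INV, INV };
        printf("supplied by cache %d\n", read_miss(line, 2));  /* 0: the forwarder */
        line[0] = INV;                   /* the forwarder invalidates its copy     */
        printf("supplied by cache %d\n", read_miss(line, 3));  /* -1: main memory  */
        return 0;
    }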
So for the rest of the course, we're going to look at how to scale beyond these broadcast and invalidate protocols that have to snoop on a bus. One of the problems of building these snooping systems is that they really affect how you design your processor. First of all, you're going to have to add more bandwidth into your cache, or at least more bandwidth into your tag array. One choice is to dual-port your tags. Another choice is that you can steal cycles for snoops. What I mean by stealing cycles is that if there is a bus transaction happening and you need to check it against your tags, you actually block the main processor associated with that cache from accessing the cache that cycle; you generate a stall signal to the cache, or to the main pipe.
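In pseudo-hardware terms, stealing cycles is just a priority decision at the tag array each cycle. Here is a toy sketch of that arbitration; the signal and structure names are made up for illustration, and a real design would be a pipelined hardware arbiter rather than a function.

    /* Toy model of "stealing" a tag-array cycle for a snoop: if a bus snoop
     * needs the tags this cycle, the processor side is stalled. */
    #include <stdbool.h>
    #include <stdio.h>

    struct tag_port {
        bool snoop_request;   /* a bus transaction needs a tag lookup */
        bool cpu_request;     /* the pipeline wants a load/store      */
    };

    /* Returns true if the CPU must stall this cycle. */
    bool arbitrate(struct tag_port p, bool *grant_snoop, bool *grant_cpu)
    {
        *grant_snoop = p.snoop_request;               /* snoop has priority */
        *grant_cpu   = p.cpu_request && !p.snoop_request;
        return p.cpu_request && p.snoop_request;      /* stall signal       */
    }

    int main(void)
    {
        bool gs, gc;
        struct tag_port p = { .snoop_request = true, .cpu_request = true };
        bool stall = arbitrate(p, &gs, &gc);
        printf("stall=%d snoop=%d cpu=%d\n", stall, gs, gc);   /* 1 1 0 */
        return 0;
    }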
One of the things that gets a little tricky, and this will affect your design, is that if you have a multilevel cache, usually you want to put your L2 tag array on the bus and snoop against your L2 tags. But if a snoop hits there and you figure out that you have to invalidate something, you're going to have to invalidate down the entire cache hierarchy, all the way down to the level-one cache. So this can actually affect the throughput of your level-one cache, and it's also annoying to do, because it effectively has to reach down and touch the tag array of your L1 cache. And as I mentioned briefly last time, if you're thinking about something like an exclusive cache, a cache where the L2 tags don't include the tags in the L1, you're going to have to check both tag arrays for every snoop transaction, and that can be pretty painful to do. Or you have to keep a copy of the L1 tags, but that's effectively the same thing as just having an inclusive cache, maybe with a little less data storage.

Okay, so what limits our performance? Why can't we just build 1,000 processors on a big bus? Well, it's the same idea as having 1,000 people in this room all trying to shout to each other at the same time. At some point you run out of bandwidth, and more importantly you need some way to coordinate them. And because these bus transactions are required to serialize, the occupancy on the bus goes up. If you have one bus with two people talking on it, and they each talk, let's say, ten percent of the time, then you have a twenty percent utilized bus. All of a sudden, if you have ten people on this bus, you have a 100 percent utilized bus, and if you have 1,000 people, you have an oversubscribed bus. So you have to worry about the bandwidth and the occupancy, because we do need to make these different bus transactions atomic; it's not quite just a bandwidth problem. What I mean by that is you could make the bus wider to increase the bandwidth, but it's not going to solve our problems, because there's an occupancy challenge here also: you need effectively atomic transactions to happen across the bus in order to keep the cache coherence protocol correct.
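The back-of-the-envelope math here is just utilization equals the number of agents times the fraction of time each one needs the bus, since the transactions serialize. A tiny sketch of that arithmetic, with the ten-percent figure from the example above:

    /* Back-of-the-envelope bus occupancy: each agent wants the bus 10% of
     * the time, and bus transactions serialize, so the demands add up. */
    #include <stdio.h>

    int main(void)
    {
        double per_agent = 0.10;                     /* 10% of cycles each     */
        int counts[] = { 2, 10, 1000 };
        for (int i = 0; i < 3; i++) {
            double util = counts[i] * per_agent;     /* > 1.0 = oversubscribed */
            printf("%4d agents -> %.0f%% of bus cycles needed\n",
                   counts[i], util * 100.0);
        }
        return 0;
    }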
Okay, so before we move off this topic and into the interconnection networks we'll be talking about, I want to talk about one of the challenges that comes up in simple cache coherence systems, and that's false sharing. Caches like to track information at a particular block size. We've talked about caches which have 64-byte lines, or 64-byte block sizes, and they can be bigger or smaller than that.

Now, one of the things that happens that is pretty unpleasant in these coherence protocols is, let's say you take a piece of data which is shared, needs to be kept coherent between two different processors, and gets communicated relatively often, and you put some other piece of critical data right next to it, on the same cache line. All of a sudden, because they're packed into one cache line, and we only track coherence information on a per-cache-line basis, whenever that first piece of data, let's say a four-byte integer holding a lock or something like that, gets bounced around between caches, you're going to bounce around the other four-byte integer with it, even though it is not shared at all. So this can hurt your common-case performance for non-shared data, just because it shares a line with truly shared data. This is not something that typically happens in a uniprocessor cache system, because there you bring the line in, you get everything on it, and you get spatial locality. If something gets bumped out you can get conflicts, which are sort of equivalent to this, but that's a little bit of a different idea; the two pieces of data are never in the same line. With false sharing, we do see this.

Now, false sharing is interesting because people have come up with a whole set of techniques to avoid it. Anyone have an idea of one really good technique to avoid false sharing? What we can do, and this is pretty common, is that either the programmer or the compiler detects that this is happening and pads the data out: waste memory for highly contended pieces of data, and co-locate them with nothing that is shared. One of the better examples of why you really have to care about this is something like your stack. If you were to have, let's say, a lock on your stack, there's a lot of data there which you need to use often, and it's all local; stacks between threads are all local. But if you have some sort of variable that you pass to someone else which is a struct, and inside that struct is a lock or something like that, all of a sudden you're basically going to be bouncing around a cache line which is part of your stack, and other people are going to be invalidating your stack. So one way to solve this is, when you declare a lock, the compiler can sometimes recognize it, because you can actually designate memory addresses as locks with special keywords, sometimes, depending on the language. When you do that, it will say, oh, don't put this with anything else, or maybe only co-locate this data with things that are other locks, because those may have bad sharing performance anyway. What you really want to do here is not have the false sharing case.

Now, the analog to false sharing is actually true sharing. There are cases where you'll have multiple pieces of data that are shared differently between different processors, but are all widely shared. An example of this is an array of locks, where different processors will be grabbing different locks out of the array, and you can use techniques similar to the false sharing ones. You probably don't want all of those locks to be on the same cache line, because then that line is basically going to be bouncing around, and everyone is going to be contending for that one cache line to get it modified, in the M state, in their own cache. So what you can think about doing is applying the same technique and putting each of those locks on a separate cache line.
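A common way to express the padding fix in C is to force each hot, independently accessed object onto its own cache line. Here is a sketch assuming 64-byte lines; the alignment specifier is standard C11, but the struct layouts and the 64-byte figure are assumptions you would tune for the machine at hand.

    /* Padding out contended data so unrelated fields (or neighboring locks)
     * don't share a 64-byte cache line. Assumes C11 and a 64-byte line. */
    #include <stdalign.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define LINE 64

    /* Bad: the lock and the thread-private counter share one line, so every
     * lock handoff also bounces the counter's line between caches. */
    struct bad {
        atomic_int lock;
        int        private_counter;
    };

    /* Better: each field sits alone on its own cache line. */
    struct padded {
        alignas(LINE) atomic_int lock;
        alignas(LINE) int        private_counter;
    };

    /* Same idea for an array of locks: one lock per line, not sixteen. */
    struct padded_lock {
        alignas(LINE) atomic_int lock;
    };
    struct padded_lock lock_array[32];

    int main(void)
    {
        printf("bad: %zu bytes, padded: %zu bytes, one padded lock: %zu bytes\n",
               sizeof(struct bad), sizeof(struct padded), sizeof(struct padded_lock));
        return 0;
    }

The trade-off is exactly the one mentioned above: you waste most of each 64-byte line to buy back the coherence traffic.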
Okay, so let's switch gears here.