So today we're going to start off, and it is our final installment of ELE475. We have to cover all of the rest of computer architecture in this one lecture, so there's a lot to cover, a lot of things to discuss. But, more seriously, today we are going to be finishing up what we were talking about with interconnection networks: mainly credit-based flow control, and a little bit about deadlock, and that will complete our interconnection networks. And then we'll go on to more scalable cache-coherent systems, so cache-coherent systems that have more than, let's say, eight nodes. We'll look at how to scale up to thousands of nodes, and we'll touch on one coherence protocol that works for that, and that's called directory-based cache coherence.
So where we left off last time, we were talking about flow control between two separate nodes in an interconnection network. We talked about local, link-based or hop-based flow control, which is what we spent the end of last class talking about. We also mentioned end-to-end flow control. End-to-end flow control is important; a good example of this is where you have a core which is trying to communicate with a memory controller. You don't want to overrun the buffer in the memory controller, because if you overrun the buffer in the memory controller, your memory transactions just drop on the floor.
So it's possible that your network connection is link-level flow controlled, or hop-based flow controlled, but you still need end-to-end flow control inside of your chip, or your set of chips in your system, to prevent you from overrunning some other buffer that's farther away. Now you could, for instance, back up into the network and have the local flow control back up all the way to the core. You may not want to do that, for a variety of reasons. One: if you look at these memory protocols very carefully, you could end up with something that actually starts to look like a deadlock pretty quickly as you start to back up into the network and get priorities mixed. Also, more insidiously, backing up is probably not good for performance. You probably want to stem the flow of traffic as soon as you can, because if you start jamming more data in there, you're just going to increase the contention on your network. The latency will shoot through the roof on your network, and all of a sudden you're in a very poor operating regime. So it's probably better just to preemptively back off and not overrun the buffers that are far away. So you have to worry about end-to-end flow control, and there are lots of different schemes for this.
Probably one of the better ones is that you send some data, you wait for acknowledgments to come back, and you count your acknowledgments; this is effectively credit-based flow control.
We talked a little bit about different ways to do flow control at the link level. So just to recall here, we had one queue, another queue, and some link in the middle. This link may be pipelined. We send data this way, and at some point the receiver says, oh, I can't take any more data, so it asserts a stall wire. But if you do this around your entire chip, where it's all combinational, where all these little blobs here are combinational logic, your critical path gets very long, so you can start to think about putting registers on this path. Unfortunately, when you do that, all of a sudden this FIFO and this register can't react in time if a stall signal comes back. So if a stall signal is asserted, the sender is going to send the data no matter what. It takes a cycle for the stall to show up, so you end up with something where you need to queue this last piece of data here into a buffer, because the stall is not seen until a cycle later. We call this skid buffering. And you can have similar sorts of things where, if you have, let's say, a flip-flop here but you don't feed into this register, you might need multiple entries of skid buffering.
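To make the skid-buffering argument concrete, here is a minimal cycle-level sketch in Python. It is an illustration under stated assumptions, not a model of any particular router: the function name `dropped_words` and its parameters are hypothetical, the receiver never drains (the worst case), and it asserts stall as soon as its main FIFO is full. Registers on the stall wire and on the data path are modeled as fixed delays in cycles.

```python
from collections import deque


def dropped_words(fifo_depth, skid_entries, stall_delay, data_delay=0, cycles=64):
    """Count words lost at a receiver that uses on/off (stall) flow control.

    Assumptions of this sketch: the sender always has data, the receiver
    never reads anything out, and stall is asserted when the main FIFO is full.
    stall_delay and data_delay are the number of registers on each wire.
    """
    capacity = fifo_depth + skid_entries
    occupancy = 0
    dropped = 0
    # Each deque models a pipelined wire; the oldest value sits at the left.
    stall_wire = deque([False] * stall_delay)
    data_wire = deque([False] * data_delay)

    for _ in range(cycles):
        # Receiver drives stall from its state at the start of the cycle.
        stall_wire.append(occupancy >= fifo_depth)
        seen_stall = stall_wire.popleft()  # what the sender sees this cycle

        # Sender transmits a word unless it currently sees stall.
        data_wire.append(not seen_stall)
        arriving = data_wire.popleft()

        if arriving:
            if occupancy < capacity:
                occupancy += 1  # stored in the FIFO or in a skid entry
            else:
                dropped += 1    # nowhere to put the word: it is lost
    return dropped


if __name__ == "__main__":
    for skid in range(3):
        lost = dropped_words(fifo_depth=4, skid_entries=skid,
                             stall_delay=1, data_delay=1)
        print(f"skid entries = {skid}: dropped words = {lost}")
```

In this model a purely combinational stall needs no skid entries, and every register added to the stall wire or the data path adds one word that is already committed before the sender sees the stall, so it needs one more skid entry to land in.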
Now, if you have the wrong number of buffers here on the receiver in your skid buffering, what's going to happen is you actually end up dropping data. So if your protocol needs, let's say, two buffers, and instead you put in one buffer, and you assert the stall while data is trying to transmit across the link at that time, you're going to lose a piece of data, and that's not very desirable.
So this brings us to the end of what we were talking about last time, which was credit-based flow control. In credit-based flow control, instead of having a stop signal, or an on/off flow control signal, or a stall signal coming back, you keep a counter at the sender side which keeps track of how many entries there are over here on the receiver side. And this can take into account, you know, this register here doesn't get counted; it's the endpoint FIFO space that will back up and that the data can be stored into. So when it starts out, you set the counter, if you want full bandwidth, to be the same number as the entries you have in the receiver, and then you just send data. Whenever you send a word, you decrement your counter. When the counter reaches zero, you stop sending, because you know that all of those buffer entries are spoken for over the round-trip latency here of the data going out and the responses, or the credits, coming back.
If the stall signal were to be asserted, or if you were not to get a credit back in that instantaneous cycle, you would need all of those entries to skid into. When a word gets read out of this buffer here, or out of this FIFO here, you send back a credit, and this will increment your counter. Depending on how you implement this, you could have multiple flip-flops here and multiple flip-flops there. All this really ends up doing is determining your credit loop and how big this counter needs to be.
One other nice benefit of this credit-based flow control system is that you can actually size the credit counter differently from the number of actual entries. Now, why would you want to do this? Well, one reason is you could actually build a network which has only, let's say, half the bandwidth, by reducing the number of entries over here and reducing the credit counter. Now the round-trip latency is longer than the number of credits that you can have outstanding, so what's going to happen is you're going to send some data, you're going to stall early, wait for some credits to come back, and then start sending more data. So you can effectively give less than ideal bandwidth across the link, but you can do it with less buffer space on the receive side. And this is a lot better than the on/off-based flow control when you don't have the right number of buffers.
You actually end up losing data there, so that's an incorrect design. Here it's only a performance concern.
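The credit counter described above can be sketched as a small cycle-level simulation. This is a rough illustration under stated assumptions rather than a definitive implementation: the name `run_credit_link` and all of its parameters are hypothetical, and the default delays (two registers on the data path, one on the credit path) are chosen only so that the credit loop in this model is four cycles long. In steady state the link delivers about min(1, credits / credit loop) words per cycle.

```python
from collections import deque


def run_credit_link(credits, data_delay=2, credit_delay=1,
                    cycles=100, receiver_blocked_for=0):
    """Simulate one link with credit-based flow control.

    Returns (words_delivered, max_fifo_occupancy). Assumptions of this
    sketch: the sender always has data, and the receiver reads one word
    per cycle except during its first receiver_blocked_for cycles.
    """
    counter = credits            # sender-side credit counter
    fifo = 0                     # words sitting in the receiver FIFO
    max_fifo = 0
    delivered = 0
    data_wire = deque([0] * data_delay)      # pipelined data path
    credit_wire = deque([0] * credit_delay)  # pipelined credit return path

    for cycle in range(cycles):
        # Sender: send a word only if it holds a credit, then decrement.
        send = 1 if counter > 0 else 0
        counter -= send
        data_wire.append(send)

        # Receiver: accept whatever arrives this cycle.
        fifo += data_wire.popleft()
        max_fifo = max(max_fifo, fifo)

        # Receiver: read one word out (if allowed) and return a credit for it.
        can_read = fifo > 0 and cycle >= receiver_blocked_for
        read = 1 if can_read else 0
        fifo -= read
        delivered += read
        credit_wire.append(read)

        # Sender: a returning credit increments the counter.
        counter += credit_wire.popleft()
    return delivered, max_fifo


if __name__ == "__main__":
    for c in (4, 2):
        words, peak = run_credit_link(credits=c)
        print(f"credits = {c}: delivered = {words}, peak FIFO occupancy = {peak}")
    words, peak = run_credit_link(credits=4, receiver_blocked_for=20)
    print(f"blocked receiver: delivered = {words}, peak FIFO occupancy = {peak}")
```

With enough credits to cover the loop, the link stays busy every cycle; with half as many, the sender stalls early and delivers roughly half the words. In both cases, and even when the receiver stops reading for a while, the FIFO occupancy never exceeds the number of credits, so an undersized counter costs bandwidth but never drops data.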