1 00:00:04,180 --> 00:00:09,086 Okay. So now, let's briefly touch on routing protocols. There's lots of 2 00:00:09,086 --> 00:00:14,930 different routing protocols, protocols out there. And roughly let's, let's split them 3 00:00:14,930 --> 00:00:20,774 into two different categories. Oblivious routing protocols where the routing path 4 00:00:20,774 --> 00:00:26,820 through the network is independent of the state of the network. And the second one 5 00:00:26,820 --> 00:00:31,456 is adaptive routing. So these are sort of smarter networks. They look for they 6 00:00:31,456 --> 00:00:36,208 somehow look at the state of the network, and make some decisions. And if you build 7 00:00:36,208 --> 00:00:40,670 a good adaptive routing network, what you're going to try and do is look for the 8 00:00:40,670 --> 00:00:45,131 hot spots in the routing network, and route around them. And this is where that 9 00:00:45,131 --> 00:00:50,529 path independence, or multi path networks comm, commit. If you only have one route 10 00:00:50,529 --> 00:00:54,922 from any point to any other given point, the routing is basically deterministic. 11 00:00:54,922 --> 00:00:59,370 You can't, can't go route in the wrong, you can't go route someplace else because 12 00:00:59,370 --> 00:01:03,487 you'll never be able to get to your destination. Where this becomes 13 00:01:03,487 --> 00:01:07,478 interesting is if you, if you look at just the basic mesh network. Our basic mesh 14 00:01:07,478 --> 00:01:11,518 network has made it from points between, or many different routes between any two 15 00:01:11,518 --> 00:01:15,508 points. So, if we have let's say bre, four by four grid you can just sort of go 16 00:01:15,508 --> 00:01:19,698 across the top and then along the side, let's say if you're going from the top to 17 00:01:19,698 --> 00:01:23,988 the bottom or you can sort of zigzag along the middle or go the other way. Go Y and 18 00:01:23,988 --> 00:01:28,502 then X, or X and then Y, just lots of different choices along, along there. So 19 00:01:28,502 --> 00:01:35,049 that's where we start to get to decide about different routing networks. under 20 00:01:35,049 --> 00:01:40,868 oblivious newtorks, there's, there's two sort of sub points here. One is 21 00:01:40,868 --> 00:01:45,613 deterministic which basically says, do the same thing all the time, versus a 22 00:01:45,613 --> 00:01:50,293 non-deterministic design. And, you might say, what is, what is a non-deterministic 23 00:01:50,293 --> 00:01:54,973 oblivious routing algorithm? Well, it's typically a randomized routing algorithm. 24 00:01:54,973 --> 00:01:59,890 And these can actually do slightly better than the deterministic oblivious routing 25 00:01:59,890 --> 00:02:04,806 algorithms. So let's take an example of this non-deterministic, but still doesn't 26 00:02:04,806 --> 00:02:09,190 look at the state of the network. So, let's, an example of this is something 27 00:02:09,190 --> 00:02:14,311 where you, let's say, choose some node in the system at random and you route there 28 00:02:14,311 --> 00:02:19,660 first. And then, once you've routed there, you route X and then Y to the destination. 29 00:02:19,940 --> 00:02:23,834 I would say, why would you want to do this? Well, it's actually sort of scatters 30 00:02:23,834 --> 00:02:27,882 your traffic around the network and actually route in different directions and 31 00:02:27,882 --> 00:02:32,084 potentially could increase the delivered aggregate bandwidth through your network 32 00:02:32,084 --> 00:02:36,030 by doing this randomized choice at the beginning where you choose some other 33 00:02:36,030 --> 00:02:40,027 node. Which might be routing a wrong, the wrong direction first, but then you'll 34 00:02:40,027 --> 00:02:44,126 have more aggregate balance, let's say to a destination. So, you can actually have 35 00:02:44,126 --> 00:02:50,106 non-deterministic oblivious routing. Adaptive routing, I'm going to sort of not 36 00:02:50,106 --> 00:02:56,412 talk about in this class because there's a, it's a very active research area and I 37 00:02:56,412 --> 00:03:02,049 recommend you take 588 if you want to learn more about that because that's, 38 00:03:02,049 --> 00:03:08,852 that's a very broad topic where you try to look for congestions network and route 39 00:03:08,852 --> 00:03:14,074 around the congestion. And if you try to do that, you have to make sure you're 40 00:03:14,074 --> 00:03:20,202 network doesn't deadlock or have other bad conditions that happen. And to some extend 41 00:03:20,202 --> 00:03:26,260 usually these networks trade off let's say ease of analysis at least for performance. 42 00:03:27,444 --> 00:03:35,558 I wanted to talk about one quick deterministic oblivious routing algorithm 43 00:03:35,558 --> 00:04:18,920 on a mesh. So let's, let's draw a mesh here. Okay. So, we have our sixteen node, 44 00:04:19,560 --> 00:04:36,313 four array, two cube here. And we wan to route let's say from A to B. The most 45 00:04:36,313 --> 00:04:46,157 basic, or one of the more basic routing algorithms that we're going to be looking 46 00:04:46,157 --> 00:04:55,271 at here we're going to call dimension ordered routing. , And it's called 47 00:04:55,271 --> 00:05:05,401 dimension ordered routing because there's actually two dimensions here. We'll label 48 00:05:05,401 --> 00:05:12,420 this as let's say X increasing, we'll label this as Y increasing. And, and 49 00:05:12,420 --> 00:05:17,447 dimension order routing says that we order the dimensions in which we route. So, we 50 00:05:17,447 --> 00:05:22,106 always route it in one of the dimensions before we go to route in the other 51 00:05:22,106 --> 00:05:30,008 dimension. So, if we want to get from here to here, and let's say we hav e X then Y 52 00:05:30,008 --> 00:05:37,652 is our routing, we're going to route this way, from A to B first. So, now we've 53 00:05:37,652 --> 00:05:46,840 matched our X location to where B is, and then we're going to route down. So, this 54 00:05:46,840 --> 00:05:52,222 is X and Y. This is oblivious. It doesn't matter about the traffic. And it has some 55 00:05:52,222 --> 00:05:59,097 bad problems that show up if you do this. If we have let's say C and D, and it's 56 00:05:59,097 --> 00:06:04,382 trying to route from C to D, it needs to share this link with traffic going from A 57 00:06:04,382 --> 00:06:08,734 to B. And it can't route around it. It can't like, I don't use those links or 58 00:06:08,734 --> 00:06:13,143 something else. It's just not allowed in our oblivious routing algorithm, and 59 00:06:13,143 --> 00:06:17,668 definitely not allowed in our dimension ordered. Now, dimension ordered, there's 60 00:06:17,668 --> 00:06:22,251 networks that are, let's say, X and then Y and you can also have, let's say, Y and 61 00:06:22,251 --> 00:06:27,879 then X. You can, this also generalizes to three dimensional n-dimensional sort of 62 00:06:27,879 --> 00:06:33,189 spaces. You do, let's say X, then Y, then Z. Or W, then X, then Y and then Z or 63 00:06:33,189 --> 00:06:43,096 something like that in a four dimensional cube. But one of the good properties of 64 00:06:43,096 --> 00:06:48,543 dimension order routing is it's not going to deadlock. And we'll talk about that in 65 00:06:48,543 --> 00:06:53,858 a few slides. But the network itself is never going to deadlock. That doesn't mean 66 00:06:53,858 --> 00:06:59,174 you can't get message dependent deadlock. But if you wormhole through the network 67 00:06:59,174 --> 00:07:04,752 here, you can actually come up with a proof that shows that you will not ever 68 00:07:04,752 --> 00:07:10,068 deadlock this network. And the, the rough sketch of the proof look goes like this. 69 00:07:10,593 --> 00:07:19,560 By only using X links first and then using Y links, you only ever acquire an X link, 70 00:07:19,840 --> 00:07:25,583 and then acquire a Y link, and then release an X link, and then release a Y 71 00:07:25,583 --> 00:07:31,740 link. So, you'll never actually grab a resource that you have needed, you have 72 00:07:31,740 --> 00:07:38,070 needed before. And everyone else is doing that same algorithm so they will never try 73 00:07:38,070 --> 00:07:42,595 to grab a resource in order that is, is, is faulty also. So, if we go back to our 74 00:07:42,595 --> 00:07:47,421 sort of dining philosophers problem here, you're guaranteed that you're going to 75 00:07:47,421 --> 00:07:52,067 acquire resource one and then acquire resource two. And everyone's doing that 76 00:07:52,067 --> 00:07:57,134 same algorithm so they will always acquire resource one or block, waiting to acquir e 77 00:07:57,134 --> 00:08:01,901 resource one and then you acquire resource two. But you'll never see a cycle. So, 78 00:08:01,901 --> 00:08:06,609 because of that, you'll next acquire X, and then Y, and then X in this dimension 79 00:08:06,609 --> 00:08:11,499 order of routing. So, you'll never get a cycle in your dependency graph. That's the 80 00:08:11,499 --> 00:08:15,725 rough sketch. And the, the, the full sketch is a little more complicated. 81 00:08:15,725 --> 00:08:20,434 Usually, you have these two dimensional mesh networks without end around, or 82 00:08:20,434 --> 00:08:25,203 without a taurus. You have to sort of prove it in this direction and in that 83 00:08:25,203 --> 00:08:29,368 direction. And then, sort of it, decomposes into sort of X from west to 84 00:08:29,368 --> 00:08:34,137 east, and X from east to west, and vice versa. But, that's the rough sketch of the 85 00:08:34,137 --> 00:08:39,293 proof. But, I just want to get across that. It's a basic routing algorithm, but 86 00:08:39,293 --> 00:08:42,911 there's lots of other choices here. as I sort of said there was this 87 00:08:42,911 --> 00:08:46,834 non-deterministic one, there's adaptive routing algorithms. So, a good adaptive 88 00:08:46,834 --> 00:08:50,605 one, sometimes called hot potato routing, is that if you, you don't have any 89 00:08:50,605 --> 00:08:54,681 buffering in the network, so you can't buffer along the way. If you get to a node 90 00:08:54,681 --> 00:08:59,012 which has a link which is congested, you just choose randomly some other direction, 91 00:08:59,012 --> 00:09:04,018 and you route that direction. And the, the thinking here is that some point you will 92 00:09:04,018 --> 00:09:08,684 actually be able to choose again and route towards the right direction and slowly 93 00:09:08,684 --> 00:09:13,179 you'll move your, your traffic or your packet will move to the end point. But, it 94 00:09:13,179 --> 00:09:17,959 allows you to sort of go around traffic congestion points. But, I don't want to go 95 00:09:17,959 --> 00:09:23,088 into too much detail on this because this is a whole lecture in itself. Okay. So, 96 00:09:23,088 --> 00:09:30,771 let's talk about flow control. This is a aspect of networking that I really enjoy, 97 00:09:30,771 --> 00:09:37,820 actually. So, flow control is making sure that you don't drop data. Making sure that 98 00:09:37,820 --> 00:09:43,349 you have back pressure in your networks so that you don't lose, lose data in your 99 00:09:43,349 --> 00:09:48,946 network. And, this flow control can either be local on a link by link or hop by hop 100 00:09:48,946 --> 00:09:54,457 basis, or it could be round trip. So a good example of round-trip flow control is 101 00:09:54,457 --> 00:09:59,347 going back to something like TCP/IP. You actually want to rate-limit the round-trip 102 00:09:59,347 --> 00:10:03,879 flow control so you don't fl ood the network if a given link is, is too small. 103 00:10:03,879 --> 00:10:08,768 So there's actually a protocol in there which will rate-limit the round-trip flow 104 00:10:08,768 --> 00:10:13,300 control. Or a good example, actually, on a on-chip network, so something like a 105 00:10:13,300 --> 00:10:18,310 mini-core, is if you're trying to guarantee that you do not overrun a buffer 106 00:10:18,310 --> 00:10:23,152 in the memory controller. But lots of different nodes are trying to route to the 107 00:10:23,152 --> 00:10:28,213 memory controller. You can actually have some sort of counter, which effectively is 108 00:10:28,213 --> 00:10:32,493 a round trip flow control to the far away end point node and say, okay, you are 109 00:10:32,493 --> 00:10:36,993 allowed to have five outstanding memory transactions to the memory controller. But 110 00:10:36,993 --> 00:10:43,040 if you try to send a sixth, you need to wait for the app to come back. So, this is 111 00:10:43,640 --> 00:10:48,075 a end to end or long distance flow control, but each of the hops along 112 00:10:48,075 --> 00:10:53,272 network themselves may have localized full control also. So, you can actually layer 113 00:10:53,272 --> 00:10:59,039 these two things together. And typically, sort of, for on chip networks, we would 114 00:10:59,039 --> 00:11:04,115 like some properties and even in computer system networks, we typically like the 115 00:11:04,115 --> 00:11:09,190 networks to be resilient. so we don't want sort of data to be dropped. But that's 116 00:11:09,190 --> 00:11:14,388 not, not necessarily guaranteed. People have built networks where links can drop, 117 00:11:14,388 --> 00:11:18,974 can drop data. So that's another option is, you can drop the data and try to 118 00:11:18,974 --> 00:11:23,500 retransmit it if you want reliable transmission from one point to another. 119 00:11:25,500 --> 00:11:34,097 Okay. So, let's look at a quick example here of flow control in our networks. This 120 00:11:34,097 --> 00:11:40,567 is not in your handouts. so you just, pay attention. so the first type of flow 121 00:11:40,567 --> 00:11:45,852 control we're going to look at is basically a stall wire, or a big stop 122 00:11:45,852 --> 00:11:52,239 sign. So here or is what we have, is we have a sender, with a Q. So Q sub sender 123 00:11:52,239 --> 00:11:58,853 and Q sub R, or Q sub receiver. And we've named sort of all the points along here. 124 00:11:59,078 --> 00:12:04,714 let's say, in this network we have a register in between the sender and the 125 00:12:04,714 --> 00:12:10,801 receiver and this is in our link. This is sort of our link cost. Takes a cycle and 126 00:12:10,801 --> 00:12:22,096 we pipeline the wire. Okay. So now, data arrives here into D, which is our sort of 127 00:12:22,096 --> 00:12:28,194 receive point into the receiver. And at some point it decides, I can't ta ke any 128 00:12:28,194 --> 00:12:34,136 more data. I can't, can't read that data. I need to go process some of it. So, I 129 00:12:34,136 --> 00:12:40,069 need to stall the network. So it says, stop. So it says stop here. And in a 130 00:12:40,069 --> 00:12:45,138 perfect world, this goes through some combinational logic. Just completely, just 131 00:12:45,138 --> 00:12:50,401 like in your pipeline stalls this point in the pipe stalls. This flip flop stalls 132 00:12:50,401 --> 00:12:55,340 that Q, stalls the, going this way. Stalls A, stalls the sender, tells it not to 133 00:12:55,340 --> 00:13:03,472 actually send any more data. So it's a stall. We stop in D. C stops, B stops, A 134 00:13:03,472 --> 00:13:09,328 stops instantaneously. At some point, D unstalls because it's going to go read 135 00:13:09,328 --> 00:13:13,889 data more again and the network keep flowing again. So, sorry. What this is 136 00:13:13,889 --> 00:13:18,574 showing is, these are packets. These are not FITS. These are complete packets 137 00:13:18,574 --> 00:13:23,385 getting sent, or complete FLITS getting sent. This is our flow control unit. So 138 00:13:23,385 --> 00:13:28,320 we're looking at each one of these A, B's, and C's here, and D's are, are uniquely 139 00:13:28,320 --> 00:13:35,077 flow controllable units. So that's on/off flow control for combinational stall 140 00:13:35,077 --> 00:13:40,158 single. Now, the problem with this is, it took us a cycle to send data this wire 141 00:13:40,158 --> 00:13:45,500 because we had a flip flp going that way, but our stall wire runs all the way back. 142 00:13:46,444 --> 00:13:50,998 well, that's not, that's not good from a cycle time perspective. Cuz if you look at 143 00:13:50,998 --> 00:13:55,103 the combinational sort of path now, you're going to see well, maybe you have an 144 00:13:55,103 --> 00:13:59,600 output of this which says this data available that goes through some logic in 145 00:13:59,600 --> 00:14:03,873 D and that generates a stall signal. And then, has to propagate physically a long 146 00:14:03,873 --> 00:14:08,794 distance away and go into lots of other logic. And after having designed a couple 147 00:14:08,794 --> 00:14:11,882 of these sorts of networks, you're never going to be able to build that. That's, if 148 00:14:11,882 --> 00:14:15,649 you want to have a high performance network, that time is, is, is, is too 149 00:14:15,649 --> 00:14:20,914 long. Cuz if your cycle time is anywhere near the length of wire, of the delay of 150 00:14:20,914 --> 00:14:26,862 length wire you have, you're not going to be able to build them. Okay. So now, we 151 00:14:26,862 --> 00:14:31,472 start to look at, we, we switch to something and say, let's try to solve that 152 00:14:31,472 --> 00:14:38,811 and put a pipeline flop in here on the receive on the return path of the stall 153 00:14:38,811 --> 00:14:45,004 wire, of the stall signal. Okay. S o, what, what happens? Well, if we look at 154 00:14:45,004 --> 00:14:55,004 this for packet zero, one, two, three, at some point d decides to stall. But, that 155 00:14:55,004 --> 00:15:05,590 is not able to stop let's say, R from sending the cycle before. Because we now 156 00:15:05,590 --> 00:15:10,345 have a flip flop in this path. So, it doesn't know about that. Likewise, it 157 00:15:10,345 --> 00:15:17,251 doesn't know, it can't stop B and A from moving forward one cycle either. So, if we 158 00:15:17,251 --> 00:15:21,401 look at the pipeline diagram, what that really means is these arrows here, we 159 00:15:21,401 --> 00:15:25,823 can't stall this on a dime. We have to wait and we actually are not able to stall 160 00:15:25,823 --> 00:15:30,137 until the next packet. Likewise, we're not able to stall B and A until later. And 161 00:15:30,137 --> 00:15:33,905 this can even be worse if you have multiple pipeline registers going 162 00:15:33,905 --> 00:15:42,595 backwards here, which is pretty common. Now, the implication of this is, you were 163 00:15:42,595 --> 00:15:51,224 not able to stall C at all. So, like it or not, the data's coming this direction from 164 00:15:51,224 --> 00:15:59,947 our D, Q sub R. Huh. So, what do you go about doing at this point? So, this is a 165 00:15:59,947 --> 00:16:05,289 tough one. you could either drop the data. That's probably not very advantageous cuz 166 00:16:05,289 --> 00:16:10,072 you're just going to drop random data in your network. Networks like to actually 167 00:16:10,072 --> 00:16:18,758 deliver data. A better thing to do is to have extra buffering in Q sub R, and we'll 168 00:16:18,758 --> 00:16:25,628 call this skid buffering. So that the data, it's called skid buffering cuz it's 169 00:16:25,628 --> 00:16:30,167 kind of like in a car. When you hit the stop pedal or you hit the, hit the, the 170 00:16:30,167 --> 00:16:34,888 brake pedal, you need some runway to, to skid into before you can actually stop. 171 00:16:34,888 --> 00:16:39,851 So, you hit the brake, and you're a sort of skid to a stop and you need that, that 172 00:16:39,851 --> 00:16:45,117 extra area. So, we budget that into this Q here and you can actually build a network 173 00:16:45,298 --> 00:16:51,518 build a flow control system where you have extra space in Q sub R to skid into. And 174 00:16:51,518 --> 00:17:02,279 then when you unstall, you'll see actually that there's basically that should be a D, 175 00:17:02,279 --> 00:17:06,178 sorry. it takes a cycle to unstall also. So that's, that's kind of not good. 176 00:17:06,178 --> 00:17:10,447 Conveniently, you have something to feed at that point because we have that extra 177 00:17:10,447 --> 00:17:14,768 value in Q sub R. But there are cases that you can come up to with, where you stall 178 00:17:14,768 --> 00:17:18,772 and then not stall. And then stall and then not s tall. Where you can actually 179 00:17:18,772 --> 00:17:23,041 accidentally introduce bubbles with this type of flow control. And I don't have a, 180 00:17:23,041 --> 00:17:27,204 a diagram of that here. But that would show up as basically a bubble throughout 181 00:17:27,204 --> 00:17:31,419 the entire pipe and no data being read. But that's, you can construct that with a 182 00:17:31,419 --> 00:17:37,408 sort of on/off flow control pipe line flow control. So, you could also think about 183 00:17:37,408 --> 00:17:42,375 having on/off flow control of partial pipeline stall control. So what I mean by 184 00:17:42,375 --> 00:17:47,530 that is, instead of feeding up into this register here, now we need two skid buffer 185 00:17:47,530 --> 00:17:52,622 entries over there cuz we can't stop B flowing into here or C flowing into there. 186 00:17:52,622 --> 00:17:57,840 So we can't stop any of this stuff and we need two entries now in that skid buffer. 187 00:17:59,440 --> 00:18:05,104 And we're running out of time, but I wanted to try to finish this flow control 188 00:18:05,104 --> 00:18:10,566 notion up here. A good solution to this is called Credit-Based flow control. So, in 189 00:18:10,566 --> 00:18:18,337 credit-based flow control, you put a counter in the center and the counter 190 00:18:18,337 --> 00:18:25,972 counts how many entries there are in the receive FIFO. And what it does is it 191 00:18:25,972 --> 00:18:33,484 starts sending data. And whenever data is removed from this FIFO, a credit gets sent 192 00:18:33,484 --> 00:18:38,848 back. And it can be, if, like it can be pipelines multiple levels deep here on the 193 00:18:38,848 --> 00:18:44,003 return, point. When the credit comes back, you increment the counter. But when you 194 00:18:44,003 --> 00:18:48,723 send a word, you decrement the counter. And if the counter ever reaches zero, you 195 00:18:48,723 --> 00:18:56,120 stop sending. This is actually ends up being superior than this order design 196 00:18:56,120 --> 00:19:01,135 because you don't have funny stutters on the receive side when you go to build a 197 00:19:01,135 --> 00:19:06,335 credit-based flow control system. That's a little bit beyond, showing that proof is a 198 00:19:06,335 --> 00:19:11,722 little beyond this course. But I did want to say here that when you compute this, 199 00:19:11,722 --> 00:19:16,860 you need to compute how large the credit needs to be and how big this receive FIFO 200 00:19:16,860 --> 00:19:23,023 needs to be. And they need to be sized appropriately. If you size one of them 201 00:19:23,023 --> 00:19:29,294 wrong, in this case, if you size one of them wrong in the on/off flow control, 202 00:19:29,294 --> 00:19:34,346 you're actually going to end up losing data. If you skid into your skid buffer 203 00:19:34,346 --> 00:19:39,667 and you need two skid entry slots, and you only have one skid ent ry slot, you're 204 00:19:39,667 --> 00:19:44,584 going to lose data. In the credit-based flow control system, the counter just 205 00:19:44,584 --> 00:19:49,544 reaches zero. So, just impact your bandwidth, but you're not going to lose 206 00:19:49,544 --> 00:19:54,664 data. Now, having said that, you should, if you want high performance through this 207 00:19:54,664 --> 00:19:59,590 network, you should size this and this counter to be the same. And you should 208 00:19:59,590 --> 00:20:03,868 size it big to take into account the entire round trip latency of this, this 209 00:20:04,192 --> 00:20:11,296 communication. lets stop here for today. I'll briefly finish up deadlock next time. 210 00:20:11,296 --> 00:20:17,219 And then, we'll move on to building large mini core systems or large multi-core 211 00:20:17,219 --> 00:20:21,120 systems, and coherence protocols for them next lecture.