1
00:00:03,200 --> 00:00:10,170
So today we're going to start off and it
is our final installment of ELE475. We

2
00:00:10,170 --> 00:00:15,636
have to cover all of the rest of computer
architecture in this one lecture. So,

3
00:00:15,636 --> 00:00:21,312
there's a lot to cover. A lot of things to
discuss. But, more seriously, today, we

4
00:00:21,312 --> 00:00:26,427
are going to be finishing up
what we were talking about with

5
00:00:26,427 --> 00:00:31,823
interconnection networks. Mainly, credit
based flow control. A little bit about

6
00:00:31,823 --> 00:00:37,149
deadlock and that will complete our
interconnection networks. And then we'll

7
00:00:37,149 --> 00:00:41,696
go on to more scalable cache-coherent
systems. So cache-coherent systems that

8
00:00:41,696 --> 00:00:46,622
have more than let's say eight nodes. So
we'll look at how to scale up to thousands

9
00:00:46,622 --> 00:00:51,369
of nodes, and we'll touch on one coherence
protocol that works for that. And

10
00:00:51,369 --> 00:00:57,726
that's called directory-based cache
coherence. So where we left off last time, we were

11
00:00:57,726 --> 00:01:04,621
talking about flow control between two
separate nodes in an interconnection network.

12
00:01:04,621 --> 00:01:10,118
And we talked about local link-based
or hop-based flow control, which is

13
00:01:10,118 --> 00:01:15,478
what we spent the end of last class
talking about. We also mentioned this end

14
00:01:15,478 --> 00:01:21,112
to end flow control, and end to end flow
control is important. A good example of

15
00:01:21,112 --> 00:01:27,268
this is something where you have a core
which is trying to communicate to a memory

16
00:01:27,268 --> 00:01:32,002
controller. And you don't want to overrun
the buffer in the memory controller, because

17
00:01:32,002 --> 00:01:36,538
if you overrun the buffer in the memory
controller your memory transactions just

18
00:01:36,538 --> 00:01:41,058
drop on the floor. So it's possible that
your network connection is link-level

19
00:01:41,058 --> 00:01:46,370
flow controlled or hop-based flow
controlled, but you still need end to

20
00:01:46,370 --> 00:01:51,892
end flow control inside of your chip or
your set of chips in your system to be

21
00:01:51,892 --> 00:01:57,693
able to prevent you from overrunning some
other buffer that's farther away. Now you

22
00:01:57,693 --> 00:02:03,004
could, for instance, back up into the
network and have the local flow control

23
00:02:03,004 --> 00:02:08,898
back up all the way to the
core. You may not want to do that for a

24
00:02:08,898 --> 00:02:13,388
variety of reasons. One: if you look at
these memory protocols very carefully, you

25
00:02:13,388 --> 00:02:17,231
could end up with something that actually
starts to look like a deadlock pretty

26
00:02:17,231 --> 00:02:20,930
quickly as you start to back up into the
network and get priorities mixed.

27
00:02:20,930 --> 00:02:27,793
Also, more insidiously,
as you back up, this is probably not good

28
00:02:27,793 --> 00:02:32,600
for performance. You probably want to stem
the flow of traffic as soon as you can,

29
00:02:32,600 --> 00:02:37,347
because if you start jamming more data in
there, you're just going to increase the

30
00:02:37,347 --> 00:02:41,296
contention on your network. And the
latency will shoot through the roof on

31
00:02:41,296 --> 00:02:45,290
your network, and all of a sudden
you're in a very poor operating

32
00:02:45,290 --> 00:02:49,484
regime. So it's probably better just to
preemptively back off and not overrun the

33
00:02:49,484 --> 00:02:53,678
buffers that are far away. So, you have to
worry about end to end flow control. And

34
00:02:53,678 --> 00:02:57,323
there are lots of different
schemes for this. Probably one of the

35
00:02:57,323 --> 00:03:01,566
better ones is that you send some data and
you wait for acknowledgments to come back

36
00:03:01,566 --> 00:03:05,960
and you count your acknowledgments, and
this is effectively a form of credit-based flow

37
00:03:05,960 --> 00:03:14,016
control. We talked a little bit about
different ways to do link-level flow control.

38
00:03:14,016 --> 00:03:18,693
So just to recall, here we had one queue,
another queue, and some link in the middle.

39
00:03:18,693 --> 00:03:23,434
This link may be pipelined. And we sent
data this way. And at some point the

40
00:03:23,434 --> 00:03:28,490
receiver says, "Oh, I can't take any more
data." So it asserts a stall wire. But if you

41
00:03:28,490 --> 00:03:33,357
do this around your entire chip, where
it's all combinational, where all these

42
00:03:33,357 --> 00:03:38,730
little blobs here are combinational logic,
your critical path gets very long, so you

43
00:03:38,730 --> 00:03:43,850
can start to think about trying to put
registers on this path. Unfortunately, when

44
00:03:43,850 --> 00:03:50,037
you do that, all of a sudden this FIFO and
this register can't react in time if

45
00:03:50,037 --> 00:03:54,862
a stall signal comes back. So if
a stall signal is asserted, it's going to

46
00:03:55,265 --> 00:04:00,626
send the data no matter what. It takes a
cycle for that to show up so you end up

47
00:04:00,626 --> 00:04:06,121
with something where you need to queue
this last piece of data here into a buffer

48
00:04:06,121 --> 00:04:11,214
because this stall is not seen until a
cycle later. We call this

49
00:04:11,214 --> 00:04:15,523
skid buffering. And you can have similar
sorts of things where if you have let's

50
00:04:15,523 --> 00:04:19,358
say, a flip-flop here, but you don't feed
into this register, you might need multiple

51
00:04:19,358 --> 00:04:24,506
entries of skid buffering. Now, if you
have the wrong number of buffers here on

52
00:04:24,506 --> 00:04:29,134
the receiver in your skid buffering what's
going to happen is you actually end up

53
00:04:29,134 --> 00:04:33,821
dropping data. So if your protocol
needs, let's say, two buffers and instead you

54
00:04:33,821 --> 00:04:38,860
put one buffer, and you assert the stall as
data is trying to transmit across the link

55
00:04:38,860 --> 00:04:43,254
at that time, you're going to lose a
piece of data, and that's not very

56
00:04:43,254 --> 00:04:48,851
desirable. So this brings us to the end of
what we were talking about last time which

57
00:04:48,851 --> 00:04:53,280
was credit-based flow control. With
credit-based flow control, instead of having a

58
00:04:53,280 --> 00:04:57,377
stop signal or an on/off flow control
signal coming back, or a stall signal,

59
00:04:57,377 --> 00:05:03,648
you keep a counter at the sender
side, which keeps track of how many

60
00:05:03,648 --> 00:05:08,421
entries there are over here in the
receiver side. And this can take into

61
00:05:08,421 --> 00:05:13,858
account that this register here
doesn't get counted; it's the end

62
00:05:13,858 --> 00:05:20,193
point FIFO space that will back up and that
data can be stored into. So when it starts

63
00:05:20,193 --> 00:05:25,115
out, you set the counter, if
you want full bandwidth, to be the same

64
00:05:25,115 --> 00:05:29,976
number as the entries you have in the receiver.
Then you just send data. Whenever you

65
00:05:29,976 --> 00:05:34,777
send a word, you decrement your counter.
When the counter reaches zero, you stop

66
00:05:34,777 --> 00:05:41,625
sending, because you know that all of
these buffers may be in use. Consider the round-trip latency

67
00:05:41,625 --> 00:05:46,487
here of the data and the responses
coming back, or the credits coming back.

68
00:05:46,487 --> 00:05:51,536
If the stall signal were to be asserted,
or if you were not to get back a credit in

69
00:05:51,536 --> 00:05:57,822
that same cycle, you would need all
those entries to skid into. When a word

70
00:05:57,822 --> 00:06:02,833
gets read out of this buffer here, or out
of this FIFO here, you send back a credit,

71
00:06:02,833 --> 00:06:07,784
and this will increment your counter. And
depending on how you implement this, you

72
00:06:07,784 --> 00:06:12,490
could have multiple flip-flops here,
multiple flip-flops there. And really all

73
00:06:12,490 --> 00:06:17,502
this ends up doing is
determining your credit loop and how big

74
00:06:17,502 --> 00:06:23,140
this counter needs to be. One other nice
benefit of this credit based flow control

75
00:06:23,140 --> 00:06:28,622
system is you can actually size the credit
counter differently than the number of

76
00:06:28,622 --> 00:06:33,415
actual entries. Now, why would you want to
do this? Well, one reason is, you could

77
00:06:33,415 --> 00:06:38,416
actually build a network which has only,
let's say, half the bandwidth, by reducing

78
00:06:38,416 --> 00:06:42,861
the number of entries over here, and
reducing the credit counter. Now, the

79
00:06:42,861 --> 00:06:47,615
round trip latency is longer than the
number of credits that you can have

80
00:06:47,615 --> 00:06:52,368
outstanding, so what's going to happen is,
you're going to send some data. And you're

81
00:06:52,368 --> 00:06:56,576
going to stall early, wait for some
credits to come back and then start

82
00:06:56,576 --> 00:07:01,450
sending more data. So you can effectively
get less than ideal bandwidth across the

83
00:07:01,450 --> 00:07:06,503
link, but you can do it with less buffer space on
the receive side, and this is a lot better

84
00:07:06,503 --> 00:07:11,437
than the other on/off-based flow control,
where if you don't have the right number

85
00:07:11,437 --> 00:07:16,431
of buffers, you actually end up losing data,
so it's an incorrect design. Here it's a

86
00:07:16,431 --> 00:07:17,620
performance concern.