But before we do that, I wanted to talk about programming and a competitor to using shared memory for programming. So far, we have looked at a shared memory architecture to communicate from one core to another core. What that means is that one core does a store to some memory address, and sometime in the future another core goes to read that memory address and gets the data. What's interesting about that paradigm is that the sender of the data, the core which is writing the data or doing the store, does not need to know which core is going to read the data in the future. It's also very possible that no one will ever go read that data, and in a shared memory architecture that doesn't affect anything. One core writes a piece of data into memory at some address, and given this implicit name we effectively have an associative match: you can look up the piece of data based on the address and go find it sometime in the future. Now, you probably need to use locking to guarantee causality between one processor writing the data and another processor reading the data in shared memory, but you don't need to know the destination.

In contrast, we're going to talk about explicit message passing as a programming model. There are lots of hardware implementations you can use to implement this, but let's start off by talking about explicit messaging as a programming model to communicate between different threads, processes, or cores. What we're going to have here is an API, a programming interface, with a send that names the destination and passes in a pointer to some data. Our messaging API is going to take this data and somehow get it to the receiver, the destination. On the receive side, the most basic thing you can do is receive data. Note that this receive does not denote who it's receiving data from. So a common thing to do is just receive in order: receive the next piece of data that comes in. Now, that may not be super useful, so you may want to extend this so that the receive also takes a source. But it's not required. The reason I'm talking about this in such an abstract fashion is that there are many different explicit message passing programming interfaces, because it's just software that people have implemented over the years.
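To make that concrete, a minimal send/receive interface might look something like the sketch below. These function names and signatures are purely illustrative; they are not taken from the lecture or from any particular library.

    #include <stddef.h>

    /* Hypothetical message-passing interface, for illustration only. */

    /* Send len bytes starting at data to the process identified by dest. */
    int msg_send(int dest, const void *data, size_t len);

    /* Receive the next incoming message into buf; the sender is not named. */
    int msg_recv(void *buf, size_t max_len);

    /* Optional variant: only accept messages from a particular source. */
    int msg_recv_from(int source, void *buf, size_t max_len);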
But at the fundamental level, there's a send and a receive. You can think about it as: the send requires a destination, and the receive can just pull from, basically, the input queue, or it can try to receive from a pre-specified source. Another way to do this is what they do in MPI, which stands for the Message Passing Interface and is probably the most common programming interface for messaging: the receive won't take a source, but instead it will take a tag, which is effectively a number that says, not what message type it is, sorry, but what traffic flow it is. You sort of name the flow. A tag is sort of a more generalized form of what TCP/IP has with ports, for instance, right? And UDP has ports. Okay. So, questions so far about this basic programming model and why it's different from loads and stores through shared addresses?

It's very different. They are duals of each other. I'm going to tell you right now that if you can implement shared memory, you can implement messaging, and vice versa. There have been lots of fights over this, and at some point the whole community realized that you can do either with either: for any programming model, you can implement one in the other.

Okay. So, the default message type we're going to talk about is unicast. That's one-to-one. If you send to a destination, the destination is one other node, or one other process, or one other thread. The reason I say process, thread, and node is that it depends on how the API is structured. You could use an explicit message passing system on something like a uniprocessor, where different threads or different processes are running and you are only messaging from one process to another process, even though it is one core. But there's a pretty natural extension of that: you can take those two processes, put them on different cores, and have them communicate using the same interface. We could also do this between two threads, but that's less common. Okay. So, unicast is one-to-one. Some messaging networks and some messaging primitives, some programming models really, will have a couple of other choices here. We can have multicast. This is communication from one node, but instead of having a destination here which denotes one other location, it can denote a set of other parties to communicate with.
It is still messaging, except we've expanded our notion of destination here so it can be a set, not just a single thing. We can also have broadcast, where one node, instead of having a destination here, has some magical flag which says: communicate this with every other node in the system, or every other process in the system, or every other thread in the system, or every other process in that process group or thread group. There are ways to say that in these software protocols. Just a quick question here: has anyone here programmed in MPI? Okay. And done a bit of parallel programming once in a while? Good.

So, I'm going to show a brief code example here of one process communicating with another process via a very common messaging library called the Message Passing Interface, or MPI for short. Let's look at this piece of code. It's C code, and what's happening here is: we start our program, and we have myid. The programming model of MPI is that it will actually start the same program on multiple cores or in multiple processes; it's not threads, but multiple processes, or multiple cores, will start up the exact same program. This is sometimes called SPMD, the single program, multiple data programming model. So in the SPMD model here, what you're going to notice is that we're actually going to execute this program twice: once on one core, and once on the second core. Now, you're going to say, well, if you're executing the exact same program, are they going to do the exact same thing? Well, let's take a look.

Okay. So, we have this thing called myid. This is what tells us which core we are. And there's something here called numprocs, which is going to tell us how many processors there are in the computer. And we fill these in. You see here, we start off by calling MPI_Init. MPI_Init sets up the message passing system from a software perspective. Then we call MPI_Comm_size with some magic parameters, and it's going to fill in this field, which tells us how many processors there are in this MPI program. When you launch an MPI program, there's a special MPI start command that you have to run, which takes as parameters the program you are trying to run and the number of processes you are trying to run it on.
So, our program can now dynamically detect how many processors we're running on. It can also detect what's called our rank, which is our ID. So now we can detect whether we're the first process or the second process launched in a two-process system, which one we are, and we fill in myid with the rank. This is a library which will figure this out for us in this MPI implementation. Okay. So, we assert now that the number of procs equals two. This is just to make sure we're not running on three processors or ten processors or something like that. We want strictly two, and not one, so the assert will fail if we have a problem there.

Okay. So now, this is where we can do different things. Single program, multiple data: the reason it's called this is that we can make decisions based on our processor ID, or myid here. So, we'll do something which says: is myid zero? If so, do this; else, do this other piece of code, okay? So, does anyone want to wager a guess at what this program does so far? One processor is going to be executing this, and the other processor is going to be executing that.

The first processor executes a send. Let's look at this first line here and see what happens. This is the data we're trying to send: we're sending x, which is an integer, which we load with the number 475, which is our class, or course, number. And we send that with a particular tag. MPI is structured around this notion of tags, which is how you can effectively connect up a sender and a receiver; sorry, it's how you can connect up multiple pieces of traffic between senders and receivers. The other way you figure out a match is by looking at a number saying who you're sending to and receiving from, which is the core number. So, what this says here is: send x to rank number one with a tag, and we're going to say the tag is 475 also; the rest of this MPI stuff here, don't worry about. And this is a length. So, this is going to say: send one word of size integer to processor one with a particular tag. And this MPI_COMM_WORLD is basically extra flags you can pass which say, do I want to send to all other MPI sub-processes, or you can make subgroups; there's this notion of groups that MPI has, but it's more complicated. Okay. So, at the same time, this other program is executing here. The first thing it's going to do is a receive.
So, if it gets here first, it's just going to block and wait in this receive. Conveniently, the first processor did a send, and it has a matched receive here. So, it's going to receive, with tag 475, a message of length one of an MPI integer from rank zero, and return a status code. It's going to fill in y with the data that got sent in this message. That's pretty fun: we just communicated information between two processes.

Okay. So, let's walk through what's happening here. Process zero does the send, so it's going to send 475 to processor one. Processor one could have gotten here early, at this receive. Processor one does not execute any of this other code, and it has a different address space; the x's and y's are different, we have no shared memory here. It gets here, and it's going to do the receive. At some point the message shows up at the receiver. It gets received and put into y; it just fills it in, which is why we pass the address of y. Now, we're going to increment y by 105, so we did some computation on this node. At the same time, while we're doing this receive, and this increment, and this send, process zero is basically just sitting here waiting, trying to receive. So, we have process zero send some data to process one. Process one is going to do some math on it, incrementing the number, and it's going to send that number back. So, it does the send back here with y again: we're going to send y to process zero, of length one, and when the message shows up, we're going to receive it in process zero, and process zero is going to print out a message. Okay. What does it say? Received number 580. Yes, you should go to 580A, not 580, not 580 without an A, because 580 without an A is not the parallel programming class, that's the security class next to it. But 580A, you should go to it. So, it prints out the number. So, we can communicate from one process to the other process and back with messaging.

One of the things we should note here is that we are both moving data and synchronizing at the same time in this message. So, it's doing two different operations here. So, this is a great question. This is a programming model right now; there are many different ways to implement MPI, and people have implemented the interface of MPI send and MPI receive in many different ways.
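The slide code itself isn't reproduced in this transcript, but based on the walkthrough above it is roughly the following. This is a reconstruction using the standard MPI C API; the exact variable names and argument spellings on the slide may differ.

    #include <assert.h>
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int myid, numprocs;
        int x, y;
        int tag = 475;
        MPI_Status status;

        MPI_Init(&argc, &argv);                    /* set up the message passing system */
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);  /* how many processes were launched  */
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);      /* our rank: which process we are    */
        assert(numprocs == 2);                     /* this example wants exactly two    */

        if (myid == 0) {
            x = 475;                               /* the course number                 */
            /* send one integer, tag 475, to rank 1 */
            MPI_Send(&x, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
            /* block until rank 1 sends the incremented value back */
            MPI_Recv(&y, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &status);
            printf("Received number %d\n", y);     /* prints 580 */
        } else {
            /* block until the message from rank 0 shows up; the data lands in y */
            MPI_Recv(&y, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
            y += 105;                              /* do some computation on this node  */
            MPI_Send(&y, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

A program like this is typically launched with a command along the lines of "mpirun -np 2 ./program", which is the special MPI start command mentioned earlier.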
For instance, people have implemented this over a hardware message passing network, which does not have to go into the OS. You do a send, and there's effectively hardware there which implements MPI send and receive: it sends the data out over the network and receives it on the receive side, with some interconnection network in the middle. That's one way. Another way is to run MPI over a shared memory machine, at which point MPI send, and this is going to sound kind of funny, basically takes the data and copies it into RAM somewhere, and then when the receiver goes to receive it, it looks it up in a hash table, finds the location in RAM, and reads from that pointer. That's an option, and it is actually one of the most common ways people run MPI today; it doesn't have to go into the OS because it's just shared memory. On small machines, people run MPI that way. And it has typically had better performance than the equivalent shared memory program. The reason is that you've made the communication explicit, so the coherence protocol is basically able to optimize for producer-consumer: you have a write to some location, and then someone else is going to read it. You're not going to have random false sharing problems, and you're not going to have lots of other problems. So, that's pretty common.

Where MPI has its biggest use today is in massively parallel computations. This is the supercomputers of the world, all the supercomputers minus the vector supercomputers, which typically don't use MPI: the massively parallel computers. So, the biggest computers in the world right now, the Roadrunner computer at Los Alamos, for instance, and similar sorts of computers, will have special network cards that effectively implement MPI, and they do it in user space. Now, another way to do it is to implement MPI over TCP/IP, going through the operating system. That's pretty common when people just run small clusters of computers: MPI send and receive will trap into the operating system. It goes into the library first, the library reads and writes to sockets, and at some point that goes out over the network card.
All of those are possible, and all of those have very optimized MPI implementations, because this is the most widely used parallel messaging library for high performance applications. So, people have optimized this quite a bit.

Okay. So, this brings us to the question, and there are a lot of words on this slide, but I do want to talk about this, of message passing versus shared memory. We have two different parallel programming models. Message passing, where you have to explicitly name your destinations. And shared memory, where you write to some location and at some point in the future someone goes and reads from it, and you don't know who the reader is. It could be any of the processors in the system; it could even be literally random, like you read some random number and it tells you who is going to do the read of that location. But if you think about it, because you can write to a location and anyone else can go read it in shared memory, you effectively have one-to-all communication, whereas message passing allows your microarchitecture and your system to optimize around point-to-point communication.

Okay. So, let's compare these two things. We have message passing here. In message passing, memory is typically private, per node or per process or per core. So, memory is not shared. In shared memory, by definition, memory is shared. Message passing has explicit sends and receives in our software code: we're actually going to put an MPI send and an MPI receive in there, actual sends and receives. Conversely, in shared memory we have implicit communication via loads and stores. A load or a store names an address, but it does not name a core number or a process number or a thread number. Message passing moves data: we do a send and we put some data in it. Also, by definition of receiving a message on the receive side, there's some synchronization there; there's a producer-consumer relationship from the sender to the receiver. In shared memory, you have to explicitly have synchronization. Sorry, I should say explicit: you need to add explicit synchronization via fences, locks, and flags. You need to add something in there to do synchronization, and if you don't have synchronization, you could have race conditions. You can get the wrong data.
You can pick up incorrect data, which is a pretty common programming error in shared memory programming. And you don't need to know the destination in shared memory, as we said before.

From a programming perspective, what I wanted to point out here is that message passing is very natural for producer-consumer style computations. If you have one node producing some value and another node reading the value, it's very, very natural: you set up a send from one node to the other node, receive it at the other node, and you have a channel between them over which you can communicate. You can just send data down the channel, it's all in order, we'll say, and stuff comes out the other side. Very natural for producer-consumer. In shared memory, you have to implement producer-consumer like we did last lecture, where you have something like a FIFO in memory and locks on that structure.

But what is easy in shared memory, and hard to do in message passing, is having a large shared data structure. Take, for example, a big table that you're trying to process in parallel, and you have, I don't know, lots of little files. You give each of these files to a different processor. The processor reads its file, and what we're trying to do is build a histogram: it reads some number out of a file and, based on that number, it looks at a shared data structure and increments that location by one. Now, because it's shared memory, you want to make sure that two processors or two threads or two processes are not trying to increment the number at the same time, so you probably want to lock that location in the table, or lock the whole table, increment it, and then unlock it, so you don't have two people trying to increment it at the same time (there's a small sketch of this below). But it's very natural to build a shared table in shared memory, have locks on that shared table, and have different people operating on that shared piece of data at the same time.

So, the interesting thing here is that you can actually tunnel shared memory over messaging, and you can tunnel messaging over shared memory. Let's look at the first example, how to implement shared memory on top of messaging. You could do this in software.
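Here is the small sketch of that shared-table histogram promised above, written with pthreads. It is a minimal illustration, not the lecture's code: the file-reading part is replaced by in-memory arrays of values, and all of the names (histogram, table_lock, count, and so on) are made up.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_BINS    16
    #define NUM_THREADS 2

    static int histogram[NUM_BINS];          /* the shared table */
    static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Stand-in for "read numbers out of your files": each thread gets an array. */
    static int inputs[NUM_THREADS][4] = {
        { 1, 3, 3, 7 },
        { 2, 3, 5, 7 },
    };

    static void *count(void *arg) {
        int *vals = (int *)arg;
        for (int i = 0; i < 4; i++) {
            pthread_mutex_lock(&table_lock);     /* lock the whole table...   */
            histogram[vals[i] % NUM_BINS]++;     /* ...increment one bin...   */
            pthread_mutex_unlock(&table_lock);   /* ...and unlock it          */
        }
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_create(&threads[t], NULL, count, inputs[t]);
        for (int t = 0; t < NUM_THREADS; t++)
            pthread_join(threads[t], NULL);
        for (int b = 0; b < NUM_BINS; b++)
            if (histogram[b]) printf("bin %d: %d\n", b, histogram[b]);
        return 0;
    }

A finer-grained version could keep one lock per bin instead of one lock for the whole table; that is the "lock that location in the table or lock the whole table" choice mentioned above.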
Back to implementing shared memory on top of messaging in software: the way you do this is effectively to turn all of your loads and stores into sends and receives. Maybe there is one centralized node which holds the whole notion of memory, that's one way to do it, or you distribute it somehow. It's going to be pretty painful to do, but people have implemented systems like this, where you actually take loads and stores, or your compiler goes and picks out loads and stores, and turn them into messages.

A more common thing is to have hardware that automatically turns communication into messaging. So, we're going to take this notion of a programming model and instead turn it into a broader notion of entities trying to communicate via explicit messages that are sent over some sort of interconnect. And this is actually the most common way these days that people go about implementing large shared memory machines: the memory traffic gets packetized, and we'll talk about packetization in a second, then it gets sent over a network, received, and some receiver does something with it. As an example, let's say we have a core here and this core wants to do a load, and the data is in main memory. On a bus, the core could just shout, "I need address five," and the memory would shout back, "Got it, address five has the value six." If we have a switched interconnect, with switches in here, we can have the core actually packetize the memory request. It takes the load, makes a message out of it, and sends it over the network, and the message shows up at the memory. The memory reads the message, makes a response, and sends it back over the network as well. So effectively, we can tunnel shared memory over a message network. For large cache coherence systems, this is actually the most common thing that happens. We'll be talking about this in greater depth in two classes, when we cover directory-based protocols, which effectively implement shared memory over these switched interconnection networks.

You can also implement messaging over shared memory, and this is pretty common in small systems that have shared memory and no other way to communicate. This is exactly what we talked about last lecture: we had a FIFO, we had head and tail pointers, and a producer can effectively enqueue onto this queue in main memory.
A consumer can go read from it, and you can implement messaging this way. So, I guess what I'm trying to get at here is that messaging and shared memory are duals of each other. One may be more natural for one thing or another, but you can implement any algorithm with either one; you can implement one in the other. And that's been shown at this point.
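To close the loop on that last point, here is a minimal sketch of messaging layered on shared memory: a single-producer, single-consumer ring buffer with head and tail pointers, in the spirit of the FIFO from last lecture. The names and sizes are illustrative, and C11 acquire/release atomics stand in for the fences discussed previously.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define QUEUE_SIZE 64u                   /* number of slots, a power of two */

    struct msg_queue {
        int data[QUEUE_SIZE];
        atomic_uint head;                    /* advanced by the consumer */
        atomic_uint tail;                    /* advanced by the producer */
        /* head and tail start at zero, e.g. zero-initialized shared memory */
    };

    /* Producer side: "send" a message by enqueueing it. Returns false if full. */
    bool queue_send(struct msg_queue *q, int value) {
        unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
        unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
        if (tail - head == QUEUE_SIZE)
            return false;                    /* queue full */
        q->data[tail % QUEUE_SIZE] = value;
        /* release: the data write must be visible before the new tail is */
        atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
        return true;
    }

    /* Consumer side: "receive" the next message. Returns false if empty. */
    bool queue_recv(struct msg_queue *q, int *value) {
        unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
        unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (head == tail)
            return false;                    /* queue empty */
        *value = q->data[head % QUEUE_SIZE];
        atomic_store_explicit(&q->head, head + 1, memory_order_release);
        return true;
    }

One process (or thread) calls queue_send on a queue placed in shared memory and the other calls queue_recv; both the data movement and the synchronization ride on ordinary loads and stores.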