Okay, so now we're going to spend the rest of the lecture talking about different coherence protocols on a bus, and their relative merits. I want to contrast this with what we're going to talk about in two lectures, where we'll look at coherence protocols across switched interconnects, places where you don't have a shared broadcast medium.

So as a warm-up here, we're going to start off by looking at what can happen when I/O transactions happen at the same time as memory transactions in a uniprocessor. Let's take a look at where you can have consistency problems in a uniprocessor system, as a warm-up and a motivator.

So here we have a processor, and that's its cache, and this is memory. And then somewhere on the memory bus here we hang a DMA agent, or direct memory access agent. Sometimes it's called a bus master, even; that term comes from having multiple agents which can effectively drive transactions onto the main memory bus. So if you go look at, for instance, PCI or PCI Express, which are the extension buses for your system, they'll use the term bus mastering. What that really means is that there's a DMA engine out at the I/O device.

Okay, so what I'm trying to get across here is that in a uniprocessor system you can actually overlap computation with moving data from the disk to main memory, without having to use the processor. Originally, machines required programmed I/O in order to go access the disk: the processor would read an address on the device, pull the data in, and then store it out to memory, one word at a time. As you can tell, that requires the processor to copy all of this memory itself, and it's kind of slow. So people decided, let's put direct memory access engines out at the I/O devices.

This actually goes back to early, early computers. Mainframes had very sophisticated DMA; the IBM System/360 had channel processors, effectively programmable DMA engines that could run their own I/O programs. But the simplest case is an engine that has registers which say where on disk, where in memory, and how long, plus a go button. And you could also run the transfer the other way.

Okay, so let's look at this from a coherence perspective, and let's say this cache is write-back. Let's look at a memory-to-disk transaction. We program the DMA engine, saying copy this location in memory to this location on the disk, and we tell it to go. Now, while it's doing that, the processor writes to an address inside that page. What happens? The cache is not write-through, and there's no coherence protocol really going on in this case. Well, maybe some of those dirty cache lines get written back out to main memory before the DMA engine reads them, and some don't. So the disk gets some of the new values and some of the old values. The point I'm trying to make here is that you don't know what's going to happen, and that's a little scary. You want determinism in your system.

Now, you can say, well, maybe the processor shouldn't go and write to those memory addresses while the transfer is in flight. That is a valid solution, and it's actually pretty common on modern-day systems: the OS will know when a DMA transfer is in flight, and it will just make sure not to go access those memory addresses.
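To make that race concrete, here's a minimal C sketch. The register layout and the MMIO base address are completely made up for illustration; real DMA engines differ, but the simplest design described above is exactly this: a source address, a destination, a length, and a go bit.

```c
#include <stdint.h>

/* Hypothetical memory-mapped DMA engine registers: source memory
 * address, destination disk sector, length, and a "go" button. */
typedef struct {
    volatile uint64_t mem_addr;    /* source: physical memory address */
    volatile uint64_t disk_sector; /* destination: disk sector number */
    volatile uint32_t len;         /* transfer length in bytes        */
    volatile uint32_t go;          /* write 1 to start the transfer   */
} dma_regs_t;

/* Made-up MMIO base address for this imaginary device. */
#define DMA ((dma_regs_t *)(uintptr_t)0xFEE00000UL)

static char page[4096];

void racy_write_to_disk(void)
{
    /* Program the engine and hit the go button. */
    DMA->mem_addr    = (uint64_t)(uintptr_t)page;
    DMA->disk_sector = 42;
    DMA->len         = sizeof(page);
    DMA->go          = 1;

    /* BUG: the transfer is still in flight.  With a write-back cache
     * and no coherence protocol, this store may sit as a dirty line
     * in the cache while the DMA engine reads main memory directly,
     * so the disk gets an unpredictable mix of old and new values.  */
    page[100] = 'X';
}
```

The common software fix is exactly what the lecture describes: the OS treats the page as off-limits until the device signals completion.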
Well, what I was trying to introduce here is that even a uniprocessor system can have coherence problems, depending on where the different copies of an address live. And you can also have the transfer going the other way, from the disk to main memory. This is actually probably the more interesting case: you have some data in the CPU's cache, and the DMA engine is writing from the disk into those same physical memory locations. But the CPU's cache doesn't pick up the new values. So when the processor wants to go and read that data, like reading a file off a disk or something, it's going to get the wrong, stale value.

So this moves us to our first idea in coherence protocols, one that has a funny name: snoopy caches. No, this is not named for the dog in the Peanuts cartoons. Instead, Jim Goodman, who's a professor, and I believe one of his students, came up with the idea that you have the cache snoop on the bus to see what's going on, and effectively update the cache with the data that's flying by on the bus.

So if we look at this from a little bit more of a hardware perspective: we have our cache and we have the tags, and we add a snoop port into the cache, which is effectively sitting there watching the bus. And if an address that is in the cache slides by on the bus, the cache needs to do something about it. It probably needs to invalidate the address if it's a write occurring across the bus. You can also have it go the other way: if you have a DMA engine which is reading from memory, and the data is dirty in the cache and not in memory, because it's a write-back cache, the cache may need to provide the data to the I/O device that's trying to read from memory, effectively overriding what's coming from main memory. Now, there's also arbitration actually happening there on the bus, to determine who has the actual, up-to-date data. We'll talk a little bit more in today's lecture about what the right thing to do is in these interesting corner cases. There's a little sketch of this snoop logic at the end of this section.

Before we move on here, notice this is getting a little bit hard to build. Maybe back in 1983, this wasn't so bad, but nowadays we'd have to dual-port our tags. 'Cause in the snoopy protocol, we're going to need all possible memory transactions that are going on, by any processor and/or any DMA engine, to be checked against every other caching entity in the system. That's a fair amount of bandwidth coming into the tags. And you could add a second port, and that's okay if the cache is sort of farther out, but it might slow down your cache if it's the level-one cache. So typically the way people build this is they'll snoop on the level-two cache and have a level-one cache which is not necessarily snooped.

Now, this is where we get back to inclusive versus exclusive caches. If your level-two cache is inclusive of all the data in level one, this is actually not so bad to do, because anything in the level-one cache is guaranteed to have its tags in the level-two cache; if a snoop misses in the level-two tags, you don't have to go all the way to the level-one cache at all. If you use exclusive caches, well, life gets a lot harder, because basically you need to check with the level-one cache as well. So anyway, all I was trying to get across here is that this significantly increases the price of your cache design, your tag design, as you add more ports to it.
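Here's the promised minimal sketch of that snoop logic, written as C over a toy direct-mapped, write-back cache. The types and the bus interface are invented for illustration; in real hardware this is logic on the snoop port, not software.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy direct-mapped, write-back cache with 64-byte lines. */
typedef enum { BUS_READ, BUS_WRITE } bus_op_t;
typedef struct {
    bool     valid;
    bool     dirty;
    uint64_t line_addr;   /* full line address, i.e. addr >> 6 */
    uint8_t  data[64];
} line_t;

#define NLINES 512
static line_t cache[NLINES];

/* Conceptually invoked once per transaction observed on the bus. */
void snoop(bus_op_t op, uint64_t addr, uint8_t *bus_data)
{
    uint64_t line = addr >> 6;
    line_t  *l    = &cache[line % NLINES];

    if (!l->valid || l->line_addr != line)
        return;                /* address not cached: nothing to do */

    if (op == BUS_WRITE) {
        /* Another agent (a DMA engine, another processor) is writing
         * this line, so our copy is now stale: invalidate it.       */
        l->valid = false;
    } else if (op == BUS_READ && l->dirty) {
        /* We hold the only up-to-date copy in a write-back cache:
         * supply the data, overriding main memory's stale response. */
        memcpy(bus_data, l->data, sizeof l->data);
    }
}
```

Notice that every bus transaction costs a tag lookup here, which is exactly why the tag-port bandwidth becomes the problem discussed above.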
And it's not actually just an area question. I mean, adding ports makes the tag array larger, but it also puts more pressure on your clock cycle, especially if it's your level-one tags, since those are on a critical path in your processor: every load and store has to go look them up. One way around all this is to keep a single-ported tag structure and somehow delay the cache snoop transaction while you arbitrate and wait for a turn to go access the tags. Or you can multiplex, let's say, the one port of the tags every other cycle: one cycle is for the main processor, one cycle is for the snoop transaction. There's a little sketch of that multiplexing idea below.

So just to have a little more of a block-diagram view of this: we have a bus with multiple processors, and our snoopy cache, which actually has to see all of the DMA traffic and all of the other processors' traffic across this bus. So that's a lot of bandwidth, because essentially you might have to broadcast all of your actions from one processor to all the other processors. Later we'll look at techniques to reduce the requirements of this broadcast down to a subset of the traffic.

Okay, so questions so far, about snooping and adding snoop ports, before going into the protocols?
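And here is that promised sketch of the time-multiplexed single tag port. The even/odd split is just one arbitrary policy, chosen to keep the example tiny; a real design might arbitrate dynamically instead.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* A single-ported tag array whose one port alternates owners:
 * even cycles serve the processor's own loads and stores,
 * odd cycles serve the snoop port.  Whoever loses a cycle
 * simply stalls and retries on its next slot. */
static bool tag_port_grant(uint64_t cycle, bool is_snoop)
{
    bool snoop_slot = (cycle % 2) == 1;  /* odd cycle: snooper's turn */
    return is_snoop == snoop_slot;
}

int main(void)
{
    for (uint64_t cycle = 0; cycle < 4; cycle++)
        printf("cycle %llu: %s owns the tag port\n",
               (unsigned long long)cycle,
               tag_port_grant(cycle, true) ? "snoop" : "processor");
    return 0;
}
```

The trade-off, as described above, is that the processor loses up to half its tag bandwidth in exchange for not dual-porting the array.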