CDS and BCR Frequently Asked Questions
What is Cooperative Data Sharing?
Cooperative Data Sharing (CDS) is a family
of APIs that utilize certain techniques (like dynamic shared memory allocation,
shared public queues, and user-controlled copy-on-write protocols) to provide
efficient, flexible, and portable communication in a wide variety of high-latency
and low-latency situations. Instead of requiring the user/programmer
to decide on a restrictive style of communication interface (e.g. shared
memory or message passing) beforehand, and then forcing them to figure
out how to optimize each subsequent communication based on the relative
positioning of the communicating entities and the interface chosen,
CDS simply lets the programmer specify the behavior (i.e. semantics) required
of each communication in their program. CDS figures out how to provide
that behavior most efficiently at run time, based on the relative positioning
of the communicating entities (and perhaps other factors).
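As a purely illustrative sketch (every name below is invented for this FAQ, not the real CDS API), the run-time decision might look like this: the programmer states only the semantics each communication needs, and the runtime picks the cheapest mechanism for wherever the entities happen to be placed.

```python
# Hypothetical sketch -- these names are invented, not the actual CDS API.
def deliver(buf, semantics, same_processor):
    """Return the data under the requested semantics, as cheaply as placement allows."""
    if semantics == "private_copy" or not same_processor:
        # A physical copy is required, either by the requested semantics
        # or because the entities are separated by a real channel.
        return bytearray(buf)
    # On the same processor, a zero-copy shared view satisfies
    # "shared_view" semantics with no data movement at all.
    return memoryview(buf)

src = bytearray(b"payload")
shared = deliver(src, "shared_view", same_processor=True)   # no copy made
copied = deliver(src, "private_copy", same_processor=True)  # copy required
```

The point is that the call site is the same in every case; only the runtime's choice of mechanism changes with placement.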
What is BCR?
BCR (which stands for "Before CDS Redesign",
or "C-1, D-1, S-1") is one particular member ("flavor") of the CDS communication
interface family. It already has some known shortcomings (as its
name implies), but also has the advantage of being the only member that
is available now, and is open source. Someday, when the interface
is more evolved, the term "CDS" may refer to a specific interface, but
until then, each will have some sort of identifying name.
How is CDS (or BCR) better than message passing
(say, MPI)?
Message passing interfaces are designed and
optimized around some basic assumptions--such as that the "sender" and
"receiver" of communicated data will be on separate processors, potentially
separated by high latency channels, and that it is therefore acceptable
to physically copy data from one entity to the other. As a result,
even when the communicating processes are on a single processor, and even
when the interface is configured with so-called "shared memory" options,
that copying of data from one process to another is still required for
each communication, simply so that the communication behaves as the
programmer expects. That is often unneeded overhead. Message passing
interfaces also often restrict the programmer to a specific set of ways
that they can interact, yielding problems when the program needs some other
way (e.g. a more demand-driven style). CDS copies data only when
necessary--either because the programmer's specified behavior for that
communication requires it, or because the communicating entities really are
separated by a high-latency channel. And the CDS programmer has a wide
variety of options for the behavior of each and every communication.
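The cost difference is easy to demonstrate outside of any particular interface. The sketch below (plain Python, standing in for neither CDS nor MPI specifically) contrasts message-passing-style delivery, where the receiver always gets an independent copy, with shared access, where no copy is made:

```python
import pickle

buf = bytearray(b"data data data")

# Message-passing style: serialize and deserialize, as a send/receive pair
# would -- the receiver ends up with an independent copy of the bytes.
received = pickle.loads(pickle.dumps(buf))
received[0] = ord("X")   # mutating the copy does not affect the sender

# Shared-memory style: a view over the same bytes, with no copy at all.
view = memoryview(buf)
view[1] = ord("Y")       # mutating the view is visible through buf
```

When both parties are on one processor and copy semantics are not actually required, the first path is pure overhead; when they are far apart, it is unavoidable. A semantics-driven runtime can pick between them per communication.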
How is CDS (or BCR) better than distributed
shared memory?
Shared memory interfaces make a different
set of assumptions--e.g. that there will be relatively low latency between
communicating processors. As such, they don't usually provide constructs
to help hide latency (such as queues, and/or the ability for the programmer
to specify where data will be needed next). These interfaces also
often assume homogeneous platforms, and so make no provision for converting
data as it is communicated between machines with different
data representations. CDS addresses these issues.
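Representation conversion amounts to agreeing on a wire format and translating to and from each machine's native form. A minimal illustration using Python's standard struct module (illustrative only; this is not the CDS conversion mechanism itself):

```python
import struct

value = 0x12345678
# The sender writes a fixed, machine-independent byte order (big-endian here).
wire = struct.pack(">I", value)

# A receiver that naively read the wire bytes in the other byte order
# would reconstruct the wrong integer...
misread = struct.unpack("<I", wire)[0]
# ...but unpacking with the agreed wire order recovers the value on
# any machine, regardless of its native representation.
recovered = struct.unpack(">I", wire)[0]
```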
Are there more general reasons to use CDS?
CDS can free the programmer from restrictive
assumptions which complicate the programming process. For example,
if one is using a message passing interface, it can become a programming
hassle if the problem at hand calls for a demand-driven style where data
must be made available to any process that needs it. In CDS, it's
very simple. Likewise, if one is using a shared memory interface
and needs to queue up regions for a particular process to access in
order as it has time, that becomes a programming chore, but in CDS,
it's natural.
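For example, the demand-driven pattern is essentially a shared queue that idle workers pull from as they have time. A minimal sketch with Python threads standing in for communicating processes (the real CDS mechanism differs; this only shows the pattern):

```python
import queue
import threading

work = queue.Queue()          # shared queue: data available to any taker
results = []
results_lock = threading.Lock()

def worker():
    # Each worker pulls the next region whenever it has time for it.
    while True:
        region = work.get()
        if region is None:    # sentinel: no more work
            return
        with results_lock:
            results.append(region * region)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for region in range(10):
    work.put(region)          # made available to whichever worker demands it
for _ in threads:
    work.put(None)
for t in threads:
    t.join()
```

No worker is told in advance which regions it will process; demand alone drives the distribution.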
Portability has been a key lever for
software productivity over the decades, and has instigated
the development of high-level languages and standards. It allows
software development costs to be amortized over a broader and/or longer
useful life. CDS provides such portability in communication.
In addition to the portability between high- and low-latency environments,
which CDS handles better than either message passing or shared memory,
there's another kind of portability related to granularity. That
is, because of the overhead inherent in a message passing interface,
the programmer must try to make the number of processes roughly match the
number of processors to get optimal performance, but the number of
processors is itself an architectural trait. In CDS, such numerical
matching becomes less important, because when multiple computing entities
end up on the same processor, their communication overhead is minimal,
making them almost like components in a single entity instead of individual
entities.
Are there any drawbacks to CDS (or BCR) over
message passing or shared memory?
Of course, though we expect to see those lessen
over time (see future plans). Right now, BCR has moderately good performance
in many cases, but has not been tuned for specific architectures as MPI
has (over many years), so it will depend in large part on what you want to
do. CDS currently has no equivalent functionality to MPI-IO or many
collective operations, though it may actually have superior performance
in the long run by leaving these to higher-level functions.
Similar arguments can be made for shared
memory. CDS has a weak release consistency model, and implementation
of even that could be further optimized. If you expect stronger consistency
models, and you already have a shared memory solution that works well and
that is not likely to need porting to other environments, BCR may not be
the answer for you. However, using CDS in some distributed memory
machines may provide better performance than using the vendor's "native"
shared memory interfaces, because of CDS's superior ability to hide latency.
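Release consistency, roughly, guarantees only that writes made before a synchronization release become visible to whoever performs the next matching acquire. The pattern can be sketched with ordinary Python threading primitives (illustrative only; CDS's actual model and API differ):

```python
import threading

data = {}
lock = threading.Lock()
published = threading.Event()

def writer():
    with lock:             # acquire
        data["x"] = 42     # writes inside the critical section...
    published.set()        # ...need only be visible after the release

def reader(out):
    published.wait()
    with lock:             # a matching acquire observes the released writes
        out.append(data["x"])

seen = []
w = threading.Thread(target=writer)
r = threading.Thread(target=reader, args=(seen,))
w.start(); r.start()
w.join(); r.join()
```

Code that already synchronizes at acquire/release boundaries runs correctly under such a model; code that assumes every write is instantly visible everywhere does not, which is the trade-off the paragraph above describes.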