
PLearn::PLMPI Class Reference

PLMPI is just a "namespace holder" (because we're not actually using namespaces) for a few MPI-related variables. All members are static. More...

#include <PLMPI.h>


Static Public Member Functions

void init (int *argc, char ***argv)
 definition of PLMPI inline methods

void finalize ()
void exchangeBlocks (double *data, int n, int blocksize, double *buffer=0)
void exchangeColumnBlocks (Mat sourceBlock, Mat destBlocks)
void exchangeBlocks (float *data, int n, int blocksize, float *buffer=0)

Static Public Attributes

bool using_mpi = false
 true when USING_MPI is defined, false otherwise

int size = 0
 total number of nodes (or processes) running in this MPI_COMM_WORLD (0 if not using mpi)

int rank = 0
 rank of this node (if not using mpi it's always 0)

bool synchronized = true
 Do ALL the nodes have a synchronized state and are carrying the same sequential instructions?

PStream mycout
PStream mycerr
PStream mycin
int tag = 2909
 Defaults to 2909.


Static Protected Attributes

streambuf * new_cin_buf = 0

Detailed Description

PLMPI is just a "namespace holder" (because we're not actually using namespaces) for a few MPI-related variables. All members are static.

Example of code using the PLMPI facility:

In your main function, make sure to call

int main(int argc, char** argv)
{
    PLMPI::init(&argc, &argv);

    ... [ your code here ]

    PLMPI::finalize();
}

Inside an #if USING_MPI section, you can use any MPI calls you see fit.

Note the useful global static variables PLMPI::rank, which gives you the rank of your process, and PLMPI::size, which gives the number of parallel processes running in MPI_COMM_WORLD. These are initialised in the init code by calling MPI_Comm_size and MPI_Comm_rank. They are provided so you don't have to call those functions each time.
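As a minimal sketch (the per-index work() call is hypothetical and commented out, and the include path may differ in your tree), splitting a simple loop over the processes could look like this:

    #include "PLMPI.h"
    using namespace PLearn;

    int main(int argc, char** argv)
    {
        PLMPI::init(&argc, &argv);

        // When not using MPI, PLMPI::size is 0, so fall back to a single process.
        int nprocs = (PLMPI::size > 0) ? PLMPI::size : 1;
        int n = 1000;

        // Each process handles the indices congruent to its rank modulo nprocs.
        for (int i = PLMPI::rank; i < n; i += nprocs)
            /* work(i) */ ;              // hypothetical per-index work

        PLMPI::finalize();
        return 0;
    }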

File input/output -----------------

NFS shared files can typically be opened for *reading* from all the nodes simultaneously without a problem.

For performance reasons, it may sometimes be useful to have local copies (in a /tmp-like directory) of heavily accessed files, so that each node can open its own local copy for reading, resulting in no NFS network traffic and no file-server overload.

In general a file cannot be opened for writing (or read & write) simultaneously by several processes/nodes. Special handling will typically be necessary. One strategy consists of having only rank#0 responsible for the file, with all other nodes communicating with rank#0 to access it.
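A minimal sketch of that strategy, assuming plain MPI point-to-point calls (the file name, the compute_local_part() function and the one-double message layout are all illustrative, not part of PLMPI):

    #if USING_MPI
        double result = compute_local_part();          // hypothetical local computation
        if (PLMPI::rank == 0)
        {
            ofstream out("results.txt");               // only rank#0 ever opens the file for writing
            out << result << endl;
            for (int r = 1; r < PLMPI::size; r++)
            {
                double other;
                MPI_Status status;
                MPI_Recv(&other, 1, MPI_DOUBLE, r, PLMPI::tag, MPI_COMM_WORLD, &status);
                out << other << endl;
            }
        }
        else
            MPI_Send(&result, 1, MPI_DOUBLE, 0, PLMPI::tag, MPI_COMM_WORLD);
    #endif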

Direct non-read operations on file descriptors or file-like objects through system or C calls are likely to result in undesired behaviour when done simultaneously on all nodes. This includes calls to the C library's printf and scanf functions. Such code will generally require specific handling or rewriting for parallel execution.

cin, cout and cerr ------------------

Currently, the init function redirects the cout and cin of all processes of rank other than #0 to/from /dev/null, but cerr is left as is. Thus sections not specifically written for parallel execution get a somewhat reasonable default behaviour: only the rank#0 node reads from cin and writes to cout, while every node can still write to cerr.

If for some strange reason you *do* want a non-rank#0 node to write to your terminal's cout, you can use PLMPI::mycout (which is copied from the initial cout), but the result will probably be ugly and intermixed with the other processes' output.
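For example (assuming PStream's usual operator<< for strings and ints), something along these lines would bypass the /dev/null redirection:

    if (PLMPI::rank != 0)
        PLMPI::mycout << "hello from node " << PLMPI::rank << "\n";   // goes to the original terminal cout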

pofstream ---------

NOTE: While using pofstream may be useful for quickly adapting old code, I rather recommend using the regular ofstream in conjunction with if(PLMPI::rank==0) blocks to control what to do.
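A sketch of that recommendation (the file name and message are illustrative):

    // Regular ofstream, explicitly restricted to the rank#0 node:
    if (PLMPI::rank == 0)
    {
        ofstream out("training.log");
        out << "epoch 1 done" << endl;
    }
    // non-rank#0 nodes simply skip the write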

The PLMPI::synchronized flag -------------------------------

This flag is important for a particular form of parallelism:

In this paradigm, the data on all nodes (running an identical program) are required to be in the same synchronized state before they enter a parallel computation. The nodes will typically all execute the same "sequential" section of the code, on identical data, roughly at the same time. This state of things will be denoted by "synchronized=true". Then at some point they may temporarily enter a section where each node will carry out a different computation (or a similar computation on different parts of the data). This state will be denoted by "synchronized=false". They typically resynchronize their states before leaving the section, and resume their synchronized sequential behaviour until they encounter the next parallel section.

It is very important that sections using this type of parallelism check whether synchronized==true prior to entering their parallel implementation, and fall back to a strictly sequential implementation otherwise. They should also set the flag to false at entry (so that functions they call will stay sequential) and then set it back to true after resynchronization, prior to resuming the synchronized "sequential" computations.

The real meaning of synchronized=true prior to entering a parallel section is actually "all the data *this section uses* is the same on all nodes when they reach this point", rather than "all the data in the whole program is the same". This is a subtle difference, but it allows you to set or unset the flag on a fine-grained basis prior to calling potentially parallel functions.

Here is what typical code should look like under this paradigm:

int do_something( ... parameters ... )
{
    ... sequential part

    // Parallel implementation
    if (USING_MPI && PLMPI::synchronized && size_of_problem_is_worth_a_parallel_computation)
    {
#if USING_MPI
        PLMPI::synchronized = false;

        ... each node starts doing something different, ex: switch(PLMPI::rank) ...

        // Here we may call other functions; they should run entirely in
        // sequential mode (unless you know what you are doing!)

        ... we resynchronize the state of each node
        ... (collect their collective answer, etc.), for ex: MPI_Allgather(...)

        PLMPI::synchronized = true;
#endif
    }
    else // default sequential implementation
    {
    }
}
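As a more concrete (purely illustrative) version of this skeleton, here is a sketch that computes one block of an array per node and resynchronizes with MPI_Allgather. The expensive_term() function, the function name and signature, and the assumption that PLMPI::size divides n are all hypothetical; PLMPI::using_mpi is used in the run-time test so the code still compiles when USING_MPI is not defined:

    #include <vector>

    void fill_results(double* results, int n)             // hypothetical signature
    {
        if (PLMPI::using_mpi && PLMPI::synchronized && n >= PLMPI::size)
        {
    #if USING_MPI
            PLMPI::synchronized = false;
            int blocksize = n / PLMPI::size;               // assume size divides n, for simplicity
            std::vector<double> myblock(blocksize);
            for (int i = 0; i < blocksize; i++)
                myblock[i] = expensive_term(PLMPI::rank * blocksize + i);   // this node's share
            // Resynchronize: afterwards every node holds the full, identical array.
            MPI_Allgather(&myblock[0], blocksize, MPI_DOUBLE,
                          results, blocksize, MPI_DOUBLE, MPI_COMM_WORLD);
            PLMPI::synchronized = true;
    #endif
        }
        else // default sequential implementation
        {
            for (int i = 0; i < n; i++)
                results[i] = expensive_term(i);
        }
    }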

Definition at line 227 of file PLMPI.h.


Member Function Documentation

void PLearn::PLMPI::exchangeBlocks(float * data, int n, int blocksize, float * buffer = 0) [static]
 

Definition at line 111 of file PLMPI.cc.

References PLERROR.

void PLearn::PLMPI::exchangeBlocks(double * data, int n, int blocksize, double * buffer = 0) [static]
 

Definition at line 68 of file PLMPI.cc.

References PLERROR.
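The semantics of exchangeBlocks are not spelled out on this page; assuming (from the name and parameters) that data holds n values laid out as one contiguous block of blocksize values per node, and that after the call every node holds every block, a hypothetical use would be:

    // Hypothetical: n = blocksize * PLMPI::size values, one block per node.
    int blocksize = 100;
    int n = blocksize * PLMPI::size;
    std::vector<double> data(n);
    for (int i = 0; i < blocksize; i++)
        data[PLMPI::rank * blocksize + i] = 0.0;          // ... this node's own values
    PLMPI::exchangeBlocks(&data[0], n, blocksize);        // assumed: all nodes now hold all n values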

void PLearn::PLMPI::exchangeColumnBlocks(Mat sourceBlock, Mat destBlocks) [static]
 

Definition at line 154 of file PLMPI.cc.

References PLearn::TMat< T >::length(), PLearn::Mat, PLERROR, PLWARNING, and PLearn::TMat< T >::width().
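Again, the exact block layout is not documented here; assuming each node supplies its own block of columns in sourceBlock and receives the assembled columns of all nodes in destBlocks, a hypothetical call might look like:

    // Hypothetical: each node owns 'cols' columns of an nrows x (cols * PLMPI::size) matrix.
    int nrows = 10, cols = 5;
    Mat myBlock(nrows, cols);                 // this node's columns
    Mat all(nrows, cols * PLMPI::size);       // assumed to receive every node's columns
    myBlock.fill(PLMPI::rank);
    PLMPI::exchangeColumnBlocks(myBlock, all);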

void PLearn::PLMPI::finalize() [inline, static]
 

Definition at line 346 of file PLMPI.h.

void PLearn::PLMPI::init(int * argc, char *** argv) [inline, static]
 

definition of PLMPI inline methods

Definition at line 328 of file PLMPI.h.

References mycerr, mycin, mycout, PLearn::nullin, PLearn::nullout, rank, and size.


Member Data Documentation

PStream PLearn::PLMPI::mycerr [static]
 

Definition at line 63 of file PLMPI.cc.

Referenced by init().

PStream PLearn::PLMPI::mycin [static]
 

Definition at line 64 of file PLMPI.cc.

Referenced by init().

PStream PLearn::PLMPI::mycout [static]
 

These will correspond to this process's initial streams before we tamper with them (init changes cout and cin, redirecting them to/from /dev/null for nodes other than the rank#0 node)

Definition at line 62 of file PLMPI.cc.

Referenced by init().

streambuf * PLearn::PLMPI::new_cin_buf = 0 [static, protected]
 

Definition at line 49 of file PLMPI.cc.

int PLearn::PLMPI::rank = 0 [static]
 

rank of this node (if not using mpi it's always 0)

Definition at line 60 of file PLMPI.cc.

Referenced by init().

int PLearn::PLMPI::size = 0 [static]
 

total number of nodes (or processes) running in this MPI_COMM_WORLD (0 if not using mpi)

Definition at line 59 of file PLMPI.cc.

Referenced by init().

bool PLearn::PLMPI::synchronized = true [static]
 

Do ALL the nodes have a synchronized state and are carrying the same sequential instructions?

The synchronized flag is used for a particular kind of parallelism and is described in more detail above, including a sample of how it should typically be used. When synchronized is true at a given point in the instruction stream, it roughly means: "all the data *used by the following section* is the same on all nodes when they are at this point". It is set to true initially (but will be set to false if you launch the PLMPIServ server, which uses a different parallelisation paradigm).

Definition at line 57 of file PLMPI.cc.

int PLearn::PLMPI::tag = 2909 [static]
 

Defaults to 2909.

Definition at line 66 of file PLMPI.cc.

bool PLearn::PLMPI::using_mpi = false [static]
 

true when USING_MPI is defined, false otherwise

Definition at line 54 of file PLMPI.cc.


The documentation for this class was generated from the following files:
PLMPI.h
PLMPI.cc

Generated on Tue Aug 17 16:24:34 2004 for PLearn by doxygen 1.3.7