Several people wrote significant parts of PLearn, and if you take a closer look at the code, you will see a number of clearly different coding styles and philosophies. However, as of this writing (09/2000), the overall design and organization of most of the library is still to be blamed (or praised...) on me. So these remarks are my personal view of things, and do not necessarily reflect the opinion of everybody on the PLearn developer team, but I hope it will help you understand the reasons why things are the way they are, and hopefully have you choose to keep them that way...
Pascal
People who discover C++ tend to first be overwhelmed with its wealth of features, and then seem to want to use them all at once in even the simplest piece of code (complex templates, deep multiple inheritance trees, exceptions, multiple nested namespaces; add multi-threading on top of that and you're sure to write the most unreadable, unportable, compiler-bug-triggering and error-prone code ever). Finally, after great intellectual efforts, they discover that even their compiler (not to mention their debugger!) has trouble understanding it all and, if they manage to have it swallow the code, they realise that no other compiler will (portability anybody?). This is still quite true as of this writing (09/2000) and was even more so a few years ago. Tools will keep improving and some day, hopefully, they will all behave perfectly by the book, according to the standard, but until this blessed day comes, beware... Many people then give up, frustrated, and decide to go back to C, which is a shame. C++ is a much better language than C, especially for writing Object Oriented code, and it does make the programmer's life much easier... as long as you keep things simple.
So please, especially if you're a beginner, keep this in mind when writing C++ code: having so many “cool” features in the language doesn't mean that you must use them all at once. Choose wisely and, if in doubt, always prefer the simplest solution...
Any project implicitly or explicitly sets some goals, and these directly influence the way code is written. With PLearn, one of the founding goals was to be able to describe complex machine-learning experiments by assembling simple building-blocks directly in C++, without resorting to a layer of home-grown dedicated language (as experience had shown us that it is hard to grow and maintain such a language, which always appears too limited anyway). Obviously we also want to have them run efficiently (hence the choice of C++ rather than a higher level interpreted language).
Any system should ideally be simple to understand and use, lightning fast, and extremely general. Yet there is always a tradeoff to be made between these three highly desirable characteristics. Here is the priority I gave them in the design of the library; it logically follows from the project's primary goals:
As I mentioned earlier, moderation is good in everything, including in moderation... ;)
C++ allows you to omit parameter names in prototypes (and only give their types). This defeats the purpose of clarity, and is thus considered bad practice by the author and in PLearn in general. Except for possible default values (which are to appear only in the .h), the prototype in the .h file should be identical to the definition in the .cc file and include parameter names.
(Ex: people usually have trouble understanding what float* f(float*, int, int, char, char*, float); is supposed to do, and defining a new type for each argument is not the right way of making this more understandable... giving them a meaningful name is.)
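To make the rule concrete, here is a minimal sketch of a prototype and its matching definition. The function movingAverage and its parameter names are hypothetical, made up for this illustration; they are not part of PLearn.

```cpp
#include <cassert>
#include <vector>

// What would live in the .h file: every parameter is named, and the default
// value appears only here. (movingAverage is a hypothetical example.)
std::vector<float> movingAverage(const std::vector<float>& values,
                                 int window_size,
                                 bool skip_incomplete_windows = true);

// What would live in the .cc file: same types, same names, no default value.
std::vector<float> movingAverage(const std::vector<float>& values,
                                 int window_size,
                                 bool skip_incomplete_windows)
{
    assert(window_size > 0);
    std::vector<float> result;
    const int n = static_cast<int>(values.size());
    for (int end = 1; end <= n; ++end)
    {
        // The window covers values[start] .. values[end-1].
        const int start = end - window_size < 0 ? 0 : end - window_size;
        if (skip_incomplete_windows && end - start < window_size)
            continue;
        float sum = 0;
        for (int i = start; i < end; ++i)
            sum += values[i];
        result.push_back(sum / (end - start));
    }
    return result;
}
```

Compare this with the nameless version, std::vector<float> movingAverage(const std::vector<float>&, int, bool); where the reader has to guess what the int and the bool stand for.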
real is defined throughout the whole library to be either float or double, depending on a compilation flag (USEDOUBLE or USEFLOAT).
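A minimal sketch of how such a switch might be written follows. The flag names USEDOUBLE and USEFLOAT come from the text above; how PLearn actually tests them, and the fallback to float when neither is defined, are assumptions of this sketch. The helper mean is hypothetical.

```cpp
#include <cassert>

// Sketch only: the exact preprocessor logic used by PLearn is assumed here.
#if defined(USEDOUBLE) && defined(USEFLOAT)
#error "Define only one of USEDOUBLE and USEFLOAT"
#elif defined(USEDOUBLE)
typedef double real;
#else
typedef float real;   // assumed fallback when no flag is given
#endif

// Code written against real compiles unchanged at either precision.
// (mean is a hypothetical helper for illustration.)
real mean(const real* values, int length)
{
    assert(length > 0);
    real sum = 0;
    for (int i = 0; i < length; ++i)
        sum += values[i];
    return sum / length;
}
```

Switching the whole program from single to double precision is then a matter of changing one compilation flag rather than editing every declaration.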
Also, we encourage people not to define a new type if it conceptually corresponds to one of those three concepts. In particular, I for one (and I'm surely not alone) dislike having to write
namespace::subnamespace::classname::interiorclass::length_type
when the damn thing is just an integer, if you get my point. Please use int: it saves the user keystrokes and code lookup time, and eases understanding (i.e. genericity-- but simplicity++ and ease_of_use++; see the section on design priorities above).
The use of unsigned int types is also a source of annoyance to me, and of potential nasty bugs. Ex:
for (unsigned int i=10; i>=0; --i)  // never terminates: an unsigned i is always >= 0
So again, unless you really need the extra bit of precision, use int (also saves a few keystrokes).
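The pitfall can be checked with a small sketch. The two function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <climits>

// The loop from the example, written with a signed index. With int, i
// eventually becomes -1, the test i >= 0 fails, and the loop ends.
int countdownIterations()
{
    int iterations = 0;
    for (int i = 10; i >= 0; --i)
        ++iterations;
    return iterations;
}

// What the unsigned version does at the boundary: decrementing 0 wraps
// around to the largest unsigned value instead of reaching -1, so a test
// such as i >= 0 can never become false.
unsigned int decrementZero()
{
    unsigned int i = 0;
    --i;
    assert(i == UINT_MAX);   // wrapped around, still "non-negative"
    return i;
}
```

Many compilers can warn that a comparison such as i >= 0 on an unsigned value is always true, but it is safer not to depend on the warning being enabled.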
A kind of string type is also usually seen as part of the set of basic types, but we'll discuss this in the section on the standard library.
Also, for now, I do not encourage the use of sub-namespaces to organize the code within PLearn (with or without #ifdefs). It's already hard enough to get the organization right in terms of concepts, class hierarchies, and files, without introducing yet another hierarchy of things (which besides, would go mostly untested as we always compile with USENAMESPACE undefined, for now anyway).
So essentially we don't use exceptions in PLearn, but a very simple runtime error mechanism: error("my meaningful error message"); will result in a call to function errormsg that simply prints out the message and exits the program. Thus it is easy to set a breakpoint in errormsg in the debugger and trace what happened. This is a no-fuss solution that does the job. Notice that the errormsg function can easily be modified to throw an exception if you wish to do a proper error recovery (in case brutally exiting the program is not an acceptable behaviour).
Exceptions can also be useful for other things, but for typical runtime-errors, please use error(errormsg).
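As a rough sketch of this mechanism (the real signatures of error and errormsg in PLearn may differ, and the macro PLEARN_SKETCH_EXIT_ON_ERROR and the helper checkedIndex are hypothetical): when the macro is defined, errormsg prints and exits as described above; without it, this sketch uses the throwing variant, so that it can be exercised without killing the process.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>

// Single choke point for runtime errors: set a debugger breakpoint here.
void errormsg(const std::string& message)
{
#ifdef PLEARN_SKETCH_EXIT_ON_ERROR   // hypothetical flag: print and exit
    std::cerr << "ERROR: " << message << std::endl;
    std::exit(1);
#else                                // recovery variant: throw instead
    throw std::runtime_error(message);
#endif
}

// What calling code uses; it simply forwards to errormsg.
void error(const std::string& message)
{
    errormsg(message);
}

// Hypothetical caller: reports a typical runtime error through error().
int checkedIndex(int index, int length)
{
    if (index < 0 || index >= length)
        error("index out of range");
    return index;
}
```

Because every runtime error funnels through the one function errormsg, switching between exiting and throwing touches a single place and no calling code.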
As the compilers improved, I started allowing myself to use simple templates for things where they were really appropriate (i.e. smart pointers and generic containers). And I would recommend everybody to stick to this. Please, refrain from using templates as much as possible: it will make your code easier to write, to read, to debug, to port, to understand, and also faster to compile. It's usually easy to later “templatize” working, well-tested non-templated code if really needed. But it's always annoying to have to “de-templatize” complex template code because the compiler on your new target platform cannot understand it (chances are that you won't either).
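To illustrate how little work the later “templatizing” step usually is, here is a minimal sketch. Both functions are hypothetical examples, not PLearn code.

```cpp
#include <cassert>

// Step 1: a concrete, easy-to-debug version for the one type needed today.
int indexOfMax(const double* values, int length)
{
    assert(length > 0);
    int best = 0;
    for (int i = 1; i < length; ++i)
        if (values[i] > values[best])
            best = i;
    return best;
}

// Step 2, only if really needed later: the same body, with the element type
// turned into a template parameter. Nothing else changes.
template <class T>
int indexOfMaxT(const T* values, int length)
{
    assert(length > 0);
    int best = 0;
    for (int i = 1; i < length; ++i)
        if (values[i] > values[best])
            best = i;
    return best;
}
```

The concrete version can be written, tested and debugged first; the template is then a mechanical transformation of code already known to work.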
Also, we often use concrete classes, and in general prefer flat class hierarchies to very deep ones, as they are easier to comprehend.
In early versions of PLearn, we did not use much of the standard library (as no compilers yet agreed on a standard), except for iostreams. Now that there is a well established standard, and that all compiler makers are working towards conforming to it, we are slowly moving PLearn to using more of the standard library facilities.
A number of useful additional functions for user-friendly string manipulation can be found in file PLearnCore/stringutils.h. A brief (and certainly not up to date) description of it, as well as a pointer to a quick overview of the basic string operations, can be found here.
The following naming conventions are used throughout PLearn. They are mostly inspired by the Java naming conventions. Anybody who uses or wishes to extend PLearn should be aware of them (as it makes understanding of the code easier) and try to respect them (as it will make the understanding of their code easier to other people who will have the privilege to dig into it).
A few reasonable exceptions are tolerated throughout the code (such as function P for probability instead of a lowercase p, or a member variable K for a kernel matrix...) But exceptions that don't serve any purpose should not be!
The PLearn library is far from perfect: it still has a lot of rough edges (my to-do list is growing every day), and there are several things that I would do differently if I were to start all over again. But it is nevertheless already a very usable tool that, for the most part, I feel, meets its primary design goals. Besides, I consider good code design an iterative process: one starts with an initially rough version and iteratively refines it in the light of real-world experience. The code base is not carved in stone; it is an evolving being, and the source code is there so that you can tweak it and adapt it to your needs, and hopefully help make it better.