1. Overview of PLearn

1.1 Introduction

Machine Learning algorithms are usually described in scientific papers in a standard mathematical formulation, often framed as the optimization of a given cost function. PLearn is a C++ library that uses the object-oriented and operator-overloading capabilities of the C++ language to allow, among other things, cost functions and their optimization to be expressed as a standard C++ program, in a declarative manner that stays as close as possible to their mathematical formalization.

Most neural-network and general machine-learning simulation environments define their own scripting language. While it is very tempting for every computer scientist to craft his own language, creating a complete, clear, efficient and bug-free language is a horrendous task, so this is how things usually go: you start building a simple scripting syntax (typically Lisp-like, because it is easy to parse) to specify simple experiments. It quickly proves too limited, so it grows to include loops, functions, data structures, etc. Eventually it acquires some sort of support for object-oriented programming, and finally, for efficiency, you want it to be compiled rather than interpreted! In the end, you are left with a huge mess of a system that was never designed to grow that much, and that is often impossible for anybody but its author to comprehend and maintain. The end result may sometimes be impressive, but at the cost of a lot of effort diverted from your actual research.

While C++ is far from being the perfect language, it is very powerful, can be both very expressive and generate highly efficient code, and, most of all, it has the immense advantage of being developed and well supported by worldwide teams of dedicated and competent people.

When provided with the right type abstractions, C++ is expressive enough to serve directly as a highly customizable, extensible and efficient “scripting language” for designing and running even the most demanding experiments in machine-learning research and development. So this is what this library is all about: providing the right type abstractions. What originally got me started on this project was the desire to optimize a complex cost function simply by expressing it declaratively, as close as possible to its mathematical formulation. This led to the original implementation of the Var class. Since then PLearn has grown to include many other useful types and abstractions.
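To make the idea of a declaratively expressed cost function concrete, here is a minimal sketch that assumes nothing about PLearn's real Var interface. It uses a hypothetical toy `Expr` type whose overloaded operators record an expression graph, a hypothetical `backward` routine that applies reverse-mode differentiation to that graph, and a hypothetical `fit_line` helper that minimizes a squared-error cost by plain gradient descent. None of these names come from PLearn; they only illustrate the general technique of operator overloading plus automatic differentiation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical toy types, NOT PLearn's Var API.
// Each node stores its value, its accumulated gradient, and its inputs
// together with the local derivative d(this)/d(input).
struct Node {
    double value = 0.0;
    double grad = 0.0;
    std::vector<std::pair<std::shared_ptr<Node>, double>> parents;
};

struct Expr {
    std::shared_ptr<Node> node;
    // Implicit on purpose, so that expressions like `w * x - y` accept plain doubles.
    Expr(double v) : node(std::make_shared<Node>()) { node->value = v; }
    double value() const { return node->value; }
    double grad() const { return node->grad; }
};

// Each operator computes the result value and records local derivatives.
Expr operator+(const Expr& a, const Expr& b) {
    Expr r(a.value() + b.value());
    r.node->parents = {{a.node, 1.0}, {b.node, 1.0}};
    return r;
}
Expr operator-(const Expr& a, const Expr& b) {
    Expr r(a.value() - b.value());
    r.node->parents = {{a.node, 1.0}, {b.node, -1.0}};
    return r;
}
Expr operator*(const Expr& a, const Expr& b) {
    Expr r(a.value() * b.value());
    r.node->parents = {{a.node, b.value()}, {b.node, a.value()}};
    return r;
}

// Reverse-mode differentiation: topologically sort the graph (inputs before
// outputs), then propagate gradients from the output back to every input.
void backward(const Expr& out) {
    std::vector<Node*> order;
    std::unordered_set<Node*> seen;
    std::function<void(Node*)> visit = [&](Node* n) {
        if (!seen.insert(n).second) return;
        for (auto& p : n->parents) visit(p.first.get());
        order.push_back(n);
    };
    visit(out.node.get());
    for (Node* n : order) n->grad = 0.0;
    out.node->grad = 1.0;
    for (auto it = order.rbegin(); it != order.rend(); ++it)
        for (auto& p : (*it)->parents) p.first->grad += p.second * (*it)->grad;
}

// Hypothetical example: fit y ~ w*x + b by minimizing sum_i (w*x_i + b - y_i)^2.
// The cost is written exactly like its formula; gradients come from backward().
std::pair<double, double> fit_line(const std::vector<double>& xs,
                                   const std::vector<double>& ys,
                                   int steps = 2000, double lr = 0.02) {
    Expr w(0.0), b(0.0);  // parameters: persistent leaf nodes
    for (int t = 0; t < steps; ++t) {
        Expr cost(0.0);
        for (std::size_t i = 0; i < xs.size(); ++i) {
            Expr err = w * xs[i] + b - ys[i];
            cost = cost + err * err;
        }
        backward(cost);
        w.node->value -= lr * w.grad();  // plain gradient-descent step
        b.node->value -= lr * b.grad();
    }
    return {w.value(), b.value()};
}
```

Calling `fit_line({0, 1, 2, 3}, {1, 3, 5, 7})` recovers approximately w = 2 and b = 1. For simplicity this sketch rebuilds the graph on every iteration; a more efficient design would build the graph once and only refresh its values.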

While PLearn has been used successfully by several people for over a year, it is still very much a work in progress. As its primary use is for our own research, we did not want to carve it in stone: future versions may therefore look quite different from this one, as we are still reworking the class hierarchy. It is nevertheless already very usable, so feel free to play around and experiment with it!

Probably the biggest problem, like with many projects of this kind, is the cruel lack of documentation. This manual will attempt to give you a high-level understanding of the basic concepts, but to work out the details, you'll have to look at the actual code. Also, to fully use the potential of this library, you are expected to be somewhat comfortable with the C++ language.

Have fun!

Pascal

1.2 Developer CVS access

If you are going to contribute to PLearn on Berlios (http://www.berlios.de):

If you don't have one already, create a Berlios account for yourself
Send me (plearner@users.berlios.de) your account login, so that I can add you to the developer list.
Make sure the CVS_RSH environment variable is set to ssh in your .cshrc or .bashrc

Check out PLearn as follows (don't forget to replace USERNAME with your Berlios username):

svn checkout svn+ssh://USERNAME@svn.berlios.de/svnroot/repos/plearn/trunk PLearn

Current development takes place on Berlios; SourceForge was used in the past.

1.3 Additional tools for developers

In addition, if you wish to develop new learning algorithms or otherwise contribute to the library, the following tools will be useful:

ssh: for write access to the Berlios repository (through svn+ssh).
gdb: for basic debugging (or a better debugger if you have one!).
valgrind: a wonderful free tool for memory-bug hunting.
python: for running Python scripts, as well as the PLearn build system.
perl: for running Perl scripts.
LaTeX, pdflatex, dvips, latex2html, doxygen: to regenerate the documentation.