PLearn::Learner Class Reference

#include <Learner.h>



Public Types
typedef Object	inherited
Public Member Functions
string	basename () const
	Returns expdir + train_set->getAlias() (if train_set is defined and has an alias).
	Learner (int the_inputsize=0, int the_targetsize=0, int the_outputsize=0)
virtual	~Learner ()
virtual void	setExperimentDirectory (const string &the_expdir)
	The experiment directory is the directory in which files related to this model are to be saved.
string	getExperimentDirectory () const
	PLEARN_DECLARE_ABSTRACT_OBJECT (Learner)
virtual void	makeDeepCopyFromShallowCopy (CopiesMap &copies)
	Does the necessary operations to transform a shallow copy (this) into a deep copy by deep-copying all the members that need to be.
virtual void	build ()
	** SUBCLASS WRITING: ** This method should be redefined in subclasses, to just call inherited::build() and then build_()
virtual void	setTrainingSet (VMat training_set)
	Declare the train_set.
VMat	getTrainingSet ()
virtual void	train (VMat training_set)=0
virtual void	newtrain (VecStatsCollector &train_stats)
virtual void	newtest (VMat testset, VecStatsCollector &test_stats, VMat testoutputs=0, VMat testcosts=0)
	Should perform test on testset, updating test cost statistics, and optionally filling testoutputs and testcosts.
virtual void	train (VMat training_set, VMat accept_prob, real max_accept_prob=1.0, VMat weights=VMat())
virtual void	use (const Vec &input, Vec &output)=0
virtual void	use (const Mat &inputs, Mat outputs)
virtual void	computeOutput (const VVec &input, Vec &output)
	* SUBCLASS WRITING: * This should be overridden in subclasses to compute the output from the input.
virtual void	computeCostsFromOutputs (const VVec &input, const Vec &output, const VVec &target, const VVec &weight, Vec &costs)
	* SUBCLASS WRITING: * This should be overridden in subclasses to compute the weighted costs from an already computed output.
virtual void	computeOutputAndCosts (const VVec &input, VVec &target, const VVec &weight, Vec &output, Vec &costs)
	Default calls computeOutput and computeCostsFromOutputs. You may override this if you have a more efficient way to compute both output and weighted costs at the same time.
virtual void	computeCosts (const VVec &input, VVec &target, VVec &weight, Vec &costs)
	Default calls computeOutputAndCosts. This may be overridden if there is a more efficient way to compute the costs directly, without computing the whole output vector.
virtual void	setModel (const Vec &new_options)
virtual void	forget ()
virtual bool	measure (int step, const Vec &costs)
virtual void	oldwrite (ostream &out) const
virtual void	oldread (istream &in)
	DEPRECATED. For backward compatibility with old saved objects.
void	save (const string &filename="") const
	DEPRECATED. Call PLearn::save(filename, object) instead.
void	load (const string &filename="")
	DEPRECATED. Call PLearn::load(filename, object) instead.
virtual void	stop_if_wanted ()
	Stopping condition: by default, stops when a file named experiment_name + "_stop" is found to exist.
int	inputsize () const
	Simple accessor methods: (do NOT overload! Set inputsize_ and outputsize_ instead).
int	targetsize () const
int	outputsize () const
int	weightsize () const
int	epoch () const
virtual int	costsize () const
	**** SUBCLASS WRITING: should be redefined if the user redefines computeCost. The default version returns test_costfuncs.size(), the number of test cost functions.
void	setTestCostFunctions (Array< CostFunc > costfunctions)
	Call this method to define what cost functions are computed by default (these are generic cost functions which compare the output with the target).
void	setTestStatistics (StatsItArray statistics)
	This method defines what statistics are computed on the costs (which compute a vector of statistics that depend on all the test costs).
virtual void	setTestDuringTrain (ostream &testout, int every, Array< VMat > testsets)
	testout: the stream to which the test results are written; every: how often (in number of iterations) the tests should be performed.
virtual void	setTestDuringTrain (Array< VMat > testsets)
const Array< VMat > &	getTestDuringTrain () const
	return the test sets that are used during training
void	setEarlyStopping (int which_testset, int which_testresult, real max_degradation, real min_value=-FLT_MAX, real min_improvement=0, bool relative_changes=true, bool save_best=true, int max_degraded_steps=-1)
virtual void	computeCost (const Vec &input, const Vec &target, const Vec &output, const Vec &cost)
	Computes the cost vec, given input, target and output. The default version applies the declared CostFuncs on the (output, target) pair, putting the cost computed for each CostFunc in an element of the cost vector.
virtual void	useAndCost (const Vec &input, const Vec &target, Vec output, Vec cost)
	By default, this function calls use(input, output) and then computeCost(input, target, output, cost), so you can override computeCost to change the cost computation.
virtual void	useAndCostOnTestVec (const VMat &test_set, int i, const Vec &output, const Vec &cost)
	The default version calls useAndCost on test_set(i), so you don't need to override this method unless you want to provide a more efficient implementation (for ex.
virtual void	apply (const VMat &data, VMat outputs)
virtual void	applyAndComputeCosts (const VMat &data, VMat outputs, VMat costs)
virtual void	applyAndComputeCostsOnTestMat (const VMat &test_set, int i, const Mat &output_block, const Mat &cost_block)
	Like useAndCostOnTestVec, but on a block (of length minibatch_size) of rows from the test set: apply learner and compute outputs and costs for the block of test_set rows starting at i.
virtual void	computeCosts (const VMat &data, VMat costs)
virtual void	computeLeaveOneOutCosts (const VMat &data, VMat costs)
	For each data point i, trains with dataset removeRow(data,i) and calls useAndCost on point i, puts results in costs vmat.
virtual void	computeLeaveOneOutCosts (const VMat &data, VMat costsmat, CostFunc costf)
Vec	computeTestStatistics (const VMat &costs)
virtual Vec	test (VMat test_set, const string &save_test_outputs="", const string &save_test_costs="")
	This function should work with and without MPI.
virtual Array< string >	costNames () const
virtual Array< string >	testResultsNames () const
virtual Array< string >	trainObjectiveNames () const
	returns an array of strings corresponding to the names of the fields that will be written to objectiveout (by default this calls testResultsNames() )
void	appendMeasurer (Measurer &measurer)
Vec	getTrainCost ()
Static Public Member Functions
PStream &	default_vlog ()
	The default stream to which lout is set upon construction of all Learners (defaults to cout).
Public Attributes
int	inputsize_
	The data VMats are assumed to consist of inputsize() input columns followed by targetsize() target columns.
int	targetsize_
	Number of target columns that follow the inputsize() input columns in each data row.
int	outputsize_
	The use() method produces an output vector of size outputsize().
int	weightsize_
bool	dont_parallelize
	By default, MPI parallelization done at given level prevents further parallelization at lower levels.
PStream	testout
	test during train specifications
int	test_every
Vec	avg_objective
	average of the objective function(s) over the last test_every steps
Vec	avgsq_objective
	average of the squared objective function(s) over the last test_every steps
VMat	train_set
	the current set being used for training
Array< VMat >	test_sets
	test sets to test on during train
int	minibatch_size
	test by blocks of this size using apply rather than use
int	report_test_progress_every
Vec	options
	DEPRECATED: options used when constructing the model through setModel.
int	earlystop_testsetnum
	index of test set (in test_sets) to use for early stopping
int	earlystop_testresultindex
	index of statistic (as returned by test) to use
real	earlystop_max_degradation
	maximum degradation in error from last best value
real	earlystop_min_value
	minimum error beyond which we stop
real	earlystop_min_improvement
	minimum improvement in error; otherwise we stop
bool	earlystop_relative_changes
	are max_degradation and min_improvement relative?
bool	earlystop_save_best
	if yes, then return with saved "best" model
int	earlystop_max_degraded_steps
	maximum number of steps beyond the best one found [in version >= 1]
bool	save_at_every_epoch
	save learner at each epoch?
bool	save_objective
int	best_step
	the step (usually epoch) at which validation cost was best
real	earlystop_minval
string	experiment_name
Array< CostFunc >	test_costfuncs
StatsItArray	test_statistics
PStream	vlog
	The log stream to which all the verbose output from this learner should be sent.
PStream	objectiveout
	The log stream to use to record the objective function during training.
Vec	vec_input
	Next generation learners allow inputs to be anything, not just Vec
Static Public Attributes
int	use_file_if_bigger = 64000000L
	number of elements above which a file VMatrix rather
bool	force_saving_on_all_processes = false
	otherwise in MPI only CPU0 actually saves
Protected Member Functions
void	openTrainObjectiveStream ()
	opens the train.objective file for appending in the expdir
ostream &	getTrainObjectiveStream ()
	Returns the stream for writing the train objective (and other costs). The stream is opened by calling openTrainObjectiveStream if it isn't already open.
void	openTestResultsStreams ()
	opens the files in append mode for writing the test results
ostream &	getTestResultsStream (int k)
	Returns the stream corresponding to test set k (as specified by setTestDuringTrain). The stream is opened by calling openTestResultsStreams if it isn't already open.
void	freeTestResultsStreams ()
	frees the resources used by the test_results_streams
void	outputResultLineToFile (const string &filename, const Vec &results, bool append, const string &names)
	output a test result line to a file
void	setTrainCost (Vec &cost)
Static Protected Member Functions
void	declareOptions (OptionList &ol)
	Redefine this in subclasses: call declareOption(...) for each option, and then call inherited::declareOptions(ol) (see the declareOption function further down).
Protected Attributes
Vec	tmpvec
ofstream *	train_objective_stream
	file stream where to save objectives and costs during training
Array< ofstream * >	test_results_streams
	opened streams where to save test results
string	expdir
	the directory in which to save files related to this model (see setExperimentDirectory()). You may assume that it ends with a slash (setExperimentDirectory(...) ensures this).
int	epoch_
	It's used as part of the model filename saved by calling save(), which measure() does if ??? incomplete ???
bool	distributed_
	This is set to true to indicate that MPI parallelization occurred at the level of this learner, possibly with data distributed across several nodes (in which case PLMPI::synchronized should be false). It is initially false.
real	earlystop_previousval
	temporary values relevant for early stopping
Array< Measurer * >	measurers
	array of measurers:
bool	measure_cpu_time_first
bool	each_cpu_saves_its_errors
Vec	train_cost
Private Member Functions
void	build_ ()
Static Private Attributes
Vec	tmp_input
Vec	tmp_target
Vec	tmp_weight
Vec	tmp_output
Vec	tmp_costs

Detailed Description

The base class for learning algorithms, which should be the main "products" of PLearn.

The main things that a Learner can do are:
    void train(VMat training_set);                                          // get trained
    void use(const Vec& input, Vec& output);                                // compute output given input
    Vec test(VMat test_set);                                                // compute some performance statistics on a test set
    void applyAndComputeCosts(const VMat& data, VMat outputs, VMat costs);  // compute outputs and costs when applying the trained model on data

Definition at line 72 of file Learner.h.

Member Typedef Documentation

typedef Object PLearn::Learner::inherited

Reimplemented from PLearn::Object.
Reimplemented in PLearn::ConditionalDistribution, PLearn::ConditionalGaussianDistribution, PLearn::Distribution, PLearn::EmpiricalDistribution, PLearn::LocallyWeightedDistribution, PLearn::NeuralNet, and PLearn::GraphicalBiText.
Definition at line 131 of file Learner.h.

Constructor & Destructor Documentation

PLearn::Learner::Learner ( int the_inputsize = 0,

int the_targetsize = 0,

int the_outputsize = 0

)

**** SUBCLASS WRITING: **** All subclasses of Learner should implement this form of constructor. Constructors should simply set all build options (member variables) to acceptable values and call build(), which will do the actual job of constructing the object.
Definition at line 74 of file Learner.cc.
References default_vlog(), PLearn::mean_stats(), measure_cpu_time_first, minibatch_size, report_test_progress_every, setEarlyStopping(), setTestStatistics(), PLearn::stderr_stats(), test_every, and vlog.

PLearn::Learner::~Learner ( ) [virtual]

Definition at line 365 of file Learner.cc.
References freeTestResultsStreams(), and train_objective_stream.

Member Function Documentation

void PLearn::Learner::appendMeasurer ( Measurer & measurer ) [inline]

Declare a new measurer whose measure method will be called when the measure method of this learner is called (in particular after each training epoch).
Definition at line 552 of file Learner.h.
References PLearn::TVec< Measurer * >::append(), and measurers.

void PLearn::Learner::apply ( const VMat & data,

VMat outputs

) [virtual]

Calls the 'use' method many times on the first inputsize() elements of each row of a 'data' VMat, and puts the machine's 'outputs' in a writable VMat (e.g. a file or a matrix). Note: if one wants to compute costs as well, the method applyAndComputeCosts should be called instead.
Definition at line 522 of file Learner.cc.
References inputsize(), PLearn::VMat::length(), outputsize(), PLearn::TVec< T >::subVec(), PLearn::use(), and PLearn::VMat::width().
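The following is a minimal sketch, not PLearn code, of the loop that apply() is described as performing. Plain std::vector rows stand in for VMat, and the names Row, Matrix, UseFn and applySketch are hypothetical; the callback plays the role of use().

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

using Row = std::vector<double>;
using Matrix = std::vector<Row>;
// Stand-in for Learner::use(input, output); hypothetical signature.
using UseFn = std::function<void(const Row& input, Row& output)>;

// Sketch of apply(): pass the first `inputsize` columns of each data row to
// `use`, and store the resulting output vector in the returned matrix.
Matrix applySketch(const Matrix& data, std::size_t inputsize,
                   std::size_t outputsize, const UseFn& use) {
    Matrix outputs(data.size(), Row(outputsize));
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i].size() < inputsize)
            throw std::invalid_argument("row shorter than inputsize");
        // Like row.subVec(0, inputsize()): target columns are ignored.
        Row input(data[i].begin(),
                  data[i].begin() + static_cast<std::ptrdiff_t>(inputsize));
        use(input, outputs[i]);
    }
    return outputs;
}
```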

void PLearn::Learner::applyAndComputeCosts ( const VMat & data,

VMat outputs,

VMat costs

) [virtual]

This method calls useAndCost repeatedly on all the rows of data, putting all the resulting output and cost vectors in the outputs and costs VMats.
Definition at line 599 of file Learner.cc.
References costsize(), k, PLearn::VMat::length(), minibatch_size, outputsize(), PLearn::TVec< T >::subVec(), and useAndCostOnTestVec().
Referenced by applyAndComputeCostsOnTestMat().

void PLearn::Learner::applyAndComputeCostsOnTestMat ( const VMat & test_set,

int i,

const Mat & output_block,

const Mat & cost_block

) [virtual]

Like useAndCostOnTestVec, but on a block (of length minibatch_size) of rows from the test set: apply learner and compute outputs and costs for the block of test_set rows starting at i.
By default calls applyAndComputeCosts.
Definition at line 807 of file Learner.cc.
References applyAndComputeCosts(), PLearn::TMat< T >::length(), and PLearn::VMat::subMatRows().
Referenced by test().

string PLearn::Learner::basename ( ) const

Returns expdir + train_set->getAlias() (if train_set is defined and has an alias).

Definition at line 118 of file Learner.cc.
References c_str(), PLearn::Object::classname(), expdir, experiment_name, PLERROR, PLWARNING, and train_set.
Referenced by measure(), and stop_if_wanted().

void PLearn::Learner::build ( ) [virtual]

**** SUBCLASS WRITING: **** This method should be redefined in subclasses, to just call inherited::build() and then build_()

Reimplemented from PLearn::Object.
Reimplemented in PLearn::ConditionalGaussianDistribution, PLearn::Distribution, PLearn::LocallyWeightedDistribution, PLearn::NeuralNet, and PLearn::GraphicalBiText.
Definition at line 236 of file Learner.cc.
References build_().

void PLearn::Learner::build_ ( ) [private]

**** SUBCLASS WRITING: **** The build_ and build methods should be redefined in subclasses. build_ should do the actual building of the Learner according to the build options (member variables) previously set (these may have been set by hand, by a constructor, by the load method, or by setOption). As build() may be called several times (after changing options, to "rebuild" an object with different build options), make sure your implementation handles this properly.
Reimplemented from PLearn::Object.
Reimplemented in PLearn::Distribution, PLearn::LocallyWeightedDistribution, PLearn::NeuralNet, and PLearn::GraphicalBiText.
Definition at line 229 of file Learner.cc.
References earlystop_minval, and earlystop_previousval.
Referenced by build().
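A minimal sketch of the build()/build_() convention described above, using hypothetical classes ObjectSketch and MyLearner in place of PLearn::Object and a real Learner subclass. The point it illustrates is that build_() recomputes derived state from the current options, so calling build() again after changing an option is safe.

```cpp
#include <string>

// Hypothetical base class standing in for PLearn::Object.
class ObjectSketch {
public:
    virtual ~ObjectSketch() = default;
    virtual void build() {}
};

// Hypothetical subclass following the documented convention.
class MyLearner : public ObjectSketch {
    using inherited = ObjectSketch;
public:
    // Build option, normally set by a constructor, setOption or load.
    int inputsize_ = 0;
    // State derived from the options by build_().
    std::string description;
    int build_count = 0;

    void build() override {
        inherited::build();  // let the parent build its own part first
        build_();            // then build the part specific to this class
    }

private:
    // Must be safe to call repeatedly: rebuild from the current options
    // instead of accumulating state from earlier calls.
    void build_() {
        description = "inputsize=" + std::to_string(inputsize_);
        ++build_count;
    }
};
```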

void PLearn::Learner::computeCost ( const Vec & input,

const Vec & target,

const Vec & output,

const Vec & cost

) [virtual]

Computes the cost vec, given input, target and output. The default version applies the declared CostFuncs on the (output, target) pair, putting the cost computed for each CostFunc in an element of the cost vector.
If you overload this method in subclasses (e.g. to compute a cost that depends on the internal elements of the model), you must also redefine costsize() and costNames() accordingly.
Reimplemented in PLearn::NeuralNet.
Definition at line 280 of file Learner.cc.
References k, PLearn::TVec< CostFunc >::size(), and test_costfuncs.
Referenced by computeCostsFromOutputs(), and useAndCost().

void PLearn::Learner::computeCosts ( const VMat & data,

VMat costs

) [virtual]

This method calls useAndCost repeatedly on all the rows of data, throwing away the resulting output vectors but putting all the cost vectors in the costs VMat.
Definition at line 539 of file Learner.cc.
References costsize(), PLearn::endl(), PLearn::VMat::length(), minibatch_size, outputsize(), and useAndCostOnTestVec().

void PLearn::Learner::computeCosts ( const VVec & input,

VVec & target,

VVec & weight,

Vec & costs

) [virtual]

Default calls computeOutputAndCosts. This may be overridden if there is a more efficient way to compute the costs directly, without computing the whole output vector.

Definition at line 998 of file Learner.cc.
References computeOutputAndCosts(), outputsize(), PLearn::TVec< T >::resize(), and tmp_output.

void PLearn::Learner::computeCostsFromOutputs ( const VVec & input,

const Vec & output,

const VVec & target,

const VVec & weight,

Vec & costs

) [virtual]

*** SUBCLASS WRITING: *** This should be overridden in subclasses to compute the weighted costs from an already computed output.

Definition at line 966 of file Learner.cc.
References computeCost(), PLearn::TVec< T >::length(), PLearn::VVec::length(), PLERROR, PLearn::TVec< T >::resize(), tmp_input, tmp_target, and tmp_weight.
Referenced by computeOutputAndCosts().

void PLearn::Learner::computeLeaveOneOutCosts ( const VMat & data,

VMat costsmat,

CostFunc costf

) [virtual]

Same as computeLeaveOneOutCosts(data, costs), except that only the single cost function costf passed as argument is computed, rather than all of the Learner's costs as set by setTestCostFunctions (and any additional internal cost).
Definition at line 574 of file Learner.cc.
References PLearn::CostFunc, PLearn::flush(), inputsize(), PLearn::VMat::length(), outputsize(), PLERROR, PLearn::removeRow(), PLearn::TVec< T >::subVec(), targetsize(), train(), PLearn::use(), vlog, and PLearn::VMat::width().

void PLearn::Learner::computeLeaveOneOutCosts ( const VMat & data,

VMat costs

) [virtual]

For each data point i, trains with dataset removeRow(data,i) and calls useAndCost on point i, puts results in costs vmat.

Definition at line 553 of file Learner.cc.
References costsize(), PLearn::flush(), PLearn::VMat::length(), outputsize(), PLearn::removeRow(), train(), useAndCostOnTestVec(), and vlog.
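The leave-one-out procedure can be sketched as follows. This is an illustration under simplifying assumptions, not the Learner implementation: rows are reduced to scalar targets, the hypothetical MeanPredictor learner ignores inputs and predicts the training mean, and the cost is the squared error. Creating a fresh learner for each held-out row is a choice of this sketch.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical learner: predicts the mean of the training targets.
struct MeanPredictor {
    double mean = 0.0;
    void train(const Vec& targets) {
        double s = 0.0;
        for (double t : targets) s += t;
        mean = targets.empty() ? 0.0 : s / static_cast<double>(targets.size());
    }
    double use() const { return mean; }
};

// For each row i: train on the data with row i removed (the role played by
// removeRow(data, i)), then record the squared error of the prediction on i.
Vec leaveOneOutCosts(const Vec& targets) {
    Vec costs(targets.size());
    for (std::size_t i = 0; i < targets.size(); ++i) {
        Vec reduced;
        reduced.reserve(targets.size() - 1);
        for (std::size_t j = 0; j < targets.size(); ++j)
            if (j != i) reduced.push_back(targets[j]);
        MeanPredictor learner;  // fresh learner: no state leaks between folds
        learner.train(reduced);
        const double d = learner.use() - targets[i];
        costs[i] = d * d;
    }
    return costs;
}
```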

void PLearn::Learner::computeOutput ( const VVec & input,

Vec & output

) [virtual]

*** SUBCLASS WRITING: *** This should be overridden in subclasses to compute the output from the input.

Definition at line 956 of file Learner.cc.
References PLearn::VVec::length(), PLearn::TVec< T >::resize(), tmp_input, and PLearn::use().
Referenced by computeOutputAndCosts().

void PLearn::Learner::computeOutputAndCosts ( const VVec & input,

VVec & target,

const VVec & weight,

Vec & output,

Vec & costs

) [virtual]

Default calls computeOutput and computeCostsFromOutputs. You may override this if you have a more efficient way to compute both output and weighted costs at the same time.

Definition at line 991 of file Learner.cc.
References computeCostsFromOutputs(), and computeOutput().
Referenced by computeCosts().
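The default delegation chain among computeCosts, computeOutputAndCosts, computeOutput and computeCostsFromOutputs can be sketched with plain vectors. CostChainSketch and SumRegressor are hypothetical, std::vector<double> stands in for both VVec and Vec, and the weight is simplified to a single double; the example subclass overrides only the two methods marked SUBCLASS WRITING.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical, simplified stand-in for Learner's VVec-based API.
class CostChainSketch {
public:
    virtual ~CostChainSketch() = default;
    virtual std::size_t outputsize() const = 0;

    // Subclasses override this to compute the output from the input.
    virtual void computeOutput(const Vec& input, Vec& output) const = 0;

    // Subclasses override this to compute weighted costs from an output.
    virtual void computeCostsFromOutputs(const Vec& input, const Vec& output,
                                         const Vec& target, double weight,
                                         Vec& costs) const = 0;

    // Default: compute the output, then the costs from that output.
    virtual void computeOutputAndCosts(const Vec& input, const Vec& target,
                                       double weight, Vec& output,
                                       Vec& costs) const {
        computeOutput(input, output);
        computeCostsFromOutputs(input, output, target, weight, costs);
    }

    // Default: delegate to computeOutputAndCosts with a scratch output.
    virtual void computeCosts(const Vec& input, const Vec& target,
                              double weight, Vec& costs) const {
        Vec tmp_output(outputsize());
        computeOutputAndCosts(input, target, weight, tmp_output, costs);
    }
};

// Example subclass: predicts the sum of its inputs, weighted squared error.
class SumRegressor : public CostChainSketch {
public:
    std::size_t outputsize() const override { return 1; }
    void computeOutput(const Vec& input, Vec& output) const override {
        output.assign(1, 0.0);
        for (double x : input) output[0] += x;
    }
    void computeCostsFromOutputs(const Vec&, const Vec& output,
                                 const Vec& target, double weight,
                                 Vec& costs) const override {
        const double d = output[0] - target[0];
        costs.assign(1, weight * d * d);
    }
};
```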

Vec PLearn::Learner::computeTestStatistics ( const VMat & costs )

Given a VMat of costs as computed, for example, with computeCosts or with applyAndComputeCosts, computes the test statistics over those costs. The result is the concatenation of the statistics computed for each of the columns (cost functions) of costs.
Definition at line 620 of file Learner.cc.
References PLearn::StatsItArray::computeStats(), PLearn::concat(), and test_statistics.
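As an illustration only: the Learner constructor references PLearn::mean_stats() and PLearn::stderr_stats(), which suggests mean and standard error as default test statistics, but the sketch below is an assumption and not the StatsItArray implementation. It computes both statistics for each cost column and concatenates them column by column; that ordering and the use of the sample standard deviation are choices of this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // rows = test examples, columns = costs

// For each cost column, compute its mean and standard error
// (sample standard deviation / sqrt(n)), and concatenate the results:
// [mean_0, stderr_0, mean_1, stderr_1, ...].
Vec computeTestStatisticsSketch(const Mat& costs) {
    Vec result;
    if (costs.empty()) return result;
    const std::size_t n = costs.size();
    const std::size_t ncols = costs[0].size();
    for (std::size_t c = 0; c < ncols; ++c) {
        double sum = 0.0;
        for (const Vec& row : costs) sum += row[c];
        const double mean = sum / static_cast<double>(n);
        double sq = 0.0;
        for (const Vec& row : costs) sq += (row[c] - mean) * (row[c] - mean);
        const double var = n > 1 ? sq / static_cast<double>(n - 1) : 0.0;
        result.push_back(mean);
        result.push_back(std::sqrt(var / static_cast<double>(n)));
    }
    return result;
}
```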

Array< string > PLearn::Learner::costNames ( ) const [virtual]

returns an Array of strings for the names of the components of the cost. Default version returns the info() strings of the cost functions in test_costfuncs
Reimplemented in PLearn::NeuralNet.
Definition at line 821 of file Learner.cc.
References PLearn::Object::info(), PLearn::TVec< T >::size(), PLearn::TVec< CostFunc >::size(), PLearn::space_to_underscore(), and test_costfuncs.
Referenced by testResultsNames().
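The reference to PLearn::space_to_underscore() suggests that the default cost names are the info() strings with spaces turned into underscores. The sketch below assumes exactly that; CostFuncSketch and costNamesSketch are hypothetical stand-ins, not PLearn types.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a CostFunc: only its info() string matters here.
struct CostFuncSketch {
    std::string info_;
    std::string info() const { return info_; }
};

// One name per cost function (so names.size() matches the default
// costsize()), with spaces replaced by underscores so each name is one token.
std::vector<std::string> costNamesSketch(const std::vector<CostFuncSketch>& fs) {
    std::vector<std::string> names;
    names.reserve(fs.size());
    for (const auto& f : fs) {
        std::string s = f.info();
        std::replace(s.begin(), s.end(), ' ', '_');
        names.push_back(s);
    }
    return names;
}
```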

int PLearn::Learner::costsize ( ) const [virtual]

**** SUBCLASS WRITING: should be redefined if the user redefines computeCost. The default version returns test_costfuncs.size(), the number of test cost functions.

Reimplemented in PLearn::NeuralNet.
Definition at line 818 of file Learner.cc.
References PLearn::TVec< CostFunc >::size(), and test_costfuncs.
Referenced by applyAndComputeCosts(), computeCosts(), computeLeaveOneOutCosts(), and test().

void PLearn::Learner::declareOptions ( OptionList & ol ) [static, protected]

Redefine this in subclasses: call declareOption(...) for each option, and then call inherited::declareOptions(ol) (see the declareOption function further down). Example:
    static void declareOptions(OptionList& ol)
    {
        declareOption(ol, "inputsize", &MyObject::inputsize_, OptionBase::buildoption,
                      "the size of the input\n it must be provided");
        declareOption(ol, "weights", &MyObject::weights, OptionBase::learntoption,
                      "the learnt model weights");
        inherited::declareOptions(ol);
    }
Reimplemented from PLearn::Object.
Reimplemented in PLearn::ConditionalGaussianDistribution, PLearn::Distribution, PLearn::EmpiricalDistribution, PLearn::LocallyWeightedDistribution, PLearn::NeuralNet, and PLearn::GraphicalBiText.
Definition at line 143 of file Learner.cc.
References PLearn::declareOption(), and PLearn::OptionList.

PStream & PLearn::Learner::default_vlog ( ) [static]

The default stream to which lout is set upon construction of all Learners (defaults to cout).

Definition at line 64 of file Learner.cc.
References PLearn::PStream::outmode.
Referenced by Learner().

int PLearn::Learner::epoch ( ) const [inline]

Definition at line 406 of file Learner.h.
References epoch_.
Referenced by measure().

void PLearn::Learner::forget ( ) [virtual]

*** SUBCLASS WRITING: *** This method should be called AFTER or inside the build method, e.g. in order to re-initialize parameters. It should put the Learner in a 'fresh' state, not being influenced by any past call to train (everything learned is forgotten!).
Reimplemented in PLearn::NeuralNet.
Definition at line 242 of file Learner.cc.
References earlystop_minval, earlystop_previousval, and epoch_.

void PLearn::Learner::freeTestResultsStreams ( ) [protected]

frees the resources used by the test_results_streams

Definition at line 345 of file Learner.cc.
References k, PLearn::TVec< ofstream * >::resize(), PLearn::TVec< ofstream * >::size(), and test_results_streams.
Referenced by openTestResultsStreams(), and ~Learner().

string PLearn::Learner::getExperimentDirectory ( ) const [inline]

Definition at line 229 of file Learner.h.
References expdir.

const Array<VMat>& PLearn::Learner::getTestDuringTrain ( ) const [inline]

return the test sets that are used during training

Definition at line 433 of file Learner.h.
References test_sets.

ostream & PLearn::Learner::getTestResultsStream ( int k ) [protected]

Returns the stream corresponding to test set k (as specified by setTestDuringTrain). The stream is opened by calling openTestResultsStreams if it isn't already open.

Definition at line 354 of file Learner.cc.
References k, openTestResultsStreams(), PLearn::TVec< ofstream * >::size(), and test_results_streams.

Vec PLearn::Learner::getTrainCost ( ) [inline]

Definition at line 562 of file Learner.h.
References train_cost.

VMat PLearn::Learner::getTrainingSet ( ) [inline]

Definition at line 255 of file Learner.h.
References train_set.

ostream & PLearn::Learner::getTrainObjectiveStream ( ) [protected]

Returns the stream for writing the train objective (and other costs). The stream is opened by calling openTrainObjectiveStream if it isn't already open.

Definition at line 312 of file Learner.cc.
References openTrainObjectiveStream(), and train_objective_stream.

int PLearn::Learner::inputsize ( ) const [inline]

Simple accessor methods: (do NOT overload! Set inputsize_ and outputsize_ instead).

Definition at line 402 of file Learner.h.
References inputsize_.
Referenced by apply(), PLearn::NeuralNet::build_(), PLearn::compute2dGridOutputs(), computeLeaveOneOutCosts(), PLearn::displayDecisionSurface(), PLearn::NeuralNet::initializeParams(), PLearn::LocallyWeightedDistribution::log_density(), PLearn::LocallyWeightedDistribution::train(), PLearn::Distribution::train(), and useAndCostOnTestVec().

void PLearn::Learner::load ( const string & filename = "" ) [virtual]

DEPRECATED. Call PLearn::load(filename, object) instead.

Reimplemented from PLearn::Object.
Definition at line 912 of file Learner.cc.
References experiment_name, and PLERROR.

void PLearn::Learner::makeDeepCopyFromShallowCopy ( CopiesMap & copies ) [virtual]

Does the necessary operations to transform a shallow copy (this) into a deep copy by deep-copying all the members that need to be. Typical implementation:
    void CLASS_OF_THIS::makeDeepCopyFromShallowCopy(CopiesMap& copies)
    {
        SUPERCLASS_OF_THIS::makeDeepCopyFromShallowCopy(copies);
        member_ptr = member_ptr->deepCopy(copies);
        member_smartptr = member_smartptr->deepCopy(copies);
        member_mat.makeDeepCopyFromShallowCopy(copies);
        member_vec.makeDeepCopyFromShallowCopy(copies);
        ...
    }
Reimplemented from PLearn::Object.
Reimplemented in PLearn::EmpiricalDistribution, and PLearn::NeuralNet.
Definition at line 89 of file Learner.cc.
References avg_objective, avgsq_objective, PLearn::CopiesMap, PLearn::deepCopyField(), test_costfuncs, and test_statistics.
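To show why a CopiesMap is passed around, here is a self-contained miniature of the pattern; Node, deepCopy and this CopiesMap alias are hypothetical and much simpler than PLearn's versions. Recording each original-to-copy mapping means that an object shared by two owners is copied only once, so the sharing is preserved in the deep copy.

```cpp
#include <map>
#include <memory>

// Hypothetical miniature: maps each already-copied original to its copy.
struct Node;
using CopiesMap = std::map<const Node*, std::shared_ptr<Node>>;

struct Node {
    int value = 0;
    std::shared_ptr<Node> child;  // member that needs deep copying

    std::shared_ptr<Node> deepCopy(CopiesMap& copies) const {
        auto it = copies.find(this);
        if (it != copies.end()) return it->second;  // already copied: reuse
        auto copy = std::make_shared<Node>(*this);   // shallow copy first
        copies[this] = copy;                         // record before recursing
        copy->makeDeepCopyFromShallowCopy(copies);
        return copy;
    }

    // Replace shallow-copied pointer members by deep copies.
    void makeDeepCopyFromShallowCopy(CopiesMap& copies) {
        if (child) child = child->deepCopy(copies);
    }
};
```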

bool PLearn::Learner::measure ( int step,

const Vec & costs

) [virtual]

**** SUBCLASS WRITING: This method should be called by an iterative training algorithm's train method after each training step (the meaning of a training step is learner-dependent), passing it the current step number and the costs relevant to the training process. Training must be stopped if the returned value is true: it indicates that the early-stopping criterion has been met. The default version writes the step and costs to the objectiveout stream at each step. It also performs the tests specified by setTestDuringTrain every test_every steps and decides upon early stopping as specified by setEarlyStopping. Finally, it calls the measure method of all measurers that have been declared with appendMeasurer.
This is the measure method from Measurer. You may override this method if you wish to measure other things during the training. In this case your method will probably want to call this default version (Learner::measure) as part of it.
Reimplemented from PLearn::Measurer.
Definition at line 398 of file Learner.cc.
References PLearn::abs(), basename(), best_step, each_cpu_saves_its_errors, earlystop_max_degradation, earlystop_max_degraded_steps, earlystop_min_improvement, earlystop_min_value, earlystop_minval, earlystop_previousval, earlystop_relative_changes, earlystop_save_best, earlystop_testresultindex, earlystop_testsetnum, PLearn::endl(), epoch(), epoch_, expdir, fname, PLearn::join(), PLearn::TVec< T >::length(), PLearn::load(), measurers, minibatch_size, outputResultLineToFile(), PLERROR, PLearn::save(), save_at_every_epoch, save_objective, PLearn::TVec< Measurer * >::size(), PLearn::TVec< VMat >::size(), test(), test_every, test_sets, testResultsNames(), PLearn::tostring(), trainObjectiveNames(), and vlog.
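The calling protocol for measure() can be sketched as below. MeasureSketch and trainLoop are hypothetical; the objective log stands in for the objectiveout stream, and test_and_check_early_stop is a placeholder for running the setTestDuringTrain tests and applying the setEarlyStopping criterion.

```cpp
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical stand-in for the default measure(step, costs): record the
// costs at every step, run a test every `test_every` steps, and report
// whether training should stop.
struct MeasureSketch {
    int test_every = 1;
    std::vector<Vec> objective_log;                           // like objectiveout
    std::function<bool(int step)> test_and_check_early_stop;  // placeholder

    bool measure(int step, const Vec& costs) {
        objective_log.push_back(costs);
        if (test_every > 0 && step % test_every == 0 && test_and_check_early_stop)
            return test_and_check_early_stop(step);
        return false;
    }
};

// How an iterative train() is expected to use it: call measure() after each
// step and stop as soon as it returns true. Returns the number of steps run.
int trainLoop(MeasureSketch& m, int nsteps) {
    for (int step = 1; step <= nsteps; ++step) {
        Vec costs = {1.0 / step};  // placeholder training cost
        if (m.measure(step, costs)) return step;  // early stop
    }
    return nsteps;
}
```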

void PLearn::Learner::newtest ( VMat testset,

VecStatsCollector & test_stats,

VMat testoutputs = 0,

VMat testcosts = 0

) [virtual]

Should perform test on testset, updating test cost statistics, and optionally filling testoutputs and testcosts.

Definition at line 1010 of file Learner.cc.
References PLERROR.

void PLearn::Learner::newtrain ( VecStatsCollector & train_stats ) [virtual]

*** SUBCLASS WRITING: *** Should do the actual training until epoch == nepochs, and should call update on the stats with the training costs measured online.
Definition at line 1006 of file Learner.cc.
References PLERROR.

void PLearn::Learner::oldread ( istream & in ) [virtual]

DEPRECATED. For backward compatibility with old saved objects.

Reimplemented from PLearn::Object.
Definition at line 868 of file Learner.cc.
References earlystop_max_degradation, earlystop_max_degraded_steps, earlystop_min_improvement, earlystop_min_value, earlystop_relative_changes, earlystop_save_best, earlystop_testresultindex, earlystop_testsetnum, epoch_, expdir, experiment_name, inputsize_, outputsize_, PLearn::readField(), PLearn::readFooter(), PLearn::readHeader(), save_at_every_epoch, targetsize_, test_costfuncs, test_every, and test_statistics.

void PLearn::Learner::oldwrite ( ostream & out ) const [virtual]

*** SUBCLASS WRITING: *** This matched pair of Object functions needs to be redefined by sub-classes. They are used for saving/loading a model to memory or to file. However, subclasses can call this one to deal with the saving/loading of the following data fields: the current options and the early stopping parameters.
Definition at line 846 of file Learner.cc.
References earlystop_max_degradation, earlystop_max_degraded_steps, earlystop_min_improvement, earlystop_min_value, earlystop_relative_changes, earlystop_save_best, earlystop_testresultindex, earlystop_testsetnum, experiment_name, inputsize_, outputsize_, save_at_every_epoch, targetsize_, test_costfuncs, test_every, test_statistics, PLearn::writeField(), PLearn::writeFooter(), and PLearn::writeHeader().

void PLearn::Learner::openTestResultsStreams ( ) [protected]

opens the files in append mode for writing the test results

Definition at line 320 of file Learner.cc.
References PLearn::endl(), expdir, freeTestResultsStreams(), PLearn::join(), k, PLERROR, PLearn::TVec< ofstream * >::resize(), PLearn::TVec< VMat >::size(), test_results_streams, test_sets, and testResultsNames().
Referenced by getTestResultsStream().

void PLearn::Learner::openTrainObjectiveStream ( ) [protected]

opens the train.objective file for appending in the expdir

Definition at line 294 of file Learner.cc.
References PLearn::endl(), expdir, PLearn::join(), PLERROR, train_objective_stream, and trainObjectiveNames().
Referenced by getTrainObjectiveStream().

void PLearn::Learner::outputResultLineToFile ( const string & filename,

const Vec & results,

bool append,

const string & names

) [protected]

output a test result line to a file

Definition at line 101 of file Learner.cc.
References PLearn::endl(), epoch_, and fname.
Referenced by measure().

int PLearn::Learner::outputsize ( ) const [inline]

Definition at line 404 of file Learner.h.
References outputsize_.
Referenced by apply(), applyAndComputeCosts(), PLearn::NeuralNet::build_(), PLearn::compute2dGridOutputs(), computeCosts(), computeLeaveOneOutCosts(), PLearn::displayDecisionSurface(), and test().

PLearn::Learner::PLEARN_DECLARE_ABSTRACT_OBJECT ( Learner )

Does the necessary operations to transform a shallow copy (this) into a deep copy by deep-copying all the members that need to be.

void PLearn::Learner::save ( const string & filename = "" ) const [virtual]

DEPRECATED. Call PLearn::save(filename, object) instead.

Reimplemented from PLearn::Object.
Definition at line 898 of file Learner.cc.
References experiment_name, force_saving_on_all_processes, and PLERROR.

void PLearn::Learner::setEarlyStopping ( int which_testset, int which_testresult, real max_degradation, real min_value = -FLT_MAX, real min_improvement = 0, bool relative_changes = true, bool save_best = true, int max_degraded_steps = -1 )

which_testset and which_testresult select the test set and cost function, among those specified in setTestDuringTrain, on which early stopping is based. The degradation is the difference between the current value and the smallest value ever attained. Training is stopped if:
- the degradation grows beyond max_degradation;
- the current value goes below min_value;
- the difference between the previous value and the current value is below min_improvement.
If relative_changes is true, max_degradation is relative to the smallest value ever attained, and min_improvement is relative to the previous value. If save_best is true, the model with the lowest validation error is saved (to memory, with the write method), and if early stopping occurs this saved model is reloaded (with the read method).
Definition at line 381 of file Learner.cc.
References earlystop_max_degradation, earlystop_max_degraded_steps, earlystop_min_improvement, earlystop_min_value, earlystop_minval, earlystop_previousval, earlystop_relative_changes, earlystop_save_best, earlystop_testresultindex, and earlystop_testsetnum.
Referenced by Learner().
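The stopping rules of setEarlyStopping can be sketched as a standalone check. This is a minimal sketch, not PLearn code: EarlyStopState and shouldStopEarly are hypothetical names, lower values are assumed to be better, and "relative" is assumed to mean dividing by the magnitude of the reference value. It applies the documented rules literally and leaves out max_degraded_steps and save_best, which the real measure() also handles.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical state mirroring earlystop_minval / earlystop_previousval.
struct EarlyStopState {
    double minval = std::numeric_limits<double>::max();       // smallest value seen so far
    double previousval = std::numeric_limits<double>::max();  // value at the previous step
    bool has_previous = false;
};

// Returns true if training should stop after observing `value`.
// Assumptions: lower is better; relative changes divide by |reference|.
bool shouldStopEarly(EarlyStopState& s, double value,
                     double max_degradation, double min_value,
                     double min_improvement, bool relative_changes) {
    bool stop = false;
    if (value < s.minval) s.minval = value;

    // Degradation: distance above the best value ever attained.
    double degradation = value - s.minval;
    if (relative_changes && s.minval != 0) degradation /= std::fabs(s.minval);
    if (degradation > max_degradation) stop = true;

    // Stop once the value goes below the target minimum.
    if (value < min_value) stop = true;

    // Improvement since the previous step (negative if the value got worse).
    if (s.has_previous) {
        double improvement = s.previousval - value;
        if (relative_changes && s.previousval != 0) improvement /= std::fabs(s.previousval);
        if (improvement < min_improvement) stop = true;
    }
    s.previousval = value;
    s.has_previous = true;
    return stop;
}
```

Read literally, the default min_improvement = 0 stops training at the first step where the value increases, so callers relying on max_degradation alone would pass a negative min_improvement.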

void PLearn::Learner::setExperimentDirectory ( const string & the_expdir ) [virtual]

The experiment directory is the directory in which files related to this model are to be saved.
Typically, the following files will be saved in that directory:
- model.psave (saved best model)
- model#.psave (model saved after epoch #)
- model#.<trainset_alias>.objective (training objective and costs after each epoch)
- model#.<testset_alias>.results (test results after each epoch)
Definition at line 215 of file Learner.cc.
References PLearn::abspath(), expdir, PLearn::force_mkdir(), and PLERROR.
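The naming pattern listed for setExperimentDirectory can be illustrated with a small helper. experimentFiles is hypothetical (PLearn builds these names internally); it assumes "#" stands for the decimal epoch number and that expdir ends with a slash, as setExperimentDirectory guarantees.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: the files typically present in the experiment
// directory after a given epoch, following the documented pattern.
std::vector<std::string> experimentFiles(const std::string& expdir, int epoch,
                                         const std::string& trainset_alias,
                                         const std::vector<std::string>& testset_aliases) {
    assert(!expdir.empty() && expdir.back() == '/');
    const std::string prefix = expdir + "model" + std::to_string(epoch);
    std::vector<std::string> files;
    files.push_back(expdir + "model.psave");                        // saved best model
    files.push_back(prefix + ".psave");                             // model saved after this epoch
    files.push_back(prefix + "." + trainset_alias + ".objective");  // training objective and costs
    for (const auto& alias : testset_aliases)
        files.push_back(prefix + "." + alias + ".results");         // test results per test set
    return files;
}
```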

void PLearn::Learner::setModel ( const Vec & new_options ) [virtual]

** DEPRECATED ** Do not use! use the setOption and build methods instead
Definition at line 814 of file Learner.cc.
References PLERROR.

void PLearn::Learner::setTestCostFunctions ( Array< CostFunc > costfunctions ) [inline]

Call this method to define what cost functions are computed by default (these are generic cost functions which compare the output with the target).

Definition at line 415 of file Learner.h.
References test_costfuncs.

void PLearn::Learner::setTestDuringTrain ( Array< VMat > testsets ) [virtual]

Definition at line 362 of file Learner.cc.
References test_sets.

void PLearn::Learner::setTestDuringTrain ( ostream & testout, int every, Array< VMat > testsets ) [virtual]

testout: the stream to which the test results are written.
every: how often (in number of iterations) the tests are performed.

Definition at line 287 of file Learner.cc.
References test_every, test_sets, and testout.

void PLearn::Learner::setTestStatistics ( StatsItArray statistics ) [inline]

This method defines what statistics are computed on the costs (each statistic depends on all the test costs, and together they form a vector of statistics).

Definition at line 420 of file Learner.h.
References setTestStatistics(), and test_statistics.
Referenced by Learner(), and setTestStatistics().

void PLearn::Learner::setTrainCost ( Vec & cost ) [inline, protected]

Definition at line 558 of file Learner.h.
References PLearn::TVec< T >::length(), PLearn::TVec< T >::resize(), setTrainCost(), and train_cost.
Referenced by setTrainCost().

virtual void PLearn::Learner::setTrainingSet ( VMat training_set ) [inline, virtual]

Declare the train_set.

Definition at line 254 of file Learner.h.
References setTrainingSet(), and train_set.
Referenced by setTrainingSet().

void PLearn::Learner::stop_if_wanted ( ) [virtual]

stopping condition, by default when a file named experiment_name + "_stop" is found to exist.
If that is the case then this file is removed and exit(0) is performed.
Definition at line 922 of file Learner.cc.
References basename(), PLearn::endl(), PLearn::file_exists(), fname, PLearn::save(), PLearn::tostring(), and vlog.
Referenced by test().
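The stop-file convention of stop_if_wanted can be sketched with C++17 std::filesystem. This is an illustration under stated assumptions: stopFileRequested and requestStop are hypothetical names, `base` stands in for experiment_name, and the sketch returns true instead of calling exit(0) so the caller decides how to shut down.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Returns true (and deletes the file) if "<base>_stop" exists.
// The real method calls exit(0) at this point instead of returning.
bool stopFileRequested(const std::string& base) {
    namespace fs = std::filesystem;
    const fs::path stopfile = base + "_stop";
    std::error_code ec;
    if (!fs::exists(stopfile, ec)) return false;
    fs::remove(stopfile, ec);  // consume the request so the next run is not stopped
    return true;
}

// Creates the stop file, e.g. from a script while training is running.
void requestStop(const std::string& base) {
    std::ofstream out(base + "_stop");
    out << "stop\n";
}
```

From a shell, `touch <experiment_name>_stop` has the same effect as requestStop.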

int PLearn::Learner::targetsize ( ) const [inline]

Definition at line 403 of file Learner.h.
References targetsize_.
Referenced by PLearn::NeuralNet::build_(), computeLeaveOneOutCosts(), PLearn::Distribution::train(), and useAndCostOnTestVec().

Vec PLearn::Learner::test ( VMat test_set, const string & save_test_outputs = "", const string & save_test_costs = "" ) [virtual]

This function should work with and without MPI.
Returns the statistics computed by test_statistics on the test_costfuncs. If save_test_outputs is non-empty, the test outputs are saved in the given file; similarly for save_test_costs.
Definition at line 635 of file Learner.cc.
References PLearn::TmpFilenames::addFilename(), applyAndComputeCostsOnTestMat(), PLearn::binread(), PLearn::binwrite(), PLearn::StatsItArray::computeStats(), PLearn::concat(), costsize(), PLearn::TVec< T >::data(), dont_parallelize, PLearn::StatsItArray::finish(), PLearn::StatsItArray::getResults(), PLearn::StatsItArray::init(), PLearn::TVec< T >::length(), PLearn::VMat::length(), PLearn::Mat, minibatch_size, outputsize(), PLearn::StatsItArray::requiresMultiplePasses(), PLearn::TVec< T >::resize(), PLearn::TMat< T >::resize(), stop_if_wanted(), test_statistics, PLearn::StatsItArray::update(), use_file_if_bigger, useAndCostOnTestVec(), and vlog.
Referenced by measure().

Array< string > PLearn::Learner::testResultsNames ( ) const [virtual]

returns an Array of strings for the names of the cost statistics returned by methods test and computeTestStatistics. The default version returns a cross product between the info() strings of test_statistics and the cost names returned by costNames().
Definition at line 829 of file Learner.cc.
References costNames(), k, PLearn::TVec< T >::size(), PLearn::TVec< StatsIt >::size(), PLearn::space_to_underscore(), and test_statistics.
Referenced by measure(), openTestResultsStreams(), PLearn::prettyprint_test_results(), and trainObjectiveNames().
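The cross product performed by the default testResultsNames can be sketched as follows. The "STAT[COST]" format and the loop order (statistics outer, costs inner) are assumptions, not taken from Learner.cc; spaceToUnderscore is a local stand-in for PLearn::space_to_underscore.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Local stand-in for PLearn::space_to_underscore.
std::string spaceToUnderscore(std::string s) {
    for (char& c : s)
        if (c == ' ') c = '_';
    return s;
}

// One name per (statistic, cost) pair. Format "STAT[COST]" is assumed.
std::vector<std::string> crossNames(const std::vector<std::string>& stat_infos,
                                    const std::vector<std::string>& cost_names) {
    std::vector<std::string> names;
    names.reserve(stat_infos.size() * cost_names.size());
    for (const auto& stat : stat_infos)
        for (const auto& cost : cost_names)
            names.push_back(spaceToUnderscore(stat + "[" + cost + "]"));
    return names;
}
```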

virtual void PLearn::Learner::train ( VMat training_set, VMat accept_prob, real max_accept_prob = 1.0, VMat weights = VMat() ) [inline, virtual]

*** SUBCLASS WRITING: *** Does the actual training. Allows training on a sample drawn from a training set.
Definition at line 292 of file Learner.h.
References PLERROR.

virtual void PLearn::Learner::train ( VMat training_set ) [pure virtual]

*** SUBCLASS WRITING: *** Does the actual training. Subclasses must implement this method. Upon entry, the method should call setTrainingSet(training_set). Make sure that an if(measure(step, objective_value)) check is done after each training step, and that training is stopped if it returns true.
Implemented in PLearn::ConditionalGaussianDistribution, PLearn::Distribution, PLearn::EmpiricalDistribution, PLearn::LocallyWeightedDistribution, PLearn::NeuralNet, and PLearn::GraphicalBiText.
Referenced by computeLeaveOneOutCosts().
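The train() contract (call setTrainingSet on entry, call measure after each step, stop when it returns true) can be sketched with stand-in classes. SketchLearner and MeanLearner are hypothetical; std::vector<double> replaces VMat, and the stand-in measure() simply stops once the objective no longer decreases, whereas the real measure() applies the early-stopping options.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Stand-in for PLearn::Learner with only what the train() contract needs.
class SketchLearner {
public:
    virtual ~SketchLearner() = default;
    virtual void train(const std::vector<double>& training_set) = 0;
    int steps_done = 0;
protected:
    void setTrainingSet(const std::vector<double>& ts) { train_set = ts; }
    // Assumed rule: stop when the objective stops decreasing.
    bool measure(int step, double objective) {
        steps_done = step + 1;
        const bool stop = objective >= last_objective;
        last_objective = objective;
        return stop;
    }
    std::vector<double> train_set;
private:
    double last_objective = std::numeric_limits<double>::infinity();
};

// Example subclass following the documented contract.
class MeanLearner : public SketchLearner {
public:
    double estimate = 0.0;
    void train(const std::vector<double>& training_set) override {
        setTrainingSet(training_set);  // required on entry
        assert(!train_set.empty());
        const int max_steps = 100;
        for (int step = 0; step < max_steps; ++step) {
            // One training step: move the estimate halfway toward the sample mean.
            double mean = 0.0;
            for (double x : train_set) mean += x;
            mean /= train_set.size();
            estimate += 0.5 * (mean - estimate);
            const double objective = (estimate - mean) * (estimate - mean);
            if (measure(step, objective))  // stop as soon as measure() asks to
                break;
        }
    }
};
```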

Array< string > PLearn::Learner::trainObjectiveNames ( ) const [virtual]

returns an array of strings corresponding to the names of the fields that will be written to objectiveout (by default this calls testResultsNames() )

Definition at line 843 of file Learner.cc.
References testResultsNames().
Referenced by measure(), and openTrainObjectiveStream().

virtual void PLearn::Learner::use ( const Mat & inputs, Mat outputs ) [inline, virtual]

Definition at line 302 of file Learner.h.
References PLearn::TMat< T >::length(), and PLearn::use().

virtual void PLearn::Learner::use ( const Vec & input, Vec & output ) [pure virtual]

*** SUBCLASS WRITING: *** Uses a trained decider on input, filling output. If the cost should also be computed, then the user should call useAndCost instead of this method.
Implemented in PLearn::ConditionalDistribution, PLearn::Distribution, PLearn::NeuralNet, and PLearn::GraphicalBiText.
Referenced by PLearn::compute2dGridOutputs().

void PLearn::Learner::useAndCost ( const Vec & input, const Vec & target, Vec output, Vec cost ) [virtual]

By default this function calls use(input, output) and then computeCost(input, target, output, cost), so you can overload computeCost to change how the cost is computed.

Reimplemented in PLearn::NeuralNet.
Definition at line 274 of file Learner.cc.
References computeCost(), and PLearn::use().
Referenced by useAndCostOnTestVec().
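The default useAndCost is a template method: it chains use() and computeCost(), so a subclass can change the cost by overriding computeCost alone. In this sketch SketchDecider and Doubler are hypothetical, the squared-error default cost is an assumption, and std::vector<double> replaces Vec. The documented signature passes output and cost as Vec by value; because std::vector has value semantics, the sketch uses references instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;  // stand-in for PLearn::Vec

class SketchDecider {
public:
    virtual ~SketchDecider() = default;
    virtual void use(const Vec& input, Vec& output) = 0;
    // Assumed default cost: squared error summed over output components.
    virtual void computeCost(const Vec& /*input*/, const Vec& target,
                             const Vec& output, Vec& cost) {
        assert(target.size() == output.size());
        double se = 0.0;
        for (std::size_t i = 0; i < output.size(); ++i) {
            const double d = output[i] - target[i];
            se += d * d;
        }
        cost.assign(1, se);
    }
    // Mirrors the documented default: use() first, then computeCost().
    void useAndCost(const Vec& input, const Vec& target, Vec& output, Vec& cost) {
        use(input, output);
        computeCost(input, target, output, cost);
    }
};

// A trivial model: output = 2 * input.
class Doubler : public SketchDecider {
public:
    void use(const Vec& input, Vec& output) override {
        output.resize(input.size());
        for (std::size_t i = 0; i < input.size(); ++i) output[i] = 2.0 * input[i];
    }
};
```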

void PLearn::Learner::useAndCostOnTestVec ( const VMat & test_set, int i, const Vec & output, const Vec & cost ) [virtual]

Default version calls useAndCost on test_set(i), so you don't need to overload this method unless you want to provide a more efficient implementation (for example, if you have precomputed things for the test_set that you can use).
Definition at line 250 of file Learner.cc.
References inputsize(), k, minibatch_size, PLearn::TVec< T >::resize(), PLearn::TVec< T >::subVec(), targetsize(), tmpvec, useAndCost(), and PLearn::VMat::width().
Referenced by applyAndComputeCosts(), computeCosts(), computeLeaveOneOutCosts(), and test().
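Before calling useAndCost, row i of the test set has to be split according to the documented column layout: inputsize() input columns followed by targetsize() target columns. splitRow is a hypothetical helper, a vector of rows stands in for the VMat, and any weight columns after the target are ignored here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // stand-in for a VMat: one row per sample

// Splits row i into its input part and its target part.
void splitRow(const Mat& test_set, std::size_t i, std::size_t inputsize,
              std::size_t targetsize, Vec& input, Vec& target) {
    const Vec& row = test_set.at(i);  // throws std::out_of_range on a bad index
    assert(row.size() >= inputsize + targetsize);
    input.assign(row.begin(), row.begin() + inputsize);
    target.assign(row.begin() + inputsize, row.begin() + inputsize + targetsize);
}
```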

int PLearn::Learner::weightsize ( ) const [inline]

Definition at line 405 of file Learner.h.
References weightsize_.
Referenced by PLearn::NeuralNet::build_(), PLearn::LocallyWeightedDistribution::build_(), PLearn::LocallyWeightedDistribution::log_density(), and PLearn::LocallyWeightedDistribution::train().

Member Data Documentation

Vec PLearn::Learner::avg_objective

average of the objective function(s) over the last test_every steps

Definition at line 146 of file Learner.h.
Referenced by makeDeepCopyFromShallowCopy().

Vec PLearn::Learner::avgsq_objective

average of the squared objective function(s) over the last test_every steps

Definition at line 147 of file Learner.h.
Referenced by makeDeepCopyFromShallowCopy().
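avg_objective and avgsq_objective hold the mean and mean square of the objective over the last test_every steps, from which a variance follows as avgsq - avg². ObjectiveAverager is a hypothetical accumulator showing one way to maintain them; it is not how Learner.cc necessarily does it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Accumulates per-step objective vectors over a window of steps.
class ObjectiveAverager {
public:
    explicit ObjectiveAverager(std::size_t n) : sum_(n, 0.0), sumsq_(n, 0.0) {}
    void add(const std::vector<double>& objective) {
        assert(objective.size() == sum_.size());
        for (std::size_t k = 0; k < objective.size(); ++k) {
            sum_[k] += objective[k];
            sumsq_[k] += objective[k] * objective[k];
        }
        ++count_;
    }
    // Writes the window averages, then resets for the next window.
    void averages(std::vector<double>& avg, std::vector<double>& avgsq) {
        assert(count_ > 0);
        avg.resize(sum_.size());
        avgsq.resize(sum_.size());
        for (std::size_t k = 0; k < sum_.size(); ++k) {
            avg[k] = sum_[k] / count_;
            avgsq[k] = sumsq_[k] / count_;
            sum_[k] = sumsq_[k] = 0.0;
        }
        count_ = 0;
    }
private:
    std::vector<double> sum_, sumsq_;
    std::size_t count_ = 0;
};
```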

int PLearn::Learner::best_step

the step (usually epoch) at which validation cost was best

Definition at line 174 of file Learner.h.
Referenced by measure().

bool PLearn::Learner::distributed_ [protected]

This is set to true to indicate that MPI parallelization occurred at the level of this learner, possibly with data distributed across several nodes (in which case PLMPI::synchronized should be false). It is initially false.

Definition at line 123 of file Learner.h.

bool PLearn::Learner::dont_parallelize

By default, MPI parallelization done at a given level prevents further parallelization at lower levels.
If true, this means "don't parallelize processing at this level".
Definition at line 140 of file Learner.h.
Referenced by test().

bool PLearn::Learner::each_cpu_saves_its_errors [protected]

Definition at line 193 of file Learner.h.
Referenced by measure().

real PLearn::Learner::earlystop_max_degradation

maximum degradation in error from last best value

Definition at line 165 of file Learner.h.
Referenced by measure(), oldread(), oldwrite(), and setEarlyStopping().

int PLearn::Learner::earlystop_max_degraded_steps

maximum number of steps beyond the best value found [in version >= 1]

Definition at line 170 of file Learner.h.
Referenced by measure(), oldread(), oldwrite(), and setEarlyStopping().

real PLearn::Learner::earlystop_min_improvement

minimum improvement in error; below it, training stops

Definition at line 167 of file Learner.h.
Referenced by measure(), oldread(), oldwrite(), and setEarlyStopping().

real PLearn::Learner::earlystop_min_value

minimum error: training stops once the error goes below this value

Definition at line 166 of file Learner.h.
Referenced by measure(), oldread(), oldwrite(), and setEarlyStopping().

real PLearn::Learner::earlystop_minval

Definition at line 180 of file Learner.h.
Referenced by build_(), forget(), measure(), and setEarlyStopping().

real PLearn::Learner::earlystop_previousval [protected]

temporary values relevant for early stopping

Definition at line 178 of file Learner.h.
Referenced by build_(), forget(), measure(), and setEarlyStopping().

bool PLearn::Learner::earlystop_relative_changes

are max_degradation and min_improvement relative?

Definition at line 168 of file Learner.h.
Referenced by measure(), oldread(), oldwrite(), and setEarlyStopping().

bool PLearn::Learner::earlystop_save_best

if yes, then return with saved "best" model

Definition at line 169 of file Learner.h.
Referenced by measure(), oldread(), oldwrite(), and setEarlyStopping().

int PLearn::Learner::earlystop_testresultindex

index of statistic (as returned by test) to use

Definition at line 164 of file Learner.h.
Referenced by measure(), oldread(), oldwrite(), and setEarlyStopping().

int PLearn::Learner::earlystop_testsetnum

index of test set (in test_sets) to use for early stopping

Definition at line 163 of file Learner.h.
Referenced by measure(), oldread(), oldwrite(), and setEarlyStopping().

int PLearn::Learner::epoch_ [protected]

It's used as part of the model filename saved by calling save(), which measure() does if ??? incomplete ???

Definition at line 118 of file Learner.h.
Referenced by epoch(), forget(), measure(), oldread(), and outputResultLineToFile().

string PLearn::Learner::expdir [protected]

the directory in which to save files related to this model (see setExperimentDirectory()). You may assume that it ends with a slash (setExperimentDirectory(...) ensures this).

Definition at line 115 of file Learner.h.
Referenced by basename(), getExperimentDirectory(), measure(), oldread(), openTestResultsStreams(), openTrainObjectiveStream(), and setExperimentDirectory().

string PLearn::Learner::experiment_name

Definition at line 183 of file Learner.h.
Referenced by basename(), load(), oldread(), oldwrite(), and save().

bool PLearn::Learner::force_saving_on_all_processes = false [static]

if true, all MPI processes save; otherwise only CPU 0 actually saves

Definition at line 72 of file Learner.cc.
Referenced by save().

int PLearn::Learner::inputsize_

The data VMats are assumed to be formed of inputsize() input columns followed by targetsize() target columns.

Definition at line 133 of file Learner.h.
Referenced by inputsize(), oldread(), and oldwrite().

bool PLearn::Learner::measure_cpu_time_first [protected]

Definition at line 191 of file Learner.h.
Referenced by Learner().

Array<Measurer*> PLearn::Learner::measurers [protected]

array of measurers:

Definition at line 189 of file Learner.h.
Referenced by appendMeasurer(), and measure().

int PLearn::Learner::minibatch_size

test by blocks of this size using apply rather than use

Definition at line 150 of file Learner.h.
Referenced by applyAndComputeCosts(), computeCosts(), Learner(), measure(), test(), and useAndCostOnTestVec().
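Testing by blocks of minibatch_size rows means walking the test set in half-open ranges, with a shorter final block when the length is not a multiple of the block size. forEachMinibatch is a hypothetical helper; the callback stands in for a call to apply() on rows [start, end).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Calls fn(start, end) for consecutive blocks of at most minibatch_size rows.
// Returns the number of blocks processed.
template <typename BatchFn>
std::size_t forEachMinibatch(std::size_t n_rows, std::size_t minibatch_size, BatchFn fn) {
    assert(minibatch_size > 0);
    std::size_t calls = 0;
    for (std::size_t start = 0; start < n_rows; start += minibatch_size) {
        const std::size_t end = std::min(start + minibatch_size, n_rows);
        fn(start, end);  // process rows [start, end)
        ++calls;
    }
    return calls;
}
```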

PStream PLearn::Learner::objectiveout

The log stream to use to record the objective function during training.

Definition at line 208 of file Learner.h.

Vec PLearn::Learner::options

DEPRECATED: options used in the construction of the model through setModel.

Definition at line 160 of file Learner.h.

int PLearn::Learner::outputsize_

the use() method produces an output vector of size outputsize().

Definition at line 135 of file Learner.h.
Referenced by oldread(), oldwrite(), and outputsize().

int PLearn::Learner::report_test_progress_every

report test progress in vlog (see below) every that many iterations. For every nth test sample, where n is the value of report_test_progress_every, a "Test sample #n" line is printed in vlog.
Definition at line 156 of file Learner.h.
Referenced by Learner().
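The periodic progress report is a modulo test on the sample index. In this sketch reportTestProgress and progressLog are hypothetical, sample indices are assumed to be 0-based, the printed number is assumed to be the sample index, and a non-positive period is assumed to disable reporting.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Writes a progress line for every `every`-th test sample.
void reportTestProgress(std::ostream& vlog, int i, int every) {
    if (every > 0 && i % every == 0)
        vlog << "Test sample #" << i << '\n';
}

// Collects the progress lines produced over n test samples.
std::string progressLog(int n, int every) {
    std::ostringstream out;
    for (int i = 0; i < n; ++i) reportTestProgress(out, i, every);
    return out.str();
}
```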

bool PLearn::Learner::save_at_every_epoch

save learner at each epoch?

Definition at line 172 of file Learner.h.
Referenced by measure(), oldread(), and oldwrite().

bool PLearn::Learner::save_objective

Definition at line 173 of file Learner.h.
Referenced by measure().

int PLearn::Learner::targetsize_

number of target columns, which follow the inputsize() input columns in each row of the data VMats.

Definition at line 134 of file Learner.h.
Referenced by oldread(), oldwrite(), and targetsize().

Array<CostFunc> PLearn::Learner::test_costfuncs

Definition at line 195 of file Learner.h.
Referenced by computeCost(), costNames(), costsize(), makeDeepCopyFromShallowCopy(), oldread(), oldwrite(), and setTestCostFunctions().

int PLearn::Learner::test_every

Definition at line 145 of file Learner.h.
Referenced by Learner(), measure(), oldread(), oldwrite(), and setTestDuringTrain().

Array<ofstream*> PLearn::Learner::test_results_streams [protected]

open streams in which test results are saved

Definition at line 80 of file Learner.h.
Referenced by freeTestResultsStreams(), getTestResultsStream(), and openTestResultsStreams().

Array<VMat> PLearn::Learner::test_sets

test sets to test on during train

Definition at line 149 of file Learner.h.
Referenced by getTestDuringTrain(), measure(), openTestResultsStreams(), and setTestDuringTrain().

StatsItArray PLearn::Learner::test_statistics

Definition at line 196 of file Learner.h.
Referenced by computeTestStatistics(), makeDeepCopyFromShallowCopy(), oldread(), oldwrite(), setTestStatistics(), test(), and testResultsNames().

PStream PLearn::Learner::testout

test during train specifications

Definition at line 144 of file Learner.h.
Referenced by setTestDuringTrain().

Vec PLearn::Learner::tmp_costs [static, private]

Definition at line 62 of file Learner.cc.

Vec PLearn::Learner::tmp_input [static, private]

Definition at line 58 of file Learner.cc.
Referenced by computeCostsFromOutputs(), and computeOutput().

Vec PLearn::Learner::tmp_output [static, private]

Definition at line 61 of file Learner.cc.
Referenced by computeCosts().

Vec PLearn::Learner::tmp_target [static, private]

Definition at line 59 of file Learner.cc.
Referenced by computeCostsFromOutputs().

Vec PLearn::Learner::tmp_weight [static, private]

Definition at line 60 of file Learner.cc.
Referenced by computeCostsFromOutputs().

Vec PLearn::Learner::tmpvec [protected]

Definition at line 76 of file Learner.h.
Referenced by useAndCostOnTestVec().

Vec PLearn::Learner::train_cost [protected]

Definition at line 560 of file Learner.h.
Referenced by getTrainCost(), and setTrainCost().

ofstream* PLearn::Learner::train_objective_stream [protected]

file stream in which objectives and costs are saved during training

Definition at line 79 of file Learner.h.
Referenced by getTrainObjectiveStream(), openTrainObjectiveStream(), and ~Learner().

VMat PLearn::Learner::train_set

the current set being used for training

Definition at line 148 of file Learner.h.
Referenced by basename(), getTrainingSet(), and setTrainingSet().

int PLearn::Learner::use_file_if_bigger = 64000000L [static]

number of elements above which a file VMatrix is used rather than an in-memory matrix

Definition at line 71 of file Learner.cc.
Referenced by test().

Vec PLearn::Learner::vec_input

**Next generation** learners allow inputs to be anything, not just Vec

Definition at line 313 of file Learner.h.

PStream PLearn::Learner::vlog

The log stream to which all the verbose output from this learner should be sent.

Definition at line 207 of file Learner.h.
Referenced by computeLeaveOneOutCosts(), Learner(), measure(), stop_if_wanted(), and test().

int PLearn::Learner::weightsize_

Definition at line 136 of file Learner.h.
Referenced by weightsize().

The documentation for this class was generated from the following files:

Generated on Tue Aug 17 16:27:24 2004 for PLearn by doxygen 1.3.7
