
PLearn::StatsCollector Class Reference

#include <StatsCollector.h>

Inheritance diagram for PLearn::StatsCollector: [graph not reproduced]

Collaboration diagram for PLearn::StatsCollector: [graph not reproduced]

Public Types

typedef Object inherited
typedef Object inherited

Public Member Functions

 PLEARN_DECLARE_OBJECT (StatsCollector)
 StatsCollector (int the_maxnvalues=0)
real n () const
 number of samples seen with update (length of VMat for ex.)

real nmissing () const
real nnonmissing () const
real sum () const
real sumsquare () const
real min () const
real max () const
real mean () const
real variance () const
real stddev () const
real stderror () const
real first_obs () const
real last_obs () const
real sharperatio () const
real getStat (const string &statname) const
 Returns the value of the statistic with the given name.

virtual void build ()
 simply calls inherited::build() then build_()

void forget ()
 clears all statistics so that collection can start over

void update (real val, real weight=1.0)
 update statistics with next value val of sequence

void finalize ()
 finishes whatever computations are needed after all updates have been made

map< real, StatsCollectorCounts > * getCounts ()
int getMaxNValues ()
Mat cdf (bool normalized=true) const
 Returns a Mat with x,y coordinates for plotting the cdf. The cdf goes to 1 only if normalized is true; otherwise it goes to nsamples.

void sortIds ()
 Fixes the 'id' attribute of all StatsCollectorCounts so that increasing ids correspond to increasing real values. NOT TESTED YET (Julien)

RealMapping getBinMapping (double discrete_mincount, double continuous_mincount, real tolerance=.1, TVec< double > *fcount=0) const
 Returns a mapping that maps values to a bin number (from 0 to mapping.length()-1). The mapping leaves missing values as MISSING_VALUE, and values outside the [min, max] range are mapped to -1. Tolerance is used to test whether we join the last two bins.

RealMapping getAllValuesMapping (TVec< double > *fcount=0) const
RealMapping getAllValuesMapping (TVec< bool > *to_be_included, TVec< double > *fcount=0, bool ignore_other=false, real tolerance=0) const
 Same as getAllValuesMapping, except we can specify a bool vector, that indicates whether the k-th range should be included or not.

virtual void oldwrite (ostream &out) const
virtual void oldread (istream &in)
 DEPRECATED. Kept for backward compatibility with old saved objects.

virtual void print (ostream &out) const

Public Attributes

int maxnvalues
 maximum number of different values to keep track of in counts (if 0, we will only keep track of global statistics)

double nmissing_
 (weighted) number of missing values

double nnonmissing_
 (weighted) number of non-missing values

double sum_
 sum of all (values-first_)

double sumsquare_
 sum of square of all (values-first_)

double sumweights_
 sum of the weights

real min_
 the min

real max_
 the max

real first_
 first encountered nonmissing observation

real last_
 last encountered nonmissing observation

map< real, StatsCollectorCounts > counts
 will contain up to maxnvalues values and associated Counts as well as a last element which maps FLT_MAX, so that we don't miss anything (empty if maxnvalues=0)


Static Protected Member Functions

void declareOptions (OptionList &ol)
 Declares this class' options.


Private Member Functions

void build_ ()
 This does the actual building.


Detailed Description

A StatsCollector computes basic global statistics for a series of numbers, as well as statistics within automatically determined ranges. The first maxnvalues values encountered are used as points to define the ranges, so to get reasonable results your sequence should be i.i.d. and NOT sorted!

Definition at line 81 of file StatsCollector.h.
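
The following is a minimal standalone sketch of the global statistics described above, not PLearn's actual implementation. It assumes that missing values are represented as NaN, that sums are accumulated relative to the first non-missing observation (as the sum_ and sumsquare_ attribute descriptions suggest), and that the variance is the unbiased sample variance. The struct name SketchCollector and all of its field names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for the global statistics of PLearn::StatsCollector.
// Not PLearn code: field names and formulas are assumptions for illustration.
struct SketchCollector {
    double nmissing = 0.0;       // (weighted) number of missing values
    double nnonmissing = 0.0;    // (weighted) number of non-missing values
    double sum_shifted = 0.0;    // weighted sum of (value - first)
    double sumsq_shifted = 0.0;  // weighted sum of (value - first)^2
    double first = std::numeric_limits<double>::quiet_NaN();
    double last = std::numeric_limits<double>::quiet_NaN();
    double min_val = std::numeric_limits<double>::quiet_NaN();
    double max_val = std::numeric_limits<double>::quiet_NaN();

    void update(double val, double weight = 1.0) {
        assert(weight >= 0.0);
        if (std::isnan(val)) {  // NaN plays the role of MISSING_VALUE here
            nmissing += weight;
            return;
        }
        if (std::isnan(first)) {  // first non-missing observation
            first = min_val = max_val = val;
        }
        if (val < min_val) min_val = val;
        if (val > max_val) max_val = val;
        last = val;
        const double d = val - first;  // shift by the first observation
        sum_shifted += weight * d;
        sumsq_shifted += weight * d * d;
        nnonmissing += weight;
    }

    double n() const { return nmissing + nnonmissing; }
    // Undo the shift to recover the plain sum of the values.
    double sum() const { return sum_shifted + nnonmissing * first; }
    double mean() const { return sum() / nnonmissing; }
    // Assumption: unbiased sample variance; it is invariant to the shift.
    double variance() const {
        return (sumsq_shifted - sum_shifted * sum_shifted / nnonmissing)
               / (nnonmissing - 1.0);
    }
    double stddev() const { return std::sqrt(variance()); }
};
```

Accumulating differences from the first observation rather than raw values is a common way to reduce cancellation error when the values are large compared to their spread; whether that is the motivation in PLearn is not stated in this page.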


Member Typedef Documentation

typedef Object PLearn::StatsCollector::inherited
 

Reimplemented from PLearn::Object.

Definition at line 89 of file StatsCollector.h.

typedef Object PLearn::StatsCollector::inherited
 

Reimplemented from PLearn::Object.

Definition at line 84 of file StatsCollector.h.


Constructor & Destructor Documentation

PLearn::StatsCollector::StatsCollector (int the_maxnvalues = 0)
 

Definition at line 73 of file StatsCollector.cc.

References build_(), and MISSING_VALUE.


Member Function Documentation

void PLearn::StatsCollector::build () [virtual]
 

simply calls inherited::build() then build_()

Reimplemented from PLearn::Object.

Definition at line 145 of file StatsCollector.cc.

References build_().

Referenced by PLearn::ConditionalDensityNet::train().

void PLearn::StatsCollector::build_ () [private]
 

This does the actual building.

Reimplemented from PLearn::Object.

Definition at line 137 of file StatsCollector.cc.

References counts, and maxnvalues.

Referenced by build(), forget(), and StatsCollector().

Mat PLearn::StatsCollector::cdf (bool normalized = true) const


Returns a Mat with x,y coordinates for plotting the cdf. The cdf goes to 1 only if normalized is true; otherwise it goes to nsamples.

Definition at line 440 of file StatsCollector.cc.

References PLearn::TMat< T >::column(), counts, PLearn::Mat, max_, min_, nnonmissing_, and val.

Referenced by PLearn::interactiveDisplayCDF(), and PLearn::ConditionalDensityNet::train().
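
As an illustration of what the x,y coordinates returned by cdf() represent, here is a simplified standalone sketch. It is not the PLearn implementation: it assumes a plain map from value to observation count instead of StatsCollectorCounts, returns a vector of pairs instead of a Mat, and the function name empiricalCdf is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: builds (x, y) points of an empirical cdf from a
// value -> count map. With normalized == true the last y is 1; otherwise
// the last y is the total number of samples.
std::vector<std::pair<double, double>> empiricalCdf(
    const std::map<double, double>& counts, bool normalized = true) {
    double total = 0.0;
    for (const auto& kv : counts) {
        assert(kv.second >= 0.0);  // counts cannot be negative
        total += kv.second;
    }
    std::vector<std::pair<double, double>> points;
    double cumulative = 0.0;
    // std::map iterates in increasing key order, so x is increasing.
    for (const auto& kv : counts) {
        cumulative += kv.second;
        const double y =
            (normalized && total > 0.0) ? cumulative / total : cumulative;
        points.emplace_back(kv.first, y);
    }
    return points;
}
```

For counts {1: 2, 2: 1, 5: 1}, the normalized curve ends at y = 1 and the unnormalized one ends at y = 4, the number of samples.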

void PLearn::StatsCollector::declareOptions (OptionList &ol) [static, protected]
 

Declares this class' options.

Reimplemented from PLearn::Object.

Definition at line 103 of file StatsCollector.cc.

References PLearn::declareOption(), and PLearn::OptionList.

void PLearn::StatsCollector::finalize () [inline]


finishes whatever computations are needed after all updates have been made

Definition at line 167 of file StatsCollector.h.

Referenced by PLearn::RepeatSplitter::build_().

real PLearn::StatsCollector::first_obs () const [inline]
 

Definition at line 142 of file StatsCollector.h.

References first_.

Referenced by print().

void PLearn::StatsCollector::forget ()


clears all statistics so that collection can start over

Definition at line 151 of file StatsCollector.cc.

References build_(), counts, first_, last_, max_, min_, MISSING_VALUE, nmissing_, nnonmissing_, sum_, and sumsquare_.

Referenced by PLearn::ConditionalDensityNet::train().

RealMapping PLearn::StatsCollector::getAllValuesMapping (TVec< bool > * to_be_included, TVec< double > * fcount = 0, bool ignore_other = false, real tolerance = 0) const
 

Same as getAllValuesMapping, except we can specify a bool vector, that indicates whether the k-th range should be included or not.

The boolean 'ignore_other' indicates whether a value not appearing in the mapping should be mapped to itself (false), or to -1 (true). We can also give a 'tolerance': in this case, each mapping will be expanded by '-epsilon' below and '+epsilon' above, with epsilon = tolerance * mean(difference between two consecutive values). If two consecutive mappings have a non-empty intersection after the expansion, they will be merged.

Definition at line 341 of file StatsCollector.cc.

References PLearn::RealMapping::addMapping(), PLearn::TVec< T >::append(), count, counts, k, PLearn::RealMapping::keep_other_as_is, PLearn::mean(), mean(), nmissing_, nnonmissing_, PLearn::RealMapping::other_mapsto, PLERROR, PLWARNING, PLearn::TVec< T >::resize(), and update().
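
The tolerance rule described above can be sketched in isolation as follows. This is not the PLearn code: it assumes the values are given as a strictly increasing vector, represents each mapping as a closed [low, high] pair, and treats touching ranges as intersecting. The names Range and expandAndMerge are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Range = std::pair<double, double>;  // closed interval [low, high]

// Hypothetical helper: expands each value v into [v - eps, v + eps], with
// eps = tolerance * mean(difference between two consecutive values), then
// merges consecutive ranges that intersect after the expansion.
std::vector<Range> expandAndMerge(const std::vector<double>& sorted_values,
                                  double tolerance) {
    assert(tolerance >= 0.0);
    std::vector<Range> ranges;
    if (sorted_values.empty()) return ranges;

    double eps = 0.0;
    if (sorted_values.size() > 1) {
        // The consecutive differences telescope to (last - first).
        const double mean_diff =
            (sorted_values.back() - sorted_values.front()) /
            static_cast<double>(sorted_values.size() - 1);
        eps = tolerance * mean_diff;
    }
    for (double v : sorted_values) {
        const Range r(v - eps, v + eps);
        if (!ranges.empty() && r.first <= ranges.back().second) {
            ranges.back().second = r.second;  // overlap: extend previous range
        } else {
            ranges.push_back(r);
        }
    }
    return ranges;
}
```

With values {0, 1, 2, 10} and tolerance 0.2, the mean difference is 10/3, so eps is about 0.67: the first three values collapse into one range and 10 stays on its own. With tolerance 0 every value keeps its own degenerate range.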

RealMapping PLearn::StatsCollector::getAllValuesMapping (TVec< double > * fcount = 0) const
 

Definition at line 336 of file StatsCollector.cc.

References getAllValuesMapping().

Referenced by getAllValuesMapping().

RealMapping PLearn::StatsCollector::getBinMapping (double discrete_mincount, double continuous_mincount, real tolerance = .1, TVec< double > * fcount = 0) const


Returns a mapping that maps values to a bin number (from 0 to mapping.length()-1). The mapping leaves missing values as MISSING_VALUE, and values outside the [min, max] range are mapped to -1. Tolerance is used to test whether we join the last two bins.

If the last bin is short by more than tolerance*100% of continuous_mincount elements, we join it with the previous bin.

Definition at line 210 of file StatsCollector.cc.

References PLearn::RealMapping::addMapping(), PLearn::TVec< T >::append(), PLearn::TVec< T >::back(), count, counts, PLearn::RealMapping::lastMapping(), max_, min_, nmissing_, nnonmissing_, PLERROR, PLWARNING, PLearn::TVec< T >::pop_back(), PLearn::RealMapping::removeMapping(), PLearn::TVec< T >::resize(), PLearn::RealMapping::setMappingForOther(), and PLearn::RealMapping::size().
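
The rule for joining the last bin can be read as a simple threshold test. The sketch below is one interpretation of that sentence, not the PLearn code: it assumes "short by more than tolerance*100%" means that the shortfall relative to continuous_mincount exceeds tolerance * continuous_mincount. The function names shouldJoinLastBin and joinShortLastBin are hypothetical, and bins are reduced to a plain vector of counts.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reading of the rule: join when the last bin is missing more
// than tolerance * continuous_mincount elements.
bool shouldJoinLastBin(double last_bin_count, double continuous_mincount,
                       double tolerance = 0.1) {
    assert(continuous_mincount > 0.0);
    const double shortfall = continuous_mincount - last_bin_count;
    return shortfall > tolerance * continuous_mincount;
}

// Applies the rule to a list of per-bin counts: if the last bin is too
// small, its count is added to the previous bin and the last bin is removed.
std::vector<double> joinShortLastBin(std::vector<double> bin_counts,
                                     double continuous_mincount,
                                     double tolerance = 0.1) {
    if (bin_counts.size() >= 2 &&
        shouldJoinLastBin(bin_counts.back(), continuous_mincount, tolerance)) {
        bin_counts[bin_counts.size() - 2] += bin_counts.back();
        bin_counts.pop_back();
    }
    return bin_counts;
}
```

With continuous_mincount = 100 and the default tolerance of 0.1, a last bin holding 95 elements is kept (it is short by 5, which is under 10), while a last bin holding 80 elements is joined with the previous one.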

map<real,StatsCollectorCounts>* PLearn::StatsCollector::getCounts () [inline]
 

Definition at line 169 of file StatsCollector.h.

References counts.

Referenced by PLearn::RepeatSplitter::build_().

int PLearn::StatsCollector::getMaxNValues () [inline]
 

Definition at line 170 of file StatsCollector.h.

References maxnvalues.

real PLearn::StatsCollector::getStat (const string &statname) const


Returns the value of the statistic with the given name.

Currently available names are: E (mean), V (variance), STDDEV, MIN, MAX, STDERROR, SHARPERATIO. Calls PLERROR if statname is invalid.

Definition at line 559 of file StatsCollector.cc.

References PLERROR.

Referenced by PLearn::VecStatsCollector::getStat().
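
A name-based lookup of this kind can be sketched as below. This is a standalone illustration, not PLearn's getStat: the Summary struct and statByName function are hypothetical, a C++ exception stands in for PLERROR, and the formulas for STDERROR (sqrt(variance / n)) and SHARPERATIO (mean / stddev) are assumptions inferred from the functions each accessor references.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical container for already-computed statistics.
struct Summary {
    double mean;
    double variance;
    double min;
    double max;
    double n;  // number of non-missing values
};

// Returns the statistic named by statname; throws on an unknown name
// (the real class is documented as calling PLERROR instead).
double statByName(const Summary& s, const std::string& statname) {
    if (statname == "E") return s.mean;
    if (statname == "V") return s.variance;
    if (statname == "STDDEV") return std::sqrt(s.variance);
    if (statname == "MIN") return s.min;
    if (statname == "MAX") return s.max;
    if (statname == "STDERROR") {
        assert(s.n > 0.0);
        return std::sqrt(s.variance / s.n);  // assumed formula
    }
    if (statname == "SHARPERATIO") {
        return s.mean / std::sqrt(s.variance);  // assumed formula
    }
    throw std::invalid_argument("unknown statistic name: " + statname);
}
```

For a summary with mean 2, variance 4 and n = 16, this sketch yields STDDEV = 2, STDERROR = 0.5 and SHARPERATIO = 1.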

real PLearn::StatsCollector::last_obs () const [inline]
 

Definition at line 143 of file StatsCollector.h.

References last_.

Referenced by print().

real PLearn::StatsCollector::max () const [inline]
 

Definition at line 136 of file StatsCollector.h.

References max_.

Referenced by print(), and PLearn::VMatrix::printFieldInfo().

real PLearn::StatsCollector::mean () const [inline]
 

Definition at line 137 of file StatsCollector.h.

References nnonmissing_, and sum().

Referenced by getAllValuesMapping(), print(), PLearn::VMatrix::printFieldInfo(), sharperatio(), and PLearn::ConditionalDensityNet::train().

real PLearn::StatsCollector::min () const [inline]
 

Definition at line 135 of file StatsCollector.h.

References min_.

Referenced by print(), and PLearn::VMatrix::printFieldInfo().

real PLearn::StatsCollector::n () const [inline]
 

number of samples seen with update (length of VMat for ex.)

Definition at line 129 of file StatsCollector.h.

References nmissing_, and nnonmissing_.

Referenced by print().

real PLearn::StatsCollector::nmissing () const [inline]
 

Definition at line 130 of file StatsCollector.h.

References nmissing_.

Referenced by print(), and PLearn::VMatrix::printFieldInfo().

real PLearn::StatsCollector::nnonmissing () const [inline]
 

Definition at line 131 of file StatsCollector.h.

References nnonmissing_.

Referenced by PLearn::VMatrix::printFieldInfo(), and stderror().

void PLearn::StatsCollector::oldread (istream &in) [virtual]


DEPRECATED. Kept for backward compatibility with old saved objects.

Reimplemented from PLearn::Object.

Definition at line 522 of file StatsCollector.cc.

References counts, max_, maxnvalues, min_, PLearn::StatsCollectorCounts::n, PLearn::StatsCollectorCounts::nbelow, nmissing_, nnonmissing_, PLERROR, PLearn::read(), PLearn::readField(), PLearn::readFieldName(), PLearn::readFooter(), PLearn::readHeader(), PLearn::readNewline(), PLearn::StatsCollectorCounts::sum, sum_, PLearn::StatsCollectorCounts::sumsquare, and sumsquare_.

void PLearn::StatsCollector::oldwrite (ostream &out) const [virtual]
 

Definition at line 494 of file StatsCollector.cc.

References counts, max_, maxnvalues, min_, nmissing_, nnonmissing_, sum_, sumsquare_, PLearn::write(), PLearn::writeField(), PLearn::writeFieldName(), PLearn::writeFooter(), PLearn::writeHeader(), and PLearn::writeNewline().

PLearn::StatsCollector::PLEARN_DECLARE_OBJECT (StatsCollector)
 

void PLearn::StatsCollector::print (ostream &out) const [virtual]
 

Prints a human-readable, short (not necessarily complete) description of this object instance (default prints info()). This is what is called by operator<< on Object

Reimplemented from PLearn::Object.

Definition at line 471 of file StatsCollector.cc.

References counts, PLearn::endl(), first_obs(), last_obs(), max(), mean(), min(), n(), nmissing(), stddev(), and stderror().

real PLearn::StatsCollector::sharperatio () const [inline]
 

Definition at line 144 of file StatsCollector.h.

References mean(), and stddev().

void PLearn::StatsCollector::sortIds ()


Fixes the 'id' attribute of all StatsCollectorCounts so that increasing ids correspond to increasing real values. NOT TESTED YET (Julien)

Definition at line 91 of file StatsCollector.cc.

References counts, PLearn::PairRealSCCType, and PLearn::sortIdComparator().

real PLearn::StatsCollector::stddev () const [inline]
 

Definition at line 140 of file StatsCollector.h.

References PLearn::sqrt(), and variance().

Referenced by print(), PLearn::VMatrix::printFieldInfo(), and sharperatio().

real PLearn::StatsCollector::stderror () const [inline]
 

Definition at line 141 of file StatsCollector.h.

References nnonmissing(), PLearn::sqrt(), and variance().

Referenced by print().

real PLearn::StatsCollector::sum () const [inline]
 

Definition at line 132 of file StatsCollector.h.

References first_, nnonmissing_, and sum_.

Referenced by mean(), PLearn::VMatrix::printFieldInfo(), and sumsquare().

real PLearn::StatsCollector::sumsquare () const [inline]
 

Definition at line 134 of file StatsCollector.h.

References first_, nnonmissing_, sum(), and sumsquare_.

void PLearn::StatsCollector::update (real val, real weight = 1.0)
 

update statistics with next value val of sequence

Definition at line 164 of file StatsCollector.cc.

References counts, first_, PLearn::is_missing(), last_, max_, maxnvalues, min_, nmissing_, nnonmissing_, sum_, sumsquare_, and val.

Referenced by PLearn::RepeatSplitter::build_(), getAllValuesMapping(), PLearn::printDistanceStatistics(), and PLearn::ConditionalDensityNet::train().

real PLearn::StatsCollector::variance () const [inline]
 

Definition at line 139 of file StatsCollector.h.

References nnonmissing_, PLearn::square(), sum_, and sumsquare_.

Referenced by stddev(), and stderror().


Member Data Documentation

map<real,StatsCollectorCounts> PLearn::StatsCollector::counts
 

will contain up to maxnvalues values and associated Counts as well as a last element which maps FLT_MAX, so that we don't miss anything (empty if maxnvalues=0)

Definition at line 113 of file StatsCollector.h.

Referenced by build_(), cdf(), forget(), getAllValuesMapping(), getBinMapping(), getCounts(), oldread(), oldwrite(), print(), sortIds(), and update().

real PLearn::StatsCollector::first_
 

first encountered nonmissing observation

Definition at line 107 of file StatsCollector.h.

Referenced by first_obs(), forget(), sum(), sumsquare(), and update().

real PLearn::StatsCollector::last_
 

last encountered nonmissing observation

Definition at line 108 of file StatsCollector.h.

Referenced by forget(), last_obs(), and update().

real PLearn::StatsCollector::max_
 

the max

Definition at line 106 of file StatsCollector.h.

Referenced by cdf(), forget(), getBinMapping(), max(), oldread(), oldwrite(), and update().

int PLearn::StatsCollector::maxnvalues
 

maximum number of different values to keep track of in counts (if 0, we will only keep track of global statistics)

Definition at line 95 of file StatsCollector.h.

Referenced by build_(), getMaxNValues(), oldread(), oldwrite(), PLearn::ConditionalDensityNet::train(), and update().

real PLearn::StatsCollector::min_
 

the min

Definition at line 105 of file StatsCollector.h.

Referenced by cdf(), forget(), getBinMapping(), min(), oldread(), oldwrite(), and update().

double PLearn::StatsCollector::nmissing_
 

(weighted) number of missing values

Definition at line 100 of file StatsCollector.h.

Referenced by forget(), getAllValuesMapping(), getBinMapping(), n(), nmissing(), oldread(), oldwrite(), and update().

double PLearn::StatsCollector::nnonmissing_
 

(weighted) number of non-missing values

Definition at line 101 of file StatsCollector.h.

Referenced by cdf(), forget(), getAllValuesMapping(), getBinMapping(), mean(), n(), nnonmissing(), oldread(), oldwrite(), sum(), sumsquare(), update(), and variance().

double PLearn::StatsCollector::sum_
 

sum of all (values-first_)

Definition at line 102 of file StatsCollector.h.

Referenced by forget(), oldread(), oldwrite(), sum(), update(), and variance().

double PLearn::StatsCollector::sumsquare_
 

sum of square of all (values-first_)

Definition at line 103 of file StatsCollector.h.

Referenced by forget(), oldread(), oldwrite(), sumsquare(), update(), and variance().

double PLearn::StatsCollector::sumweights_
 

sum of the weights

Definition at line 104 of file StatsCollector.h.


The documentation for this class was generated from the following files:
Generated on Tue Aug 17 16:23:56 2004 for PLearn by doxygen 1.3.7