#include <StatsCollector.h>
Public Types | |
typedef Object | inherited |
typedef Object | inherited |
Public Member Functions | |
PLEARN_DECLARE_OBJECT (StatsCollector) | |
StatsCollector (int the_maxnvalues=0) | |
real | n () const |
number of samples seen by update() (e.g. the length of a VMat) | |
real | nmissing () const |
real | nnonmissing () const |
real | sum () const |
real | sumsquare () const |
real | min () const |
real | max () const |
real | mean () const |
real | variance () const |
real | stddev () const |
real | stderror () const |
real | first_obs () const |
real | last_obs () const |
real | sharperatio () const |
real | getStat (const string &statname) const |
Returns the value of the statistic with the given name. | |
virtual void | build () |
simply calls inherited::build() then build_() | |
void | forget () |
clears all statistics, allowing collection to restart | |
void | update (real val, real weight=1.0) |
updates statistics with the next value val of the sequence | |
void | finalize () |
finishes whatever computations are needed after all updates have been made | |
map< real, StatsCollectorCounts > * | getCounts () |
int | getMaxNValues () |
Mat | cdf (bool normalized=true) const |
returns a Mat with x,y coordinates for plotting the cdf. Only if normalized is true will the cdf go to 1; otherwise it goes to nsamples. | |
void | sortIds () |
fixes the 'id' attribute of all StatsCollectorCounts so that increasing ids correspond to increasing real values. NOT TESTED YET (Julien) | |
RealMapping | getBinMapping (double discrete_mincount, double continuous_mincount, real tolerance=.1, TVec< double > *fcount=0) const |
returns a mapping that maps values to a bin number (from 0 to mapping.length()-1). The mapping leaves missing values as MISSING_VALUE, and values outside the [min, max] range are mapped to -1. Tolerance is used to decide whether we join the last two bins or not. | |
RealMapping | getAllValuesMapping (TVec< double > *fcount=0) const |
RealMapping | getAllValuesMapping (TVec< bool > *to_be_included, TVec< double > *fcount=0, bool ignore_other=false, real tolerance=0) const |
Same as getAllValuesMapping, except that we can specify a bool vector indicating whether the k-th range should be included or not. | |
virtual void | oldwrite (ostream &out) const |
virtual void | oldread (istream &in) |
DEPRECATED. For backward compatibility with old saved objects. | |
virtual void | print (ostream &out) const |
Public Attributes | |
int | maxnvalues |
maximum number of different values to keep track of in counts (if 0, we will only keep track of global statistics) | |
double | nmissing_ |
(weighted) number of missing values | |
double | nnonmissing_ |
(weighted) number of non-missing values | |
double | sum_ |
sum of all (values-first_) | |
double | sumsquare_ |
sum of square of all (values-first_) | |
double | sumweights_ |
sum of the weights | |
real | min_ |
the min | |
real | max_ |
the max | |
real | first_ |
first encountered nonmissing observation | |
real | last_ |
last encountered nonmissing observation | |
map< real, StatsCollectorCounts > | counts |
will contain up to maxnvalues values and associated Counts as well as a last element which maps FLT_MAX, so that we don't miss anything (empty if maxnvalues=0) | |
Static Protected Member Functions | |
void | declareOptions (OptionList &ol) |
Declares this class' options. | |
Private Member Functions | |
void | build_ () |
This does the actual building. |
Definition at line 81 of file StatsCollector.h.
typedef Object inherited
Reimplemented from PLearn::Object. Definition at line 89 of file StatsCollector.h.
typedef Object inherited
Reimplemented from PLearn::Object. Definition at line 84 of file StatsCollector.h.
StatsCollector (int the_maxnvalues=0)
Definition at line 73 of file StatsCollector.cc. References build_(), and MISSING_VALUE.
virtual void build ()
simply calls inherited::build() then build_()
Reimplemented from PLearn::Object. Definition at line 145 of file StatsCollector.cc. References build_(). Referenced by PLearn::ConditionalDensityNet::train().
void build_ ()
This does the actual building.
Reimplemented from PLearn::Object. Definition at line 137 of file StatsCollector.cc. References counts, and maxnvalues. Referenced by build(), forget(), and StatsCollector().
Mat cdf (bool normalized=true) const
returns a Mat with x,y coordinates for plotting the cdf. Only if normalized is true will the cdf go to 1; otherwise it goes to nsamples.
Definition at line 440 of file StatsCollector.cc. References PLearn::TMat< T >::column(), counts, PLearn::Mat, max_, min_, nnonmissing_, and val. Referenced by PLearn::interactiveDisplayCDF(), and PLearn::ConditionalDensityNet::train().
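cdf() builds its x,y points from counts, nnonmissing_, min_ and max_. The sketch below only illustrates the normalized versus unnormalized behaviour described above. It assumes a plain std::map from value to weighted count in place of PLearn's counts map and Mat. cdfPoints is a hypothetical helper that emits one point per distinct value; a real plotting routine may add step corners and the [min_, max_] end points.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: builds (x, y) points of an empirical cdf from a
// value -> weighted count map. std::map iterates keys in increasing order,
// so y is a running sum of the counts.
std::vector<std::pair<double, double>>
cdfPoints(const std::map<double, double>& counts, bool normalized = true)
{
    double total = 0.0;
    for (const auto& kv : counts)
        total += kv.second;
    assert(!normalized || total > 0.0);

    std::vector<std::pair<double, double>> points;
    double cumulative = 0.0;
    for (const auto& kv : counts) {
        cumulative += kv.second;
        // Normalized: the last y is 1. Otherwise: the last y is the total count.
        points.emplace_back(kv.first, normalized ? cumulative / total : cumulative);
    }
    return points;
}
```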
static void declareOptions (OptionList &ol)
Declares this class' options.
Reimplemented from PLearn::Object. Definition at line 103 of file StatsCollector.cc. References PLearn::declareOption(), and PLearn::OptionList.
void finalize ()
finishes whatever computations are needed after all updates have been made
Definition at line 167 of file StatsCollector.h. Referenced by PLearn::RepeatSplitter::build_().
real first_obs () const
Definition at line 142 of file StatsCollector.h. References first_. Referenced by print().
void forget ()
clears all statistics, allowing collection to restart
Definition at line 151 of file StatsCollector.cc. References build_(), counts, first_, last_, max_, min_, MISSING_VALUE, nmissing_, nnonmissing_, sum_, and sumsquare_. Referenced by PLearn::ConditionalDensityNet::train().
RealMapping getAllValuesMapping (TVec< bool > *to_be_included, TVec< double > *fcount=0, bool ignore_other=false, real tolerance=0) const
Same as getAllValuesMapping, except that we can specify a bool vector indicating whether the k-th range should be included or not. The boolean 'ignore_other' indicates whether a value not appearing in the mapping should be mapped to itself (false) or to -1 (true). We can also give a 'tolerance': in this case, each mapping is expanded by epsilon below and above, with epsilon = tolerance * mean(difference between two consecutive values). If two consecutive mappings have a non-empty intersection after the expansion, they are merged.
Definition at line 341 of file StatsCollector.cc. References PLearn::RealMapping::addMapping(), PLearn::TVec< T >::append(), count, counts, k, PLearn::RealMapping::keep_other_as_is, PLearn::mean(), mean(), nmissing_, nnonmissing_, PLearn::RealMapping::other_mapsto, PLERROR, PLWARNING, PLearn::TVec< T >::resize(), and update().
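The tolerance-based merging described above can be sketched on a sorted list of distinct values. expandAndMerge is a hypothetical helper, not PLearn's implementation. It ignores to_be_included, ignore_other and fcount, treats ranges as closed intervals, and returns plain (low, high) pairs instead of a RealMapping.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: expands each sorted distinct value v into the closed
// interval [v - eps, v + eps], with eps = tolerance * (mean gap between
// consecutive values), and merges intervals that overlap after expansion.
std::vector<std::pair<double, double>>
expandAndMerge(const std::vector<double>& sorted_values, double tolerance)
{
    std::vector<std::pair<double, double>> ranges;
    if (sorted_values.empty())
        return ranges;

    double eps = 0.0;
    if (sorted_values.size() > 1) {
        // The mean of consecutive differences telescopes to (last - first) / (k - 1).
        double span = sorted_values.back() - sorted_values.front();
        eps = tolerance * span / static_cast<double>(sorted_values.size() - 1);
    }

    for (std::size_t i = 0; i < sorted_values.size(); ++i) {
        assert(i == 0 || sorted_values[i - 1] < sorted_values[i]); // sorted, distinct
        double lo = sorted_values[i] - eps;
        double hi = sorted_values[i] + eps;
        if (!ranges.empty() && ranges.back().second >= lo)
            ranges.back().second = hi;   // overlap: extend the previous range
        else
            ranges.emplace_back(lo, hi); // disjoint: start a new range
    }
    return ranges;
}
```

For example, with values {0, 1, 2, 10} the mean gap is 10/3. A tolerance of 0.1 (epsilon about 0.33) keeps four separate ranges. A tolerance of 0.2 (epsilon about 0.67) merges 0, 1 and 2 into one range and leaves 10 on its own.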
RealMapping getAllValuesMapping (TVec< double > *fcount=0) const
Definition at line 336 of file StatsCollector.cc. References getAllValuesMapping(). Referenced by getAllValuesMapping().
RealMapping getBinMapping (double discrete_mincount, double continuous_mincount, real tolerance=.1, TVec< double > *fcount=0) const
returns a mapping that maps values to a bin number (from 0 to mapping.length()-1). The mapping leaves missing values as MISSING_VALUE, and values outside the [min, max] range are mapped to -1. Tolerance is used to decide whether we join the last two bins or not: if the last bin is short by more than tolerance*100% of continuous_mincount elements, we join it with the previous bin.
Definition at line 210 of file StatsCollector.cc. References PLearn::RealMapping::addMapping(), PLearn::TVec< T >::append(), PLearn::TVec< T >::back(), count, counts, PLearn::RealMapping::lastMapping(), max_, min_, nmissing_, nnonmissing_, PLERROR, PLWARNING, PLearn::TVec< T >::pop_back(), PLearn::RealMapping::removeMapping(), PLearn::TVec< T >::resize(), PLearn::RealMapping::setMappingForOther(), and PLearn::RealMapping::size().
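Only the last-bin rule is stated explicitly above. The sketch below pairs it with a simple greedy bin construction, and that construction is an assumption. greedyBinCounts is hypothetical: it ignores discrete_mincount, missing values and the [min, max] clipping, and it returns bin counts rather than a RealMapping. "Short by more than tolerance*100% of continuous_mincount" is read here as "holding fewer than (1 - tolerance) * continuous_mincount samples".

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, simplified binning: walks the (value -> count) map in
// increasing order and closes a bin once it holds at least `mincount`
// samples. Returns the count in each bin.
std::vector<double> greedyBinCounts(const std::map<double, double>& counts,
                                    double mincount, double tolerance = 0.1)
{
    assert(mincount > 0.0);
    std::vector<double> bins;
    double current = 0.0;
    for (const auto& kv : counts) {
        current += kv.second;
        if (current >= mincount) {
            bins.push_back(current);
            current = 0.0;
        }
    }
    if (current > 0.0)
        bins.push_back(current); // leftover samples form a (possibly short) last bin

    // Last-bin rule: if the last bin holds fewer than
    // (1 - tolerance) * mincount samples, join it with the previous bin.
    if (bins.size() >= 2 && bins.back() < (1.0 - tolerance) * mincount) {
        double last = bins.back();
        bins.pop_back();
        bins.back() += last;
    }
    return bins;
}
```

With counts 5, 5, 5 and 1 at four distinct values and mincount = 5, the leftover bin of 1 is below 4.5, so it is folded into the previous bin, giving bins of 5, 5 and 6.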
map< real, StatsCollectorCounts > * getCounts ()
Definition at line 169 of file StatsCollector.h. References counts. Referenced by PLearn::RepeatSplitter::build_().
int getMaxNValues ()
Definition at line 170 of file StatsCollector.h. References maxnvalues.
real getStat (const string &statname) const
Returns the value of the statistic with the given name. Currently available names are E (mean), V (variance), STDDEV, MIN, MAX, STDERROR and SHARPERATIO. Calls PLERROR if statname is invalid.
Definition at line 559 of file StatsCollector.cc. References PLERROR. Referenced by PLearn::VecStatsCollector::getStat().
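The sketch below shows name-based dispatch in the spirit of getStat(), assuming the accessor values are already available. SummaryStats and getStatByName are hypothetical, and an exception stands in for PLERROR. SHARPERATIO is taken as mean / stddev; this is an assumption, consistent with sharperatio() being listed as a caller of mean() and stddev().

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the values returned by the collector's accessors.
struct SummaryStats {
    double mean, variance, min, max, nnonmissing;
};

// Maps a statistic name to its value; throws where the original calls PLERROR.
double getStatByName(const SummaryStats& s, const std::string& statname)
{
    if (statname == "E")           return s.mean;
    if (statname == "V")           return s.variance;
    if (statname == "STDDEV")      return std::sqrt(s.variance);
    if (statname == "MIN")         return s.min;
    if (statname == "MAX")         return s.max;
    if (statname == "STDERROR") {
        assert(s.nnonmissing > 0.0);
        return std::sqrt(s.variance / s.nnonmissing);
    }
    if (statname == "SHARPERATIO") return s.mean / std::sqrt(s.variance);
    throw std::invalid_argument("invalid statname: " + statname);
}
```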
real last_obs () const
Definition at line 143 of file StatsCollector.h. References last_. Referenced by print().
real max () const
Definition at line 136 of file StatsCollector.h. References max_. Referenced by print(), and PLearn::VMatrix::printFieldInfo().
real mean () const
Definition at line 137 of file StatsCollector.h. References nnonmissing_, and sum(). Referenced by getAllValuesMapping(), print(), PLearn::VMatrix::printFieldInfo(), sharperatio(), and PLearn::ConditionalDensityNet::train().
real min () const
Definition at line 135 of file StatsCollector.h. References min_. Referenced by print(), and PLearn::VMatrix::printFieldInfo().
real n () const
number of samples seen by update() (e.g. the length of a VMat)
Definition at line 129 of file StatsCollector.h. References nmissing_, and nnonmissing_. Referenced by print().
real nmissing () const
Definition at line 130 of file StatsCollector.h. References nmissing_. Referenced by print(), and PLearn::VMatrix::printFieldInfo().
real nnonmissing () const
Definition at line 131 of file StatsCollector.h. References nnonmissing_. Referenced by PLearn::VMatrix::printFieldInfo(), and stderror().
virtual void oldread (istream &in)
DEPRECATED. For backward compatibility with old saved objects.
Reimplemented from PLearn::Object. Definition at line 522 of file StatsCollector.cc. References counts, max_, maxnvalues, min_, PLearn::StatsCollectorCounts::n, PLearn::StatsCollectorCounts::nbelow, nmissing_, nnonmissing_, PLERROR, PLearn::read(), PLearn::readField(), PLearn::readFieldName(), PLearn::readFooter(), PLearn::readHeader(), PLearn::readNewline(), PLearn::StatsCollectorCounts::sum, sum_, PLearn::StatsCollectorCounts::sumsquare, and sumsquare_.
virtual void oldwrite (ostream &out) const
Definition at line 494 of file StatsCollector.cc. References counts, max_, maxnvalues, min_, nmissing_, nnonmissing_, sum_, sumsquare_, PLearn::write(), PLearn::writeField(), PLearn::writeFieldName(), PLearn::writeFooter(), PLearn::writeHeader(), and PLearn::writeNewline().
virtual void print (ostream &out) const
Prints a human-readable, short (not necessarily complete) description of this object instance (the default prints info()). This is what operator<< calls on an Object.
Reimplemented from PLearn::Object. Definition at line 471 of file StatsCollector.cc. References counts, PLearn::endl(), first_obs(), last_obs(), max(), mean(), min(), n(), nmissing(), stddev(), and stderror().
real sharperatio () const
Definition at line 144 of file StatsCollector.h.
void sortIds ()
fixes the 'id' attribute of all StatsCollectorCounts so that increasing ids correspond to increasing real values. NOT TESTED YET (Julien)
Definition at line 91 of file StatsCollector.cc. References counts, PLearn::PairRealSCCType, and PLearn::sortIdComparator().
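Because a std::map keeps its keys sorted, one way to make ids increase with the stored real values is to renumber the entries during an in-order walk. This sketch assumes the ids are simply 0..k-1 and uses a hypothetical CountsEntry struct. The original relies on PairRealSCCType and sortIdComparator, which are not reproduced here.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for StatsCollectorCounts (only the fields used here).
struct CountsEntry {
    double n = 0.0;
    int id = -1;
};

// Reassigns ids so that increasing ids match increasing keys. std::map keeps
// its keys sorted, so an in-order walk visits values from smallest to largest.
void assignIdsByRank(std::map<double, CountsEntry>& counts)
{
    int next_id = 0;
    for (auto& kv : counts)
        kv.second.id = next_id++;
    assert(next_id == static_cast<int>(counts.size()));
}
```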
real stddev () const
Definition at line 140 of file StatsCollector.h. References PLearn::sqrt(), and variance(). Referenced by print(), PLearn::VMatrix::printFieldInfo(), and sharperatio().
real stderror () const
Definition at line 141 of file StatsCollector.h. References nnonmissing(), PLearn::sqrt(), and variance(). Referenced by print().
real sum () const
Definition at line 132 of file StatsCollector.h. References first_, nnonmissing_, and sum_. Referenced by mean(), PLearn::VMatrix::printFieldInfo(), and sumsquare().
real sumsquare () const
Definition at line 134 of file StatsCollector.h. References first_, nnonmissing_, sum(), and sumsquare_.
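The references listed for sum() (first_, nnonmissing_, sum_) and sumsquare() (first_, nnonmissing_, sum(), sumsquare_) are consistent with recovering the raw sums from the shifted accumulators. The derivation below assumes that each update adds w_i(x_i - f) to sum_ and w_i(x_i - f)^2 to sumsquare_, with f = first_ and N = nnonmissing_. Expanding the squares gives:

```latex
\begin{aligned}
N &= \sum_i w_i, \qquad S' = \sum_i w_i (x_i - f), \qquad Q' = \sum_i w_i (x_i - f)^2 \\
\texttt{sum()} &= \sum_i w_i x_i = S' + N f \\
\texttt{sumsquare()} &= \sum_i w_i x_i^2 = Q' + 2 f S' + N f^2 = Q' + 2 f \cdot \texttt{sum()} - N f^2
\end{aligned}
```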
void update (real val, real weight=1.0)
updates statistics with the next value val of the sequence
Definition at line 164 of file StatsCollector.cc. References counts, first_, PLearn::is_missing(), last_, max_, maxnvalues, min_, nmissing_, nnonmissing_, sum_, sumsquare_, and val. Referenced by PLearn::RepeatSplitter::build_(), getAllValuesMapping(), PLearn::printDistanceStatistics(), and PLearn::ConditionalDensityNet::train().
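The fields sum_ and sumsquare_ store sums of (values-first_) rather than of the raw values. The sketch below shows one way update() could maintain such shifted sums. It makes several assumptions. NaN stands in for MISSING_VALUE and is_missing(), and first_ starts out missing as it does after forget(). Weights are added to nnonmissing_ or nmissing_. MiniCollector is a hypothetical name, and the sketch ignores counts and maxnvalues.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical, simplified collector. It accumulates w*(val - first_) and
// w*(val - first_)^2 instead of raw sums. It does not maintain `counts`.
struct MiniCollector {
    double nmissing_ = 0.0, nnonmissing_ = 0.0;
    double sum_ = 0.0, sumsquare_ = 0.0;
    double first_ = std::numeric_limits<double>::quiet_NaN();
    double last_ = std::numeric_limits<double>::quiet_NaN();
    double min_ = std::numeric_limits<double>::infinity();
    double max_ = -std::numeric_limits<double>::infinity();

    void update(double val, double weight = 1.0)
    {
        if (std::isnan(val)) {        // NaN as the missing-value marker (assumption)
            nmissing_ += weight;
            return;
        }
        if (std::isnan(first_))       // first non-missing observation fixes the shift
            first_ = val;
        last_ = val;
        min_ = std::min(min_, val);
        max_ = std::max(max_, val);
        double d = val - first_;
        nnonmissing_ += weight;
        sum_ += weight * d;
        sumsquare_ += weight * d * d;
    }

    double n() const { return nmissing_ + nnonmissing_; }
    double sum() const { return sum_ + nnonmissing_ * first_; }
    double mean() const
    {
        assert(nnonmissing_ > 0.0);
        return sum() / nnonmissing_;
    }
};
```

Shifting by the first observation keeps the accumulated terms small when the data sit on a large offset (for example values near 1e9). This reduces cancellation error when the variance is later computed from sumsquare_ and sum_.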
real variance () const
Definition at line 139 of file StatsCollector.h. References nnonmissing_, PLearn::square(), sum_, and sumsquare_. Referenced by stddev(), and stderror().
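variance() is documented as depending only on nnonmissing_, sum_ and sumsquare_ (plus square()). This fits the fact that shifting every value by first_ leaves the variance unchanged. The sketch below computes the variance, standard deviation and standard error from those three numbers. The function names are hypothetical. The unbiased N - 1 denominator, which treats weights as frequency counts, is an assumption and is not stated in the reference above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical free functions working on the shifted accumulators:
// n = nnonmissing_, s = sum_ (shifted), q = sumsquare_ (shifted).
// The shift cancels out of the centered sum of squares q - s^2 / n.
double varianceFromShifted(double n, double s, double q)
{
    assert(n > 1.0);
    double centered = q - s * s / n;
    return centered / (n - 1.0); // unbiased denominator (assumption)
}

double stddevFromShifted(double n, double s, double q)
{
    return std::sqrt(varianceFromShifted(n, s, q));
}

double stderrorFromShifted(double n, double s, double q)
{
    return std::sqrt(varianceFromShifted(n, s, q) / n);
}
```

For the values 2, 4, 6 with first_ = 2, the shifted values are 0, 2, 4, so n = 3, s = 6 and q = 20. The centered sum is 20 - 36/3 = 8, which gives a variance of 4, the same as computing it from the raw values.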
map< real, StatsCollectorCounts > counts
will contain up to maxnvalues values and associated Counts as well as a last element which maps FLT_MAX, so that we don't miss anything (empty if maxnvalues=0)
Definition at line 113 of file StatsCollector.h. Referenced by build_(), cdf(), forget(), getAllValuesMapping(), getBinMapping(), getCounts(), oldread(), oldwrite(), print(), sortIds(), and update().
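The description above says counts holds at most maxnvalues tracked values plus an FLT_MAX sentinel so that no value is lost. It does not spell out where values go once the map is full. One plausible reading, stated here only as an assumption, is that such values are credited to the nbelow field of the next larger key. (StatsCollectorCounts does have n and nbelow members, per the oldread() references.) Counts and addToCounts are hypothetical names.

```cpp
#include <cassert>
#include <cfloat>
#include <map>

// Hypothetical counts entry: n = weight seen exactly at this key,
// nbelow = weight of untracked values between the previous key and this one.
struct Counts {
    double n = 0.0;
    double nbelow = 0.0;
};

// Hypothetical update of a capped value -> Counts map. The FLT_MAX sentinel is
// always present, so lower_bound() finds a bucket for any val <= FLT_MAX.
void addToCounts(std::map<double, Counts>& counts, int maxnvalues,
                 double val, double weight = 1.0)
{
    if (maxnvalues == 0)
        return;                         // only global statistics are kept
    assert(val <= FLT_MAX);
    counts.emplace(FLT_MAX, Counts{});  // no-op if the sentinel already exists
    auto it = counts.lower_bound(val);  // first key >= val
    if (it->first == val)
        it->second.n += weight;         // value already tracked
    else if (static_cast<int>(counts.size()) - 1 < maxnvalues)
        counts[val].n += weight;        // room left: track this value exactly
    else
        it->second.nbelow += weight;    // full: credit the next larger key
}
```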
real first_
first encountered nonmissing observation
Definition at line 107 of file StatsCollector.h. Referenced by first_obs(), forget(), sum(), sumsquare(), and update().
real last_
last encountered nonmissing observation
Definition at line 108 of file StatsCollector.h. Referenced by forget(), last_obs(), and update().
real max_
the max
Definition at line 106 of file StatsCollector.h. Referenced by cdf(), forget(), getBinMapping(), max(), oldread(), oldwrite(), and update().
int maxnvalues
maximum number of different values to keep track of in counts (if 0, we will only keep track of global statistics)
Definition at line 95 of file StatsCollector.h. Referenced by build_(), getMaxNValues(), oldread(), oldwrite(), PLearn::ConditionalDensityNet::train(), and update().
real min_
the min
Definition at line 105 of file StatsCollector.h. Referenced by cdf(), forget(), getBinMapping(), min(), oldread(), oldwrite(), and update().
double nmissing_
(weighted) number of missing values
Definition at line 100 of file StatsCollector.h. Referenced by forget(), getAllValuesMapping(), getBinMapping(), n(), nmissing(), oldread(), oldwrite(), and update().
double nnonmissing_
(weighted) number of non-missing values
Definition at line 101 of file StatsCollector.h. Referenced by cdf(), forget(), getAllValuesMapping(), getBinMapping(), mean(), n(), nnonmissing(), oldread(), oldwrite(), sum(), sumsquare(), update(), and variance().
double sum_
sum of all (values-first_)
Definition at line 102 of file StatsCollector.h. Referenced by forget(), oldread(), oldwrite(), sum(), update(), and variance().
double sumsquare_
sum of square of all (values-first_)
Definition at line 103 of file StatsCollector.h. Referenced by forget(), oldread(), oldwrite(), sumsquare(), update(), and variance().
double sumweights_
sum of the weights
Definition at line 104 of file StatsCollector.h.