
PLearn::CompactVMatrix Class Reference

#include <CompactVMatrix.h>


Public Member Functions

int nbits ()
int nsymbols ()
int nfixedpoint ()
void setOneHotMode (bool on=true)
 CompactVMatrix ()
 default constructor (for automatic deserialization)

 CompactVMatrix (int the_length, int n_variables, int n_binary, int n_nonbinary_discrete, int n_fixed_point, TVec< int > &n_symbolvalues, Vec &fixed_point_min, Vec &fixed_point_max, bool one_hot_encoding=true)
 CompactVMatrix (VMat m, int keep_last_variables_last=1, bool onehot_encoding=true)
 CompactVMatrix (const string &filename, int nlast=1)
 construct from saved CompactVMatrix

 CompactVMatrix (CompactVMatrix *cvm, VMat m, bool rescale=false, bool check=true)
void append (CompactVMatrix *vm)
 append vm to this VMatrix (the rows of vm are concatenated to the current rows of this VMatrix)

void perturb (int i, Vec row, real noise_level, int n_last)
TVec< int > & permutation_vector ()
virtual real squareDifference (int i, int j)
 return the square difference between row i and row j, excluding n_last columns

virtual real dotProduct (int i, int j) const
 return the dot product of row i with row j, excluding n_last columns

virtual real dot (int i1, int i2, int inputsize) const
virtual real dot (int i, const Vec &v) const
 returns the result of the dot product between row i and the given vec (only v.length() first elements of row i are considered).

virtual void encodeAndPutRow (int i, Vec v)
 encoding: v is not one-hot, and the variables in v are in the "original" order (i.e. at position i in v we find variable variables_permutation[i] in getRow's result)

virtual void putRow (int i, Vec v)
 v is possibly one-hot-encoded (according to one_hot_encoding flag) and the variables are in the same order as for getRow.

virtual void putSubRow (int i, int j, Vec v)
virtual void save (const string &filename)
 calls write

 PLEARN_DECLARE_OBJECT (CompactVMatrix)
 reverse of write, can be used by calling load(string)

void makeDeepCopyFromShallowCopy (map< const void *, void * > &copies)
 Transforms a shallow copy into a deep copy.

virtual void build ()
 nothing to do...


Public Attributes

TVec< int > n_symbol_values
 for each 1-byte symbol, the number of possible values

int n_last
 used by dotProduct and squareDifference to specify # of last columns to ignore


Protected Member Functions

virtual void getNewRow (int i, const Vec &v) const
 decoding (v may be one-hot depending on one_hot_encoding flag)


Static Protected Member Functions

void set_n_bits_in_byte ()

Protected Attributes

Storage< unsigned char > data
 Each row of the matrix holds in order: bits, 1-byte symbols, fixed point numbers.

int row_n_bytes
 # of bytes per row

int n_bits
 number of binary symbols per row

int n_symbols
 number of 1-byte symbols per row

int n_fixedpoint
 number of fixed point numbers per row

int n_variables
 = n_bits + n_symbols + n_fixedpoint

bool one_hot_encoding
 the 1-byte symbols are converted to one-hot encoding by get

Vec fixedpoint_min
Vec fixedpoint_max
 the ranges of each number for fixed point encoding

Vec delta
 (fixedpoint_max-fixedpoint_min)/2^16

TVec< int > variables_permutation
 this variable is used only when constructed from VMat

int normal_width
 the value of width_ when one_hot_encoding=true

int symbols_offset
 where in each row the symbols start

int fixedpoint_offset
 where in each row the fixed point numbers start

Vec row_norms
 to cache the norms of the rows for squareDifference method


Static Protected Attributes

unsigned char n_bits_in_byte [256]
 static table with one entry per possible byte value, filled by set_n_bits_in_byte()


Private Types

typedef RowBufferedVMatrix inherited

Detailed Description

Like MemoryVMatrix this class holds the data in memory, but it tries to hold it compactly by using single bits for binary variables, single bytes for discrete variables whose number of possible values is less than 256, and unsigned shorts for the others, using a fixed point representation.

Definition at line 63 of file CompactVMatrix.h.
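
As a rough illustration of the layout described above, the following sketch computes how many bytes one row would occupy if bits are packed eight per byte, each discrete symbol takes one byte, and each fixed-point number takes two bytes (an unsigned short). The struct and function names are hypothetical and not part of PLearn; the way CompactVMatrix.cc actually computes row_n_bytes, symbols_offset and fixedpoint_offset may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one compact row; not a PLearn type.
struct CompactRowLayout {
    std::size_t symbols_offset;     // byte index where the 1-byte symbols start
    std::size_t fixedpoint_offset;  // byte index where the 16-bit numbers start
    std::size_t row_n_bytes;        // total number of bytes per row
};

// Assumes bits are packed 8 per byte, symbols use 1 byte, fixed point uses 2.
inline CompactRowLayout computeLayout(int n_bits, int n_symbols, int n_fixedpoint) {
    assert(n_bits >= 0 && n_symbols >= 0 && n_fixedpoint >= 0);
    CompactRowLayout layout;
    layout.symbols_offset = (static_cast<std::size_t>(n_bits) + 7) / 8;  // round up
    layout.fixedpoint_offset =
        layout.symbols_offset + static_cast<std::size_t>(n_symbols);
    layout.row_n_bytes =
        layout.fixedpoint_offset + 2 * static_cast<std::size_t>(n_fixedpoint);
    return layout;
}
```

Under these assumptions, a row with 10 binary variables, 3 symbols and 2 fixed-point numbers takes 9 bytes, where an uncompressed row would hold 15 real values.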


Member Typedef Documentation

typedef RowBufferedVMatrix PLearn::CompactVMatrix::inherited [private]
 

Reimplemented from PLearn::RowBufferedVMatrix.

Definition at line 65 of file CompactVMatrix.h.

Referenced by CompactVMatrix().


Constructor & Destructor Documentation

PLearn::CompactVMatrix::CompactVMatrix ()

default constructor (for automatic deserialization)

Definition at line 79 of file CompactVMatrix.cc.

PLearn::CompactVMatrix::CompactVMatrix (int the_length, int n_variables, int n_binary, int n_nonbinary_discrete, int n_fixed_point, TVec< int > &n_symbolvalues, Vec &fixed_point_min, Vec &fixed_point_max, bool one_hot_encoding=true)

Definition at line 85 of file CompactVMatrix.cc.

References data, delta, fixedpoint_max, fixedpoint_min, fixedpoint_offset, inherited, n_bits, n_symbol_values, n_symbols, n_variables, normal_width, one_hot_encoding, PLearn::Storage< unsigned char >::resize(), row_n_bytes, set_n_bits_in_byte(), setOneHotMode(), symbols_offset, variables_permutation, and PLearn::Vec.

PLearn::CompactVMatrix::CompactVMatrix (VMat m, int keep_last_variables_last=1, bool onehot_encoding=true)

Convert a VMat into a CompactVMatrix. This uses the stats computed in the fieldstats of the VMatrix (they are computed if not already available) to figure out which variables are binary, which are discrete (and how many symbols they have), and the ranges of the numeric variables. THE VMAT DISCRETE VARIABLES MUST NOT BE ALREADY ONE-HOT ENCODED. The variables are permuted according to the permutation vector, which can be retrieved with the permutation_vector() method. By default the last column of the VMat stays last, and is therefore coded as fixedpoint (so the permutation information may not be necessary if the last column represents a target and all the previous ones are inputs). keep_last_variables_last is the number of "last columns" to keep in place.

Definition at line 111 of file CompactVMatrix.cc.

References PLearn::VMFieldStat::counts, data, delta, encodeAndPutRow(), PLearn::endl(), fixedpoint_max, fixedpoint_min, fixedpoint_offset, PLearn::isMapKeysAreInt(), PLearn::VMFieldStat::max(), PLearn::VMFieldStat::min(), n_bits, n_fixedpoint, n_symbol_values, n_symbols, n_variables, normal_width, one_hot_encoding, PLERROR, PLearn::TVec< VMFieldStat >::resize(), PLearn::TVec< VMField >::resize(), PLearn::Storage< unsigned char >::resize(), PLearn::TVec< int >::resize(), PLearn::TVec< T >::resize(), row_n_bytes, set_n_bits_in_byte(), setOneHotMode(), symbols_offset, variables_permutation, and PLearn::VMat::width().
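
The column reordering described for this constructor can be sketched as follows. This is only an illustration under stated assumptions: the ColumnKind enum and buildPermutation function are hypothetical names, the per-column classification is taken as given rather than derived from fieldstats, and the convention perm[new_column] = old_column follows the note on variables_permutation elsewhere on this page.

```cpp
#include <cassert>
#include <vector>

enum class ColumnKind { Binary, Discrete, Continuous };  // hypothetical names

// Returns perm with perm[new_column] = old_column, ordering the columns as
// (bits, 1-byte symbols, fixed point). The last keep_last columns stay in
// place and are treated as fixed point regardless of their kind.
inline std::vector<int> buildPermutation(const std::vector<ColumnKind>& kinds,
                                         int keep_last) {
    const int n = static_cast<int>(kinds.size());
    assert(keep_last >= 0 && keep_last <= n);
    const int n_free = n - keep_last;  // columns that may be moved
    std::vector<int> perm;
    perm.reserve(kinds.size());
    for (ColumnKind wanted :
         {ColumnKind::Binary, ColumnKind::Discrete, ColumnKind::Continuous})
        for (int old_col = 0; old_col < n_free; ++old_col)
            if (kinds[old_col] == wanted) perm.push_back(old_col);
    for (int old_col = n_free; old_col < n; ++old_col)
        perm.push_back(old_col);  // kept last
    return perm;
}
```

With five columns of kinds (continuous, binary, discrete, binary, discrete) and keep_last = 1, this sketch yields the permutation 1, 3, 2, 0, 4: the final discrete column stays in place even though it could have been stored as a symbol.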

PLearn::CompactVMatrix::CompactVMatrix (const string &filename, int nlast=1)

construct from saved CompactVMatrix

Definition at line 211 of file CompactVMatrix.cc.

References PLearn::load(), n_last, and set_n_bits_in_byte().

PLearn::CompactVMatrix::CompactVMatrix (CompactVMatrix *cvm, VMat m, bool rescale=false, bool check=true)

Create a CompactVMatrix with the same structure as cvm but containing the data in m. Both must have the same width. If rescale is true, the min/max values for fixed-point encoding are recomputed. If check is true, the floating-point data are verified and an error is raised if they are not in the expected ranges (of cvm).

Definition at line 218 of file CompactVMatrix.cc.

References PLearn::TVec< T >::copy(), data, delta, fixedpoint_max, fixedpoint_min, fixedpoint_offset, n_bits, n_fixedpoint, n_last, n_symbol_values, n_symbols, n_variables, normal_width, one_hot_encoding, PLERROR, putRow(), PLearn::Storage< unsigned char >::resize(), row_n_bytes, setOneHotMode(), symbols_offset, variables_permutation, PLearn::VMat::width(), and PLearn::VMatrix::width().


Member Function Documentation

void PLearn::CompactVMatrix::append (CompactVMatrix *vm)

append vm to this VMatrix (the rows of vm are concatenated to the current rows of this VMatrix)

Definition at line 734 of file CompactVMatrix.cc.

References PLearn::TVec< T >::copy(), PLearn::Storage< unsigned char >::data, data, delta, PLearn::endl(), fixedpoint_max, fixedpoint_min, fixedpoint_offset, PLearn::RowBufferedVMatrix::getRow(), PLearn::VMatrix::length(), n_bits, n_fixedpoint, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, putRow(), PLearn::Storage< unsigned char >::resize(), row_n_bytes, setOneHotMode(), PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, PLearn::VMatrix::width(), and PLearn::write().

virtual void PLearn::CompactVMatrix::build () [inline, virtual]

nothing to do...

Reimplemented from PLearn::VMatrix.

Definition at line 194 of file CompactVMatrix.h.

real PLearn::CompactVMatrix::dot (int i, const Vec &v) const [virtual]

returns the result of the dot product between row i and the given vec (only v.length() first elements of row i are considered).

Reimplemented from PLearn::RowBufferedVMatrix.

Definition at line 414 of file CompactVMatrix.cc.

References PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, PLearn::dot(), PLearn::dot_product(), fixedpoint_min, fixedpoint_offset, PLearn::TVec< T >::length(), n_bits, n_fixedpoint, n_last, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, row_n_bytes, SANITYCHECK_CompactVMatrix_PRECISION, PLearn::TVec< T >::subVec(), symbols_offset, PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, and PLearn::VMatrix::width().

real PLearn::CompactVMatrix::dot (int i1, int i2, int inputsize) const [virtual]

Returns the dot product between row i1 and row i2 (considering only the first inputsize elements). The default version in VMatrix is somewhat inefficient, as it repeatedly calls get(i,j). The default version in RowBufferedVMatrix is a little better, as it buffers the 2 Vecs between calls in case one of them is needed again. But the real strength of this method lies in specialised and efficient versions in subclasses. This method is typically used by SmartKernels so that they can compute kernel values between input samples efficiently.

Reimplemented from PLearn::RowBufferedVMatrix.

Definition at line 340 of file CompactVMatrix.cc.

References PLearn::Storage< unsigned char >::data, data, delta, PLearn::dot_product(), fixedpoint_min, fixedpoint_offset, k, n_bits, n_bits_in_byte, n_fixedpoint, n_last, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, row_n_bytes, symbols_offset, PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, and PLearn::VMatrix::width().
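
This method references n_bits_in_byte, which suggests that the binary part of the dot product is computed with a per-byte bit-count table. The sketch below shows that general technique on its own; it is an assumption about the approach, not a copy of the PLearn code, and bitCountTable and packedBitsDot are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Table giving the number of set bits of every byte value (a stand-in for
// what n_bits_in_byte presumably holds).
inline const std::array<unsigned char, 256>& bitCountTable() {
    static const std::array<unsigned char, 256> table = [] {
        std::array<unsigned char, 256> t{};
        for (int value = 0; value < 256; ++value) {
            int count = 0;
            for (int b = value; b != 0; b >>= 1) count += b & 1;
            t[value] = static_cast<unsigned char>(count);
        }
        return t;
    }();
    return table;
}

// Dot product of two bit-packed binary vectors of equal byte length: each
// AND-ed byte contributes the number of positions where both bits are 1.
inline int packedBitsDot(const std::vector<unsigned char>& a,
                         const std::vector<unsigned char>& b) {
    assert(a.size() == b.size());
    const std::array<unsigned char, 256>& table = bitCountTable();
    int sum = 0;
    for (std::size_t k = 0; k < a.size(); ++k)
        sum += table[a[k] & b[k]];
    return sum;
}
```

The table lookup processes eight binary variables per iteration, which is where a specialised dot() can gain over a version that decodes both rows into Vecs first.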

real PLearn::CompactVMatrix::dotProduct (int i, int j) const [virtual]

return the dot product of row i with row j, excluding n_last columns

Definition at line 483 of file CompactVMatrix.cc.

References PLearn::dot(), n_last, and PLearn::VMatrix::width().

Referenced by squareDifference().

void PLearn::CompactVMatrix::encodeAndPutRow (int i, Vec v) [virtual]

encoding: v is not one-hot, and the variables in v are in the "original" order (i.e. at position i in v we find variable variables_permutation[i] in getRow's result)

Definition at line 497 of file CompactVMatrix.cc.

References PLearn::TVec< int >::data(), PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, fixedpoint_min, fixedpoint_offset, n_bits, n_fixedpoint, n_symbol_values, n_symbols, PLERROR, row_n_bytes, symbols_offset, val, and variables_permutation.

Referenced by CompactVMatrix().

void PLearn::CompactVMatrix::getNewRow (int i, const Vec &v) const [protected, virtual]

decoding (v may be one-hot depending on one_hot_encoding flag)

Implements PLearn::RowBufferedVMatrix.

Definition at line 288 of file CompactVMatrix.cc.

References PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, fixedpoint_min, fixedpoint_offset, PLearn::TVec< T >::length(), n_bits, n_fixedpoint, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, row_n_bytes, symbols_offset, PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, and PLearn::VMatrix::width().
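
To make the decoding step concrete, here is a simplified, self-contained sketch of how one packed row could be expanded into real values, with the symbols either left as indices or expanded to one-hot groups. The RowFormat struct and decodeRow function are hypothetical; the bit order within a byte and the byte order of the 16-bit codes are assumptions and may not match the PLearn implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified row description; field names mirror the members
// documented on this page.
struct RowFormat {
    int n_bits;
    std::vector<int> n_symbol_values;    // one entry per 1-byte symbol
    std::vector<double> fixedpoint_min;  // one entry per fixed-point number
    std::vector<double> delta;           // quantization step per fixed-point number
};

inline std::vector<double> decodeRow(const RowFormat& f,
                                     const std::vector<unsigned char>& row,
                                     bool one_hot) {
    const std::size_t bit_bytes = (static_cast<std::size_t>(f.n_bits) + 7) / 8;
    assert(f.fixedpoint_min.size() == f.delta.size());
    assert(row.size() == bit_bytes + f.n_symbol_values.size() + 2 * f.delta.size());
    std::vector<double> v;
    // Bits: assumed packed least-significant bit first.
    for (int j = 0; j < f.n_bits; ++j)
        v.push_back((row[j / 8] >> (j % 8)) & 1);
    // Symbols: either the raw symbol index or a one-hot group per symbol.
    std::size_t pos = bit_bytes;
    for (int n_values : f.n_symbol_values) {
        const int symbol = row[pos++];
        assert(symbol < n_values);
        if (one_hot)
            for (int s = 0; s < n_values; ++s) v.push_back(s == symbol ? 1.0 : 0.0);
        else
            v.push_back(symbol);
    }
    // Fixed point: 16-bit codes in native byte order (an assumption).
    for (std::size_t k = 0; k < f.delta.size(); ++k, pos += 2) {
        std::uint16_t code;
        std::memcpy(&code, &row[pos], sizeof code);
        v.push_back(f.fixedpoint_min[k] + code * f.delta[k]);
    }
    return v;
}
```

In this sketch the decoded width is n_bits + n_symbols + n_fixedpoint without one-hot expansion, and n_bits + sum(n_symbol_values) + n_fixedpoint with it.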

void PLearn::CompactVMatrix::makeDeepCopyFromShallowCopy (map< const void *, void * > &copies) [virtual]

Transforms a shallow copy into a deep copy.

Reimplemented from PLearn::RowBufferedVMatrix.

Definition at line 834 of file CompactVMatrix.cc.

References data, PLearn::deepCopyField(), fixedpoint_max, fixedpoint_min, n_symbol_values, and variables_permutation.

int PLearn::CompactVMatrix::nbits () [inline]

Definition at line 80 of file CompactVMatrix.h.

References n_bits.

int PLearn::CompactVMatrix::nfixedpoint () [inline]

Definition at line 82 of file CompactVMatrix.h.

References n_fixedpoint.

int PLearn::CompactVMatrix::nsymbols () [inline]

Definition at line 81 of file CompactVMatrix.h.

References n_symbols.

TVec<int>& PLearn::CompactVMatrix::permutation_vector () [inline]

this vector is filled only when the CompactVMatrix was constructed from a VMat, and it provides the permutation of the original columns to order them into (bits, bytes, fixedpoint)

Definition at line 156 of file CompactVMatrix.h.

References variables_permutation.

void PLearn::CompactVMatrix::perturb (int i, Vec row, real noise_level, int n_last)

Create in the elements of row (except the n_last ones) a perturbed version of the i-th row of the database. This random perturbation is based on the unconditional statistics, which should be present in the fieldstats; the noise level can be modulated with the noise_level argument (a value of 1 will perturb by as much as the noise seen in the unconditional statistics). Continuous variables are resampled around the current value with sigma = noise_level * unconditional_sigma. Discrete variables are resampled from a mixture distribution: (1-noise_level)*(all probability mass on the current value) + noise_level*(unconditional distribution).

Definition at line 606 of file CompactVMatrix.cc.

References PLearn::binomial_sample(), PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, fixedpoint_max, fixedpoint_min, fixedpoint_offset, PLearn::TVec< T >::length(), PLearn::multinomial_sample(), n_bits, n_fixedpoint, n_symbol_values, n_symbols, n_variables, PLearn::normal_sample(), one_hot_encoding, PLERROR, PLearn::VMFieldStat::prob(), PLearn::TVec< T >::resize(), row_n_bytes, PLearn::TVec< VMFieldStat >::size(), symbols_offset, PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, val, PLearn::var(), and PLearn::VMatrix::width().
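
The two resampling rules described above can be sketched with the standard <random> facilities. This is a minimal illustration, not the PLearn code (which uses its own binomial_sample, multinomial_sample and normal_sample routines); perturbSymbol and perturbContinuous are hypothetical names, and the unconditional statistics are passed in directly instead of being read from fieldstats.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Discrete case: with probability (1 - noise_level) keep the current symbol,
// otherwise draw a symbol from the unconditional distribution.
template <class Rng>
int perturbSymbol(int current, const std::vector<double>& unconditional_prob,
                  double noise_level, Rng& rng) {
    assert(noise_level >= 0.0 && noise_level <= 1.0);
    assert(current >= 0 && current < static_cast<int>(unconditional_prob.size()));
    std::bernoulli_distribution resample(noise_level);
    if (!resample(rng)) return current;
    std::discrete_distribution<int> unconditional(unconditional_prob.begin(),
                                                  unconditional_prob.end());
    return unconditional(rng);
}

// Continuous case: resample around the current value with
// sigma = noise_level * unconditional_sigma.
template <class Rng>
double perturbContinuous(double current, double unconditional_sigma,
                         double noise_level, Rng& rng) {
    assert(noise_level >= 0.0 && unconditional_sigma >= 0.0);
    const double sigma = noise_level * unconditional_sigma;
    if (sigma == 0.0) return current;  // normal_distribution requires sigma > 0
    std::normal_distribution<double> noise(current, sigma);
    return noise(rng);
}
```

Sampling the mixture in two stages (first choose the component, then draw from it) gives the same distribution as the formula in the description, and it leaves the row unchanged when noise_level is 0.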

PLearn::CompactVMatrix::PLEARN_DECLARE_OBJECT (CompactVMatrix)

reverse of write, can be used by calling load(string)

void PLearn::CompactVMatrix::putRow (int i, Vec v) [virtual]

v is possibly one-hot-encoded (according to one_hot_encoding flag) and the variables are in the same order as for getRow.

Reimplemented from PLearn::VMatrix.

Definition at line 531 of file CompactVMatrix.cc.

References putSubRow().

Referenced by append(), and CompactVMatrix().

void PLearn::CompactVMatrix::putSubRow (int i, int j, Vec v) [virtual]

It is suggested that this method be implemented in subclasses of writable matrices to speed up accesses (default version repeatedly calls put(i,j,value) which may have a significant overhead)

Reimplemented from PLearn::VMatrix.

Definition at line 536 of file CompactVMatrix.cc.

References PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, fixedpoint_min, fixedpoint_offset, k, n_bits, n_fixedpoint, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, row_n_bytes, symbols_offset, and val.

Referenced by putRow().

virtual void PLearn::CompactVMatrix::save (const string &filename) [inline, virtual]

calls write

Definition at line 185 of file CompactVMatrix.h.

void PLearn::CompactVMatrix::set_n_bits_in_byte () [static, protected]

Definition at line 59 of file CompactVMatrix.cc.

References n_bits_in_byte.

Referenced by CompactVMatrix().

void PLearn::CompactVMatrix::setOneHotMode (bool on=true)

Definition at line 279 of file CompactVMatrix.cc.

References n_variables, normal_width, and one_hot_encoding.

Referenced by append(), and CompactVMatrix().

real PLearn::CompactVMatrix::squareDifference (int i, int j) [virtual]

return the square difference between row i and row j, excluding n_last columns

Definition at line 486 of file CompactVMatrix.cc.

References dotProduct(), PLearn::TVec< T >::length(), row_norms, and PLearn::Vec.
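
Since this method references dotProduct() and row_norms, it presumably relies on the identity ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j> with cached row norms. The sketch below illustrates that idea on plain vectors. SquaredDistanceCache is a hypothetical class, and the exclusion of the last n_last columns is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical cache of squared row norms, filled lazily; a negative entry
// marks a norm that has not been computed yet.
class SquaredDistanceCache {
public:
    explicit SquaredDistanceCache(std::vector<std::vector<double>> rows)
        : rows_(std::move(rows)), squared_norms_(rows_.size(), -1.0) {}

    double dotProduct(std::size_t i, std::size_t j) const {
        assert(i < rows_.size() && j < rows_.size());
        assert(rows_[i].size() == rows_[j].size());
        double sum = 0.0;
        for (std::size_t k = 0; k < rows_[i].size(); ++k)
            sum += rows_[i][k] * rows_[j][k];
        return sum;
    }

    // ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>
    double squareDifference(std::size_t i, std::size_t j) {
        return squaredNorm(i) + squaredNorm(j) - 2.0 * dotProduct(i, j);
    }

private:
    double squaredNorm(std::size_t i) {
        if (squared_norms_[i] < 0.0) squared_norms_[i] = dotProduct(i, i);
        return squared_norms_[i];
    }

    std::vector<std::vector<double>> rows_;
    std::vector<double> squared_norms_;
};
```

Once the norms are cached, each pairwise distance costs a single dot product, which matters when many distances involving the same rows are requested.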


Member Data Documentation

Storage<unsigned char> PLearn::CompactVMatrix::data [protected]
 

Each row of the matrix holds in order: bits, 1-byte symbols, fixed point numbers.

Definition at line 71 of file CompactVMatrix.h.

Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), makeDeepCopyFromShallowCopy(), perturb(), and putSubRow().

Vec PLearn::CompactVMatrix::delta [protected]
 

(fixedpoint_max-fixedpoint_min)/2^16

Definition at line 85 of file CompactVMatrix.h.

Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), perturb(), and putSubRow().
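
The role of delta in the fixed-point representation can be illustrated with a small quantization sketch. The FixedPointRange struct and the two functions are hypothetical, and the rounding (floor) and clamping behaviour are assumptions; only the step size (fixedpoint_max-fixedpoint_min)/2^16 comes from the description above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper type: the quantization range of one fixed-point column.
struct FixedPointRange {
    double min;
    double max;
    double delta() const { return (max - min) / 65536.0; }  // (max - min) / 2^16
};

// Quantize x to 16 bits; values outside [min, max] are clamped (an assumption).
inline std::uint16_t encodeFixedPoint(const FixedPointRange& r, double x) {
    assert(r.max > r.min);
    double q = std::floor((x - r.min) / r.delta());
    if (q < 0.0) q = 0.0;
    if (q > 65535.0) q = 65535.0;
    return static_cast<std::uint16_t>(q);
}

// Map a 16-bit code back to a real value in [min, max).
inline double decodeFixedPoint(const FixedPointRange& r, std::uint16_t code) {
    return r.min + code * r.delta();
}
```

With this scheme the round-trip error for a value inside the range is at most one delta, so a column spanning [-1, 1] is stored with a resolution of about 3e-5.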

Vec PLearn::CompactVMatrix::fixedpoint_max [protected]
 

the ranges of each number for fixed point encoding

Definition at line 84 of file CompactVMatrix.h.

Referenced by append(), CompactVMatrix(), makeDeepCopyFromShallowCopy(), and perturb().

Vec PLearn::CompactVMatrix::fixedpoint_min [protected]
 

Definition at line 84 of file CompactVMatrix.h.

Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), makeDeepCopyFromShallowCopy(), perturb(), and putSubRow().

int PLearn::CompactVMatrix::fixedpoint_offset [protected]
 

where in each row the fixed point numbers start

Definition at line 199 of file CompactVMatrix.h.

Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), perturb(), and putSubRow().

int PLearn::CompactVMatrix::n_bits [protected]
 

number of binary symbols per row

Definition at line 73 of file CompactVMatrix.h.

Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), nbits(), perturb(), and putSubRow().

unsigned char PLearn::CompactVMatrix::n_bits_in_byte [static, protected]
 

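Static table with one entry per possible byte value, filled by set_n_bits_in_byte().

The variables_permutation vector is filled only when the CompactVMatrix is constructed from a VMat; it provides the permutation of the original columns that orders them into (bits, bytes, fixedpoint): variables_permutation[new_column]=old_column (not in one-hot code).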

Definition at line 57 of file CompactVMatrix.cc.

Referenced by dot(), and set_n_bits_in_byte().

int PLearn::CompactVMatrix::n_fixedpoint [protected]
 

number of fixed point numbers per row

Definition at line 75 of file CompactVMatrix.h.

Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), nfixedpoint(), perturb(), and putSubRow().

int PLearn::CompactVMatrix::n_last
 

used by dotProduct and squareDifference to specify # of last columns to ignore

Definition at line 95 of file CompactVMatrix.h.

Referenced by CompactVMatrix(), dot(), and dotProduct().

TVec<int> PLearn::CompactVMatrix::n_symbol_values
 

for each 1-byte symbol, the number of possible values

Definition at line 79 of file CompactVMatrix.h.

Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), makeDeepCopyFromShallowCopy(), perturb(), and putSubRow().

int PLearn::CompactVMatrix::n_symbols [protected]
 

number of 1-byte symbols per row

Definition at line 74 of file CompactVMatrix.h.

Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), nsymbols(), perturb(), and putSubRow().

int PLearn::CompactVMatrix::n_variables [protected]
 

= n_bits + n_symbols + n_fixedpoint

Definition at line 76 of file CompactVMatrix.h.

Referenced by CompactVMatrix(), perturb(), and setOneHotMode().

int PLearn::CompactVMatrix::normal_width [protected]
 

the value of width_ when one_hot_encoding=true

Definition at line 97 of file CompactVMatrix.h.

Referenced by CompactVMatrix(), and setOneHotMode().

bool PLearn::CompactVMatrix::one_hot_encoding [protected]
 

the 1-byte symbols are converted to one-hot encoding by get

Definition at line 77 of file CompactVMatrix.h.

Referenced by append(), CompactVMatrix(), dot(), getNewRow(), perturb(), putSubRow(), and setOneHotMode().

int PLearn::CompactVMatrix::row_n_bytes [protected]
 

# of bytes per row

Definition at line 72 of file CompactVMatrix.h.

Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), perturb(), and putSubRow().

Vec PLearn::CompactVMatrix::row_norms [protected]
 

to cache the norms of the rows for squareDifference method

Definition at line 200 of file CompactVMatrix.h.

Referenced by squareDifference().

int PLearn::CompactVMatrix::symbols_offset [protected]
 

where in each row the symbols start

Definition at line 198 of file CompactVMatrix.h.

Referenced by CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), perturb(), and putSubRow().

TVec<int> PLearn::CompactVMatrix::variables_permutation [protected]
 

this variable is used only when constructed from VMat

Definition at line 86 of file CompactVMatrix.h.

Referenced by CompactVMatrix(), encodeAndPutRow(), makeDeepCopyFromShallowCopy(), and permutation_vector().


The documentation for this class was generated from the following files:
CompactVMatrix.h
CompactVMatrix.cc
Generated on Tue Aug 17 16:26:04 2004 for PLearn by doxygen 1.3.7