#include <CompactVMatrix.h>
Inherits PLearn::RowBufferedVMatrix.
Public Member Functions | |
int | nbits () |
int | nsymbols () |
int | nfixedpoint () |
void | setOneHotMode (bool on=true) |
CompactVMatrix () | |
default constructor (for automatic deserialization) | |
CompactVMatrix (int the_length, int n_variables, int n_binary, int n_nonbinary_discrete, int n_fixed_point, TVec< int > &n_symbolvalues, Vec &fixed_point_min, Vec &fixed_point_max, bool one_hot_encoding=true) | |
CompactVMatrix (VMat m, int keep_last_variables_last=1, bool onehot_encoding=true) | |
CompactVMatrix (const string &filename, int nlast=1) | |
construct from saved CompactVMatrix | |
CompactVMatrix (CompactVMatrix *cvm, VMat m, bool rescale=false, bool check=true) | |
void | append (CompactVMatrix *vm) |
append vm to this VMatrix (the rows of vm are concatenated to the current rows of this VMatrix) | |
void | perturb (int i, Vec row, real noise_level, int n_last) |
TVec< int > & | permutation_vector () |
virtual real | squareDifference (int i, int j) |
return the square difference between row i and row j, excluding n_last columns | |
virtual real | dotProduct (int i, int j) const |
return the dot product of row i with row j, excluding n_last columns | |
virtual real | dot (int i1, int i2, int inputsize) const |
virtual real | dot (int i, const Vec &v) const |
returns the result of the dot product between row i and the given vec (only v.length() first elements of row i are considered). | |
virtual void | encodeAndPutRow (int i, Vec v) |
encoding (v is not one-hot, and the variables in v are in the "original" order, i.e. at position i in v we find variable variables_permutation[i] in getRow's result) | |
virtual void | putRow (int i, Vec v) |
v is possibly one-hot-encoded (according to one_hot_encoding flag) and the variables are in the same order as for getRow. | |
virtual void | putSubRow (int i, int j, Vec v) |
virtual void | save (const string &filename) |
calls write | |
PLEARN_DECLARE_OBJECT (CompactVMatrix) | |
reverse of write, can be used by calling load(string) | |
void | makeDeepCopyFromShallowCopy (map< const void *, void * > &copies) |
Transforms a shallow copy into a deep copy. | |
virtual void | build () |
nothing to do... | |
Public Attributes | |
TVec< int > | n_symbol_values |
for each 1-byte symbol, the number of possible values | |
int | n_last |
used by dotProduct and squareDifference to specify # of last columns to ignore | |
Protected Member Functions | |
virtual void | getNewRow (int i, const Vec &v) const |
decoding (v may be one-hot depending on one_hot_encoding flag) | |
Static Protected Member Functions | |
void | set_n_bits_in_byte () |
Protected Attributes | |
Storage< unsigned char > | data |
Each row of the matrix holds in order: bits, 1-byte symbols, fixed point numbers. | |
int | row_n_bytes |
# of bytes per row | |
int | n_bits |
number of binary symbols per row | |
int | n_symbols |
number of 1-byte symbols per row | |
int | n_fixedpoint |
number of fixed point numbers per row | |
int | n_variables |
= n_bits + n_symbols + n_fixedpoint | |
bool | one_hot_encoding |
the 1-byte symbols are converted to one-hot encoding by get | |
Vec | fixedpoint_min |
Vec | fixedpoint_max |
the ranges of each number for fixed point encoding | |
Vec | delta |
(fixedpoint_max-fixedpoint_min)/2^16 | |
TVec< int > | variables_permutation |
this variable is used only when constructed from VMat | |
int | normal_width |
the value of width_ when one_hot_encoding=true | |
int | symbols_offset |
where in each row the symbols start | |
int | fixedpoint_offset |
where in each row the fixed point numbers start | |
Vec | row_norms |
to cache the norms of the rows for squareDifference method | |
Static Protected Attributes | |
unsigned char | n_bits_in_byte [256] |
Private Types | |
typedef RowBufferedVMatrix | inherited |
Definition at line 63 of file CompactVMatrix.h.
|
Reimplemented from PLearn::RowBufferedVMatrix. Definition at line 65 of file CompactVMatrix.h. Referenced by CompactVMatrix(). |
|
default constructor (for automatic deserialization)
Definition at line 79 of file CompactVMatrix.cc. |
|
Definition at line 85 of file CompactVMatrix.cc. References data, delta, fixedpoint_max, fixedpoint_min, fixedpoint_offset, inherited, n_bits, n_symbol_values, n_symbols, n_variables, normal_width, one_hot_encoding, PLearn::Storage< unsigned char >::resize(), row_n_bytes, set_n_bits_in_byte(), setOneHotMode(), symbols_offset, variables_permutation, and PLearn::Vec. |
|
Convert a VMat into a CompactVMatrix: this will use the stats computed in the fieldstats of the VMatrix (they will be computed if not already available) to figure out which variables are binary, which are discrete (and how many symbols each has), and the ranges of the numeric variables. THE DISCRETE VARIABLES OF THE VMAT MUST NOT ALREADY BE ONE-HOT ENCODED. The variables will be permuted according to the permutation vector, which can be retrieved with the permutation_vector() method. By default the last column of the VMat stays last and is therefore coded as fixed point (so the permutation information may not be needed if the last column represents a target and all the previous ones are inputs). keep_last_variables_last is the number of "last columns" to keep in place. Definition at line 111 of file CompactVMatrix.cc. References PLearn::VMFieldStat::counts, data, delta, encodeAndPutRow(), PLearn::endl(), fixedpoint_max, fixedpoint_min, fixedpoint_offset, PLearn::isMapKeysAreInt(), PLearn::VMFieldStat::max(), PLearn::VMFieldStat::min(), n_bits, n_fixedpoint, n_symbol_values, n_symbols, n_variables, normal_width, one_hot_encoding, PLERROR, PLearn::TVec< VMFieldStat >::resize(), PLearn::TVec< VMField >::resize(), PLearn::Storage< unsigned char >::resize(), PLearn::TVec< int >::resize(), PLearn::TVec< T >::resize(), row_n_bytes, set_n_bits_in_byte(), setOneHotMode(), symbols_offset, variables_permutation, and PLearn::VMat::width(). |
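A minimal sketch of the classification and reordering step, written as standalone C++ rather than against the PLearn API. ColumnKind, classify() and buildPermutation() are hypothetical names, and the classification rule (values in {0, 1} means binary, at most 256 distinct integers in [0, 255] means a 1-byte symbol, anything else is fixed point) is an assumption for illustration, not necessarily the exact test CompactVMatrix applies to its fieldstats. The returned vector follows the convention variables_permutation[new_column]=old_column.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <set>
#include <vector>

// Hypothetical column categories matching the (bits, bytes, fixedpoint) row layout.
enum class ColumnKind { Binary, Symbol, FixedPoint };

// Assumed rule, for illustration only: values in {0, 1} -> binary,
// at most 256 distinct integers in [0, 255] -> 1-byte symbol, otherwise fixed point.
ColumnKind classify(const std::set<double>& distinct_values) {
    bool binary = true;
    bool small_int = distinct_values.size() <= 256;
    for (double v : distinct_values) {
        if (v != 0.0 && v != 1.0) binary = false;
        if (v != std::floor(v) || v < 0.0 || v > 255.0) small_int = false;
    }
    if (binary) return ColumnKind::Binary;
    if (small_int) return ColumnKind::Symbol;
    return ColumnKind::FixedPoint;
}

// Returns perm with perm[new_column] == old_column: binary columns first, then symbols,
// then fixed point; the last keep_last columns stay last (and are coded as fixed point).
std::vector<int> buildPermutation(const std::vector<ColumnKind>& kinds, std::size_t keep_last) {
    assert(keep_last <= kinds.size());
    const std::size_t n_free = kinds.size() - keep_last;
    std::vector<int> perm;
    for (ColumnKind kind : {ColumnKind::Binary, ColumnKind::Symbol, ColumnKind::FixedPoint}) {
        for (std::size_t c = 0; c < n_free; ++c) {
            if (kinds[c] == kind) perm.push_back(static_cast<int>(c));
        }
    }
    for (std::size_t c = n_free; c < kinds.size(); ++c) {
        perm.push_back(static_cast<int>(c));  // kept in place at the end
    }
    return perm;
}
```

Keeping the last columns out of the reordering is what makes the permutation unnecessary in the common input/target case: the target column has the same index before and after conversion.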
|
construct from saved CompactVMatrix
Definition at line 211 of file CompactVMatrix.cc. References PLearn::load(), n_last, and set_n_bits_in_byte(). |
|
Create a CompactVMatrix with the same structure as cvm but containing the data in m. Both must obviously have the same width. If rescale is true, then the min/max values for fixed-point encoding are recomputed. If check==true, the data are verified and an error is raised if the floating-point values are not in the expected ranges (those of cvm). Definition at line 218 of file CompactVMatrix.cc. References PLearn::TVec< T >::copy(), data, delta, fixedpoint_max, fixedpoint_min, fixedpoint_offset, n_bits, n_fixedpoint, n_last, n_symbol_values, n_symbols, n_variables, normal_width, one_hot_encoding, PLERROR, putRow(), PLearn::Storage< unsigned char >::resize(), row_n_bytes, setOneHotMode(), symbols_offset, variables_permutation, PLearn::VMat::width(), and PLearn::VMatrix::width(). |
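The rescale/check choice can be sketched as follows, assuming the fixed-point columns are available as plain vectors of doubles. checkOrRescale() is a hypothetical helper, and throwing std::out_of_range stands in for the PLERROR call the real constructor references.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// columns holds one vector per fixed-point column; mins/maxs are the ranges inherited from cvm.
void checkOrRescale(const std::vector<std::vector<double>>& columns,
                    std::vector<double>& mins, std::vector<double>& maxs,
                    bool rescale, bool check) {
    assert(mins.size() >= columns.size() && maxs.size() >= columns.size());
    for (std::size_t c = 0; c < columns.size(); ++c) {
        const std::vector<double>& col = columns[c];
        if (col.empty()) continue;
        auto mm = std::minmax_element(col.begin(), col.end());
        if (rescale) {
            // Recompute the range from the new data.
            mins[c] = *mm.first;
            maxs[c] = *mm.second;
        } else if (check && (*mm.first < mins[c] || *mm.second > maxs[c])) {
            // Keeping cvm's ranges: out-of-range values could not be encoded faithfully.
            throw std::out_of_range("fixed-point value outside the range inherited from cvm");
        }
    }
}
```

When rescaling, the check is skipped because the recomputed ranges contain the data by construction.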
|
append vm to this VMatrix (the rows of vm are concatenated to the current rows of this VMatrix)
Definition at line 734 of file CompactVMatrix.cc. References PLearn::TVec< T >::copy(), PLearn::Storage< unsigned char >::data, data, delta, PLearn::endl(), fixedpoint_max, fixedpoint_min, fixedpoint_offset, PLearn::RowBufferedVMatrix::getRow(), PLearn::VMatrix::length(), n_bits, n_fixedpoint, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, putRow(), PLearn::Storage< unsigned char >::resize(), row_n_bytes, setOneHotMode(), PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, PLearn::VMatrix::width(), and PLearn::write(). |
|
nothing to do...
Reimplemented from PLearn::VMatrix. Definition at line 194 of file CompactVMatrix.h. |
|
returns the result of the dot product between row i and the given vec (only v.length() first elements of row i are considered).
Reimplemented from PLearn::RowBufferedVMatrix. Definition at line 414 of file CompactVMatrix.cc. References PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, PLearn::dot(), PLearn::dot_product(), fixedpoint_min, fixedpoint_offset, PLearn::TVec< T >::length(), n_bits, n_fixedpoint, n_last, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, row_n_bytes, SANITYCHECK_CompactVMatrix_PRECISION, PLearn::TVec< T >::subVec(), symbols_offset, PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, and PLearn::VMatrix::width(). |
|
returns the dot product between row i1 and row i2 (considering only the first inputsize elements). The default version in VMatrix is somewhat inefficient, as it repeatedly calls get(i,j). The default version in RowBufferedVMatrix is a little better, as it buffers the 2 Vecs between calls in case one of them is needed again. But the real strength of this method lies in specialised and efficient versions in subclasses. This method is typically used by SmartKernels so that they can compute kernel values between input samples efficiently. Reimplemented from PLearn::RowBufferedVMatrix. Definition at line 340 of file CompactVMatrix.cc. References PLearn::Storage< unsigned char >::data, data, delta, PLearn::dot_product(), fixedpoint_min, fixedpoint_offset, k, n_bits, n_bits_in_byte, n_fixedpoint, n_last, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, row_n_bytes, symbols_offset, PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, and PLearn::VMatrix::width(). |
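Why a compact representation can make this dot product cheap, as a minimal sketch: it assumes binary variables are packed 8 per byte with zero padding bits, and one-hot mode for the symbols. compactDot() is a hypothetical name; std::bitset::count stands in for the n_bits_in_byte lookup table, and fixed-point columns (which need decoding first) are omitted.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of two compact rows over the packed-bit part and the 1-byte symbol part.
double compactDot(const std::vector<unsigned char>& row1,
                  const std::vector<unsigned char>& row2,
                  std::size_t n_bit_bytes, std::size_t n_symbols) {
    assert(row1.size() >= n_bit_bytes + n_symbols && row2.size() >= n_bit_bytes + n_symbols);
    double sum = 0.0;
    // Binary part: a product of two 0/1 values is 1 only if both bits are set,
    // so each byte contributes the number of bits set in (a & b).
    for (std::size_t k = 0; k < n_bit_bytes; ++k) {
        sum += static_cast<double>(std::bitset<8>(row1[k] & row2[k]).count());
    }
    // Symbol part: two one-hot blocks have dot product 1 iff they encode the same value.
    for (std::size_t s = 0; s < n_symbols; ++s) {
        if (row1[n_bit_bytes + s] == row2[n_bit_bytes + s]) sum += 1.0;
    }
    return sum;
}
```

Each byte of bits and each symbol costs one operation, regardless of how many one-hot slots the symbol would expand to.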
|
return the dot product of row i with row j, excluding n_last columns
Definition at line 483 of file CompactVMatrix.cc. References PLearn::dot(), n_last, and PLearn::VMatrix::width(). Referenced by squareDifference(). |
|
encoding (v is not one-hot, and the variables in v are in the "original" order, i.e. at position i in v we find variable variables_permutation[i] in getRow's result)
Definition at line 497 of file CompactVMatrix.cc. References PLearn::TVec< int >::data(), PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, fixedpoint_min, fixedpoint_offset, n_bits, n_fixedpoint, n_symbol_values, n_symbols, PLERROR, row_n_bytes, symbols_offset, val, and variables_permutation. Referenced by CompactVMatrix(). |
|
decoding (v may be one-hot depending on one_hot_encoding flag)
Implements PLearn::RowBufferedVMatrix. Definition at line 288 of file CompactVMatrix.cc. References PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, fixedpoint_min, fixedpoint_offset, PLearn::TVec< T >::length(), n_bits, n_fixedpoint, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, row_n_bytes, symbols_offset, PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, and PLearn::VMatrix::width(). |
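The symbol part of decoding can be sketched like this, assuming each stored symbol byte is an index smaller than its n_symbol_values entry. decodeSymbols() is a hypothetical helper, not the PLearn implementation. It also shows why the one-hot width (normal_width) can exceed n_variables: a symbol with k possible values occupies k slots instead of one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expand the 1-byte symbols of one row, either as one-hot blocks or as raw values.
std::vector<double> decodeSymbols(const std::vector<unsigned char>& symbols,
                                  const std::vector<int>& n_symbol_values,
                                  bool one_hot) {
    assert(symbols.size() == n_symbol_values.size());
    std::vector<double> out;
    for (std::size_t s = 0; s < symbols.size(); ++s) {
        if (one_hot) {
            assert(symbols[s] < n_symbol_values[s]);
            // One slot per possible value; only the slot of the stored value is 1.
            std::vector<double> block(static_cast<std::size_t>(n_symbol_values[s]), 0.0);
            block[symbols[s]] = 1.0;
            out.insert(out.end(), block.begin(), block.end());
        } else {
            // A single slot holding the raw symbol value.
            out.push_back(static_cast<double>(symbols[s]));
        }
    }
    return out;
}
```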
|
Transforms a shallow copy into a deep copy.
Reimplemented from PLearn::RowBufferedVMatrix. Definition at line 834 of file CompactVMatrix.cc. References data, PLearn::deepCopyField(), fixedpoint_max, fixedpoint_min, n_symbol_values, and variables_permutation. |
|
Definition at line 80 of file CompactVMatrix.h. References n_bits. |
|
Definition at line 82 of file CompactVMatrix.h. References n_fixedpoint. |
|
Definition at line 81 of file CompactVMatrix.h. References n_symbols. |
|
this vector is filled only when the CompactVMatrix was constructed from a VMat, and it provides the permutation of the original columns to order them into (bits, bytes, fixedpoint) Definition at line 156 of file CompactVMatrix.h. References variables_permutation. |
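Given the convention variables_permutation[new_column]=old_column, a row in the compact (bits, bytes, fixedpoint) order can be put back into the original column order by scattering each value to its old index. toOriginalOrder() is a hypothetical helper and assumes a non-one-hot row with one value per variable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// perm[new_column] == old_column; permuted holds one value per variable (not one-hot).
std::vector<double> toOriginalOrder(const std::vector<double>& permuted,
                                    const std::vector<int>& perm) {
    assert(permuted.size() == perm.size());
    std::vector<double> original(permuted.size());
    for (std::size_t new_col = 0; new_col < perm.size(); ++new_col) {
        original[static_cast<std::size_t>(perm[new_col])] = permuted[new_col];
    }
    return original;
}
```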
|
create in the elements of row (except the n_last ones) a perturbed version of the i-th row of the database. This random perturbation is based on the unconditional statistics, which should be present in the fieldstats; the noise level can be modulated with the noise_level argument (a value of 1 will perturb by as much as the noise seen in the unconditional statistics). Continuous variables are resampled around the current value with sigma = noise_level * unconditional_sigma. Discrete variables are resampled from a mixture distribution: (1-noise_level)*(all probability mass on the current value) + noise_level*(unconditional distribution). Definition at line 606 of file CompactVMatrix.cc. References PLearn::binomial_sample(), PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, fixedpoint_max, fixedpoint_min, fixedpoint_offset, PLearn::TVec< T >::length(), PLearn::multinomial_sample(), n_bits, n_fixedpoint, n_symbol_values, n_symbols, n_variables, PLearn::normal_sample(), one_hot_encoding, PLERROR, PLearn::VMFieldStat::prob(), PLearn::TVec< T >::resize(), row_n_bytes, PLearn::TVec< VMFieldStat >::size(), symbols_offset, PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, val, PLearn::var(), and PLearn::VMatrix::width(). |
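The two resampling rules can be sketched with the C++ standard library, using std::bernoulli_distribution, std::discrete_distribution and std::normal_distribution as stand-ins for the PLearn samplers referenced above. perturbDiscrete() and perturbContinuous() are hypothetical names. Sampling the mixture is done in two stages: first decide which component to use, then sample from it.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Discrete: (1 - noise_level) * point mass on current + noise_level * unconditional.
int perturbDiscrete(int current, const std::vector<double>& unconditional_probs,
                    double noise_level, std::mt19937& rng) {
    assert(noise_level >= 0.0 && noise_level <= 1.0);
    std::bernoulli_distribution use_unconditional(noise_level);
    if (!use_unconditional(rng)) return current;
    std::discrete_distribution<int> unconditional(unconditional_probs.begin(),
                                                  unconditional_probs.end());
    return unconditional(rng);
}

// Continuous: normal around the current value with sigma = noise_level * unconditional_sigma.
double perturbContinuous(double current, double unconditional_sigma,
                         double noise_level, std::mt19937& rng) {
    const double sigma = noise_level * unconditional_sigma;
    if (sigma <= 0.0) return current;  // std::normal_distribution requires stddev > 0
    std::normal_distribution<double> around_current(current, sigma);
    return around_current(rng);
}
```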
|
reverse of write, can be used by calling load(string)
|
|
v is possibly one-hot-encoded (according to one_hot_encoding flag) and the variables are in the same order as for getRow.
Reimplemented from PLearn::VMatrix. Definition at line 531 of file CompactVMatrix.cc. References putSubRow(). Referenced by append(), and CompactVMatrix(). |
|
It is suggested that this method be implemented in subclasses of writable matrices to speed up accesses (default version repeatedly calls put(i,j,value) which may have a significant overhead) Reimplemented from PLearn::VMatrix. Definition at line 536 of file CompactVMatrix.cc. References PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, fixedpoint_min, fixedpoint_offset, k, n_bits, n_fixedpoint, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, row_n_bytes, symbols_offset, and val. Referenced by putRow(). |
|
calls write
Definition at line 185 of file CompactVMatrix.h. |
|
Definition at line 59 of file CompactVMatrix.cc. References n_bits_in_byte. Referenced by CompactVMatrix(). |
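The entry above carries no description. Judging from its name and from dot() referencing n_bits_in_byte, set_n_bits_in_byte() presumably fills the static table n_bits_in_byte[256] with the number of set bits in each byte value; that reading is an assumption. A minimal sketch of building such a table, with makeBitCountTable() as a hypothetical name:

```cpp
#include <array>
#include <cassert>

// Entry b holds the number of bits set in the byte value b.
std::array<unsigned char, 256> makeBitCountTable() {
    std::array<unsigned char, 256> table{};
    for (int b = 1; b < 256; ++b) {
        // b >> 1 has the same set bits as b, except possibly the lowest one.
        table[b] = static_cast<unsigned char>(table[b >> 1] + (b & 1));
    }
    return table;
}
```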
|
Definition at line 279 of file CompactVMatrix.cc. References n_variables, normal_width, and one_hot_encoding. Referenced by append(), and CompactVMatrix(). |
|
return the square difference between row i and row j, excluding n_last columns
Definition at line 486 of file CompactVMatrix.cc. References dotProduct(), PLearn::TVec< T >::length(), row_norms, and PLearn::Vec. |
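Since squareDifference() references dotProduct() and row_norms, it presumably relies on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so that each pair of rows needs only one dot product once the squared norms are cached. A sketch on dense rows; dotFirst(), computeRowNorms() and squaredDistance() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Rows = std::vector<std::vector<double>>;

// Dot product over the first n columns only.
double dotFirst(const std::vector<double>& a, const std::vector<double>& b, std::size_t n) {
    double s = 0.0;
    for (std::size_t k = 0; k < n; ++k) s += a[k] * b[k];
    return s;
}

// Cache ||row||^2 once per row, ignoring the last n_last columns.
std::vector<double> computeRowNorms(const Rows& rows, std::size_t n_last) {
    std::vector<double> norms;
    norms.reserve(rows.size());
    for (const auto& row : rows) {
        assert(n_last <= row.size());
        norms.push_back(dotFirst(row, row, row.size() - n_last));
    }
    return norms;
}

// ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
double squaredDistance(const Rows& rows, const std::vector<double>& norms,
                       std::size_t i, std::size_t j, std::size_t n_last) {
    const std::size_t n = rows[i].size() - n_last;
    return norms[i] + norms[j] - 2.0 * dotFirst(rows[i], rows[j], n);
}
```

In floating point this expansion can return slightly negative values for nearly identical rows, so callers may want to clamp the result at 0.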
|
Each row of the matrix holds in order: bits, 1-byte symbols, fixed point numbers.
Definition at line 71 of file CompactVMatrix.h. Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), makeDeepCopyFromShallowCopy(), perturb(), and putSubRow(). |
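A sketch of how the byte offsets of one row could be derived, assuming binary variables are packed 8 per byte, each 1-byte symbol takes one byte, and each fixed-point number takes two bytes (suggested by the 2^16 in delta). RowLayout and computeLayout() are hypothetical names; the field names mirror symbols_offset, fixedpoint_offset and row_n_bytes.

```cpp
#include <cassert>
#include <cstddef>

struct RowLayout {
    std::size_t symbols_offset;     // first byte of the 1-byte symbols
    std::size_t fixedpoint_offset;  // first byte of the fixed-point numbers
    std::size_t row_n_bytes;        // total bytes per row
};

RowLayout computeLayout(std::size_t n_bits, std::size_t n_symbols, std::size_t n_fixedpoint) {
    RowLayout layout;
    layout.symbols_offset = (n_bits + 7) / 8;  // bits rounded up to whole bytes
    layout.fixedpoint_offset = layout.symbols_offset + n_symbols;
    layout.row_n_bytes = layout.fixedpoint_offset + 2 * n_fixedpoint;
    return layout;
}
```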
|
(fixedpoint_max-fixedpoint_min)/2^16
Definition at line 85 of file CompactVMatrix.h. Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), perturb(), and putSubRow(). |
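With delta = (fixedpoint_max-fixedpoint_min)/2^16, a value can be stored as a 16-bit code and recovered up to one delta. The sketch below assumes truncation toward fixedpoint_min and clamping to [0, 65535]; the exact rounding CompactVMatrix uses is not documented here. FixedPointRange, encodeFixedPoint() and decodeFixedPoint() are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct FixedPointRange {
    double min;
    double max;
    double delta() const { return (max - min) / 65536.0; }  // (max - min) / 2^16
};

// Quantize x into a 16-bit code; values outside [min, max] are clamped.
std::uint16_t encodeFixedPoint(double x, const FixedPointRange& r) {
    assert(r.max > r.min);
    double code = std::floor((x - r.min) / r.delta());
    if (code < 0.0) code = 0.0;
    if (code > 65535.0) code = 65535.0;  // x == max would otherwise map to 65536
    return static_cast<std::uint16_t>(code);
}

// Recover an approximation of x; for in-range x the error is below one delta.
double decodeFixedPoint(std::uint16_t code, const FixedPointRange& r) {
    return r.min + code * r.delta();
}
```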
|
the ranges of each number for fixed point encoding
Definition at line 84 of file CompactVMatrix.h. Referenced by append(), CompactVMatrix(), makeDeepCopyFromShallowCopy(), and perturb(). |
|
Definition at line 84 of file CompactVMatrix.h. Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), makeDeepCopyFromShallowCopy(), perturb(), and putSubRow(). |
|
where in each row the fixed point numbers start
Definition at line 199 of file CompactVMatrix.h. Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), perturb(), and putSubRow(). |
|
number of binary symbols per row
Definition at line 73 of file CompactVMatrix.h. Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), nbits(), perturb(), and putSubRow(). |
|
Definition at line 57 of file CompactVMatrix.cc. Referenced by dot(), and set_n_bits_in_byte(). |
|
number of fixed point numbers per row
Definition at line 75 of file CompactVMatrix.h. Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), nfixedpoint(), perturb(), and putSubRow(). |
|
used by dotProduct and squareDifference to specify # of last columns to ignore
Definition at line 95 of file CompactVMatrix.h. Referenced by CompactVMatrix(), dot(), and dotProduct(). |
|
for each 1-byte symbol, the number of possible values
Definition at line 79 of file CompactVMatrix.h. Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), makeDeepCopyFromShallowCopy(), perturb(), and putSubRow(). |
|
number of 1-byte symbols per row
Definition at line 74 of file CompactVMatrix.h. Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), nsymbols(), perturb(), and putSubRow(). |
|
= n_bits + n_symbols + n_fixedpoint
Definition at line 76 of file CompactVMatrix.h. Referenced by CompactVMatrix(), perturb(), and setOneHotMode(). |
|
the value of width_ when one_hot_encoding=true
Definition at line 97 of file CompactVMatrix.h. Referenced by CompactVMatrix(), and setOneHotMode(). |
|
the 1-byte symbols are converted to one-hot encoding by get
Definition at line 77 of file CompactVMatrix.h. Referenced by append(), CompactVMatrix(), dot(), getNewRow(), perturb(), putSubRow(), and setOneHotMode(). |
|
# of bytes per row
Definition at line 72 of file CompactVMatrix.h. Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), perturb(), and putSubRow(). |
|
to cache the norms of the rows for squareDifference method
Definition at line 200 of file CompactVMatrix.h. Referenced by squareDifference(). |
|
where in each row the symbols start
Definition at line 198 of file CompactVMatrix.h. Referenced by CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), perturb(), and putSubRow(). |
|
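this variable is used only when constructed from VMat, and provides the permutation that reorders the original columns into (bits, bytes, fixedpoint) order: variables_permutation[new_column]=old_column (not in one-hot code)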
Definition at line 86 of file CompactVMatrix.h. Referenced by CompactVMatrix(), encodeAndPutRow(), makeDeepCopyFromShallowCopy(), and permutation_vector(). |