PLearn::CompactVMatrix Class Reference

#include <CompactVMatrix.h>

Inherits PLearn::RowBufferedVMatrix.


Public Member Functions
int	nbits ()
int	nsymbols ()
int	nfixedpoint ()
void	setOneHotMode (bool on=true)
	CompactVMatrix ()
	default constructor (for automatic deserialization)
	CompactVMatrix (int the_length, int n_variables, int n_binary, int n_nonbinary_discrete, int n_fixed_point, TVec< int > &n_symbolvalues, Vec &fixed_point_min, Vec &fixed_point_max, bool one_hot_encoding=true)
	CompactVMatrix (VMat m, int keep_last_variables_last=1, bool onehot_encoding=true)
	CompactVMatrix (const string &filename, int nlast=1)
	construct from saved CompactVMatrix
	CompactVMatrix (CompactVMatrix *cvm, VMat m, bool rescale=false, bool check=true)
void	append (CompactVMatrix *vm)
	append vm to this VMatrix (the rows of vm are concatenated to the current rows of this VMatrix)
void	perturb (int i, Vec row, real noise_level, int n_last)
TVec< int > &	permutation_vector ()
virtual real	squareDifference (int i, int j)
	return the square difference between row i and row j, excluding n_last columns
virtual real	dotProduct (int i, int j) const
	return the dot product of row i with row j, excluding n_last columns
virtual real	dot (int i1, int i2, int inputsize) const
virtual real	dot (int i, const Vec &v) const
	returns the result of the dot product between row i and the given vec (only v.length() first elements of row i are considered).
virtual void	encodeAndPutRow (int i, Vec v)
	encoding (v is not one-hot, and the variables in v are in the "original" order, i.e. at position i in v we find variable variables_permutation[i] in getRow's result)
virtual void	putRow (int i, Vec v)
	v is possibly one-hot-encoded (according to one_hot_encoding flag) and the variables are in the same order as for getRow.
virtual void	putSubRow (int i, int j, Vec v)
virtual void	save (const string &filename)
	calls write
	PLEARN_DECLARE_OBJECT (CompactVMatrix)
	reverse of write, can be used by calling load(string)
void	makeDeepCopyFromShallowCopy (map< const void *, void * > &copies)
	Transforms a shallow copy into a deep copy.
virtual void	build ()
	nothing to do...
Public Attributes
TVec< int >	n_symbol_values
	for each 1-byte symbol, the number of possible values
int	n_last
	used by dotProduct and squareDifference to specify # of last columns to ignore
Protected Member Functions
virtual void	getNewRow (int i, const Vec &v) const
	decoding (v may be one-hot depending on one_hot_encoding flag)
Static Protected Member Functions
void	set_n_bits_in_byte ()
Protected Attributes
Storage< unsigned char >	data
	Each row of the matrix holds in order: bits, 1-byte symbols, fixed point numbers.
int	row_n_bytes
	# of bytes per row
int	n_bits
	number of binary symbols per row
int	n_symbols
	number of 1-byte symbols per row
int	n_fixedpoint
	number of fixed point numbers per row
int	n_variables
	= n_bits + n_symbols + n_fixedpoint
bool	one_hot_encoding
	the 1-byte symbols are converted to one-hot encoding by get
Vec	fixedpoint_min
Vec	fixedpoint_max
	the ranges of each number for fixed point encoding
Vec	delta
	(fixedpoint_max-fixedpoint_min)/2^16
TVec< int >	variables_permutation
	this variable is used only when constructed from VMat
int	normal_width
	the value of width_ when one_hot_encoding=true
int	symbols_offset
	where in each row the symbols start
int	fixedpoint_offset
	where in each row the fixed point numbers start
Vec	row_norms
	to cache the norms of the rows for squareDifference method
Static Protected Attributes
unsigned char	n_bits_in_byte [256]
	lookup table with one entry per possible byte value, filled by set_n_bits_in_byte()
Private Types
typedef RowBufferedVMatrix	inherited

Detailed Description

Like MemoryVMatrix this class holds the data in memory, but it tries to hold it compactly by using single bits for binary variables, single bytes for discrete variables whose number of possible values is less than 256, and unsigned shorts for the others, using a fixed point representation.

Definition at line 63 of file CompactVMatrix.h.
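
As a rough illustration of the layout described above, the following sketch computes how many bytes one row would occupy if bits are packed eight to a byte, each small discrete variable takes one byte, and each remaining variable takes one unsigned short (2 bytes). RowLayout and make_row_layout are hypothetical names, not part of PLearn, and the real class may pad or align fields differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of PLearn): byte layout of one compact row,
// assuming bits come first, then 1-byte symbols, then 16-bit fixed-point values.
struct RowLayout {
    std::size_t symbols_offset;     // byte index where the 1-byte symbols start
    std::size_t fixedpoint_offset;  // byte index where the fixed-point values start
    std::size_t row_n_bytes;        // total number of bytes per row
};

RowLayout make_row_layout(int n_bits, int n_symbols, int n_fixedpoint) {
    assert(n_bits >= 0 && n_symbols >= 0 && n_fixedpoint >= 0);
    RowLayout layout;
    // Round the bit count up to a whole number of bytes.
    layout.symbols_offset = (static_cast<std::size_t>(n_bits) + 7) / 8;
    layout.fixedpoint_offset =
        layout.symbols_offset + static_cast<std::size_t>(n_symbols);
    layout.row_n_bytes =
        layout.fixedpoint_offset + 2 * static_cast<std::size_t>(n_fixedpoint);
    return layout;
}
```

Under these assumptions, a row with 10 binary variables, 3 symbols and 2 fixed-point numbers would need 2 + 3 + 4 = 9 bytes, compared with 15 full-width reals in an uncompressed matrix.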

Member Typedef Documentation

typedef RowBufferedVMatrix PLearn::CompactVMatrix::inherited [private]

Reimplemented from PLearn::RowBufferedVMatrix.
Definition at line 65 of file CompactVMatrix.h.
Referenced by CompactVMatrix().

Constructor & Destructor Documentation

PLearn::CompactVMatrix::CompactVMatrix ( )

default constructor (for automatic deserialization)

Definition at line 79 of file CompactVMatrix.cc.

PLearn::CompactVMatrix::CompactVMatrix ( int the_length,

int n_variables,

int n_binary,

int n_nonbinary_discrete,

int n_fixed_point,

TVec< int > & n_symbolvalues,

Vec & fixed_point_min,

Vec & fixed_point_max,

bool one_hot_encoding = true

)

Definition at line 85 of file CompactVMatrix.cc.
References data, delta, fixedpoint_max, fixedpoint_min, fixedpoint_offset, inherited, n_bits, n_symbol_values, n_symbols, n_variables, normal_width, one_hot_encoding, PLearn::Storage< unsigned char >::resize(), row_n_bytes, set_n_bits_in_byte(), setOneHotMode(), symbols_offset, variables_permutation, and PLearn::Vec.

PLearn::CompactVMatrix::CompactVMatrix ( VMat m,

int keep_last_variables_last = 1,

bool onehot_encoding = true

)

Convert a VMat into a CompactVMatrix: this will use the stats computed in the fieldstats of the VMatrix (they will be computed if not already) to figure out which variables are binary, discrete (and how many symbols), and the ranges of numeric variables. THE VMAT DISCRETE VARIABLES MUST NOT BE ALREADY ONE-HOT ENCODED. The variables will be permuted according to the permutation vector, which can be retrieved with the permutation_vector() method. By default the last column of the VMat will stay last, thus being coded as fixedpoint (so the permutation information may not be necessary if the last column represents a target and all the previous ones some inputs). keep_last_variables_last is the number of "last columns" to keep in place.
Definition at line 111 of file CompactVMatrix.cc.
References PLearn::VMFieldStat::counts, data, delta, encodeAndPutRow(), PLearn::endl(), fixedpoint_max, fixedpoint_min, fixedpoint_offset, PLearn::isMapKeysAreInt(), PLearn::VMFieldStat::max(), PLearn::VMFieldStat::min(), n_bits, n_fixedpoint, n_symbol_values, n_symbols, n_variables, normal_width, one_hot_encoding, PLERROR, PLearn::TVec< VMFieldStat >::resize(), PLearn::TVec< VMField >::resize(), PLearn::Storage< unsigned char >::resize(), PLearn::TVec< int >::resize(), PLearn::TVec< T >::resize(), row_n_bytes, set_n_bits_in_byte(), setOneHotMode(), symbols_offset, variables_permutation, and PLearn::VMat::width().
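
The column reordering described above can be sketched as follows. This is a minimal illustration, not the PLearn implementation: make_permutation and its n_values argument are hypothetical, and the thresholds (exactly 2 distinct values for a bit, fewer than 256 for a 1-byte symbol) are assumptions based on the class description.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch (not the PLearn implementation): order columns as
// (binary, small discrete, fixed point) and return perm with
// perm[new_column] = old_column. n_values[c] is the number of distinct values
// seen in column c, or 0 if the column is treated as continuous.
std::vector<int> make_permutation(const std::vector<int>& n_values,
                                  int keep_last_variables_last) {
    const int n = static_cast<int>(n_values.size());
    assert(keep_last_variables_last >= 0 && keep_last_variables_last <= n);
    const int n_free = n - keep_last_variables_last;
    std::vector<int> bits, symbols, fixedpoint;
    for (int c = 0; c < n_free; ++c) {
        if (n_values[c] == 2)
            bits.push_back(c);
        else if (n_values[c] > 2 && n_values[c] < 256)
            symbols.push_back(c);
        else
            fixedpoint.push_back(c);
    }
    // The last columns stay last and are therefore coded as fixed point.
    for (int c = n_free; c < n; ++c) fixedpoint.push_back(c);

    std::vector<int> perm(bits);
    perm.insert(perm.end(), symbols.begin(), symbols.end());
    perm.insert(perm.end(), fixedpoint.begin(), fixedpoint.end());
    return perm;
}
```

For example, with columns classified as (continuous, binary, 5-valued, binary, 3-valued) and one column kept last, the sketch yields the order 1, 3, 2, 0, 4: the final column is coded as fixed point even though it has only three distinct values.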

PLearn::CompactVMatrix::CompactVMatrix ( const string & filename,

int nlast = 1

)

construct from saved CompactVMatrix

Definition at line 211 of file CompactVMatrix.cc.
References PLearn::load(), n_last, and set_n_bits_in_byte().

PLearn::CompactVMatrix::CompactVMatrix ( CompactVMatrix * cvm,

VMat m,

bool rescale = false,

bool check = true

)

Create a CompactVMatrix with the same structure as cvm but containing the data in m. Both must have the same width. If rescale is true, then the min/max values for fixed-point encoding are recomputed. If check is true, the floating-point data are verified to lie in the expected ranges (of cvm), and an error is raised if they do not.
Definition at line 218 of file CompactVMatrix.cc.
References PLearn::TVec< T >::copy(), data, delta, fixedpoint_max, fixedpoint_min, fixedpoint_offset, n_bits, n_fixedpoint, n_last, n_symbol_values, n_symbols, n_variables, normal_width, one_hot_encoding, PLERROR, putRow(), PLearn::Storage< unsigned char >::resize(), row_n_bytes, setOneHotMode(), symbols_offset, variables_permutation, PLearn::VMat::width(), and PLearn::VMatrix::width().

Member Function Documentation

void PLearn::CompactVMatrix::append ( CompactVMatrix * vm )

append vm to this VMatrix (the rows of vm are concatenated to the current rows of this VMatrix)

Definition at line 734 of file CompactVMatrix.cc.
References PLearn::TVec< T >::copy(), PLearn::Storage< unsigned char >::data, data, delta, PLearn::endl(), fixedpoint_max, fixedpoint_min, fixedpoint_offset, PLearn::RowBufferedVMatrix::getRow(), PLearn::VMatrix::length(), n_bits, n_fixedpoint, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, putRow(), PLearn::Storage< unsigned char >::resize(), row_n_bytes, setOneHotMode(), PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, PLearn::VMatrix::width(), and PLearn::write().

virtual void PLearn::CompactVMatrix::build ( ) [inline, virtual]

nothing to do...

Reimplemented from PLearn::VMatrix.
Definition at line 194 of file CompactVMatrix.h.

real PLearn::CompactVMatrix::dot ( int i,

const Vec & v

) const [virtual]

returns the result of the dot product between row i and the given vec (only v.length() first elements of row i are considered).

Reimplemented from PLearn::RowBufferedVMatrix.
Definition at line 414 of file CompactVMatrix.cc.
References PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, PLearn::dot(), PLearn::dot_product(), fixedpoint_min, fixedpoint_offset, PLearn::TVec< T >::length(), n_bits, n_fixedpoint, n_last, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, row_n_bytes, SANITYCHECK_CompactVMatrix_PRECISION, PLearn::TVec< T >::subVec(), symbols_offset, PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, and PLearn::VMatrix::width().

real PLearn::CompactVMatrix::dot ( int i1,

int i2,

int inputsize

) const [virtual]

returns the dot product between row i1 and row i2 (considering only the inputsize first elements). The default version in VMatrix is somewhat inefficient, as it repeatedly calls get(i,j). The default version in RowBufferedVMatrix is a little better, as it buffers the 2 Vecs between calls in case one of them is needed again. But the real strength of this method is for specialised and efficient versions in subclasses. This method is typically used by SmartKernels so that they can compute kernel values between input samples efficiently.
Reimplemented from PLearn::RowBufferedVMatrix.
Definition at line 340 of file CompactVMatrix.cc.
References PLearn::Storage< unsigned char >::data, data, delta, PLearn::dot_product(), fixedpoint_min, fixedpoint_offset, k, n_bits, n_bits_in_byte, n_fixedpoint, n_last, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, row_n_bytes, symbols_offset, PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, and PLearn::VMatrix::width().

real PLearn::CompactVMatrix::dotProduct ( int i,

int j

) const [virtual]

return the dot product of row i with row j, excluding n_last columns

Definition at line 483 of file CompactVMatrix.cc.
References PLearn::dot(), n_last, and PLearn::VMatrix::width().
Referenced by squareDifference().

void PLearn::CompactVMatrix::encodeAndPutRow ( int i,

Vec v

) [virtual]

encoding (v is not one-hot, and the variables in v are in the "original" order, i.e. at position i in v we find variable variables_permutation[i] in getRow's result)

Definition at line 497 of file CompactVMatrix.cc.
References PLearn::TVec< int >::data(), PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, fixedpoint_min, fixedpoint_offset, n_bits, n_fixedpoint, n_symbol_values, n_symbols, PLERROR, row_n_bytes, symbols_offset, val, and variables_permutation.
Referenced by CompactVMatrix().

void PLearn::CompactVMatrix::getNewRow ( int i,

const Vec & v

) const [protected, virtual]

decoding (v may be one-hot depending on one_hot_encoding flag)

Implements PLearn::RowBufferedVMatrix.
Definition at line 288 of file CompactVMatrix.cc.
References PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, fixedpoint_min, fixedpoint_offset, PLearn::TVec< T >::length(), n_bits, n_fixedpoint, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, row_n_bytes, symbols_offset, PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, and PLearn::VMatrix::width().
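
To make the one-hot decoding concrete, here is a minimal sketch of how a run of 1-byte symbols could be expanded into one-hot blocks. one_hot_expand is a hypothetical function, not the PLearn code; it only assumes that n_symbol_values[k] gives the number of possible values of symbol k, as documented for that attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: expand 1-byte symbols into one-hot blocks, where
// n_symbol_values[k] is the number of possible values of symbol k.
std::vector<double> one_hot_expand(const std::vector<unsigned char>& symbols,
                                   const std::vector<int>& n_symbol_values) {
    assert(symbols.size() == n_symbol_values.size());
    std::vector<double> out;
    for (std::size_t k = 0; k < symbols.size(); ++k) {
        assert(n_symbol_values[k] > 0 && symbols[k] < n_symbol_values[k]);
        const std::size_t start = out.size();
        // Append a block of zeros, one slot per possible value of symbol k.
        out.resize(start + static_cast<std::size_t>(n_symbol_values[k]), 0.0);
        // Put a single 1 at the position given by the symbol's value.
        out[start + symbols[k]] = 1.0;
    }
    return out;
}
```

With one_hot_encoding off, the same symbols would presumably be returned as plain numeric values, so the row width depends on the flag.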

void PLearn::CompactVMatrix::makeDeepCopyFromShallowCopy ( map< const void *, void * > & copies ) [virtual]

Transforms a shallow copy into a deep copy.

Reimplemented from PLearn::RowBufferedVMatrix.
Definition at line 834 of file CompactVMatrix.cc.
References data, PLearn::deepCopyField(), fixedpoint_max, fixedpoint_min, n_symbol_values, and variables_permutation.

int PLearn::CompactVMatrix::nbits ( ) [inline]

Definition at line 80 of file CompactVMatrix.h.
References n_bits.

int PLearn::CompactVMatrix::nfixedpoint ( ) [inline]

Definition at line 82 of file CompactVMatrix.h.
References n_fixedpoint.

int PLearn::CompactVMatrix::nsymbols ( ) [inline]

Definition at line 81 of file CompactVMatrix.h.
References n_symbols.

TVec<int>& PLearn::CompactVMatrix::permutation_vector ( ) [inline]

this vector is filled only when the CompactVMatrix was constructed from a VMat, and it provides the permutation of the original columns to order them into (bits, bytes, fixedpoint)
Definition at line 156 of file CompactVMatrix.h.
References variables_permutation.

void PLearn::CompactVMatrix::perturb ( int i,

Vec row,

real noise_level,

int n_last

)

Create in the elements of row (except the n_last ones) a perturbed version of the i-th row of the database. This random perturbation is based on the unconditional statistics, which should be present in the fieldstats; the noise level can be modulated with the noise_level argument (a value of 1 will perturb by as much as the noise seen in the unconditional statistics). Continuous variables are resampled around the current value with sigma = noise_level * unconditional_sigma. Discrete variables are resampled from a mixture distribution: (1-noise_level)*(all probability mass on the current value) + noise_level*(unconditional distribution).
Definition at line 606 of file CompactVMatrix.cc.
References PLearn::binomial_sample(), PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, fixedpoint_max, fixedpoint_min, fixedpoint_offset, PLearn::TVec< T >::length(), PLearn::multinomial_sample(), n_bits, n_fixedpoint, n_symbol_values, n_symbols, n_variables, PLearn::normal_sample(), one_hot_encoding, PLERROR, PLearn::VMFieldStat::prob(), PLearn::TVec< T >::resize(), row_n_bytes, PLearn::TVec< VMFieldStat >::size(), symbols_offset, PLearn::short_and_twobytes::twobytes, PLearn::short_and_twobytes::us, val, PLearn::var(), and PLearn::VMatrix::width().
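
The two resampling rules above can be sketched with the C++ standard library. This is an illustration under stated assumptions, not the PLearn code: perturb_symbol and perturb_continuous are hypothetical names, and std::mt19937 stands in for whatever random source PLearn uses.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical sketch of the discrete case: with probability (1 - noise_level)
// keep the current symbol, otherwise draw from the unconditional distribution.
int perturb_symbol(int current, const std::vector<double>& unconditional_probs,
                   double noise_level, std::mt19937& rng) {
    assert(noise_level >= 0.0 && noise_level <= 1.0);
    assert(current >= 0 &&
           current < static_cast<int>(unconditional_probs.size()));
    std::bernoulli_distribution resample(noise_level);
    if (!resample(rng)) return current;
    std::discrete_distribution<int> unconditional(unconditional_probs.begin(),
                                                  unconditional_probs.end());
    return unconditional(rng);
}

// Continuous case: resample around the current value with
// sigma = noise_level * unconditional_sigma.
double perturb_continuous(double current, double unconditional_sigma,
                          double noise_level, std::mt19937& rng) {
    assert(unconditional_sigma >= 0.0 && noise_level >= 0.0);
    const double sigma = noise_level * unconditional_sigma;
    if (sigma <= 0.0) return current;  // std::normal_distribution needs sigma > 0
    std::normal_distribution<double> noise(current, sigma);
    return noise(rng);
}
```

Drawing "keep or resample" first and then sampling from the unconditional distribution is equivalent to sampling directly from the mixture, and it avoids building the mixed probability vector for every row.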

PLearn::CompactVMatrix::PLEARN_DECLARE_OBJECT ( CompactVMatrix )

reverse of write, can be used by calling load(string)

void PLearn::CompactVMatrix::putRow ( int i,

Vec v

) [virtual]

v is possibly one-hot-encoded (according to one_hot_encoding flag) and the variables are in the same order as for getRow.

Reimplemented from PLearn::VMatrix.
Definition at line 531 of file CompactVMatrix.cc.
References putSubRow().
Referenced by append(), and CompactVMatrix().

void PLearn::CompactVMatrix::putSubRow ( int i,

int j,

Vec v

) [virtual]

It is suggested that this method be implemented in subclasses of writable matrices to speed up accesses (default version repeatedly calls put(i,j,value) which may have a significant overhead)
Reimplemented from PLearn::VMatrix.
Definition at line 536 of file CompactVMatrix.cc.
References PLearn::TVec< T >::data(), PLearn::Storage< unsigned char >::data, data, delta, fixedpoint_min, fixedpoint_offset, k, n_bits, n_fixedpoint, n_symbol_values, n_symbols, one_hot_encoding, PLERROR, row_n_bytes, symbols_offset, and val.
Referenced by putRow().

virtual void PLearn::CompactVMatrix::save ( const string & filename ) [inline, virtual]

calls write

Definition at line 185 of file CompactVMatrix.h.

void PLearn::CompactVMatrix::set_n_bits_in_byte ( ) [static, protected]

Definition at line 59 of file CompactVMatrix.cc.
References n_bits_in_byte.
Referenced by CompactVMatrix().
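
The name n_bits_in_byte and its use in dot() suggest a 256-entry table of set-bit counts, which would let the dot product of two packed binary vectors be computed one byte at a time as the bit count of a & b. That reading is an assumption, not something this page confirms, and the function names below are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed behaviour, not taken from the PLearn source: table[b] holds the
// number of set bits in the byte value b.
std::array<unsigned char, 256> make_bit_count_table() {
    std::array<unsigned char, 256> table{};  // zero-initialised, so table[0] == 0
    for (std::size_t b = 1; b < 256; ++b) {
        // Bit count of b = bit count of (b >> 1) plus the lowest bit of b.
        table[b] = static_cast<unsigned char>(table[b >> 1] + (b & 1u));
    }
    return table;
}

// Dot product of two packed 0/1 vectors: count positions where both bits are 1.
int packed_bits_dot(const unsigned char* a, const unsigned char* b,
                    std::size_t n_bytes,
                    const std::array<unsigned char, 256>& table) {
    assert(a != nullptr && b != nullptr);
    int sum = 0;
    for (std::size_t k = 0; k < n_bytes; ++k) {
        sum += table[static_cast<unsigned char>(a[k] & b[k])];
    }
    return sum;
}
```

A table lookup of this kind handles eight binary variables per iteration, which is one plausible reason for building the table once in a static member.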

void PLearn::CompactVMatrix::setOneHotMode ( bool on = true )

Definition at line 279 of file CompactVMatrix.cc.
References n_variables, normal_width, and one_hot_encoding.
Referenced by append(), and CompactVMatrix().

real PLearn::CompactVMatrix::squareDifference ( int i,

int j

) [virtual]

return the square difference between row i and row j, excluding n_last columns

Definition at line 486 of file CompactVMatrix.cc.
References dotProduct(), PLearn::TVec< T >::length(), row_norms, and PLearn::Vec.
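
Since this method references dotProduct() and row_norms, it plausibly relies on the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a.b with cached squared norms. The sketch below illustrates that identity only; it is an assumption about the approach, and dot_first and square_difference are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain dot product over the first n entries of two rows.
double dot_first(const std::vector<double>& a, const std::vector<double>& b,
                 std::size_t n) {
    assert(n <= a.size() && n <= b.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < n; ++k) sum += a[k] * b[k];
    return sum;
}

// Assumed identity: |a - b|^2 = |a|^2 + |b|^2 - 2 a.b. The squared norms are
// passed in, as they would be if they were cached once per row.
double square_difference(double norm2_a, double norm2_b, double dot_ab) {
    return norm2_a + norm2_b - 2.0 * dot_ab;
}
```

With cached norms, each pairwise distance costs a single dot product, which matters when the dot product itself is computed on the compact encoding.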

Member Data Documentation

Storage<unsigned char> PLearn::CompactVMatrix::data [protected]

Each row of the matrix holds in order: bits, 1-byte symbols, fixed point numbers.

Definition at line 71 of file CompactVMatrix.h.
Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), makeDeepCopyFromShallowCopy(), perturb(), and putSubRow().

Vec PLearn::CompactVMatrix::delta [protected]

(fixedpoint_max-fixedpoint_min)/2^16

Definition at line 85 of file CompactVMatrix.h.
Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), perturb(), and putSubRow().
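
A minimal sketch of 16-bit fixed-point coding with delta = (fixedpoint_max - fixedpoint_min) / 2^16 follows. The helper names are hypothetical, and the rounding and clamping choices are assumptions; the PLearn code may round differently or store the bytes in another order.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers (not PLearn API): 16-bit fixed-point coding of a value
// expected to lie in [fmin, fmax], with delta = (fmax - fmin) / 2^16.
double fixedpoint_delta(double fmin, double fmax) {
    assert(fmax > fmin);
    return (fmax - fmin) / 65536.0;
}

std::uint16_t encode_fixedpoint(double x, double fmin, double fmax) {
    const double delta = fixedpoint_delta(fmin, fmax);
    const double q = std::floor((x - fmin) / delta);
    // Clamp so that x == fmax (which would map to 65536) still fits in 16 bits.
    return static_cast<std::uint16_t>(std::min(65535.0, std::max(0.0, q)));
}

double decode_fixedpoint(std::uint16_t code, double fmin, double fmax) {
    // Returns the lower edge of the quantization cell; for x in [fmin, fmax]
    // the round-trip error is at most one delta.
    return fmin + code * fixedpoint_delta(fmin, fmax);
}
```

For a variable ranging over [0, 1], delta is about 1.5e-5, so the precision lost by storing two bytes instead of a full real is small relative to the range.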

Vec PLearn::CompactVMatrix::fixedpoint_max [protected]

the ranges of each number for fixed point encoding

Definition at line 84 of file CompactVMatrix.h.
Referenced by append(), CompactVMatrix(), makeDeepCopyFromShallowCopy(), and perturb().

Vec PLearn::CompactVMatrix::fixedpoint_min [protected]

Definition at line 84 of file CompactVMatrix.h.
Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), makeDeepCopyFromShallowCopy(), perturb(), and putSubRow().

int PLearn::CompactVMatrix::fixedpoint_offset [protected]

where in each row the fixed point numbers start

Definition at line 199 of file CompactVMatrix.h.
Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), perturb(), and putSubRow().

int PLearn::CompactVMatrix::n_bits [protected]

number of binary symbols per row

Definition at line 73 of file CompactVMatrix.h.
Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), nbits(), perturb(), and putSubRow().

unsigned char PLearn::CompactVMatrix::n_bits_in_byte [static, protected]

Lookup table with one entry per possible byte value (256 entries), filled by set_n_bits_in_byte() and used by dot().
Definition at line 57 of file CompactVMatrix.cc.
Referenced by dot(), and set_n_bits_in_byte().

int PLearn::CompactVMatrix::n_fixedpoint [protected]

number of fixed point numbers per row

Definition at line 75 of file CompactVMatrix.h.
Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), nfixedpoint(), perturb(), and putSubRow().

int PLearn::CompactVMatrix::n_last

used by dotProduct and squareDifference to specify # of last columns to ignore

Definition at line 95 of file CompactVMatrix.h.
Referenced by CompactVMatrix(), dot(), and dotProduct().

TVec<int> PLearn::CompactVMatrix::n_symbol_values

for each 1-byte symbol, the number of possible values

Definition at line 79 of file CompactVMatrix.h.
Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), makeDeepCopyFromShallowCopy(), perturb(), and putSubRow().

int PLearn::CompactVMatrix::n_symbols [protected]

number of 1-byte symbols per row

Definition at line 74 of file CompactVMatrix.h.
Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), nsymbols(), perturb(), and putSubRow().

int PLearn::CompactVMatrix::n_variables [protected]

= n_bits + n_symbols + n_fixedpoint

Definition at line 76 of file CompactVMatrix.h.
Referenced by CompactVMatrix(), perturb(), and setOneHotMode().

int PLearn::CompactVMatrix::normal_width [protected]

the value of width_ when one_hot_encoding=true

Definition at line 97 of file CompactVMatrix.h.
Referenced by CompactVMatrix(), and setOneHotMode().

bool PLearn::CompactVMatrix::one_hot_encoding [protected]

the 1-byte symbols are converted to one-hot encoding by get

Definition at line 77 of file CompactVMatrix.h.
Referenced by append(), CompactVMatrix(), dot(), getNewRow(), perturb(), putSubRow(), and setOneHotMode().

int PLearn::CompactVMatrix::row_n_bytes [protected]

# of bytes per row

Definition at line 72 of file CompactVMatrix.h.
Referenced by append(), CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), perturb(), and putSubRow().

Vec PLearn::CompactVMatrix::row_norms [protected]

to cache the norms of the rows for squareDifference method

Definition at line 200 of file CompactVMatrix.h.
Referenced by squareDifference().

int PLearn::CompactVMatrix::symbols_offset [protected]

where in each row the symbols start

Definition at line 198 of file CompactVMatrix.h.
Referenced by CompactVMatrix(), dot(), encodeAndPutRow(), getNewRow(), perturb(), and putSubRow().

TVec<int> PLearn::CompactVMatrix::variables_permutation [protected]

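this variable is used only when constructed from VMat; it provides the permutation of the original columns that orders them into (bits, bytes, fixedpoint): variables_permutation[new_column]=old_column (not in one-hot code)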

Definition at line 86 of file CompactVMatrix.h.
Referenced by CompactVMatrix(), encodeAndPutRow(), makeDeepCopyFromShallowCopy(), and permutation_vector().

The documentation for this class was generated from the following files:

Generated on Tue Aug 17 16:26:04 2004 for PLearn by Doxygen 1.3.7
