
PLearn::SimpleDB< KeyType, QueryResult > Class Template Reference

#include <SimpleDB.h>


Public Types

typedef unsigned long RowNumber
 A row number in the database.

typedef unsigned long Offset
 A physical offset into the database.

typedef QueryResult QueryResult_t
 Makes the QueryResult template parameter available as a member type.

typedef SimpleDBIndexKey< KeyType > IndexKey
 An index is simply a hash table from IndexKey to QueryResult.

typedef Hash< IndexKey, QueryResult > Index
typedef PP< Index > PIndex
typedef vector< const unsigned char * > vuc
 Vector of byte-string keys, taken by the findEqualLinear overload that uses linear search to find multiple "lookfor" values.

enum  { InvalidRow = ULONG_MAX }
enum  AccessType { readwrite = 0, readonly = 1 }
 Whether the user is granted read/write or read-only access. More...


Public Member Functions

 SimpleDB (string rootname, string path=".", AccessType=readwrite, bool verbose=true)
 --- Constructors, etc.

virtual ~SimpleDB ()
string getName () const
 --- Functions dealing with database name and location

string getPath () const
void setSchema (const Schema &s)
 --- Functions dealing with schema representation.

const Schema & getSchema () const
void saveSchema ()
void loadSchema ()
bool findColumn (string name, int &position, int &start, int &precision) const
int indexOfField (const string &fieldname) const
 returns the index of the given field inside the Schema, -1 if not found

Row getRow (RowNumber) const
 --- Functions dealing with simple database queries

Row & getInRow (RowNumber, Row &) const
RowNumber size () const
 1 beyond maximum row number

int length () const
int width () const
void addRow (const Row &)
 add at end of database

void setRow (const Row &, RowNumber)
 set a particular row

void truncateFromRow (RowNumber n)
 erase all rows from row n (included) until end

bool indexColumn (string columnName, string secondColumn=string(""))
 --- Functions dealing with indexing and more complex queries

void clearIndex (string columnName)
 Clear an index from memory to free up space.

QueryResult findEqual (const unsigned char *lookfor, string columnName, string secondColumn=string(""))
const QueryResult & findEqualIndexed (const unsigned char *lookfor, string columnName, string secondColumn=string(""))
 Always use the index to find "lookfor".

QueryResult findEqualLinear (const unsigned char *lookfor, string columnName, string secondColumn=string(""))
 Use linear search to find "lookfor".

QueryResult findEqualLinear (const vuc &lookfor, string columnName, string secondColumn=string(""))
double tableSizeMultiplier () const
 Access the table size multiplier (see description below).

void tableSizeMultiplier (double x)

Static Public Attributes

const Offset AbsoluteFileLimit = 512ul * 1024ul * 1024ul - 1
 Maximum size that a single database file can hold; this is currently set to 512 MB.

QueryResult EmptyResult
 A null query.


Private Member Functions

void computeSize ()
 computes the size_ field (called by constructor)

void memoryToDisk (Row &) const
 convert a row from machine-dependent endianness to disk-standard little-endian format.

void diskToMemory (Row &) const
 convert a row from disk-standard little-endian format to machine-dependent endianness

int seekToRow (RowNumber) const
int seekToEnd () const
 seek to end of database, return FD to the correct file

void openAllFiles () const
 open all existing segments in database

void closeAllFiles () const
 close all open file descriptors

int lastSegment () const
 return the index (NOT FD!) of the last open segment

string getSegmentPath (int i) const
 return full path of segment i (zero-based)

 SimpleDB (const SimpleDB &)
 For now, disable copy construction and assignment.

void operator= (const SimpleDB &)

Private Attributes

string name
 database base name

string path
 database root path

AccessType access_type
 readwrite or readonly

int access_mask
 Unix constants for access_type.

RowNumber size_
 cached number of rows

Schema schema
 database schema

int row_size
 length of a row in bytes

RowNumber max_records_file
 maximum number of full rows in a single file

vector< int > allfd
 File descriptors of open segments; -1 for not open.

double table_size_multiplier
 -- Indexing-related

vector< PIndex > indexes
 There's one possible index per column in the database.

bool verbose
 print debugging info to cerr


Detailed Description

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
class PLearn::SimpleDB< KeyType, QueryResult >

Simple Database

This class permits the representation of a simple quasi-flat-file database in an efficient binary format, and enables indexing string columns to obtain the rows that match a given string.

Definition at line 505 of file SimpleDB.h.


Member Typedef Documentation

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
typedef Hash<IndexKey,QueryResult> PLearn::SimpleDB< KeyType, QueryResult >::Index
 

Definition at line 539 of file SimpleDB.h.

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
typedef SimpleDBIndexKey<KeyType> PLearn::SimpleDB< KeyType, QueryResult >::IndexKey
 

An index is simply a hash table from IndexKey to QueryResult.

Definition at line 538 of file SimpleDB.h.

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
typedef unsigned long PLearn::SimpleDB< KeyType, QueryResult >::Offset
 

A physical offset into the database.

Definition at line 521 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::seekToEnd(), and PLearn::SimpleDB< KeyType, QueryResult >::seekToRow().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
typedef PP<Index> PLearn::SimpleDB< KeyType, QueryResult >::PIndex
 

Definition at line 540 of file SimpleDB.h.

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
typedef QueryResult PLearn::SimpleDB< KeyType, QueryResult >::QueryResult_t
 

Makes the QueryResult template parameter available as a member type.

Definition at line 534 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::indexColumn().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
typedef unsigned long PLearn::SimpleDB< KeyType, QueryResult >::RowNumber
 

A row number in the database.

Rows are numbered starting with 0. InvalidRow is a constant denoting an invalid row number.

Definition at line 515 of file SimpleDB.h.

Referenced by PLearn::SDBWithStats::computeStats(), PLearn::SimpleDB< KeyType, QueryResult >::findEqualLinear(), PLearn::SimpleDB< KeyType, QueryResult >::indexColumn(), and PLearn::SimpleDB< KeyType, QueryResult >::setSchema().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
typedef vector<const unsigned char*> PLearn::SimpleDB< KeyType, QueryResult >::vuc
 

Vector of byte-string keys, taken by the findEqualLinear overload that uses linear search to find multiple "lookfor" values.

Definition at line 639 of file SimpleDB.h.


Member Enumeration Documentation

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
anonymous enum
 

Enumeration values:
InvalidRow 

Definition at line 516 of file SimpleDB.h.

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
enum PLearn::SimpleDB::AccessType
 

Whether the user is granted read/write or read-only access.

Enumeration values:
readwrite 
readonly 

Definition at line 524 of file SimpleDB.h.


Constructor & Destructor Documentation

template<class KT, class QR>
PLearn::SimpleDB< KT, QR >::SimpleDB (string rootname, string path = ".", AccessType = readwrite, bool verbose = true)
 

--- Constructors, etc.

Definition at line 1214 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::access_mask, PLearn::SimpleDB< KeyType, QueryResult >::access_type, PLearn::SimpleDB< KeyType, QueryResult >::computeSize(), PLearn::SimpleDB< KeyType, QueryResult >::loadSchema(), PLearn::SimpleDB< KeyType, QueryResult >::name, PLearn::SimpleDB< KeyType, QueryResult >::openAllFiles(), PLearn::SimpleDB< KeyType, QueryResult >::path, PLearn::SimpleDB< KeyType, QueryResult >::readonly, PLearn::SimpleDB< KeyType, QueryResult >::readwrite, and slash.

template<class KT, class QR>
PLearn::SimpleDB< KT, QR >::~SimpleDB ()  [virtual]
 

Upon destroying the database, save the current schema

Definition at line 1239 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::closeAllFiles(), and PLearn::SimpleDB< KeyType, QueryResult >::saveSchema().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
PLearn::SimpleDB< KeyType, QueryResult >::SimpleDB (const SimpleDB< KeyType, QueryResult > &)  [private]
 

For now, disable copy construction and assignment.


Member Function Documentation

template<class KT, class QR>
void PLearn::SimpleDB< KT, QR >::addRow (const Row & row)
 

add at end of database

On a successful write, the length of the database is incremented. On a write error, database integrity is preserved by truncating from the point where writing should have started.

Definition at line 1363 of file SimpleDB.h.

References PLERROR, PLWARNING, PLearn::Row::raw(), PLearn::SimpleDB< KeyType, QueryResult >::row_size, PLearn::Row::sanitize(), PLearn::SimpleDB< KeyType, QueryResult >::seekToEnd(), PLearn::Row::size(), and PLearn::SimpleDB< KeyType, QueryResult >::size_.

template<class KT, class QR>
void PLearn::SimpleDB< KT, QR >::clearIndex (string columnName)
 

Clear an index from memory to free up space.

Definition at line 1772 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::indexes.

template<class KT, class QR>
void PLearn::SimpleDB< KT, QR >::closeAllFiles ()  const [private]
 

close all open file descriptors

Definition at line 1644 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::allfd.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::openAllFiles(), PLearn::SimpleDB< KeyType, QueryResult >::setSchema(), PLearn::SimpleDB< KeyType, QueryResult >::truncateFromRow(), and PLearn::SimpleDB< KeyType, QueryResult >::~SimpleDB().

template<class KT, class QR>
void PLearn::SimpleDB< KT, QR >::computeSize ()  [private]
 

computes the size_ field (called by constructor)

The source retains a commented-out implementation that derived the size from the current position in the last segment. When row_size <= 0 (schema not yet set), it set size_ = 0. Otherwise, with last = lastSegment(), fd = seekToEnd() and pos = lseek(fd, 0ul, SEEK_CUR), it asserted fd != -1 && last >= 0 and row_size > 0 && pos % row_size == 0, then set size_ = pos / row_size + last * max_records_file.

Definition at line 1487 of file SimpleDB.h.

References PLearn::file_size(), PLearn::SimpleDB< KeyType, QueryResult >::getSegmentPath(), PLearn::SimpleDB< KeyType, QueryResult >::row_size, and PLearn::SimpleDB< KeyType, QueryResult >::size_.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::setSchema(), PLearn::SimpleDB< KeyType, QueryResult >::SimpleDB(), and PLearn::SimpleDB< KeyType, QueryResult >::truncateFromRow().
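The size formula quoted in the description above can be illustrated with a small standalone sketch. The helper name rowsInDatabase and its parameters are hypothetical and not part of SimpleDB; the sketch assumes that every segment before the last one is full.

```cpp
#include <cassert>

// Hypothetical helper, not part of SimpleDB: number of rows in a segmented
// database whose last segment (zero-based index last_segment) currently holds
// last_segment_bytes bytes. Assumes all earlier segments are full.
unsigned long rowsInDatabase(unsigned long last_segment_bytes,
                             int last_segment,
                             unsigned long row_size,
                             unsigned long max_records_file)
{
    assert(row_size > 0 && last_segment >= 0);
    assert(last_segment_bytes % row_size == 0);  // no partial rows on disk
    return last_segment_bytes / row_size
         + static_cast<unsigned long>(last_segment) * max_records_file;
}
```

For instance, with 100-byte rows, 1000 rows per file, and 300 bytes in segment 2, this gives 3 + 2 * 1000 rows.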

template<class KT, class QR>
void PLearn::SimpleDB< KT, QR >::diskToMemory (Row &)  const [private]
 

convert a row from disk-standard little-endian format to machine-dependent endianness

Definition at line 1547 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::getInRow().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
bool PLearn::SimpleDB< KeyType, QueryResult >::findColumn (string name, int & position, int & start, int & precision)  const [inline]
 

Find a column by name. Returns true if the column is found, and sets position to the column's position, start to the total number of bytes that precede it in the row, and precision to the number of bytes in the column.

Definition at line 573 of file SimpleDB.h.

References PLearn::Schema::findColumn(), PLearn::SimpleDB< KeyType, QueryResult >::findColumn(), and PLearn::SimpleDB< KeyType, QueryResult >::schema.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::findColumn().

template<class KT, class QR>
QR PLearn::SimpleDB< KT, QR >::findEqual (const unsigned char * lookfor, string columnName, string secondColumn = string(""))
 

Find all rows in the database having the sequence of bytes contained in lookfor in the column columnName. (The number of bytes to match is determined by the precision of the field type of the specified column). If the column has been indexed before, use the index; otherwise, use linear search through the database.

Definition at line 1780 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::EmptyResult, and PLearn::SimpleDB< KeyType, QueryResult >::indexes.
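The index-or-linear dispatch described above can be sketched on a toy in-memory table. Everything here is an assumption made for illustration: ToyTable, ToyIndex and the free functions are hypothetical, and std::unordered_map stands in for PLearn's Hash. Only the overall logic (use the index when one exists, otherwise compare precision bytes row by row) follows the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

using Rows = std::vector<unsigned long>;
using ToyIndex = std::unordered_map<std::string, Rows>;

// Hypothetical stand-in for a table: fixed-size rows stored back to back.
struct ToyTable {
    std::vector<unsigned char> bytes;
    std::size_t row_size;
    std::size_t size() const { return bytes.size() / row_size; }
};

// Linear scan: compare `precision` bytes at offset `start` in every row.
Rows findEqualLinear(const ToyTable& t, const unsigned char* lookfor,
                     std::size_t start, std::size_t precision)
{
    assert(t.row_size > 0 && start + precision <= t.row_size);
    Rows result;
    for (std::size_t r = 0; r < t.size(); ++r) {
        const unsigned char* field = t.bytes.data() + r * t.row_size + start;
        if (std::memcmp(field, lookfor, precision) == 0)
            result.push_back(static_cast<unsigned long>(r));
    }
    return result;
}

// Use the index if the column has one (index != nullptr), else scan.
Rows findEqual(const ToyTable& t, const ToyIndex* index,
               const unsigned char* lookfor,
               std::size_t start, std::size_t precision)
{
    if (index) {
        const std::string key(reinterpret_cast<const char*>(lookfor), precision);
        const ToyIndex::const_iterator it = index->find(key);
        if (it == index->end())
            return Rows();  // plays the role of an empty result
        return it->second;
    }
    return findEqualLinear(t, lookfor, start, precision);
}
```

Passing a null index forces the linear path, which is convenient for checking that both paths agree on the same data.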

template<class KT, class QR>
const QR & PLearn::SimpleDB< KT, QR >::findEqualIndexed (const unsigned char * lookfor, string columnName, string secondColumn = string(""))
 

Always use the index to find "lookfor".

Definition at line 1794 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::EmptyResult, PLearn::Hash_UNUSED_TAG, PLearn::Hash< KeyType, DataType >::hashAddress(), PLearn::SimpleDB< KeyType, QueryResult >::indexes, and PLERROR.

template<class KT, class QR>
QR PLearn::SimpleDB< KT, QR >::findEqualLinear (const vuc & lookfor, string column_name, string second_column = string(""))
 

Use linear search to find multiple "lookfor" values. A vector of keys is made from all the strings to look for; the database is then searched linearly, comparing each row against every key in lookfor.

(The -1 in the source assumes that the first column type is a string; this should be enforced in the future.)

Definition at line 1833 of file SimpleDB.h.

References PLearn::SimpleDBIndexKey< KeyType >::begin(), std::copy(), PLearn::SimpleDB< KeyType, QueryResult >::EmptyResult, PLearn::endl(), PLearn::SimpleDB< KeyType, QueryResult >::getInRow(), PLearn::Row::raw(), PLearn::SimpleDB< KeyType, QueryResult >::RowNumber, PLearn::SimpleDB< KeyType, QueryResult >::schema, PLearn::SimpleDB< KeyType, QueryResult >::size(), and PLearn::SimpleDB< KeyType, QueryResult >::verbose.

template<class KT, class QR>
QR PLearn::SimpleDB< KT, QR >::findEqualLinear (const unsigned char * lookfor, string columnName, string secondColumn = string(""))
 

Use linear search to find "lookfor".

Definition at line 1822 of file SimpleDB.h.

template<class KT, class QR>
Row & PLearn::SimpleDB< KT, QR >::getInRow (RowNumber, Row &)  const
 

Definition at line 1463 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::diskToMemory(), PLERROR, PLearn::Row::raw(), PLearn::SimpleDB< KeyType, QueryResult >::row_size, PLearn::SimpleDB< KeyType, QueryResult >::seekToRow(), PLearn::Row::size(), and PLearn::SimpleDB< KeyType, QueryResult >::size().

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::findEqualLinear(), PLearn::AutoSDBVMatrix::getNewRow(), PLearn::SimpleDB< KeyType, QueryResult >::getRow(), PLearn::SDBVMatrix::getRow(), PLearn::halfShuffleRows(), PLearn::SimpleDB< KeyType, QueryResult >::indexColumn(), and PLearn::randomShuffleRows().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
string PLearn::SimpleDB< KeyType, QueryResult >::getName ()  const [inline]
 

--- Functions dealing with database name and location

Definition at line 551 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::name.

Referenced by PLearn::SDBWithStats::hasStats(), PLearn::SDBWithStats::loadStats(), and PLearn::SDBWithStats::saveStats().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
string PLearn::SimpleDB< KeyType, QueryResult >::getPath ()  const [inline]
 

Definition at line 554 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::path.

Referenced by PLearn::SDBWithStats::hasStats(), PLearn::SDBWithStats::loadStats(), and PLearn::SDBWithStats::saveStats().

template<class KT, class QR>
Row PLearn::SimpleDB< KT, QR >::getRow (RowNumber)  const
 

--- Functions dealing with simple database queries

Definition at line 1479 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::getInRow(), and PLearn::SimpleDB< KeyType, QueryResult >::schema.

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
const Schema& PLearn::SimpleDB< KeyType, QueryResult >::getSchema ()  const [inline]
 

Definition at line 562 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::schema.

Referenced by PLearn::AutoSDBVMatrix::AutoSDBVMatrix(), PLearn::SDBWithStats::computeStats(), PLearn::SDBWithStats::fieldname(), PLearn::AutoSDBVMatrix::getMappings(), PLearn::halfShuffleRows(), PLearn::SDBWithStats::nfields(), PLearn::randomShuffleRows(), PLearn::SDBVMatrix::SDBVMatrix(), and PLearn::SDBWithStats::SDBWithStats().

template<class KT, class QR>
string PLearn::SimpleDB< KT, QR >::getSegmentPath (int i)  const [private]
 

return full path of segment i (zero-based)

Definition at line 1663 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::name, PLearn::SimpleDB< KeyType, QueryResult >::path, and PLERROR.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::computeSize(), PLearn::SimpleDB< KeyType, QueryResult >::openAllFiles(), PLearn::SimpleDB< KeyType, QueryResult >::seekToEnd(), PLearn::SimpleDB< KeyType, QueryResult >::seekToRow(), and PLearn::SimpleDB< KeyType, QueryResult >::truncateFromRow().

template<class KT, class QR>
bool PLearn::SimpleDB< KT, QR >::indexColumn (string columnName, string secondColumn = string(""))
 

--- Functions dealing with indexing and more complex queries

Index according to the field having a given name. Return TRUE if indexing has been successful. At the moment, indexes are strictly kept in memory; they are not saved to disk. Optionally, the contents of a second column can be concatenated for indexing purposes.

Add all records to the index. Create one if necessary. Make initial size a constant times the number of records in the DB, as a heuristic.

If the current key already exists, just add the current row number; otherwise create a new query result. (The -1 in the source assumes that the first column type is a string; this should be enforced in the future.)

Avoid creating a one-element QueryResult and then copying it over: we add an EMPTY key/value pair to the hash table, then access the value object (through the normal code path that adds a new row to an existing key) and set row i into it.

Definition at line 1681 of file SimpleDB.h.

References PLearn::Hash< KeyType, DataType >::add(), PLearn::SimpleDBIndexKey< KeyType >::begin(), std::copy(), PLearn::Hash< KeyType, DataType >::diagnostics(), PLearn::endl(), PLearn::Hash< KeyType, DataType >::flush(), PLearn::SimpleDB< KeyType, QueryResult >::getInRow(), PLearn::Hash_UNUSED_TAG, PLearn::Hash< KeyType, DataType >::hashAddress(), PLearn::SimpleDB< KeyType, QueryResult >::indexes, PLWARNING, PLearn::SimpleDB< KeyType, QueryResult >::QueryResult_t, PLearn::Row::raw(), PLearn::Hash< KeyType, DataType >::resize(), PLearn::SimpleDB< KeyType, QueryResult >::RowNumber, PLearn::SimpleDB< KeyType, QueryResult >::schema, PLearn::SimpleDB< KeyType, QueryResult >::size(), PLearn::SimpleDB< KeyType, QueryResult >::table_size_multiplier, and PLearn::SimpleDB< KeyType, QueryResult >::verbose.
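The indexing strategy described above (size the table from the record count, then funnel new and existing keys through the same code path) can be sketched as follows. This is only an illustration under stated assumptions: std::unordered_map replaces PLearn's Hash, a growable vector replaces QueryResult, and buildIndex with its keys argument is hypothetical (SimpleDB reads the key bytes from disk instead).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using RowList = std::vector<unsigned long>;
using ToyIndex = std::unordered_map<std::string, RowList>;

// keys[i] holds the key bytes of the indexed column for row i (hypothetical
// input; a second column could be concatenated into each key beforehand).
ToyIndex buildIndex(const std::vector<std::string>& keys,
                    double table_size_multiplier)
{
    assert(table_size_multiplier > 0.0);
    ToyIndex index;
    // Heuristic initial size: a constant times the number of records.
    index.reserve(static_cast<std::size_t>(table_size_multiplier * keys.size()));
    for (std::size_t row = 0; row < keys.size(); ++row) {
        // operator[] inserts an empty row list when the key is new, so the
        // same statement handles both new and existing keys without building
        // a temporary one-element result.
        index[keys[row]].push_back(static_cast<unsigned long>(row));
    }
    return index;
}
```

Like the index described above, this one lives purely in memory and has to be rebuilt on each run.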

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
int PLearn::SimpleDB< KeyType, QueryResult >::indexOfField (const string & fieldname)  const [inline]
 

returns the index of the given field inside the Schema, -1 if not found

Definition at line 580 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::indexOfField(), and PLearn::SimpleDB< KeyType, QueryResult >::schema.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::indexOfField().

template<class KT, class QR>
int PLearn::SimpleDB< KT, QR >::lastSegment ()  const [inline, private]
 

return the index (NOT FD!) of the last open segment

Definition at line 1656 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::allfd.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::seekToEnd(), PLearn::SimpleDB< KeyType, QueryResult >::seekToRow(), and PLearn::SimpleDB< KeyType, QueryResult >::truncateFromRow().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
int PLearn::SimpleDB< KeyType, QueryResult >::length ()  const [inline]
 

Definition at line 589 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::size().

Referenced by PLearn::AutoSDBVMatrix::AutoSDBVMatrix(), PLearn::halfShuffleRows(), and PLearn::randomShuffleRows().

template<class KT, class QR>
void PLearn::SimpleDB< KT, QR >::loadSchema ()
 

Definition at line 1321 of file SimpleDB.h.

References PLearn::CharacterType, PLearn::DateType, PLearn::DoubleType, PLearn::endl(), PLearn::FloatType, PLearn::IntType, PLearn::lowerstring(), PLearn::SimpleDB< KeyType, QueryResult >::name, PLearn::SimpleDB< KeyType, QueryResult >::path, PLearn::SimpleDB< KeyType, QueryResult >::schema, PLearn::ShortType, PLearn::SignedCharType, and PLearn::StringType.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::SimpleDB().

template<class KT, class QR>
void PLearn::SimpleDB< KT, QR >::memoryToDisk (Row &)  const [private]
 

convert a row from machine-dependent endianness to disk-standard little-endian format.

Definition at line 1521 of file SimpleDB.h.

References PLearn::RowIterator::asDate(), PLearn::RowIterator::asDouble(), PLearn::RowIterator::asFloat(), PLearn::RowIterator::asInt(), PLearn::RowIterator::asShort(), PLearn::Row::begin(), PLearn::Row::end(), PLearn::Row::iterator, PLearn::reverse_double(), PLearn::reverse_float(), PLearn::reverse_int(), PLearn::reverse_short(), and x.
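As an illustration of the endianness conversion described above, here is a minimal standalone sketch for a single 32-bit field. The function names are hypothetical and are not PLearn's reverse_int and related helpers; the sketch only assumes that the disk format is little-endian, as stated.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True when the machine stores the least significant byte first.
bool hostIsLittleEndian()
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverse the byte order of a 32-bit value.
std::uint32_t reverseBytes32(std::uint32_t v)
{
    return  (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         |  (v << 24);
}

// Host order -> little-endian disk order: a no-op on little-endian hosts.
std::uint32_t toDisk32(std::uint32_t v)
{
    return hostIsLittleEndian() ? v : reverseBytes32(v);
}

// The same transform maps disk order back to host order.
std::uint32_t fromDisk32(std::uint32_t v) { return toDisk32(v); }

// Write v into out[0..3] in disk (little-endian) byte order.
void storeLittleEndian32(std::uint32_t v, unsigned char* out)
{
    assert(out != nullptr);
    const std::uint32_t d = toDisk32(v);
    std::memcpy(out, &d, sizeof d);
}
```

Because byte reversal is its own inverse, the memory-to-disk and disk-to-memory directions can share one routine per field width.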

template<class KT, class QR>
void PLearn::SimpleDB< KT, QR >::openAllFiles ()  const [private]
 

open all existing segments in database

Since we don't know in advance how many files are in the database, we start by opening segment zero (which always exists), and then successively try to open more segments. (An inline source comment also marks an initial step as a safeguard.)

Definition at line 1614 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::access_mask, PLearn::SimpleDB< KeyType, QueryResult >::allfd, c_str(), PLearn::SimpleDB< KeyType, QueryResult >::closeAllFiles(), PLearn::SimpleDB< KeyType, QueryResult >::getSegmentPath(), open, and PLERROR.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::setSchema(), PLearn::SimpleDB< KeyType, QueryResult >::SimpleDB(), and PLearn::SimpleDB< KeyType, QueryResult >::truncateFromRow().
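The probing strategy described above (open segment zero, then keep trying the next segment) might look like the sketch below. The try_open callback is a hypothetical stand-in for the real open() call on getSegmentPath(i), and stopping at the first segment that fails to open is an assumption about how the loop ends.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// try_open(i) is a hypothetical callback returning a file descriptor for
// segment i, or -1 if that segment cannot be opened.
std::vector<int> openAllSegments(const std::function<int(int)>& try_open)
{
    std::vector<int> fds;
    int fd = try_open(0);
    assert(fd != -1);  // segment zero is assumed to always exist
    while (fd != -1) {
        fds.push_back(fd);
        // Probe the next segment; the first failure ends the scan.
        fd = try_open(static_cast<int>(fds.size()));
    }
    return fds;
}
```

The returned vector plays the role of allfd: its index is the segment number and its value is the descriptor.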

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
void PLearn::SimpleDB< KeyType, QueryResult >::operator= (const SimpleDB< KeyType, QueryResult > &)  [private]
 

template<class KT, class QR>
void PLearn::SimpleDB< KT, QR >::saveSchema ()
 

Definition at line 1270 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::access_type, PLearn::CharacterType, PLearn::DateType, PLearn::DoubleType, PLearn::endl(), PLearn::FloatType, PLearn::IntType, PLearn::SimpleDB< KeyType, QueryResult >::name, PLearn::SimpleDB< KeyType, QueryResult >::path, PLERROR, PLearn::SimpleDB< KeyType, QueryResult >::readwrite, PLearn::SimpleDB< KeyType, QueryResult >::schema, PLearn::ShortType, PLearn::SignedCharType, PLearn::StringType, and PLearn::Unknown.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::~SimpleDB().

template<class KT, class QR>
int PLearn::SimpleDB< KT, QR >::seekToEnd ()  const [private]
 

seek to end of database, return FD to the correct file

There is a slight subtlety here: if the last segment is full, we have to seek to the beginning of the next segment

Definition at line 1589 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::allfd, c_str(), PLearn::SimpleDB< KeyType, QueryResult >::getSegmentPath(), PLearn::SimpleDB< KeyType, QueryResult >::lastSegment(), PLearn::SimpleDB< KeyType, QueryResult >::max_records_file, PLearn::SimpleDB< KeyType, QueryResult >::Offset, PLERROR, PLearn::SimpleDB< KeyType, QueryResult >::row_size, and PLearn::SimpleDB< KeyType, QueryResult >::seekToRow().

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::addRow().

template<class KT, class QR>
int PLearn::SimpleDB< KT, QR >::seekToRow (RowNumber)  const [private]
 

Given a row number, seek to the beginning of that row in the correct underlying physical file (segment), and return a file descriptor to the file; open a new file if necessary.

Definition at line 1556 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::access_mask, PLearn::SimpleDB< KeyType, QueryResult >::allfd, c_str(), PLearn::SimpleDB< KeyType, QueryResult >::getSegmentPath(), PLearn::SimpleDB< KeyType, QueryResult >::lastSegment(), PLearn::SimpleDB< KeyType, QueryResult >::max_records_file, PLearn::SimpleDB< KeyType, QueryResult >::Offset, open, PLERROR, and PLearn::SimpleDB< KeyType, QueryResult >::row_size.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::getInRow(), PLearn::SimpleDB< KeyType, QueryResult >::seekToEnd(), PLearn::SimpleDB< KeyType, QueryResult >::setRow(), and PLearn::SimpleDB< KeyType, QueryResult >::truncateFromRow().
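The mapping from a row number to a segment and a byte offset can be sketched as below. RowLocation and locateRow are hypothetical names, and the division/remainder layout is an assumption consistent with the max_records_file and row_size members, not a copy of the actual implementation.

```cpp
#include <cassert>

// Hypothetical result type: where a row lives on disk.
struct RowLocation {
    unsigned long segment;  // zero-based segment (file) index
    unsigned long offset;   // byte offset of the row inside that segment
};

// Assumes each segment holds exactly max_records_file fixed-size rows,
// except possibly the last one.
RowLocation locateRow(unsigned long row,
                      unsigned long row_size,
                      unsigned long max_records_file)
{
    assert(row_size > 0 && max_records_file > 0);
    RowLocation loc;
    loc.segment = row / max_records_file;
    loc.offset  = (row % max_records_file) * row_size;
    return loc;
}
```

Under this layout, locating the row one past a full last segment yields offset 0 in the following segment, which corresponds to a case where a new file would have to be opened.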

template<class KT, class QR>
void PLearn::SimpleDB< KT, QR >::setRow (const Row & row, RowNumber n)
 

set a particular row

Handle writing error

Definition at line 1398 of file SimpleDB.h.

References PLERROR, PLearn::Row::raw(), PLearn::SimpleDB< KeyType, QueryResult >::row_size, PLearn::Row::sanitize(), PLearn::SimpleDB< KeyType, QueryResult >::seekToRow(), PLearn::Row::size(), and PLearn::SimpleDB< KeyType, QueryResult >::size().

Referenced by PLearn::halfShuffleRows(), and PLearn::randomShuffleRows().

template<class KT, class QR>
void PLearn::SimpleDB< KT, QR >::setSchema (const Schema & s)
 

--- Functions dealing with schema representation.

Reopen files with new schema in effect.

Compute the maximum size of a single file, taking into account the row size.

Definition at line 1247 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::AbsoluteFileLimit, PLearn::SimpleDB< KeyType, QueryResult >::closeAllFiles(), PLearn::SimpleDB< KeyType, QueryResult >::computeSize(), PLearn::SimpleDB< KeyType, QueryResult >::indexes, PLearn::SimpleDB< KeyType, QueryResult >::max_records_file, PLearn::SimpleDB< KeyType, QueryResult >::openAllFiles(), PLearn::SimpleDB< KeyType, QueryResult >::row_size, PLearn::SimpleDB< KeyType, QueryResult >::RowNumber, PLearn::SimpleDB< KeyType, QueryResult >::schema, and PLearn::Row::size().
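A plausible way to derive the per-file row capacity from the row size is integer division of the file limit by the row size. The sketch below assumes exactly that; the names kAbsoluteFileLimit and maxRecordsPerFile are hypothetical, and only the limit's value comes from the AbsoluteFileLimit member.

```cpp
#include <cassert>

// Same value as AbsoluteFileLimit: one byte short of 512 * 1024 * 1024.
const unsigned long kAbsoluteFileLimit = 512ul * 1024ul * 1024ul - 1;

// Largest number of whole rows that fit in one file (assumed formula).
unsigned long maxRecordsPerFile(unsigned long row_size)
{
    assert(row_size > 0 && row_size <= kAbsoluteFileLimit);
    return kAbsoluteFileLimit / row_size;  // integer division drops partial rows
}
```

Since the limit is one byte below 512 * 1024 * 1024, a row size that is a power of two gives one row fewer than a naive 512 MB / row_size estimate.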

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
RowNumber PLearn::SimpleDB< KeyType, QueryResult >::size ()  const [inline]
 

1 beyond maximum row number

Definition at line 587 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::size_.

Referenced by PLearn::SDBWithStats::computeStats(), PLearn::SimpleDB< KeyType, QueryResult >::findEqualLinear(), PLearn::SimpleDB< KeyType, QueryResult >::getInRow(), PLearn::SimpleDB< KeyType, QueryResult >::indexColumn(), PLearn::SimpleDB< KeyType, QueryResult >::length(), PLearn::SDBVMatrix::SDBVMatrix(), PLearn::SDBWithStats::SDBWithStats(), and PLearn::SimpleDB< KeyType, QueryResult >::setRow().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
void PLearn::SimpleDB< KeyType, QueryResult >::tableSizeMultiplier (double x)  [inline]
 

Definition at line 648 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::table_size_multiplier, and x.

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
double PLearn::SimpleDB< KeyType, QueryResult >::tableSizeMultiplier ()  const [inline]
 

Access the table size multiplier (see description below).

Definition at line 645 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::table_size_multiplier.

template<class KT, class QR>
void PLearn::SimpleDB< KT, QR >::truncateFromRow (RowNumber n)
 

erase all rows from row n (included) until end

We must perform the following: seek to the proper row, truncate the current file, unlink all remaining files until end, close everything, re-open everything.

Definition at line 1423 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::allfd, c_str(), PLearn::SimpleDB< KeyType, QueryResult >::closeAllFiles(), PLearn::SimpleDB< KeyType, QueryResult >::computeSize(), PLearn::find(), PLearn::SimpleDB< KeyType, QueryResult >::getSegmentPath(), PLearn::SimpleDB< KeyType, QueryResult >::lastSegment(), PLearn::SimpleDB< KeyType, QueryResult >::openAllFiles(), PLearn::SimpleDB< KeyType, QueryResult >::path, PLERROR, PLWARNING, PLearn::SimpleDB< KeyType, QueryResult >::seekToRow(), and PLearn::tostring().
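The first steps listed above (find the row's segment, truncate it, unlink the segments after it) can be planned with plain arithmetic before any file is touched. The sketch below is hypothetical: TruncationPlan and planTruncation are not part of SimpleDB, and it assumes the division/remainder segment layout. In particular, it keeps a segment that ends up with zero bytes, which may differ from the real behaviour.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of what a truncation has to do on disk.
struct TruncationPlan {
    unsigned long keep_segment;                  // segment that gets truncated
    unsigned long keep_bytes;                    // its new length in bytes
    std::vector<unsigned long> unlink_segments;  // later segments to delete
};

// Plan the removal of all rows from row n (included) to the end.
TruncationPlan planTruncation(unsigned long n,
                              unsigned long row_size,
                              unsigned long max_records_file,
                              unsigned long last_segment)
{
    assert(row_size > 0 && max_records_file > 0);
    TruncationPlan plan;
    plan.keep_segment = n / max_records_file;
    plan.keep_bytes   = (n % max_records_file) * row_size;
    assert(plan.keep_segment <= last_segment);
    for (unsigned long s = plan.keep_segment + 1; s <= last_segment; ++s)
        plan.unlink_segments.push_back(s);
    return plan;
}
```

After carrying out such a plan, the remaining steps from the description still apply: close everything and re-open everything so that the cached size and descriptors are refreshed.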

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
int PLearn::SimpleDB< KeyType, QueryResult >::width ()  const [inline]
 

Definition at line 592 of file SimpleDB.h.

References PLearn::SimpleDB< KeyType, QueryResult >::schema.

Referenced by PLearn::AutoSDBVMatrix::AutoSDBVMatrix(), PLearn::SDBWithStats::forgetStats(), PLearn::SDBWithStats::getStat(), and PLearn::AutoSDBVMatrix::nstrings().


Member Data Documentation

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
const Offset PLearn::SimpleDB< KeyType, QueryResult >::AbsoluteFileLimit = 512ul * 1024ul * 1024ul - 1 [static]
 

Maximum size that a single database file can hold; this is currently set to 512 MB.

Definition at line 531 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::setSchema().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
int PLearn::SimpleDB< KeyType, QueryResult >::access_mask [private]
 

Unix constants for access_type.

Definition at line 692 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::openAllFiles(), PLearn::SimpleDB< KeyType, QueryResult >::seekToRow(), and PLearn::SimpleDB< KeyType, QueryResult >::SimpleDB().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
AccessType PLearn::SimpleDB< KeyType, QueryResult >::access_type [private]
 

readwrite or readonly

Definition at line 691 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::saveSchema(), and PLearn::SimpleDB< KeyType, QueryResult >::SimpleDB().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
vector<int> PLearn::SimpleDB< KeyType, QueryResult >::allfd [mutable, private]
 

File descriptors of open segments; -1 for not open.

Definition at line 702 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::closeAllFiles(), PLearn::SimpleDB< KeyType, QueryResult >::lastSegment(), PLearn::SimpleDB< KeyType, QueryResult >::openAllFiles(), PLearn::SimpleDB< KeyType, QueryResult >::seekToEnd(), PLearn::SimpleDB< KeyType, QueryResult >::seekToRow(), and PLearn::SimpleDB< KeyType, QueryResult >::truncateFromRow().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
QR PLearn::SimpleDB< KT, QR >::EmptyResult [static]
 

A null (empty) query result.

Definition at line 1207 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::findEqual(), PLearn::SimpleDB< KeyType, QueryResult >::findEqualIndexed(), and PLearn::SimpleDB< KeyType, QueryResult >::findEqualLinear().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
vector<PIndex> PLearn::SimpleDB< KeyType, QueryResult >::indexes [private]
 

There's one possible index per column in the database.

An index is made only when indexColumn is called for a particular column.

Definition at line 715 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::clearIndex(), PLearn::SimpleDB< KeyType, QueryResult >::findEqual(), PLearn::SimpleDB< KeyType, QueryResult >::findEqualIndexed(), PLearn::SimpleDB< KeyType, QueryResult >::indexColumn(), and PLearn::SimpleDB< KeyType, QueryResult >::setSchema().
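The "one possible index per column, built on demand" arrangement can be sketched as a vector of shared pointers that stay null until a column is indexed. Everything below is a stand-in written for illustration: std::multimap and std::shared_ptr replace PLearn's Hash and PP, string keys replace IndexKey, and the class ColumnIndexes with its member names is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Assumed stand-ins for the Index / PIndex typedefs of SimpleDB.
typedef std::multimap<std::string, unsigned long> Index;  // key -> row number
typedef std::shared_ptr<Index> PIndex;

class ColumnIndexes {
public:
    // One slot per column; every slot starts out null (no index built).
    explicit ColumnIndexes(std::size_t num_columns) : indexes_(num_columns) {}

    bool hasIndex(std::size_t col) const { return indexes_.at(col) != nullptr; }

    // Build (or rebuild) the index for one column from its values.
    void indexColumn(std::size_t col, const std::vector<std::string>& values)
    {
        PIndex idx = std::make_shared<Index>();
        for (unsigned long row = 0; row < values.size(); ++row)
            idx->insert(std::make_pair(values[row], row));
        indexes_.at(col) = idx;
    }

    // Drop the index for one column, returning its slot to null.
    void clearIndex(std::size_t col) { indexes_.at(col).reset(); }

    // Number of rows whose key equals `key`; 0 if the column is unindexed.
    std::size_t matches(std::size_t col, const std::string& key) const
    {
        const PIndex& idx = indexes_.at(col);
        return idx ? idx->count(key) : 0;
    }

private:
    std::vector<PIndex> indexes_;
};
```

Keeping a null slot for unindexed columns lets a lookup decide cheaply between an indexed search and a linear scan, which matches the split between findEqualIndexed() and findEqualLinear() listed for this class.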

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
RowNumber PLearn::SimpleDB< KeyType, QueryResult >::max_records_file [private]
 

Maximum number of full rows in a single file.

Definition at line 699 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::seekToEnd(), PLearn::SimpleDB< KeyType, QueryResult >::seekToRow(), and PLearn::SimpleDB< KeyType, QueryResult >::setSchema().
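To illustrate how a per-file row limit can be used when seeking, the sketch below converts a global row number into a segment number and a byte offset within that segment. It assumes segments are filled in order and each holds exactly max_records_file rows of row_size bytes; the struct RowLocation and the function locateRow are hypothetical and are not taken from seekToRow().

```cpp
#include <cassert>

typedef unsigned long RowNumber;
typedef unsigned long Offset;

// Hypothetical result type: which segment file, and where inside it.
struct RowLocation {
    unsigned long segment;  // zero-based segment (file) number
    Offset offset;          // byte offset of the row within that segment
};

// Assumed layout: segment k holds rows [k * max_records_file,
// (k + 1) * max_records_file), stored back to back.
inline RowLocation locateRow(RowNumber row,
                             RowNumber max_records_file,
                             unsigned long row_size)
{
    assert(max_records_file > 0);
    RowLocation loc;
    loc.segment = row / max_records_file;
    loc.offset  = (row % max_records_file) * row_size;
    return loc;
}
```

With 1000 rows per file and 64-byte rows, row 2500 would land in segment 2 at byte offset 32000 under this layout.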

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
string PLearn::SimpleDB< KeyType, QueryResult >::name [private]
 

Database base name.

Definition at line 689 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::getName(), PLearn::SimpleDB< KeyType, QueryResult >::getSegmentPath(), PLearn::SimpleDB< KeyType, QueryResult >::loadSchema(), PLearn::SimpleDB< KeyType, QueryResult >::saveSchema(), and PLearn::SimpleDB< KeyType, QueryResult >::SimpleDB().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
string PLearn::SimpleDB< KeyType, QueryResult >::path [private]
 

Database root path.

Definition at line 690 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::getPath(), PLearn::SimpleDB< KeyType, QueryResult >::getSegmentPath(), PLearn::SimpleDB< KeyType, QueryResult >::loadSchema(), PLearn::SimpleDB< KeyType, QueryResult >::saveSchema(), PLearn::SimpleDB< KeyType, QueryResult >::SimpleDB(), and PLearn::SimpleDB< KeyType, QueryResult >::truncateFromRow().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
int PLearn::SimpleDB< KeyType, QueryResult >::row_size [private]
 

Length of a row, in bytes.

Definition at line 698 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::addRow(), PLearn::SimpleDB< KeyType, QueryResult >::computeSize(), PLearn::SimpleDB< KeyType, QueryResult >::getInRow(), PLearn::SimpleDB< KeyType, QueryResult >::seekToEnd(), PLearn::SimpleDB< KeyType, QueryResult >::seekToRow(), PLearn::SimpleDB< KeyType, QueryResult >::setRow(), and PLearn::SimpleDB< KeyType, QueryResult >::setSchema().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
Schema PLearn::SimpleDB< KeyType, QueryResult >::schema [private]
 

Database schema.

Definition at line 697 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::findColumn(), PLearn::SimpleDB< KeyType, QueryResult >::findEqualLinear(), PLearn::SimpleDB< KeyType, QueryResult >::getRow(), PLearn::SimpleDB< KeyType, QueryResult >::getSchema(), PLearn::SimpleDB< KeyType, QueryResult >::indexColumn(), PLearn::SimpleDB< KeyType, QueryResult >::indexOfField(), PLearn::SimpleDB< KeyType, QueryResult >::loadSchema(), PLearn::SimpleDB< KeyType, QueryResult >::saveSchema(), PLearn::SimpleDB< KeyType, QueryResult >::setSchema(), and PLearn::SimpleDB< KeyType, QueryResult >::width().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
RowNumber PLearn::SimpleDB< KeyType, QueryResult >::size_ [private]
 

Cached number of rows.

Definition at line 693 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::addRow(), PLearn::SimpleDB< KeyType, QueryResult >::computeSize(), and PLearn::SimpleDB< KeyType, QueryResult >::size().

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
double PLearn::SimpleDB< KeyType, QueryResult >::table_size_multiplier [private]
 

-- Indexing-related

The factor by which the number of database entries is multiplied to obtain the size of the hash table used for indexing. The default is 1.8, but it should be considerably smaller if there are many repeated keys.

Definition at line 711 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::indexColumn(), and PLearn::SimpleDB< KeyType, QueryResult >::tableSizeMultiplier().
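The sizing rule described above amounts to a single multiplication. The sketch below shows it with the documented default of 1.8; the function name hashTableSize and the choice to round up are assumptions for illustration, not the code of indexColumn().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: number of hash-table slots for an index over
// num_entries rows, given a table size multiplier (documented default 1.8).
inline std::size_t hashTableSize(std::size_t num_entries,
                                 double multiplier = 1.8)
{
    assert(multiplier > 0.0);
    // Rounding up is an assumption; it avoids a zero-sized table for
    // small inputs with a fractional product.
    return static_cast<std::size_t>(std::ceil(num_entries * multiplier));
}
```

For 1000 entries the default yields 1800 slots, while a multiplier of 0.5 (plausible when most keys repeat) yields 500.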

template<class KeyType = TinyVector<unsigned char, 8>, class QueryResult = TinyVector<unsigned int, 4>>
bool PLearn::SimpleDB< KeyType, QueryResult >::verbose [private]
 

Print debugging info to cerr.

Definition at line 718 of file SimpleDB.h.

Referenced by PLearn::SimpleDB< KeyType, QueryResult >::findEqualLinear(), and PLearn::SimpleDB< KeyType, QueryResult >::indexColumn().


The documentation for this class was generated from the following file:
SimpleDB.h
Generated on Tue Aug 17 16:23:04 2004 for PLearn by doxygen 1.3.7