
PLearn::TextSenseSequenceVMatrix Class Reference

This class handles a sequence of words/sense tag/POS triplets to present it as target words and their context.

#include <TextSenseSequenceVMatrix.h>

Inherits PLearn::RowBufferedVMatrix.

Public Types

typedef RowBufferedVMatrix inherited

Public Member Functions

 TextSenseSequenceVMatrix ()
 Default constructor. After setting all options individually, build() should be called.

 TextSenseSequenceVMatrix (VMat that_dvm, int that_window_size, TVec< int > that_res_pos=TVec< int >(0), bool that_rand_syn=false, WordNetOntology *that_wno=NULL)

int getRestrictedRow (int i, Vec v) const
 Restricts the extracted context to words whose POS is not in res_pos, and returns the position of the next non-overlapping context.

virtual void build ()
 Calls inherited::build(), then this class's build_().

virtual void makeDeepCopyFromShallowCopy (map< const void *, void * > &copies)
 Transforms a shallow copy into a deep copy.

void setOntology (WordNetOntology *that_wno)
 Sets the ontology.

void setWindowSize (int that_window_size)
 Sets the number of context words.

void setWordSequence (VMat that_dvm)
 Sets the VMatrix of word/sense_tag/POS sequence.

void setRandomGeneration (bool that_rand_syn)
 Enables or disables the random generation of contexts and target words.

void setRestrictedPOS (TVec< int > that_res_pos)
 Sets the vector of forbidden POS for the context words.

void setSentenceBoundary (int b)
 Sets the sentence boundary symbol.

void setUndefinedPOSId (int pos_id)
 Sets the undefined POS id.

 PLEARN_DECLARE_OBJECT (TextSenseSequenceVMatrix)
 Declares name and deepCopy methods.


Protected Member Functions

virtual void getNewRow (int i, const Vec &v) const
 This is the only method requiring implementation.


Static Protected Member Functions

void declareOptions (OptionList &ol)
 Declares this class' options.


Protected Attributes

VMat dvm
 The VMatrix containing the sequence of words or lemmas, with their POS and WordNet (optional) tags.

int window_size
 The number of context words.

bool is_supervised_data
 Indication that at least some of the words or lemmas are semantically disambiguated.

TVec< int > res_pos
 The vector containing the forbidden POS of the words given in the context of a target word.

bool rand_syn
 Indication that examples can be randomly generated using random synonym replacements.

TVec< TVec< pair< int, real > > > word_given_sense_priors
 Probability of a word given it has some sense.

WordNetOntology * wno
 Ontology of the sense tagging.

int my_current_row_index
 Index of the current row.

Vec my_current_row
 Elements of the current row.

bool keep_in_sentence
 Indication that the context must not spread over another sentence.

int sentence_boundary
 Sentence boundary symbol.

bool undefined_pos_set
 Indication that the undefined POS id has been set.

int undefined_pos
 Undefined POS id.


Private Member Functions

void build_ ()
 This does the actual building.

void permute (Vec v) const
 This randomly replaces the words (target and context) with one of their synonyms.

void apply_boundary (const Vec &v) const
 This applies the sentence boundary.


Detailed Description

This class handles a sequence of words/sense tag/POS triplets to present it as target words and their context.

Definition at line 17 of file TextSenseSequenceVMatrix.h.


Member Typedef Documentation

typedef RowBufferedVMatrix PLearn::TextSenseSequenceVMatrix::inherited
 

Reimplemented from PLearn::RowBufferedVMatrix.

Definition at line 128 of file TextSenseSequenceVMatrix.h.

Referenced by TextSenseSequenceVMatrix().


Constructor & Destructor Documentation

PLearn::TextSenseSequenceVMatrix::TextSenseSequenceVMatrix ( )

Default constructor. After setting all options individually, build() should be called.

Definition at line 9 of file TextSenseSequenceVMatrix.cc.

References inherited.

PLearn::TextSenseSequenceVMatrix::TextSenseSequenceVMatrix ( VMat that_dvm,
int that_window_size,
TVec< int > that_res_pos = TVec<int>(0),
bool that_rand_syn = false,
WordNetOntology * that_wno = NULL
) [inline]

Parameters:
that_dvm the sequence of words/lemmas
that_window_size the number of context words/lemmas
that_res_pos the forbidden POS for the context words
that_rand_syn indication that the user allows the random generation of contexts and target words using synonyms
that_wno the ontology used as a sense inventory

Definition at line 67 of file TextSenseSequenceVMatrix.h.

References build_(), dvm, is_supervised_data, keep_in_sentence, my_current_row, my_current_row_index, rand_syn, res_pos, undefined_pos_set, window_size, and wno.


Member Function Documentation

void PLearn::TextSenseSequenceVMatrix::apply_boundary ( const Vec & v ) const [private]

This applies the sentence boundary.

Definition at line 344 of file TextSenseSequenceVMatrix.cc.

References sentence_boundary, undefined_pos, undefined_pos_set, UNDEFINED_SS_ID, UNDEFINED_TYPE, and window_size.

Referenced by getNewRow(), and getRestrictedRow().
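The entry above only says that apply_boundary() "applies the sentence boundary". The following self-contained C++ sketch shows one way such a step could work, without using PLearn. The Triplet struct, the row layout (target at index window_size / 2), and the constants kUndefinedSenseId and kUndefinedType, which stand in for UNDEFINED_SS_ID and UNDEFINED_TYPE, are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for PLearn's UNDEFINED_SS_ID and UNDEFINED_TYPE;
// the real values are defined elsewhere in PLearn.
const int kUndefinedSenseId = -1;
const int kUndefinedType = -2;

// One word of the sequence: word id, sense id, POS id.
struct Triplet { int word; int sense; int pos; };

// Assumed layout: window_size context slots plus the target, with the
// target at index window_size / 2 (window_size assumed even).
// Scans outward from the target; once a boundary symbol is met, that slot
// and every slot farther out on the same side are overwritten, so the
// context never extends into a neighboring sentence.
void applyBoundary(std::vector<Triplet>& row, int window_size, int boundary,
                   bool undefined_pos_set, int undefined_pos) {
  const int target = window_size / 2;
  const int pos_fill = undefined_pos_set ? undefined_pos : kUndefinedType;
  const Triplet filler = {boundary, kUndefinedSenseId, pos_fill};
  // Left side: walk from the target toward index 0.
  bool crossed = false;
  for (int k = target - 1; k >= 0; --k) {
    if (row[k].word == boundary) crossed = true;
    if (crossed) row[k] = filler;
  }
  // Right side: walk from the target toward the end of the row.
  crossed = false;
  for (int k = target + 1; k < static_cast<int>(row.size()); ++k) {
    if (row[k].word == boundary) crossed = true;
    if (crossed) row[k] = filler;
  }
}
```

Filling the masked slots with the boundary symbol itself (rather than dropping them) keeps the row width constant, which a fixed-width VMatrix row requires.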

void PLearn::TextSenseSequenceVMatrix::build ( ) [virtual]

Calls inherited::build(), then this class's build_().

This method should be callable again at later times, after modifying some option fields to change the "architecture" of the object.

Reimplemented from PLearn::VMatrix.

Definition at line 527 of file TextSenseSequenceVMatrix.cc.

References build_().
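As described above, build() calls inherited::build() and then build_(), while the convenience constructor calls build_() directly. The hypothetical classes below (BaseMatrix and SenseMatrix, not PLearn types) sketch that two-level idiom in plain C++ and record the order in which the steps run, so calling build() again after changing an option rebuilds both the parent's part and this class's part.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical base class; records which build steps have run.
struct BaseMatrix {
  std::vector<std::string> log;
  virtual ~BaseMatrix() {}
  // Public and virtual: rebuilds the whole object.
  virtual void build() { build_(); }
 private:
  // Private and non-virtual: builds only this class's own part.
  void build_() { log.push_back("BaseMatrix::build_"); }
};

// Hypothetical derived class following the same pattern.
struct SenseMatrix : BaseMatrix {
  typedef BaseMatrix inherited;
  int window_size = 0;
  // Mirrors the convenience constructor: set options, then call build_().
  explicit SenseMatrix(int ws) : window_size(ws) { build_(); }
  void build() override {
    inherited::build();  // rebuild the parent's part first
    build_();            // then this class's part
  }
 private:
  void build_() { log.push_back("SenseMatrix::build_"); }
};
```

Keeping build_() private and non-virtual means each class's own build step runs exactly once per build() call, however deep the hierarchy is.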

void PLearn::TextSenseSequenceVMatrix::build_ ( ) [private]

This does the actual building.

Reimplemented from PLearn::VMatrix.

Definition at line 454 of file TextSenseSequenceVMatrix.cc.

References PLearn::Set::begin(), dvm, PLearn::Set::end(), PLearn::TVec< TVec< pair< int, real > > >::first(), PLearn::WordNetOntology::getSenseSize(), PLearn::WordNetOntology::getWordsForSense(), PLearn::PP< VMatrix >::isNull(), PLearn::VMat::length(), PLERROR, PLWARNING, rand_syn, PLearn::TVec< TVec< pair< int, real > > >::resize(), PLearn::TVec< VMField >::resize(), PLearn::SetIterator, PLearn::Set::size(), PLearn::TVec< TVec< pair< int, real > > >::size(), PLearn::sum(), PLearn::VMat::width(), window_size, wno, and word_given_sense_priors.

Referenced by build(), and TextSenseSequenceVMatrix().
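The references of build_() (getWordsForSense(), sum(), resize() on word_given_sense_priors) suggest that it fills word_given_sense_priors with normalized (word, probability) lists per sense. The sketch below shows one such normalization in plain C++. The per-(word, sense) counts, the uniform fallback, and the function name computeWordGivenSensePriors are hypothetical, and real is taken to be double here.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

typedef double real;  // assumption: PLearn's real may be float or double

// For each sense id, builds a list of (word id, P(word | sense)).
// Assumption: a word's weight for a sense is a hypothetical count, e.g. how
// often the word was tagged with that sense in training data.
std::vector<std::vector<std::pair<int, real> > > computeWordGivenSensePriors(
    int n_senses, const std::map<int, std::vector<int> >& words_for_sense,
    const std::map<std::pair<int, int>, real>& count) {  // key: (word, sense)
  std::vector<std::vector<std::pair<int, real> > > priors(n_senses);
  for (int ss = 0; ss < n_senses; ++ss) {
    auto it = words_for_sense.find(ss);
    if (it == words_for_sense.end() || it->second.empty()) continue;
    const std::vector<int>& words = it->second;
    std::vector<real> w(words.size(), 0.0);
    real sum = 0.0;
    for (size_t j = 0; j < words.size(); ++j) {
      auto c = count.find(std::make_pair(words[j], ss));
      w[j] = (c == count.end()) ? 0.0 : c->second;
      sum += w[j];
    }
    for (size_t j = 0; j < words.size(); ++j) {
      // Fall back to a uniform distribution when no counts are available.
      real p = sum > 0 ? w[j] / sum : 1.0 / words.size();
      priors[ss].push_back(std::make_pair(words[j], p));
    }
  }
  return priors;
}
```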

void PLearn::TextSenseSequenceVMatrix::declareOptions ( OptionList & ol ) [static, protected]

Declares this class' options.

Reimplemented from PLearn::VMatrix.

Definition at line 440 of file TextSenseSequenceVMatrix.cc.

References PLearn::declareOption(), and PLearn::OptionList.

void PLearn::TextSenseSequenceVMatrix::getNewRow ( int i,
const Vec & v
) const [protected, virtual]

This is the only method requiring implementation.

Implements PLearn::RowBufferedVMatrix.

Definition at line 24 of file TextSenseSequenceVMatrix.cc.

References apply_boundary(), dvm, getRestrictedRow(), is_supervised_data, keep_in_sentence, PLearn::TVec< T >::length(), PLearn::VMat::length(), my_current_row, my_current_row_index, permute(), PLERROR, rand_syn, res_pos, PLearn::TVec< T >::size(), PLearn::TVec< int >::size(), SYNSETTAG_ID, undefined_pos, undefined_pos_set, UNDEFINED_SS_ID, UNDEFINED_TYPE, PLearn::Vec, PLearn::VMat::width(), and window_size.
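A minimal sketch of the unrestricted case (empty res_pos): take the target word and a fixed number of neighbors on each side. The centered layout, the padding triplet, and the name extractWindow are assumptions; this page does not document the actual row layout, and the sketch ignores the my_current_row caching, sentence boundaries, and synonym replacement that getNewRow() also handles.

```cpp
#include <cassert>
#include <vector>

// One word of the sequence: word id, sense id, POS id.
struct Triplet { int word; int sense; int pos; };

// Assumptions (hypothetical): window_size is even, the row holds
// window_size / 2 words on each side of the target, and positions that
// fall outside the sequence are filled with `pad`.
std::vector<Triplet> extractWindow(const std::vector<Triplet>& seq, int i,
                                   int window_size, const Triplet& pad) {
  const int half = window_size / 2;
  const int n = static_cast<int>(seq.size());
  std::vector<Triplet> row;
  row.reserve(window_size + 1);
  for (int k = i - half; k <= i + half; ++k)
    row.push_back((k >= 0 && k < n) ? seq[k] : pad);
  return row;  // the target ends up at row[half]
}
```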

int PLearn::TextSenseSequenceVMatrix::getRestrictedRow ( int i,
Vec v
) const

Restricts the extracted context to words whose POS is not in res_pos, and returns the position of the next non-overlapping context.

Definition at line 171 of file TextSenseSequenceVMatrix.cc.

References apply_boundary(), PLearn::TVec< int >::contains(), dvm, is_supervised_data, keep_in_sentence, PLearn::TVec< T >::length(), PLearn::VMat::length(), my_current_row, my_current_row_index, permute(), PLERROR, rand_syn, res_pos, PLearn::TVec< T >::size(), SYNSETTAG_ID, undefined_pos, undefined_pos_set, UNDEFINED_SS_ID, UNDEFINED_TYPE, PLearn::VMat::width(), and window_size.

Referenced by getNewRow().
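The POS restriction can be sketched as follows: context slots are filled outward from the target, skipping any word whose POS appears in res_pos. Everything here is an assumption made for illustration (the layout, the padding, the name extractRestrictedWindow), including reading the return value as the position just past the rightmost word used; that is one plausible interpretation of "the next non-overlapping context", not PLearn's definition.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One word of the sequence: word id, sense id, POS id.
struct Triplet { int word; int sense; int pos; };

// Fills `row` (size window_size + 1, target at window_size / 2) and returns
// the index just past the rightmost context word used (assumed meaning).
int extractRestrictedWindow(const std::vector<Triplet>& seq, int i,
                            int window_size, const std::vector<int>& res_pos,
                            const Triplet& pad, std::vector<Triplet>& row) {
  const int half = window_size / 2;
  const int n = static_cast<int>(seq.size());
  // A context word is allowed only if its POS is not forbidden.
  auto allowed = [&](int k) {
    return std::find(res_pos.begin(), res_pos.end(), seq[k].pos) == res_pos.end();
  };
  row.assign(window_size + 1, pad);
  row[half] = seq[i];  // the target itself is never filtered
  // Left context, filled from the slot nearest the target outward.
  int slot = half - 1;
  for (int k = i - 1; k >= 0 && slot >= 0; --k)
    if (allowed(k)) row[slot--] = seq[k];
  // Right context, likewise; remember the rightmost word consumed.
  slot = half + 1;
  int last = i;
  for (int k = i + 1; k < n && slot <= window_size; ++k)
    if (allowed(k)) { row[slot++] = seq[k]; last = k; }
  return last + 1;
}
```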

void PLearn::TextSenseSequenceVMatrix::makeDeepCopyFromShallowCopy ( map< const void *, void * > & copies ) [virtual]

Transforms a shallow copy into a deep copy.

Reimplemented from PLearn::RowBufferedVMatrix.

Definition at line 533 of file TextSenseSequenceVMatrix.cc.

References PLearn::deepCopyField(), dvm, and res_pos.
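The copies map passed to makeDeepCopyFromShallowCopy() presumably records which objects have already been copied, so that structure shared by several fields is copied only once. The sketch below illustrates that memoization with std::shared_ptr instead of PLearn's map< const void *, void * >; Data, Holder, and this deepCopyField are hypothetical stand-ins, not PLearn::deepCopyField().

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for a shared, reference-counted field such as dvm.
struct Data { std::vector<int> values; };

// Copies a shared field at most once per deep-copy operation: `copies`
// maps the address of each original object to its already-made copy.
std::shared_ptr<Data> deepCopyField(
    const std::shared_ptr<Data>& field,
    std::map<const void*, std::shared_ptr<void> >& copies) {
  if (!field) return field;
  auto it = copies.find(field.get());
  if (it != copies.end()) return std::static_pointer_cast<Data>(it->second);
  std::shared_ptr<Data> clone = std::make_shared<Data>(*field);
  copies[field.get()] = clone;
  return clone;
}

// After a shallow (member-wise) copy, the fields still point at the
// original's data; this replaces each of them with a deep copy.
struct Holder {
  std::shared_ptr<Data> dvm, other;
  void makeDeepCopyFromShallowCopy(
      std::map<const void*, std::shared_ptr<void> >& copies) {
    dvm = deepCopyField(dvm, copies);
    other = deepCopyField(other, copies);
  }
};
```

Because both fields of the shallow copy are looked up in the same map, two fields that shared one object before the copy still share one (new) object afterwards.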

void PLearn::TextSenseSequenceVMatrix::permute ( Vec v ) const [private]

This randomly replaces the words (target and context) with one of their synonyms.

Definition at line 377 of file TextSenseSequenceVMatrix.cc.

References ADJ_TYPE, ADV_TYPE, PLearn::WordNetOntology::getSensesForWord(), PLearn::WordNetOntology::getWord(), PLearn::WordNetOntology::getWordId(), k, NOUN_TYPE, PLearn::TVec< T >::size(), PLearn::TVec< TVec< pair< int, real > > >::size(), PLearn::stemWord(), PLearn::sum(), PLearn::WordNetOntology::temp_word_to_adj_senses, PLearn::WordNetOntology::temp_word_to_adv_senses, PLearn::WordNetOntology::temp_word_to_noun_senses, PLearn::WordNetOntology::temp_word_to_verb_senses, UNDEFINED_TYPE, PLearn::uniform_sample(), VERB_TYPE, window_size, wno, and word_given_sense_priors.

Referenced by getNewRow(), and getRestrictedRow().
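A simplified sketch of synonym replacement driven by word_given_sense_priors: each sense-tagged word is replaced by a word drawn from P(word | sense) with a single uniform sample, in the spirit of the references to uniform_sample() and word_given_sense_priors above. The names, the convention that sense >= 0 marks a tagged word, and the use of std::mt19937 are assumptions. The reference list also suggests that the real permute() looks up senses per POS type and stems words; this sketch omits both.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// One word of the sequence: word id, sense id, POS id.
struct Triplet { int word; int sense; int pos; };
typedef std::vector<std::vector<std::pair<int, double> > > Priors;

// Draws a word id from a non-empty (word, probability) list by inverting
// the cumulative distribution with one uniform sample in [0, 1).
int sampleWord(const std::vector<std::pair<int, double> >& dist,
               std::mt19937& rng) {
  std::uniform_real_distribution<double> u(0.0, 1.0);
  double r = u(rng), acc = 0.0;
  for (const auto& wp : dist) {
    acc += wp.second;
    if (r < acc) return wp.first;
  }
  return dist.back().first;  // guards against rounding when the sum < 1
}

// Replaces every tagged word that has a non-empty prior list by a word
// drawn from P(word | sense). Sense and POS are kept unchanged, so the
// generated example keeps its sense label.
void permuteSynonyms(std::vector<Triplet>& row, const Priors& priors,
                     std::mt19937& rng) {
  for (Triplet& t : row) {
    if (t.sense < 0 || t.sense >= static_cast<int>(priors.size())) continue;
    if (priors[t.sense].empty()) continue;
    t.word = sampleWord(priors[t.sense], rng);
  }
}
```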

PLearn::TextSenseSequenceVMatrix::PLEARN_DECLARE_OBJECT ( TextSenseSequenceVMatrix )

Declares name and deepCopy methods.

void PLearn::TextSenseSequenceVMatrix::setOntology ( WordNetOntology * that_wno ) [inline]

Sets the ontology.

Definition at line 108 of file TextSenseSequenceVMatrix.h.

References setOntology(), and wno.

Referenced by setOntology().

void PLearn::TextSenseSequenceVMatrix::setRandomGeneration ( bool that_rand_syn ) [inline]

Enables or disables the random generation of contexts and target words.

Definition at line 117 of file TextSenseSequenceVMatrix.h.

References rand_syn, and setRandomGeneration().

Referenced by setRandomGeneration().

void PLearn::TextSenseSequenceVMatrix::setRestrictedPOS ( TVec< int > that_res_pos ) [inline]

Sets the vector of forbidden POS for the context words.

Definition at line 120 of file TextSenseSequenceVMatrix.h.

References res_pos, and setRestrictedPOS().

Referenced by setRestrictedPOS().

void PLearn::TextSenseSequenceVMatrix::setSentenceBoundary ( int b ) [inline]

Sets the sentence boundary symbol.

Definition at line 123 of file TextSenseSequenceVMatrix.h.

References keep_in_sentence, sentence_boundary, and setSentenceBoundary().

Referenced by setSentenceBoundary().

void PLearn::TextSenseSequenceVMatrix::setUndefinedPOSId ( int pos_id ) [inline]

Sets the undefined POS id.

Definition at line 126 of file TextSenseSequenceVMatrix.h.

References setUndefinedPOSId(), undefined_pos, and undefined_pos_set.

Referenced by setUndefinedPOSId().

void PLearn::TextSenseSequenceVMatrix::setWindowSize ( int that_window_size ) [inline]

Sets the number of context words.

Definition at line 111 of file TextSenseSequenceVMatrix.h.

References setWindowSize(), and window_size.

Referenced by setWindowSize().

void PLearn::TextSenseSequenceVMatrix::setWordSequence ( VMat that_dvm ) [inline]

Sets the VMatrix of word/sense_tag/POS sequence.

Definition at line 114 of file TextSenseSequenceVMatrix.h.

References dvm, is_supervised_data, and setWordSequence().

Referenced by setWordSequence().


Member Data Documentation

VMat PLearn::TextSenseSequenceVMatrix::dvm [protected]
 

The VMatrix containing the sequence of words or lemmas, with their POS and WordNet (optional) tags.

Definition at line 25 of file TextSenseSequenceVMatrix.h.

Referenced by build_(), getNewRow(), getRestrictedRow(), makeDeepCopyFromShallowCopy(), setWordSequence(), and TextSenseSequenceVMatrix().

bool PLearn::TextSenseSequenceVMatrix::is_supervised_data [protected]
 

Indication that at least some of the words or lemmas are semantically disambiguated.

Definition at line 29 of file TextSenseSequenceVMatrix.h.

Referenced by getNewRow(), getRestrictedRow(), setWordSequence(), and TextSenseSequenceVMatrix().

bool PLearn::TextSenseSequenceVMatrix::keep_in_sentence [protected]
 

Indication that the context must not spread over another sentence.

Definition at line 43 of file TextSenseSequenceVMatrix.h.

Referenced by getNewRow(), getRestrictedRow(), setSentenceBoundary(), and TextSenseSequenceVMatrix().

Vec PLearn::TextSenseSequenceVMatrix::my_current_row [mutable, protected]
 

Elements of the current row.

Definition at line 41 of file TextSenseSequenceVMatrix.h.

Referenced by getNewRow(), getRestrictedRow(), and TextSenseSequenceVMatrix().

int PLearn::TextSenseSequenceVMatrix::my_current_row_index [mutable, protected]
 

Index of the current row.

Definition at line 39 of file TextSenseSequenceVMatrix.h.

Referenced by getNewRow(), getRestrictedRow(), and TextSenseSequenceVMatrix().

bool PLearn::TextSenseSequenceVMatrix::rand_syn [protected]
 

Indication that examples can be randomly generated using random synonym replacements.

Definition at line 33 of file TextSenseSequenceVMatrix.h.

Referenced by build_(), getNewRow(), getRestrictedRow(), setRandomGeneration(), and TextSenseSequenceVMatrix().

TVec<int> PLearn::TextSenseSequenceVMatrix::res_pos [protected]
 

The vector containing the forbidden POS of the words given in the context of a target word.

Definition at line 31 of file TextSenseSequenceVMatrix.h.

Referenced by getNewRow(), getRestrictedRow(), makeDeepCopyFromShallowCopy(), setRestrictedPOS(), and TextSenseSequenceVMatrix().

int PLearn::TextSenseSequenceVMatrix::sentence_boundary [protected]
 

Sentence boundary symbol.

Definition at line 45 of file TextSenseSequenceVMatrix.h.

Referenced by apply_boundary(), and setSentenceBoundary().

int PLearn::TextSenseSequenceVMatrix::undefined_pos [protected]
 

Undefined POS id.

Definition at line 49 of file TextSenseSequenceVMatrix.h.

Referenced by apply_boundary(), getNewRow(), getRestrictedRow(), and setUndefinedPOSId().

bool PLearn::TextSenseSequenceVMatrix::undefined_pos_set [protected]
 

Indication that the undefined POS id has been set.

Definition at line 47 of file TextSenseSequenceVMatrix.h.

Referenced by apply_boundary(), getNewRow(), getRestrictedRow(), setUndefinedPOSId(), and TextSenseSequenceVMatrix().

int PLearn::TextSenseSequenceVMatrix::window_size [protected]
 

The number of context words.

Definition at line 27 of file TextSenseSequenceVMatrix.h.

Referenced by apply_boundary(), build_(), getNewRow(), getRestrictedRow(), permute(), setWindowSize(), and TextSenseSequenceVMatrix().

WordNetOntology* PLearn::TextSenseSequenceVMatrix::wno [protected]
 

Ontology of the sense tagging.

Definition at line 37 of file TextSenseSequenceVMatrix.h.

Referenced by build_(), permute(), setOntology(), and TextSenseSequenceVMatrix().

TVec<TVec<pair<int, real> > > PLearn::TextSenseSequenceVMatrix::word_given_sense_priors [protected]
 

Probability of a word given it has some sense.

Definition at line 35 of file TextSenseSequenceVMatrix.h.

Referenced by build_(), and permute().


The documentation for this class was generated from the following files:
TextSenseSequenceVMatrix.h
TextSenseSequenceVMatrix.cc
Generated on Tue Aug 17 16:27:38 2004 for PLearn by doxygen 1.3.7