#include <TextSenseSequenceVMatrix.h>
Inherits PLearn::RowBufferedVMatrix.
Public Types
typedef RowBufferedVMatrix inherited
Public Member Functions
TextSenseSequenceVMatrix ()
Default constructor. After setting all options individually, build() should be called.
TextSenseSequenceVMatrix (VMat that_dvm, int that_window_size, TVec< int > that_res_pos=TVec< int >(0), bool that_rand_syn=false, WordNetOntology *that_wno=NULL)
int getRestrictedRow (int i, Vec v) const
Restricts context extraction to words whose POS is not in res_pos, and returns the position of the next non-overlapping context.
virtual void build ()
Should simply call inherited::build(), then this class's build_().
virtual void makeDeepCopyFromShallowCopy (map< const void *, void * > &copies)
Transforms a shallow copy into a deep copy.
void setOntology (WordNetOntology *that_wno)
Sets the ontology.
void setWindowSize (int that_window_size)
Sets the number of context words.
void setWordSequence (VMat that_dvm)
Sets the VMatrix of word/sense_tag/POS sequence.
void setRandomGeneration (bool that_rand_syn)
Activates or deactivates the random generation of contexts and target words.
void setRestrictedPOS (TVec< int > that_res_pos)
Sets the vector of forbidden POS for the context words.
void setSentenceBoundary (int b)
Sets the sentence boundary symbol.
void setUndefinedPOSId (int pos_id)
Sets the undefined POS ID.
PLEARN_DECLARE_OBJECT (TextSenseSequenceVMatrix)
Declares name and deepCopy methods.
Protected Member Functions
virtual void getNewRow (int i, const Vec &v) const
This is the only method requiring implementation.
Static Protected Member Functions
void declareOptions (OptionList &ol)
Declares this class' options.
Protected Attributes
VMat dvm
The VMatrix containing the sequence of words or lemmas, with their POS and (optional) WordNet tags.
int window_size
The number of context words.
bool is_supervised_data
Indicates that at least some of the words or lemmas are semantically disambiguated.
TVec< int > res_pos
The vector containing the forbidden POS of the words given in the context of a target word.
bool rand_syn
Indicates that examples can be randomly generated using random synonym replacements.
TVec< TVec< pair< int, real > > > word_given_sense_priors
Probability of a word given it has some sense.
WordNetOntology * wno
Ontology of the sense tagging.
int my_current_row_index
Index of the current row.
Vec my_current_row
Elements of the current row.
bool keep_in_sentence
Indicates that the context must not spread over another sentence.
int sentence_boundary
Sentence boundary symbol.
bool undefined_pos_set
Indicates that the undefined POS ID has been set.
int undefined_pos
Undefined POS ID.
Private Member Functions
void build_ ()
This does the actual building.
void permute (Vec v) const
Randomly replaces the words (target and context) with one of their corresponding synonyms.
void apply_boundary (const Vec &v) const
This applies the sentence boundary.
Definition at line 17 of file TextSenseSequenceVMatrix.h.
typedef RowBufferedVMatrix inherited
Reimplemented from PLearn::RowBufferedVMatrix. Definition at line 128 of file TextSenseSequenceVMatrix.h. Referenced by TextSenseSequenceVMatrix().
TextSenseSequenceVMatrix ()
Default constructor. After setting all options individually, build() should be called.
Definition at line 9 of file TextSenseSequenceVMatrix.cc. References inherited.
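TextSenseSequenceVMatrix (VMat that_dvm, int that_window_size, TVec< int > that_res_pos=TVec< int >(0), bool that_rand_syn=false, WordNetOntology *that_wno=NULL)
Definition at line 67 of file TextSenseSequenceVMatrix.h. References build_(), dvm, is_supervised_data, keep_in_sentence, my_current_row, my_current_row_index, rand_syn, res_pos, undefined_pos_set, window_size, and wno.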
void apply_boundary (const Vec &v) const
This applies the sentence boundary.
Definition at line 344 of file TextSenseSequenceVMatrix.cc. References sentence_boundary, undefined_pos, undefined_pos_set, UNDEFINED_SS_ID, UNDEFINED_TYPE, and window_size. Referenced by getNewRow(), and getRestrictedRow().
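The page does not show how the boundary is applied, so the following is only a minimal sketch of one plausible reading: tokens that lie beyond a sentence boundary, as seen from the target word, are overwritten with filler values. The `Token` struct, the `maskBeyondBoundary` helper, and the two `kUndefined...` constants are hypothetical stand-ins and are not part of PLearn.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row element: one (word, sense, POS) triple.
struct Token { int word; int sense; int pos; };

// Hypothetical stand-ins for UNDEFINED_SS_ID and the undefined POS ID.
const int kUndefinedSense = -1;
const int kUndefinedPos = -1;

// Assumption: walking outward from the target, the first boundary symbol
// met on each side and every token farther away are replaced by a filler.
std::vector<Token> maskBeyondBoundary(std::vector<Token> row,
                                      std::size_t target, int boundary) {
    assert(target < row.size());
    const Token filler{boundary, kUndefinedSense, kUndefinedPos};

    bool crossed = false;
    for (std::size_t i = target; i-- > 0;) {  // left side, nearest first
        if (row[i].word == boundary) crossed = true;
        if (crossed) row[i] = filler;
    }

    crossed = false;
    for (std::size_t i = target + 1; i < row.size(); ++i) {  // right side
        if (row[i].word == boundary) crossed = true;
        if (crossed) row[i] = filler;
    }
    return row;
}
```

Tokens between the target and the nearest boundary on each side are left untouched, which matches the stated intent of keep_in_sentence: the context must not spread over another sentence.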
virtual void build ()
Should simply call inherited::build(), then this class's build_(). This method should be callable again at later times, after modifying some option fields to change the "architecture" of the object. Reimplemented from PLearn::VMatrix. Definition at line 527 of file TextSenseSequenceVMatrix.cc. References build_().
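The build()/build_() convention described above can be illustrated with a small standalone sketch. `BaseMatrix` and `SequenceMatrix` are hypothetical stand-ins, not the PLearn classes, and the row width formula (three fields per word, for the context words plus the target) is an assumption made only so that a rebuild has a visible effect.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a parent class in the hierarchy.
struct BaseMatrix {
    std::vector<std::string> log;  // records the order of build steps
    virtual ~BaseMatrix() = default;
    virtual void build() { build_(); }
private:
    void build_() { log.push_back("BaseMatrix::build_"); }
};

// Hypothetical stand-in for the derived class.
struct SequenceMatrix : BaseMatrix {
    typedef BaseMatrix inherited;
    int window_size = 0;  // option field, may be changed between builds
    int row_width = 0;    // derived state, recomputed by build_()

    void build() override {
        inherited::build();  // parent levels are built first
        build_();            // then this class's own building
    }
private:
    void build_() {
        assert(window_size >= 0);
        // Assumption: word, sense and POS for each context word and the target.
        row_width = 3 * (window_size + 1);
        log.push_back("SequenceMatrix::build_");
    }
};
```

Because build_() recomputes everything derived from the option fields, calling build() again after changing window_size leaves the object in a consistent state.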
void build_ ()
This does the actual building.
Reimplemented from PLearn::VMatrix. Definition at line 454 of file TextSenseSequenceVMatrix.cc. References PLearn::Set::begin(), dvm, PLearn::Set::end(), PLearn::TVec< TVec< pair< int, real > > >::first(), PLearn::WordNetOntology::getSenseSize(), PLearn::WordNetOntology::getWordsForSense(), PLearn::PP< VMatrix >::isNull(), PLearn::VMat::length(), PLERROR, PLWARNING, rand_syn, PLearn::TVec< TVec< pair< int, real > > >::resize(), PLearn::TVec< VMField >::resize(), PLearn::SetIterator, PLearn::Set::size(), PLearn::TVec< TVec< pair< int, real > > >::size(), PLearn::sum(), PLearn::VMat::width(), window_size, wno, and word_given_sense_priors. Referenced by build(), and TextSenseSequenceVMatrix().
void declareOptions (OptionList &ol)
Declares this class' options.
Reimplemented from PLearn::VMatrix. Definition at line 440 of file TextSenseSequenceVMatrix.cc. References PLearn::declareOption(), and PLearn::OptionList.
virtual void getNewRow (int i, const Vec &v) const
This is the only method requiring implementation.
Implements PLearn::RowBufferedVMatrix. Definition at line 24 of file TextSenseSequenceVMatrix.cc. References apply_boundary(), dvm, getRestrictedRow(), is_supervised_data, keep_in_sentence, PLearn::TVec< T >::length(), PLearn::VMat::length(), my_current_row, my_current_row_index, permute(), PLERROR, rand_syn, res_pos, PLearn::TVec< T >::size(), PLearn::TVec< int >::size(), SYNSETTAG_ID, undefined_pos, undefined_pos_set, UNDEFINED_SS_ID, UNDEFINED_TYPE, PLearn::Vec, PLearn::VMat::width(), and window_size.
int getRestrictedRow (int i, Vec v) const
Restricts context extraction to words whose POS is not in res_pos, and returns the position of the next non-overlapping context.
Definition at line 171 of file TextSenseSequenceVMatrix.cc. References apply_boundary(), PLearn::TVec< int >::contains(), dvm, is_supervised_data, keep_in_sentence, PLearn::TVec< T >::length(), PLearn::VMat::length(), my_current_row, my_current_row_index, permute(), PLERROR, rand_syn, res_pos, PLearn::TVec< T >::size(), SYNSETTAG_ID, undefined_pos, undefined_pos_set, UNDEFINED_SS_ID, UNDEFINED_TYPE, PLearn::VMat::width(), and window_size. Referenced by getNewRow().
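To make the idea of a POS-restricted context concrete, here is a rough standalone sketch. It assumes that the window is split evenly before and after the target, and that "the position of the next non-overlapping context" means the index just past the last token scanned on the right; neither assumption is confirmed by this page. `Token`, `Context` and `restrictedContext` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequence element: one (word, sense, POS) triple.
struct Token { int word; int sense; int pos; };

// Hypothetical result type for the sketch.
struct Context {
    std::vector<Token> before;  // kept context words preceding the target
    Token target;
    std::vector<Token> after;   // kept context words following the target
    int next_start;             // index just past the last token scanned
};

// Collects window_size / 2 context words on each side of the target,
// skipping every token whose POS appears in res_pos.
Context restrictedContext(const std::vector<Token>& seq, int target,
                          int window_size, const std::vector<int>& res_pos) {
    assert(target >= 0 && target < static_cast<int>(seq.size()));
    assert(window_size >= 0 && window_size % 2 == 0);

    auto forbidden = [&res_pos](const Token& t) {
        return std::find(res_pos.begin(), res_pos.end(), t.pos) != res_pos.end();
    };
    const std::size_t half = static_cast<std::size_t>(window_size) / 2;

    Context c{{}, seq[target], {}, target + 1};
    for (int i = target - 1; i >= 0 && c.before.size() < half; --i) {
        if (!forbidden(seq[i])) c.before.insert(c.before.begin(), seq[i]);
    }
    int i = target + 1;
    for (; i < static_cast<int>(seq.size()) && c.after.size() < half; ++i) {
        if (!forbidden(seq[i])) c.after.push_back(seq[i]);
    }
    c.next_start = i;
    return c;
}
```

Skipped tokens still advance the scan, so a restricted context can span more positions of the underlying sequence than window_size suggests.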
virtual void makeDeepCopyFromShallowCopy (map< const void *, void * > &copies)
Transforms a shallow copy into a deep copy.
Reimplemented from PLearn::RowBufferedVMatrix. Definition at line 533 of file TextSenseSequenceVMatrix.cc. References PLearn::deepCopyField(), dvm, and res_pos.
void permute (Vec v) const
Randomly replaces the words (target and context) with one of their corresponding synonyms.
Definition at line 377 of file TextSenseSequenceVMatrix.cc. References ADJ_TYPE, ADV_TYPE, PLearn::WordNetOntology::getSensesForWord(), PLearn::WordNetOntology::getWord(), PLearn::WordNetOntology::getWordId(), k, NOUN_TYPE, PLearn::TVec< T >::size(), PLearn::TVec< TVec< pair< int, real > > >::size(), PLearn::stemWord(), PLearn::sum(), PLearn::WordNetOntology::temp_word_to_adj_senses, PLearn::WordNetOntology::temp_word_to_adv_senses, PLearn::WordNetOntology::temp_word_to_noun_senses, PLearn::WordNetOntology::temp_word_to_verb_senses, UNDEFINED_TYPE, PLearn::uniform_sample(), VERB_TYPE, window_size, wno, and word_given_sense_priors. Referenced by getNewRow(), and getRestrictedRow().
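As an illustration of synonym replacement, the sketch below swaps each sense-tagged word for a word drawn from the set of words sharing its sense. It is a simplification: the draw is uniform, whereas the reference list above suggests that the real method also consults word_given_sense_priors. `Token`, `SenseToWords`, `replaceWithSynonyms` and `kUndefinedSense` are hypothetical names, and the ontology is reduced to a plain map.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

// Hypothetical row element: one (word, sense, POS) triple.
struct Token { int word; int sense; int pos; };

// Hypothetical stand-in for UNDEFINED_SS_ID.
const int kUndefinedSense = -1;

// Hypothetical ontology view: sense ID -> IDs of the words having that sense.
using SenseToWords = std::map<int, std::vector<int>>;

// Replaces the word of every sense-tagged token with a uniformly drawn
// word that shares the same sense. Untagged tokens, and tokens whose
// sense has no known words, are left unchanged.
void replaceWithSynonyms(std::vector<Token>& row, const SenseToWords& synonyms,
                         std::mt19937& rng) {
    for (Token& t : row) {
        if (t.sense == kUndefinedSense) continue;
        auto it = synonyms.find(t.sense);
        if (it == synonyms.end() || it->second.empty()) continue;
        std::uniform_int_distribution<std::size_t> pick(0, it->second.size() - 1);
        t.word = it->second[pick(rng)];
    }
}
```

Only the word field changes; the sense and POS of each token are preserved, so the generated example keeps the same labels as the original.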
PLEARN_DECLARE_OBJECT (TextSenseSequenceVMatrix)
Declares name and deepCopy methods.
void setOntology (WordNetOntology *that_wno)
Sets the ontology.
Definition at line 108 of file TextSenseSequenceVMatrix.h. References setOntology(), and wno. Referenced by setOntology().
void setRandomGeneration (bool that_rand_syn)
Activates or deactivates the random generation of contexts and target words.
Definition at line 117 of file TextSenseSequenceVMatrix.h. References rand_syn, and setRandomGeneration(). Referenced by setRandomGeneration().
void setRestrictedPOS (TVec< int > that_res_pos)
Sets the vector of forbidden POS for the context words.
Definition at line 120 of file TextSenseSequenceVMatrix.h. References res_pos, and setRestrictedPOS(). Referenced by setRestrictedPOS().
void setSentenceBoundary (int b)
Sets the sentence boundary symbol.
Definition at line 123 of file TextSenseSequenceVMatrix.h. References keep_in_sentence, sentence_boundary, and setSentenceBoundary(). Referenced by setSentenceBoundary().
void setUndefinedPOSId (int pos_id)
Sets the undefined POS ID.
Definition at line 126 of file TextSenseSequenceVMatrix.h. References setUndefinedPOSId(), undefined_pos, and undefined_pos_set. Referenced by setUndefinedPOSId().
void setWindowSize (int that_window_size)
Sets the number of context words.
Definition at line 111 of file TextSenseSequenceVMatrix.h. References setWindowSize(), and window_size. Referenced by setWindowSize().
void setWordSequence (VMat that_dvm)
Sets the VMatrix of word/sense_tag/POS sequence.
Definition at line 114 of file TextSenseSequenceVMatrix.h. References dvm, is_supervised_data, and setWordSequence(). Referenced by setWordSequence().
VMat dvm
The VMatrix containing the sequence of words or lemmas, with their POS and (optional) WordNet tags.
Definition at line 25 of file TextSenseSequenceVMatrix.h. Referenced by build_(), getNewRow(), getRestrictedRow(), makeDeepCopyFromShallowCopy(), setWordSequence(), and TextSenseSequenceVMatrix().
bool is_supervised_data
Indicates that at least some of the words or lemmas are semantically disambiguated.
Definition at line 29 of file TextSenseSequenceVMatrix.h. Referenced by getNewRow(), getRestrictedRow(), setWordSequence(), and TextSenseSequenceVMatrix().
bool keep_in_sentence
Indicates that the context must not spread over another sentence.
Definition at line 43 of file TextSenseSequenceVMatrix.h. Referenced by getNewRow(), getRestrictedRow(), setSentenceBoundary(), and TextSenseSequenceVMatrix().
Vec my_current_row
Elements of the current row.
Definition at line 41 of file TextSenseSequenceVMatrix.h. Referenced by getNewRow(), getRestrictedRow(), and TextSenseSequenceVMatrix().
int my_current_row_index
Index of the current row.
Definition at line 39 of file TextSenseSequenceVMatrix.h. Referenced by getNewRow(), getRestrictedRow(), and TextSenseSequenceVMatrix().
bool rand_syn
Indicates that examples can be randomly generated using random synonym replacements.
Definition at line 33 of file TextSenseSequenceVMatrix.h. Referenced by build_(), getNewRow(), getRestrictedRow(), setRandomGeneration(), and TextSenseSequenceVMatrix().
TVec< int > res_pos
The vector containing the forbidden POS of the words given in the context of a target word.
Definition at line 31 of file TextSenseSequenceVMatrix.h. Referenced by getNewRow(), getRestrictedRow(), makeDeepCopyFromShallowCopy(), setRestrictedPOS(), and TextSenseSequenceVMatrix().
int sentence_boundary
Sentence boundary symbol.
Definition at line 45 of file TextSenseSequenceVMatrix.h. Referenced by apply_boundary(), and setSentenceBoundary().
int undefined_pos
Undefined POS ID.
Definition at line 49 of file TextSenseSequenceVMatrix.h. Referenced by apply_boundary(), getNewRow(), getRestrictedRow(), and setUndefinedPOSId().
bool undefined_pos_set
Indicates that the undefined POS ID has been set.
Definition at line 47 of file TextSenseSequenceVMatrix.h. Referenced by apply_boundary(), getNewRow(), getRestrictedRow(), setUndefinedPOSId(), and TextSenseSequenceVMatrix().
int window_size
The number of context words.
Definition at line 27 of file TextSenseSequenceVMatrix.h. Referenced by apply_boundary(), build_(), getNewRow(), getRestrictedRow(), permute(), setWindowSize(), and TextSenseSequenceVMatrix().
WordNetOntology * wno
Ontology of the sense tagging.
Definition at line 37 of file TextSenseSequenceVMatrix.h. Referenced by build_(), permute(), setOntology(), and TextSenseSequenceVMatrix().
TVec< TVec< pair< int, real > > > word_given_sense_priors
Probability of a word given it has some sense.
Definition at line 35 of file TextSenseSequenceVMatrix.h.
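The nested type suggests one list of (word ID, probability) pairs per sense. How PLearn fills it is not shown on this page, so the sketch below only illustrates the data shape under an assumed procedure: per-sense word counts are normalized so that each sense's probabilities sum to one. `Priors`, `WordCounts` and `normalizePriors` are hypothetical names, and `double` stands in for `real`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Indexed by sense ID; each entry lists (word ID, P(word | sense)).
using Priors = std::vector<std::vector<std::pair<int, double>>>;
// Same shape, holding raw (word ID, count) pairs. Hypothetical input.
using WordCounts = std::vector<std::vector<std::pair<int, double>>>;

// Assumed procedure: divide each count by the total count of its sense.
// A sense with no counts keeps an empty list.
Priors normalizePriors(const WordCounts& counts) {
    Priors priors(counts.size());
    for (std::size_t s = 0; s < counts.size(); ++s) {
        double total = 0.0;
        for (const auto& wc : counts[s]) total += wc.second;
        for (const auto& wc : counts[s]) {
            assert(wc.second >= 0.0);
            priors[s].emplace_back(wc.first, total > 0.0 ? wc.second / total : 0.0);
        }
    }
    return priors;
}
```

With this layout, looking up the candidate words for a sense is a single index into the outer vector, which is convenient when a replacement word has to be drawn for a tagged token.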