harvesttext.algorithms package
Submodules
harvesttext.algorithms.entity_discoverer module
class harvesttext.algorithms.entity_discoverer.NERPEntityDiscover(sent_words, type_entity_dict, entity_count, pop_words_cnt, word2id, id2word, min_count=5, pinyin_tolerance=0, pinyin_adjlist=None, **kwargs)
Bases: object
class harvesttext.algorithms.entity_discoverer.NFLEntityDiscoverer(sent_words, type_entity_dict, entity_count, pop_words_cnt, word2id, id2word, min_count=5, pinyin_tolerance=0, pinyin_adjlist=None, emb_dim=50, ft_iters=20, use_subword=True, threshold=0.98, min_n=1, max_n=4, **kwargs)
Bases: harvesttext.algorithms.entity_discoverer.NERPEntityDiscover
harvesttext.algorithms.keyword module
harvesttext.algorithms.match_patterns module
harvesttext.algorithms.sent_dict module
harvesttext.algorithms.texttile module
harvesttext.algorithms.utils module
harvesttext.algorithms.word_discoverer module
class harvesttext.algorithms.word_discoverer.WordDiscoverer(doc, max_word_len=5, min_freq=5e-05, min_entropy=2.0, min_aggregation=50, ent_threshold='both', mem_saving=False)
Bases: object
    genWords(doc)
    Generate all candidate words together with their frequency, entropy, and aggregation information.
    @param doc: the document used for word generation
class harvesttext.algorithms.word_discoverer.WordInfo(text)
Bases: object
Store information about each word, including its frequency, left neighbors, and right neighbors.
    compute(length)
    Compute the frequency and entropy of this word.
    @param length: length of the training document from which words are extracted
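As a rough illustration of what a frequency and entropy computation of this kind can look like, here is a minimal sketch. It is not the library's implementation: the helper names `neighbor_entropy` and `word_stats` are hypothetical, and the sketch assumes that frequency is the occurrence count divided by the document length and that entropy is the Shannon entropy (natural logarithm) of the left and right neighbor distributions.

```python
import math
from collections import Counter


def neighbor_entropy(neighbors):
    """Shannon entropy (natural log) of a list of neighboring characters.

    Assumption: neighbors are given as a plain list, one entry per occurrence.
    """
    counts = Counter(neighbors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def word_stats(occurrences, left_neighbors, right_neighbors, length):
    """Return (frequency, left entropy, right entropy) for one candidate word.

    Hypothetical helper: `length` plays the role of the document length
    passed to `compute`, and frequency is assumed to be occurrences / length.
    """
    freq = occurrences / length
    return freq, neighbor_entropy(left_neighbors), neighbor_entropy(right_neighbors)


# A word seen twice in a 100-character document, preceded by "a" and "b"
# and followed by "c" both times.
freq, left_ent, right_ent = word_stats(2, ["a", "b"], ["c", "c"], 100)
```

Under these assumptions, a word whose neighbors vary a lot gets a high entropy, which suggests it is used as an independent unit; a word that is always followed by the same character gets an entropy of zero on that side.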
harvesttext.algorithms.word_discoverer.genSubparts(string)
Partition a string into every possible pair of two non-empty parts; e.g. given "abcd", generate [("a", "bcd"), ("ab", "cd"), ("abc", "d")].
For a string of length 1, return an empty list.
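The partitioning described for genSubparts can be sketched in a few lines. The function below, `gen_subparts`, is a hypothetical stand-in written for illustration and is not taken from the library source; it only reproduces the documented behavior.

```python
def gen_subparts(string):
    """Return every split of `string` into two non-empty parts."""
    # Split points run from 1 to len(string) - 1, so both halves are non-empty.
    return [(string[:i], string[i:]) for i in range(1, len(string))]


print(gen_subparts("abcd"))  # [('a', 'bcd'), ('ab', 'cd'), ('abc', 'd')]
print(gen_subparts("a"))     # []
```

A string of length n yields n - 1 pairs, which is why a single-character string produces an empty list.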