Code
NLTK includes the following software modules (~100k lines of Python code):
- Corpus readers
  - interfaces to many corpora
- Tokenizers
  - whitespace, newline, blankline, word, wordpunct, treebank, sexpr, regexp, Punkt sentence segmenter
- Stemmers
  - Porter, Lancaster, regexp
- Taggers
  - regexp, n-gram, backoff, Brill, HMM
- Parsers
  - recursive descent, shift-reduce, chunk, chart, feature-based, probabilistic, ...
- Semantic interpretation
  - untyped lambda calculus, first-order models, parser interface
- WordNet
  - WordNet interface, lexical relations, similarity
- Classifiers
  - decision tree, maximum entropy, naive Bayes, Weka interface, megam
- Clusterers
  - expectation maximization, agglomerative, k-means
- Evaluation
  - accuracy, precision, recall, windowdiff
- Estimation
  - uniform, maximum likelihood, Lidstone, Laplace, expected likelihood, heldout, cross-validation, Good-Turing, Witten-Bell
- Miscellaneous
  - feature detection, unification, chatbots, many utilities
- NLTK-Contrib (less mature)
  - categorial grammar (Lambek, CCG), dependency parser, finite-state automata, glue semantics, hole semantics, Hadoop (MapReduce), kimmo, readability, textual entailment, timex, TnT, WordNet browser
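Several of the estimation methods listed above are closely related: maximum likelihood, Laplace, and expected likelihood estimation can all be viewed as Lidstone estimation with the additive constant set to 0, 1, and 0.5 respectively. The following is a minimal pure-Python sketch of that relationship. It is not NLTK's own implementation; the function name `lidstone_estimate`, the toy sentence, and the choice of `bins` are assumptions made for illustration only.

```python
from collections import Counter


def lidstone_estimate(counts, gamma, bins):
    """Return a function mapping a sample to its Lidstone-smoothed probability.

    counts: Counter of observed samples
    gamma:  constant added to every count (0 gives maximum likelihood)
    bins:   assumed number of distinct samples that could occur
    """
    total = sum(counts.values())
    denominator = total + gamma * bins

    def prob(sample):
        # Counter returns 0 for samples that were never observed.
        return (counts[sample] + gamma) / denominator

    return prob


# Toy data (hypothetical): six tokens, five distinct word types.
tokens = "the cat sat on the mat".split()
counts = Counter(tokens)
bins = len(counts) + 1  # assumption: leave room for one unseen word type

mle = lidstone_estimate(counts, 0.0, bins)      # maximum likelihood
laplace = lidstone_estimate(counts, 1.0, bins)  # Laplace (add-one)
ele = lidstone_estimate(counts, 0.5, bins)      # expected likelihood

for name, estimator in [("MLE", mle), ("Laplace", laplace), ("ELE", ele)]:
    print(name, estimator("the"), estimator("dog"))
```

With a positive `gamma`, an unseen word such as "dog" receives a small non-zero probability, whereas the maximum likelihood estimate assigns it zero. The heldout, Good-Turing, and Witten-Bell estimators redistribute probability mass in other ways and are not covered by this sketch.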
Browse the source code: http://nltk.org/nltk/
Browse the Subversion repository: http://nltk.svn.sourceforge.net/viewvc/nltk/trunk/nltk/nltk/



