
ACL Anthology
CiteSeer
Google Scholar
Linguistic Terms
Stanford Encyclopedia of Philosophy

HTML Cleaners
Emsa HTML Tag Remover – An easy-to-use tag cleaner that can be run
from a GUI or from the command line.
Activation code: 1760559.
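To illustrate what a tag cleaner does, here is a minimal sketch using only Python's standard `html.parser` module. It is not how Emsa's tool works internally. The names `TagStripper` and `strip_tags` are hypothetical, and the sketch assumes that the contents of `<script>` and `<style>` elements should be dropped along with the tags themselves.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text content of an HTML document, dropping all tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script/style element
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_tags(html: str) -> str:
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    # Collapse the runs of whitespace left behind by removed markup
    return " ".join("".join(parser.parts).split())


print(strip_tags("<p>Fish &amp; <b>chips</b></p><script>var x = 1;</script>"))
# prints: Fish & chips
```

Collapsing whitespace at the end suits text that will be fed to a tokenizer next; drop that step if line breaks carry meaning for your task.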
Tokenizers
Boost Tokenizer Package – Part of the
Boost C++ Libraries. It provides utilities for breaking a string up
into a sequence of tokens.
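Boost's `char_separator` distinguishes delimiters that are dropped (such as spaces) from delimiters that are kept as tokens of their own (such as punctuation). The Python sketch below imitates that idea for illustration only; it is not Boost's API, and `char_separator_tokenize` and its `dropped`/`kept` parameters are hypothetical names.

```python
import string


def char_separator_tokenize(text, dropped=" \t\n", kept=string.punctuation):
    """Split text on delimiter characters.

    Characters in `dropped` end the current token and are discarded;
    characters in `kept` end the current token and are emitted as
    one-character tokens of their own. Empty tokens are never emitted.
    """
    tokens = []
    current = []
    for ch in text:
        if ch in dropped or ch in kept:
            if current:  # flush the token built so far
                tokens.append("".join(current))
                current = []
            if ch in kept:
                tokens.append(ch)
        else:
            current.append(ch)
    if current:  # flush the final token, if any
        tokens.append("".join(current))
    return tokens


print(char_separator_tokenize("Hello, world! It's 2 p.m."))
# prints: ['Hello', ',', 'world', '!', 'It', "'", 's', '2', 'p', '.', 'm', '.']
```

Note how naive character-based splitting breaks contractions and abbreviations ("It's", "p.m."); linguistically aware tokenizers handle such cases with extra rules.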
Part-of-Speech Taggers
TreeTagger – “The
TreeTagger is a tool for annotating text with part-of-speech and
lemma information. It was developed by Helmut
Schmid in the TC project at the Institute for Computational
Linguistics of the University of Stuttgart.” It can also be used to
chunk noun, verb, adverb, adjective and prepositional phrases. Linux
and Win32 binaries are available, and it runs from the command line.
Stanford Tagger – A log-linear part-of-speech tagger developed and
maintained at Stanford. It runs from the command line and requires
Java.
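Command-line taggers are usually followed by a small script that reads their output. The sketch below assumes TreeTagger-style output with one token per line in three tab-separated columns (token, tag, lemma), with `<unknown>` printed when no lemma is found; check the exact format your version and options produce. The sample lines and the `parse_treetagger` function are hypothetical.

```python
def parse_treetagger(output: str):
    """Parse tab-separated tagger output into (token, tag, lemma) tuples.

    Assumes one token per line in the form token<TAB>tag<TAB>lemma.
    A lemma of "<unknown>" falls back to the token itself.
    """
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got: {line!r}")
        token, tag, lemma = fields
        if lemma == "<unknown>":
            lemma = token
        rows.append((token, tag, lemma))
    return rows


# Hypothetical sample in the assumed three-column format
sample = "The\tDT\tthe\ncats\tNNS\tcat\nsat\tVVD\tsit\nGlorp\tNP\t<unknown>\n"
for token, tag, lemma in parse_treetagger(sample):
    print(f"{token:8} {tag:5} {lemma}")
```

Raising an error on malformed lines, rather than silently skipping them, makes format mismatches visible early.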
Parsers
Minipar – An efficient English parser (about 300 words per
second).
Stanford Parser – A statistical parser developed and maintained at
Stanford. It is written in Java and runs from the command line.
Other Tools
WordNet – A lexical database of English developed and
maintained at Princeton. It contains the meanings of, and relations
between, most English nouns, verbs, adjectives and adverbs.
SMART Stop Words – The stop-word list from the SMART retrieval
system: common words that search engines often discard for
efficiency.
VirtualBox – An open-source program for running operating systems
inside virtual machines.
Ubuntu – A free, Linux-based operating
system.
Kevin’s Word List – Various word lists, plus links to other
collections of word lists.
GATE
Link Grammar Parser
Freeling Language Tools
BioNLP resources
NLP resources
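Applying a stop-word list such as the SMART list above amounts to a set-membership test per token. This is a minimal sketch, assuming the list is stored as a plain-text file with one word per line; the tiny built-in `STOP_WORDS` set is only a placeholder, not the actual SMART list, and the function names are hypothetical.

```python
# Placeholder stop-word set for illustration; NOT the real SMART list.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})


def load_stop_words(path):
    """Read a one-word-per-line stop-word file into a lowercased set."""
    with open(path, encoding="utf-8") as f:
        return frozenset(line.strip().lower() for line in f if line.strip())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only tokens whose lowercased form is not a stop word."""
    return [t for t in tokens if t.lower() not in stop_words]


print(remove_stop_words("The cat is in the hat".split()))
# prints: ['cat', 'hat']
```

A `frozenset` gives constant-time average lookups, which matters when filtering large corpora; lowercasing on both sides makes the match case-insensitive.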

Cyc
Openmind
FrameNet
KnowItAll

Stanford NLP Group
Johns Hopkins Language Lab
Computational Linguistics at Ohio State