Linguistics 439/539
Statistical Natural Language Processing
Handouts:
Syllabus
Survey
Assignment #1
Assignment #2
N-gram approximation handout
Revised Assignment #3
HMM N-gram handout
Assignment #4
Assignment #5
Forward probabilities for a PCFG
Sample from Welsh-English corpus
Assignment #6
Sample alignment for Welsh-English corpus
Final assignment
Matlab and text files:
Simple PCFG weights
Efficiency demo
Austen text
Russian text
The+N sequences in Brown
Two-dice simulation
Binomial distribution
Entropy of a die with up to 8 sides
Entropy of an unfair die with 4 sides
Distribution of word lengths in Brown
Mutual information
Zipfian distribution in Brown
Special log2 function
N-gram approximation
Laplace and Lidstone demo
Unseen unigram demo
Comparing Laplace and Lidstone for Austen
Comparing MLE and Lidstone for Austen
Second Austen text
Comparing smoothing techniques for one sentence in second Austen text
Deleted estimation demo
One answer to #3 on HW#2
Translation of Good-Turing
Demo of regression for Good-Turing
Applying Good-Turing to Austen
Multiple tags in Brown
Dumb tagger for Brown
One answer to #2 on HW#3
Forward probabilities
HMM visualization (kludgy and platform-dependent; Graphviz must be installed)
One answer to #1 on HW#4
One (partial) answer to #3 on HW#4
One answer to #2 on HW#5
Quick-and-dirty XML parser for Welsh National Assembly data
Code for setting different colors
K-means demo
EM demo
One answer to #1 on HW#6
Vector space demo
Poisson distributions
Poisson demo
Two-Poisson distribution demo
K-mix demo
Least-squares demo
Binary independence model demo
Links:
Corpora and readings (for enrolled students only)
Textbook website
Matlab site license page at UA (free for UA folks)
Matlab website
Octave (free, open-source alternative to Matlab)
Graphviz (graph visualization software, for HMMs)
Various Matlab files from 478/578 (Spring 2013)
Various Matlab files from 408/508 (Fall 2013)
Mike Hammond