Linguistics 408/508 | Fall 2003 | Hammond | Handout 12
Overview
- sample texts linked on course homepage
- link to the site-licensed version of Textpad now on the course homepage
- A Perl version of the Unix utility grep.
- Questions from last time
- Practice with regular expressions
Regular expression overview
- Concatenation, e.g. /abc/ = "abc", "xabc", "abcx", etc.
- Union ("or"), e.g. /a[bc]d/ or /a(b|c)d/ = "abd" or "acd", etc.
- Kleene star, e.g. /ab*c/ = "ac", "abc", "abbc", "abbbc", etc.
- Beginning of string, e.g. /^ab/ = "ab", "abc", "abdfgsk", etc.
- End of string, e.g. /ab$/ = "ab", "cab", "dfgskab", etc.
- Any character, e.g. /.../ = any string of at least three characters (each . matches any single character other than a newline).
- Special characters, e.g. /\t/ = tab, etc.
- etc.
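The operators listed above can be checked mechanically. What follows is a minimal sketch, written in Python rather than Perl (an assumption made for this example; Python's re module accepts the same syntax for these particular patterns). re.search stands in for Perl's =~ operator. The names EXAMPLES and check are hypothetical, and the non-matching strings are illustrative additions, not part of the handout.

```python
import re

# Each pattern from the overview is paired with strings it should match
# and strings it should not match. The non-matching strings are
# illustrative additions, not taken from the handout.
EXAMPLES = {
    r"abc":     (["abc", "xabc", "abcx"], ["ab", "acb"]),
    r"a[bc]d":  (["abd", "acd"], ["ad", "abcd"]),
    r"a(b|c)d": (["abd", "acd"], ["ad", "abcd"]),
    r"ab*c":    (["ac", "abc", "abbc", "abbbc"], ["ab", "adc"]),
    r"^ab":     (["ab", "abc", "abdfgsk"], ["cab"]),
    r"ab$":     (["ab", "cab", "dfgskab"], ["abc"]),
    r"...":     (["abc", "abcd"], ["ab", ""]),
    r"\t":      (["a\tb"], ["a b"]),
}


def check(examples):
    """Return the (pattern, string) pairs that behave unexpectedly."""
    failures = []
    for pattern, (should_match, should_fail) in examples.items():
        for s in should_match:
            # re.search looks for the pattern anywhere in the string.
            if re.search(pattern, s) is None:
                failures.append((pattern, s))
        for s in should_fail:
            if re.search(pattern, s) is not None:
                failures.append((pattern, s))
    return failures


if __name__ == "__main__":
    # Prints [] when every example behaves as described.
    print(check(EXAMPLES))
```

Because the match is unanchored by default, /abc/ accepts "xabc" and "abcx"; only ^ and $ tie a pattern to the edges of the string.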
Assignment #3
- Give three ways of matching all vowels except i:
  - /[aeou]/
  - /[^ibcdfghjklmnpqrstvwxyz]/
  - !~ /[ibcdfghjklmnpqrstvwxyz]/
- What does this abbreviate: /\\\\a/ ?: "\\a", i.e. two literal backslashes followed by "a"
- Given a file composed of a single column of words, give a regular expression that will find all two-syllable words: /^ [^aeiou]* [aeiou] [^aeiou]* ([aiou] [^aeiou]* | e [^aeiou]+) e? $/x.
- Given the same type of file, give a regular expression that will match any word that does not contain two identical letters in a row: !~ /(.)\1/.
- What does this abbreviate: / \$ $ /x ?: a single dollar sign at the end of a string
- Write a regular expression to match any word containing an even number of vowels: /^ [^aeiou]* ([aeiou] [^aeiou]* [aeiou] [^aeiou]*)+ $/x.
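The answers above can be tried out on a few sample words. This is a minimal sketch in Python rather than Perl (an assumption of this example): re.VERBOSE stands in for Perl's /x flag, and "search(...) is None" stands in for !~. All constant and function names, and all sample words, are hypothetical choices made for illustration.

```python
import re

# Vowels other than i: a positive class and a negated class.
# The negated class also accepts non-letters such as digits.
POSITIVE = re.compile(r"[aeou]")
NEGATED = re.compile(r"[^ibcdfghjklmnpqrstvwxyz]")

# Two-syllable words: every vowel letter counts as a syllable
# nucleus except a word-final "e".
TWO_SYLLABLES = re.compile(r"""
    ^ [^aeiou]* [aeiou]
    [^aeiou]* ([aiou] [^aeiou]* | e [^aeiou]+) e? $
""", re.VERBOSE)

# Any character followed by a copy of itself.
DOUBLED = re.compile(r"(.)\1")

# An even number of vowels. Because the group uses +, at least one
# pair of vowels is required, so vowel-less words are rejected.
EVEN_VOWELS = re.compile(r"""
    ^ [^aeiou]* ([aeiou] [^aeiou]* [aeiou] [^aeiou]*)+ $
""", re.VERBOSE)

# A literal dollar sign at the end of the string.
DOLLAR_AT_END = re.compile(r" \$ $ ", re.VERBOSE)

# Two literal backslashes followed by "a".
TWO_BACKSLASHES_A = re.compile(r"\\\\a")


def no_doubled_letter(word):
    """Python counterpart of the Perl test: $word !~ /(.)\\1/."""
    return DOUBLED.search(word) is None


if __name__ == "__main__":
    for word in ["window", "rotate", "cat", "banana", "letter", "audio"]:
        print(word,
              TWO_SYLLABLES.search(word) is not None,
              EVEN_VOWELS.search(word) is not None,
              no_doubled_letter(word))
```

Two properties of the patterns show up in such a test. The negated class /[^ibcdfghjklmnpqrstvwxyz]/ matches any character outside the listed set, so it accepts digits and punctuation as well as a, e, o and u. The even-vowel pattern, as written with +, rejects a word with no vowels at all; replacing + with * would accept it.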
More practice with regular expressions