(Optional:
There is also an operation called "transliteration", which replaces
single characters rather than strings.
$string_variable =~ tr/character_sequence/character_sequence/
Most of the regular expression
special characters are not valid for transliteration, but "-" can
be used for ranges, as in tr/a-z//d, which deletes all lower case
letters. Without the d modifier, tr/a-z// changes nothing and only
counts the matching characters.)
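The following is a small sketch of transliteration, not part of the original script. The sample string and the variable names ($text, $upper, $no_lower, $count) are made up for illustration.

```perl
use strict;
use warnings;

my $text = "Down the Rabbit-Hole";

# Map each lower case letter to the matching upper case letter.
(my $upper = $text) =~ tr/a-z/A-Z/;
print "$upper\n";

# With the d modifier, characters that have no replacement are deleted.
(my $no_lower = $text) =~ tr/a-z//d;
print "$no_lower\n";

# With an empty replacement list and no d, the string is left unchanged
# and tr returns the number of matching characters.
my $count = ($text =~ tr/a-z//);
print "$count lower case letters\n";
```

Copying the string first, as in (my $upper = $text), keeps the original $text untouched, so the three variants can be compared side by side.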
Examples:
1) s/[Ll][Oo][Nn][Dd][Oo][Nn]/London/g
replaces LOndoN or loNDON etc by London.
This is equivalent to s/london/London/gi.
2) s/Alice/Mary/
replaces the first occurrence of Alice in the string by Mary.
Use s/Alice/Mary/g to replace every occurrence.
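As a quick check of how the g and i modifiers change the result, here is a minimal sketch. The sample line and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $line = "Alice saw Alice in LOndoN";

# Without g, only the first match is replaced.
(my $first = $line) =~ s/Alice/Mary/;
print "$first\n";

# With g, every match in the string is replaced.
(my $all = $line) =~ s/Alice/Mary/g;
print "$all\n";

# With i, upper and lower case are treated as the same.
(my $city = $line) =~ s/london/London/gi;
print "$city\n";
```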
Use the following script for today's exercises:
#!/usr/local/bin/perl
#
# Regular expressions
#
# reading a file:
open(ALICE, "alice.txt") or die "Cannot open alice.txt: $!";
@lines = <ALICE>;
close(ALICE);
# searching the file content line by line:
foreach $line (@lines) {
    $line =~ s/T/t/g;    # replace every upper case T by lower case t
    print $line;
} # end of foreach
Exercises
Using the alice.txt file:
1a) Replace all upper case A by lower case a.
1b) Replace the word "Alice" by "ALICE".
1c) Delete all words with more than 3 characters.
1d) Print two blank space characters after the "." at the end of a sentence.
(Optional: Don't do this if the "." is the last character in a line.)
1e) Replace single quotes (' or `) by double quotes.
Exercise
2) Write a replace statement that deletes all HTML markup from a file. You need non-greedy multipliers (for example *? instead of *), because otherwise the text between tags is also deleted in a line that contains several tags.
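To see the difference between greedy and non-greedy multipliers without giving away the exercise, the sketch below uses quoted strings instead of HTML tags. The sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $s = 'say "yes" or "no" now';

# Greedy: .* runs from the first quote to the last quote in the string.
my $greedy = "";
if ($s =~ /"(.*)"/) { $greedy = $1; }
print "greedy:     $greedy\n";

# Non-greedy: .*? stops at the nearest closing quote.
my $lazy = "";
if ($s =~ /"(.*?)"/) { $lazy = $1; }
print "non-greedy: $lazy\n";

# The same difference shows up when deleting.
(my $del_greedy = $s) =~ s/".*"//;
(my $del_lazy   = $s) =~ s/".*?"//g;
print "$del_greedy\n";
print "$del_lazy\n";
```

The greedy version removes the word "or" together with both quoted strings, which is the same effect that would remove the text between tags.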
Examples
/(t.*e)/;
print "$1";
prints the string from the first "t" to the last "e" in the line, because .* is greedy.
s/(t.*e)/:$1:/g;
places a ":" in front of and behind each string t...e.
/(...)\1/
matches a three character string that is immediately repeated, for example "murmur".
s/^(.)(.*)(.)$/$3$2$1/
switches the first and last character of a line.
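The examples above can be tried on short strings before running them on alice.txt. This is only a sketch; the sample strings and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $line = "a tired mouse ran";

# $1 holds whatever the first pair of parentheses matched.
my $found = "";
if ($line =~ /(t.*e)/) { $found = $1; }
print "found:    $found\n";

# $1 can also be used in the replacement part of s///.
(my $marked = $line) =~ s/(t.*e)/:$1:/g;
print "marked:   $marked\n";

# \1 inside the pattern refers back to the first group.
my $repeated = "";
if ("murmur" =~ /(...)\1/) { $repeated = $1; }
print "repeated: $repeated\n";

# Three groups: first character, middle part, last character.
(my $swapped = "stop") =~ s/^(.)(.*)(.)$/$3$2$1/;
print "swapped:  $swapped\n";
```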
Exercises
3a) Try the examples from above.
3b) Insert a newline character after each punctuation mark (.,!).
If you chomp each line (or the array of lines)
before inserting the newline characters
you can print each sentence in one line.
3c) Print double characters in parentheses "()".
For example, replace "arrived" by "a(rr)ived".
Exercises
4) Read the alice.txt file into an array. Chomp it. Using "join" concatenate it into one string. Then split it into words (or sentences) and print it one word (sentence) per line.
5) Write a script that takes an HTML source file as input and prints it so that a newline follows only "closing tags", i.e. tags that are of the form </...>.
6) Optional: Parsing web pages:
If a CGI script downloads a page from the web, it will retrieve the
HTML source code. Look at several HTML pages on the web.
Think about the following questions: How would you
extract information from them? How would you store that information in
an array? How would a script search for words
(or regular expressions) in web pages? Besides the fact that you don't
know yet how a CGI script can download pages from the web, have you
learned enough so far that you could write such a script?