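#!/usr/bin/env python3
import re
# read a date from the user
date = input("Please enter a date in the format mm.dd.yy ")
# one or two digits for month and day, exactly two digits for the year
keyword = re.compile(r"(\d\d?)\.(\d\d?)\.(\d\d)")
result = keyword.search(date)
if result:
    # the three parenthesized groups hold month, day and year
    print("Month:", result.group(1))
    print("Day:", result.group(2))
    print("Year:", result.group(3))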
2.2 Optional: Print all lines in the alice.txt file so that the first and the last character in each line are switched.
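A minimal sketch of one possible solution. It uses a regular expression with three groups (first character, middle, last character) rather than string slicing; the helper name `swap_ends` is a hypothetical choice, and the script assumes alice.txt is in the current directory. Lines with fewer than two characters are left unchanged, and the trailing newline stays in place because "." does not match "\n".

```python
#!/usr/bin/env python3
import os
import re

# group 1: first character, group 2: everything in between, group 3: last character.
# "." does not match "\n", and "$" also matches just before a trailing newline,
# so the newline at the end of a line is never moved.
ENDS = re.compile(r"^(.)(.*)(.)$")

def swap_ends(line):
    """Return line with its first and last characters switched (hypothetical helper)."""
    return ENDS.sub(r"\3\2\1", line)

if __name__ == "__main__":
    # assumption: alice.txt sits in the current directory
    if os.path.exists("alice.txt"):
        with open("alice.txt", "r") as file:
            for line in file:
                # each line keeps its own "\n", so suppress print's newline
                print(swap_ends(line), end="")
```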
Optional material/exercise:
2.3 Parentheses can also be used to match repeated substrings within a single regular expression. In this case, the groups are referred to as \1, \2, \3, and so on. For example, r"(.)\1" matches any character that occurs twice in a row. Note that this is different from r"..", which matches any two (possibly different) characters. Exercise: Print all lines in the alice.txt file that contain two double characters, i.e. two places where a character occurs twice in a row.
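A minimal sketch of the exercise, assuming alice.txt is in the current directory. The pattern r"(.)\1.*(.)\2" looks for one double character, then anything, then a second double character; the two pairs may use the same letter but may not overlap, so "aaa" on its own does not count. The helper name `has_two_doubles` is hypothetical. The contrast between r"(.)\1" and r".." is shown with `re.search`.

```python
#!/usr/bin/env python3
import os
import re

# (.)\1 : a character followed by the same character (a "double")
# .*    : anything in between
# (.)\2 : a second double; \2 refers back to the second group
TWO_DOUBLES = re.compile(r"(.)\1.*(.)\2")

def has_two_doubles(line):
    """True if line contains two non-overlapping double characters."""
    return TWO_DOUBLES.search(line) is not None

if __name__ == "__main__":
    # r"(.)\1" needs the same character twice; r".." accepts any two characters
    print(re.search(r"(.)\1", "abc"))         # prints None: no character repeats
    print(re.search(r"..", "abc").group())   # prints ab
    if os.path.exists("alice.txt"):  # assumption: file in the current directory
        with open("alice.txt", "r") as file:
            for line in file:
                if has_two_doubles(line):
                    print(line, end="")
```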
#!/usr/bin/env python3
import re
# open the file and read all of its lines
with open("alice.txt", "r") as file:
    text = file.readlines()
# compiling the regular expression:
keyword = re.compile(r"t")
# replacing every "t" with "T", line by line:
for line in text:
    # each line already ends in "\n", so suppress print's own newline
    print(keyword.sub("T", line), end="")
3.5 Modify your program from exercise 1.1 so that it deletes all HTML markup.
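The program from exercise 1.1 is not reproduced here, so this sketch assumes it simply reads a file and prints its contents; the file name page.html and the helper `strip_markup` are hypothetical. The regular expression <[^>]*> matches a "<", then any characters other than ">", then ">", i.e. one tag. Reading the whole file as one string, instead of line by line, also removes tags that span several lines. This is a rough approach: a ">" inside an attribute value or an HTML comment will confuse it, and Python's html.parser module handles such cases more reliably.

```python
#!/usr/bin/env python3
import os
import re

# a tag: "<", then any characters except ">", then ">"
# [^>] also matches "\n", so tags spanning several lines are caught too
TAG = re.compile(r"<[^>]*>")

def strip_markup(html):
    """Return html with every <...> tag deleted."""
    return TAG.sub("", html)

if __name__ == "__main__":
    path = "page.html"  # hypothetical input file name
    if os.path.exists(path):
        with open(path, "r") as file:
            # read the whole file at once so multi-line tags are removed as well
            print(strip_markup(file.read()), end="")
```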
#!/usr/bin/env python3
import re
# open the file and read all of its lines
with open("alice.txt", "r") as file:
    text = file.readlines()
# join all of the lines together using " " as glue
bigstring = " ".join(text)
# replace each newline, together with the white space around it, by a single space
keyword = re.compile(r"\s*\n\s*")
bigstring = keyword.sub(" ", bigstring)
# split bigstring wherever "." or "," occurs, dropping the white space after it
keyword = re.compile(r"[\.,]\s*")
text = keyword.split(bigstring)
for line in text:
    print(line)
4.2 Write a script that takes an HTML source file as input and prints it so that a newline follows only "closing tags", i.e. tags of the form </...>.
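A minimal sketch, assuming the whole file fits in memory; the file name page.html and the helper `break_after_closing_tags` are hypothetical. It reuses the \s*\n\s* pattern from the sentence-splitting program: first every existing line break, together with the surrounding white space, is replaced by a single space; then a newline is inserted after each tag of the form </...>.

```python
#!/usr/bin/env python3
import os
import re

LINE_BREAK = re.compile(r"\s*\n\s*")         # a newline plus surrounding white space
CLOSING_TAG = re.compile(r"(</[^>]*>)\s*")   # a closing tag plus any white space after it

def break_after_closing_tags(html):
    """Return html with newlines only after closing tags."""
    flat = LINE_BREAK.sub(" ", html)          # remove the original line breaks
    return CLOSING_TAG.sub(r"\1\n", flat)     # \1 is the closing tag itself

if __name__ == "__main__":
    path = "page.html"  # hypothetical input file name
    if os.path.exists(path):
        with open(path, "r") as file:
            print(break_after_closing_tags(file.read()), end="")
```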
4.3 Optional: Parsing web pages:
If a CGI script downloads a page from the web, it retrieves the
HTML source code. Look at the source of several HTML pages on the web.
Think about the following questions: How would you
extract information from them? How would you store that information in
an array (in Python, a list)? How would a script search for words
(or regular expressions) in web pages? Apart from downloading pages,
which we have not covered yet, have you learned enough so far to
write such a script?
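These questions can be explored without downloading anything by working on HTML that is already in a string. The sketch below uses a made-up sample page (SAMPLE_HTML and all helper names are hypothetical) and only the regular-expression tools from this sheet: it extracts the title and the link targets, stores the page's words in a list, and searches the text for a pattern. Regular expressions are a rough tool for HTML; for real pages, Python's html.parser module is more robust, and urllib.request.urlopen can fetch a page.

```python
#!/usr/bin/env python3
import re

# a small, made-up page standing in for downloaded HTML source
SAMPLE_HTML = """<html><head><title>Alice</title></head>
<body><p>Alice was beginning to get very tired.</p>
<a href="chapter1.html">Chapter 1</a>
<a href="chapter2.html">Chapter 2</a></body></html>"""

TAG = re.compile(r"<[^>]*>")

def extract_title(html):
    """Return the text between <title> and </title>, or None."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None

def extract_links(html):
    """Return a list of all href="..." targets, in page order."""
    return re.findall(r'href="([^"]*)"', html, re.IGNORECASE)

def page_text(html):
    """Replace every tag by a space, leaving only the visible text."""
    return TAG.sub(" ", html)

def page_words(html):
    """Store the words of the page text in a list."""
    return re.findall(r"\w+", page_text(html))

def find_in_page(html, pattern):
    """Return every match of pattern in the page text (tags excluded)."""
    return re.findall(pattern, page_text(html))

if __name__ == "__main__":
    print(extract_title(SAMPLE_HTML))
    print(extract_links(SAMPLE_HTML))
    print(page_words(SAMPLE_HTML))
    print(find_in_page(SAMPLE_HTML, r"Chapter \d"))
```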