I have posted homework 6 here.
It is due this coming Sunday at 11:59 PM.
Let's look at the answers to Quiz 4.
The Midterm exam for this course will be held on Tuesday, October 22nd.
The exam will be given in this room.
It will consist of questions like those on the quizzes along with questions asking you to write short segments of Python code.
60% of the points on this exam will consist of questions from the Graded Quizzes.
There will be 15 of these questions worth 4 points each.
The other 40% of points will come from four questions that ask you to write a short segment of code.
Each of the code questions is worth 10 points.
To study for the code questions you should know
A good way to study for the code questions is to review the Class Exercises and homework solutions.
The last class before the exam, Thursday, October 17th, will be a review session.
You will only be responsible for the material in the Class Notes for that class on the exam.
You will find the Midterm review Class Notes here.
If for some reason you cannot take the exam on the date mentioned above, you must contact me to make alternate arrangements.
The Midterm is given on paper.
I scan each exam paper and upload the scans to Gradescope.
I score the exam on Gradescope.
You will get an email from Gradescope with your score when I am done.
The Midterm is a closed book exam.
You are not allowed to use any resource, other than what is in your head, while taking the exam.
Cheating on the exam will result in a score of 0 and will be reported to the Administration.
Remember your Oath of Honesty.
To prevent cheating, certain rules will be enforced during the exam.
Are there any questions before I begin?
import sys
print("sys.argv:", sys.argv)
$ ./argv.py foo bar bletch
sys.argv: ['./argv.py', 'foo', 'bar', 'bletch']
import sys

for index in range(len(sys.argv)):
    # print the position of each argument followed by the argument itself
    print(str(index) + ":", sys.argv[index])
$ ./args.py arg1 arg2 arg3
0: ./args.py
1: arg1
2: arg2
3: arg3
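The loop above indexes sys.argv with range(len(...)). As a sketch of a more idiomatic alternative (not the version used in class), the built-in enumerate yields each index together with its value; format_args is a hypothetical helper introduced so the loop can be tested without a command line.

```python
import sys


def format_args(argv):
    """Return one "index: value" string for each command line argument."""
    # enumerate yields (index, value) pairs, so no range(len(...)) is needed
    return [f"{index}: {arg}" for index, arg in enumerate(argv)]


if __name__ == "__main__":
    for line in format_args(sys.argv):
        print(line)
```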
import os
import sys

dir = sys.argv[1]
file_count = 0
for entry in os.listdir(dir):
    # listdir returns bare names, so join each one to dir before testing it
    if os.path.isfile(os.path.join(dir, entry)):
        file_count += 1
print("There are", file_count, "files in", dir)
$ ./count_files_1.py .
There are 3 files in .
$ ./count_files_1.py
Traceback (most recent call last):
File "./count_files_1.py", line 7, in <module>
dir = sys.argv[1]
IndexError: list index out of range
import os
import sys

if len(sys.argv) < 2:
    print("Usage:", sys.argv[0], "DIRECTORY")
    sys.exit()
dir = sys.argv[1]
file_count = 0
for entry in os.listdir(dir):
    # listdir returns bare names, so join each one to dir before testing it
    if os.path.isfile(os.path.join(dir, entry)):
        file_count += 1
print("There are", file_count, "files in", dir)
$ ./count_files_2.py
Usage: ./count_files_2.py DIRECTORY
$ example_code_it117/os_sys_example_code/count_files_2.py
Usage: example_code_it117/os_sys_example_code/count_files_2.py DIRECTORY
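os.listdir returns bare names, not full paths, so when the directory is not the current one each name has to be joined to the directory before os.path.isfile can test it. A sketch that packages the count as a function; count_files is a hypothetical name, not part of the course code.

```python
import os


def count_files(directory):
    """Count the regular files (not subdirectories) directly inside directory."""
    file_count = 0
    for entry in os.listdir(directory):
        # join the bare name to the directory; otherwise isfile()
        # would look for it in the current working directory
        if os.path.isfile(os.path.join(directory, entry)):
            file_count += 1
    return file_count
```

In a script it would be called after the usage check, for example print("There are", count_files(dir), "files in", dir).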
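The Unix basename command prints only the last component of a path:
$ basename /courses/it117/f21/ghoffman
ghoffman
The os.path.basename function does the same thing in Python: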
if len(sys.argv) < 2:
    print("Usage:", os.path.basename(sys.argv[0]), "DIRECTORY")
    sys.exit()
$ example_code_it117/os_sys_example_code/count_files_3.py
Usage: count_files_3.py DIRECTORY
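A small sketch wrapping os.path.basename in a hypothetical usage_message helper, so the result can be checked without running the script from another directory:

```python
import os


def usage_message(script_path):
    """Build a usage line that shows only the script's file name."""
    # os.path.basename drops every directory component from the path
    return "Usage: " + os.path.basename(script_path) + " DIRECTORY"


print(usage_message("example_code_it117/os_sys_example_code/count_files_3.py"))
```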
Remember to run dos2unix on hw6.py.
In Unix, grep with two arguments (a pattern and a file) prints the lines of the file that match the pattern.
The search function in the re module can do something similar.
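A minimal sketch of the grep idea in Python, assuming the lines are already available as a list (an open file works too, since iterating over it yields lines); grep_lines is a hypothetical name, and real grep has many options this ignores.

```python
import re


def grep_lines(pattern, lines):
    """Return the lines that contain a match for pattern, like grep PATTERN FILE."""
    pattern_object = re.compile(pattern)
    # search() looks for the pattern anywhere in each line
    return [line for line in lines if pattern_object.search(line)]


sample = ["A man, a plan", "a canal", "Panama"]
print(grep_lines("man", sample))
```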
>>> import re
>>> match = re.search("man", "A man, a plan, a canal. Panama")
>>> print(match)
<_sre.SRE_Match object; span=(2, 5), match='man'>
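search returns a match object when it finds the pattern and None when it does not, so code should test the result before using it. The sketch below also uses the match object's span and group methods. In Python 3.7 and later the object prints as <re.Match object; ...> rather than <_sre.SRE_Match object; ...>.

```python
import re

text = "A man, a plan, a canal. Panama"
match = re.search("man", text)
# a match object records where the match was found and what matched
print(match.span())   # (2, 5)
print(match.group())  # man
# search() returns None when there is no match
print(re.search("xxx", text))  # None
```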
import re

def regex_test(regular_expression, line):
    pattern_object = re.compile(regular_expression)
    match_object = pattern_object.search(line)
    if match_object:
        print("Regular expression:", regular_expression)
        print("Matches:", line)
    else:
        print("Regular expression:", regular_expression)
        print("Does NOT match", line)
>>> regex_test("man", "A man, a plan, a canal, Panama")
Regular expression: man
Matches: A man, a plan, a canal, Panama
>>> regex_test("xxx", "A man, a plan, a canal, Panama")
Regular expression: xxx
Does NOT match A man, a plan, a canal, Panama
These characters are metacharacters, which have a special meaning in a regular expression:
. ^ $ * + ? { } [ ] \ | ( )
>>> regex_test("th.n", "And then I went home")
Regular expression: th.n
Matches: And then I went home
>>> regex_test("th.n", "I am better than you")
Regular expression: th.n
Matches: I am better than you
>>> regex_test("th.n", "I wish I were thinner")
Regular expression: th.n
Matches: I wish I were thinner
>>> regex_test("t..n", "And then I went home")
Regular expression: t..n
Matches: And then I went home
>>> regex_test("t..n", "Is there a taint of scandal?")
Regular expression: t..n
Matches: Is there a taint of scandal?
>>> regex_test("t*n", "1234 tttttn abcd")
Regular expression: t*n
Matches: 1234 tttttn abcd
>>> regex_test("t*n", "1234 n abcd")
Regular expression: t*n
Matches: 1234 n abcd
>>> regex_test("t.*n", "abcd tan efg")
Regular expression: t.*n
Matches: abcd tan efg
>>> regex_test("t.*n", "xx the zzn")
Regular expression: t.*n
Matches: xx the zzn
>>> regex_test("ab+c", "xxx abccccc yyy")
Regular expression: ab+c
Matches: xxx abccccc yyy
>>> regex_test("ab+c", "xxx abbbbbccccc zzz")
Regular expression: ab+c
Matches: xxx abbbbbccccc zzz
>>> regex_test("ab+c", "xxx accccc zzz")
Regular expression: ab+c
Does NOT match xxx accccc zzz
>>> regex_test("ab*c", "xxx accccc zzz")
Regular expression: ab*c
Matches: xxx accccc zzz
>>> regex_test("ab?c", "qqq abc jjj")
Regular expression: ab?c
Matches: qqq abc jjj
>>> regex_test("ab?c", "123 ac 456")
Regular expression: ab?c
Matches: 123 ac 456
>>> regex_test("ab?c", "786 abbc vvv")
Regular expression: ab?c
Does NOT match 786 abbc vvv
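To compare the three quantifiers side by side, a sketch that runs each pattern against the same strings; quantifier_results is a hypothetical helper:

```python
import re


def quantifier_results(patterns, texts):
    """Map each (pattern, text) pair to True if search() finds a match."""
    return {(pattern, text): bool(re.search(pattern, text))
            for pattern in patterns for text in texts}


# * means zero or more, + means one or more, ? means zero or one
results = quantifier_results(["ab*c", "ab+c", "ab?c"], ["ac", "abc", "abbc"])
for (pattern, text), matched in results.items():
    print(pattern, "matches" if matched else "does NOT match", text)
```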
>>> regex_test(r"a\+b", "345 a+bcde")
Regular expression: a\+b
Matches: 345 a+bcde
>>> regex_test("a+b", "906 a+bcde")
Regular expression: a+b
Does NOT match 906 a+bcde
>>> regex_test(r"a\+\+\+b", "567 a+++bcde")
Regular expression: a\+\+\+b
Matches: 567 a+++bcde
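Typing a backslash before every metacharacter gets tedious. As an alternative not shown in class, re.escape adds the backslashes for you, which is handy when the text to find comes from user input:

```python
import re

literal = "a+++b"
# re.escape puts a backslash in front of each metacharacter,
# so the string is matched exactly as written
pattern = re.escape(literal)
print(pattern)                                 # a\+\+\+b
print(bool(re.search(pattern, "567 a+++bcde")))  # True
```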
>>> regex_test(r"\d", "1234")
Regular expression: \d
Matches: 1234
>>> regex_test(r"\d*a", "1234abc")
Regular expression: \d*a
Matches: 1234abc
>>> regex_test(r"\D", "1a234")
Regular expression: \D
Matches: 1a234
>>> regex_test(r"\w", "---a------------")
Regular expression: \w
Matches: ---a------------
>>> regex_test(r"\w+", "---1234abc------")
Regular expression: \w+
Matches: ---1234abc------
>>> regex_test(r"\W+", "###")
Regular expression: \W+
Matches: ###
>>> regex_test(r"a\sb", "----a b----")
Regular expression: a\sb
Matches: ----a b----
>>> regex_test(r"\S+", "abcd")
Regular expression: \S+
Matches: abcd
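When you need every match rather than just the first, re.findall returns them all as a list of strings. The sketch also uses raw strings (r"..."), which keep backslashes as typed, so a pattern like r"\d" reaches the re module unchanged instead of being treated as a Python string escape. The sample text is made up for illustration.

```python
import re

text = "Room 101, ext 4567\tcall now"
# a raw string keeps the backslash: r"\d" is a backslash followed by d
digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)
spaces = re.findall(r"\s", text)
print(digits)  # ['101', '4567']
print(words)   # ['Room', '101', 'ext', '4567', 'call', 'now']
print(spaces)  # [' ', ' ', ' ', '\t', ' ']
```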
205.236.184.72
>>> regex_test(r"\d+", "205.236.184.72")
Regular expression: \d+
Matches: 205.236.184.72
>>> regex_test(r"\d+\.", "205.236.184.72")
Regular expression: \d+\.
Matches: 205.236.184.72
>>> regex_test(r"\d+\.\d+\.", "205.236.184.72")
Regular expression: \d+\.\d+\.
Matches: 205.236.184.72
>>> regex_test(r"\d+\.\d+\.\d+\.\d+", "205.236.184.72")
Regular expression: \d+\.\d+\.\d+\.\d+
Matches: 205.236.184.72
205.236.184.72 - - [09/Mar/2014:00:03:21 +0000] "GET /wzbc-2014-03-05-14-00.mp3 HTTP/1.1" 200 56810323
205.236.184.101 - - [09/Mar/2014:00:03:21 +0000] "GET /wzbc-2014-03-05-14-00.mp3 HTTP/1.1" 200 56810323
\d+\.\d+\.\d+\.\d+.*GET
(\d+\.\d+\.\d+\.\d+).*GET
>>> pattern_object = re.compile(r"(\d+\.\d+\.\d+\.\d+).*GET")
>>> match_object = pattern_object.search('205.236.184.101 - - [09/Mar/2014:00:03:21 +0000] "GET /wzbc-2014-03-05-14-00.mp3 HTTP/1.1" 200 56810323')
>>> match_object
<_sre.SRE_Match object; span=(0, 53), match="205.236.184.101 - - [09/Mar/2014:00:03:21 +0000]
>>> match_object.group(1)
'205.236.184.101'
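group(1) is only one of the ways to read a match object. A sketch using the same log line, showing that group(0) is the whole matched text and groups() returns all the parenthesized parts as a tuple:

```python
import re

line = ('205.236.184.101 - - [09/Mar/2014:00:03:21 +0000] '
        '"GET /wzbc-2014-03-05-14-00.mp3 HTTP/1.1" 200 56810323')
match_object = re.search(r"(\d+\.\d+\.\d+\.\d+).*GET", line)
# group(0) is everything the pattern matched, up to and including GET
print(match_object.group(0))
# group(1) is the text inside the first pair of parentheses
print(match_object.group(1))  # 205.236.184.101
print(match_object.groups())  # ('205.236.184.101',)
```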
199.21.99.114 - - [23/Mar/2014:00:05:14 +0000] "GET /wzbc-2014-03-20-00-00.m3u HTTP/1.1" 200 102 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
108.121.132.248 - - [23/Mar/2014:00:07:46 +0000] "GET / HTTP/1.1" 200 76437 "http://www.bc.edu/bc_org/svp/st_org/wzbc/" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
108.121.132.248 - - [23/Mar/2014:00:07:47 +0000] "GET /favicon.ico HTTP/1.1" 200 1150 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
151.203.239.216 - - [23/Mar/2014:00:03:03 +0000] "GET /wzbc-2014-03-16-13-00.mp3 HTTP/1.1" 206 20035340 "-" "NSPlayer/12.00.7601.17514 WMFSDK/12.00.7601.17514"
82.193.99.33 - - [23/Mar/2014:00:07:49 +0000] "GET / HTTP/1.1" 200 76437 "http://carsdined.org" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; XMPP Tiscali Communicator v.10.0.2; .NET CLR 2.0.50727)"
82.193.99.33 - - [23/Mar/2014:00:07:50 +0000] "GET / HTTP/1.1" 200 76437 "http://carsdined.org" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; XMPP Tiscali Communicator v.10.0.2; .NET CLR 2.0.50727)"
82.193.99.33 - - [23/Mar/2014:00:07:51 +0000] "GET / HTTP/1.1" 200 76437 "http://carsdined.org" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; XMPP Tiscali Communicator v.10.0.2; .NET CLR 2.0.50727)"
108.121.132.248 - - [23/Mar/2014:00:08:07 +0000] "GET /wzbc-2014-03-22-11-00.m3u HTTP/1.1" 200 102 "http://zbconline.com/" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
183.60.213.106 - - [23/Mar/2014:00:11:35 +0000] "GET /wzbc-2014-02-15-19-00.m3u HTTP/1.1" 404 301 "-" "Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)"
199.21.99.114 - - [23/Mar/2014:00:12:55 +0000] "GET /wzbc-2014-03-06-22-00.m3u HTTP/1.1" 404 305 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
>>> file = open("apache_log.txt", "r")
>>> pattern_object = re.compile(r"(\d+\.\d+\.\d+\.\d+).*GET")
>>> for line in file:
...     match_object = pattern_object.search(line)
...     if match_object:
...         ip_address = match_object.group(1)
...         print(ip_address)
...
199.21.99.114
108.121.132.248
108.121.132.248
151.203.239.216
82.193.99.33
82.193.99.33
82.193.99.33
108.121.132.248
183.60.213.106
199.21.99.114
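Building on the loop above, a sketch that counts how many matching requests each IP address made, using collections.Counter from the standard library (not used in the notes); count_ip_addresses is a hypothetical name:

```python
import re
from collections import Counter


def count_ip_addresses(lines):
    """Count how many GET requests each IP address made."""
    pattern_object = re.compile(r"(\d+\.\d+\.\d+\.\d+).*GET")
    counts = Counter()
    for line in lines:
        match_object = pattern_object.search(line)
        if match_object:
            # group(1) is the IP address captured by the parentheses
            counts[match_object.group(1)] += 1
    return counts
```

Passing an open file, for example count_ip_addresses(open("apache_log.txt", "r")), works because iterating over a file yields its lines; calling most_common() on the result then lists the addresses from busiest to least busy.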
#! /usr/bin/python3
# tests regular expressions and returns
# the first group if it can
import re
import os
import sys

def regex_match_with_group(regular_expression, line):
    pattern_object = re.compile(regular_expression)
    match_object = pattern_object.search(line)
    if match_object:
        try:
            return_string = match_object.group(1)
            print("regular expression:", regular_expression)
            print("matches:", line)
            print("returns:", return_string)
        except IndexError:
            # group(1) raises IndexError when the pattern has no parentheses
            print("Match found but no substring returned")
    else:
        print("No match")

if len(sys.argv) < 3:
    print("Usage:", os.path.basename(sys.argv[0]), "REGULAR_EXPRESSION STRING_TO_MATCH")
    sys.exit()
regex = sys.argv[1]
line = sys.argv[2]
regex_match_with_group(regex, line)
Why do we need the try/except statement?
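A pattern can match even when it has no parentheses, and in that case match_object.group(1) raises an IndexError; the try/except catches it so the script prints a message instead of crashing. The same idea as a reusable function that returns None instead of printing; first_group is a hypothetical name:

```python
import re


def first_group(regular_expression, line):
    """Return group 1 of the first match, or None if there is no match or no group."""
    match_object = re.search(regular_expression, line)
    if match_object is None:
        return None
    try:
        return match_object.group(1)
    except IndexError:
        # the pattern matched but has no parenthesized group
        return None


print(first_group(r"(\d\d\d)", "123456789"))  # 123
print(first_group("abc", "xxabcxx"))          # None
```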
$ ./regex_test_with_group.py "(\d\d\d)" 123456789
regular expression: (\d\d\d)
matches: 123456789
returns: 123
$ ./regex_test_with_group.py "(\d{5})" 123456789
regular expression: (\d{5})
matches: 123456789
returns: 12345
$ ./regex_test_with_group.py "(\w{6})" abcdefghijk
regular expression: (\w{6})
matches: abcdefghijk
returns: abcdef
$ ./regex_test_with_group.py "\d+(\d\s{4}\w{2})" "12345    abcdefghijk"
regular expression: \d+(\d\s{4}\w{2})
matches: 12345    abcdefghijk
returns: 5    ab
$ ./regex_test_with_group.py "(b{3})" "---bbbbbbbbb---"
regular expression: (b{3})
matches: ---bbbbbbbbb---
returns: bbb
$ ./regex_test_with_group.py "(\d{2,5})" "---12---------"
regular expression: (\d{2,5})
matches: ---12---------
returns: 12
$ ./regex_test_with_group.py "(\d{2,5})" "---12345---------"
regular expression: (\d{2,5})
matches: ---12345---------
returns: 12345
$ ./regex_test_with_group.py "(\d{2,})" "---12---------"
regular expression: (\d{2,})
matches: ---12---------
returns: 12
$ ./regex_test_with_group.py "(\d{2,})" "---123456---------"
regular expression: (\d{2,})
matches: ---123456---------
returns: 123456
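$ ./regex_test_with_group.py "(\d{,3})" "123456---------"
regular expression: (\d{,3})
matches: 123456---------
returns: 123
$ ./regex_test_with_group.py "(\d{,3})" "12---------"
regular expression: (\d{,3})
matches: 12---------
returns: 12
Because {,3} also allows zero digits, the pattern can match an empty string: on a string that does not begin with a digit, such as "---123456---------", search() succeeds at position 0 and returns an empty group.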
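A sketch that checks each brace form against the same strings with re.fullmatch, which requires the whole string to match (it is covered at the end of these notes); full_matches is a hypothetical helper. The last line shows the zero-repetition case of {,3}.

```python
import re


def full_matches(pattern, texts):
    """Return the texts that pattern matches from beginning to end."""
    return [text for text in texts if re.fullmatch(pattern, text)]


texts = ["1", "123", "12345", "1234567"]
# {m} exactly m, {m,n} from m to n, {m,} at least m, {,n} at most n
for pattern in [r"\d{3}", r"\d{2,5}", r"\d{2,}", r"\d{,3}"]:
    print(pattern, full_matches(pattern, texts))

# {,3} allows zero digits, so search() succeeds with an empty match at position 0
print(repr(re.search(r"(\d{,3})", "---123456").group(1)))  # ''
```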
$ ./regex_test_with_group.py "([abc])" bdewrosdf
regular expression: ([abc])
matches: bdewrosdf
returns: b
$ ./regex_test_with_group.py "([abc]+)" bcaewrosdf
regular expression: ([abc]+)
matches: bcaewrosdf
returns: bca
$ ./regex_test_with_group.py "([abc]{2})" bcaewrosdf
regular expression: ([abc]{2})
matches: bcaewrosdf
returns: bc
A class such as [abcdefghijklmnopqrstuvwxyz] can be written more compactly as the range [a-z].
$ ./regex_test_with_group.py "([a-d]+)" ---------bacdnmonpn--------
regular expression: ([a-d]+)
matches: ---------bacdnmonpn--------
returns: bacd
$ ./regex_test_with_group.py "([a-dm-p]+)" ---------bacdnmonpn--------
regular expression: ([a-dm-p]+)
matches: ---------bacdnmonpn--------
returns: bacdnmonpn
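re.findall makes it easy to see what each character class picks out of the same string; this sketch reuses the string from the examples above:

```python
import re

text = "---------bacdnmonpn--------"
# [a-z] is shorthand for [abcdefghijklmnopqrstuvwxyz]
print(re.findall("[a-z]+", text))     # ['bacdnmonpn']
print(re.findall("[a-d]+", text))     # ['bacd']
print(re.findall("[m-p]+", text))     # ['nmonpn']
print(re.findall("[a-dm-p]+", text))  # ['bacdnmonpn']
```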
$ ./regex_test_with_group.py "\W*([e-a]+)" ---------bacdnmonpn--------
Traceback (most recent call last):
File "./regex_test_with_group.py", line 17, in <module>
pattern_object = re.compile( regex )
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/re.py", line 224, in compile
return _compile(pattern, flags)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/re.py", line 293, in _compile
p = sre_compile.compile(pattern, flags)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/sre_compile.py", line 536, in compile
p = sre_parse.parse(p, flags)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/sre_parse.py", line 829, in parse
p = _parse_sub(source, pattern, 0)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/sre_parse.py", line 437, in _parse_sub
itemsappend(_parse(source, state))
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/sre_parse.py", line 778, in _parse
p = _parse_sub(source, state)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/sre_parse.py", line 437, in _parse_sub
itemsappend(_parse(source, state))
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/sre_parse.py", line 575, in _parse
raise source.error(msg, len(this) + 1 + len(that))
sre_constants.error: bad character range e-a at position 5
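A script that accepts a regular expression from the command line could catch this error instead of showing a traceback. A sketch, assuming the hypothetical name safe_compile; re.error is the exception class the re module raises for an invalid pattern:

```python
import re


def safe_compile(regular_expression):
    """Compile a regular expression, or return None if it is invalid."""
    try:
        return re.compile(regular_expression)
    except re.error as error:
        # the message describes the problem, such as a backwards range like e-a
        print("Bad regular expression:", error)
        return None
```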
>>> pattern_object = re.compile(r"^(\d)")
>>> match_object = pattern_object.search("123456789")
>>> if match_object:
...     print(match_object.group(1))
...
1
>>> pattern_object = re.compile(r"(\d)$")
>>> match_object = pattern_object.search("123456789")
>>> if match_object:
...     print(match_object.group(1))
...
9
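Using ^ and $ together forces the pattern to cover the whole string. A sketch with made-up test strings; one subtlety not covered in the notes is that, without re.MULTILINE, $ also matches just before a newline at the end of the string, which matters when matching lines read from a file.

```python
import re

# ^ anchors the match at the start and $ at the end, so together
# they require the whole string to consist of digits
pattern_object = re.compile(r"^\d+$")
for text in ["123456789", "12345x789", "123456789\n"]:
    print(repr(text), bool(pattern_object.search(text)))
```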
$ ./regex_test_with_group.py "([5-9]+)" 987654321
regular expression: ([5-9]+)
matches: 987654321
returns: 98765
$ ./regex_test_with_group.py "([^5-9]+)" asdfasd123456789
regular expression: ([^5-9]+)
matches: asdfasd123456789
returns: asdfasd1234
$ ./regex_test_with_group.py "(Red|Blue)" "Red Sox"
regular expression: (Red|Blue)
matches: Red Sox
returns: Red
$ ./regex_test_with_group.py "(Sox|Ducks)" "Red Sox"
regular expression: (Sox|Ducks)
matches: Red Sox
returns: Sox
<td>Class 4</td> <td>February 6th</td>
$ ./regex_test_with_group.py "<td>(.*)</td>" "<td>Class 4</td> <td>February 6th</td>"
regular expression: <td>(.*)</td>
matches: <td>Class 4</td> <td>February 6th</td>
returns: Class 4</td> <td>February 6th
$ ./regex_test_with_group.py "<td>(.*?)</td>" "<td>Class 4</td> <td>February 6th</td>"
regular expression: <td>(.*?)</td>
matches: <td>Class 4</td> <td>February 6th</td>
returns: Class 4
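re.findall with the non-greedy pattern pulls out every cell in one call; the greedy version is shown for contrast:

```python
import re

row = "<td>Class 4</td> <td>February 6th</td>"
# .*? stops at the first </td>, so each cell is returned separately
cells = re.findall(r"<td>(.*?)</td>", row)
# the greedy .* runs on to the last </td> instead
greedy = re.findall(r"<td>(.*)</td>", row)
print(cells)   # ['Class 4', 'February 6th']
print(greedy)  # ['Class 4</td> <td>February 6th']
```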
search is one of three methods on this object that look for a match:
match()
search()
fullmatch()
match() checks for a match only at the beginning of the string.
search() checks for a match anywhere in the string.
fullmatch() checks whether the entire string is a match.
search() is the most forgiving, match() is fussier, and fullmatch() is the most stringent.
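A sketch comparing the three methods on a few strings; compare_methods is a hypothetical helper that reports whether match(), search() and fullmatch() each succeed:

```python
import re


def compare_methods(pattern, text):
    """Return (match succeeded, search succeeded, fullmatch succeeded)."""
    pattern_object = re.compile(pattern)
    return (bool(pattern_object.match(text)),
            bool(pattern_object.search(text)),
            bool(pattern_object.fullmatch(text)))


for text in ["123", "123abc", "abc123"]:
    print(repr(text), compare_methods(r"\d+", text))
```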