IT 117: Intermediate Scripting
Class 11
Tips and Examples
Review
New Material
Graded Quiz
You can connect to Gradescope to take the weekly graded quiz
today during the last 15 minutes of the class.
Once you start the quiz you have 15 minutes to finish it.
You can only take this quiz today.
There is no makeup for the weekly quiz because Gradescope does not permit it.
Solution to Homework 4
I have posted a solution to homework 4
here.
Let's take a look.
Homework 6
I have posted homework 6
here.
It is due this coming Sunday at 11:59 PM.
Midterm
The Midterm exam for this course will be held on Tuesday, October 25th.
The exam will be given in this room.
It will consist of questions like those on the quizzes along with questions
asking you to write short segments of Python code.
60% of the points on this exam will consist of questions from the Ungraded Class
Quizzes.
The other 40% will come from four questions that ask you to write a short segment
of code.
The last class before the exam, Thursday, October 20th, will be a review session.
You will only be responsible for the material in the Class Notes for that class
on the exam.
You will find the Midterm review Class Notes
here.
The Midterm is given on paper.
I scan each exam paper and upload the scans to Gradescope.
I score the exam on Gradescope.
You will get an email from Gradescope with your score when I am done.
The Midterm is a closed book exam.
You are not allowed to use any resource other than what is in your head while taking the exam.
Cheating on the exam will result in a score of 0 and will be reported to the Administration.
Remember your Oath of Honesty.
To prevent cheating, certain rules
will be enforced during the exam.
Tips and Examples
Making a Script Executable
- All scripts submitted for this course must be
executable
- Or you will lose points
- You make a file executable by doing two things
- The hashbang line must be the very first line of the script
- The first two characters of this first line must be #!
- This must be followed by the absolute pathname of the Python 3 interpreter
- For this course the hashbang line must be
#! /usr/bin/python3
- You can make a file executable by running the following Unix
command on it
chmod 755 FILENAME
- FILENAME is a placeholder
- You replace this with the name of the file you are making
executable
- If I wanted to make the file hw2.py
executable, I would write the following on the Unix
command line
chmod 755 hw2.py
- You can also change the permissions in FileZilla
- Connect to pe15 using FileZilla
- Go to the homework directory for the assignment
- Right-click on the homework script
- Drag down to "Permissions" in the menu that appears
- Enter 755
Getting Your First IT Job
- Students sometimes come to me asking for advice on how to get an IT
job
- I have not had a job in industry for many years
- So I usually refer them to
Career Services
- They provide a web site called
Handshake
where companies can find interns
- In IT, as in many fields, your first job is the stepping stone
to your career in the field
- But it can be hard to find an internship or a job
when you have no IT experience
- If this is a situation you face, there are things
you can do
- If you have a job, check with your current employer
- They must have computers somewhere and maybe you can
help keep them running
- If they have an IT Department ask if there is something you
can do for them in your free time
- You can do something similar with a local organization
like a church, temple, mosque or youth group
- They probably use a computer for some of the work they do
or perhaps they need a web page
- Volunteer to do some IT work for them in return for a letter
talking about the work you did
- You can cite this work in your resume
- Another place to find volunteer opportunities is
Volunteer Match
- They have virtual opportunities in many different areas
- Or perhaps you can find some open source project that needs
help
- The Free Software Foundation
is based in Boston and often needs volunteers
- The economy goes through cycles and sometimes jobs are hard
to find
- Just get in the habit of constantly looking for opportunities
- But above all don't give up
Review
Working with the Operating System
- Certain operations can only be performed by the operating system
- For example
- Creating files
- Renaming files
- Deleting files
- Creating directories
- All of the things you can do at the command line
- Can be done within Python
- The Python interpreter can ask the operating system to perform these tasks for
you
The os Module
- When you need the operating system to do something
- Use Python's os module
- Of course you must import it first
>>> import os
- Whenever you need to do something with a file, other than reading or writing
- You need the os module
os.getcwd()
os.listdir(path)
os.chdir(path)
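- Here is a minimal sketch of the three functions listed above working together
- It runs in a temporary scratch directory, and the file name notes.txt is made up for the demonstration

```python
import os
import tempfile

# Remember where we started so we can come back
start_dir = os.getcwd()

# Make a scratch directory that is deleted when the with block ends
with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)                  # move into the scratch directory
    open("notes.txt", "w").close()     # create an empty file (made-up name)
    entries = os.listdir(".")          # list the current directory
    print("Entries:", entries)
    os.chdir(start_dir)                # go back before the directory is deleted

print("Back in:", os.getcwd())
```

- Returning to the starting directory before the scratch directory is removed keeps the script from sitting in a directory that no longer exists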
os.rename(old_name, new_name)
- You can change the name of a file with os.rename
>>> os.chdir("/home/ghoffmn/tmp")
>>> os.listdir(".")
['test.txt', 'dir1']
>>> os.rename("test.txt", "file.txt")
>>> os.listdir(".")
['dir1', 'file.txt']
- os.rename() also works on directories
>>> os.rename("dir1", "test_dir")
>>> os.listdir(".")
['test_dir', 'file.txt']
os.remove(path)
- To delete a file use os.remove()
>>> os.remove("file.txt")
>>> os.listdir(".")
['test_dir']
- os.remove() does not work on directories
>>> os.remove("test_dir")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 21] Is a directory: 'test_dir'
os.rmdir(path)
os.mkdir(path)
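- As a sketch, here are os.mkdir() and os.rmdir() used together
- The directory name test_dir is only an example, and the work is done inside a temporary directory so nothing is left behind

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    new_dir = os.path.join(scratch, "test_dir")   # example directory name
    os.mkdir(new_dir)                             # create the directory
    created = os.path.isdir(new_dir)
    os.rmdir(new_dir)                             # delete it; it must be empty
    removed = not os.path.exists(new_dir)

print("Created:", created)
print("Removed:", removed)
```

- os.rmdir() only removes a directory that is empty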
Running Unix Commands within Python
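- The review does not repeat which function is used here
- One option in the os module is os.system(), so the sketch below assumes that is what is meant
- The echo command is just a harmless example

```python
import os

# os.system() hands the string to the shell to run
# The value returned is the exit status of the command
status = os.system("echo hello from the shell")
print("Exit status:", status)
```

- On Unix an exit status of 0 means the command succeeded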
os.environ
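- A short sketch of reading and setting environment variables through os.environ
- DEMO_VAR is a made-up name used only for this example

```python
import os

# os.environ behaves like a dictionary of environment variables
# Using .get() avoids a KeyError if the variable is not set
path_value = os.environ.get("PATH", "")
print("PATH has", len(path_value.split(os.pathsep)), "entries")

# Setting a value changes the environment of this Python process
os.environ["DEMO_VAR"] = "hello"
print("DEMO_VAR:", os.environ["DEMO_VAR"])
```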
The os.path Module
- The os.path module contains some functions
which operate on pathnames
- It is part of the os module and does not
have to be imported
- If you have already imported os
os.path.isfile(path) and os.path.isdir(path)
os.path.basename(path)
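- The three os.path functions named above can be tried out like this
- This is only a sketch, and file.txt is an example name created in a temporary directory

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    file_path = os.path.join(scratch, "file.txt")   # example file name
    open(file_path, "w").close()                    # create an empty file

    file_is_file = os.path.isfile(file_path)   # is it a regular file?
    file_is_dir = os.path.isdir(file_path)     # is it a directory?
    dir_is_dir = os.path.isdir(scratch)        # the scratch directory itself
    name = os.path.basename(file_path)         # last part of the pathname

print(file_is_file, file_is_dir, dir_is_dir, name)
```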
The sys Module
- Python scripts run inside two environments
- The operating system
- The Python interpreter
- The sys module contains variables and functions
that interact with the Python interpreter
- You must import the sys module before you can use it
>>> import sys
Getting Values from the Command Line
- A script can get the values it needs from the command line
- The sys module contains the variable
argv
- sys.argv is a list variable
that contains all the command line arguments
- As well as the pathname used to run the program
$ cat argv.py
#! /usr/bin/python3
import sys
print('sys.argv:', sys.argv)
$ ./argv.py foo bar blecth
sys.argv: ['./argv.py', 'foo', 'bar', 'blecth']
Leaving a Running Script
- You can leave a script before the end of the code
by using the sys.exit() function
- Why would you want to do this?
- There are many reasons
- The most common is when you encounter an error that prevents the script from
proceeding
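- Here is a sketch of leaving a script when an error is found
- The function name require_directory and the path are made up for illustration
- Passing a number to sys.exit() is optional; by convention a non-zero value signals a problem to the shell

```python
import os
import sys

def require_directory(path):
    """Leave the script if path is not a directory."""
    if not os.path.isdir(path):
        print("Error:", path, "is not a directory")
        sys.exit(1)      # non-zero status signals an error
    return path

# sys.exit() works by raising SystemExit
# Catching it here lets us look at the status without ending the demo
try:
    require_directory("/no/such/directory")   # made-up path
except SystemExit as stop:
    exit_status = stop.code
    print("Script would have exited with status", exit_status)
```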
Usage Messages
- If a script does not get the arguments it needs it should print a
usage message
- A usage message tells the user they have not given the command line arguments
needed
- In this class, usage messages must have the form
Usage: SCRIPT_NAME ARGUMENT_1 ARGUMENT_2 ...
- For example, let's say the script list_dir.py
needs the name of a directory from the command line
- If it does not get it, it should print a usage message that indicates the
argument it needs
$ ./list_dir.py
Usage: list_dir.py DIRECTORY_NAME
- The message is printed by the following code fragment
if len(sys.argv) < 2:
print("Usage:", os.path.basename(sys.argv[0]), "DIRECTORY_NAME")
sys.exit()
- Let's examine this code
- The first line checks the number of tokens on the command line
- You might have thought that the value should be 1, not 2
- But sys.argv is a list containing all command line
strings including the pathname of the script
- So the pathname used to run the script can be obtained from the expression
sys.argv[0]
- The second line prints the message
- It uses the os.path module function
basename()
to strip away everything except the name of the script
- If I had not done this the usage message would read
$ ./list_dir.py
Usage: ./list_dir.py DIRECTORY_NAME
- The third line ends the running of the script
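- To try the fragment without running a script from the command line, it can be wrapped in a function
- This is a sketch; the function name get_directory is made up, and a hand-built list stands in for sys.argv

```python
import os
import sys

def get_directory(argv):
    """Return the directory argument, or print a usage message and exit."""
    if len(argv) < 2:
        print("Usage:", os.path.basename(argv[0]), "DIRECTORY_NAME")
        sys.exit()
    return argv[1]

# In a real script you would call get_directory(sys.argv)
directory = get_directory(["./list_dir.py", "/tmp"])
print("Directory:", directory)

# With no argument the usage message is printed and SystemExit is raised
exited = False
try:
    get_directory(["./list_dir.py"])
except SystemExit:
    exited = True
    print("The script would stop here")
```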
New Material
Regular Expressions
Working with Regular Expressions
- Regular expressions are a language used to specify patterns
- You could use regular expressions to find the Sox games in the range of dates
I mentioned above
- They are very powerful
- But they are difficult to learn
- They work in a way that takes some time to get used to
- Some of the characters used in regular expressions
look like Unix meta-characters
- But they work very differently from Unix
- It is easy to get frustrated when first using regular expressions
- And to give up on them as not worth the effort
- But they are well worth the time needed to understand them
- The trick is to start slowly
- And build up experience as you go along
What You Need to Remember
- In the sections below I will discuss regular expressions
- I will also show you some Python code that uses regular expressions
- You do NOT have to remember the Python code
- I will not ask you for it on a test or quiz
- The Python code for regular expressions can be Googled when you need it
- But for this course you will have to learn how to create
patterns in the regular expression language
The Characters in Regular Expressions
- A
regular expression
is a string of characters forming a pattern
- This pattern is compared against a string
- If some characters in the string fit the pattern, we have a match
- A regular expression is a string composed of
- Ordinary characters
- Meta-characters
- Character classes
Ordinary Characters in Regular Expressions
- Ordinary characters are characters which are not meta-characters
- An ordinary character will match itself
- So the regular expression "a" will match a string like "abc"
- And "bcd" will match "abcde"
- And so on
- When I say "a" matches "abc" I mean that "a" matches one of the characters in the string
- Regular expressions are case sensitive
- Upper case characters only match upper case characters
- And the same for lower case
- Digits are ordinary characters
- So the regular expression "5" matches the string "256"
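- These claims can be checked with a few lines of Python
- The sketch uses the search function from the re module, which is covered later in these notes
- The variable names are made up

```python
import re

# bool() turns the result of search into True (match) or False (no match)
a_in_abc = bool(re.search("a", "abc"))
bcd_in_abcde = bool(re.search("bcd", "abcde"))
five_in_256 = bool(re.search("5", "256"))
upper_a_in_abc = bool(re.search("A", "abc"))   # case matters

print(a_in_abc, bcd_in_abcde, five_in_256, upper_a_in_abc)
```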
Using Regular Expressions to Find a Match
- There is more than one way to use regular expressions
- But the simplest way is to use them to find a line
- That matches a pattern written as a regular expression
- When used this way they are like the Unix command
grep
- You run
grep
with two arguments
- A string you are trying to match
- A list of files to search for matches
- So if I have a file with scores from Red Sox games
2011-07-02 Red Sox @ Astros Win 7-5
2011-07-03 Red Sox @ Astros Win 2-1
2011-07-04 Red Sox vs Blue Jays Loss 7-9
2011-07-05 Red Sox vs Blue Jays Win 3-2
...
- The following
grep
command will find all games the Sox won
$ grep Win red_sox.txt
2011-07-02 Red Sox @ Astros Win 7-5
2011-07-03 Red Sox @ Astros Win 2-1
2011-07-05 Red Sox vs Blue Jays Win 3-2
...
- We can do the same thing with regular expressions
- To use regular expressions you must import the re
module
>>> import re
- This module contains functions and classes that work with regular expressions
- You can find a match in Python using the re module's
search function
- This search takes two arguments
- A regular expression
- A line you are trying to match
- Normally, you would use search inside a
for
loop
- Looping through the lines in a file
- But to show you how Python works with regular expressions, I will do
something different here
- I will use search with a
string literal
- So you can see how search works
- If search finds a match it returns a match object
- search will return a match object if
the regular expression finds matching characters
- Anywhere in the line
- It will return
None
if it cannot find a match
- None
is like zero for objects
- We use search in an assignment statement like this
>>> match = re.search("man", "A man, a plan, a canal. Panama")
- Here the regular expression is "man"
- And the line is "A man, a plan, a canal. Panama"
- match is an object variable
- If search finds a match in the line
match will hold a pointer to the match object
- If it does not find a match, the value of match
will be None
- In this case a match was found
>>> print(match)
<_sre.SRE_Match object; span=(2, 5), match="man">
- We can use match in an
if
statement
- Because Python treats a match object as True
and None as False
>>> if match:
... print("Found match")
... else:
... print("No match found")
...
Found match
- Here is an example of search not finding
a match
>>> match = re.search("xxxxxxxx", "A man, a plan, a canal. Panama")
>>> print(match)
None
>>> if match:
... print("Found match")
... else:
... print("No match found")
...
No match found
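- Here is a sketch of the more usual approach, using search inside a for loop
- To keep the example self-contained, a list of strings stands in for the lines of red_sox.txt

```python
import re

# These lines are copied from the Red Sox example above
lines = [
    "2011-07-02 Red Sox @ Astros Win 7-5",
    "2011-07-03 Red Sox @ Astros Win 2-1",
    "2011-07-04 Red Sox vs Blue Jays Loss 7-9",
    "2011-07-05 Red Sox vs Blue Jays Win 3-2",
]

wins = []
for line in lines:
    # search returns a match object or None, so it works in an if statement
    if re.search("Win", line):
        wins.append(line)
        print(line)
```

- This prints the same three lines that grep found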
Pattern Objects
- search is a function in the
re module
- If it finds a match it creates a match object and returns a pointer
to it
- This match object is also contained in the re
module
- But in order to search for the match search
creates another object defined inside re
- A pattern object
- Whenever I use regular expressions in Python, I do not use
the search function
- Instead I create a pattern object from a regular expression
- To do this, I use the compile
function
- Also contained in the re module
- I use it in an assignment statement like this
>>> pattern = re.compile("man")
- I can then use the search
method on the pattern object to find a match
>>> match = pattern.search("A man, a plan, a canal. Panama")
>>> if match:
... print("Found match")
... else:
... print("No match found")
...
Found match
- You won't need to remember this for quizzes or exams
- I am showing you this so you can understand what I am doing
in what follows
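- A match object can also tell you what was matched and where
- The sketch below uses the match object methods group() and span()
- Like the rest of the Python in this section, it is for illustration only

```python
import re

pattern = re.compile("man")
match = pattern.search("A man, a plan, a canal. Panama")

matched_text = match.group()   # the characters that matched
position = match.span()        # start index and end index of the match

print(matched_text)
print(position)
```

- The span is the same one shown when the match object was printed earlier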
A Test Function for Regular Expressions
- To experiment with regular expressions we need a test function
- This function will have two parameters
- A regular expression string
- A line to be matched
- The pattern string will be turned into a pattern object
- Here is the code
def regex_test(regular_expression, line):
    pattern_object = re.compile(regular_expression)
    match_object = pattern_object.search(line)
    if match_object:
        print("Regular expression:", regular_expression)
        print("Matches:", line)
    else:
        print("Regular expression:", regular_expression)
        print("Does NOT match", line)
- Here it is in operation
>>> regex_test("man", "A man, a plan, a canal, Panama")
Regular expression: man
Matches: A man, a plan, a canal, Panama
>>> regex_test("xxx", "A man, a plan, a canal, Panama")
Regular expression: xxx
Does NOT match A man, a plan, a canal, Panama
- . matches any single character
- Except
newline
- It works the same way as the ? meta-character
on the Unix command line
- Here is an example
>>> regex_test("th.n", "And then I went home")
Regular expression: th.n
Matches: And then I went home
>>> regex_test("th.n", "I am better than you")
Regular expression: th.n
Matches: I am better than you
>>> regex_test("th.n", "I wish I were thinner")
Regular expression: th.n
Matches: I wish I were thinner
- . only matches a single character
- So you must use one . for every character
you are trying to match
>>> regex_test("t..n", "And then I went home")
Regular expression: t..n
Matches: And then I went home
>>> regex_test("t..n", "Is there a taint of scandal?")
Regular expression: t..n
Matches: Is there a taint of scandal?
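- Two details about . are easy to check with a short sketch
- It must match exactly one character, and that character cannot be a newline
- The variable names are made up

```python
import re

# "thn" has nothing between the h and the n, so th.n cannot match
missing_char = re.search("th.n", "thn")

# . matches a space
space_match = re.search("a.b", "a b")

# but . does not match a newline
newline_match = re.search("a.b", "a\nb")

print(missing_char)
print(bool(space_match))
print(newline_match)
```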
- * matches zero or more occurrences
of the previous character
- * in regular expressions is similar to the same
character on the Unix command line
- But there is an important difference
- * in Unix matches 0 or more occurrences of
any character
- So the * in regular expressions is
more selective
- Than the * in Unix
- This makes it more powerful
- It will match multiple instances of the character that comes before it
>>> regex_test("t*n", "1234 tttttn abcd")
Regular expression: t*n
Matches: 1234 tttttn abcd
- But it will also match no instances of the character that comes before it
>>> regex_test("t*n", "1234 n abcd")
Regular expression: t*n
Matches: 1234 n abcd
- Notice there is no "t" in the line
- But it still matches
- You can get the same effect as * in Unix
- But you must use .* in regular expressions to do this
>>> regex_test("t.*n", "abcd tan efg")
Regular expression: t.*n
Matches: abcd tan efg
>>> regex_test("t.*n", "xx the zzn")
Regular expression: t.*n
Matches: xx the zzn
>>> regex_test("t.*n", "123 train 456")
Regular expression: t.*n
Matches: 123 train 456
>>> regex_test("t.*n", "---think---")
Regular expression: t.*n
Matches: ---think---
- So * means one thing in Unix
- And another thing in regular expressions
- This is one of the reasons it takes time to get used to regular expressions
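- The difference can be seen side by side in Python
- This sketch uses the standard fnmatch module, which does Unix-style wildcard matching, as a stand-in for the Unix command line
- fnmatch is not otherwise part of these notes

```python
import fnmatch
import re

# Unix-style matching: * stands for any run of characters
glob_result = fnmatch.fnmatchcase("train", "t*n")
print("Unix-style t*n matches 'train':", glob_result)

# Regular expression: t*n means zero or more t's followed by an n
# search still finds a match, but only the final n is matched
star_match = re.search("t*n", "train")
print("Regex t*n matched only:", star_match.group())

# .* is the regular expression that behaves like the Unix *
dot_star_match = re.search("t.*n", "train")
print("Regex t.*n matched:", dot_star_match.group())
```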
- The + meta-character is like
*
- Because it is used to indicate repetition of the previous character
- * means zero or more occurrences
- But + means one or more occurrences
>>> regex_test("ab+c", "xxx abccccc yyy")
Regular expression: ab+c
Matches: xxx abccccc yyy
>>> regex_test("ab+c", "xxx abbbbbccccc zzz")
Regular expression: ab+c
Matches: xxx abbbbbccccc zzz
- It will not match zero occurrences of the character it follows
>>> regex_test("ab+c", "xxx accccc zzz")
Regular expression: ab+c
Does NOT match xxx accccc zzz
- Unlike the * meta-character
>>> regex_test("ab*c", "xxx accccc zzz")
Regular expression: ab*c
Matches: xxx accccc zzz
- ? is also a repetition meta-character
- It means zero or one occurrence of the previous character
- In other words, it means the previous character is optional
>>> regex_test("ab?c", "qqq abc jjj")
Regular expression: ab?c
Matches: qqq abc jjj
>>> regex_test("ab?c", "123 ac 456")
Regular expression: ab?c
Matches: 123 ac 456
>>> regex_test("ab?c", "786 abbc vvv")
Regular expression: ab?c
Does NOT match 786 abbc vvv
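- The three repetition meta-characters can be compared by running each against the same lines
- This is a small sketch; the test strings and the results dictionary are made up for the comparison

```python
import re

lines = ["ac", "abc", "abbc"]     # zero, one and two b's
results = {}

for regex in ["ab*c", "ab+c", "ab?c"]:
    # collect the lines that each regular expression matches
    results[regex] = [line for line in lines if re.search(regex, line)]
    print(regex, "matches", results[regex])
```

- * accepts all three lines, + rejects the line with no b, and ? rejects the line with two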
- The backslash, \ , is a meta-character
- It turns off the special meaning of the character that immediately
follows it
- It performs the same function as the backslash on the Unix command line
- To search for a meta-character, put a \ in front of
it
>>> regex_test("a\+b", "345 a+bcde")
Regular expression: a\+b
Matches: 345 a+bcde
- If you don't turn off the meta-character you won't get a match
>>> regex_test("a+b", "906 a+bcde")
Regular expression: a+b
Does NOT match 906 a+bcde
- To match more than one meta-character
- Put \ in front of each
>>> regex_test("a\+\+\+b", "567 a+++bcde")
Regular expression: a\+\+\+b
Matches: 567 a+++bcde
- The \ is also used in character classes
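- The backslash is also special inside ordinary Python strings
- Newer versions of Python warn about a sequence like \+ in an ordinary string
- A common way to avoid this is a raw string, written with an r before the opening quote
- The sketch below assumes raw strings, which the examples in these notes do not use

```python
import re

# r"..." is a raw string: Python passes the backslash through unchanged
plus_match = re.search(r"a\+b", "345 a+bcde")
print(plus_match.group())

# \. matches a real period and nothing else
escaped_dot = re.search(r"3\.14", "3514")

# An unescaped . matches any character, including the 5
plain_dot = re.search("3.14", "3514")

print(escaped_dot)
print(bool(plain_dot))
```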
Character Classes
- A character class matches a single character from a set of characters
- A character class is represented by a \ in front of a single letter
- A lower case letter matches a single character in the set
- An upper case letter matches a single character not in the set
The \d and \D Character Classes
- \d matches a single digit
>>> regex_test("\d", "1234")
Regular expression: \d
Matches: 1234
- \d can be used with a repetition meta-character
- to match many occurrences of a digit
>>> regex_test("\d*a", "1234abc")
Regular expression: \d*a
Matches: 1234abc
- \D matches any single character that is not a digit
>>> regex_test("\D", "1a234")
Regular expression: \D
Matches: 1a234
The \w and \W Character Classes
- \w matches any single alphanumeric character and the
underscore, _
- The alphanumeric characters are the letters and the digits
>>> regex_test("\w","---a------------")
Regular expression: \w
Matches: ---a------------
>>> regex_test("\w+","---1234abc------")
Regular expression: \w+
Matches: ---1234abc------
- \W matches any single character that is not a letter,
a digit or an underscore, _
>>> regex_test("\W+", "###")
Regular expression: \W+
Matches: ###
The \s and \S Character Classes
- \s matches any
whitespace character
>>> regex_test("a\sb", "----a b----")
Regular expression: a\sb
Matches: ----a b----
- \S matches any character that is not whitespace
>>> regex_test("\S+", "abcd")
Regular expression: \S+
Matches: abcd
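- Character classes become more useful when they are combined
- As a sketch, here are two patterns run against one of the Red Sox lines from earlier
- The patterns and variable names are my own illustration, written as raw strings so Python leaves the backslashes alone

```python
import re

line = "2011-07-04 Red Sox vs Blue Jays Loss 7-9"

# Four digits, a dash, two digits, a dash, two digits
date_match = re.search(r"\d\d\d\d-\d\d-\d\d", line)
print(date_match.group())

# A word, one whitespace character, then a score like 7-9
result_match = re.search(r"\w+\s\d+-\d+", line)
print(result_match.group())
```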
Attendance
Graded Quiz
Class Exercise
Class Quiz