IT 117: Intermediate Scripting

Class 14 - Midterm Review

Review

Dictionaries
Dictionary Literals
Getting Values from a Dictionary
Changing a Dictionary Value
Looping Through a Dictionary
When To Use a Dictionary
Adding Elements to a Dictionary
Lists versus Dictionaries
The in And not in Operators
Deleting Elements from A Dictionary
Getting the Number of Elements in a Dictionary
Sets in Mathematics
Set Membership
Subsets and Supersets
Union of Sets
Intersection of Sets
Difference between Sets
Sets in Python
Creating a Set in Python
Set Literals
The Empty Set
Adding Elements to a Set
Removing Elements from a Set
The Size of a Set
When Are Sets Equal?
for Loops with Sets
Testing for Set Membership
Union of Sets in Python
Intersection of Sets in Python
Difference between Sets in Python
Subsets and Supersets in Python
min And max with Sets
The os Module
os.getcwd()
os.listdir(path)
os.chdir(path)
Running Unix Commands within Python
os.environ
The os.path Module
os.path.isfile(path) and os.path.isdir(path)
os.path.basename(path)
The sys Module
Getting Values from the Command Line
Leaving a Running Script
Usage Messages
The Characters in Regular Expressions
Ordinary Characters in Regular Expressions
Meta-characters in Regular Expressions
The . Meta-character
The * Meta-character
The + Meta-character
The ? Meta-character
The \ Meta-character
Character Classes
\d and \D Character Classes
The \w and \W Character Classes
The \s and \S Character Classes
Getting Strings from a Match
The ( ) Meta-characters
Repetition in Regular Expressions
Specifying a Range of Repeating Characters
Creating Custom Character Classes
Ranges of Characters in a Character Class

Homework 7

I have posted homework 7 here.

It is NOT due this coming Sunday.

Instead it is due Sunday, October 30th.

This is to give you time to study for the midterm.

And to give me time to score it.

Midterm

The Midterm exam for this course will be held on Tuesday, October 25th.

The exam will be given in this room.

It will consist of questions like those on the quizzes along with questions asking you to write short segments of Python code.

60% of the points on this exam will consist of questions from the Ungraded Class Quizzes.

The other 40% will come from four questions that ask you to write a short segment of code.

Today's class will be a review session.

You will only be responsible for the material in the Class Notes for today's class on the exam.

The Midterm is given on paper.

I scan each exam paper and upload the scans to Gradescope.

I score the exam on Gradescope.

You will get an email from Gradescope with your score when I am done.

The Midterm is a closed book exam.

You are not allowed to use any resource other than what is in your head while taking the exam.

Cheating on the exam will result in a score of 0 and will be reported to the Administration.

Remember your Oath of Honesty.

To prevent cheating, certain rules will be enforced during the exam.

Quiz 5

Let's look at the answers to Quiz 5

No Class Exercise or Class Quiz

Today is a review session.

There will be no Class Exercise or Class Quiz today.

Review

Dictionaries

Dictionaries are like lists where each element is a pair of values
The first of the two parts of a dictionary entry is the key
And the second is a value associated with that key
You use the key to get the value
So an entry in a Python dictionary is a key-value pair
You cannot have a dictionary entry that is a key with no value
Or a value with no key

Dictionary Literals

A literal is a value written out inside the code
Dictionary literals contain a list of entries separated by commas
The entries consist of a key-value pair
And they are enclosed in curly braces, {}

The key and value are separated by a colon, :

>>> digit_names = {1 : "one", 2 : "two", 3 : "three"} 
>>> digit_names
{1: 'one', 2: 'two', 3: 'three'}

Getting Values from a Dictionary

To get the value associated with a key you use the [ ] operator
You place this operator after the variable pointing to the dictionary
And you place the key inside the [ ]

So if I have the following dictionary

>>> students = {"023413" : "Alan Smith", "01234" : "John Doe"}

I can get the name of the student with ID 01234 like this
```
>>> students["01234"]
'John Doe'
```

Changing a Dictionary Value

The way you change a value in a dictionary is similar to what you do in a list
You use an assignment statement
The left hand side is the dictionary variable with a key inside [ ]

The new value appears on the right hand side of the assignment statement

>>> students
{'01234': 'John Doe', '023413': 'Alan Smith'}
>>> students["023413"] = "Al Smith"
>>> students
{'01234': 'John Doe', '023413': 'Al Smith'}

Looping Through a Dictionary

You can loop through a dictionary using a for loop
The loop variable is set to each key in turn
As the loop runs

You can use the key to get the value

>>> scores = {"amy" : 100, "bill": 95, "dave" : 60, "sally" : 95}
>>> for name in scores:
...     print(name, scores[name])
... 
dave 60
amy 100
sally 95
bill 95

You can print the keys in sorted order using the sorted function

If you run sorted on a dictionary it will return a list of sorted keys

>>> for name in sorted(scores):
...     print(name, scores[name])
... 
amy 100
bill 95
dave 60
sally 95

When To Use a Dictionary

Lists and tuples are used when the elements are not particularly unique
In a list of quiz scores there is nothing special about an individual score
But every student is different
Each student has a unique identity
Dictionaries are used to store data about unique things

Adding Elements to a Dictionary

We define an empty dictionary like this
```
email_addresses = {}
```
You do not need a special method to add an entry to an existing dictionary
When you use [ ] to assign a value to a new key

You create a new dictionary entry

>>> email_addresses
{}
>>> email_addresses["joe"] = "joe@gmail.com"
>>> email_addresses
{'joe': 'joe@gmail.com'}

The entry will only be added
If the key is not already used in the dictionary
If the key is already in the dictionary

It will change the value associated with that key

>>> email_addresses
{'joe': 'joe@gmail.com'}
>>> email_addresses["joe"] = "bigmanjoe@hotmail.com"
>>> email_addresses
{'joe': 'bigmanjoe@hotmail.com'}

Lists versus Dictionaries

Both lists and dictionaries are objects that hold multiple values
In a list you access a value by its index
In a dictionary you access a value by its key
You can think of a list as a collection of values
While a dictionary is a collection of variables
A variable is a place in memory with a name
And a value

The `in` And `not in` Operators

The in and not in operators work with dictionaries
The same way they work with sequences

They test whether the dictionary contains a key, not whether it contains a value

>>> digit_names = {"one":1, "two":2, "three":3, "four":4, "five":5 }
>>> "one" in digit_names
True
>>> 1 in digit_names
False

The not in operator returns True if the first operand is not a key in the dictionary
```
>>> "one" not in digit_names
False
>>> 1 not in digit_names
True
```

Deleting Elements from A Dictionary

To delete an entry in a dictionary you use a del statement
The general format of such a statement is
```
del DICTIONARY_NAME[KEY] 
```
Where DICTIONARY_NAME is a variable that points to a dictionary object

And KEY is the key of the entry to be deleted

>>> words_integers
{'three': 3, 'five': 5, 'two': 2, 'one': 1, 'four': 4}
>>> del words_integers["five"]
>>> words_integers
{'three': 3, 'two': 2, 'one': 1, 'four': 4}

If you use a key that does not exist

A KeyError exception is raised

>>> del words_integers["six"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'six'

Getting the Number of Elements in a Dictionary

The len function returns the number of entries in a dictionary

>>> email_addresses
{'Chris': 'chrisk@yahoo.com', 'Alan': 'alanh@gmail.com'}
>>> len(email_addresses)
2
>>> words_integers
{'five': 5, 'one': 1, 'six': 6, 'two': 2, 'four': 4, 'three': 3}
>>> len(words_integers)
6

Sets in Mathematics

A set is an unordered collection of distinct objects
You can put anything into a set
But you can't add a value that is already there
If you try, the set will not change
The set with nothing in it is called the empty set

Set Membership

If the value x is in set A we say that x is a member of A

Subsets and Supersets

If you have two sets, A and B
And all the values in A are also in B
Then A is a subset of B
Another way of describing this situation is that B is a superset of A
The situation is shown in the following diagram

Union of Sets

Again start with two sets A and B
The set of all the elements of A and all the elements of B is called the union of A and B
In the diagram below the union of A and B is shown in red

Intersection of Sets

The set of elements which are members of A
And also members of B
Is the intersection of A and B
In the diagram below, the intersection of A and B is shown in red

Difference between Sets

The set of all elements of A not in B is the difference between A and B
In the diagram below, the difference between A and B is shown in red

Sets in Python

A set in Python is an object that holds an unordered collection of unique items
The items inside a set can be of any data type
As long as the data type is immutable
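A quick way to see this rule is to try putting different kinds of values into a set.
This is a minimal sketch with made-up variable names.
Strictly speaking, Python requires set elements to be hashable; the built-in immutable types such as int, str, and tuple are, while a list is not.

```python
# Immutable values such as numbers, strings, and tuples can be set elements
mixed = {1, "two", (3, 3, 3)}
print(len(mixed))  # 3

# A list is mutable, so Python refuses to put it in a set
list_rejected = False
try:
    bad_set = {1, [2, 3]}
except TypeError as error:
    list_rejected = True
    print("TypeError:", error)
```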

Creating a Set in Python

You create a set in Python using the built-in set function
set takes one optional argument
If you give it an argument, that argument must be iterable
A Python object is iterable if you can use it in a for loop

Here is an example

>>> num_list = [1,2,3]
>>> num_set  = set(num_list)
>>> num_set
{1, 2, 3}
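Because set accepts any iterable, you can also build a set from a string or a tuple.
Any duplicate values are dropped along the way.
A short sketch with made-up values:

```python
# Duplicate values in the iterable appear only once in the set
from_list = set([1, 2, 2, 3, 3, 3])
print(from_list)  # {1, 2, 3}

# A string is iterable, so each character becomes an element
letters = set("hello")
print(sorted(letters))  # ['e', 'h', 'l', 'o']

# A tuple works the same way
from_tuple = set((5, 5, 6))
print(len(from_tuple))  # 2
```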

Set Literals

A list literal has values, separated by commas, inside square brackets
```
>>> list_1 = [1, 2, 3, 4, 5]
>>> type(list_1)
<class 'list'>
```

A set literal uses curly braces

>>> nonsense = {"foo", "bar", "bletch"}
>>> type(nonsense)
<class 'set'>

The Empty Set

We can use empty square brackets to create an empty list
```
>>> empty = []
>>> type(empty)
<class 'list'>
```
But we cannot use empty curly braces to create an empty set

The empty curly braces are an empty dictionary

>>> empty = {}
>>> type(empty)
<class 'dict'>

When the creators of Python got to set literals
They ran out of symbols to enclose the elements
So they had to reuse { }
So how do you create an empty set?
You run set with no arguments
```
>>> set_1 = set()
>>> set_1
set()
```
You cannot create an empty set like this
```
>>> set_2 = ()
```
This creates a tuple, not a set
```
>>> set_2 = ()
>>> type(set_2)
<class 'tuple'>
```
Since empty curly braces already mean an empty dictionary
Python displays an empty set like this
```
>>> empty_set = set()
>>> empty_set
set()
```

Adding Elements to a Set

Sets are mutable objects so they can be changed at any time
There are two set methods that can be used to add elements to a set
The add method adds a single element to a set
The update method adds all the elements of an iterable
So if we start with an empty set
```
>>> s1 = set()
>>> s1
set()
```
We can use add to add individual elements
```
>>> s1.add(1)
>>> s1
{1}
```
If you add an element that is already in the set

Nothing will change

>>> s1.add(1)
>>> s1
{1}

But it won't raise an exception
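The sketch below uses add for single values and update, the set method that adds every element of an iterable.
The values are made up for illustration.

```python
s1 = set()
s1.add(1)        # add a single element
s1.add("two")
s1.add(1)        # 1 is already in the set, so nothing changes
print(len(s1))   # 2

# update adds every element of an iterable; duplicates are ignored
s1.update([3, 4, 4])
print(len(s1))   # 4
```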

Removing Elements from a Set

To remove an element from a set use one of two methods
- discard()
- remove()
Both methods take a single argument

The value that is to be removed

>>> numb_set
{1, 2, 3, 4, 5}
>>> numb_set.discard(2)
>>> numb_set
{1, 3, 4, 5}
>>> numb_set.remove(4)
>>> numb_set
{1, 3, 5}

The only difference is what happens when you remove a value that is not in the set

discard does nothing

>>> numb_set.discard(2)
>>> numb_set
{1, 3, 5}

But remove will raise an exception

>>> numb_set.remove(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 4

The Size of a Set

The len function gives the size of a set

>>> set_1 = {1, 2, 3}
>>> len(set_1)
3
>>> set_2 = {3, 2, 1}
>>> len(set_2)
3
>>> set_3 = {"one", "two", "three", "four"}
>>> len(set_3)
4

When Are Sets Equal?

If two sets have the same elements

They are equal

>>> set_1 = {1, 2, 3}
>>> set_2 = {3, 2, 1}
>>> set_1 == set_2
True

`for` Loops with Sets

Sets are iterable
This means that they can be used in a for loop

The general format of a for loop looks like this

for LOOP_VARIABLE in ITERABLE_OBJECT:
    STATEMENT
    ...

If you use a set in a for loop you will get each element in the set
Because sets are unordered, the elements can come out in any order

>>> names = {"amy", "bill", "dave", "sally"}
>>> for name in names:
...     print(name)
... 
dave
amy
sally
bill

To print the elements in sorted order use the sorted function

>>> for name in sorted(names):
...     print(name)
... 
amy
bill
dave
sally

Testing for Set Membership

The in operator tells you if a value is contained in a set

>>> set_1
{1, 2, 3, 4, 5}
>>> 7 in set_1
False
>>> 8 in set_1
False
>>> 3 in set_1
True

The not in operator tells you if an element is not in a set
```
>>> 8 not in set_1
True
>>> 3 not in set_1
False
```

Union of Sets in Python

We can form the union of two sets in Python by using the union method

>>> A = {1, 4, 8, 12}
>>> B = {1, 2, 6, 8}
>>> A.union(B)
{1, 2, 4, 6, 8, 12}

The union operation is symmetrical
This means that
```
A.union(B)
```
is the same as
```
B.union(A)
```

Intersection of Sets in Python

Set objects in Python have an intersection method

>>> A
{8, 1, 12, 4}
>>> B
{8, 1, 2, 6}
>>> A.intersection(B)
{8, 1}

Intersection is also symmetrical so
```
A.intersection(B)
```
is the same as
```
B.intersection(A)
```

Difference between Sets in Python

In Python, we can use the set difference method

>>> A
{8, 1, 12, 4}
>>> B
{8, 1, 2, 6}
>>> A.difference(B)
{12, 4}

Set difference is not a symmetric operation
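Using the same A and B as above, swapping the order of the sets changes the result.
A minimal sketch; sorted is used only so the printed order is predictable.

```python
A = {1, 4, 8, 12}
B = {1, 2, 6, 8}

# Elements of A that are not in B
print(sorted(A.difference(B)))  # [4, 12]
# Elements of B that are not in A
print(sorted(B.difference(A)))  # [2, 6]

print(A.difference(B) == B.difference(A))  # False
```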

Subsets and Supersets in Python

We can tell if one set is a subset of another using the issubset method

If we have two sets

>>> A = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
>>> B = {1, 3, 5, 7, 9}

We can ask if one set is a subset of another like this
```
>>> A.issubset(B)
False
>>> B.issubset(A)
True
```
We can ask if one set is a superset of another using the issuperset method
```
>>> A.issuperset(B)
True
>>> B.issuperset(A)
False
```

`min` And `max` with Sets

To find the set element with the maximum value you can use the max built-in function
```
>>> B = {1, 3, 5, 7, 9}
>>> max(B)
9
```
To find the set element with the minimum value use the min function
```
>>> min(B)
1
```

The os Module

The os module allows you to do things only the operating system can do
You must import the os module before you can use it

os.getcwd()

os.getcwd() returns a string with the pathname of your current directory
```
>>> os.getcwd()
'/home/ghoffmn'
```
The name means get current working directory

os.listdir(path)

os.listdir(path) returns a list of the contents of a directory

>>> course_dir = os.listdir("/courses/it117/s14/ghoffmn")
>>> for entry in course_dir :
...     print(entry)
... 
GROUP
MAIL
cmanuel1
jpinto
fortinsy
ebeazer
...

os.listdir does not return the special entries . and ..
The list is not in any particular order
But you can use the built-in function sorted() to change that
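Here is a sketch of listing a directory in sorted order.
So that it runs anywhere, it builds a throwaway directory with the tempfile module; the file and directory names are made up.

```python
import os
import tempfile

# Build a scratch directory holding a few made-up entries
with tempfile.TemporaryDirectory() as scratch:
    for name in ["zeta.txt", "alpha.txt", "mid.txt"]:
        open(os.path.join(scratch, name), "w").close()
    os.mkdir(os.path.join(scratch, "beta_dir"))

    # listdir returns the entries in no particular order; sorted fixes that
    entries = sorted(os.listdir(scratch))
    for entry in entries:
        print(entry)
```

The output is alpha.txt, beta_dir, mid.txt, and zeta.txt, one per line, with no . or .. entries.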

os.chdir(path)

To move to another directory you need to use os.chdir(path)

>>> os.chdir("/home/ghoffmn/assignments_submitted")
>>> os.getcwd()
'/home/ghoffmn/assignments_submitted'

Running Unix Commands within Python

You can run a Unix command from within Python using os.system()
The argument to os.system() is a string which contains a Unix command
os.system() runs the command

And returns the exit status

>>> result = os.system("touch foo.txt")
>>> result
0
>>> os.listdir(".")
['dir1', 'foo.txt']

When the exit status is 0 the command ran without error
If the number is anything other than 0 the command did not work
os.system() will also work with Windows

os.environ

One of the ways we customize a Unix environment is with shell variables
os.environ is not a function
It is a module variable which points to a dictionary
The keys in the dictionary are names of variables
The values in the dictionary are the values of the variables
To get the value of a shell variable use the [ ] operator

With the name of the variable inside the square brackets

>>> os.environ["HOME"]
'/home/ghoffmn'
>>> os.environ["SHELL"]
'/bin/bash'
>>> os.environ["PATH"]
'/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games'
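Because os.environ behaves like a dictionary, [ ] raises KeyError for a variable that is not set.
This sketch checks with in first; the helper name shell_variable and the variable IT117_NO_SUCH_VARIABLE are hypothetical.

```python
import os

def shell_variable(name):
    """Return the value of a shell variable, or None if it is not set."""
    # os.environ works like a dictionary, so 'in' tests for the key
    if name in os.environ:
        return os.environ[name]
    return None

print(shell_variable("HOME"))
print(shell_variable("IT117_NO_SUCH_VARIABLE"))  # None
```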

The os.path Module

The os.path module contains functions which operate on pathnames
os.path is part of the os module
If you import os you will also import os.path

os.path.isfile(path) and os.path.isdir(path)

os.path.isfile() and os.path.isdir() are boolean functions

os.path.isfile() returns True if its argument is a file

>>> os.path.isfile("foo.txt")
True
>>> os.path.isfile("dir1")
False

os.path.isdir() returns True if its argument is a directory

>>> os.path.isdir("dir1")
True
>>> os.path.isdir("foo.txt")
False
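A common use of these two functions is sorting the entries of a directory into files and subdirectories.
This is a sketch; the function name classify is hypothetical, and a temporary directory stands in for a real one.
Note that os.listdir returns bare names, so each name is joined to its directory before it is tested.

```python
import os
import tempfile

def classify(directory):
    """Return (files, directories) found directly inside directory."""
    files = []
    directories = []
    for entry in sorted(os.listdir(directory)):
        # Join the name to its directory so the test checks the right path
        path = os.path.join(directory, entry)
        if os.path.isfile(path):
            files.append(entry)
        elif os.path.isdir(path):
            directories.append(entry)
    return files, directories

with tempfile.TemporaryDirectory() as scratch:
    open(os.path.join(scratch, "foo.txt"), "w").close()
    os.mkdir(os.path.join(scratch, "dir1"))
    files, directories = classify(scratch)
    print("Files:", files)              # Files: ['foo.txt']
    print("Directories:", directories)  # Directories: ['dir1']
```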

os.path.basename(path)

os.path.basename() returns the last part of a pathname, without the directories in front of it

>>> os.getcwd()
'/home/ghoffmn/tmp'
>>> os.path.basename(os.getcwd())
'tmp'

It is useful in usage messages
Where you want the name of the script without the directories in front of it

The sys Module

Python scripts run inside two environments
- The operating system
- The Python interpreter
The sys module contains variables and functions
That let you interact with the Python interpreter
You must import the sys module before you can use it
```
>>> import sys
```

Getting Values from the Command Line

A script can get the values it needs from the command line
The sys module contains the variable argv
sys.argv is a list variable that contains all the command line arguments
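The first element, sys.argv[0], is the pathname used to run the script
The arguments typed after it are in sys.argv[1], sys.argv[2], and so on
Every element of sys.argv is a string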
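A sketch of separating the script's pathname from its arguments.
The function name split_arguments and the command line ./add.py 3 4 are hypothetical; in a real script you would pass sys.argv.

```python
import sys

def split_arguments(argv):
    """Separate the script's pathname from the arguments typed after it."""
    script_path = argv[0]   # how the script was run, e.g. ./add.py
    arguments = argv[1:]    # everything typed after the script name
    return script_path, arguments

# In a real script, pass sys.argv itself
script_path, arguments = split_arguments(sys.argv)
print("Script:", script_path)
print("Arguments:", arguments)

# Command line arguments are strings, so convert numbers yourself
path, args = split_arguments(["./add.py", "3", "4"])
print(int(args[0]) + int(args[1]))  # 7
```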

Leaving a Running Script

sys.exit() stops a running script
You can use it to stop a script before it gets to the end of the code
Why would you want to do this?
There are many reasons
The most common is when your script encounters an error
Like not getting the command line arguments it needs
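Here is one way a script might stop with a usage message when an argument is missing.
It is a sketch: check_arguments, the FILENAME placeholder, and the pathname /home/student/count_lines.py are hypothetical.
sys.exit works by raising SystemExit; the sketch catches it only so you can see the exit status, which a real script would not do.

```python
import os
import sys

def check_arguments(argv, expected_count):
    """Print a usage message and exit if the argument count is wrong."""
    if len(argv) != expected_count + 1:
        # basename strips the directories, leaving just the script's name
        script_name = os.path.basename(argv[0])
        print("Usage:", script_name, "FILENAME")
        # A non-zero exit status tells the shell the script did not work
        sys.exit(1)

# Hypothetical command line with the required argument missing
status = None
try:
    check_arguments(["/home/student/count_lines.py"], 1)
except SystemExit as stop:
    status = stop.code
print("Exit status:", status)  # Exit status: 1
```

This prints Usage: count_lines.py FILENAME before the exit status.
In a real script you would simply call check_arguments(sys.argv, 1) near the top.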

The Characters in Regular Expressions

A regular expression is a string of characters forming a pattern
This pattern is compared against a string
Looking for a match
If part of the string matches the regular expression pattern
We have a match
A regular expression is a string composed of
- Ordinary characters
- Meta-characters
- Character classes

Ordinary Characters in Regular Expressions

Ordinary characters are characters which are not meta-characters
An ordinary character will match itself
So the regular expression "cat" will match the string "cat"
Regular expressions are case sensitive
Upper case characters only match upper case characters
And the same for lower case
Digits are ordinary characters
So the regular expression "5" matches the string "5"
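A sketch of ordinary characters using Python's re module.
re.search looks for the pattern anywhere in the string and returns a match object, or None if there is no match.

```python
import re

# Ordinary characters match themselves, anywhere in the string
match = re.search("cat", "The cat sat")
print(match.group())   # cat

# Matching is case sensitive, so "cat" does not match "CAT"
print(re.search("cat", "CAT"))   # None

# Digits are ordinary characters too
digit_match = re.search("5", "Room 502")
print(digit_match.group())   # 5
```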

Meta-characters in Regular Expressions

Meta-characters have special meaning in a regular expression

The meta-characters are

    .  ^  $  *  +  ?  {  }  [  ]  \  |  (  )

Every character that is not a meta-character is an ordinary character

The . Meta-character

. matches any single character except the newline
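A minimal sketch of . using re.search; the test words are made up.
Note that . stands for exactly one character, so "ct" does not match.

```python
import re

# "c.t" needs exactly one character, other than a newline, between c and t
results = {}
for word in ["cat", "cot", "c9t", "ct", "c\nt"]:
    results[word] = re.search("c.t", word) is not None
    print(repr(word), results[word])
```

The first three words print True; "ct" and "c\nt" print False.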

The * Meta-character

* is used to repeat the previous character in a regular expression
* matches zero or more occurrences of the character ahead of it
* in regular expressions is similar to the * in Unix
But there is an important difference
* in Unix matches 0 or more occurrences of any character
* in regular expressions matches 0 or more occurrences
Of the character that comes before it
To get the same effect as * in Unix you must use .*
Note that I said zero or more occurrences
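A sketch of * with re.search; the strings and the file name are made up.
Notice that "ac" matches because zero b's is allowed.

```python
import re

# "ab*c" means: an a, then zero or more b's, then a c
found = []
for text in ["ac", "abc", "abbbbc", "adc"]:
    match = re.search("ab*c", text)
    found.append(match.group() if match else None)
print(found)   # ['ac', 'abc', 'abbbbc', None]

# .* behaves like * in Unix: any characters, any number of times
whole = re.search("report.*txt", "report_2022_final.txt").group()
print(whole)   # report_2022_final.txt
```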

The + Meta-character

The + meta-character is like *
It is used to indicate repetition of the previous character
But * means zero or more occurrences
+ means one or more occurrences
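Comparing the two side by side makes the difference clear.
In this sketch, each pattern is tried on a string with no b's and on one with two b's.

```python
import re

results = {}
for pattern in ["ab*c", "ab+c"]:
    results[pattern] = (re.search(pattern, "ac") is not None,
                        re.search(pattern, "abbc") is not None)
    print(pattern, results[pattern])
# ab*c (True, True)
# ab+c (False, True)
```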

The ? Meta-character

? is also a repetition meta-character
It means zero or one occurrences of the previous character
In other words, it means the previous character is optional
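A sketch of ? making one letter optional, using made-up spellings.

```python
import re

# "colou?r": the u may appear once or not at all
results = {}
for word in ["color", "colour", "colouur"]:
    results[word] = re.search("colou?r", word) is not None
    print(word, results[word])
# color True
# colour True
# colouur False
```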

The \ Meta-character

The backslash, \ , is a meta-character
It turns off the special meaning of the character that immediately follows it
It performs the same function as the backslash on the Bash command line
If you wanted to search for a meta-character
You would have to put \ in front of it
The \ is also used in character classes
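A sketch of escaping a period.
The patterns containing \ are written as Python raw strings, r"...", so that Python passes the backslash through to the regular expression unchanged.

```python
import re

# An unescaped . matches any character, so "3.14" also matches "3514"
loose = re.search("3.14", "3514") is not None          # True
# \. matches only a literal period
strict_miss = re.search(r"3\.14", "3514") is not None   # False
strict_hit = re.search(r"3\.14", "pi is 3.14") is not None  # True
print(loose, strict_miss, strict_hit)
```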

Character Classes

A character class matches a single occurrence of any character in a set of characters
A character class is represented by a \ in front of a single letter
When the letter is lower case the character class matches one occurrence
Of any character in the set
When the letter is upper case the character class matches one occurrence
Of any character not in the lowercase class of the same letter

\d and \D Character Classes

\d matches a single digit
\d can be used with a repetition meta-character
To match many occurrences of a digit
\D matches any single character that is not a digit
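A sketch of \d and \D on a made-up string, using raw strings for the patterns.

```python
import re

text = "Room 502"
one_digit = re.search(r"\d", text).group()     # '5'
all_digits = re.search(r"\d+", text).group()   # '502'
non_digits = re.search(r"\D+", text).group()   # 'Room ' (the space is not a digit)
print(one_digit, all_digits, repr(non_digits))
```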

The \w and \W Character Classes

\w matches any single alphanumeric character
And the underscore, _
The alphanumeric characters are the letters and the digits
\W matches any single character that is not a letter
Not a digit
And not an underscore, _
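A sketch of \w and \W on a made-up string.
The underscore counts as a word character, so it does not end the match.

```python
import re

text = "user_id: jd42!"
word = re.search(r"\w+", text).group()      # 'user_id'
not_word = re.search(r"\W", text).group()   # ':' is the first non-word character
print(word, repr(not_word))
```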

The \s and \S Character Classes

\s matches any whitespace character
\S matches any character that is not whitespace
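A sketch of \s and \S; the strings are made up.
\s covers more than the space character: tabs and newlines are whitespace too.

```python
import re

# \s matches a space, a tab, or a newline
results = {}
for text in ["a b", "a\tb", "a\nb", "ab"]:
    results[text] = re.search(r"a\sb", text) is not None
    print(repr(text), results[text])

# \S+ matches a run of characters up to the next whitespace
first_field = re.search(r"\S+", "alice   42").group()
print(first_field)   # alice
```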

Getting Strings from a Match

Regular expressions can be used to get parts of the matching string
To extract part of a string from a match we need two things
- The ( ) meta-characters
- The group method of a match object

The ( ) Meta-characters

We use ( ) to extract part of a matched string
You place these around the part of the regular expression you want to extract
We need to do two things
- Write a regular expression that matches only the line we want
- Put ( ) meta-characters around the part we want to extract
Let's say that the following regular expression matches the line we want
```
\d+\.\d+\.\d+\.\d+.*GET
```
To extract the IP address part we would write
```
(\d+\.\d+\.\d+\.\d+).*GET
```
We can use the group method to get the IP address
But we have to specify which group we are talking about
So we have to give group a number
We only have one set of parentheses
So we use 1 as the argument to group
```
ip_address = match_object.group(1)
```
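Putting the pieces together, here is a runnable sketch.
The log line is made up for illustration; it is not from a real server.
re.search returns None when nothing matches, so the sketch checks the match object before calling group.

```python
import re

# A made-up web server log line
line = '192.168.1.15 - - [20/Oct/2022:10:15:32 -0400] "GET /index.html HTTP/1.1" 200 512'

match_object = re.search(r"(\d+\.\d+\.\d+\.\d+).*GET", line)
if match_object:
    # group(1) is the text matched by the first pair of parentheses
    ip_address = match_object.group(1)
    print(ip_address)   # 192.168.1.15
```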

Repetition in Regular Expressions

If we want to match a certain number of digits
We can use many instances of \d
Like this
```
\d\d\d
```
But there is another way
We can follow \d with curly braces, { }
And put the number of repetitions we want inside the braces
Like this
```
\d{3}
```
This works with all character classes
And all ordinary characters
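A sketch showing that \d{3} and \d\d\d find the same text, and that braces also work after an ordinary character.
The strings are made up.

```python
import re

text = "Room 502B"
# \d{3} is shorthand for \d\d\d
short = re.search(r"\d{3}", text).group()
long_form = re.search(r"\d\d\d", text).group()
print(short, long_form)   # 502 502

# Braces also repeat ordinary characters: a{2} means "aa"
print(re.search("ba{2}d", "baad") is not None)  # True
print(re.search("ba{2}d", "bad") is not None)   # False
```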

Specifying a Range of Repeating Characters

We can also use curly braces to specify a range of repetitions
When we do this, the curly braces contain two integers
Separated by a comma
The first integer is the minimum number of repetitions
And the second is the maximum
If we wanted to match either 1, 2, or 3 digits we would write
```
\d{1,3}
```
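A sketch of \d{1,3} on made-up strings.
Repetition takes as many characters as it can, up to the maximum, so a four-digit number yields only its first three digits.

```python
import re

found = {}
for text in ["7", "42", "512", "2048"]:
    found[text] = re.search(r"\d{1,3}", text).group()
    print(text, found[text])
# 7 7
# 42 42
# 512 512
# 2048 204
```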

Creating Custom Character Classes

Character classes are sets of characters
Python provides 6 predefined character classes
- \d matches any digit
- \D matches any character not a digit
- \w matches any alphanumeric character and _
- \W matches any character not an alphanumeric or _
- \s matches any whitespace character
- \S matches any character not a whitespace
But Python lets you define your own character classes
We do this using the [ ] meta-characters
The characters you place inside the square brackets
Are the characters in the character class
If we wanted to match one or more even digits in a row we would write
```
[02468]+
```
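A sketch of custom character classes on made-up strings.
Without a repetition meta-character, a class matches exactly one character.

```python
import re

# [02468]+ matches a run of one or more even digits
evens = re.search(r"[02468]+", "13 2468 9").group()
print(evens)   # 2468

# [aeiou] matches a single vowel
vowel = re.search(r"[aeiou]", "rhythm and blues").group()
print(vowel)   # a
```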

Ranges of Characters in a Character Class

What if you wanted to match all lowercase letters?
You can't use \w
It contains too many other characters
I can create a custom character class by listing all the lowercase characters
```
[abcdefghijklmnopqrstuvwxyz]
```
But there is a better way
I can use a range of characters inside the [ ]
I do this by placing a - between two characters
```
[a-z]
```
The first character must come before the second
So [a-z] works, but [z-a] is an error
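A sketch of ranges on made-up strings.
It also shows that one class can hold several ranges, and that Python rejects a range written backwards by raising re.error.

```python
import re

# [a-z]+ matches a run of lowercase letters
name = re.search(r"[a-z]+", "ID: smith42").group()
print(name)   # smith

# Several ranges can share one class
code = re.search(r"[A-Za-z0-9]+", "--Room502--").group()
print(code)   # Room502

# A range written backwards is an error
backwards_rejected = False
try:
    re.compile(r"[z-a]")
except re.error as problem:
    backwards_rejected = True
    print("re.error:", problem)
```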
