IT 116: Introduction to Scripting
Class 17
Tips and Examples
Review
New Material
Microphone
Graded Quiz
You can connect to Gradescope to take the weekly graded quiz
today during the last 15 minutes of the class.
Once you start the quiz you have 15 minutes to finish it.
You can only take this quiz today.
There is no makeup for the weekly quiz because Gradescope does not permit it.
Readings
If you have the textbook you should read Chapter 6,
Files and Exceptions, from
Starting Out with Python.
Solution to Homework 6
I have posted a solution to homework 6
here.
Let's take a look.
Homework 8
I have posted homework 8
here.
It is due this Sunday at 11:59 PM.
Midterm Scores
If you believe I have scored one of your Midterm answers incorrectly you
can request a rescore for that question on Gradescope.
If your score on the Final is significantly better than your score on the Midterm,
I will replace your Midterm grade with that of the Final when calculating your
grade for the course.
If you want to discuss how you can improve your grade in this course,
come see me during Office Hours.
You do not have to make an appointment to see me during Office Hours,
but you may have to wait your turn if I am helping another student.
I want students to pass this course, and will do what I reasonably can to
make that happen.
Your Current Standing in This Course
If you want to know your grade as of this moment, send me an email.
In performing this calculation, I will assume that your score on the final
exam will be the same as your Midterm score.
I can do this only for your grade as of the Midterm.
I can't do this for your grade later in the semester.
Final Exam
The final for this course will take place in this room
on Tuesday, December 17th, from 3:00 PM - 6:00 PM.
You will also find this information at the top of the class web page.
Questions
Are there any questions before I begin?
Tips and Examples
Never Assume Your Code Works
- In the American justice system a defendant is presumed innocent ...
- until he or she is proven guilty
- In other words you are innocent until you have been shown to be guilty
- With code the opposite is true
- You should always assume your code is not working ...
- until you can test it to prove otherwise
- This means your work is not done once you have written the code
- It is only finished when you can prove it works
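One way to act on this advice is to write a few checks that fail loudly when the code is wrong. The sketch below is only an illustration: is_even is a hypothetical function that does not appear in these notes, and the assert statement is just one of several ways to test code.

```python
# is_even is a hypothetical function used only for illustration
def is_even(number):
    return number % 2 == 0

# each assert stops the script with an error if its condition is False
assert is_even(4)
assert not is_even(7)
assert is_even(0)

# this line is only reached if every check above succeeded
print("All tests passed")
```

If any of the checks were wrong, the script would stop with an AssertionError instead of printing the final message.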
Review
Functions Returning Boolean Values
Storing Functions in Modules
- A module is a file containing definitions of Python functions ...
- and assignment statements for variables
- To use these functions in a script, you import the module
- The filename of the module must
- End in .py
- The module name cannot be a keyword
Attendance
New Material
Files
- Programs need data to do their work
- We can use the input function to get data from the user
- But manual data entry is slow and often inaccurate
- Some tasks involve processing a large amount of data
- When a large company does its payroll it needs data on every employee ...
- which may involve thousands of records
- When processing a large amount of data we need to use files
- A file is simply a linear arrangement of data ...
- on some long term storage device
- That storage device might be a
- Hard disk
- Flash drive
- CD ROM
- SSD card
Types of Files
- All files consist of binary numbers
- But those numbers can be interpreted in one of two ways
- Consider the following text file
$ cat fruit.txt
grapes
pears
oranges
cranberries
apples
melons
blueberries
- The Unix od (octal dump) command shows how this file is stored on disk
$ od -b fruit.txt
0000000 147 162 141 160 145 163 012 160 145 141 162 163 012 157 162 141
0000020 156 147 145 163 012 143 162 141 156 142 145 162 162 151 145 163
0000040 012 141 160 160 154 145 163 012 155 145 154 157 156 163 012 142
0000060 154 165 145 142 145 162 162 151 145 163 012
0000073
- The first column is the offset
- It is how far the data on the right is from the start of the file
- Every column after the first shows the octal value of a byte in the file
- Each byte of the file represents a character in Unicode
- To see the characters, I can run od with the -c option
$ od -c fruit.txt
0000000 g r a p e s \n p e a r s \n o r a
0000020 n g e s \n c r a n b e r r i e s
0000040 \n a p p l e s \n m e l o n s \n b
0000060 l u e b e r r i e s \n
0000073
- A binary file, such as a JPEG image, also consists of numbers
- Consider the JPEG file that holds the following image
- When we look at this with od we get
$ od -b square.jpg
0000000 377 330 377 340 000 020 112 106 111 106 000 001 001 000 000 110
0000020 000 110 000 000 377 341 000 200 105 170 151 146 000 000 115 115
0000040 000 052 000 000 000 010 000 004 001 032 000 005 000 000 000 001
...
- Trying to interpret it as text gives us garbage
$ od -c square.jpg
0000000 377 330 377 340 \0 020 J F I F \0 001 001 \0 \0 H
0000020 \0 H \0 \0 377 341 \0 200 E x i f \0 \0 M M
0000040 \0 * \0 \0 \0 \b \0 004 001 032 \0 005 \0 \0 \0 001
...
- Both files contain binary numbers
- But the numbers in the text file represent Unicode characters
- They make sense when we view them as text
- But the numbers in the JPEG file make no sense when viewed as text
- To use a file in Python we have to specify if it is text or binary
- Why?
- Because text files have a simple structure
- Each text file is a collection of lines
- A line is a series of characters followed by the newline character, \n
- Binary files do not have this structure
- We will only be using text files in this course
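The link between characters and the octal numbers shown by od can be reproduced in Python. This is a minimal sketch, assuming the text is stored in the UTF-8 encoding; the variable names are made up for illustration.

```python
# the first two lines of fruit.txt, written as a Python string
text = "grapes\npears\n"

# encode turns the characters into the bytes that would be stored on disk
# (UTF-8 is assumed here)
data = text.encode("utf-8")

# format each byte as a three-digit octal number, the way od -b does
octal_values = []
for byte in data:
    octal_values.append(format(byte, "03o"))

print(" ".join(octal_values))
# prints: 147 162 141 160 145 163 012 160 145 141 162 163 012
```

The numbers printed match the start of the od -b output for fruit.txt, and 012 is the newline character that ends each line.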
Identifying a File
- You need two pieces of information to identify a file: its name and its location
- Operating systems have different rules about the characters in a filename
- Linux and Unix are case sensitive
- This means that memo.txt and MEMO.txt are different file names
- Windows is not case sensitive
- You cannot have two files with the same name ...
- inside the same directory
- The location of a file is given by a path
- A path is a list of directories
- The path tells what directory holds the file
- It is a list of directories from some starting point ...
- to the directory that holds the file
- A pathname is a path ...
- followed by the file name
- A pathname gives the name and location of a file
- So it uniquely specifies each file
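The two parts of a pathname can be pulled apart with Python's standard os.path module. This is a small sketch; the pathname used is only an example and the file does not need to exist.

```python
import os.path

# an example pathname; the file itself does not have to exist
pathname = "/home/ghoffman/test.txt"

# the path is the list of directories leading to the file
directory = os.path.dirname(pathname)

# the file name is the last part of the pathname
filename = os.path.basename(pathname)

print(directory)   # /home/ghoffman
print(filename)    # test.txt
```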
Storing Data on a Hard Disk
- Every type of storage device has its own way of storing data
- Hard disks consist of rapidly spinning disks
- The disks contain information in the form of magnetic fields
- The disk is made up of material that can be magnetized
- This material consists of millions of small regions
- In each region the magnetic field points in some direction
- Either up or down
- Each of these regions represents a binary digit called a bit
- One direction means 0 and the other means 1
- The surface of the disk is broken up into rings called tracks
- And each track is broken up into sectors
- Most files consist of many sectors
- The sectors do not have to be one right after the other
- The sectors are usually scattered all over the disk
Files and the Kernel
- A file can be scattered over different sectors of the disk
- So how does the operating system know where to get the different parts?
- It doesn't have to
- That's the job of the device driver
- The device driver is software that allows a device to talk to the computer
- Device drivers must be provided by the manufacturer
- If you install a new printer on your machine ...
- you must select the right device driver to talk to it
- At the core of every operating system is a program called the kernel
- The kernel handles all interactions with hardware
- When a program needs a file it must ask the kernel ...
- to set up a connection to the file
- The kernel does not have to worry about where the file is stored on the disk
- That's the job of the device driver
- This division of labor makes everything work better
- It also means that you can work with files on many different devices
Working with a File in Python
- All operations inside a computer take place in RAM
- RAM is working memory
- A file lives on some external storage device ...
- like a hard disk or thumb drive
- But any interaction with the file takes place in RAM
- RAM is fast and expensive
- But external storage is slower and cheap
- RAM space is limited ...
- and files can be very large
- So you usually only have a small part of a file in RAM ...
- at any one time
- The kernel sets aside a section of RAM to work with the file
- This section holds the part of the file that has been read into RAM ...
- and various other pieces of bookkeeping information
- Like the location of the last line read from disk
Objects
- Sometimes you need several pieces of information about one thing
- If we were writing software for a car dealer we would need to know
- Manufacturer
- Model
- Year
- Color
- You would need many variables and functions to work with this data
- These variables and functions might be scattered over different parts of the code
- This could become unwieldy ...
- and make the code hard to understand and maintain
- Objects were created to deal with this problem
- An object is a region of RAM ...
- that contains all the data about some thing
- It also has all the functions that deal with this data
- The data is contained in variables inside the object
- The functions that work on this data are called methods
- Objects have no name
- They only have a location in RAM
- We create an object by using a special function
- This function returns a pointer to the object in RAM
- We store the value of this pointer in a variable ...
- and we use this variable to work with the object
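The idea that a variable holds a pointer to an object, rather than the object itself, can be seen with a short experiment. This sketch uses a list object and its append method, which these notes have not introduced; any object would behave the same way.

```python
# create a list object and store a pointer to it in the variable teams
teams = ["Red Sox", "Orioles"]

# this copies the pointer, not the object, so both variables
# now point to the same object in RAM
same_teams = teams

# append is a method of the list object, called with dot notation
same_teams.append("Yankees")

print(teams)                 # ['Red Sox', 'Orioles', 'Yankees']
print(teams is same_teams)   # True
```

The change made through same_teams shows up when we print teams, because there is only one object.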
Accessing the Data in Files
- There are two ways that you can get the data in a file
- In sequential access the program starts at the beginning of the file ...
- and reads through the data in order ...
- until it gets to the end
- Sequential access is how a tape recorder works
- In direct access you go directly to the part of the file that holds the data
- This is like getting the number of a book from the library card index ...
- and going directly to the shelf that holds the book
- We will only be using sequential access in this class
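The difference between the two kinds of access can be sketched with a small text file. The example below is an illustration only: it creates its own file in a temporary directory so it does not touch your files, and it uses the file methods readline, tell and seek, which are not covered in this section.

```python
import os
import tempfile

# create a small text file in a temporary directory
pathname = os.path.join(tempfile.mkdtemp(), "fruit.txt")
fruit_file = open(pathname, "w")
fruit_file.write("grapes\npears\noranges\n")
fruit_file.close()

fruit_file = open(pathname, "r")

# sequential access: each readline picks up where the last one stopped
first = fruit_file.readline()
position = fruit_file.tell()    # remember where the second line starts
second = fruit_file.readline()
third = fruit_file.readline()

# direct access: jump straight back to the remembered position
fruit_file.seek(position)
again = fruit_file.readline()
fruit_file.close()

print(first.strip(), second.strip(), third.strip())   # grapes pears oranges
print(again.strip())                                  # pears
```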
Opening a File
- To work with a file we have to create a file object
- We do this using the built-in function open
- The open function does two things
- It creates a file object
- It returns a reference to the object
- The reference is a memory location that is stored in a variable
- We use the variable to work with the object
- The picture in memory looks like this
- Here is the format for using the open function
FILE_VARIABLE = open(FILENAME, MODE)
- FILE_VARIABLE is the name of the variable that holds the reference
- FILENAME is a string giving the name and location of the file
- MODE is a letter specifying what we want to do with the file
- The three most commonly used access modes are
Mode | Description
"r"  | Open a file for reading
"w"  | Open a file for writing
"a"  | Open a file for writing to the end of the file
- The a in the last mode stands for "append"
- Opening an existing file in this mode does not destroy its content
- Instead, new lines will be added to the bottom of the file
- If a file is open for reading it cannot be changed
- If you try to open a file for reading ...
- and the file does not exist ...
- you will get an error
- If you try to open a file for writing or appending ...
- that does not exist ...
- the kernel will create the file for you
- If you open a file for writing that already exists ...
- its contents will be replaced by whatever you write to it
- The mode argument to the open function must be given as a string
- To open the file memo.txt for reading, I would write
file = open("memo.txt", "r")
- Not
file = open("memo.txt", r)
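The behavior of the three access modes can be checked with a short script. This is a sketch under a few assumptions: it works in a temporary directory so it cannot damage an existing memo.txt, it uses the read and close methods, which this section has not covered, and it uses try/except to catch the error Python raises for a missing file.

```python
import os
import tempfile

# a pathname inside a fresh temporary directory, so the file does not exist yet
pathname = os.path.join(tempfile.mkdtemp(), "memo.txt")

# "r" on a file that does not exist causes an error
try:
    memo_file = open(pathname, "r")
    memo_file.close()
    missing_file_error = False
except FileNotFoundError:
    missing_file_error = True

# "w" creates the file
memo_file = open(pathname, "w")
memo_file.write("first line\n")
memo_file.close()

# "a" keeps the existing contents and adds to the end
memo_file = open(pathname, "a")
memo_file.write("second line\n")
memo_file.close()

memo_file = open(pathname, "r")
after_append = memo_file.read()
memo_file.close()

# "w" on an existing file throws away the old contents
memo_file = open(pathname, "w")
memo_file.write("replacement\n")
memo_file.close()

memo_file = open(pathname, "r")
after_rewrite = memo_file.read()
memo_file.close()

print(missing_file_error)   # True
print(after_append)         # first line, then second line
print(after_rewrite)        # replacement
```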
Specifying the Location of a File
- A file is identified by two pieces of information: its name and its location
- If the file is in your current directory things are simple
- All you need is the name of the file
- If I open a file for writing with the following statement
test_file = open("test.txt", "w")
- the Python interpreter will ask the kernel ...
- to look for the file test.txt in my current directory
- To open the file test.txt in my /home/ghoffman directory on pe15 I would write
test_file = open("/home/ghoffman/test.txt", "w")
- To open the file in C:\Users\ghoffman on a Windows machine I would write
test_file = open(r"C:\Users\ghoffman\test.txt", "w")
- Note the r before the pathname
- The r stands for raw
- The problem is that Windows uses the backslash character, \, ...
- to mark the end of a directory name
- Python uses this character in an escape sequence
- The r before the pathname string ...
- tells Python to ignore the special meaning of \
- We will only use relative pathnames in this class
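The effect of the r prefix can be seen by comparing a raw string with an ordinary one. This short sketch does not open any file; it only looks at the strings themselves.

```python
# in an ordinary string each backslash must be doubled
escaped = "C:\\Users\\ghoffman\\test.txt"

# in a raw string a single backslash stands for itself
raw = r"C:\Users\ghoffman\test.txt"

print(raw == escaped)   # True
print(len("\n"))        # 1, a single newline character
print(len(r"\n"))       # 2, a backslash followed by the letter n
```

Both ways of writing the Windows pathname produce the same string, but the raw form is easier to read and harder to get wrong.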
Writing Data to a File
- Objects contain values ...
- and the functions that work with those values
- The functions inside objects are called methods
- To call a method you must use dot notation
- Dot notation for methods has the following format
OBJECT_VARIABLE.METHOD_NAME
- To write data to a file we must first open it for writing
teams_file = open("teams.txt", "w")
- This will give you a file object pointed to by the variable teams_file
- Now we can use the write method of this file object ...
- to send some text to the file
$ cat write_al_east.py
# writes the names of the teams in the American League East
teams_file = open("teams.txt", "w")
teams_file.write("Boston Red Sox\n")
teams_file.write("Baltimore Orioles\n")
teams_file.write("Toronto Blue Jays\n")
teams_file.write("Tampa Bay Rays\n")
teams_file.write("New York Yankees\n")
# close the file so that everything written is saved to disk
teams_file.close()
$ python3 write_al_east.py
$ cat teams.txt
Boston Red Sox
Baltimore Orioles
Toronto Blue Jays
Tampa Bay Rays
New York Yankees
- Notice that I had to add the newline character, \n, at the end of every line
- Otherwise all the text would be on a single line
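To see what happens when the newline character is left out, the sketch below writes two team names without \n and then reads the file back. It writes to a temporary directory so it does not overwrite teams.txt, and it uses the read and close methods, which this section has not covered.

```python
import os
import tempfile

# write to a temporary directory so the real teams.txt is not touched
pathname = os.path.join(tempfile.mkdtemp(), "teams.txt")

teams_file = open(pathname, "w")
teams_file.write("Boston Red Sox")      # no \n at the end
teams_file.write("Baltimore Orioles")   # no \n at the end
teams_file.close()

# read the whole file back as a single string
teams_file = open(pathname, "r")
contents = teams_file.read()
teams_file.close()

print(contents)   # Boston Red SoxBaltimore Orioles
```

The write method adds nothing on its own, so the two names run together on one line.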
Reading Data From a File
Ethics and Technology
The Spiderman Principle
Class Exercise
Class Quiz