IT 116: Introduction to Scripting
Class 17
Today's Topics
Tips and Examples
Review
New Material
Homework 8
I have posted homework 8 here.
It is due this Sunday at 11:59 PM.
Tips and Examples
Review
Functions Returning Boolean Values
Storing Functions in Modules
- A module is a file of Python functions and variables
- To use these functions in a script, you import the module
- The filename of the module must end in .py
- The module name cannot be a keyword
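As a quick illustration of the points above, here is a minimal sketch that imports two modules and calls functions inside them. It uses math and keyword from the standard library as stand-ins, since the modules written for this course are not shown in these notes.

```python
import keyword  # standard library module that knows Python's keywords
import math     # standard library module of math functions

# Call a function that lives inside a module
root = math.sqrt(16)
print(root)

# A module name cannot be a keyword, so a file named class.py could not be imported
print(keyword.iskeyword("class"))
print(keyword.iskeyword("teams"))
```

The name teams is only an example of a word that is not a keyword and so would be acceptable as a module name.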
New Material
Files
- Programs need to be able to get data to do their work
- We can use the input function to get data from a user
- But manual data entry is slow and often inaccurate
- What if we needed a large amount of data to perform some task?
- When a large company does its payroll it needs data on every employee
- When processing large amounts of data we need to use files
- A file is simply a linear arrangement of data on some long term storage device
- That storage medium might be a
- Hard disk
- Flash drive
- CD ROM
- SSD card
Types of Files
- All files consist of binary numbers
- But those numbers can be interpreted in two ways
- Consider the following text file
$ cat fruit.txt
grapes
pears
oranges
cranberries
apples
melons
blueberries
- The Unix od (octal dump) command shows how this file is stored on disk
$ od -b fruit.txt
0000000 147 162 141 160 145 163 012 160 145 141 162 163 012 157 162 141
0000020 156 147 145 163 012 143 162 141 156 142 145 162 162 151 145 163
0000040 012 141 160 160 154 145 163 012 155 145 154 157 156 163 012 142
0000060 154 165 145 142 145 162 162 151 145 163 012
0000073
- The first column is the offset
- It is how far the data on the right is from the start of the file
- Every column after the first shows the octal value of a byte in the file
- Each byte of the file represents a character in Unicode
- To see the characters, I can run od with the -c option
$ od -c fruit.txt
0000000 g r a p e s \n p e a r s \n o r a
0000020 n g e s \n c r a n b e r r i e s
0000040 \n a p p l e s \n m e l o n s \n b
0000060 l u e b e r r i e s \n
0000073
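The first row of the od -b listing can be reproduced in Python. This is a minimal sketch that assumes the file is stored as ASCII, which matches the listing above. The variable names are made up for the example.

```python
# The first line of fruit.txt, including the newline that ends it
first_line = "grapes\n"

# encode turns the characters into the bytes that are stored on disk
stored_bytes = first_line.encode("ascii")

# Show each byte as a three-digit octal number, the way od -b does
octal_values = []
for byte in stored_bytes:
    octal_values.append(format(byte, "03o"))

print(" ".join(octal_values))
```

The values printed should match the first seven byte columns of the od -b output, ending with 012 for the newline.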
- A binary file, such as a JPEG image, also consists of numbers
- Consider the JPEG file that holds the following image
- When we look at this with od we get
$ od -b square.jpg
0000000 377 330 377 340 000 020 112 106 111 106 000 001 001 000 000 110
0000020 000 110 000 000 377 341 000 200 105 170 151 146 000 000 115 115
0000040 000 052 000 000 000 010 000 004 001 032 000 005 000 000 000 001
...
- Trying to interpret it as text gives us garbage
$ od -c square.jpg | head
0000000 377 330 377 340 \0 020 J F I F \0 001 001 \0 \0 H
0000020 \0 H \0 \0 377 341 \0 200 E x i f \0 \0 M M
0000040 \0 * \0 \0 \0 \b \0 004 001 032 \0 005 \0 \0 \0 001
...
- Both files contain binary numbers
- But the numbers in the text file represent Unicode characters
- They make sense when we use them as text
- The JPEG numbers make no sense if we try to use them as text
- To use a file in Python we have to specify if it is text or not
- Why?
- Because text files have a simple structure
- Each text file is a collection of lines
- A line is a series of characters followed by the newline character \n
- We will only be using text files in this course
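The difference between the two kinds of file can be sketched in Python. The byte values below are copied from the od listings above. Treating text as UTF-8 is an assumption of this example, not something the listings state.

```python
# Bytes from the start of fruit.txt (octal values from od -b)
text_bytes = bytes([0o147, 0o162, 0o141, 0o160, 0o145, 0o163, 0o012,
                    0o160, 0o145, 0o141, 0o162, 0o163, 0o012])

# These bytes make sense as text
text = text_bytes.decode("utf-8")

# Every line ends with \n, so splitting leaves an empty last piece that we drop
lines = text.split("\n")[:-1]
print(lines)

# The first four bytes of square.jpg
jpeg_bytes = bytes([0o377, 0o330, 0o377, 0o340])

# These bytes are not valid text, so decoding them fails
try:
    jpeg_bytes.decode("utf-8")
    jpeg_is_text = True
except UnicodeDecodeError:
    jpeg_is_text = False

print(jpeg_is_text)
```

The text bytes turn into the lines grapes and pears, while the attempt to decode the JPEG bytes raises an error.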
Identifying a File
- We need two pieces of information to identify a file
- Its name
- Its location
- Operating systems have different rules about the characters allowed in a filename
- Linux and Unix are case sensitive
- This means that memo.txt and MEMO.txt are different file names
- Windows is not case sensitive
- You cannot have two files with the same name inside any one directory
- The location of a file is given by a path
- A path is a list of directories
- The path tells what directory holds the file
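The two pieces of information can be pulled apart with the os.path module from the standard library. This is only a sketch: the pathname is a hypothetical one made up for the example, and os.path.split is one of several ways to do this.

```python
import os.path

# A hypothetical pathname made up for this example
pathname = "/home/ghoffman/it116/memo.txt"

# Split the pathname into the location and the filename
location, filename = os.path.split(pathname)
print(location)
print(filename)

# Python strings are case sensitive, just like Linux filenames
same_name = "memo.txt" == "MEMO.txt"
print(same_name)
```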
Storing Data on a Hard Disk
- Every type of storage device has its own way of storing data
- Hard disks consist of rapidly spinning disks
- The disks contain information in the form of magnetic fields
- The disk is made up of material that can be magnetized
- This material consists of billions of small regions
- In each region the magnetic field points in some direction
- Each of these fields represents a binary digit called a bit
- The direction can be either up or down
- One direction means 0 and the other means 1
- The surface of the disk is broken up into rings called tracks
- And each track is broken up into sectors
- Most files consist of many sectors
- And the sectors do not have to be one right after the other
- The sectors of a file can be scattered all over the hard disk
Files and the Kernel
- A file can be scattered over different sectors of the disk
- So how does the operating system know where to get the different parts?
- It doesn't have to
- That's the job of the device driver
- The device driver is software that allows the device to talk to the computer
- Device drivers must be provided by the manufacturer
- If you install a new printer on your machine you need the right device driver to talk to it
- At the core of every operating system is a program called the kernel
- The kernel handles all interactions with hardware
- When a program needs a file it must ask the kernel to get a connection to the file
- The kernel in turn gets that data from the device driver
- The kernel does not have to worry about where the file is stored on the disk
- That's the job of the device driver
- Give the kernel the name and location of a file
- And it will find the device driver to get it
- This division of labor makes everything work better
- It means that you can work with files on many different devices
Working with a File in Python
- All operations inside a computer take place in RAM
- RAM is working memory
- A file lives on some storage device
- But any interactions with it take place in RAM
- RAM is fast and expensive
- But external storage is slower and cheaper
- RAM is limited and files can be very large
- So you usually only have a small part of a file in RAM at any one time
- But Python needs to know how to get the rest of the file when needed
- The information needed to do this is kept in a file object
- The object contains all the bookkeeping data that is needed to work with the file
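Some of this bookkeeping data can be inspected directly. The sketch below uses the open function, which is described later in these notes, and a temporary directory so that nothing is left behind. The filename and variable names are assumptions made for the example.

```python
import os
import tempfile

# Work in a temporary directory so the example leaves nothing behind
with tempfile.TemporaryDirectory() as folder:
    pathname = os.path.join(folder, "memo.txt")

    memo_file = open(pathname, "w")

    # The file object remembers which file it is connected to
    name_matches = memo_file.name == pathname
    # It remembers what we said we wanted to do with the file
    mode = memo_file.mode
    # It also knows whether the connection is still open
    open_before = not memo_file.closed

    memo_file.close()
    open_after = not memo_file.closed

print(name_matches, mode, open_before, open_after)
```

The file object holds more than these three values, but they give a feel for the kind of information it keeps track of.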
Objects
- Sometimes you need several pieces of information about one thing
- If we were writing software for a car dealer we would need to know
- Manufacturer
- Model
- Year
- Color
- We might need to write several functions to work with this data
- These variables and functions might be scattered in different parts of the code
- This could become unwieldy and make the code hard to maintain
- Objects were created to deal with this situation
- An object is a region of RAM that contains all the data about some thing
- It also has all the functions that deal with this data
- The data is contained in variables inside the object
- The functions that work on this data are called methods
- Objects have no name
- They only have a location in RAM
- We create an object by using a special function
- This function returns a pointer to the object in RAM
- We store the value of this pointer in a variable
- And we use this variable to work with the object
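The car dealer example might look something like the sketch below. The Car class, its description method, and the sample values are all hypothetical and were invented for this illustration.

```python
class Car:
    """A hypothetical class for the car dealer example."""

    def __init__(self, manufacturer, model, year, color):
        # The data lives in variables inside the object
        self.manufacturer = manufacturer
        self.model = model
        self.year = year
        self.color = color

    def description(self):
        # A method is a function that works on the object's data
        return f"{self.color} {self.year} {self.manufacturer} {self.model}"


# Calling Car creates the object and returns a reference to it
car = Car("Honda", "Civic", 2015, "blue")

# The variable lets us reach the data and the methods
print(car.year)
print(car.description())

# A second variable can refer to the very same object
same_car = car
print(same_car is car)
```

The last line shows that the object itself has no name: car and same_car are two variables that both point to the same location in RAM.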
Accessing the Data in Files
- There are two ways that the data of a file can be accessed
- In sequential access the script starts at the beginning of the file
- And travels through the file until it gets the data it needs
- Sequential access is how a tape recorder works
- In direct access you go directly to the part of the file that holds the data you want
- This is like getting the number of a book from the library card index
- And going directly to the shelf that holds the book
- We will only be using sequential access in this class
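To make the two kinds of access concrete, here is a rough sketch that finds the same record both ways. It assumes every record is padded to the same length so that the position of a record can be calculated, and it opens the file in binary mode for the direct access part. Neither of these is something this course requires.

```python
import os
import tempfile

fruits = ["grapes", "pears", "oranges", "cranberries", "apples"]
RECORD_SIZE = 12   # 11 padded characters plus the newline

with tempfile.TemporaryDirectory() as folder:
    pathname = os.path.join(folder, "fruit_fixed.txt")

    # Write every fruit as a record of exactly RECORD_SIZE characters
    out_file = open(pathname, "w", newline="\n")
    for fruit in fruits:
        out_file.write(fruit.ljust(RECORD_SIZE - 1) + "\n")
    out_file.close()

    # Sequential access: start at the top and read past the first three lines
    in_file = open(pathname, "r")
    for _ in range(3):
        in_file.readline()
    sequential_result = in_file.readline().strip()
    in_file.close()

    # Direct access: jump straight to where the fourth record starts
    in_file = open(pathname, "rb")
    in_file.seek(3 * RECORD_SIZE)
    direct_result = in_file.read(RECORD_SIZE).decode("ascii").strip()
    in_file.close()

print(sequential_result)
print(direct_result)
```

Both approaches end up with the same record, but the sequential version had to read everything in front of it first.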
Opening a File
- To work with a file we have to create a file object
- We do this using the built-in function open
- The open function does two things
- It creates a file object
- It returns a reference to the object
- The reference is a memory location that is stored in a variable
- We use the variable to work with the object
- Here is the format for using the open function
FILE_VARIABLE = open(FILENAME, MODE)
- FILE_VARIABLE is the name of the variable that holds the reference
- FILENAME is a string giving the name and location of the file
- MODE is a letter specifying what we want to do with the file
- The three most commonly used access modes are
Mode | Description
"r" | Open a file for reading
"w" | Open a file for writing
"a" | Open a file for writing to the end of the file
- If a file is opened for reading it cannot be changed
- If a file is opened for writing it will be created if it does not exist
- If the file already exists, it will be replaced with new content
- The a in the last mode stands for "append"
- Opening an existing file in this mode does not destroy its content
- Instead, new lines will be added to the bottom of the file
- The mode argument to the open function must be given as a string
- To open the file memo.txt for reading
- I would write
file = open("memo.txt", "r")
- not
file = open("memo.txt", r)
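The effect of the three access modes can be seen in the sketch below. It relies on the write, read and close methods of the file object, which this section has not covered, and it works in a temporary directory. The filename and the text written are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    pathname = os.path.join(folder, "memo.txt")

    # "w" creates the file because it does not exist yet
    memo_file = open(pathname, "w")
    memo_file.write("first\n")
    memo_file.close()

    # "a" keeps the existing content and adds to the end
    memo_file = open(pathname, "a")
    memo_file.write("second\n")
    memo_file.close()

    # "r" lets us look at the content without changing it
    memo_file = open(pathname, "r")
    after_append = memo_file.read()
    memo_file.close()

    # "w" on an existing file replaces the old content
    memo_file = open(pathname, "w")
    memo_file.write("third\n")
    memo_file.close()

    memo_file = open(pathname, "r")
    after_rewrite = memo_file.read()
    memo_file.close()

print(repr(after_append))
print(repr(after_rewrite))
```

After appending, the file holds both lines. After opening it again with "w", only the newest line is left.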
Specifying the Location of a File
- A file is identified by two pieces of information: its name and its location
- If the file is in your current directory then things are simple
- All you need is the name of the file
- If I open a file for writing with the following statement
test_file = open("test.txt", "w")
- the Python interpreter will look for the file test.txt in the current directory
- This is the directory where the script is located, as long as you run the script from that directory
- But what if the file is in a different directory?
- Then you must specify both the filename and the location
- This is done by using a pathname
- The pathname consists of two parts
- The path to the file
- The filename
- A path is simply a list of directories starting at some particular directory
- And ending with the directory where the file lives
- In Linux and Unix each directory in a path is separated from the next by a /
/home/ghoffman/it116
- In Windows you have to use a backslash, \, instead
c:\Users\ghoffman
- There are two types of path
- The difference between the two types is the directory where they start
- A relative path starts with your current directory
- In this case that would be the directory that holds the script
- We will only be using relative pathnames in this course
- An absolute path starts with the root directory
- Root is a special directory at the top of the filesystem
- On Unix systems the root directory is written /
- The situation on Windows is different
- Disk drives on Windows are identified by a capital letter
- Each drive has its own root directory
- The top directory on the C drive on a Windows machine is C:\
- Notice that Unix uses a slash, /
- While Windows uses a backslash, \
- Consider a file test.txt in my home directory on Unix
- Its absolute pathname would be /home/ghoffman/test.txt
- Consider another file test.txt in the ghoffman directory under the Users directory on the C drive of a Windows machine
- Its absolute pathname would be C:\Users\ghoffman\test.txt
- The fact that Windows and Unix use different characters to separate directory names creates problems
- To open the file test.txt in my /home/ghoffman directory on users3 I would write
test_file = open("/home/ghoffman/test.txt", "w")
- To open the file in C:\Users\ghoffman on a Windows machine I would write
test_file = open(r"C:\Users\ghoffman\test.txt", "w")
- Note the r before the pathname
- The r stands for raw
- The problem is that Windows uses the backslash character, \
- Python uses this character to begin an escape sequence, such as \n for newline
- The r before the pathname string tells Python to ignore the special meaning of \
- Once again, we will only use relative pathnames in this class
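Two of the ideas in this section can be checked with a short sketch. The first part uses os.path.abspath and os.getcwd from the standard library to show how a relative pathname is turned into an absolute one. The second part compares an ordinary string with a raw string. The Windows pathname is a hypothetical one, chosen because it contains \t and \n.

```python
import os

# A relative pathname is interpreted starting from the current directory
relative = "test.txt"
absolute = os.path.abspath(relative)
starts_at_current = absolute == os.path.join(os.getcwd(), relative)
print(starts_at_current)

# In an ordinary string, \t means tab and \n means newline
ordinary = "C:\temp\new.txt"

# In a raw string, every backslash is just a backslash
raw = r"C:\temp\new.txt"

print(len(ordinary))
print(len(raw))
print("\\" in ordinary)
print("\\" in raw)
```

The ordinary string is two characters shorter and contains no backslashes at all, because each backslash and the letter after it were turned into a single special character.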
Writing Data to a File
- Objects contain values and the functions that work with those values
- The functions inside objects are called methods
- To call a method you must use dot notation
- Dot notation for methods has the following format
OBJECT_VARIABLE.METHOD_NAME(ARGUMENTS)
- To write data to a file we must first open it for writing
teams_file = open("teams.txt", "w")
- This will give you a file object referenced by teams_file
- Now we can use the write method of this file object to send some text to the file
$ cat write_al_east.py
# writes the names of the teams in the American League East
teams_file = open("teams.txt", "w")
teams_file.write("Boston Red Sox\n")
teams_file.write("Baltimore Orioles\n")
teams_file.write("Toronto Blue Jays\n")
teams_file.write("Tampa Bay Rays\n")
teams_file.write("New York Yankees\n")
$ python3 write_al_east.py
$ cat teams.txt
Boston Red Sox
Baltimore Orioles
Toronto Blue Jays
Tampa Bay Rays
New York Yankees
- Notice that I had to put the newline escape character \n at the end of every line I write
- Otherwise all the text would be on a single line
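The effect of leaving out the newline can be seen in the following sketch. It writes three of the team names twice, once without \n and once with it, and reads the file back each time. The read and close methods and the temporary directory are additions made for this example and are not part of the script above.

```python
import os
import tempfile

teams = ["Boston Red Sox", "Baltimore Orioles", "Toronto Blue Jays"]

with tempfile.TemporaryDirectory() as folder:
    pathname = os.path.join(folder, "teams.txt")

    # Without \n every name runs into the next one
    teams_file = open(pathname, "w")
    for team in teams:
        teams_file.write(team)
    teams_file.close()   # make sure everything reaches the file

    teams_file = open(pathname, "r")
    without_newlines = teams_file.read()
    teams_file.close()

    # With \n each name ends up on its own line
    teams_file = open(pathname, "w")
    for team in teams:
        teams_file.write(team + "\n")
    teams_file.close()

    teams_file = open(pathname, "r")
    with_newlines = teams_file.read()
    teams_file.close()

print(without_newlines)
print(with_newlines)
```

Calling close when you are done writing is a good habit, since it makes sure all the text has actually been sent to the file.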
Reading Data From a File
Attendance
Class Quiz