IT 117: Intermediate Scripting
Homework 4
Due
Sunday, February 23rd at 11:59 PM
What You Need to Do
- Create the script hw4.py
- Make sure it obeys the rules in Rules for Homework Scripts
- Make sure the script has a hashbang line and is executable
- Move it to an hw4 directory on pe15.cs.umb.edu
Setup On Your Machine
- Open a text editor.
I would suggest the text editor built into the program IDLE.
- Save the file as hw4.py
- Copy the file gettysburg.txt from /home/ghoffman/course_files/it117_files.
Use FileZilla to do this.
Specification
- This script creates a word frequency dictionary for a text file
- It reads in a text file
- It then creates a dictionary of word frequencies
- In this dictionary the keys will be words found in the file
- The values will be the number of times each word appeared
- Then the script prints a sorted list of all words in the file and how many times they appeared
- The script contains three functions
- open_file_read
- word_frequencies_create
- word_frequency_print
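As a starting point, the three functions can be stubbed out with pass, which is how the first step under Suggestions begins. This is a sketch only: the assignment gives the header for word_frequencies_create, but the parameter names filename and frequencies for the other two functions are inferred from the Suggestions steps, and the hashbang path is an assumption (use the one the Rules for Homework Scripts specify).

```python
#!/usr/bin/env python3
# Skeleton sketch: each body is just pass, so running it prints nothing.
# ASSUMPTION: the hashbang path and the parameter names filename and
# frequencies are inferred; only word_frequencies_create's header is given.

def open_file_read(filename):
    pass

def word_frequencies_create(file):
    pass

def word_frequency_print(frequencies):
    pass
```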
Functions
open_file_read
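The description of open_file_read does not appear under its heading. Going by the Suggestions steps (open the file named by the filename parameter inside a try block, return the file object, print an error in the except block), a minimal sketch might look like this. Catching OSError and the exact message wording are assumptions; the message here follows the sample under Output.

```python
def open_file_read(filename):
    """Return a file object opened for reading, or print an error.

    Sketch based on the Suggestions steps; returns None on failure.
    """
    try:
        file = open(filename, "r")
        return file
    except OSError:
        # OSError covers a missing file as well as permission problems
        print("Cannot open file", filename)

# Usage: a name that does not exist goes to the except block
result = open_file_read("xxxxxxx")   # prints: Cannot open file xxxxxxx
```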
word_frequencies_create
- Header
def word_frequencies_create(file):
- This function should read in a file using its parameter, file, a file object
- It should return a word frequency dictionary
- A word frequency dictionary is one where the keys are words
- And the values are the number of times the word appears in the file
- This function must ignore case
- So "We" and "we" should count as the same word
- Here is the algorithm
    create an empty dictionary
    for each line in the file
        make the line lowercase
        turn the line into a list of words
        for each word in the list
            if the word is not in the dictionary
                create an entry in the dictionary with a value of 1
            else
                add 1 to the value of the word entry in the dictionary
    return the dictionary
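The algorithm above translates fairly directly into Python. In this sketch, io.StringIO stands in for a real file object so the example runs without gettysburg.txt; the two-line sample text is made up for the demo. Note that split() breaks only on whitespace, so punctuation stays attached to words: "war." and "war" would be counted as different words.

```python
import io

def word_frequencies_create(file):
    """Return a dict mapping each lowercase word in file to its count."""
    word_frequencies = {}
    for line in file:                  # a file object yields one line at a time
        line = line.lower()            # ignore case: "We" and "we" match
        words = line.split()           # split on any run of whitespace
        for word in words:
            if word not in word_frequencies:
                word_frequencies[word] = 1      # first time we see this word
            else:
                word_frequencies[word] += 1     # seen before: add 1
    return word_frequencies

# io.StringIO behaves like a file opened for reading
sample = io.StringIO("We are met\nwe are here\n")
print(word_frequencies_create(sample))
# {'we': 2, 'are': 2, 'met': 1, 'here': 1}
```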
word_frequency_print
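No description of word_frequency_print appears under its heading either. Based on the last two Suggestions steps and the sample under Output, one way to sketch it is a for loop over sorted(frequencies) that prints each word and its count. The parameter name frequencies comes from the Suggestions; the rest is an assumption.

```python
def word_frequency_print(frequencies):
    """Print each word and its count, in alphabetical order of the words."""
    # sorted() on a dictionary returns a sorted list of its keys
    for word in sorted(frequencies):
        # print() puts a space between its arguments, e.g. "we 10"
        print(word, frequencies[word])

# Usage with a small hand-made dictionary
word_frequency_print({"we": 3, "are": 2, "met": 1})
# are 2
# met 1
# we 3
```

Because every word is lowercased in word_frequencies_create, the character-by-character comparison that sorted() uses gives alphabetical order here.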
Test Code
Output
Suggestions
- Write this script in stages
- Test your script at each step
- Print the steps below
- And check them off as you finish each one
- Create a file with the hashbang line.
Copy and paste the headers for each of the three functions into the script.
The code block for each of these functions should be the single Python statement pass.
The pass statement does nothing, but it keeps the code from giving syntax errors.
Run the code.
You should see nothing.
If you have errors, fix them.
- Remove the pass statement in open_file_read.
Create a try/except statement.
In the try block, add a line that opens a file for reading using the filename parameter.
Add a statement that returns this file object.
In the except block, write code to print an error message if the file cannot be opened.
Paste the test code at the bottom of the file.
Comment out the last two lines of the test code by typing # at the beginning of each line.
Run the code.
You should see
Cannot open xxxxxxx
If you see something else, fix it.
- Remove the pass statement in word_frequencies_create.
Create an empty dictionary called word_frequencies.
Create a for loop that prints each line in the file.
Run the code and fix any errors.
- You need to convert all words into lowercase.
To do this, insert an assignment statement before the print statement that turns all the uppercase characters in the line to lowercase using the lower string method.
Print the line.
Run the code and fix any errors.
- Remove the print statement.
Use the split string method to create a list of all the words in the line and assign this list to the variable words.
Print this list.
Run the code and fix any errors.
- Remove the previous print statement.
Add a new for loop inside the old for loop, using word as the loop variable.
Inside the loop, print each word.
Run the code and fix any errors.
- Remove the previous print statement.
Write an if statement that checks whether the word is NOT already in the dictionary word_frequencies.
If the word is not already in the dictionary, create an entry in the dictionary with the word as the key and a value of 1.
Add a line to print the dictionary.
This line must be outside both for loops but still inside the body of the function.
Make sure you get the indentation right.
Run the code.
You should see many words, but all the values should be 1.
Fix any errors you find.
- Add an else clause to the if statement inside the second for loop.
In the code block inside this else clause, write an assignment statement that sets the variable count to the value associated with word inside word_frequencies.
Now set the value associated with word to count + 1.
Run the code.
You should see words with many different values.
Fix any errors you find.
- Remove the print statement from the end of the function.
Replace it with a statement that returns the word_frequencies dictionary.
Remove the pass statement from the word_frequency_print code block.
Replace it with a statement that prints the parameter frequencies.
Run the code.
The output should be similar to what you saw at the last step.
Fix any errors you find.
- Replace the print statement with a for loop that prints the word and word count value for each word in the dictionary.
Run the code and fix any errors.
- Change the for loop so it prints the words in alphabetical order.
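The string-method steps in the list above (lower, then split) can be tried on a single line before they go inside the loop. This sketch uses one line modeled on the Gettysburg Address as sample input; the actual contents of gettysburg.txt may differ. It also shows why the line-printing step leaves blank lines: each line read from a file keeps its trailing newline, and print adds another.

```python
line = "We are met on a great battle-field of that war.\n"

# lower() returns a new string, so assign the result back to line
line = line.lower()

# split() with no argument splits on runs of whitespace and
# discards the trailing newline
words = line.split()
print(words)
# ['we', 'are', 'met', 'on', 'a', 'great', 'battle-field', 'of', 'that', 'war.']

# The line itself still ends in "\n"; print() adds a second newline,
# which is why printing lines one at a time leaves blank lines between them
print(repr(line))
```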
Output
When you run the program, the output should look something like this:
Cannot open file xxxxxxx
a 7
above 1
add 1
advanced 1
ago 1
all 1
altogether 1
and 6
any 1
...
war 2
we 10
what 2
whether 1
which 2
who 3
will 1
work 1
world 1
years 1
Copy the file to Unix
- Open FileZilla and connect to pe15.cs.umb.edu
You will have to connect using your Unix username and password.
- Go to your it117 directory
- Go to your hw directory
- Create a directory for this exercise
Right-click in the whitespace and create the hw4 directory
- Copy the script to the hw4 directory
Click and drag the file from the bottom left panel to the bottom right panel
- Make the script executable
Right-click on the file and select "Permissions" from the menu
Enter 755 in the box provided
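The 755 entered in FileZilla is an octal permission mode: the owner gets read, write, and execute (7), while group and others get read and execute (5). For reference only, the same change can be made from Python with os.chmod; make_executable is a hypothetical helper name, not part of the assignment.

```python
import os
import stat

def make_executable(path):
    """Hypothetical helper: set path to mode 755 (rwxr-xr-x), return new mode."""
    os.chmod(path, 0o755)                        # 0o755 is Python's octal literal for 755
    return stat.S_IMODE(os.stat(path).st_mode)   # keep only the permission bits

# Usage (hypothetical): make_executable("hw4.py") returns 0o755 (493 in decimal)
```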
Testing on Your Machine
Copy the Script to Unix
- Open FileZilla and connect to pe15.cs.umb.edu
- Go to your it117 directory
- Go to your hw directory
- Right-click in the whitespace inside the hw directory and select "Create directory"
- Enter hw4 in the dialog box
- Click and drag your script from the bottom left panel to the bottom right panel
- Right-click on the file and select "Permissions" from the menu
- Enter 755 in the box provided
- This will make the script executable
- Click and drag gettysburg.txt from the bottom left panel to the bottom right panel
Testing the Script on Unix (Optional)
Copyright © 2022 Glenn Hoffman. All rights reserved. May not be reproduced without permission.