See
Project 4 on pg. 821 for the basic ideas of this project. We will find the
longest anagrams in the words.txt provided in the Chapter 13 files on the
author’s website.
1.
First
we need to compute the canonical form of each word, so compose a static method
public static String canonicalForm(String
word)
so that canonicalForm(“computer”) returns “cemoprtu”,
and canonicalForm(“poor”)
returns “oopr”.
For this first step, put this method in a little program CanonWord.java that inputs one word and outputs its canonical
form:
java CanonWord
Enter
word: computer
cemoprtu
To
implement canonicalForm, program a loop that unloads the
individual characters in the argued string and puts them in an array of char or
ArrayList<Character>. Thus canonicalForm(“poor”)
gets an array of 4 chars ‘p’, ‘o’, ‘o’, ‘r’ or an ArrayList<Character>
with the corresponding Characters. Then sort that array or ArrayList<Character>.
Finally, use the sorted characters to build a new String by a loop of s = s +
c, where s is a String, and c is a char or Character.
The
program CanonWord needs a main method as well as the canonicalForm method.
All it does is use a Scanner to get the word from the user and call canonicalWord to convert it, and print out the resulting
canonical form.
2.
Now write a program FindAnagrams that (in its main method) reads words.txt,
and for each word there, adds it into an ArrayList<String> words and its canonical form into an ArrayList<String> codes. It obtains the canonical form
by calling the same canonicalForm method you
developed for CanonWord. Sort the codes ArrayList. Then hunt
for duplicates in the sorted ArrayList codes. Any
duplicates there will indicate anagrams in the original words. Keep track of
the longest duplicate code you have found so far as you scan codes. Finally
using this longest code, scan the words list for words that have this canonical
form and output them. Don’t worry about
ties at the longest length, just report on first anagram group with the longest
length.
Test
case:
words.txt
codes sorted codes
rat art abt
hears aehrs abtt
share aehrs aehrs
tar art aehrs
bat abt aehrs
batt
abtt
art
shear aehrs art
The
triplet of codes “aehrs” means that the corresponding
words (hears, share, and shear) are anagrams.
Similarly the duplicate of codes “art” indicates that rat and tar are
anagrams. We are interested in the
longest anagram, so the program should output as follows:
java FindAnagrams
hears
share
shear
We
see there are two anagram groups here, one with code aehrs
and the other with code art, so the longest is aehrs,
corresponding to words hears, share, and shear.
For
an example of comparing words next to each other in a sorted list, see line 34
of Vocabulary1.java, pg. 664.
Note
that we are assuming that all the words in words.txt are in the same case,
uppercase or lowercase. The textbook’s words.txt
is all in lowercase.
3.
Write a narrative that describes how you
went about designing and implementing these modifications, how you tested your
program. Did you use the test case during development or did you use the final
words.txt from the start, or what? Did you find it helpful that the canonicalForm computation was separately developed? Report
on what you found to be the longest anagrams in the author’s words.txt from the
Chapter 13 area. Put your discussion in a text file called memo.txt.
Delivery: in your
it115/p4 directory:
CanonWord.java
(with main and canonicalWord methods)
FindAnagrams.java
(with main and same the canonicalWord
method as in CanonWord)
memo.txt