CSIT115 Class 23

Final Exam: Thurs, May 23, 6:30-9:30, W-1-048

We’ll try to start at 6:15

Last time: equals method for our own objects, so ArrayList’s indexOf works right when its elements are of our object type.

For example, for ArrayList<Point>, it’s good for Point.java to have an equals method like the one shown on pg. 577, and in the “fancy” Point.java in the chapter 8 sources online. In that case we can use indexOf, contains, and lastIndexOf and they will work properly.

Note that the “Basic ArrayList methods” listed on pg. 634 never use equals for the elements, so these will work even if equals is not implemented correctly. It’s only the more advanced ones, listed on pg. 636, that use equals.

For example, list.contains("hello"), where list is an ArrayList<String>, calls element.equals("hello") for each element String in list, and returns true if equals returns true for any of the elements.
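To make this concrete, here is a minimal sketch of a contains-style search. The helper name containsLike is hypothetical and this is not the library's actual source; it only shows the idea of calling equals on one element after another. (The real library may make the call in the other direction, target.equals(element), which gives the same answer for a properly written equals.)

```java
import java.util.ArrayList;

class ContainsSketch {
    // Hypothetical helper, NOT the real ArrayList source:
    // returns true if equals reports a match for any element.
    static boolean containsLike(ArrayList<String> list, String target) {
        for (String element : list) {
            if (element.equals(target)) {
                return true;   // found a match, no need to look further
            }
        }
        return false;          // looked at every element, no match
    }

    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<String>();
        list.add("hi");
        list.add("hello");
        System.out.println(containsLike(list, "hello")); // true
        System.out.println(list.contains("hello"));      // true
        System.out.println(containsLike(list, "bye"));   // false
    }
}
```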

To prove this really affects what happens in a program, consider this test program: PointMain1.java

// A program that deals with 2D points.

// Fifth version, to accompany Point class with toString method.

import java.util.*;

public class PointMain1 {

public static void main(String[] args) {

// create two Point objects

Point p1 = new Point(7, 2);

Point p2 = new Point(4, 3);

// print the points

System.out.println("p1 is " + p1);

System.out.println("p2 is " + p2);

// Put them in an ArrayList (added to old PointMain.java):

ArrayList<Point> pts = new ArrayList<Point>();

pts.add(p1);

pts.add(p2);

// print them in ArrayList:

System.out.println(pts);

// Find distance from origin of first pt in pts:

System.out.println("dist = " + pts.get(0).distanceFromOrigin());

// Find (4,3):

System.out.println("(4,3) is at index " + pts.indexOf(new Point(4,3)));

System.out.println("(4,3) is contained in pts? " + pts.contains(new Point(4,3)));

System.out.println("(4,3) equals p2? " + p2.equals(new Point(4,3)));

}

}

If we run this program in the same directory as a Point.java that implements equals properly, we get this output:

$ java PointMain1

p1 is (7, 2)

p2 is (4, 3)

[(7, 2), (4, 3)]

dist = 7.280109889280518

(4,3) is at index 1 <-- expected answer: (4,3) is at index 1

(4,3) is contained in pts? true <-- contains found (4,3) in the list

(4,3) equals p2? true <-- this shows how the match was determined

However, if we run the same program in the same directory as a Point.java that has no equals method, we get different output:

$ java PointMain1

p1 is (7, 2)

p2 is (4, 3)

[(7, 2), (4, 3)]

dist = 7.280109889280518

(4,3) is at index -1 <-- -1 result means indexOf could not find (4,3) in the list

(4,3) is contained in pts? false <-- contains can't find (4,3) either

(4,3) equals p2? false <-- the reason indexOf and contains fail to find it: equals returns false

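So you see that equals doesn’t do its job of checking that the x’s match and the y’s match unless we code that in the equals method. Without our own equals, Point inherits the default equals from Object, which returns true only when the two references point to the very same object, and new Point(4,3) is a different object from p2.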
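Here is a minimal sketch of an equals method that does this check. It assumes Point stores its coordinates in int fields x and y; the Point class below is a cut-down stand-in for illustration, and the textbook's version on pg. 577 may differ in detail.

```java
import java.util.ArrayList;

class EqualsDemo {
    public static void main(String[] args) {
        ArrayList<Point> pts = new ArrayList<Point>();
        pts.add(new Point(7, 2));
        pts.add(new Point(4, 3));
        System.out.println(pts.indexOf(new Point(4, 3)));  // 1
        System.out.println(pts.contains(new Point(5, 5))); // false
    }
}

// Cut-down stand-in for the course's Point class (assumed int fields x and y).
class Point {
    private int x;
    private int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Two Points are equal when the x's match and the y's match.
    public boolean equals(Object o) {
        if (o instanceof Point) {
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }
        return false;  // not a Point at all, so not equal
    }
}
```

The parameter type must be Object, not Point; otherwise the method does not replace the inherited equals that indexOf and contains call.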

Class Exercise on ArrayList<ITSystem> for p3 warmup--see linked pages.

Using sorts to speed up heavy-duty jobs.

We previously tackled the following problem (class 19):

An IT example.

Suppose we have assembled two files of usernames as follows:

users_weak_pws.dat: usernames of users with weak passwords, obtained by trying all “easy” passwords to log in, a standard test. We are given this file, the result of such testing.

users_priv.dat: usernames of users with system privileges, obtained from system information.

We want to make a list of privileged users with weak passwords, these being the most dangerous to the health of the system. In other words, we want a list of users who show up on both given lists.

How can we do this?

We can use Linux commands “sort” and “join”, as follows:

sort users_weak_pws.dat > users_weak_sorted.dat

sort users_priv.dat > users_priv_sorted.dat

join users_weak_sorted.dat users_priv_sorted.dat

Now of course this is not part of this course, but eventually you will learn such tricks in Linux Systems Admin courses. More to the point, we can do this using Java, ArrayLists, and contains, as follows:

Read users_weak_pws.dat into ArrayList<String> weakOnes, following the code we used for words.

Read users_priv.dat into ArrayList<String> privOnes, following the code we used for words.

Here is our previous strategy (class 19):

For a username found in privOnes, see if it occurs also in weakOnes by weakOnes.contains(username), and if so print it out.

Do this for each username in privOnes.

This is an effective algorithm, but it is not fast. Each call to contains must look at up to n elements, where n is the size of the weakOnes list, and this is done for each of the m names in the privOnes list, for a total of up to n*m steps. If n = m = 1000, then that’s up to a million steps.
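The contains-based strategy can be sketched as follows. To keep the sketch self-contained, the two lists are filled in by hand with made-up usernames instead of being read from the .dat files, and the method name findBoth is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;

class ContainsOverlap {
    // Returns the usernames in privOnes that also occur in weakOnes.
    // Each contains call may scan all of weakOnes, so this is up to n*m steps.
    static ArrayList<String> findBoth(ArrayList<String> weakOnes,
                                      ArrayList<String> privOnes) {
        ArrayList<String> result = new ArrayList<String>();
        for (String username : privOnes) {
            if (weakOnes.contains(username)) {
                result.add(username);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the two files.
        ArrayList<String> weakOnes =
            new ArrayList<String>(Arrays.asList("bob", "alice", "eve"));
        ArrayList<String> privOnes =
            new ArrayList<String>(Arrays.asList("root", "alice", "eve"));
        System.out.println(findBoth(weakOnes, privOnes)); // [alice, eve]
    }
}
```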

If we sort the lists first, then we can more efficiently match up the ones that occur in both lists. The elementary steps involve comparing two Strings and seeing if they are equal, before in String order, or after in String order.

Example: one-character usernames, for simplicity: the left-hand list A..H is weakOnes, the right-hand list A..I is privOnes.

A A

B C

F D

G F

H I

Here are the String compares needed:

1. A—A match, output, step to next in both lists

2. B—C no match, step past smaller one only, B to F in weakOnes

3. F—C no match, step C to D in privOnes

4. F—D no match, step D to F in privOnes

5. F—F, match, output, step to next in both lists

6. G—I no match, step G to H in weakOnes

7. H—I no match, step H to end-of-list of weakOnes, done

This locates and outputs the two users with weak passwords and privileges, namely A and F. (This is what the Linux command “join” can do, working on two sorted files.)

The corresponding code is just a recasting of getOverlap, pg. 668. Here weakOnes can be list1 and privOnes list2 or vice versa. Or you can recast the above example as comparing two vocabularies to find their overlap.
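A minimal sketch of this matching pass, run on the one-character example above, might look like the following. The method name sortedOverlap is hypothetical and this is not the textbook's getOverlap code, only the same idea; it assumes both lists are already sorted and have no repeated names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;

class SortedOverlap {
    // Assumes list1 and list2 are both sorted in String order.
    // Walks down both lists together, doing one compareTo per step.
    static ArrayList<String> sortedOverlap(ArrayList<String> list1,
                                           ArrayList<String> list2) {
        ArrayList<String> result = new ArrayList<String>();
        int i1 = 0;
        int i2 = 0;
        while (i1 < list1.size() && i2 < list2.size()) {
            int cmp = list1.get(i1).compareTo(list2.get(i2));
            if (cmp == 0) {          // match: output, step to next in both lists
                result.add(list1.get(i1));
                i1++;
                i2++;
            } else if (cmp < 0) {    // list1's name is smaller: step past it
                i1++;
            } else {                 // list2's name is smaller: step past it
                i2++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ArrayList<String> weakOnes =
            new ArrayList<String>(Arrays.asList("A", "B", "F", "G", "H"));
        ArrayList<String> privOnes =
            new ArrayList<String>(Arrays.asList("A", "C", "D", "F", "I"));
        Collections.sort(weakOnes);  // already sorted here, but needed in general
        Collections.sort(privOnes);
        System.out.println(sortedOverlap(weakOnes, privOnes)); // [A, F]
    }
}
```

On the example lists the loop runs seven times, matching the seven compares listed above.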

It’s much faster than the contains-based algorithm for large lists. With 1000 names on each list, the matching pass takes at most about 2000 compares, and sorting each list first adds on the order of n log n, roughly 10,000, compares, versus up to a million steps for the previous algorithm. Of course even a million steps will execute quickly. The difference really shows with, say, a million entries on each list: then the first algorithm could take about a trillion steps, while the second takes only tens of millions, sorting included.
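To see the difference in step counts directly, here is a rough sketch that counts the elementary compares in each approach for two 1000-name lists. The usernames are generated (made-up) and built already in sorted order, so the cost of sorting is not counted; the method names are hypothetical.

```java
import java.util.ArrayList;

class StepCount {
    // Counts equals calls in the contains-style approach:
    // for each name in privOnes, scan weakOnes until a match or the end.
    static long countNestedSteps(ArrayList<String> weakOnes,
                                 ArrayList<String> privOnes) {
        long steps = 0;
        for (String username : privOnes) {
            for (String weak : weakOnes) {
                steps++;
                if (weak.equals(username)) {
                    break;
                }
            }
        }
        return steps;
    }

    // Counts compareTo calls in the matching pass over two sorted lists.
    static long countMergeSteps(ArrayList<String> list1,
                                ArrayList<String> list2) {
        long steps = 0;
        int i1 = 0;
        int i2 = 0;
        while (i1 < list1.size() && i2 < list2.size()) {
            steps++;
            int cmp = list1.get(i1).compareTo(list2.get(i2));
            if (cmp <= 0) {
                i1++;   // step list1 on a match or when its name is smaller
            }
            if (cmp >= 0) {
                i2++;   // step list2 on a match or when its name is smaller
            }
        }
        return steps;
    }

    // Builds count made-up names user0000, user<stride>, ... in sorted order.
    static ArrayList<String> makeNames(int count, int stride) {
        ArrayList<String> names = new ArrayList<String>();
        for (int i = 0; i < count; i++) {
            names.add(String.format("user%04d", i * stride));
        }
        return names;
    }

    public static void main(String[] args) {
        ArrayList<String> weakOnes = makeNames(1000, 2);
        ArrayList<String> privOnes = makeNames(1000, 3);
        System.out.println("contains-style steps: "
            + countNestedSteps(weakOnes, privOnes));
        System.out.println("sorted-match steps:   "
            + countMergeSteps(weakOnes, privOnes));
    }
}
```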