Handed out: 11/5/07
Note. Read this entire assignment before beginning to work on it.
We studied the edit distance problem in class. I will repeat the problem definition and the basic facts that we derived.
The problem |
---|
A basic edit operation on a string is either to insert a character anywhere, to delete any character, or to replace one character by another. You are given two strings x and y. You want to know the smallest number of basic edit operations that can be used to change x into y. Define that to be distance(x, y). For example, distance("cat", "cute") = 2, since you can change "cat" into "cute" by performing two basic edit operations. (Replace the a by u, then insert e at the end.) |
Prefixes of strings |
---|
Suppose that x and y are two strings. Define x(i) to be the length i prefix of x, and y(j) to be the length j prefix of y. For example, if x is "kangaroo", then x(5) is "kanga", x(3) is "kan" and x(0) = "" (an empty string). |
Facts |
---|
Suppose that the characters in strings x and y are numbered starting at 1. So the first character of x is x1, the second character is x2, etc. We derived the following facts in class.
|
A straightforward algorithm is to use the above facts in a recursive function definition. Here is a definition of the distance function written in Cinnameg. The idea is that dist(x, y, i, j) produces distance(x(i), y(j)).
Define case dist(x,y,i,j) = j when i == 0 = i when j == 0 = dist(x, y, i-1, j-1) when x#i == y#j = 1 + min[dist(x, y, i-1, j-1), dist(x, y, i-1, j), dist(x, y, i, j-1)] %Define Define distance(x,y) = dist(x, y, length(x), length(y)).
Algorithm 1 has the problem that it recomputes the same thing many times. That makes it very slow. Recomputation can be avoided by writing things down, and using the results that you wrote down earlier rather than recomputing them. You can create a two-dimensional array (a grid) d where di, j will be set to hold distance(x(i), y(j)). If you store values in the d array in the right order, (such as by rows, from row 0 to row n, left-to-right within a row) you will find that the values you need have always been computed earlier, and you just get them out of the array. Just
Define a class, DistanceSolver, that has two private variables (x and y) of type String. Create a constructor that takes two strings as parameters, and installs them as the values of x and y. The idea is that an object of class DistanceSolver knows how to compute the distance between strings x and y.
Note. Class DistanceSolver will not be public. Just start it with
class DistanceSolver
Implement Algorithm 1 in Java, as a method in class DistanceSolver. Call your method distance1. Notice that it will not take any parameters, since it will get strings x and y from the variables. You will need a helper method that takes two integer parameters i and j. Just translate the Cinnameg program to Java, but omit parameters x and y, since they are obtained from the variables in the class. Make the helper method private.
You can use method Math.min(a,b) to get the smaller of a and b. For example,
int r = Math.min(4,2);makes r = 2.
Create another (public) class that holds the main method, and make that program test your distance1 method. For example, a start is as follows.
public static void main(String[] args) { DistanceSolver solver = new DistanceSolver("cat", "cute"); System.out.println("The distance between \"cat\" and \"cute\" is " + solver.distance1()); }(Writing \" in a string constant causes the quote character to be put in the string.) Try at least two examples. Test this. Do not proceed until you believe that distance1 is working correctly.
Implement Algorithm 2 in Java, as a method called distance2 in class DistanceSolver. Do not remove Algorithm 1. Leave your distance1 function in your program. Just add this new one.
Change your main method to call distance2. Test it. Do not proceed until it works.
Comment out your main method that performs tests. Do not remove it. (Just put // in front of each line to make it a comment.)
Write a new main method that gets two strings
x and
y from the user using
You can use Java method Integer.parseInt(s) to convert from a string to an integer. For example,
int r = Integer.parseInt("35");makes r = 35. Alternatively, just read the method as a string and compare strings. Important. Do not test whether two strings are equal using operator ==. Expression u.equals(v) is true if strings u and v are equal. Test your program before continuing.
Try your algorithms on some long strings. Your algorithms will take the most time when the two strings do not contain any characters in common. For example, compute the edit distance between abcdefg and hijklmn. Next try abcdefghijklm and nopqrstuvwxyz. Which algorithm appears to be more efficient? Is the difference significant? What do you think will happen if you use two strings of length 30?
Submit a printed version of the program. On the printed version, write (by hand) about how long algorithm 1 and algorithm 2 took to find the distance between abcdefghijklm and nopqrstuvwxyz. (Just time it by hand. If the time is too fast to notice, say "too short to tell".
A string has type String. Characters are numbered from 0 in Java. You can get the i-th character of a string x (numbering from 0) using Java expression x.charAt(i). So Cinnameg expression x#i is equivalent to Java expression x.charAt(i-1) (since Cinnameg numbers from 1).
You can get the length of string x using x.length( ). For example, "abc".length( ) = 3.
For algorithm 2, you need to use a two-dimensional array. A two dimensional array of integers has type int[][]. So
int[][] d;creates a variable d that can refer to a two-dimensional array of integers. But it does not actually create the array. To build an actual array, and make d refer to it you can write
d = new int[m][n];where m is the desired number of rows and n is the desired number of columns. For example,
d = new int[2][3];creates an array that looks as follows, where Each slot can hold one integer.
To use the value at row i and column j in array d, write d[i][j]. But remember that numbering is from 0, so the first row of the array is row 0, and the first column is column 0.
A two-dimensional array is really just an array of arrays. It is an array of its rows, and each row is an array of columns. If d is a two-dimensional array, then d.length is the number of rows, and d[0].length is the number of columns in row 0. Typically, all of the rows have the same number of columns, but it is possible to create a two-dimensional array where different rows have different sizes. (Can you see how to do it?)
The Java library has been built up over time, and has different sections that were contributed by different groups, or at different times. One part is the Swing library, which helps you do graphics, and is popular and easy to use. To use Swing, you should put
import javax.swing.*;in your program. Java statement
String xstr = JOptionPane.showInputDialog("What is the first string?");causes a message box to pop up that contains the text "What is the first string?", and also has a box where the user can type a string. It will set xstr to the string that the user types.
You can also just show a message without asking for information. If msg is a string that you want to show, then statement
JOptionPane.showMessageDialog(null, msg);will pop up a box showing message msg, with an ok button to allow the user to say when he or she is done reading the message.
Important. If you use graphics, then you need to shut down the graphics support when the program is done. At the end of main, add statement
System.exit(0);to shut everything down. If this statement is performed anywhere in a program, the entire program is stopped. You cannot do anything after this statement.