Exercise 2

Objective

To build a Token recognizer for Java programs.

Overview

In Exercise 1 you developed an application that displayed the lexemes in a Java program, using java.io.StreamTokenizer to separate the lexemes. Unfortunately, StreamTokenizer has no good way to handle multiple-character operators such as += or ++. So I am providing you with another class, JavaLexemes, which you are to use to implement a new class that parses Java compilation units into Tokens. (JavaLexemes is described below.)
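To see the StreamTokenizer limitation concretely, here is a minimal sketch that runs a default-configured java.io.StreamTokenizer over a short string. The class name StreamTokenizerDemo and the helper lexemes() are made up for this illustration and are not part of the exercise code.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Illustration only: the class and method names here are hypothetical.
class StreamTokenizerDemo {

    // Collects the text of every token a default StreamTokenizer reports.
    static List<String> lexemes(String source) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    out.add(st.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:
                    out.add(String.valueOf(st.nval));
                    break;
                default:
                    // Ordinary characters arrive one at a time in ttype.
                    out.add(String.valueOf((char) st.ttype));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // The single operator += comes back as two separate tokens.
        System.out.println(lexemes("a += b")); // prints [a, +, =, b]
    }
}
```

Because each ordinary character is reported on its own, the caller would have to glue + and = back together, which is the work JavaLexemes does for you.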

You are to write two public classes. The class that contains main() will be called TestTokenizer. It will use the second class you are to write, called JavaTokenizer, to print the sequence of tokens that make up a Java program.

JavaLexemes

Class JavaLexemes has one constructor, which takes a BufferedReader as an argument. Once the constructor has been called, use the method nextLexeme() to extract lexemes from the input stream. nextLexeme() returns an int equal to one of the following public constants of class JavaLexemes:

Value	Meaning
LT_LEXEME	The public String `theLexeme` contains the next lexeme in the input stream.
LT_EOF	The input stream has been completely processed.
LT_EOL	The end of an input line has been reached.
LT_SLASHSLASH	A comment starting with // was found. The comment itself is in `theLexeme`.
LT_SLASHASTER	A comment starting with /* was found. The comment itself (including any embedded newline characters) is in `theLexeme`.
LT_SLASHDOC	A comment starting with /** was found. The comment itself (including any embedded newline characters) is in `theLexeme`.
LT_LITERALSTR	The public String `theLexeme` contains a literal String value.
LT_LITERALCHR	The public String `theLexeme` contains a literal character value.
LT_UNTERMCOMM	A comment was not terminated properly.
LT_UNTERMSTR	A String literal was not terminated properly.
LT_UNTERMCHR	A character literal was not terminated properly.
LT_INVALIDCHR	The file contains a character that is not part of the Java language outside of a String literal or comment.
LT_ERROR	There was an internal error in nextLexeme(). (Indicates a bug in my code!)
LT_NOTHING	Defined, but not used.

Class JavaLexemes also provides a public int named lineNum, which tells the current line number in the source file.
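The calling pattern described above can be sketched as a simple loop. So that the sketch compiles without JavaLexemes.zip, it uses a hypothetical stand-in class, StandInLexemes, that only splits lines on whitespace. The stand-in copies the member names given above (nextLexeme(), theLexeme, lineNum, LT_LEXEME, LT_EOF, LT_EOL), but its constant values, its exact lineNum behavior, and the throws IOException clause are assumptions, not facts about the real class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL stand-in for JavaLexemes: whitespace splitting only.
// The constant values below are invented for this sketch.
class StandInLexemes {
    static final int LT_LEXEME = 0;
    static final int LT_EOF = 1;
    static final int LT_EOL = 2;

    public String theLexeme;
    public int lineNum = 0;

    private final BufferedReader in;
    private String[] words = null; // lexemes of the current line, or null
    private int pos = 0;

    StandInLexemes(BufferedReader in) {
        this.in = in;
    }

    int nextLexeme() throws IOException {
        while (true) {
            if (words != null && pos < words.length) {
                theLexeme = words[pos++];
                return LT_LEXEME;
            }
            if (words != null) { // current line is used up
                words = null;
                return LT_EOL;
            }
            String line = in.readLine();
            if (line == null) {
                return LT_EOF;
            }
            lineNum++;
            String trimmed = line.trim();
            words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            pos = 0;
        }
    }
}

class LexemeLoopDemo {
    // Calls nextLexeme() until LT_EOF, recording each generic lexeme.
    static List<String> dump(String source) throws IOException {
        StandInLexemes lex =
            new StandInLexemes(new BufferedReader(new StringReader(source)));
        List<String> out = new ArrayList<>();
        int type;
        while ((type = lex.nextLexeme()) != StandInLexemes.LT_EOF) {
            if (type == StandInLexemes.LT_LEXEME) {
                out.add(lex.lineNum + ": <lex> " + lex.theLexeme);
            }
            // LT_EOL is ignored here; real code would also handle the
            // comment, literal, and error values from the table above.
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String s : dump("int x ;\nx ++ ;")) {
            System.out.println(s);
        }
    }
}
```

With the real class, the same loop shape should apply once StandInLexemes is replaced by JavaLexemes; check the supplied documentation for the actual exceptions and line-number behavior.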

You can download a zip file containing JavaLexemes.class and a few classes that it uses internally by clicking on the link below:

[ JavaLexemes.zip ]

You can also download the source code for the Java program that I used during the development of JavaLexemes from this link:

[ TestJavaLexemes.java ]

The first thing you should do is to make sure you can run the code I am supplying. Set up a project directory, extract the class files from JavaLexemes.zip into it, and copy TestJavaLexemes.java into it. Compile TestJavaLexemes.java, and make sure the program works. If you give the program the names of .java files on the command line, it will print each lexeme in each source file, with line numbers. It can tell whether a lexeme is a /* comment, a /** comment, a // comment, a string literal, a character literal, or a "generic" lexeme, and it prints an indicator of the type of lexeme just before each one ( </*>, </**>, <//>, <">, <'>, or <lex> ).

You don't actually have to extract the class files from the zip file. Instead, you could give this command:
      C:\> set CLASSPATH=.;JavaLexemes.zip
  
After this, java and javac will be able to find class files that are located in either the current directory ( . ) or inside JavaLexemes.zip.

Once you have verified that you can use my class, JavaLexemes, successfully, you can start working on the code for this exercise. To get you started, I am supplying three source files:

[ TestTokenizer.java ]
This is a "quick and dirty" program that creates a JavaTokenizer, and prints each string returned by JavaTokenizer.next().
[ JavaTokenizer.java ]
This is the outline for the class that converts lexemes into tokens.
[ JavaToken.java ]
This is the class that defines the data type for the value returned by JavaTokenizer.next().

If you compile and run the code I am supplying, it should print all the lexemes in the source file you name on the command line. Your job is to get it to print each token in the source file, including the type of token and its value (if it has one).
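One possible starting point for turning a lexeme into a token type is a lookup-based classifier like the sketch below. Everything in it is an assumption made for illustration: the class name TokenClassifierSketch, the Kind enum, and the deliberately partial keyword, operator, and separator sets. It does not reflect the actual design of JavaToken or JavaTokenizer, and it ignores cases such as floating-point literals and the literals true, false, and null.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustration only: names and token categories here are hypothetical.
class TokenClassifierSketch {

    enum Kind { KEYWORD, IDENTIFIER, INT_LITERAL, OPERATOR, SEPARATOR, UNKNOWN }

    // Partial lists, just enough to demonstrate the idea.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "class", "public", "static", "void", "int", "if", "else", "while", "return"));
    private static final Set<String> OPERATORS = new HashSet<>(Arrays.asList(
        "=", "==", "+", "+=", "++", "-", "-=", "--", "*", "/",
        "<", "<=", ">", ">=", "!", "!=", "&&", "||"));
    private static final Set<String> SEPARATORS = new HashSet<>(Arrays.asList(
        "(", ")", "{", "}", "[", "]", ";", ",", "."));

    // Decides which kind of token a single lexeme represents.
    static Kind classify(String lexeme) {
        if (lexeme == null || lexeme.isEmpty()) {
            return Kind.UNKNOWN;
        }
        if (KEYWORDS.contains(lexeme)) {
            return Kind.KEYWORD;
        }
        if (OPERATORS.contains(lexeme)) {
            return Kind.OPERATOR;
        }
        if (SEPARATORS.contains(lexeme)) {
            return Kind.SEPARATOR;
        }
        if (lexeme.chars().allMatch(Character::isDigit)) {
            return Kind.INT_LITERAL;
        }
        if (Character.isJavaIdentifierStart(lexeme.charAt(0))
                && lexeme.chars().skip(1).allMatch(Character::isJavaIdentifierPart)) {
            return Kind.IDENTIFIER;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        for (String lexeme : new String[] {"while", "count2", "+=", "42", ";"}) {
            System.out.println(classify(lexeme) + " " + lexeme);
        }
    }
}
```

Keywords are checked before identifiers because every keyword also satisfies the identifier rules; the order of the tests matters.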

Be sure to document your code in proper javadoc style. I've started to do this for you, but have not done a complete job. You can see the documentation for JavaLexemes, JavaTokenizer, and JavaToken by clicking on [ this link ].

Extra Credit

I am offering extra credit for work that you do to implement the equivalent of the JavaLexemes class that I supplied to you. Your code must be carefully tested and documented, but it does not have to be complete to receive partial credit. Two people may work together on this project, but no more than that, and the extra credit will be split between the two people submitting the code. Submit your code by emailing your JavaLexemes.java file, inside a zip file, to me no later than midnight, May 2 (our first class day after Spring Break). Tell me how far you got in your email message so I'll know what to look for (and what not to try!). A maximum of 10 points of extra credit is available for this work. If two people work together, you can tell me how to split the points if they are not to be split equally between you.