
Downloads

All relevant files have been archived into MaSTerClass.zip. The links to some of the files given below are provided for illustration and do not need to be downloaded separately.

Folder Description Note
buildFiles Contains the build files that are necessary to run MaSTerClass through Ant (which is distributed and installed with Java WSDP). Make sure that you change the classpath information within these files so as to match the locations of the jwsdp, jaxp and xhive folders on the computer used to compile and run MaSTerClass.
docs Contains automatically generated API documentation together with the ReadMe file, which gives instructions on how to set up MaSTerClass. ReadMe.txt
input Contains three folders, which are used to specify and store the input data depending on the mode: the referent, validation and testing folders are used for the normal, validation and testing modes respectively. The mode is specified by the value of the MODE variable (see the MaSTerClass.java file). All input sentences are numbered, starting from 0, and stored within the sentences folder. Each sentence is stored in a separate XML file named according to the pattern (R|S|T)<number>.xml. These sentences are structured according to the retrieved.xsd schema.
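
As an illustration of the naming pattern above, the following minimal Java sketch extracts the sentence number from a file name. It is not part of the MaSTerClass distribution: the class and method names are hypothetical, and it assumes that <number> is a non-negative decimal integer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of MaSTerClass.
public class SentenceFileName {
    // (R|S|T)<number>.xml, with <number> assumed to be one or more decimal digits.
    private static final Pattern NAME = Pattern.compile("(R|S|T)(\\d+)\\.xml");

    /** Returns the sentence number encoded in the file name, or -1 if the name does not match. */
    static int sentenceNumber(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return -1;
        }
        return Integer.parseInt(m.group(2));
    }

    public static void main(String[] args) {
        System.out.println(sentenceNumber("S12.xml"));
        System.out.println(sentenceNumber("notes.txt"));
    }
}
```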

The total number of input sentences is stored in the Total.txt file.

The number of the sentence from which the processing should start is stored in the Start.txt file. This number is usually set to zero to indicate that all sentences should be processed.

Individual sentences can be deprecated (i.e. excluded from processing) by storing their numbers (one number per line) in the Deprecated.txt file.

Finally, the terms to be classified in each input sentence are specified by their positions within the sentence, i.e. by the number of the chunk (the first one being zero) which corresponds to the term in question. These positions are stored in the Positions.txt file (one position per line, whose number matches that of the corresponding sentence).

The MaSTerClass distribution comes with its own testing and validation sets, which can be used or replaced as needed. In either case, the given sets illustrate how the input should be specified. The specification of input sentences is expected to be simplified in future versions of the system. Moreover, once the GUI is implemented, these details will be hidden from the user altogether.
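
The roles of Total.txt, Start.txt, Deprecated.txt and Positions.txt described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the code used by MaSTerClass: the class and method names are hypothetical, sentence numbers are assumed to run from 0 to total - 1, and line n of Positions.txt (counting from 0) is assumed to belong to sentence n.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of MaSTerClass.
public class InputSelection {

    /**
     * Returns the numbers of the sentences to process: every number from
     * start (Start.txt) up to total - 1 (Total.txt), except those listed,
     * one per line, in Deprecated.txt.
     */
    static List<Integer> sentencesToProcess(int total, int start, List<String> deprecatedLines) {
        Set<Integer> deprecated = new HashSet<>();
        for (String line : deprecatedLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {          // tolerate blank lines
                deprecated.add(Integer.parseInt(trimmed));
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int i = start; i < total; i++) {
            if (!deprecated.contains(i)) {
                result.add(i);
            }
        }
        return result;
    }

    /**
     * Returns the chunk position of the term to classify in the given sentence.
     * Assumption: line n of Positions.txt (0-based) corresponds to sentence n.
     */
    static int termPosition(List<String> positionLines, int sentenceNumber) {
        return Integer.parseInt(positionLines.get(sentenceNumber).trim());
    }
}
```

The lists of lines could be obtained with, for example, java.nio.file.Files.readAllLines; keeping file access outside these methods makes the selection logic easy to check on its own.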
logFiles The processing results are logged inside this folder. The output file: run.log

The following information is logged:
  • input sentence
  • term to be classified and its relative position
  • key processing stages of the retrieval process, e.g. retrieving term (verb) classes and their members
  • retrieval summary: total number of retrieved sentences and retrieval time
In addition, for each sufficiently similar sentence the following information is displayed:
  • retrieved sentence
  • similarity value
If the term to be classified in the input sentence is matched to a classified term in the retrieved sentence, then the following information is provided:
  • an optimal edit script (possibly more than one)
  • the corresponding alignment of the two sentences
  • the matched classified term and its classes
After attempting to align all sufficiently similar sentences, the classification summary is given:
  • voting results, i.e. the number of votes each class received
  • suggested classes (based on the voting results)
  • actual classes (in the validation and testing modes only)
  • total sentence processing time
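
To make the classification summary above more concrete, here is a minimal Java sketch of one possible voting scheme. It rests on assumptions that this page does not confirm: each matched classified term casts one vote for each of its classes, and the suggested classes are those with the highest vote count. The class and method names are hypothetical, and the rule actually used by MaSTerClass may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the MaSTerClass implementation.
public class VotingSketch {

    /** Counts one vote per class for every matched classified term. */
    static Map<String, Integer> countVotes(List<List<String>> classesPerMatch) {
        Map<String, Integer> votes = new TreeMap<>();
        for (List<String> classes : classesPerMatch) {
            for (String c : classes) {
                votes.merge(c, 1, Integer::sum);
            }
        }
        return votes;
    }

    /** Assumed rule: suggest every class that received the maximum number of votes. */
    static List<String> suggest(Map<String, Integer> votes) {
        int best = 0;
        for (int v : votes.values()) {
            best = Math.max(best, v);
        }
        List<String> suggested = new ArrayList<>();
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() == best) {
                suggested.add(e.getKey());
            }
        }
        return suggested;
    }
}
```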
rdb Databases containing the ontologies. A relational database (RDB) is used as part of the case-base to store the general knowledge about the domain. The structure of the RDB is described in Chapter 6. We used a portion of the UMLS ontology to store some general knowledge in the biomedical domain. Two Access databases are supplied with the MaSTerClass distribution:
  • ontology.mdb, which should be used in the normal mode
  • evaluationOntology.mdb, which should be used in the validation and testing modes (in conjunction with the given validation and testing sets)
These RDBs can be replaced with your own information. Other ontologies may be used as long as you re-format them so as to fit the given structure (see Chapter 6).
schema XML schemas describing the structure of the corpus, retrieved sentences and the classification results. XML Schema has been used to specify these structures in the following schemas: corpus.xsd, retrieved.xsd and results.xsd. The last schema is used internally by the system. However, the other two schemas are used to annotate the corpus (which is a part of the case-base) and input sentences in XML. In order to use the MaSTerClass system you need to POS-tag and chunk your corpus and annotate the results using the given schemas. The explanation of specific tags is given in Chapter 6.
tmpFiles Temporary files used during the processing. Nothing to worry about here.
tools Auxiliary Java classes used for IO operations. Nothing to worry about here.
xmlresults Java classes automatically generated from results.xsd by Java WSDP. Nothing to worry about here.
xmlretrieved Java classes automatically generated from retrieved.xsd by Java WSDP. Nothing to worry about here.
File Description Note
MaSTerClass.java The main Java class. Compile and run this program in order to use the MaSTerClass system.
comp.bat The compilation script. Windows users can use this script to compile the program. First, make sure that the classpath information is specified correctly in buildFiles\Cbuild.xml as described above. To compile the program from the command line, go to the MaSTerClass directory, type comp and press ENTER. Check logFiles\comp.log to see if the compilation succeeded.
run.bat The run script. Windows users can use this script to run the program. First, make sure that the classpath information is specified correctly in buildFiles\Rbuild.xml as described above. To run the program from the command line, go to the MaSTerClass directory, type run and press ENTER. Check logFiles\run.log to see the results.
COPYRIGHT.TXT The copyright notice. COPYRIGHT.TXT