How?
The MaSTerClass system uses the methodology of case-based reasoning (CBR), which is based on remembering specific experiences that may be useful for the problem (case) being solved. CBR can be viewed as a cycle of four processes, the "four REs" (Aamodt, 1995):
- retrieve the most similar case,
- reuse the case to solve the new problem,
- revise the suggested solution, and
- retain the useful information obtained during problem solving.
In this way, new problems are solved by adapting solutions that gave satisfactory results for similar problems, which avoids the need for an explicit model of the problem domain (Watson and Marir, 1994). Instead, only the features relevant to the current problem need to be identified. CBR therefore makes use of specific (as opposed to generalised) knowledge in both problem solving and learning (Kolodner, 1993). More information on CBR is provided in Chapters 3 and 4.
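The cycle can be illustrated with a minimal, domain-agnostic Python sketch. It is not the MaSTerClass implementation: the names `Case`, `CaseBase`, `reuse` and `revise` are hypothetical, and the similarity function is assumed to be supplied by the caller (a higher value means more similar).

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Case:
    problem: Any   # description of the problem
    solution: Any  # solution that worked for it


class CaseBase:
    """A toy case memory driving the retrieve-reuse-revise-retain cycle."""

    def __init__(self, similarity: Callable[[Any, Any], float]):
        self.cases: List[Case] = []
        self.similarity = similarity  # assumption: higher value = more similar

    def retrieve(self, problem: Any) -> Optional[Case]:
        # Retrieve: the stored case most similar to the new problem.
        if not self.cases:
            return None
        return max(self.cases, key=lambda c: self.similarity(problem, c.problem))

    def solve(self, problem: Any,
              reuse: Callable[[Case, Any], Any],
              revise: Callable[[Any, Any], Optional[Any]]) -> Optional[Any]:
        best = self.retrieve(problem)
        if best is None:
            return None
        proposed = reuse(best, problem)        # Reuse: adapt the old solution
        confirmed = revise(problem, proposed)  # Revise: check/repair; None = rejected
        if confirmed is not None:
            self.cases.append(Case(problem, confirmed))  # Retain: learn the new case
        return confirmed


# Usage: numeric "problems", similarity = negative absolute difference.
cb = CaseBase(similarity=lambda a, b: -abs(a - b))
cb.cases = [Case(1.0, "low"), Case(10.0, "high")]
answer = cb.solve(2.0, reuse=lambda case, p: case.solution, revise=lambda p, s: s)
```

Note that learning happens only in the retain step: a solution rejected during revision is not added to the case base.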
Each case in the MaSTerClass system consists of a term occurring in a specific context (the description of the problem) and one or more classes that apply to that term occurrence (the solution); see Chapter 6 for more information. For an unclassified term, the solution part is missing. The context of an unclassified term is compared to other contexts in order to find sufficiently similar ones and reuse the classification information attached to the corresponding classified terms.
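One possible in-memory representation of such a case is sketched below, assuming that a context is a tuple of tokens and that the term is identified by a token span inside it. The class `TermCase`, its fields and the class label `protein` are illustrative assumptions, not the data structures described in Chapter 6.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class TermCase:
    # Problem description: the tokens of the context the term occurs in.
    context: Tuple[str, ...]
    # Hypothetical encoding of the term: [start, end) token indices in the context.
    term_span: Tuple[int, int]
    # Solution: the classes of this term occurrence (empty if unclassified).
    classes: FrozenSet[str] = frozenset()

    @property
    def term(self) -> str:
        start, end = self.term_span
        return " ".join(self.context[start:end])

    @property
    def is_classified(self) -> bool:
        return len(self.classes) > 0


# A classified case (solution present) and an unclassified one (solution missing).
known = TermCase(("expression", "of", "NF-kappa", "B", "in", "T", "cells"), (2, 4),
                 frozenset({"protein"}))
unclassified = TermCase(("binding", "of", "NF-kappa", "B", "to", "DNA"), (2, 4))
print(known.term, known.is_classified, unclassified.is_classified)
```

Using a set of classes rather than a single label keeps the "one or more classes" property explicit.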
The system operates in four stages (see the figure below):
- retrieval (see Chapter 7),
- similarity assessment (see Chapter 7),
- matching (see Chapter 8) and
- voting (see Chapter 8).
First, in the retrieval phase, potentially similar cases are retrieved through rough semantic matching, which uses terminological information and is ontology-driven. In the similarity assessment phase, the new case is then compared to each retrieved case using the SOLD (syntactic, ontology-driven, lexical distance) measure, which makes use of linguistic and domain-specific knowledge. The SOLD measure is based on the general concept of edit distance. The most similar cases are selected for further processing.
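The general idea of comparing contexts with an edit distance can be sketched as follows. This is a plain token-level edit distance with a pluggable substitution cost, not the SOLD measure itself: SOLD's syntactic, ontology-driven and lexical costs are assumed to be folded into the hypothetical `sub_cost` argument, and the hypothetical `most_similar` simply keeps the `k` closest contexts.

```python
from typing import Callable, List, Sequence


def default_cost(x: str, y: str) -> float:
    # Plain Levenshtein substitution: free if equal, otherwise 1.
    return 0.0 if x == y else 1.0


def edit_distance(a: Sequence[str], b: Sequence[str],
                  sub_cost: Callable[[str, str], float] = default_cost,
                  indel_cost: float = 1.0) -> float:
    """Dynamic-programming edit distance between two token sequences."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * indel_cost] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = min(prev[j] + indel_cost,                         # delete a[i-1]
                          curr[j - 1] + indel_cost,                     # insert b[j-1]
                          prev[j - 1] + sub_cost(a[i - 1], b[j - 1]))  # substitute/match
        prev = curr
    return prev[len(b)]


def most_similar(new_context: Sequence[str], retrieved: List[Sequence[str]],
                 k: int = 3, **kwargs) -> List[int]:
    """Indices of the k retrieved contexts closest to the new one."""
    scored = sorted((edit_distance(new_context, ctx, **kwargs), idx)
                    for idx, ctx in enumerate(retrieved))
    return [idx for _, idx in scored[:k]]


# Hypothetical "ontology-aware" cost: sibling terms are cheaper to substitute.
SIBLINGS = {frozenset({"IL-2", "IL-4"})}

def sibling_cost(x: str, y: str) -> float:
    if x == y:
        return 0.0
    return 0.5 if frozenset({x, y}) in SIBLINGS else 1.0
```

With `sibling_cost`, the contexts `["IL-2", "expression"]` and `["IL-4", "expression"]` are at distance 0.5 rather than 1, which is the kind of effect domain knowledge is meant to have on the distance.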
In the matching phase, the new case is aligned with each selected case in order to match the unclassified term in the new case to a classified term in the similar case. Selected cases for which this is not possible are discarded. In each remaining case, the unclassified term is linked to a classified term occurring in a similar context. These cases are then used collectively to propose the class(es) for the unclassified term through a voting procedure: each matching case contributes to the final classification by casting its votes for the classes attached to the matched classified term.
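The voting step can be sketched as below, under assumptions that the text does not fix: each matching case carries a weight (for example, derived from its similarity score), that weight is split equally among the classes attached to the matched classified term, and the class(es) with the highest total win. A case whose unclassified term could not be matched is represented by `None` and ignored. The function `vote` and the class labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Match = Tuple[Optional[FrozenSet[str]], float]  # (classes of matched term or None, weight)


def vote(matches: Iterable[Match]) -> Tuple[List[str], Dict[str, float]]:
    """Return the winning class(es) and the full vote tally."""
    tally: Dict[str, float] = defaultdict(float)
    for classes, weight in matches:
        if not classes:  # no classified term matched: the case is discarded
            continue
        share = weight / len(classes)  # assumption: weight split equally
        for cls in classes:
            tally[cls] += share
    if not tally:
        return [], {}
    best = max(tally.values())
    # Keep every class tied for the top score (a term may have several classes).
    winners = sorted(c for c, v in tally.items() if abs(v - best) < 1e-9)
    return winners, dict(tally)


winners, tally = vote([
    (frozenset({"protein"}), 1.0),
    (frozenset({"protein", "DNA"}), 1.0),
    (None, 1.0),                       # unmatched case, ignored
    (frozenset({"RNA"}), 1.0),
])
```

Returning all tied classes, rather than picking one arbitrarily, reflects that a single term occurrence may be assigned more than one class.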
A worked example is given in Chapter 9. A review of the MaSTerClass system is available in the journal Bioinformatics (Oxford University Press).

