
Introduction

In the literature on terminology, a term is intuitively defined as a phrase (typically nominal) that

  • frequently occurs in texts restricted to a specific domain, and
  • has a special meaning in the given domain.
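
The two criteria above can be loosely illustrated in code. The following is only a minimal sketch under strong assumptions: it works on single words rather than nominal phrases, and it approximates "special meaning" by how much more often a word occurs in a domain corpus than in a general one. The function name candidate_terms and the parameters min_count and min_ratio are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def candidate_terms(domain_tokens, general_tokens, min_count=2, min_ratio=2.0):
    """Return words that are frequent in the domain corpus and comparatively
    rare in a general corpus (a crude proxy for 'special meaning')."""
    domain = Counter(domain_tokens)
    general = Counter(general_tokens)
    n_domain = len(domain_tokens)
    n_general = len(general_tokens)

    scored = {}
    for word, count in domain.items():
        # Criterion 1: the word must occur frequently enough in the domain.
        if count < min_count:
            continue
        domain_rate = count / n_domain
        # Add-one smoothing so that words unseen in the general corpus
        # do not cause a division by zero.
        general_rate = (general[word] + 1) / (n_general + 1)
        # Criterion 2 (approximation): the word is much more typical
        # of the domain than of general language.
        ratio = domain_rate / general_rate
        if ratio >= min_ratio:
            scored[word] = ratio

    # Highest-scoring candidates first.
    return sorted(scored, key=scored.get, reverse=True)


domain_text = "the receptor binds the ligand and the receptor activates the kinase"
general_text = "the cat sat on the mat and the dog sat on the rug"
print(candidate_terms(domain_text.split(), general_text.split()))
```

In this toy example "the" is frequent in both corpora and is filtered out, while "receptor" is frequent only in the domain text and is kept as a candidate.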

More precisely, terms are linguistic representations of domain-specific concepts. Terms carry a "heavier" information load than other words and phrases used in a sublanguage, and as such they can be used to:

  • provide support for natural language understanding,
  • correctly index domain-specific documents,
  • identify text phrases to be used for automatic summarisation of domain-specific documents,
  • efficiently skim through documents obtained through information retrieval,
  • identify slot fillers for information extraction tasks, etc.

It is therefore essential to build and maintain terminologies, as repositories of information about terms, in order to enhance the performance of many natural language applications.

All terms belonging to a specific domain collectively form its terminology, which is often organised into a classification hierarchy. The core of such a hierarchy is based on the general-specific relation, while other relations are used to complete the representation of the domain. Concepts naturally fall into groups, either classes (where all concepts share a common description) or clusters (groups of highly correlated concepts), and well-founded terminologies need to reflect this property consistently through terms and their relations. Moreover, terminologies should be extensible, so that new terms, representing newly discovered or identified concepts, can be efficiently incorporated into the existing structures by associating them with other terms. These associations should at least include links between correlated terms, thus forming clusters of semantically related terms, and the generalisation of terms sharing the same set of features into appropriate classes. Given a corpus of relevant textual documents, techniques for automatic term recognition, clustering and classification may help to automate the process of creating and maintaining a specific terminology.
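
The structure described above, a general-specific hierarchy plus links between correlated terms, can be sketched as a small data structure. This is a minimal illustration and not a description of any particular implementation: the class Terminology and its methods add_term, ancestors and cluster are hypothetical names, and the example terms are invented for demonstration.

```python
class Terminology:
    """Toy terminology: a general-specific hierarchy plus 'related' links."""

    def __init__(self):
        self.parent = {}   # term -> more general term (None for top-level terms)
        self.related = {}  # term -> set of correlated terms

    def add_term(self, term, broader=None, related=()):
        """Incorporate a term by associating it with existing terms."""
        if broader is not None:
            self.parent.setdefault(broader, None)  # register the broader term
            self.parent[term] = broader
        else:
            self.parent.setdefault(term, None)
        for other in related:
            # Correlation links are symmetric.
            self.related.setdefault(term, set()).add(other)
            self.related.setdefault(other, set()).add(term)

    def ancestors(self, term):
        """Follow the general-specific relation upwards (assumes no cycles)."""
        chain = []
        current = self.parent.get(term)
        while current is not None:
            chain.append(current)
            current = self.parent.get(current)
        return chain

    def cluster(self, term):
        """All terms reachable from `term` through correlation links."""
        seen = {term}
        stack = [term]
        while stack:
            node = stack.pop()
            for other in self.related.get(node, ()):
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        return seen


terms = Terminology()
terms.add_term("receptor")
terms.add_term("nuclear receptor", broader="receptor")
terms.add_term("retinoic acid receptor", broader="nuclear receptor",
               related=["retinoid X receptor"])
print(terms.ancestors("retinoic acid receptor"))
print(terms.cluster("retinoid X receptor"))
```

Adding a newly identified term only requires naming its broader term and any correlated terms, which is the kind of extensibility the paragraph above asks for.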

More information on this subject can be found in Chapter 1 and Chapter 2.