This document describes requirements for the Extensible MultiModal Annotation language (EMMA) specification under development in the W3C Multimodal Interaction Activity. EMMA is intended as a data format for the interface between input processors and interaction management systems. It will define the means for recognizers to annotate application-specific data with information such as confidence scores, time stamps, input mode (e.g. key strokes, speech or pen), alternative recognition hypotheses, and partial recognition results. EMMA is a target data format for the Semantic Interpretation specification being developed in the Voice Browser Activity, which describes annotations to speech grammars for extracting application-specific data as a result of speech recognition. EMMA supersedes earlier work on the Natural Language Semantics Markup Language in the Voice Browser Activity.
Status of this Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
W3C's Multimodal Interaction Activity is developing specifications for extending the Web to support multiple modes of interaction. This document provides the basis for guiding and evaluating subsequent work on a specification for a data format (EMMA) that acts as an exchange mechanism between input processors and interaction management components in a multimodal application. These components are introduced in the W3C Multimodal Interaction Framework.
This document is a NOTE made available by the W3C for archival purposes, and is not expected to undergo frequent changes. Publication of this Note by W3C indicates no endorsement of its content by W3C, the W3C Team, or any W3C Members. A list of current W3C technical reports and publications, including Recommendations, Working Drafts, and Notes can be found at http://www.w3.org/TR/.
This document has been produced as part of the W3C Multimodal Interaction Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Multimodal Interaction Working Group (W3C Members only). This is a Royalty Free Working Group, as described in W3C's Current Patent Practice NOTE. Working Group participants are required to provide patent disclosures.
Please send comments about this document to the public mailing list: [email protected] (public archives). To subscribe, send an email to <[email protected]> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe).
Table of Contents
- Introduction
- 1. Scope of EMMA
- 2. Data model requirements
- 3. Annotation requirements
- 4. Integration with other work
Introduction
Extensible MultiModal Annotation language (EMMA) is the markup language used to represent human input to a multimodal application. As such, it may be seen in terms of the W3C Multimodal Interaction Framework as the exchange mechanism between user input devices and the interaction management capabilities of an application.
General Principles
An EMMA document can be considered to hold three types of data:
- instance data
The slots and values corresponding to input information which is meaningful to the consumer of an EMMA document. Instances are application-specific and built by input processors at runtime. Given that utterances may be ambiguous with respect to input values, an EMMA document may hold more than one instance.
- data model
The constraints on the structure and content of an instance. The data model is typically pre-established by an application, and may be implicit, that is, unspecified.
- metadata
Annotations associated with the data contained in the instance. Annotation values are added by input processors at runtime.
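As a purely illustrative sketch, a single EMMA document might combine these three types of data as follows. The EMMA vocabulary is not yet defined: every element and attribute name below is an assumption, not part of any specification, and namespace handling is omitted for brevity.

```xml
<!-- Hypothetical EMMA document; all names are illustrative, not normative. -->
<emma>
  <!-- Metadata: annotations added by the input processor at runtime. -->
  <interpretation confidence="0.85"
                  mode="speech"
                  start="2003-02-14T09:00:01Z"
                  dataModelRef="http://example.org/models/flight.xsd">
    <!-- Instance data: application-specific slots and values. -->
    <flight>
      <origin>Boston</origin>
      <destination>Denver</destination>
    </flight>
  </interpretation>
  <!-- The data model is referenced above rather than stated inline. -->
</emma>
```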
Given the assumptions above about the nature of data represented in an EMMA document, the following general principles apply to the design of EMMA:
- The main prescriptive content of the EMMA specification will consist of metadata: EMMA will provide a means to express the metadata annotations which require standardization. (Notice, however, that such annotations may express the relationship among all the types of data within an EMMA document.)
- The instance and its data model are assumed to be specified in XML, but EMMA will remain agnostic to the XML format used to express them. (The instance XML is assumed to be sufficiently structured to enable the association of annotative data.)
The following sections apply these principles in terms of the scope of EMMA, the requirements on the contents and syntax of data model and annotations, and EMMA integration with other work.
1. Scope of EMMA
- EMMA must be able to represent the following kinds of input:
- 1.1 input in any human language
- 1.2 input from the modalities and devices specified in the next section
- input reflecting the results of the following processes:
- 1.3 token interpretation from signal (e.g. speech+SRGS)
- 1.4 semantic interpretation from token/signal (e.g. text+NL parsing/speech+SRGS+SI)
- input gained in any of the following ways:
- 1.5 single modality input
- 1.6 sequential modality input, that is: single-modality inputs presented in sequence
- 1.7 simultaneous modality input (as defined in the main MMI requirements document)
- 1.8 composite modality input (as defined in the main MMI requirements document)
- EMMA must be able to represent input from the following modalities, devices and architectures:
- human language input modalities
- 1.9 text
- 1.10 speech
- 1.11 handwriting
- 1.12 other modalities identified by the MMI Requirements document as required
- 1.13 combinations of the above modalities
- devices
- 1.14 telephones (i.e. no device processing, proxy agent)
- 1.15 thin clients (i.e. limited device processing)
- 1.16 rich clients (i.e. powerful device processing)
- 1.17 the full range of devices between these extremes
- known and foreseeable network configurations
- 1.18 architectures
- 1.19 protocols
- 1.20 extensibility to further devices and modalities
- Representation of output and other uses
EMMA is considered primarily as a representation of user input, and it is in this context that the rest of this document defines the requirements on EMMA. Given that the focus of EMMA is on meta-information, there is at this stage insufficient need to define standard annotations for system output or for general message content between system components. However, the following requirement is included to ensure that EMMA may still be used in these cases where necessary.
- 1.21 The following uses of EMMA must not be precluded:
- a representation from which system output markup may be generated;
- a language for general purpose communication among system components.
- Ease of use and portability
- 1.22 EMMA content must be accessible via standard means (e.g. XPath; see the sketch following this list).
- 1.23 Queries on EMMA content must be easy to author.
- 1.24 The EMMA specification must enable portability of EMMA documents across applications.
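As an illustration of 1.22 and 1.23, against the hypothetical document sketched in the Introduction a consumer could extract the destination slot with ordinary XPath, here embedded in a minimal XSLT stylesheet. All EMMA element and attribute names remain assumptions.

```xml
<!-- Minimal XSLT sketch: select the destination of any interpretation
     whose confidence exceeds 0.5. Names follow the hypothetical example above. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:value-of select="/emma/interpretation[@confidence > 0.5]/flight/destination"/>
  </xsl:template>
</xsl:stylesheet>
```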
2. Data model requirements
- Data model content
The following requirements apply to the use of data models in EMMA documents:
- 2.1 use of a data model and constraints must be possible, for the purposes of validation and interoperability
- 2.2 use of a data model will not be required
- in other words, it must be possible to rely on an implicit data model.
- 2.3 it must be possible in a single EMMA document to associate different data models with different instances
It is assumed that the combination and decomposition of data models will be supported by data model description formats (e.g. XML Schema), and that the comparison of data models is enabled by standard XML comparison mechanisms (e.g. use of XSLT, XPath). Therefore this functionality is not considered a requirement on EMMA data modelling.
- Data model description formats
The following requirements apply to the description format of data models used in EMMA documents:
- 2.4 existing standard formats must be able to be used, for example:
- arbitrary XML
- XML Schema
- XForms
- 2.5 no single description format is required
A data model is used in EMMA to validate an EMMA instance against its constraints. Since Web applications today use different formats to specify data models (e.g. XML Schema, XForms, Relax NG), the principle that EMMA does not require a single format enables EMMA to be used in a variety of application contexts. The concern that this may lead to problems of interoperability has been discussed, and will be reviewed during production of the specification.
- 2.6 data model declarations must be able to be specified inline or referenced (see the sketch below)
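As a hypothetical sketch of requirements 2.3 and 2.6 together, a single EMMA document could carry two instances governed by different data models, one referenced by URI and one declared inline. All EMMA element and attribute names here are assumptions.

```xml
<emma>
  <!-- Instance 1: data model referenced by URI (hypothetical attribute). -->
  <interpretation dataModelRef="http://example.org/models/flight.xsd">
    <flight>
      <destination>Denver</destination>
    </flight>
  </interpretation>
  <!-- Instance 2: data model declared inline, here as an XML Schema fragment. -->
  <interpretation>
    <dataModel>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="drink" type="xs:string"/>
      </xs:schema>
    </dataModel>
    <drink>coffee</drink>
  </interpretation>
</emma>
```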
3. Annotation requirements
- Annotation content
EMMA must enable the specification of the following features. For each annotation feature, "local" annotation is assumed: that is, the association of the annotation may be at any level within the instance structure, and not only at the highest level. (Several of these annotations are illustrated in a sketch at the end of this section.)
- General meta data
- 3.1 lack of input
- 3.2 uninterpretable input
- 3.3 identification of input source
- 3.4 time stamps
- 3.5 relative positioning of input events
(NB: This requirement is covered explicitly by time stamps, but reflects use of EMMA in environments in which time stamping may not be possible.)
- 3.6 temporal grouping of input events
- 3.7 human language of input
- 3.8 identification of input modality
- Annotational structure
- 3.9 association to corresponding instance element annotated
- 3.10 reference to data model definition
- 3.11 composite multimodal input: representation of input from multiple modalities.
- Recognition (signal --> tokens processing)
- 3.12 reference to signal
- 3.13 reference to processing used (e.g. SRGS grammar)
- 3.14 tokens of utterance
- 3.15 ambiguity
This enables a tree-based representation of local ambiguity. That is, alternatives are expressible for given nodes in the structure.
- 3.16 confidence scores of recognition
- Interpretation (tokens --> semantic processing)
- 3.17 tokens of utterance
- 3.18 reference to processing used (e.g. SRGS)
- 3.19 ambiguity
- 3.20 confidence scores of interpretation
- Recognition and Interpretation (signal --> semantic processing)
- 3.21 union of Recognition/Interpretation features (e.g. SRGS + SI)
- Modality-dependent annotations
- 3.22 EMMA must be extensible to annotations which are specific to particular modalities (e.g. ink annotations for pen input)
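The following hypothetical sketch illustrates several of these annotations at once: time stamps (3.4), input modality (3.8), tokens of the utterance (3.14), tree-based ambiguity (3.15) and confidence scores (3.16). As before, every name is an assumption rather than a defined EMMA vocabulary.

```xml
<emma>
  <!-- Two competing recognition hypotheses for one spoken input. -->
  <oneOf mode="speech"
         start="2003-02-14T09:00:01Z"
         end="2003-02-14T09:00:03Z">
    <interpretation confidence="0.80" tokens="flights to boston">
      <destination>Boston</destination>
    </interpretation>
    <interpretation confidence="0.65" tokens="flights to austin">
      <destination>Austin</destination>
    </interpretation>
  </oneOf>
</emma>
```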
4. Integration with other work
4.1 Where alignment with other work is appropriate, EMMA must enable the use and integration of widely adopted standard specifications and features. The following activities are considered most relevant in this respect:
- W3C activities
- MMI activities
- MMI general requirements
- Events subgroup requirements
- Integration subgroup requirements
- Ink subgroup requirements
- Voice Browser activities
- SRGS: EMMA must be able to represent results from speech recognition using SRGS grammars
- SI: EMMA must be able to represent results from speech recognition using SRGS grammars with Semantic Interpretation (SI) output (see the sketch at the end of this section)
- Other W3C activities
- Relevant XML-related activities
- RDF working group
- Other organizations and standards
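To make the SRGS and SI requirements concrete, the sketch below pairs a small SRGS grammar carrying SI tags with the EMMA-style result it might yield. The grammar follows the published SRGS syntax; the tag format and the EMMA result names are assumptions, since neither the SI specification nor EMMA is final at the time of writing.

```xml
<!-- SRGS grammar with Semantic Interpretation tags (tag-format is assumed). -->
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="city" tag-format="semantics/1.0">
  <rule id="city" scope="public">
    <one-of>
      <item>boston <tag>out.dest="BOS";</tag></item>
      <item>denver <tag>out.dest="DEN";</tag></item>
    </one-of>
  </rule>
</grammar>

<!-- A hypothetical EMMA result for the utterance "boston" (names illustrative). -->
<interpretation tokens="boston"
                grammarRef="http://example.org/city.grxml">
  <dest>BOS</dest>
</interpretation>
```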