
MPEG-4 Systems General Issues

1. What is "MPEG-4 Systems"?
2. Why was MPEG-4 Systems developed?
3. Is MPEG-4 Systems finalized?
4. What functionality does MPEG-4 Systems provide?
5. I am familiar with other MPEG Systems specifications. What is different in MPEG-4 Systems?
6. Is MPEG-4 addressing broadcast, network based, or file based playback?

1. What is "MPEG-4 Systems"?

MPEG-4 Systems is the first part of the MPEG-4 specification. MPEG-4’s official designation is ISO/IEC 14496, and consequently MPEG-4 Systems is officially referred to as ISO/IEC 14496-1. After a balloting process by the member countries of ISO, MPEG-4 Systems attained the status of International Standard, which means that no further changes will be made to it. Copies of the specification can be obtained (for a fee) through national standardization organizations affiliated with ISO (e.g., for the US that would be ANSI, the American National Standards Institute).

2. Why was MPEG-4 Systems developed?

The MPEG-4 standard addresses the coded representation of both natural and synthetic (computer generated) audio and visual objects. An audiovisual object is a representation of a natural or synthetic entity that has an audio and/or visual manifestation. Audio-visual objects may be associated with elementary streams that carry time dependent data related to the objects. MPEG-4 Systems was developed to provide the necessary facilities for specifying how audiovisual objects can be composed together in an MPEG-4 terminal to form complete scenes, how a user can interact with the content, as well as how the streams should be managed for transmission or storage.
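
To make the idea of scene composition concrete, here is a minimal sketch in Python of a scene tree that groups a video object and an audio object. It is purely illustrative: the node and field names are loosely modeled on BIFS/VRML, and the "od:" references standing in for object descriptor links are assumptions of this sketch, not normative syntax.

    # Purely illustrative: a toy scene tree in the spirit of BIFS/VRML.
    # Node names, fields and the "od:" references are simplified stand-ins,
    # not the normative binary syntax of ISO/IEC 14496-1.
    class Node:
        def __init__(self, kind, children=None, **fields):
            self.kind = kind
            self.children = children or []
            self.fields = fields

    scene = Node("OrderedGroup", children=[
        Node("Transform2D", translation=(100, 50), children=[
            # a natural video object, positioned in 2D space
            Node("Shape", appearance=Node("MovieTexture", url="od:3"),
                 geometry=Node("Bitmap")),
        ]),
        # an audio object attached to the same scene
        Node("Sound2D", source=Node("AudioSource", url="od:4")),
    ])

    def compose(node, depth=0):
        # the terminal traverses the tree to place and render each object
        print("  " * depth + node.kind, node.fields)
        for child in node.children:
            compose(child, depth + 1)

    compose(scene)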

3. Is MPEG-4 Systems finalized?

Yes and no. MPEG-4 introduced the notion of versions; a version is another name for an Amendment, the official ISO term for an extension to a standard. New versions add features to the MPEG-4 toolset; they do not obsolete previous versions, they only add functionality. MPEG-4 Version 1 was finalized in October 1998, and a second version was finalized in December 1999. Other versions are currently under development.

4. What functionality does MPEG-4 Systems provide?

Much of the functionality that MPEG-4 provides comes from the Systems part. Because Systems takes care of (among other issues) stream management and scene description, it acts as a ‘wrapper’ around the source coding technology. Systems provides the following basic functions:

  • Version 1:
    • A stream management framework encompassing: a terminal model for time and buffer management; a coded representation of metadata for the description, identification and logical dependencies of elementary streams (object descriptors and other descriptors); a coded representation of descriptive audio-visual content information (object content information – OCI); a coded representation of synchronization information (sync layer – SL); and a multiplexed representation of individual elementary streams in a single stream (FlexMux). A simplified sketch of this descriptor metadata follows this list.
    • a coded representation of interactive audio-visual scene description information (Binary Format for Scenes – BIFS) for presentation purposes. BIFS covers the spatio-temporal positioning of audio-visual objects and the description of their associated behavior. Individual objects may be natural or synthetic, audio or visual, 2D or 3D, and are coded using tools defined in the Visual and Audio parts of the specification (Parts 2 and 3).
    • an interface to intellectual property management and protection (IPMP) systems.
  • Version 2:
    • a presentation engine for programmatic control of the player (MPEG-J). MPEG-J defines the format and the delivery of downloadable Java byte code as well as its execution lifecycle and behavior through standardized APIs.
    • A file format (MP4) to contain the media information of an MPEG-4 presentation in a flexible, extensible way to facilitate interchange, management, editing, streaming and presentation of the media.
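
As an illustration of the kind of metadata the object descriptor framework carries, here is a minimal Python sketch. The field names are trimmed and renamed for readability, and the values are invented; the normative descriptors and their binary coding are defined in ISO/IEC 14496-1.

    # Purely illustrative: simplified stand-ins for the Object Descriptor and
    # ES_Descriptor metadata; real descriptors have more fields and a binary coding.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class DecoderConfig:
        stream_type: str        # e.g. a visual or an audio stream
        object_type: str        # the coding tool the decoder must support
        buffer_size: int        # decoder buffer size, for the terminal buffer model
        avg_bitrate: int

    @dataclass
    class ESDescriptor:
        es_id: int              # identifies one elementary stream
        depends_on: int         # logical dependency on another stream (0 = none)
        config: DecoderConfig

    @dataclass
    class ObjectDescriptor:
        od_id: int                                       # referenced from the scene
        streams: List[ESDescriptor] = field(default_factory=list)

    # One audio-visual object, referenced from the scene as OD 3 and carried in ES 101:
    od = ObjectDescriptor(od_id=3, streams=[
        ESDescriptor(es_id=101, depends_on=0,
                     config=DecoderConfig("visual", "MPEG-4 Visual", 65536, 512000)),
    ])
    print(od)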

5. I am familiar with other MPEG Systems specifications. What is different in MPEG-4 Systems?

MPEG-2 Systems specifies a transport layer facility with the MPEG-2 Transport Stream and Program Stream constructs. The MPEG-1 Systems design is essentially identical to the MPEG-2 Program Stream structure. MPEG-4 addresses the coded representation of audio-visual objects, both natural and synthetic. The MPEG-4 Systems layer addresses how these objects are composed together to form a scene (composition information or scene description), as well as how a user may interact with such objects. This scene description feature is the main innovation in terms of Systems expertise. In addition, the object-based architecture of MPEG-4 requires changes in the way audio-visual information is managed. Therefore, and more in the tradition of MPEG-1/2 Systems expertise, MPEG-4 Systems also provides a flexible architecture for the delivery of MPEG-4 data. This architecture consists of the Elementary Stream Management framework (ESM) as well as an abstraction of the specific delivery mechanisms referred to as DMIF (Delivery Multimedia Integration Framework).
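
The sketch below illustrates, in Python, what such a delivery abstraction buys: the rest of the terminal talks to one interface whether the streams come from a local file, a network session, or a broadcast channel. The class and method names are inventions of this sketch, not the normative primitives of the DMIF-Application Interface defined in ISO/IEC 14496-6.

    # Purely illustrative: a delivery abstraction in the spirit of DMIF.
    # Class and method names are invented here, not the normative DAI primitives.
    from abc import ABC, abstractmethod

    class DeliveryService(ABC):
        @abstractmethod
        def attach(self, service_url: str) -> None:
            """Connect to a service (open a file, join a session, tune a channel)."""

        @abstractmethod
        def open_channel(self, es_id: int):
            """Return a readable source of packetized data for one elementary stream."""

    class FileDelivery(DeliveryService):
        def attach(self, service_url):            # e.g. open a local MP4 file
            self.path = service_url
        def open_channel(self, es_id):            # e.g. read the track with that ES_ID
            return f"track {es_id} of {self.path}"

    class NetworkDelivery(DeliveryService):
        def attach(self, service_url):            # e.g. set up an IP/UDP/RTP session
            self.session = service_url
        def open_channel(self, es_id):            # e.g. subscribe to the matching flow
            return f"flow {es_id} of {self.session}"

    # Example use; the calling code is independent of the actual delivery mechanism:
    delivery = FileDelivery()
    delivery.attach("presentation.mp4")           # hypothetical local file name
    print(delivery.open_channel(101))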

6. Is MPEG-4 addressing broadcast, network based, or file based playback?

MPEG-4 Systems, and the entire MPEG-4 specification, was developed to accommodate broadcast, interactive network, and mass storage based playback scenarios. This is achieved by:

  • Abstracting the specifics of the various delivery mechanisms from the representation of the interactive audio-visual information, by providing a consistent architecture (the Elementary Stream Management framework) and a set of interfaces (DMIF) for accessing the multimedia content. It is therefore possible to use MPEG-4 content in broadcast mode (e.g., transmitted alongside traditional MPEG-2 content), in interactive network mode (e.g., over IP/UDP/RTP transport) as well as locally (e.g., from a hard disk, CD-ROM or DVD).
  • Developing a multimedia content representation that allows the creation of rich audiovisual experiences over the various delivery mechanisms. The overall walkthrough for consuming such content is always the same: the sender, whether remote or local, transmits streams containing compressed audio-visual objects and the associated scene description. At the receiver, these streams are demultiplexed when needed, and the resulting objects are decompressed, composed according to the scene description, and presented to the end user (a toy sketch of this walkthrough follows this list).
  • Allowing the end user to interact with the presentation. MPEG-4 provides a broad spectrum of interaction mechanisms that exploit the underlying capabilities of the transport mechanism. Interaction can be processed locally when it is handled by the BIFS scene logic, so a rich experience can already be obtained without any upstream channel. Interaction information can also be transmitted back to the sender when the delivery mechanism allows an upstream channel to be set up.
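
A toy Python sketch of the receiver-side walkthrough is given below. Every function is a stub invented for illustration; only the order of the steps (demultiplex, decode, compose, present) follows the description above.

    # Purely illustrative: the receiver-side walkthrough as a toy pipeline.
    # All names are invented stubs; only the order of steps mirrors the text above.
    def demultiplex(received):
        # separate the scene description stream from the media streams, if multiplexed
        return received["scene"], received["media"]

    def decode(stream):
        # decompress one audio-visual object
        return f"decoded({stream})"

    def compose(scene, objects):
        # place the decoded objects in space and time according to the scene
        return f"frame of {len(objects)} objects laid out by {scene}"

    def present(frame):
        # render the composed result to the end user
        print(frame)

    received = {"scene": "BIFS stream", "media": ["video ES", "audio ES"]}
    scene, media = demultiplex(received)
    present(compose(scene, [decode(s) for s in media]))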