Berkeley Multimedia Research Center
Published: April 1995
Berkeley, CA
USA
http://www.bmrc.berkeley.edu
Video Compression for Desktop Applications
Sections
1.1 Introduction
1.2 Current State of the Art
1.2.1 MPEG on Every Desktop
1.2.2 Motion JPEG for Editing
1.2.3 H.261 for Video Conferencing
1.2.4 What's the User to Do
1.3 Research Problems
1.3.1 Multiple Format Stored Representations
1.3.2 Perceptual Coding
1.3.3 Multiple CPU/Chip Implementations
1.3.4 Continuous Media Infrastructure
1.4 Wireless Audio/Video Compression
1.5 Conclusions
1.6 References
1 Video Compression for Desktop Applications
Lawrence A. Rowe
University of California, Berkeley, CA, USA
This paper discusses the current state of compression for digital
video on the desktop. Today there are many choices for video compression
that yield different performance in terms of compression factor, quality,
bitrate, and cost. Users want a single low-cost solution, which unfortunately
does not exist today. Consequently, users will have to develop applications
in an environment with multiple representations for digital video unless
PC's can be assigned to dedicated applications. Alternatively, programmable
compression/decompression boards can be used to solve the problem. Eventually,
special-purpose hardware solutions will be replaced by general-purpose
software running on desktop parallel processors which will be implemented
by multiple CPU's per chip.
1.1 Introduction
This paper presents my opinion of the current state of the
art for compression for desktop digital video applications. Put simply,
there are too many compression algorithms and standards and too few low-cost
boards that implement the major standards.
1.2 Current State of the Art
There are numerous video compression algorithms including:
Apple's Roadpizza, Supermac's CINEPACK, Fractals, H.261, Intel's INDEO,
motion JPEG (MJPEG), MPEG-1, MPEG-2, Sun's CELLB, and Wavelets. Users are
confused by all these choices. They want to know which technology to use
so they can make intelligent investment decisions.
Unfortunately, the current situation is not very good because there
is no single technology that can be used for all applications. For example,
Apple's Roadpizza and Supermac's CINEPACK are designed for playback applications
with software-only decoding, H.261 is designed for video teleconferencing,
MPEG-1 is designed for low bitrate (e.g., 1.5 Mbits/sec) audio and video playback
applications, and MPEG-2 is designed for high bitrate, high quality playback
applications with full-sized images (e.g., CCIR 601 with studio quality
at 4-10 Mbits/sec).
Users want one solution, but one solution does not exist. In the next
couple of years, I see the following trends.
1.2.1 MPEG on Every Desktop
Low cost MPEG-1 decoder chips will be on every desktop. Add-in
boards cost around $350 today, and the next generation multimedia PC will
have audio and video decoder chips on the motherboard. Manufacturers of
video games and CD-ROM titles will use MPEG-1 video to add excitement to
their products.
MPEG hardware for workstations will be less readily available and more
costly because these manufacturers can provide creditable software-only
decoders for MPEG. Early experiments on software-only MPEG decoding showed
that small-sized images (e.g., QCIF which is 160x120) can be decoded in
real-time and medium-sized images (e.g., CIF which is 320x240) can be decoded
in near real-time (16 fps compared to 24 fps) on RISC processors [Rowe93].
Subsequent work by DEC showed that tuning the decoder to a specific processor
can achieve real-time decoding of CIF images [Ho94].
Recently, HP released a software-only MPEG audio and video decoder for
their HP Snake processors that runs in real-time on CIF images [Lee94].
The HP software uses special-purpose instructions added to the architecture
that speed up Huffman decoding and 8-bit arithmetic operations (using saturation
arithmetic). They also use hardware to convert YCrCb to RGB and dither
to an 8-bit color map. In the other cases, color space conversion was done
in software, which can account for as much as 30% of the computation. Nevertheless,
the HP software is impressive.
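To make the cost of software color space conversion concrete, here is a
minimal sketch of a per-pixel conversion with saturation (clamping) to the
8-bit range. The coefficients are the commonly used full-range ITU-R BT.601
(JFIF-style) values, which is an assumption; the decoders discussed above
may have used different constants. The function names are hypothetical.

```python
def saturate(value):
    """Clamp a value to the 8-bit range 0..255 (saturation arithmetic)."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one 8-bit YCbCr pixel to 8-bit RGB.

    Assumes full-range BT.601 (JFIF-style) coefficients; this is an
    illustrative choice, not the constants of any decoder named in the text.
    """
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return saturate(r), saturate(g), saturate(b)


if __name__ == "__main__":
    # A neutral gray pixel has no chroma offset, so R = G = B = Y.
    print(ycbcr_to_rgb(128, 128, 128))
```

Every pixel of every frame needs several multiplies, adds, and clamps,
which is why doing this step in hardware or with saturating instructions
removes a noticeable share of the decoding work.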
These experiments illustrate that software-only decoders will eventually
replace all hardware decoders. I believe that it will be at least 4-6 years
before hardware decoders for MPEG-1 are outdated. By that time, hardware
decoders for MPEG-2, which supports higher quality video and audio at higher
bitrates, will be widely available. Some users will upgrade to higher quality
rather than continue with low quality at no cost. A general-purpose system
capable of MPEG-2 decoding on full-sized images (e.g., 640x480 or 768x576)
will require multiple processors.
The biggest problem with MPEG is the cost of encoders. High quality,
real-time encoders cost between $50K and $500K. Almost all high end encoders
use parallel processors, either general-purpose supercomputers (e.g., IBM)
or custom-designed video processors (e.g., C-Cube). Lower quality real-time
encoders for PC platforms that use fewer processors cost around $20K (e.g.,
FutureTel, Optibase, Optivision, etc.). While the cost of these low end
systems will decline over the next couple of years, they will still be
too expensive for most users.
1.2.2 Motion JPEG for Editing
Non-linear video editors are typically used in broadcast TV,
commercial post production, and high-end corporate media departments. Low
bitrate MPEG-1 quality is unacceptable to these customers, and it is difficult
to edit video sequences that use inter-frame compression. Consequently,
non-linear editors (e.g., AVID, Matrox, FAST, etc.) will continue to use
motion JPEG with low compression factors (e.g., 6:1 to 10:1).
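As a rough illustration of what these compression factors mean in bitrate
terms, the sketch below computes raw and compressed bitrates for one
assumed video format. The frame size (640x480), frame rate (30 fps), and
16 bits per pixel (4:2:2 sampling) are assumptions chosen for illustration,
not figures taken from the editing systems named above.

```python
def raw_bitrate_mbits(width, height, fps, bits_per_pixel):
    """Uncompressed video bitrate in Mbits/sec (1 Mbit = 10**6 bits)."""
    return width * height * fps * bits_per_pixel / 1e6


def compressed_bitrate_mbits(raw_mbits, compression_factor):
    """Bitrate after compressing by the given factor (e.g., 6 for 6:1)."""
    return raw_mbits / compression_factor


# Assumed format: 640x480 pixels, 30 frames/sec, 16 bits/pixel.
raw = raw_bitrate_mbits(640, 480, 30, 16)

if __name__ == "__main__":
    print(f"raw: {raw:.1f} Mbits/sec")
    for factor in (6, 10):
        rate = compressed_bitrate_mbits(raw, factor)
        print(f"{factor}:1 -> {rate:.1f} Mbits/sec")
```

Under these assumptions the raw stream is about 147.5 Mbits/sec, so 6:1
compression gives about 24.6 Mbits/sec and 10:1 gives about 14.7 Mbits/sec,
far above the MPEG-1 bitrates discussed earlier.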
Motion JPEG compression has also been used in some desktop video conferencing
applications (e.g., Insoft) because affordable workstation boards that
support real-time encoding and decoding have been available. Typical boards
cost $4K to $10K. Motion JPEG boards for PC's are now being sold for
$1K to $4K.
1.2.3 H.261 for Video Conferencing
Video conferencing has been an active research and product
area for many years. Although most commercial room-sized conferencing systems
use proprietary standards, they are now adopting the H.261 ITU standard
for video conferencing*. Moreover, most desktop video conferencing systems
are using H.261 (e.g., AT&T, Compression Labs, Intel, PictureTel, etc.).
Most of these systems use ISDN lines, although a few are starting to support
packet-switched networks. In addition, several research laboratories are developing
software that uses H.261 boards on PC's and workstations.
1.2.4 What's the User to Do
What is the user to do who wants to provide ubiquitous digital
video, that is, video in all applications including email, documents, conferencing,
hypermedia courseware, and databases? Users have two choices:
(1) Select one compression standard and try to acquire applications that
will use it.
(2) Acknowledge that you need support for multiple compression standards.
My opinion is that users will have to make the second choice
which means either a programmable compression/decompression board or multiple
compression boards. Programmable boards exist, but they are not widely
available, and they are expensive. In addition, vendors do not yet provide
microcode for the variety of compression standards needed, but I believe
that eventually the software will be readily available and relatively inexpensive.
The question is whether the software will be available for programmable boards
before desktop parallel processors that can run general-purpose software
become available.
*Note to Section 1.2.3: H.261 is just the video standard. A video conferencing
system must also support the appropriate audio standards (e.g., G.72x) and
system-level standards.
In the meantime, users must develop applications that are open so that
new compression technology can be introduced and so that real-time conversion
is supported. For example, QuickTime from Apple and Video for Windows from
Microsoft are the dominant storage systems for PC video. Both systems support
multiple compression standards.
Better support is needed in applications to convert between different
representations because most applications are closed. For example, a desktop
video conferencing system should allow video transmitted in H.261 format
to be converted to an MPEG stream so that PC users can view remote presentations.
1.3 Research Problems
This section discusses some possible research problems. Some
researchers argue we need improved compression technology such as wavelet-based
algorithms. Except in the case of wireless communication discussed below,
I disagree. I believe that research should be directed to improving the
existing technologies and developing improved implementations, systems
infrastructure, and applications. Unless a new technology can provide significantly
better performance (i.e., at least 2:1 improvement in space) than the current
JPEG, MPEG, and H.261 standards, users will be better served by improving
the existing techniques and applications.
Some proposed compression standards provide other services such as multiresolution
sequences (i.e., different applications can request different sized images
at different bitrates from the same compressed representation) and variable
quality (i.e., different quality at different bitrates). While these features
are reasonable to request, I do not believe you need a completely different
compression technology to support them. The MPEG-2 standard has provisions,
albeit somewhat controversial, for image size, quality (S/N ratio), and
frame rate scalability. I believe it makes more sense to develop the technology
supporting these standards than it does to propose a completely different
technology unless you get the compression improvement mentioned above.
1.3.1 Multiple Format Stored Representations
Suppose you wanted to develop a video server for a heterogeneous
computing environment that included desktop computers with different decompression
capabilities (e.g., motion JPEG, H.261, and MPEG-1). The problem is which
representation to store. You could store one of these representations
and then provide a real-time transcoder somewhere on the network that will
convert between the different representations. Another alternative is to
store a representation that makes it easy to generate any of these sequences.
For example, there are differences in the block and macroblock structure
of these streams, but it should be possible to devise a stored representation
that can easily generate any of the representations. Here are a couple
of ideas:
(1) Store several motion vectors for a macroblock. For example, MPEG vectors
can be arbitrarily far away from the origin of the source block, they can
be on half-pixel boundaries, and, in the case of B-frames, they can be forward,
backward, or an average of a forward and backward block. H.261 motion vectors
can only be +/- 15 pixels, they cannot be on half-pixel boundaries, and
they can only be backward blocks. So, the idea is to store two motion vectors
for blocks whose MPEG vector is not valid for H.261 and select the appropriate
one when constructing the stream to be transmitted.
(2) Store the Huffman encoded representations of frames and create the rest
of the stream syntax on the fly. For example, an H.261 stream can skip
up to 2 frames between every frame displayed, and although there is a requirement
to refresh every block within some number of frames, there is no requirement
to include the equivalent of a complete frame (i.e., an MPEG I-frame).
The H.261 stream could be easily generated from an appropriate MPEG-like
frame structure similar to the one suggested above.
(3) Provide support for scalable H.261 and motion JPEG using the MPEG scalable
representations.
A shrewd data structure and efficient algorithm implementation
(e.g., possibly using frequency domain operations [Smith94])
should produce a more flexible system.
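The first idea above, storing a second motion vector only for blocks whose
MPEG vector is not valid for H.261, can be sketched as a small data
structure. This is a minimal sketch under stated assumptions: the class and
function names are hypothetical, the prediction modes follow the terminology
used in this section ("forward", "backward", "average"), and the validity
test encodes only the three H.261 restrictions listed in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vector = Tuple[float, float]  # (dx, dy) in pixels; x.5 means half-pixel

H261_MAX_RANGE = 15  # H.261 vectors are limited to +/- 15 pixels


@dataclass
class StoredMacroblock:
    """Hypothetical stored form of one macroblock's motion information."""
    mpeg_vector: Vector
    mpeg_mode: str                         # "forward", "backward", "average"
    h261_vector: Optional[Vector] = None   # stored only when needed


def is_valid_for_h261(vector: Vector, mode: str) -> bool:
    """Check the three H.261 restrictions described in the text."""
    dx, dy = vector
    whole_pixel = float(dx).is_integer() and float(dy).is_integer()
    in_range = abs(dx) <= H261_MAX_RANGE and abs(dy) <= H261_MAX_RANGE
    return mode == "backward" and whole_pixel and in_range


def select_vector(block: StoredMacroblock, target: str) -> Vector:
    """Pick the vector to emit when building a stream for `target`."""
    if target == "mpeg":
        return block.mpeg_vector
    if target == "h261":
        if is_valid_for_h261(block.mpeg_vector, block.mpeg_mode):
            return block.mpeg_vector
        if block.h261_vector is None:
            raise ValueError("no H.261-compatible vector was stored")
        return block.h261_vector
    raise ValueError(f"unknown target format: {target}")


if __name__ == "__main__":
    block = StoredMacroblock((20.5, 3.0), "forward", h261_vector=(15.0, 3.0))
    print(select_vector(block, "mpeg"), select_vector(block, "h261"))
```

Only blocks that fail the H.261 test carry the extra vector, so the storage
overhead grows with the number of such blocks rather than with the total
number of macroblocks.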
1.3.2 Perceptual Coding
Much work remains to be done understanding the human visual
system and developing models that can be used to implement better coders.
Surprisingly, perceptual coding of audio is ahead of perceptual coding
of video [Jayant93]. Today, most researchers are working
on best possible coding with infinite time to encode. The target bitrates
are typically 1.2 Mbits/sec for CD-ROM and 2, 3, or 6 Mbits/sec for video-on-demand.
There are many other points in the design space. For example, suppose you
wanted to encode CIF images on a typical PC and you were willing to produce
a statistical guarantee on bitrate. The idea is to relax the bitrate requirement
because real-time transport protocols are being designed to provide statistical
guarantees, so why should the coder work hard to satisfy a strict bitrate
bound when it may mean a significantly poorer picture? The coding strategy
for this implementation will be very different from the strategy used in
current coders. This idea is only one of several ways to change the basic model.
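One way to read "a statistical guarantee on bitrate" is that the stream
stays within the target in most, but not all, time windows. The sketch
below checks that property for a list of per-frame sizes. The one-second
sliding window and the "required fraction" parameter are assumptions made
for illustration; the text does not define the guarantee this precisely,
and the function names are hypothetical.

```python
def window_bits(frame_bits, fps):
    """Total bits in each sliding window of `fps` consecutive frames."""
    if len(frame_bits) < fps:
        return []
    total = sum(frame_bits[:fps])
    totals = [total]
    for i in range(fps, len(frame_bits)):
        # Slide the window one frame: add the new frame, drop the oldest.
        total += frame_bits[i] - frame_bits[i - fps]
        totals.append(total)
    return totals


def fraction_within_budget(frame_bits, fps, target_bits_per_sec):
    """Fraction of one-second windows that stay within the target."""
    totals = window_bits(frame_bits, fps)
    if not totals:
        raise ValueError("need at least one full second of frames")
    within = sum(1 for t in totals if t <= target_bits_per_sec)
    return within / len(totals)


def meets_statistical_guarantee(frame_bits, fps, target_bits_per_sec,
                                required_fraction):
    """True if enough windows respect the target (1.0 = strict bound)."""
    fraction = fraction_within_budget(frame_bits, fps, target_bits_per_sec)
    return fraction >= required_fraction
```

Setting the required fraction to 1.0 recovers the strict bound that current
coders aim for; a lower value lets the coder spend extra bits on an
occasional difficult frame instead of degrading the picture.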
1.3.3 Multiple CPU/Chip Implementations
Future desktop computer architectures will use microprocessors
that support multiple CPU's per chip. For example, a RISC processor requires
1M to 3M transistors. Chip technology will soon be able to put 100M transistors
on a chip. So the question is how to use the transistors. One design will
put many different processor architectures on a chip so that a system can
run different software. Another design will put many copies of the same
processor on the chip.
An interesting research problem is to understand the effect of different
architectures on compression and decompression. One possibility, which
is probably already being done in industrial research labs, is to look
at high performance parallel decoders for HDTV images (e.g., 1920x1080)
using general purpose processors.
1.3.4 Continuous Media Infrastructure
There is currently no portable toolkit for developing distributed
continuous media applications (i.e., digital audio and video) such as desktop
conferencing systems, distance learning systems, and distributed video playback
systems. Many excellent research systems have been developed, but they
are typically not distributed, and they support few hardware platforms
and audio/video boards [Anderson91, Gibbs91, Hamakawa92,
Koegel93, Rossum93, Steinmetz91, Trehan93, Hewlett-Packard93]. There
are several standards groups and large companies trying to establish common
architectures and protocols for developing distributed applications, but
these efforts have yet to succeed. The consequence is that anyone who wants
to develop an application faces the problem of developing the infrastructure.
Our research group has developed such an infrastructure, called the
Berkeley Continuous Media Toolkit, that supports motion JPEG and
MPEG video boards, several audio standards, and runs on a variety of platforms.
It is based on the Tcl scripting language, the Tk interface toolkit, and
the Tcl-DP package for distributed client/server computing. We have developed
a network playback system [Rowe92] and desktop video
conferencing system using the toolkit [Chaffee94].
You might wonder how a research project at a university can compete
with large companies. The answer is we cannot. However, by distributing
our source code and working with other researchers we can build a common
infrastructure. This approach has worked for CAD tools, Tcl/Tk, and the
INGRES relational DBMS to name three examples from Berkeley.
However, we still need the equivalent of the PBMPLUS library for manipulating
digital video data. The idea is to develop tools and libraries so that
different researchers can experiment with components of the infrastructure
and with applications built using it.
1.4 Wireless Audio/Video Compression
Wireless computing links are very different from conventional
communication links. First, bandwidth is limited (e.g., approximately 2
Mbits/sec aggregate bandwidth in a cell). Second, communication errors are inversely
proportional to the power used on the portable device. Power is the scarce
resource, so algorithms and implementations that perform adequately with
less power are better. Some researchers argue that portable devices should
have limited computational power to reduce power requirements, which means
that audio and video compression must be very simple [Broderson94].
Compression algorithms that work well in this environment are an interesting
challenge. Some people are looking at pyramid and subband coding using
vector quantization. Vector quantization is simple to decode, and pyramid
and subband coding can be used to partition the stream into high priority
data that will be sent with more power to reduce errors and low priority
data that will be sent with less power.
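The partitioning idea can be illustrated with the simplest possible subband
split. The sketch below applies a one-level Haar-style split to a row of
samples, giving a low band (pairwise averages, treated here as high priority)
and a high band (pairwise differences, treated as low priority). The choice
of a Haar filter, the one-dimensional input, and the function names are
assumptions for illustration; vector quantization of the bands is omitted.

```python
def split_subbands(samples):
    """Split an even-length sample row into (low band, high band)."""
    if len(samples) % 2 != 0:
        raise ValueError("expects an even number of samples")
    pairs = list(zip(samples[0::2], samples[1::2]))
    low = [(a + b) / 2 for a, b in pairs]    # coarse signal: high priority
    high = [(a - b) / 2 for a, b in pairs]   # detail signal: low priority
    return low, high


def merge_subbands(low, high=None):
    """Rebuild the samples; if the detail band was lost, use zeros."""
    if high is None:
        high = [0.0] * len(low)
    samples = []
    for l, h in zip(low, high):
        samples.extend([l + h, l - h])
    return samples


if __name__ == "__main__":
    low, high = split_subbands([10, 12, 20, 16])
    print(merge_subbands(low, high))  # both bands received
    print(merge_subbands(low))        # low priority band lost
```

If only the high priority band arrives, the receiver still reconstructs a
coarser version of the signal, which is the property that makes sending the
two bands at different power levels attractive.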
Needless to say, this architecture will create many problems if the
rest of the digital video infrastructure is dominated by the block transform
coding standards as I believe it will be.
1.5 Conclusions
Compression researchers have developed numerous technologies
that have been used to develop a series of compression standards that will
dominate desktop digital video. Today, and for at least the next 5-10 years,
application developers and users face a difficult choice of which hardware
and software to use. Eventually, desktop parallel processors will allow
many different compression algorithms, implemented in general-purpose software,
to be used.
Many research problems remain but my opinion is that effort should be
directed to improving existing implementations, software systems infrastructure,
and applications.
1.6 References
[Anderson91] D.P. Anderson and P. Chan, "Toolkit Support
for Multiuser Audio/Video Applications," Proc. 2nd Int'l. Workshop on
Network and Operating System Support for Digital Audio and Video, Heidelberg,
Germany, November 1991.
[Broderson94] R. Broderson, "The Infopad Project's Home Page,"
World-Wide Web Page, http://infopad.eecs.berkeley.edu/.
[Chaffee94] G. Chaffee, personal communication, May 1994.
[Gibbs91] S. Gibbs, et al., "A Programming Environment for Multimedia
Applications," Proc. 2nd Int'l. Workshop on Network and Operating System
Support for Digital Audio and Video, Heidelberg, Germany, November
1991.
[Hamakawa92] R. Hamakawa, et al., "Audio and Video Extensions
to Graphical User Interface Toolkits," Proc. 3rd Int'l. Workshop on
Network and Operating System Support for Digital Audio and Video, San
Diego, CA, November 1992.
[Hewlett-Packard93] Hewlett-Packard, IBM, and Sunsoft, "Multimedia
Systems Services (Version 1.0)," response to Multimedia System Services
Request for Technology, Interactive Multimedia Association, 1993.
[Ho94] S. Ho, personal communication, February 1994.
[Jayant93] N. Jayant, J. Johnston, and R. Safranek, "Signal
Compression Based on Models of Human Perception," Proc. of the IEEE,
Vol. 81, No. 10, October 1993, pp. 1385-1422.
[Koegel93] J.F. Koegel, et al., "HyOctane: A HyTime Engine for
an MMIS," Proc. ACM Multimedia 93, Anaheim, CA, August 1993.
[Lee94] R. Lee, personal communication, May 1994.
[Rossum93] G. van Rossum, et al., "CMIFed: A Presentation Environment
for Portable Hypermedia Documents," Proc. ACM Multimedia 93, Anaheim,
CA, August 1993.
[Rowe92] L.A. Rowe and B.C. Smith, "A Continuous Media Player,"
Proc. 3rd Int'l. Workshop on Network and Operating System Support for
Digital Audio and Video, San Diego, CA, November 1992.
[Rowe93] L.A. Rowe, K. Patel and B.C. Smith, "Performance of
a Software MPEG Video Decoder," Proc. ACM Multimedia 93, Anaheim,
CA, August 1993.
[Smith94] B.C. Smith, "Fast Software Processing of Motion JPEG
Video," to appear in Proc. ACM Multimedia 94, October 1994.
[Steinmetz91] R. Steinmetz and J.C. Fritzsche, "Abstractions
for Continuous-Media Programming," Proc. 2nd Int'l. Workshop on Network
and Operating System Support for Digital Audio and Video, Heidelberg,
Germany, November 1991.
[Trehan93] R. Trehan, et al., "Toolkit for Shared Hypermedia
on a Distributed Object Oriented Architecture," Proc. ACM Multimedia
93, Anaheim, CA, August 1993.