Another reason to throw the
TV in the trash! (Hug your
At MIT, they can put
words in our mouths
Do we really need another reason?
http://www.boston.com/dailyglobe2/135/metro/At_MIT_they_can_put_words_in_our_mouths+.shtml
CAMBRIDGE - Scientists at the Massachusetts
Institute of Technology have created the first realistic videos of people saying
things they never said - a scientific leap that raises unsettling questions
about falsifying the moving image.
In one demonstration, the researchers
taped a woman speaking into a camera, and then reprocessed the footage into a
new video that showed her speaking entirely new sentences, and even mouthing
words to a song in Japanese, a language she does not speak. The results were
enough to fool viewers consistently, the researchers report.
The technique's inventors say it could be used in video games and movie special
effects, perhaps reanimating Marilyn Monroe or other dead film stars with new
lines. It could also improve dubbed movies, a lucrative global
But scientists warn the technology will also provide a powerful
new tool for fraud and propaganda - and will eventually cast doubt on everything
from video surveillance to presidential addresses.
''This is really
groundbreaking work,'' said Demetri Terzopoulos, a leading specialist in facial
animation who is a professor of computer science and mathematics at New York
University. But ''we are on a collision course with ethics. If you can make
people say things they didn't say, then potentially all hell breaks [loose].''
The researchers have already begun testing the technology on
video of Ted Koppel, anchor of ABC's ''Nightline,'' with the aim of dubbing a
show in Spanish, according to Tony F. Ezzat, the graduate student who heads the
MIT team. Yet as this and similar technology makes its way out of academic
laboratories, even the scientists involved see ways it could be misused: to
discredit political dissidents on television, to embarrass people with
fabricated video posted on the Web, or to illegally use trusted figures to
''There is a certain point at which you raise the level
of distrust to where it is hard to communicate through the medium,'' said
Kathleen Hall Jamieson, dean of the Annenberg School for Communication at the
University of Pennsylvania. ''There are people who still believe the moon
landing was staged.''
Currently, the MIT method is limited: It works only
on video of a person facing a camera and not moving much, like a newscaster. The
technique only generates new video, not new audio.
But it should not be
difficult to extend the discovery to work on a moving head at any angle,
according to Tomaso Poggio, a neuroscientist at the McGovern Institute for Brain
Research, who is on the MIT team and runs the lab where the work is being done.
And while state-of-the-art audio simulations are not as convincing as the MIT
software, that barrier is likely to fall soon, researchers say.
''[It is] only a matter of time before somebody can get enough good video of your face to
have it do what they like,'' said Matthew Brand, a research scientist at MERL, a
Cambridge-based laboratory for Mitsubishi Electric.
For years, animators
have used computer technology to put words in people's mouths, as they do with
the talking baby in CBS's ''Baby Bob'' - creating effects believable enough for
entertainment, but still noticeably computer-generated. The MIT technology is
the first that is ''video-realistic,'' the researchers say, meaning volunteers
in a laboratory test could not distinguish between real and synthesized clips.
And while current computer-animation techniques require an artist to smooth out
trouble spots by hand, the MIT method is almost entirely
Previous work has focused on creating a virtual model of a
person's mouth, then using a computer to render digital images of it as it
moves. But the new software relies on an ingenious application of artificial
intelligence to teach a machine what a person looks like when
Starting with between two and four minutes of video - the
minimum needed for the effect to work - the computer captures images that
represent the full range of motion of the mouth and surrounding areas, Ezzat said.
The computer is able to express any face as a combination of these
faces (46 in one example), the same way that any color can be represented by a
combination of red, green, and blue. The computer then goes through the video,
learning how a person expresses every sound, and how it moves from one to the next.
Given a new sound, the computer can then generate an accurate
picture of the mouth area and virtually superimpose it on the person's face,
according to a paper describing the work. The researchers are scheduled to
present the paper in July at Siggraph, the world's top computer graphics conference.
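To make the color analogy concrete, here is a minimal Python sketch of treating a face image as a weighted combination of a few prototype images. It is an illustration only, not the MIT team's software: the function names `blend` and `mix_weights`, the 4-pixel "images", and the three prototypes are all hypothetical, and plain pixel-by-pixel weighting is a simplification standing in for whatever model the paper actually uses.

```python
from typing import List, Sequence

# Assumption: an image is a flat list of brightness values, one per pixel.
Image = List[float]


def blend(prototypes: Sequence[Image], weights: Sequence[float]) -> Image:
    """Return the weighted sum of the prototype images, pixel by pixel."""
    if not prototypes:
        raise ValueError("need at least one prototype")
    if len(prototypes) != len(weights):
        raise ValueError("need exactly one weight per prototype")
    n_pixels = len(prototypes[0])
    if any(len(p) != n_pixels for p in prototypes):
        raise ValueError("all prototypes must have the same number of pixels")
    return [
        sum(w * p[i] for w, p in zip(weights, prototypes))
        for i in range(n_pixels)
    ]


def mix_weights(start: Sequence[float], end: Sequence[float], t: float) -> List[float]:
    """Slide linearly from one set of weights to another (t=0 is start, t=1 is end)."""
    if len(start) != len(end):
        raise ValueError("weight lists must have the same length")
    return [(1 - t) * a + t * b for a, b in zip(start, end)]


# Hypothetical 4-pixel "mouth" prototypes (the article mentions 46 real ones).
closed = [0.0, 0.0, 0.0, 0.0]
open_wide = [1.0, 1.0, 1.0, 1.0]
rounded = [0.0, 1.0, 1.0, 0.0]
prototypes = [closed, open_wide, rounded]

# A new mouth shape expressed as half "closed" plus half "open_wide".
half_open = blend(prototypes, [0.5, 0.5, 0.0])

# A frame halfway through a transition from "closed" to "rounded".
midway = blend(prototypes, mix_weights([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], 0.5))
```

In this toy setup, a new mouth shape is just a list of weights, and moving from one shape to another amounts to sliding those weights a little for each frame. That is only loosely the idea the article describes of learning how the mouth moves from one sound to the next.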
The effect is significantly more convincing than a previous
effort, called Video Rewrite, which recorded a huge number of small snippets of
video and then recombined them. Still, the new method only seems lifelike for a
sentence or two at a time, because over longer stretches, the speaker seems to
MIT's Ezzat said that he would like to develop a more
complex model that would teach the computer to simulate basic emotions.
A specialist can still detect the video forgeries, but as the technology improves,
scientists predict that video authentication will become a growing field - in
the courts and elsewhere - just like the authentication of photographs. As
video, too, becomes malleable, a society increasingly reliant on live satellite
feeds and fiber optics will have to find even more direct ways to
''We will probably have to revert to a method common in the
Middle Ages, which is eyewitness testimony,'' said the University of
Pennsylvania's Jamieson. ''And there is probably something healthy in
[Sample video clips available on web