Newsroom of the Future - Tech Feature
"It's me," says John to the intercom outside his glass-encased office.
"Good morning, John," answers a friendly female voice. The voice is
emotionless, accent-less, accommodating. She sounds like a
secretary in a tailored suit. John’s door slides open.
He steps into the soundproof room. The door slides shut. He throws his
coat over his wide-armed desk chair and puts his coffee beside a
flat-screen computer monitor. "You have three new voice mail messages
and two saved messages," says the voice. "Would you like me to play
them for you?"
"Not right now, thank you," John says. "Get me my e-mail."
"Sure thing. Just a minute."
John sits in front of his monitor and sips his coffee. He looks out his
17th-story window to the New City stretching out below him. He can
faintly hear the hum of the newsroom server pulsing below his feet.
"You have eight new e-mail messages." His account flickers up on the
monitor.
"Scroll down, please." Old messages disappear into the top of the
screen. "Stop. Read the message from Jay Thompson."
The computer reads the message in the same friendly tone - ever
patient, ever helpful.
"Reply," says John. "Dear Mr. Thompson. Thank you for your message. I’m
looking forward to meeting with you for an interview this afternoon..."
John's words float through the space of his office and appear on the
monitor. He finishes composing his message and commands, "Read it back
to me, please."
The computer reads the message back, word for word.
"Thank you. Send it. And please call me a taxi."
"No problem." The message flicks out into cyberspace. A faint dial tone
and a series of beeps and clicks as the computer accesses the phone
database and requests a taxi from another computer halfway across the
New City. Fifteen minutes later, the computer interrupts an Internet
research session.
"I'm sorry to interrupt, John, but your taxi has arrived."
"Thank you. Take a break."
"Sure. Coffee sounds good right now. Say ‘come back’ when you need me."
John smirks at the occasional humor of his assistant as he puts on his
coat and steps out the door that has slid open silently. On his way
down the newsroom corridor, he passes other journalists settling in for
the morning - their lips moving silently behind the soundproof glass,
mouthing commands to their computers who respond obediently - thousands
of microcomputers controlled by a central server with access to every
resource a journalist could want - phone books, maps, government
directories, dictionaries, the AP Stylebook, the Internet - all
controlled by the human voice.
This is the newsroom of the future. And it's not as far off as some may
think. In fact, the two types of technology needed to run this newsroom
already exist – speech recognition and voice interface systems – and
it’s only a matter of time before the two come together.
Gordon Moore, Intel co-founder and the creator of Moore’s Law – which
has defined the exponential growth of computerized technology since the
1960s – says “really good” speech recognition will be the next most
important change in the computer industry. “By ‘really good,’” Moore
tells Business 2.0, “I mean enabling the computer to differentiate
among ‘to,’ ‘two,’ and ‘too.’ It has to understand in context. I think
once the computer gets to that point, it essentially understands
language and you can have a conversation with it. To me, that could
have a dramatic impact on the way people use computers.”
* * * * *
Scholle Sawyer McFarland, managing editor of San Francisco-based
Macworld magazine, began using speech recognition software four years
ago when a repetitive strain injury left her unable to use her arms and
hands. She was 26 years old. "I thought I was going to be fired," she
said. "I didn't know if I'd ever be able to work again. I couldn't
touch a computer. Journalism is all about being able to work hard
whenever you're needed. And if you can't do it, somebody else can."
At the time, she was working as a writer for the monthly computer
magazine. As one of only two feature writers for the feature-heavy
publication, she worked under extreme pressure – 14 hours a day, six
days a week. Her body couldn't handle it. The tendons in her shoulders
wore down until they could no longer hold her shoulders in their
sockets. "It felt like someone had pulled my arm off," she said. "I
couldn't even really sit up straight for very long. They called me the
brain in the jar."
After five weeks of physical therapy and anti-inflammatory drugs,
Sawyer McFarland returned to work. But she could never go back to the
work style she had before her injury. She was lucky. Macworld readily
adapted her responsibilities to fit her physical limitations. They
completely revamped her workstation with a futuristic-looking ergonomic
setup, including a $400 keyboard and a $300 chair. Macworld even bought
her a Windows-compatible PC so she could use the latest speech
recognition software that had not yet been adapted for Macs. The
company spent about $5,000 to get Sawyer McFarland working again, not
including the cost of lost workdays.
"The more and more efficient and the faster and faster the technology
has gotten, the worse and worse it's gotten for people," says Sawyer
McFarland. There was a point in her career when she would e-mail a
coworker sitting not three feet away. "It was like we'd forgotten that
we can get up off our butts and just walk over and talk to someone."
But the technology that has caused all these problems is now being
adapted to work for those who were harmed by it. And for those who use
a keyboard daily, especially journalists working under tight deadlines,
this technology will have a great impact.
The speech recognition technology that got Sawyer McFarland working
again is becoming more and more common, and surprisingly cheap. Dragon
NaturallySpeaking, manufactured by Lernout & Hauspie, is the most
recognizable of these programs and costs about $60 for basic dictation
software. A more advanced version that allows oral editing and
mode-less speech – where the user can switch back and forth between
dictation and commands like “open Netscape” or “print document” – costs
$150 to $200. But the software doesn’t work well without a high-speed
computer with a lot of memory, and learning to use the software can be
time-consuming and frustrating.
* * * * *
John’s taxi drops him at Thompson’s office ten minutes early. In
preparation for their interview, John puts in a call to his newsroom
server. “Go to remote transcription mode,” he commands.
“OK, John. What’s the slug?”
“Election.”
“Thank you. Just a moment.”
John heads to the reception area and looks at his watch. One hour to
deadline.
“OK, John. Remote transcription mode is ready.”
John speaks some preliminary thoughts into his phone: the date, time,
some basic information about Thompson, a description of the office.
A few minutes later, Thompson sticks his head into the waiting area and
motions to John. A quick introduction and handshake. “Do you mind if I
record this conversation?” asks John.
“Go ahead.”
John places his cell phone face up on the desk between them. “Ready to
go,” he says. “Just speak normally.”
By the end of the interview, everything that has been said – John’s
notes, his questions, Thompson’s answers, every cough, every um, every
sound – will be recorded and sent digitally through the cell phone to
the newsroom server halfway across the city. When he returns to the
office, a complete transcription of the interview will be waiting for
him on his computer. It will be filed under the slug “Election,” along with
the server’s ever-growing collection of phone calls, interviews, and
business meetings – all recorded and transcribed remotely, all
accessible by voice.
* * * * *
Susan Fulton, an administrative editor at the New York Times, has been
using speech recognition software since her repetitive strain injury in
1990. “I can run rings around most people using my computer,” she says,
“using a combination of keyboard and mouse and speech recognition.”
While there are currently only 10 people in her New York office using
speech recognition, she expects that number to grow. “If you have one
method that’s working, i.e. typing, and you’re accustomed to doing it,
there’s absolutely no reason to change to another method unless you
know that you’re going to get something out of it,” says Fulton.
“But if the programs become easier to use… then they may eventually
find that it is actually faster than typing.”
Learning to use speech recognition software is currently the biggest
hurdle to overcome. The training period in which the computer gets to
know your voice and your way of speaking takes only five to ten
minutes. But getting comfortable with the software, adding vocabulary,
and learning how to write and edit aloud can take months. For a
deadline reporter, says Fulton, this is not an option. But if you can
find the time to master the techniques of the software, speech
recognition can be easier and more efficient than typing. It can even
improve your writing.
“The keyboard with the computer is a little bit of a crutch,” says
Fulton. “In daily journalism, a lot of times people don’t think before
they put something down on paper. They’re using the paper as part of
their thought process instead of letting that go forth in their head.”
She says speech recognition can train you to think before you write – a
lost art in journalism. Essentially, she says, this could eliminate the
need for a keyboard and mouse. “Writing isn’t with your fingers. It’s
really the brain that’s doing the writing.”
Speech recognition has changed Fulton’s writing style. She’s more
concise. She writes with simple phrases and shorter sentences “like you
were going to explain something to your mother,” she says. “People used
to be able to call their notes back into their office in publishable
form. We’re out of practice. We should be able to do this again.”
* * * * *
The other half of the newsroom of the future is voice interface – talk
to the computer and the computer talks back. Currently, these systems
are mostly used to bring the Internet and all its information to the
telephone. Companies like BeVocal and Tellme provide 800 numbers that
anyone can call to get driving directions, stock quotes, weather and
traffic updates, and restaurant, movie and music reviews. Callers can
catch up on soap operas, get their horoscope, check the lottery, or play
computerized blackjack while sitting in traffic. (Incidentally, the dealer
sounds like Sean Connery.) There are also computerized operators
that will direct your business calls, and automated secretaries that
will handle your e-mail, faxes, address book, and calendar.
Bill Byrne has studied linguistics for 12 years and is now working to
give computers the power of speech. Working with the speech and
language team of General Magic, a Sunnyvale-based company specializing
in voice interface systems, Byrne creates computerized personalities
that are then sold to businesses. Like a logo or brand, companies take
on a voice and a personality to match. “Imagine you’re a company that
sells stock information and you want to voice enable all that stuff. If
you have one voice, that voice will now be attributed to that brand.
The voice in effect is going to say, ‘I am Charles Schwab.’”
Hundreds of businesses are setting up automated operators that will
provide company information: airlines, banks, investment firms,
auction sites, even department stores. So far, General Magic has
created seven different personalities: there’s the down-to-business
money manager, popular with stock and investment firms; the surfer
dude who does music reviews; and the dominant friendly female, similar
to the voice in the newsroom of the future.
“She’s our most popular voice,” says Byrne. “She is pretty efficient,
but also very friendly, very polite, handles things nicely. Very
competent. She’s not straight to the point, very terse, and she’s not
flowery and dumb either. People love her. She does what you need her to
do and she’s also very nice.”
Clifford Nass is a communications professor at Stanford University and
has been studying how people interact with technology for 10 years. He
emphasizes the importance of a computerized voice matching the
personality of the task it is assigned. “Different contexts require
different voices,” says Nass. “Just as we expect librarians to sound
differently than carnival barkers, and we expect announcers to have a
different voice than a doctor or medical site. We expect the same thing
of technology. Different car companies have to select different voices
to match their brand. A BMW should sound different from a Rolls Royce
should sound different from a Chevy.”
Publishers may soon have many more choices to make. In addition to
editorial content and the look of the publication, they may have to
decide what the publication sounds and acts like. Is the Wall
Street Journal more friendly than the New York Times? Does the San
Francisco Bay Guardian have a sense of humor or is it angry and
combative? Is the Houston Chronicle a man or a woman? Does he or she
have a Texas drawl? Both inside the newsroom and in public,
newspapers and magazines will have a whole new way of defining
themselves.
“What it comes down to are the fundamental differences between people
and what it is about a voice that carries that,” says Nass. “Things
like speed of speaking, pitch range, how high and low you get in a
single conversation, how aroused or calm you seem, how deep. All these
things, to our brain, suggest information about how I should expect
this voice to behave, and in turn, how I will behave towards it.”
* * * * *
In the cab ride back to the office, 20 minutes before his deadline,
John puts in another call to the newsroom.
“It’s me,” he says.
“Hello, John. What can I do for you?”
“Ready to file. Slug – Election.” He begins to dictate his copy over
the phone from the back seat.
“What’s the word count?” he asks the server.
“Word count is 752 words.”
“Read it back to me, please.”
When the computer is done reading back the copy, John makes a few quick
edits and says, “OK. Send it to the editor’s desk.”
“Sure thing. Anything else you need?”
“No. Thank you. See you tomorrow.”
“Bye.”
John powers off his cell phone and sits back in the cab. Forty miles
away in the newsroom, his daily story makes its way through the server
to the editor’s office at the end of the hall. The editor is
pacing the room, talking into the air to her husband, when she’s
interrupted by the server.
“Sorry to bother you, but slug – election just came in. Would you like
to look at it?”
“Yes, thank you. I’ll be right there.”
She says goodbye to her husband and sits down in front of her monitor.
“OK, computer. Ready to edit election.” The story flickers up on the
monitor. She leans back and begins to speak her edits to the computer.
The changes appear, one by one, up on the screen until she’s ready to
send the story to press. She stretches her arms and clasps her hands
behind her head. Another page-one story.