Monday, November 23, 2009

n.Fluent (IBM)

Link by Link

A Translator Tool With a Human Touch


Published: November 22, 2009

HOW hard can it be, as the joke goes, to speak Chinese? (Six-year-olds do it all the time.)

Ari Fishkind for I.B.M.

David Lubensky, left, and Salim Roukos of I.B.M. are using many humans, namely the company's 400,000 workers, to improve digital translation.

Yes, it turns out that learning languages is one of those skills that humans, even relatively young ones, master seemingly magically. It is all enough to make a mainframe computer jealous.

At I.B.M., a team of nearly 100, including mathematicians and software developers, is working on a project to create an automatic translation tool, so-called machine translation, that has the speed and accuracy to be used in instant-messaging between speakers of two different languages.

The project, called n.Fluent, is intended to teach the computer terminology that is specific to I.B.M.’s businesses, and, more significantly, allow the computer to learn what it has been doing wrong. To that end, the company is extracting and organizing contributions from I.B.M.’s 400,000-member work force spread across more than 170 countries, adding a human touch to the project.

Over a two-week period last month, the company issued a “worldwide translation challenge” to its employees, using a points-based system to award the biggest contributors prizes that were converted to charitable donations. About 6,000 I.B.M. employees made improvements in 11 languages to more than two million words of text translated by n.Fluent.

So, when a machine translation from French produces, “MTTP is the time of 30 minutes and it is steadily declining since January 2006,” a human correction comes up with this improved English version: “The MTTP delay is 30 minutes and it has been steadily declining since January 2006.”

“From this parallel data, we update the models,” said Salim Roukos, an I.B.M. researcher in language-related technology at its T.J. Watson Laboratory in Yorktown Heights, N.Y., home of the n.Fluent project. “You want to learn the idiomatic expressions — when you say someone has kicked the bucket, you don’t want that translated word for word.”

So far, n.Fluent is used only by I.B.M. employees, but the intention is to create a product that can be sold to other businesses.

Efforts like this at I.B.M., as well as social networking tools behind the company’s firewalls, amount to a new twist on “crowdsourcing,” the term I.B.M. officials use to describe them. In addition to the n.Fluent project, I.B.M. has its own companywide version of Wikipedia (Bluepedia), with contributions from 1,300 employees.

Perhaps the most innovative social networking experiment at I.B.M., according to Irene Greif of the I.B.M. Center for Social Software in Cambridge, Mass., is Dogear, a tool similar to Delicious that allows employees to share and tag links on the Internet as well as on the I.B.M.-only intranet. The project itself was a bit of an experiment, and I.B.M. developers tweaked it further, she said.

The result is a system of tags and descriptions contributed by 10 percent of users, and it has become more popular than I.B.M.’s own internal search engine.

“A small crowd, a self-selected crowd can often be useful,” Ms. Greif said.

This highlights the differences between what is occurring at I.B.M. and other large companies and what traditionally constitutes crowdsourcing.

I.B.M. employees are not just any “crowd”; they have expertise and a loyalty to their employer that any old posse wrangled up on the Internet may not. In fact, crowdsourcing may be the wrong way of thinking of such internal corporate projects. Employee-sourcing?

Maybe that catch-all term “collaboration” is the best way to think of what social networking technology can bring to the workplace.

After all, collaboration is an old goal for employees and employers.

In the case of the n.Fluent project, programmers are not trying to have a computer master the “rules” of a language, but rather are looking for statistical patterns between two sets of translated texts and among the words themselves. For example, Mr. Roukos said, the text of a Canadian parliamentary debate in French and English can help programmers to “build statistical models based on the parallel corpus.”
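To make "statistical models based on the parallel corpus" concrete, here is a minimal sketch in Python. It uses the textbook IBM Model 1 word-translation estimate, trained by expectation-maximization. The article does not say which models n.Fluent uses, so this is an illustration of the general idea and not a description of the project. The three-sentence French-English corpus and the function names `train_model1` and `best_translation` are made up for the example.

```python
from collections import defaultdict


def train_model1(pairs, iterations=20):
    """Estimate t(e|f), the probability that source word f translates to
    target word e, from sentence pairs alone (simplified IBM Model 1,
    without the NULL word)."""
    src_vocab = {f for src, _ in pairs for f in src}
    tgt_vocab = {e for _, tgt in pairs for e in tgt}
    # Start with no knowledge: every target word is equally likely.
    t = {(e, f): 1.0 / len(tgt_vocab) for e in tgt_vocab for f in src_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e aligned to f
        total = defaultdict(float)  # expected count of f aligned to anything
        for src, tgt in pairs:
            for e in tgt:
                # Share one unit of credit for e among the source words
                # of the same sentence, in proportion to the current t.
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # Re-estimate t(e|f) from the expected counts.
        t = {
            (e, f): (count[(e, f)] / total[f] if total[f] else 0.0)
            for e in tgt_vocab
            for f in src_vocab
        }
    return t


def best_translation(t, f):
    """Return the target word with the highest t(e|f) for source word f."""
    candidates = {e: p for (e, g), p in t.items() if g == f}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus: (French tokens, English tokens).
corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("maison bleue".split(), "blue house".split()),
]

table = train_model1(corpus)
for word in ["la", "maison", "fleur", "bleue"]:
    print(word, "->", best_translation(table, word))
```

No dictionary or grammar rule is supplied. The pairings emerge only from which words keep appearing in sentences that are translations of each other, which is why additional corrected sentence pairs from human contributors can improve such a model.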

It is language’s fluidity and unpredictability that thus far make translation resist simple computer-based solutions. Which means that for the foreseeable future, translation experts will also need to become experts on collaboration.

“One of the reasons we’ve got senior-level executives behind this is that it is kind of a Harvard Business School case study of how the crowds inside the company help you develop a better product,” said David Lubensky, another researcher on the n.Fluent project. “We should be able to replicate this over various domains.”

For example, initially, all rewards to contributors were in the form of donations to one of seven worldwide charities. Over time, the team heard that some contributors “would personally want some trinket,” he said. And now small gifts are awarded as well.

Something any 6-year-old could have told you.
