Wednesday, May 1, 2013

Google Deep Learning / IBM's 2012 Tech Predictions


10 Breakthrough Technologies 2013

Deep Learning

With massive amounts of computational power, machines can now recognize objects and translate speech in real time. Artificial intelligence is finally getting smart.

It quickly became obvious that such an effort would require nothing less than Google-scale data and computing power. “I could try to give you some access to it,” Page told Kurzweil. “But it’s going to be very difficult to do that for an independent company.” So Page suggested that Kurzweil, who had never held a job anywhere but his own companies, join Google instead. It didn’t take Kurzweil long to make up his mind: in January he started working for Google as a director of engineering. “This is the culmination of literally 50 years of my focus on artificial intelligence,” he says.
Kurzweil was attracted not just by Google’s computing resources but also by the startling progress the company has made in a branch of AI called deep learning. Deep-learning software attempts to mimic the activity in layers of neurons in the neocortex, the wrinkly 80 percent of the brain where thinking occurs. The software learns, in a very real sense, to recognize patterns in digital representations of sounds, images, and other data.
The basic idea—that software can simulate the neocortex’s large array of neurons in an artificial “neural network”—is decades old, and it has led to as many disappointments as breakthroughs. But because of improvements in mathematical formulas and increasingly powerful computers, computer scientists can now model many more layers of virtual neurons than ever before.
With this greater depth, they are producing remarkable advances in speech and image recognition. Last June, a Google deep-learning system that had been shown 10 million images from YouTube videos proved almost twice as good as any previous image recognition effort at identifying objects such as cats. Google also used the technology to cut the error rate on speech recognition in its latest Android mobile software. In October, Microsoft chief research officer Rick Rashid wowed attendees at a lecture in China with a demonstration of speech software that transcribed his spoken words into English text with an error rate of 7 percent, translated them into Chinese-language text, and then simulated his own voice uttering them in Mandarin. That same month, a team of three graduate students and two professors won a contest held by Merck to identify molecules that could lead to new drugs. The group used deep learning to zero in on the molecules most likely to bind to their targets.
Google in particular has become a magnet for deep learning and related AI talent. In March the company bought a startup cofounded by Geoffrey Hinton, a University of Toronto computer science professor who was part of the team that won the Merck contest. Hinton, who will split his time between the university and Google, says he plans to “take ideas out of this field and apply them to real problems” such as image recognition, search, and natural-language understanding.
All this has normally cautious AI researchers hopeful that intelligent machines may finally escape the pages of science fiction. Indeed, machine intelligence is starting to transform everything from communications and computing to medicine, manufacturing, and transportation. The possibilities are apparent in IBM’s Jeopardy!-winning Watson computer, which uses some deep-learning techniques and is now being trained to help doctors make better decisions. Microsoft has deployed deep learning in its Windows Phone and Bing voice search.
Extending deep learning into applications beyond speech and image recognition will require more conceptual and software breakthroughs, not to mention many more advances in processing power. And we probably won’t see machines we all agree can think for themselves for years, perhaps decades—if ever. But for now, says Peter Lee, head of Microsoft Research USA, “deep learning has reignited some of the grand challenges in artificial intelligence.”
Building a Brain
There have been many competing approaches to those challenges. One has been to feed computers with information and rules about the world, which required programmers to laboriously write software that is familiar with the attributes of, say, an edge or a sound. That took lots of time and still left the systems unable to deal with ambiguous data; they were limited to narrow, controlled applications such as phone menu systems that ask you to make queries by saying specific words.
Neural networks, developed in the 1950s not long after the dawn of AI research, looked promising because they attempted to simulate the way the brain worked, though in greatly simplified form. A program maps out a set of virtual neurons and then assigns random numerical values, or “weights,” to connections between them. These weights determine how each simulated neuron responds—with a mathematical output between 0 and 1—to a digitized feature such as an edge or a shade of blue in an image, or a particular energy level at one frequency in a phoneme, the individual unit of sound in spoken syllables.
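To make that description concrete, here is a minimal Python sketch of one such "virtual neuron" (a toy illustration of the idea, not code from any of the systems discussed): random weights on its input connections, a weighted sum, and a sigmoid that squashes the result into an output between 0 and 1. The four-element feature vector is an invented stand-in for digitized pixels or audio samples.

```python
# A toy "virtual neuron": random connection weights, a weighted sum of inputs,
# and a sigmoid squashing the result into an output between 0 and 1.
import math
import random

def make_neuron(n_inputs):
    """Assign random numerical weights to the connections feeding this neuron."""
    return [random.uniform(-1.0, 1.0) for _ in range(n_inputs)]

def activate(weights, inputs):
    """Weighted sum of the inputs, squashed into the range (0, 1) by a sigmoid."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-total))

# An invented digitized feature vector, standing in for pixel intensities along an edge.
feature = [0.0, 0.9, 0.8, 0.1]
neuron = make_neuron(len(feature))
print(activate(neuron, feature))  # a value between 0 and 1
```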
Some of today’s artificial neural networks can train themselves to recognize complex patterns.
Programmers would train a neural network to detect an object or phoneme by blitzing the network with digitized versions of images containing those objects or sound waves containing those phonemes. If the network didn’t accurately recognize a particular pattern, an algorithm would adjust the weights. The eventual goal of this training was to get the network to consistently recognize the patterns in speech or sets of images that we humans know as, say, the phoneme “d” or the image of a dog. This is much the same way a child learns what a dog is by noticing the details of head shape, behavior, and the like in furry, barking animals that other people call dogs.
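A rough, self-contained sketch of that training loop follows; it is a toy illustration rather than anything resembling a production recognizer. The network is shown labeled examples, and whenever its output misses the label, an update rule nudges the weights. The three-element "patterns" and their labels are made up for the example.

```python
# Toy training loop: show the unit labeled examples and adjust the weights
# whenever its output misses the label.
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

# Invented training set: feature vector -> 1 if it is "the pattern", 0 otherwise.
examples = [
    ([0.9, 0.8, 0.1], 1),
    ([0.8, 0.9, 0.2], 1),
    ([0.1, 0.2, 0.9], 0),
    ([0.2, 0.1, 0.8], 0),
]

weights = [random.uniform(-0.5, 0.5) for _ in range(3)]
learning_rate = 0.5

for epoch in range(1000):
    for inputs, label in examples:
        output = predict(weights, inputs)
        error = label - output
        # Gradient step for a sigmoid unit with squared error:
        # each weight moves in proportion to its input and the error.
        grad = error * output * (1.0 - output)
        weights = [w + learning_rate * grad * x for w, x in zip(weights, inputs)]

for inputs, label in examples:
    print(label, round(predict(weights, inputs), 2))
```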
But early neural networks could simulate only a very limited number of neurons at once, so they could not recognize patterns of great complexity. They languished through the 1970s.

In the mid-1980s, Hinton and others helped spark a revival of interest in neural networks with so-called “deep” models that made better use of many layers of software neurons. But the technique still required heavy human involvement: programmers had to label data before feeding it to the network. And complex speech or image recognition required more computer power than was then available.
Finally, however, in the last decade Hinton and other researchers made some fundamental conceptual breakthroughs. In 2006, Hinton developed a more efficient way to teach individual layers of neurons. The first layer learns primitive features, like an edge in an image or the tiniest unit of speech sound. It does this by finding combinations of digitized pixels or sound waves that occur more often than they should by chance. Once that layer accurately recognizes those features, they’re fed to the next layer, which trains itself to recognize more complex features, like a corner or a combination of speech sounds. The process is repeated in successive layers until the system can reliably recognize phonemes or objects.
Like cats. Last June, Google demonstrated one of the largest neural networks yet, with more than a billion connections. A team led by Stanford computer science professor Andrew Ng and Google Fellow Jeff Dean showed the system images from 10 million randomly selected YouTube videos. One simulated neuron in the software model fixated on images of cats. Others focused on human faces, yellow flowers, and other objects. And thanks to the power of deep learning, the system identified these discrete objects even though no humans had ever defined or labeled them.
What stunned some AI experts, though, was the magnitude of improvement in image recognition. The system correctly categorized objects and themes in the YouTube images 16 percent of the time. That might not sound impressive, but it was 70 percent better than previous methods. And, Dean notes, there were 22,000 categories to choose from; correctly slotting objects into some of them required, for example, distinguishing between two similar varieties of skate fish. That would have been challenging even for most humans. When the system was asked to sort the images into 1,000 more general categories, the accuracy rate jumped above 50 percent.
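The layer-by-layer training Hinton introduced in 2006, described a couple of paragraphs above, used stacks of restricted Boltzmann machines; the toy Python sketch below illustrates the same greedy, layer-by-layer idea with simple autoencoders instead, each layer learning to reproduce its input before its outputs are handed to the next layer. The data is random noise standing in for digitized pixels or sound waves, and all names and sizes are invented for the example.

```python
# Greedy layer-wise pretraining, sketched with tiny autoencoders: train one
# layer to reconstruct its input, then feed its outputs to the next layer.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_layer(data, n_hidden, epochs=300, lr=0.5):
    """Train one layer as a small autoencoder and return its encoder weights."""
    n_samples, n_visible = data.shape
    W_enc = rng.normal(0.0, 0.1, (n_visible, n_hidden))
    W_dec = rng.normal(0.0, 0.1, (n_hidden, n_visible))
    b_enc = np.zeros(n_hidden)
    b_dec = np.zeros(n_visible)
    for _ in range(epochs):
        hidden = sigmoid(data @ W_enc + b_enc)
        recon = sigmoid(hidden @ W_dec + b_dec)
        # Backpropagate the reconstruction error through this single layer only.
        d_recon = (recon - data) * recon * (1 - recon) / n_samples
        d_hidden = (d_recon @ W_dec.T) * hidden * (1 - hidden)
        W_dec -= lr * hidden.T @ d_recon
        b_dec -= lr * d_recon.sum(axis=0)
        W_enc -= lr * data.T @ d_hidden
        b_enc -= lr * d_hidden.sum(axis=0)
    return W_enc, b_enc

# Random vectors stand in for digitized images or audio.
layer_input = rng.random((200, 64))
stack = []
for n_hidden in (32, 16):                          # train the layers one at a time
    W, b = train_layer(layer_input, n_hidden)
    stack.append((W, b))
    layer_input = sigmoid(layer_input @ W + b)     # feed features to the next layer

print([w.shape for w, _ in stack])                 # [(64, 32), (32, 16)]
```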
Big Data
Training the many layers of virtual neurons in the experiment took 16,000 computer processors—the kind of computing infrastructure that Google has developed for its search engine and other services. At least 80 percent of the recent advances in AI can be attributed to the availability of more computer power, reckons Dileep George, cofounder of the machine-learning startup Vicarious.
There’s more to it than the sheer size of Google’s data centers, though. Deep learning has also benefited from the company’s method of splitting computing tasks among many machines so they can be done much more quickly. That’s a technology Dean helped develop earlier in his 14-year career at Google. It vastly speeds up the training of deep-learning neural networks as well, enabling Google to run larger networks and feed a lot more data to them.
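As a rough illustration of that task-splitting idea (not Google's actual infrastructure, whose internals the article does not describe), the sketch below simulates synchronous data-parallel training on a toy linear model: each "worker" computes a gradient on its own shard of the data, and the shared parameters are updated with the averaged gradient. The workers are simulated in an ordinary loop rather than running on separate machines.

```python
# Data-parallel training in miniature: split the data into shards, have each
# "worker" compute a gradient on its shard, then apply the averaged gradient.
import numpy as np

rng = np.random.default_rng(1)

# Toy linear-regression problem so the gradient math stays obvious.
true_w = np.array([2.0, -3.0, 0.5])
X = rng.normal(size=(1200, 3))
y = X @ true_w + rng.normal(scale=0.1, size=1200)

def shard_gradient(w, X_shard, y_shard):
    """Mean-squared-error gradient computed on one worker's shard of data."""
    residual = X_shard @ w - y_shard
    return 2.0 * X_shard.T @ residual / len(y_shard)

n_workers = 4
X_shards = np.array_split(X, n_workers)
y_shards = np.array_split(y, n_workers)

w = np.zeros(3)
lr = 0.05
for step in range(200):
    grads = [shard_gradient(w, xs, ys) for xs, ys in zip(X_shards, y_shards)]
    w -= lr * np.mean(grads, axis=0)   # one synchronous update from all workers

print(w)  # approaches true_w
```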
Already, deep learning has improved voice search on smartphones. Until last year, Google’s Android software used a method that misunderstood many words. But in preparation for a new release of Android last July, Dean and his team helped replace part of the speech system with one based on deep learning. Because the multiple layers of neurons allow for more precise training on the many variants of a sound, the system can recognize scraps of sound more reliably, especially in noisy environments such as subway platforms. Since it’s likelier to understand what was actually uttered, the result it returns is likelier to be accurate as well. Almost overnight, the number of errors fell by up to 25 percent—results so good that many reviewers now deem Android’s voice search smarter than Apple’s more famous Siri voice assistant.
For all the advances, not everyone thinks deep learning can move artificial intelligence toward something rivaling human intelligence. Some critics say deep learning and AI in general ignore too much of the brain’s biology in favor of brute-force computing.
One such critic is Jeff Hawkins, founder of Palm Computing, whose latest venture, Numenta, is developing a machine-learning system that is biologically inspired but does not use deep learning. Numenta’s system can help predict energy consumption patterns and the likelihood that a machine such as a windmill is about to fail. Hawkins, author of On Intelligence, a 2004 book on how the brain works and how it might provide a guide to building intelligent machines, says deep learning fails to account for the concept of time. Brains process streams of sensory data, he says, and human learning depends on our ability to recall sequences of patterns: when you watch a video of a cat doing something funny, it’s the motion that matters, not a series of still images like those Google used in its experiment. “Google’s attitude is: lots of data makes up for everything,” Hawkins says.
But if it doesn’t make up for everything, the computing resources a company like Google throws at these problems can’t be dismissed. They’re crucial, say deep-learning advocates, because the brain itself is still so much more complex than any of today’s neural networks. “You need lots of computational resources to make the ideas work at all,” says Hinton.
What’s Next
Although Google is less than forthcoming about future applications, the prospects are intriguing. Clearly, better image search would help YouTube, for instance. And Dean says deep-learning models can use phoneme data from English to more quickly train systems to recognize the spoken sounds in other languages. It’s also likely that more sophisticated image recognition could make Google’s self-driving cars much better. Then there’s search and the ads that underwrite it. Both could see vast improvements from any technology that’s better and faster at recognizing what people are really looking for—maybe even before they realize it.
This is what intrigues Kurzweil, 65, who has long had a vision of intelligent machines. In high school, he wrote software that enabled a computer to create original music in various classical styles, which he demonstrated in a 1965 appearance on the TV show I’ve Got a Secret. Since then, his inventions have included several firsts—a print-to-speech reading machine, software that could scan and digitize printed text in any font, music synthesizers that could re-create the sound of orchestral instruments, and a speech recognition system with a large vocabulary.
Today, he envisions a “cybernetic friend” that listens in on your phone conversations, reads your e-mail, and tracks your every move—if you let it, of course—so it can tell you things you want to know even before you ask. This isn’t his immediate goal at Google, but it matches that of Google cofounder Sergey Brin, who said in the company’s early days that he wanted to build the equivalent of the sentient computer HAL in 2001: A Space Odyssey—except one that wouldn’t kill people.
For now, Kurzweil aims to help computers understand and even speak in natural language. “My mandate is to give computers enough understanding of natural language to do useful things—do a better job of search, do a better job of answering questions,” he says. Essentially, he hopes to create a more flexible version of IBM’s Watson, which he admires for its ability to understand Jeopardy! queries as quirky as “a long, tiresome speech delivered by a frothy pie topping.” (Watson’s correct answer: “What is a meringue harangue?”)
Kurzweil isn’t focused solely on deep learning, though he says his approach to speech recognition is based on similar theories about how the brain works. He wants to model the actual meaning of words, phrases, and sentences, including ambiguities that usually trip up computers. “I have an idea in mind of a graphical way to represent the semantic meaning of language,” he says.
That in turn will require a more comprehensive way to graph the syntax of sentences. Google is already using this kind of analysis to improve grammar in translations. Natural-language understanding will also require computers to grasp what we humans think of as common-sense meaning. For that, Kurzweil will tap into the Knowledge Graph, Google’s catalogue of some 700 million topics, locations, people, and more, plus billions of relationships among them. It was introduced last year as a way to provide searchers with answers to their queries, not just links.
Finally, Kurzweil plans to apply deep-learning algorithms to help computers deal with the “soft boundaries and ambiguities in language.” If all that sounds daunting, it is. “Natural-language understanding is not a goal that is finished at some point, any more than search,” he says. “That’s not a project I think I’ll ever finish.”
Though Kurzweil’s vision is still years from reality, deep learning is likely to spur other applications beyond speech and image recognition in the nearer term. For one, there’s drug discovery. The surprise victory by Hinton’s group in the Merck contest clearly showed the utility of deep learning in a field where few had expected it to make an impact.
That’s not all. Microsoft’s Peter Lee says there’s promising early research on potential uses of deep learning in machine vision—technologies that use imaging for applications such as industrial inspection and robot guidance. He also envisions personal sensors that deep neural networks could use to predict medical problems. And sensors throughout a city might feed deep-learning systems that could, for instance, predict where traffic jams might occur.
In a field that attempts something as profound as modeling the human brain, it’s inevitable that one technique won’t solve all the challenges. But for now, this one is leading the way in artificial intelligence. “Deep learning,” says Dean, “is a really powerful metaphor for learning about the world.”

Looking back
2012 Tech Predictions: Will Passwords Become History?

Voice of America Chinese, 2012/01/14
[VOA Chinese / reporter Elmasry, Washington]

Rapidly advancing technology keeps narrowing the gap between science fiction and scientific fact, and it is fundamentally changing our lives. According to IBM researchers' predictions, within five years we will no longer need passwords, we will no longer be bothered by junk e-mail, and we will be able to control many machines with our minds.

Personal micro-generation, and the end of pesky passwords

As the American technology company IBM enters its second century, it recently released the sixth edition of its "five technologies in the next five years" forecast, a list of the five technological innovations IBM predicts we will see within the next five years. One of them would let us generate small amounts of energy to help power our homes.

Meyerson, deputy head of IBM's innovation division, says: "You can do micro-scale power generation. For example, it is very common for someone in a third-world country to have a cell phone or smartphone but no electricity supply. Put a device in his shoe that generates power as he walks, and his phone can charge at any time, effectively connecting that person to the entire world."

Another innovation will make hard-to-remember passwords disappear. Before long, we will be able to access our e-mail or bank accounts using a technology known as biometrics.

Meyerson says: "Imagine things remembering you. You walk up to an ATM, the machine takes one look at you and says, hey, I know who you are."

The end of junk e-mail, and machines controlled by the mind

Within five years, we will no longer be drowning in junk e-mail. Meyerson says a new electronic device will delete spam before we ever see it.

Meyerson says: "When you delete a message, you don't read it; you glance at it and delete it. After a while, the device learns your habits and, like an assistant, deletes the messages you don't want to see."

IBM also predicts that we will be able to control many electronic devices with our minds. Meyerson says: "Directing a system to do something without actually doing or saying anything, making things happen precisely just by thinking, is actually a simple capability. This profound ability would let people who are paralyzed or quadriplegic, for example, use their brain waves to make things happen and manage their lives independently."

Closing the digital divide

Meyerson says the fifth innovation on IBM's list is the elimination of the so-called "digital divide."

Meyerson says: "Think about today's digital divide. The world is split between people who have digital products and people who don't, people who can connect to the world and people who can't. We expect that within five years more than 80 percent of the world's population will have cell phones or smartphones. Then imagine being able to talk with anyone, anywhere, in any language at any time. That kind of real-time translation is the idea of the universal translator in Star Trek!"

IBM's predictions over the past five years have had mixed results, and some have yet to come true. In 2006, IBM researchers predicted that by now there would be a 3D Internet. In 2009, they predicted that city buildings would sense and respond like living organisms; three years later, that future has indeed arrived. At an art museum in New York, sensors detect subtle changes in temperature, humidity, and light, and automatically adjust the building's environment to preserve the artwork.

IBM's Meyerson says the real value of the "five technologies in the next five years" list is that it encourages the researchers working to turn our imagined innovations into reality.

[2012/01/13, Voice of America Chinese]
