Sunday, June 16, 2013

How the U.S. Uses Technology To Mine More Data More Quickly

See also: Phone Metadata

WASHINGTON — When American analysts hunting terrorists sought new ways to comb through the troves of phone records, e-mails and other data piling up as digital communications exploded over the past decade, they turned to Silicon Valley computer experts who had developed complex equations to thwart Russian mobsters intent on credit card fraud.

The partnership between the intelligence community and Palantir Technologies, a Palo Alto, Calif., company founded by a group of inventors from PayPal, is just one of many that the National Security Agency and other agencies have forged as they have rushed to unlock the secrets of “Big Data.”

Today, a revolution in software technology that allows for the highly automated and instantaneous analysis of enormous volumes of digital information has transformed the N.S.A., turning it into the virtual landlord of the digital assets of Americans and foreigners alike. The new technology has, for the first time, given America’s spies the ability to track the activities and movements of people almost anywhere in the world without actually watching them or listening to their conversations.

New disclosures that the N.S.A. has secretly acquired the phone records of millions of Americans and access to e-mails, videos and other data of foreigners from nine United States Internet companies have provided a rare glimpse into the growing reach of the nation’s largest spy agency.

With little public debate, the N.S.A. has been undergoing rapid expansion in order to exploit the mountains of new data being created each day. The government has poured billions of dollars into the agency over the last decade, building a one-million-square-foot fortress in the mountains of Utah, apparently to store huge volumes of personal data indefinitely. It created intercept stations across the country, according to former industry and intelligence officials, and helped build one of the world’s fastest computers to crack the codes that protect information.

Although the flow of data across the Internet once appeared too overwhelming for the N.S.A. to keep up with, the recent revelations suggest that the agency’s capabilities are now far greater than most outsiders believed. “Five years ago, I would have said they don’t have the capability to monitor a significant amount of Internet traffic,” said Herbert S. Lin, an expert in computer science and telecommunications at the National Research Council. Now, he said, it appears “that they are getting close to that goal.”

On Saturday, it became clear how close: Another N.S.A. document, again cited by The Guardian, showed a “global heat map” that appeared to represent how much data the N.S.A. sweeps up around the world. It showed that in March 2013 there were 97 billion pieces of data collected from networks worldwide; about 14 percent of it was in Iran, much was from Pakistan and about 3 percent came from inside the United States, though some of that might have been foreign data traffic routed through American-based servers.
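The percentages quoted from the heat map can be turned into rough absolute counts. The sketch below assumes each percentage applies to the full 97 billion pieces collected in March 2013; Pakistan is left out because the text gives no figure for it. The constant and function names are illustrative only.

```python
# Figures taken from the article: 97 billion pieces of data in March 2013,
# about 14 percent from Iran and about 3 percent from inside the United States.
TOTAL_PIECES = 97_000_000_000
SHARES = {"Iran": 0.14, "United States": 0.03}


def pieces_for(share: float, total: int = TOTAL_PIECES) -> float:
    """Convert a fractional share of the monthly total into an approximate count."""
    return share * total


for country, share in SHARES.items():
    # Assumes each percentage applies to the full worldwide total.
    print(f"{country}: ~{pieces_for(share) / 1e9:.1f} billion pieces")
```

Run as a script, this prints about 13.6 billion pieces for Iran and about 2.9 billion for the United States.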

A Shift in Focus
The agency’s ability to efficiently mine metadata, data about who is calling or e-mailing, has made wiretapping and eavesdropping on communications far less vital, according to data experts.
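One common way to mine call metadata is as a graph: each phone number is a node and each call an edge, so an analyst can list everyone within a few calls of a number of interest without touching any call content. The sketch below is a hypothetical illustration; the records, the hop limit and the function names are assumptions, not a description of N.S.A. software.

```python
from collections import defaultdict, deque

# Hypothetical call-detail records: (caller, callee). Only who contacted whom, no content.
CALL_RECORDS = [
    ("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("F", "G"),
]


def build_contact_graph(records):
    """Undirected graph: an edge means two numbers were in contact at least once."""
    graph = defaultdict(set)
    for caller, callee in records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def contacts_within(graph, seed, max_hops):
    """Breadth-first search: map each number reachable from seed to its hop count (<= max_hops)."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[seed]  # the seed itself is not one of its own contacts
    return hops
```

For the sample records, `contacts_within(build_contact_graph(CALL_RECORDS), "A", 2)` finds B and E at one hop and C at two hops; D is three calls away, and F and G are not connected to A at all.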

“American laws and American policy view the content of communications as the most private and the most valuable, but that is backwards today,” said Marc Rotenberg, the executive director of the Electronic Privacy Information Center, a Washington group. “The information associated with communications today is often more significant than the communications itself, and the people who do the data mining know that.”

United States laws restrict wiretapping and eavesdropping on the actual content of the communications of American citizens but offer very little protection to the digital data thrown off by the telephone when a call is made. And they offer virtually no protection to other forms of non-telephone-related data like credit card transactions.

When separate streams of data are integrated into large databases — matching, for example, time and location data from cellphones with credit card purchases or E-ZPass use — intelligence analysts are given a mosaic of a person’s life that would never be available from simply listening to their conversations. Just four data points about the location and time of a mobile phone call, a study published in Nature found, make it possible to identify the caller 95 percent of the time.
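The idea behind the 95 percent figure can be illustrated with a toy "unicity" check: pick a few (place, time) points from one person's location trace and count how many people in the dataset share all of them. The traces, tower IDs and random-sampling scheme below are made-up assumptions for illustration; this is not the study's data or code.

```python
import random

# Hypothetical location traces: person -> set of (cell_tower_id, hour_of_day) points.
TRACES = {
    "alice": {("T1", 8), ("T2", 9), ("T3", 18), ("T1", 22)},
    "bob": {("T1", 8), ("T2", 9), ("T4", 18), ("T5", 22)},
    "carol": {("T1", 8), ("T6", 9), ("T3", 18), ("T1", 22)},
}


def matching_people(traces, points):
    """Return everyone whose trace contains every one of the given points."""
    return [person for person, trace in traces.items() if points <= trace]


def unicity(traces, p, trials=1000, seed=0):
    """Estimate how often p random points from one person's trace match only that person."""
    rng = random.Random(seed)
    people = list(traces)
    unique = 0
    for _ in range(trials):
        person = rng.choice(people)
        # sorted() gives random.sample a sequence and keeps runs reproducible.
        points = set(rng.sample(sorted(traces[person]), p))
        if matching_people(traces, points) == [person]:
            unique += 1
    return unique / trials
```

With these three invented traces, `unicity(TRACES, 1)` is well below 1.0, because every one of alice's points is shared with bob or carol, while `unicity(TRACES, 4)` is exactly 1.0, since no two traces are identical.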

“We can find all sorts of correlations and patterns,” said one government computer scientist who spoke on condition of anonymity because he was not authorized to comment publicly. “There have been tremendous advances.”

Secret Programs
When President George W. Bush secretly began the N.S.A.’s warrantless wiretapping program in October 2001, to listen in on the international telephone calls and e-mails of American citizens without court approval, the program was accompanied by large-scale data mining operations.

Those secret programs prompted a showdown in March 2004 between Bush White House officials and a group of top Justice Department and F.B.I. officials in the hospital room of John Ashcroft, then the attorney general. Justice Department lawyers who were willing to go along with warrantless wiretapping argued that the data mining raised greater constitutional concerns.

The confrontation in Mr. Ashcroft’s hospital room took place just one month after a Harvard undergraduate, Mark Zuckerberg, created Facebook; Twitter would not be founded for two more years. Apple’s iPhone and iPad did not yet exist.

“More and more services like Google and Facebook have become huge central repositories for information,” observed Dan Auerbach, a technology analyst with the Electronic Frontier Foundation. “That’s created a pile of data that is an incredibly attractive target for law enforcement and intelligence agencies.”

The spy agencies have long been among the most demanding customers for advanced computing and data-mining software — and even more so in recent years, according to industry analysts. “They tell you that somewhere there is an American who is going to be blown up,” said a former technology executive, and “the only thing that stands between that and him living is you.”

Because of smartphones, tablets, social media sites, e-mail and other forms of digital communications, the world creates 2.5 quintillion bytes of new data daily, according to I.B.M.

The company estimates that 90 percent of the data that now exists in the world has been created in just the last two years. From now until 2020, the digital universe is expected to double every two years, according to a study by the International Data Corporation.
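The growth figures reduce to simple arithmetic. The sketch assumes "now" means 2013, when the article was written, and treats I.B.M.'s daily figure as constant over a year; both are simplifying assumptions.

```python
# I.B.M.'s estimate quoted in the article: 2.5 quintillion (2.5 * 10**18) bytes per day.
BYTES_PER_DAY = 2.5e18
ZETTABYTE = 1e21  # SI zettabyte, in bytes


def growth_factor(years: float, doubling_period: float = 2.0) -> float:
    """Size multiplier after `years` if the total doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


# Assumption: "now" in the I.D.C. projection means 2013, the year of the article.
factor_2020 = growth_factor(2020 - 2013)
print(f"2013 -> 2020 growth: about {factor_2020:.1f}x")

# Assumption: the daily rate stays flat for a full year (a simplification).
yearly_bytes = BYTES_PER_DAY * 365
print(f"One year at the 2013 daily rate: about {yearly_bytes / ZETTABYTE:.1f} zettabytes")
```

Run as a script, this prints a growth factor of about 11.3x (3.5 doublings in seven years) and roughly 0.9 zettabytes for one year of new data at the 2013 rate.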

Accompanying that explosive growth has been rapid progress in the ability to sift through the information.

I.B.M.’s Watson, the supercomputing technology that defeated human Jeopardy! champions in 2011, is a prime example of the power of data-intensive artificial intelligence.

Watson-style computing, analysts said, is precisely the technology that could instantly sift through the mass of Internet communications data, see patterns of suspicious online behavior and thus narrow the hunt for terrorists.

Both the N.S.A. and the Central Intelligence Agency have been testing Watson in the last two years, said a consultant who has advised the government and asked not to be identified because he was not authorized to speak.

Trilateration
Industry experts say that intelligence and law enforcement agencies also use a new technology, known as trilateration, that tracks an individual’s location moment to moment. The data, obtained from cellphone towers, can track the altitude of a person, down to the specific floor in a building. There is even software that exploits cellphone data to predict a person’s most likely route. “It is extreme Big Brother,” said Alex Fielding, an expert in networking and data centers.
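Trilateration itself is simple geometry: if the distances from a phone to three towers at known positions are known, subtracting the circle equations leaves two linear equations for the phone's position. The 2-D sketch below assumes exact distances and flat coordinates in meters; real network measurements are noisy and would call for a least-squares fit, and estimating altitude adds a third coordinate and generally a fourth reference point. The tower layout and the `trilaterate` function are hypothetical.

```python
from math import hypot


def trilaterate(towers, distances):
    """Estimate a 2-D position from three known tower positions and exact distances.

    Each tower i gives a circle (x - xi)^2 + (y - yi)^2 = ri^2. Subtracting the
    first circle from the other two cancels x^2 and y^2, leaving two linear
    equations that are solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = towers
    r1, r2, r3 = distances
    a1, b1 = 2 * (x1 - x2), 2 * (y1 - y2)
    c1 = r2**2 - r1**2 + x1**2 - x2**2 + y1**2 - y2**2
    a2, b2 = 2 * (x1 - x3), 2 * (y1 - y3)
    c2 = r3**2 - r1**2 + x1**2 - x3**2 + y1**2 - y3**2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        # Towers on a single line cannot tell which side of that line the phone is on.
        raise ValueError("towers are collinear; position is ambiguous")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


if __name__ == "__main__":
    # Hypothetical towers (meters) and a phone at (300, 400).
    towers = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
    phone = (300.0, 400.0)
    distances = [hypot(phone[0] - tx, phone[1] - ty) for tx, ty in towers]
    print(trilaterate(towers, distances))
```

Feeding in the three true distances recovers approximately (300.0, 400.0), up to floating-point rounding.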

Nothing revealed in recent days suggests that N.S.A. eavesdroppers have violated the law by targeting ordinary Americans. On Friday, President Obama defended the agency’s collection of phone records and other metadata, saying it did not involve listening to conversations or reading the content of e-mails.

But privacy advocates say that a national debate must take place to come up with new rules to limit the intelligence community’s access to the new mountains of data.

Mr. Rotenberg, referring to the constitutional limits on search and seizure, said, “It is a bit of a fantasy to think that the government can seize so much information without implicating the Fourth Amendment interests of American citizens.”

David E. Sanger and Scott Shane contributed reporting from Washington, Steve Lohr and James Glanz from New York, and Quentin Hardy from Berkeley, Calif.

Chinese translation by 林蒙克 and 張薇.
