Friday, July 21, 2017

A "Scientific" Study of the "Authorship Attribution" of the First 80 and Last 40 Chapters of Dream of the Red Chamber (紅樓夢)

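On the question of the "authorship attribution" of the first 80 and last 40 chapters of Dream of the Red Chamber, "scientific" approaches (statistics/probability, linguistics, botany (see the photo of the research conclusions I presented at the Taipei Botanical Garden)...) usually divide the whole book into 3 groups of 40 chapters each for study.
One currently fashionable method is the following, used by two authors at Tsinghua University in Beijing (their conclusion is that "differences" exist between the first 80 chapters and the last 40 chapters):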
Abstract:
Dream of the Red Chamber is regarded as one of the four great classical novels in Chinese literature, and the dispute over the authorship of the last 40 chapters has existed for years. In this paper, the novel is divided into three parts: the first 40 chapters, the middle 40 chapters, and the last 40 chapters. They are analyzed separately based on language models and text classification. To be specific, in this research, on the one hand, from the aspect of linguistic features, we analyze N-gram models, calculate Jaccard Index, acquire the differences of words, and describe the collocations with the grammatical information. On the other hand, from the aspect of machine learning, the method of Random Forest is used to extract characteristics in experiments for automatic classification, and the result shows that most of the first 40 chapters and the middle 40 chapters are not identified as the group of the last 40 chapters. With all these experiments above, finally we are able to reach the conclusion that differences between the last 40 chapters and the first 80 chapters do exist.
Published in: Cloud Computing and Intelligent Systems (CCIS), 2012 IEEE 2nd International Conference on
IEEEXPLORE.IEEE.ORG


For N-grams, see the English-language Wikipedia.
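To make the N-gram and Jaccard Index steps named in the abstract a little more concrete, here is a minimal sketch. It is not the paper's actual procedure: it assumes character-level n-grams (the authors may well have worked on segmented words), the function names `char_ngrams` and `jaccard` are my own, and the two input strings are toy placeholders rather than text from the novel.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams in text (whitespace removed)."""
    chars = "".join(text.split())
    return {chars[i:i + n] for i in range(len(chars) - n + 1)}


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy strings standing in for two groups of chapters (hypothetical, not real data).
part_a = "abcd"
part_b = "bcde"

grams_a = char_ngrams(part_a, 2)   # bigrams of the first toy string
grams_b = char_ngrams(part_b, 2)   # bigrams of the second toy string
similarity = jaccard(grams_a, grams_b)
print(similarity)
```

In a real comparison one would build the n-gram sets from each 40-chapter group and compare the three pairwise Jaccard values; a lower value for pairs involving the last 40 chapters would point in the same direction as the "differences" the paper reports, though it would not by itself settle authorship.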

The letter was featured in the movie Saving Private Ryan, and George W. Bush read it at a ceremony on the tenth anniversary of September 11.

N-gram tracing was used to reveal the author.
ATLASOBSCURA.COM
