Literary Detective Work on the Computer [Hardback]

(University of Wolverhampton)
  • Format: Hardback, 283 pages, height x width: 245x164 mm, weight: 680 g
  • Series: Natural Language Processing 12
  • Publication date: 08-May-2014
  • Publisher: John Benjamins Publishing Co
  • ISBN-10: 9027249997
  • ISBN-13: 9789027249999
  • Hardback
  • Price: 118,34 €*
  • * This is the final price, i.e., no additional discounts apply.
  • Delivery time is 3-4 weeks if the book is in stock at the publisher's warehouse. If the publisher needs to print a new run, delivery may be delayed.
  • Delivery time: 4-6 weeks
Computational linguistics can be used to uncover mysteries in text that are not always obvious on visual inspection. For example, computer analysis of writing style can show who might be the true author of a text in cases of disputed authorship or suspected plagiarism. The theoretical background to authorship attribution is presented in a step-by-step manner, and comprehensive reviews of the field are given in two specialist areas: the writings of William Shakespeare and his contemporaries, and the various writing styles seen in religious texts. The final chapter looks at the progress computers have made in the decipherment of lost languages. This book is written for students and researchers of general linguistics, computational and corpus linguistics, and computer forensics. It will inspire future researchers to study these topics for themselves, and gives sufficient detail on the methods and resources to get them started.
Preface
Chapter 1 Author identification
1 Introduction
2 Feature selection
2.1 Evaluation of feature sets for authorship attribution
3 Inter-textual distances
3.1 Manhattan distance and Euclidean distance
3.2 Labbe and Labbe's measure
3.3 Chi-squared distance
3.4 The cosine similarity measure
3.5 Kullback-Leibler Divergence (KLD)
3.6 Burrows' Delta
3.7 Evaluation of feature-based measures for inter-textual distance
3.8 Inter-textual distance by semantic similarity
3.9 Stemmatology as a measure of inter-textual distance
4 Clustering techniques
4.1 Introduction to factor analysis
4.2 Matrix algebra
4.3 Use of matrix algebra for PCA
4.4 PCA case studies
4.5 Correspondence analysis
5 Comparisons of classifiers
6 Other tasks related to authorship
6.1 Stylochronometry
6.2 Affect dictionaries and psychological profiling
6.3 Evaluation of author profiling
7 Conclusion
Chapter 2 Plagiarism and spam filtering
1 Introduction
2 Plagiarism detection software
2.1 Collusion and plagiarism, external and intrinsic
2.2 Preprocessing of corpora and feature extraction
2.3 Sequence comparison and exact match
2.4 Source-suspicious document similarity measures
2.5 Fingerprinting
2.6 Language models
2.7 Natural language processing
2.8 Intrinsic plagiarism detection
2.9 Plagiarism of program code
2.10 Distance between translated and original text
2.11 Direction of plagiarism
2.12 The search engine-based approach used at PAN-13
2.13 Case study 1: Hidden influences from printed sources in the Gaelic tales of Duncan and Neil MacDonald
2.14 Case study 2: General George Pickett and related writings
2.15 Evaluation methods
2.16 Conclusion
3 Spam filters
3.1 Content-based techniques
3.2 Building a labeled corpus for training
3.3 Exact matching techniques
3.4 Rule-based methods
3.5 Machine learning
3.6 Unsupervised machine learning approaches
3.7 Other spam-filtering problems
3.8 Evaluation of spam filters
3.9 Non-linguistic techniques
3.10 Conclusion
4 Recommendations for further reading
Chapter 3 Computer studies of Shakespearean authorship
1 Introduction
2 Shakespeare, Wilkins and "Pericles"
2.1 Correspondence analysis for "Pericles" and related texts
3 Shakespeare, Fletcher and "The Two Noble Kinsmen"
4 "King John"
5 "The Raigne of King Edward III"
5.1 Neural networks in stylometry
5.2 Cusum charts in stylometry
5.3 Burrows' Zeta and Iota
6 Hand D in "Sir Thomas More"
6.1 Elliott, Valenza and the Earl of Oxford
6.2 Elliott and Valenza: Hand D
6.3 Bayesian approach to questions of Shakespearian authorship
6.4 Bayesian analysis of Shakespeare's second person pronouns
6.5 Vocabulary differences, LDA and the authorship of Hand D
6.6 Hand D: Conclusions
7 The three parts of "Henry VI"
8 "Timon of Athens"
9 "The Puritan" and "A Yorkshire Tragedy"
10 "Arden of Faversham"
11 Estimation of the extent of Shakespeare's vocabulary and the authorship of the "Taylor" poem
12 The chronology of Shakespeare
13 Conclusion
Chapter 4 Stylometric analysis of religious texts
1 Introduction
1.1 Overview of the New Testament by correspondence analysis
1.2 Q
1.3 Luke and Acts
1.4 Recent approaches to New Testament stylometry
1.5 The Pauline Epistles
1.6 Hebrews
1.7 The Signs Gospel
2 Stylometric analysis of the Book of Mormon
3 Stylometric studies of the Qu'ran
4 Conclusion
Chapter 5 Computers and decipherment
1 Introduction
1.1 Differences between cryptography and decipherment
1.2 Cryptological techniques for automatic language recognition
1.3 Dictionary approaches to language recognition
1.4 Sinkov's test
1.5 Index of coincidence
1.6 The log-likelihood ratio
1.7 The chi-squared test statistic
1.8 Entropy of language
1.9 Zipf's Law and Heaps' Law coefficients
1.10 Modal token length
1.11 Autocorrelation analysis
1.12 Vowel identification
2 Rongorongo
2.1 History of Rongorongo
2.2 Characteristics of Rongorongo
2.3 Obstacles to decipherment
2.4 Encoding of Rongorongo symbols
2.5 The "Mamari" lunar calendar
2.6 Basic statistics of the Rongorongo corpus
2.7 Alignment of the Rongorongo corpus
2.8 A concordance for Rongorongo
2.9 Collocations and collostructions
2.10 Classification by genre
2.11 Vocabulary richness
2.12 Pozdniakov's approach to matching frequency curves
3 The Indus Valley texts
3.1 Why decipherment of the Indus texts is difficult
3.2 Are the Indus texts writing?
3.3 Other evidence for the Indus Script being writing
3.4 Determining the order of the Markov model
3.5 Missing symbols
3.6 Text segmentation and the log-likelihood measure
3.7 Network analysis of the Indus Signs
4 Linear A
5 The Phaistos disk
6 Iron Age Pictish symbols
7 Mayan glyphs
8 Conclusion
References
Index