
Analyzing Textual Information: From Words to Meanings through Numbers [Paperback]

  • Format: Paperback / softback, 192 pages, height x width: 215x139 mm, weight: 240 g
  • Series: Quantitative Applications in the Social Sciences
  • Publication date: 13-Jul-2021
  • Publisher: SAGE Publications Inc
  • ISBN-10: 1544390009
  • ISBN-13: 9781544390000
  • Paperback
  • Price: €50.80
  • Delivery takes 3-4 weeks if the book is in stock at the publisher's warehouse. If the publisher needs to print a new run, delivery may take longer.

Researchers in the social sciences and beyond increasingly face massive quantities of text data requiring analysis, from historical letters to the constant stream of content on social media. Traditional texts on statistical analysis focus on numbers; this book provides a practical introduction to the quantitative analysis of textual data. Using up-to-date R methods, it takes readers through the text analysis process, from text mining and pre-processing the text to final analysis. It includes two major case studies, one using historical and one using more contemporary text data, to demonstrate the practical applications of these methods. Until now, there has been no introductory how-to book on textual data analysis with R that is up-to-date and applicable across the social sciences. Code and a variety of additional resources are available on the book's accompanying website.
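
To give a flavor of the pipeline the blurb describes (mine, pre-process, analyze), here is a minimal sketch in R, assuming the dplyr and tidytext packages (tidytext is named in the contents below). The sample "documents" are invented for illustration; this is not code from the book or its website.

    # Minimal text-analysis pipeline: tokenize, remove stop words, count
    # word frequencies, and build a document-term matrix.
    library(dplyr)
    library(tidytext)

    # Two invented "documents" standing in for real corpus texts
    docs <- tibble(
      doc  = c("letter1", "letter2"),
      text = c("The territory needs a new government and new roads.",
               "Congress debated the amendment at great length.")
    )

    word_counts <- docs %>%
      unnest_tokens(word, text) %>%            # tokenize: one lowercased word per row
      anti_join(stop_words, by = "word") %>%   # drop common English stop words
      count(doc, word, sort = TRUE)            # word frequencies per document

    # Cast the counts into a document-term matrix (requires the tm package)
    dtm <- cast_dtm(word_counts, doc, word, n)

A matrix like dtm is the starting point for the word-frequency displays, sentiment scores, clusterings, and topic models listed in the contents.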

Reviews

The authors balance sophisticated analysis in R with the fundamentals of text mining so that all readers can understand the methods and apply them to their own analysis of text data. -- Matthew Eshbaugh-Soha

If you have a little experience with R, Ledolter and Vandervelde have created an accessible book for learning to analyze text. They provide a scaffolded experience with concrete examples and access to the text and code. They also provide technical information for those interested in a deeper dive into the material. Readers will feel comfortable analyzing their own text as they use the provided material and progress through the book. I will be adding this book to my applied practicum course. -- James B. Schreiber

Contents

Series Editor's Introduction
Preface
Acknowledgments
About the Authors
Chapter 1 Introduction
1.1 Text Data
1.1.1 Introducing the Definitions
1.1.2 Types of Text Data
1.1.3 File Formats to Save and Store Text Information
1.2 The Two Applications Considered in This Book
1.3 Introductory Example and Its Analysis Using the R Statistical Software
1.4 The Introductory Example Revisited, Illustrating Concordance and Collocation Using Alternative Software
1.5 Concluding Remarks
1.6 References
Chapter 2 A Description of the Studied Text Corpora and a Discussion of Our Modeling Strategy
2.1 Introduction to the Corpora: Selecting the Texts
2.2 Debates of the 39th U.S. Congress, as Recorded in the Congressional Globe
2.3 The Territorial Papers of the United States
2.4 Analyzing Text Data: Bottom-Up or Top-Down Analysis
2.5 References
Appendix to Chapter 2: The Complete Congressional Record
Chapter 3 Preparing Text for Analysis: Text Cleaning and Formatting
3.1 Text Cleaning
3.1.1 Compacting Multiple Word Sets Into a Single Word
3.2 Text Formatting
3.2.1 Formatting by Marking Versus Formatting by Deleting
3.2.2 Formatting Beyond Metavariables: Telling the Computer What Sections to Skip When Running the Analysis
3.3 Concluding Remarks
3.4 References
Chapter 4 Word Distributions: Document-Term Matrices of Word Frequencies and the "Bag of Words" Representation
4.1 Document-Term Matrices of Frequencies
4.1.1 Creating the Document-Term Matrix in R
4.1.2 Dropping Sparse Words That Do Not Occur in Many Documents
4.2 Displaying Word Frequencies
4.3 Co-Occurrence of Terms in the Same Document
4.4 The Zipf Law: An Interesting Fact About the Distribution of Word Frequencies
4.5 References
Chapter 5 Metavariables and Text Analysis Stratified on Metavariables
5.1 The Significance of Stratification and the Importance of Metavariables
5.2 Analysis of the Territorial Papers
5.2.1 Territorial Papers: Visualization of the Metavariables
5.2.2 Territorial Papers: Stratified Text Analysis
5.3 Analysis of Speeches From the 39th Congress
5.3.1 Speeches From the 39th Congress: Visualization of the Metavariables
5.3.2 Speeches From the 39th Congress: Stratified Text Analysis
5.4 References
Chapter 6 Sentiment Analysis
6.1 Lexicons of Sentiment-Charged Words
6.1.1 Attaching Sentiment to a Document
6.1.2 Sentiment Analysis for the Corpus and Its Documents
6.1.3 Importance of Sentiment Analysis
6.2 Applying Sentiment Analysis to the Letters of the Territorial Papers
6.3 Using Other Sentiment Dictionaries and the R Software tidytext for Sentiment Analysis
6.4 Concluding Remarks: An Alternative Approach for Sentiment Analysis
6.5 References
Chapter 7 Clustering of Documents
7.1 Clustering Documents
7.2 Measures for the Closeness and the Distance of Documents
7.3 Methods for Clustering Documents
7.3.1 Hierarchical Agglomerative Clustering and Dendrograms
7.3.2 k-Means Clustering
7.3.3 Additional Remarks
7.4 Illustrating Clustering Methods on a Simulated Example
7.5 References
Chapter 8 Classification of Documents
8.1 Introduction
8.2 Classification Procedures
8.2.1 The k-Nearest Neighbor Algorithm
8.2.2 Naive Bayesian Analysis
8.2.3 Fisher Linear Discriminant Method and Linear Scoring (SVM) Methods
8.2.4 Evaluating Classification Rules on Hold-Out Samples
8.3 Two Examples Using the Congressional Speech Database
8.4 Concluding Remarks on Authorship Attribution: Commenting on the Field of Stylometry
8.5 References
Chapter 9 Modeling Text Data: Topic Models
9.1 Topic Models
9.1.1 Some More Technical Details and a Brief Primer on Dirichlet Distributions
9.1.2 Model Extensions and Useful Software, With a Tip of the Hat to Their Developers
9.1.3 Further Comments
9.2 Fitting Topic Models to the Two Corpora Studied in This Book
9.2.1 Topic Models for the Corpus of the Territorial Papers
9.2.2 Topic Models for the Corpus of Speeches From the 39th U.S. Congress
9.3 References
Chapter 10 n-Grams and Other Ways of Analyzing Adjacent Words
10.1 Analysis of Bigrams
10.2 Text Windows to Measure Word Associations Within a Neighborhood of Words and a Discussion of the R Package text2vec
10.3 Illustrating the Use of n-Grams: Speeches of the 39th Congress
Chapter 11 Concluding Remarks
Appendix: Listing of Website Resources
Index
JOHANNES LEDOLTER holds professorships in both the Business School, where he is the Robert Thomas Holmes Professor of Business Analytics, and the Department of Statistics and Actuarial Science at the University of Iowa. He is a Fellow of the American Statistical Association and the American Society for Quality, and an Elected Member of the International Statistical Institute. He is the author of several books, including Statistical Methods for Forecasting, Introduction to Regression Modeling, Testing 1-2-3: Experimental Design with Applications in Marketing and Service Operations, and Data Mining and Business Analytics with R. He was Professor of Statistics at the Vienna University of Economics and Business from 1997 to 2015 and held visiting professorships at Princeton, Yale, Stanford, and the University of Chicago. Since 2011, he has been an Associate Investigator at the Center for Prevention and Treatment of Vision Loss at the Iowa City VA Health Care System, which studies optic nerve and retinal disorders in relation to traumatic brain injury. Professor Ledolter enjoys working on multidisciplinary projects that involve both numeric and text information.

LEA VANDERVELDE is the Josephine Witte Professor of Law at the University of Iowa. She is an award-winning author in the fields of law and legal history, and the author of several casebooks, dozens of articles in the nation's leading law journals, and two historical works, Mrs. Dred Scott and Redemption Songs: Suing for Freedom before Dred Scott. She has been a Guggenheim Fellow for Constitutional Studies and the May Brodbeck Humanities Fellow, and has held visiting professorships at Yale, the University of Pennsylvania, and the American Bar Foundation. She is director of the RAOS (Reconstruction Amendment Optical Scanning) project and principal investigator of the Law of the Frontier project at Stanford's CESTA. She has given professional lectures all over the world.