E-book: From Social Science to Data Science: Key Data Collection and Analysis Skills in Python

4.50/5 (2 ratings by Goodreads)

Bernie Hogan

Format: 400 pages
Publication date: 23-Nov-2022
Publisher: Sage Publications Ltd
Language: English
ISBN-13: 9781529736281

Other books on this subject:

Research methods: general

Format: PDF+DRM
Price: 46,38 €*
* This is the final price, i.e., no additional discounts apply.
This e-book is intended for personal use only. E-books cannot be returned, and payments for purchased e-books are not refunded.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital Rights Management (DRM)
The publisher has supplied this book in encrypted form, which means that you need to install free software to unlock and read it. To read this e-book, you must create an Adobe ID. The e-book can be read and downloaded on up to 6 devices (by a single user with the same Adobe ID).

Required software
To read this e-book on a mobile device (phone or tablet), you will need to install this free app: PocketBook Reader (iOS / Android)

To download and read this e-book on a PC or Mac, you need Adobe Digital Editions (a free app designed specifically for e-books; it is not the same as Adobe Reader, which you may already have on your computer).

You cannot read this e-book on an Amazon Kindle.

Built around the entire research process with a main focus on ethics, this book equips you to scale up your skills and successfully conduct a computational social science research project with Python.

From Social Science to Data Science is a fundamental guide to scaling up and advancing your programming skills in Python. From beginning to end, this book will enable you to understand merging, accessing, cleaning and interpreting data whilst gaining a deeper understanding of computational techniques and seeing the bigger picture.

With key features such as tables, figures, step-by-step instructions and explanations that give a wider context, Hogan presents a clear and concise account of key data collection and analysis skills in Python.

Reviews

Excellent. The students will love it, I think. It reminds me a bit of Andy Field's SPSS/R books, which the students have also loved in the past. This one has that flavour but also pushes the analytics into the contemporary era with Python. I expect it will be a real success. -- Emma Uprichard

List of figures and tables
Discover this textbook's online resources!
About the author
Acknowledgements
Prologue
0.1 Scaling up: Thinking about programming in the social sciences
0.2 Who is this book for?
0.3 Why Python (and not R, Stata, Java, C, etc.)?
0.3.1 How much Python should I already know?
0.4 What version of Python?
0.4.1 Part I. Thinking programmatically
0.4.2 Part II. Accessing and converting data
0.4.3 Part III. Interpreting data: Expectations versus observations
0.4.4 Part IV. Social data science in practice: Four approaches
0.5 What about statistics?
0.6 Writing and coding considerations
0.6.1 My final tip before we go

PART I Thinking Programmatically

1 Introduction: Thinking of life at scale
1.1 From social science to what?
1.2 (PO)DIKW: A potential theoretical framework for data science
1.2.1 What is data?
1.2.2 From data to wisdom
1.3 Beyond the interface
1.4 Fixed, variable, and marginal costs: Why not to build a barn
1.4.1 From economics to data science
1.4.2 The challenges of maximising fixed costs
1.5 Code should be FREE
1.5.1 Functioning code
1.5.2 Robust code
1.5.3 Elegant code
1.5.4 Efficient code
1.6 Pseudocode (and pseudo-pseudocode)
1.6.1 Attempt 1. Pseudocode as written word
1.6.2 Attempt 2. Pseudocode as mathematical formula
1.6.3 Attempt 3. Pseudocode as written code
1.6.4 Attempt 4. Slightly more formal pseudocode (in a Python style)
1.7 Summary
1.8 Further reading
1.9 Extensions and reflections

2 The Series: Taming the distribution
2.1 Introducing the Series: Python's way to store a distribution
2.1.1 Working from index
2.1.2 Working from values (and masking)
2.1.3 Working from distributions
2.1.4 Adding data to a Series
2.1.5 Deleting data from a Series
2.1.6 Working with missing data in a Series
2.1.7 Getting unique values in a Series
2.2 Changing a Series
2.2.1 Changing the order of items in the Series
2.2.2 Changing the type of the Series
2.2.3 Changing Series values I: Arithmetic operators
2.2.4 Changing Series values II: Recoding values using map
2.2.5 Changing Series values III: Defining your own mapping
2.3 Summary
2.4 Extensions and reflections

3 The DataFrame: Python's tabular format
3.1 From the Series to the DataFrame
3.2 A DataFrame with multiple columns
3.2.1 From a list of lists
3.2.2 From a dictionary
3.3 Getting data from a DataFrame: Querying, masking, and slicing
3.3.1 Getting data about the DataFrame itself
3.3.2 Returning a single row or column
3.3.3 Returning multiple columns
3.3.4 Returning a single element
3.3.5 Returning a slice of data
3.4 Changing data at different scales
3.4.1 Adding data to an existing DataFrame
3.4.2 Adding one DataFrame to another
3.4.3 Changing a column or the entire DataFrame: apply, map, and applymap
3.4.4 Deep versus shallow copies
3.5 Advanced topics: numpy and numpy arrays
3.5.1 Reshaping in numpy
3.5.2 Linear algebra and numpy
3.6 Summary
3.7 Further reading
3.8 Extensions and reflections

PART II Accessing and Converting Data

4 File types: Getting data in
4.1 Importing data to a DataFrame
4.1.1 An important note on file organisation
4.1.2 Example data
4.2 Rectangular data: CSV
4.2.1 Using the csv library
4.2.2 Using the pandas CSV reader: read_csv()
4.3 Rectangular rich data: Excel
4.4 Nested data: JSON
4.4.1 Loading JSON
4.5 Nested markup languages: HTML and XML
4.5.1 HTML: Hypertext Markup Language
4.5.2 Wikipedia as a data source
4.5.3 Wikipedia as HTML
4.5.4 Using Beautiful Soup (bs4) for markup data
4.5.5 Data scepticism
4.5.6 XML
4.6 Serialisation
4.6.1 Long-term storage: Pickles and feather
4.7 Summary
4.8 Extensions and reflections

5 Merging and grouping data
5.1 Combining data across tables
5.2 A review of adding data to a DataFrame using concat
5.2.1 Adding rows
5.2.2 Adding columns
5.2.3 Multi-level indexed data
5.2.4 Transposing a DataFrame
5.3 The 'key' to merging
5.3.1 One-to-many versus one-to-one relationships
5.4 Understanding joins
5.4.1 A join as a kind of set logic
5.4.2 Inner join
5.4.3 Outer join
5.4.4 Left join
5.4.5 Right join
5.5 Grouping and aggregating data
5.5.1 Mean centring
5.6 Long versus wide data
5.6.1 Advanced reshaping
5.7 Using SQL databases
5.7.1 SQL basics
5.7.2 Using SQL for aggregation and filtering
5.8 Summary
5.9 Further reading
5.10 Extensions and reflections

6 Accessing data on the World Wide Web using code
6.1 Accessing data I: Remote access of webpages
6.1.1 What is a URL?
6.1.2 URL parsing
6.1.3 What is a web request?
6.2 An example web collection task using paging
6.3 Other web-related issues to consider
6.3.1 When to use your own versus someone else's program
6.3.2 Are there ways to simulate a browser?
6.4 Ethical issues to consider
6.4.1 What is public data and how public?
6.4.2 Considering data minimisation as a basic ethical principle
6.5 Summary
6.6 Further reading in ethics of data access and privacy
6.7 Extensions and reflections

7 Accessing APIs, including Twitter and Reddit
7.1 Accessing APIs: Abstracting from the web
7.1.1 Identifying yourself: Keys and tokens
7.1.2 Securely using credentials
7.2 Accessing Twitter data through the API
7.2.1 Troubleshooting requests
7.2.2 Access rights and Twitter
7.2.3 Strategies for navigating Twitter's API
7.3 Using an API wrapper to simplify data access
7.3.1 Collecting Reddit data using praw
7.3.2 Building a comment tree on Reddit
7.4 Considerations for a data collection pipeline
7.4.1 Version control systems and servers
7.4.2 Storing data remotely
7.4.3 Jupyter in the browser as an alternative
7.5 APIs and epistemology: How data access can mean knowledge access
7.6 Summary
7.7 Further reading
7.8 Extensions and reflections

PART III Interpreting Data: Expectations versus Observations

8 Research questions
8.1 Introduction
8.1.1 What is a research question?
8.2 Inductive, deductive, and abductive research questions
8.2.1 Deductive research questions and the null hypothesis
8.2.2 Abductive reasoning and the educated guess
8.3 Avoiding description: Expectation and systematic observation in science
8.4 Prediction versus explanation
8.4.1 Prediction and resampling
8.5 Linking hypotheses to approaches
8.6 Operationalisation
8.7 Boundedness and research questions
8.8 Summary
8.9 Further reading
8.10 Extensions and reflections

9 Visualising expectations: Comparing statistical tests and plots
9.1 Introduction: Why show data?
9.2 Visualising distributions
9.2.1 Uniform distribution with histogram
9.3 Testing a uniform distribution using a chi-squared test
9.4 Testing a uniform distribution using regression
9.4.1 Testing against a uniform distribution: Births in the UK
9.4.2 Annotating a figure
9.4.3 Normal versus skewed distributions as being interesting
9.5 Comparing two distributions versus two groups
9.5.1 Constraining our work based on the properties of data
9.5.2 Two continuous distributions
9.5.3 PRE scores
9.5.4 Comparing distinct groups
9.5.5 Summary
9.6 Further reading in visualisation
9.7 Extensions and reflections

PART IV Social Data Science in Practice: Four Approaches

10 Cleaning data for socially interesting features
10.1 Data as a form of social context
10.2 A sustained example for cleaning: Stack Exchange
10.2.1 Quick summaries of the dataset
10.3 Setting an index
10.4 Handling missing data
10.5 Cleaning numeric data
10.6 Cleaning up web data
10.6.1 Encoding
10.6.2 Stripping HTML from text
10.6.3 Extracting links from HTML
10.7 Cleaning up lists of data
10.8 Parsing time
10.9 Regular expressions
10.9.1 Further learning for regular expressions
10.9.2 Regular expressions and ground truth
10.10 Storing our work
10.11 Summary
10.12 Further reading
10.13 Extensions and reflections

11 Introducing natural language processing: Cleaning, summarising, and classifying text
11.1 Reading language: Encoding text
11.1.1 Key definitions in text
11.2 From text to language
11.3 A sample simple NLP workflow
11.3.1 Preprocessing text
11.4 NLP approaches to analysis
11.4.1 Scoring documents with sentiment analysis
11.4.2 Extracting keywords: TF-IDF scores
11.4.3 Text classification
11.5 Summary
11.6 Further reading
11.7 Extensions and reflections

12 Introducing time-series data: Showing periods and trends
12.1 Introduction: It's about time
12.2 Dates and the datetime module
12.2.1 Parsing time
12.2.2 Timezones
12.2.3 Localisation and time
12.3 Revisiting the Movie Stack Exchange data
12.4 Pandas Datetime Feature Extraction
12.5 Resampling as a way to group by time period
12.6 Slicing and the datetime index in pandas
12.7 Moving window in data
12.7.1 Missing data in a rolling window
12.8 Summary
12.9 Further explorations
12.10 Extensions and reflections

13 Introducing network analysis: Structuring relationships
13.1 Introduction: The connections that signal social structure
13.1.1 Doing network analysis in Python
13.2 Creating network graphs
13.2.1 Selecting a graph type
13.2.2 Adding nodes
13.2.3 Adding edges
13.3 Adding attributes
13.3.1 Working with distributions of attributes: The case of degree
13.4 Plotting a graph
13.4.1 Considering layouts for a graph
13.5 Subgroups and communities in a network
13.5.1 A goodness-of-fit metric for communities
13.6 Creating a network from data
13.6.1 Whole networks versus partial networks
13.6.2 Weighted networks
13.6.3 Bipartite networks
13.7 Summary
13.8 Further reading
13.9 Extensions and reflections

14 Introducing geographic information systems: Data across space and place
14.1 Introduction: From space to place
14.2 Kinds of spatial data
14.2.1 From a sphere to a rectangle
14.2.2 Mapping places onto spaces
14.2.3 Introducing the geopandas GeoDataFrame
14.2.4 Splitting the data into intervals using mapclassify
14.2.5 Plotting points
14.3 Creating your own GeoDataFrame
14.3.1 Loading your own maps
14.3.2 Linking maps to other data sources
14.4 Summary
14.5 Further topics and reading
14.6 Extensions and reflections

15 Conclusion: There (to data science) and back again (to social science)
References
Index

Bernie Hogan (he/him/*) is a Senior Research Fellow at the Oxford Internet Institute and the current Director of the University of Oxford's MSc program in Social Data Science. Bernie's work specialises in how to leverage computational tools for creative, challenging, and engaging methodologies to address social science research questions about identity, sexuality, and community. His favourite work in this area focuses on the capture and analysis of personal social networks, using both pen-and-paper tools and the recent free, open-source application Network Canvas (https://www.networkcanvas.com). He also has a keen interest in how language is used to either bring people together or push them apart, using large-scale quantitative data. He has published over 40 peer-reviewed articles and presented at over a hundred conferences, including several keynotes. His most famous work reconsidered Goffman's offline stage play metaphor of self-presentation for online life (Hogan, 2010). This piece probably helped in popularising the term 'algorithmic curation'.

Before working at the University of Oxford's Oxford Internet Institute (https://www.oii.ox.ac.uk), he completed his undergraduate and graduate degrees in Canada. His undergraduate degree was in Sociology and Computer Science at Memorial University in St. John's, Newfoundland, Canada. His graduate work was in Sociology and Knowledge Media Design at the University of Toronto. During that time Bernie interned at Microsoft Research. Bernie lives in Oxford, UK with his husband and their sprawling vinyl record collection. He tweets (and collects vinyl) under the moniker 'blurky' because it is a very rare word that sounds like Bernie. Most of his research is available from his departmental homepage (https://www.oii.ox.ac.uk/people/hogan) and/or his GitHub (https://www.github.com/berniehogan).

Permanent link: https://www.kriso.lv/db/97815297362812e.html
