
E-book: From Social Science to Data Science: Key Data Collection and Analysis Skills in Python

  • Format: 400 pages
  • Publication date: 23-Nov-2022
  • Publisher: Sage Publications Ltd
  • Language: eng
  • ISBN-13: 9781529738070
  • Format: EPUB+DRM
  • Price: €46.38*
  • * This is the final price, i.e., no additional discounts apply.
  • This e-book is intended for personal use only. E-books cannot be returned, and no refund is given for purchased e-books.

DRM restrictions

  • Copying (copy/paste): not allowed

  • Printing: not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means you need to install free software to unlock and read it. To read this e-book, you must create an Adobe ID. More information is available here. The e-book can be read and downloaded on up to 6 devices (by one user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you will need to install this free app: PocketBook Reader (iOS / Android).

    To download and read this e-book on a PC or Mac, you will need Adobe Digital Editions (a free app designed specifically for e-books; it is not the same as Adobe Reader, which you may already have on your computer).

    You cannot read this e-book on an Amazon Kindle.

Built around the entire research process, with a core focus on ethics, this book equips you to scale up your skills and successfully conduct a computational social science research project with Python.



From Social Science to Data Science is a fundamental guide to scaling up and advancing your programming skills in Python. From beginning to end, this book will enable you to understand merging, accessing, cleaning and interpreting data whilst gaining a deeper understanding of computational techniques and seeing the bigger picture. 

With key features such as tables, figures, step-by-step instructions and explanations that give wider context, Hogan presents a clear and concise account of key data collection and analysis skills in Python.

Reviews

Excellent. The students will love it, I think. It reminds me a bit of Andy Field's SPSS/R books, which students have also loved in the past. This one has that flavour but also pushes the analytics into the contemporary era with Python. I expect it will be a real success. -- Emma Uprichard

List of figures and tables xiii
Discover this textbook's online resources! xv
About the author xvii
Acknowledgements xix
Prologue xxiii
0.1 Scaling up: Thinking about programming in the social sciences xxiii
0.2 Who is this book for? xxv
0.3 Why Python (and not R, Stata, Java, C, etc.)? xxvi
0.3.1 How much Python should I already know? xxvii
0.4 What version of Python? xxviii
0.4.1 Part I. Thinking programmatically xxix
0.4.2 Part II. Accessing and converting data xxix
0.4.3 Part III. Interpreting data: Expectations versus observations xxx
0.4.4 Part IV. Social data science in practice: Four approaches xxxi
0.5 What about statistics? xxxi
0.6 Writing and coding considerations xxxii
0.6.1 My final tip before we go xxxiii
PART I Thinking Programmatically 1
1 Introduction: Thinking of life at scale 3
1.1 From social science to what? 4
1.2 (PO)DIKW: A potential theoretical framework for data science 5
1.2.1 What is data? 5
1.2.2 From data to wisdom 8
1.3 Beyond the interface 9
1.4 Fixed, variable, and marginal costs: Why not to build a barn 11
1.4.1 From economics to data science 11
1.4.2 The challenges of maximising fixed costs 13
1.5 Code should be FREE 14
1.5.1 Functioning code 14
1.5.2 Robust code 15
1.5.3 Elegant code 16
1.5.4 Efficient code 17
1.6 Pseudocode (and pseudo-pseudocode) 18
1.6.1 Attempt 1. Pseudocode as written word 18
1.6.2 Attempt 2. Pseudocode as mathematical formula 19
1.6.3 Attempt 3. Pseudocode as written code 19
1.6.4 Attempt 4. Slightly more formal pseudocode (in a Python style) 19
1.7 Summary 19
1.8 Further reading 20
1.9 Extensions and reflections 21
2 The Series: Taming the distribution 23
2.1 Introducing the Series: Python's way to store a distribution 24
2.1.1 Working from index 26
2.1.2 Working from values (and masking) 28
2.1.3 Working from distributions 30
2.1.4 Adding data to a Series 32
2.1.5 Deleting data from a Series 35
2.1.6 Working with missing data in a Series 36
2.1.7 Getting unique values in a Series 37
2.2 Changing a Series 38
2.2.1 Changing the order of items in the Series 38
2.2.2 Changing the type of the Series 39
2.2.3 Changing Series values I: Arithmetic operators 41
2.2.4 Changing Series values II: Recoding values using map 42
2.2.5 Changing Series values III: Defining your own mapping 43
2.3 Summary 45
2.4 Extensions and reflections 45
3 The DataFrame: Python's tabular format 47
3.1 From the Series to the DataFrame 48
3.2 A DataFrame with multiple columns 50
3.2.1 From a list of lists 50
3.2.2 From a dictionary 51
3.3 Getting data from a DataFrame: Querying, masking, and slicing 53
3.3.1 Getting data about the DataFrame itself 53
3.3.2 Returning a single row or column 54
3.3.3 Returning multiple columns 55
3.3.4 Returning a single element 55
3.3.5 Returning a slice of data 56
3.4 Changing data at different scales 57
3.4.1 Adding data to an existing DataFrame 57
3.4.2 Adding one DataFrame to another 60
3.4.3 Changing a column or the entire DataFrame: apply, map, and applymap 61
3.4.4 Deep versus shallow copies 65
3.5 Advanced topics: numpy and numpy arrays 67
3.5.1 Reshaping in numpy 69
3.5.2 Linear algebra and numpy 71
3.6 Summary 72
3.7 Further reading 72
3.8 Extensions and reflections 73
PART II Accessing and Converting Data 75
4 File types: Getting data in 77
4.1 Importing data to a DataFrame 78
4.1.1 An important note on file organisation 79
4.1.2 Example data 79
4.2 Rectangular data: CSV 80
4.2.1 Using the csv library 80
4.2.2 Using the pandas CSV reader: read_csv() 82
4.3 Rectangular rich data: Excel 83
4.4 Nested data: JSON 86
4.4.1 Loading JSON 87
4.5 Nested markup languages: HTML and XML 91
4.5.1 HTML: Hypertext Markup Language 91
4.5.2 Wikipedia as a data source 92
4.5.3 Wikipedia as HTML 92
4.5.4 Using Beautiful Soup (bs4) for markup data 93
4.5.5 Data scepticism 94
4.5.6 XML 96
4.6 Serialisation 100
4.6.1 Long-term storage: Pickles and feather 101
4.7 Summary 101
4.8 Extensions and reflections 102
5 Merging and grouping data 103
5.1 Combining data across tables 104
5.2 A review of adding data to a DataFrame using concat 104
5.2.1 Adding rows 104
5.2.2 Adding columns 107
5.2.3 Multi-level indexed data 109
5.2.4 Transposing a DataFrame 110
5.3 The 'key' to merging 110
5.3.1 One-to-many versus one-to-one relationships 111
5.4 Understanding joins 114
5.4.1 A join as a kind of set logic 114
5.4.2 Inner join 116
5.4.3 Outer join 116
5.4.4 Left join 117
5.4.5 Right join 117
5.5 Grouping and aggregating data 118
5.5.1 Mean centring 120
5.6 Long versus wide data 122
5.6.1 Advanced reshaping 123
5.7 Using SQL databases 123
5.7.1 SQL basics 124
5.7.2 Using SQL for aggregation and filtering 126
5.8 Summary 128
5.9 Further reading 129
5.10 Extensions and reflections 129
6 Accessing data on the World Wide Web using code 131
6.1 Accessing data I: Remote access of webpages 132
6.1.1 What is a URL? 133
6.1.2 URL parsing 135
6.1.3 What is a web request? 136
6.2 An example web collection task using paging 138
6.3 Other web-related issues to consider 143
6.3.1 When to use your own versus someone else's program 143
6.3.2 Are there ways to simulate a browser? 143
6.4 Ethical issues to consider 143
6.4.1 What is public data and how public? 143
6.4.2 Considering data minimisation as a basic ethical principle 144
6.5 Summary 146
6.6 Further reading in ethics of data access and privacy 146
6.7 Extensions and reflections 147
7 Accessing APIs, including Twitter and Reddit 149
7.1 Accessing APIs: Abstracting from the web 150
7.1.1 Identifying yourself: Keys and tokens 150
7.1.2 Securely using credentials 152
7.2 Accessing Twitter data through the API 154
7.2.1 Troubleshooting requests 155
7.2.2 Access rights and Twitter 156
7.2.3 Strategies for navigating Twitter's API 157
7.3 Using an API wrapper to simplify data access 160
7.3.1 Collecting Reddit data using praw 160
7.3.2 Building a comment tree on Reddit 162
7.4 Considerations for a data collection pipeline 163
7.4.1 Version control systems and servers 163
7.4.2 Storing data remotely 164
7.4.3 Jupyter in the browser as an alternative 164
7.5 APIs and epistemology: How data access can mean knowledge access 165
7.6 Summary 167
7.7 Further reading 167
7.8 Extensions and reflections 168
PART III Interpreting Data: Expectations versus Observations 169
8 Research questions 171
8.1 Introduction 172
8.1.1 What is a research question? 172
8.2 Inductive, deductive, and abductive research questions 173
8.2.1 Deductive research questions and the null hypothesis 174
8.2.2 Abductive reasoning and the educated guess 175
8.3 Avoiding description: Expectation and systematic observation in science 176
8.4 Prediction versus explanation 177
8.4.1 Prediction and resampling 178
8.5 Linking hypotheses to approaches 179
8.6 Operationalisation 180
8.7 Boundedness and research questions 181
8.8 Summary 182
8.9 Further reading 183
8.10 Extensions and reflections 183
9 Visualising expectations: Comparing statistical tests and plots 185
9.1 Introduction: Why show data? 186
9.2 Visualising distributions 188
9.2.1 Uniform distribution with histogram 190
9.3 Testing a uniform distribution using a chi-squared test 192
9.4 Testing a uniform distribution using regression 194
9.4.1 Testing against a uniform distribution: Births in the UK 198
9.4.2 Annotating a figure 201
9.4.3 Normal versus skewed distributions as being interesting 204
9.5 Comparing two distributions versus two groups 204
9.5.1 Constraining our work based on the properties of data 205
9.5.2 Two continuous distributions 207
9.5.3 PRE scores 209
9.5.4 Comparing distinct groups 213
9.5.5 Summary 215
9.6 Further reading in visualisation 216
9.7 Extensions and reflections 217
PART IV Social Data Science in Practice: Four Approaches 219
10 Cleaning data for socially interesting features 221
10.1 Data as a form of social context 223
10.2 A sustained example for cleaning: Stack Exchange 226
10.2.1 Quick summaries of the dataset 229
10.3 Setting an index 231
10.4 Handling missing data 232
10.5 Cleaning numeric data 233
10.6 Cleaning up web data 235
10.6.1 Encoding 236
10.6.2 Stripping HTML from text 236
10.6.3 Extracting links from HTML 237
10.7 Cleaning up lists of data 238
10.8 Parsing time 241
10.9 Regular expressions 242
10.9.1 Further learning for regular expressions 244
10.9.2 Regular expressions and ground truth 245
10.10 Storing our work 246
10.11 Summary 246
10.12 Further reading 247
10.13 Extensions and reflections 247
11 Introducing natural language processing: Cleaning, summarising, and classifying text 249
11.1 Reading language: Encoding text 250
11.1.1 Key definitions in text 251
11.2 From text to language 252
11.3 A sample simple NLP workflow 253
11.3.1 Preprocessing text 254
11.4 NLP approaches to analysis 258
11.4.1 Scoring documents with sentiment analysis 258
11.4.2 Extracting keywords: TF-IDF scores 261
11.4.3 Text classification 263
11.5 Summary 266
11.6 Further reading 267
11.7 Extensions and reflections 268
12 Introducing time-series data: Showing periods and trends 269
12.1 Introduction: It's about time 270
12.2 Dates and the datetime module 271
12.2.1 Parsing time 272
12.2.2 Timezones 274
12.2.3 Localisation and time 274
12.3 Revisiting the Movie Stack Exchange data 275
12.4 Pandas datetime feature extraction 276
12.5 Resampling as a way to group by time period 279
12.6 Slicing and the datetime index in pandas 281
12.7 Moving window in data 283
12.7.1 Missing data in a rolling window 284
12.8 Summary 286
12.9 Further explorations 287
12.10 Extensions and reflections 288
13 Introducing network analysis: Structuring relationships 289
13.1 Introduction: The connections that signal social structure 290
13.1.1 Doing network analysis in Python 291
13.2 Creating network graphs 291
13.2.1 Selecting a graph type 292
13.2.2 Adding nodes 293
13.2.3 Adding edges 293
13.3 Adding attributes 294
13.3.1 Working with distributions of attributes: The case of degree 295
13.4 Plotting a graph 297
13.4.1 Considering layouts for a graph 299
13.5 Subgroups and communities in a network 301
13.5.1 A goodness-of-fit metric for communities 302
13.6 Creating a network from data 303
13.6.1 Whole networks versus partial networks 305
13.6.2 Weighted networks 306
13.6.3 Bipartite networks 308
13.7 Summary 312
13.8 Further reading 313
13.9 Extensions and reflections 314
14 Introducing geographic information systems: Data across space and place 315
14.1 Introduction: From space to place 316
14.2 Kinds of spatial data 316
14.2.1 From a sphere to a rectangle 317
14.2.2 Mapping places onto spaces 319
14.2.3 Introducing the geopandas GeoDataFrame 321
14.2.4 Splitting the data into intervals using mapclassify 323
14.2.5 Plotting points 325
14.3 Creating your own GeoDataFrame 326
14.3.1 Loading your own maps 327
14.3.2 Linking maps to other data sources 329
14.4 Summary 334
14.5 Further topics and reading 335
14.6 Extensions and reflections 336
15 Conclusion: There (to data science) and back again (to social science) 339
References 343
Index 353
Bernie Hogan (he/him/*) is a Senior Research Fellow at the Oxford Internet Institute and the current Director of the University of Oxford's MSc programme in Social Data Science. Bernie's work specialises in how to leverage computational tools for creative, challenging, and engaging methodologies to address social science research questions about identity, sexuality, and community. His favourite work in this area focuses on the capture and analysis of personal social networks, using both pen-and-paper tools and the recent free, open-source application Network Canvas (https://www.networkcanvas.com). He also has a keen interest in how language is used to either bring people together or push them apart, using large-scale quantitative data. He has published over 40 peer-reviewed articles and presented at over a hundred conferences, including several keynotes. His most famous work reconsidered Goffman's offline stage-play metaphor of self-presentation for online life (Hogan, 2010). This piece probably helped popularise the term 'algorithmic curation'.

Before working at the University of Oxfords Oxford Internet Institute (https://www.oii.ox.ac.uk) he completed his undergraduate and graduate degrees in Canada. His undergraduate was in Sociology and Computer Science at Memorial University in St. Johns, Newfoundland, Canada. His graduate work was in Sociology and Knowledge Media Design at the University of Toronto. During that time Bernie interned at Microsoft Research. Bernie lives in Oxford, UK with his husband and their sprawling vinyl record collection. He tweets (and collects vinyl) under the moniker blurky because it is a very rare word that sounds like Bernie. Most of this research is available from his departmental homepage, (https://www.oii.ox.ac.uk/people/hogan) and or/his GitHub, (https://www.github.com/berniehogan).