Data Wrangling with R 1st ed. 2016 [Paperback]

3.77/5 (16 ratings by Goodreads)
  • Format: Paperback / softback, 238 pages, height x width: 235x155 mm, weight: 3869 g, 10 Illustrations, color; 14 Illustrations, black and white; XII, 238 p. 24 illus., 10 illus. in color., 1 Paperback / softback
  • Series: Use R!
  • Publication date: 23-Nov-2016
  • Publisher: Springer International Publishing AG
  • ISBN-10: 3319455982
  • ISBN-13: 9783319455983
  • Paperback
  • Price: 78,14 €*
  • * This is the final price, i.e., no additional discounts are applied
  • Standard price: 91,94 €
  • Save 15%
  • Delivery takes 3-4 weeks if the book is in stock at the publisher's warehouse. If the publisher needs to print a new run, delivery may be delayed.
  • Delivery time: 4-6 weeks
This guide for practicing statisticians, data scientists, and R users and programmers teaches the essentials of data preprocessing: using the R programming language to turn noisy data into usable information quickly and easily. Data wrangling, also commonly called data munging, transformation, manipulation, or janitor work, can be a painstakingly laborious process. Roughly 80% of data analysis is spent on cleaning and preparing data; because this step is a prerequisite to the rest of the data analysis workflow (visualization, analysis, reporting), it is essential to become fluent and efficient in data wrangling techniques.

This book guides the user through the data wrangling process with a step-by-step tutorial approach and provides a solid foundation for working with data in R. The author's goal is to teach the user how to wrangle data easily in order to spend more time on understanding the content of the data. By the end of the book, the user will have learned:
  • How to work with different types of data such as numerics, characters, regular expressions, factors, and dates
  • The difference between different data structures and how to create, add additional components to, and subset each data structure
  • How to acquire and parse data from locations previously inaccessible
  • How to develop functions and use loop control structures to reduce code redundancy
  • How to use pipe operators to simplify code and make it more readable
  • How to reshape the layout of data and manipulate, summarize, and join data sets

1. Preface
2. Introduction
  a. The Role of Data Wrangling
    i. Introduction to R
      1. Open Source
      2. Flexibility
      3. Community
    ii. R Basics
      1. Assignment & Evaluation
      2. Vectorization
      3. Getting help
      4. Workspace
      5. Working with packages
      6. Style guide
3. Working with Different Types of Data in R
  a. Dealing with Numbers
    i. Integer vs. Double
    ii. Generating sequence of non-random numbers
    iii. Generating sequence of random numbers
    iv. Setting the seed for reproducible random numbers
    v. Comparing numeric values
    vi. Rounding numbers
  b. Dealing with Character Strings
    i. Character string basics
    ii. String manipulation with base R
    iii. String manipulation with stringr
    iv. Set operations for character strings
  c. Dealing with Regular Expressions
    i. Regex Syntax
    ii. Regex Functions
    iii. Additional resources
  d. Dealing with Factors
    i. Creating, converting & inspecting factors
    ii. Ordering levels
    iii. Revalue levels
    iv. Dropping levels
  e. Dealing with Dates
    i. Getting current date & time
    ii. Converting strings to dates
    iii. Extract & manipulate parts of dates
    iv. Creating date sequences
    v. Calculations with dates
    vi. Dealing with time zones & daylight savings
    vii. Additional resources
…
  Simplify Your Code with %>%
    i. Pipe (%>%) Operator
    ii. Additional Functions
    iii. Additional Pipe Operators
    iv. Additional Resources
7. Shaping & Transforming Your Data with R
  a. Reshaping Your Data with tidyr
    i. Making wide data long
    ii. Making long data wide
    iii. Splitting a single column into multiple columns
    iv. Combining multiple columns into a single column
    v. Additional tidyr functions
    vi. Sequencing your tidyr operations
    vii. Additional resources
  b. Transforming Your Data with dplyr
    i. Selecting variables of interest
    ii. Filtering rows
    iii. Grouping data by categorical variables
    iv. Performing summary statistics on variables
    v. Arranging variables by value
    vi. Joining datasets
    vii. Creating new variables
    viii. Additional resources

Part I Introduction
1 The Role of Data Wrangling
2 Introduction to R
2.1 Open Source
2.2 Flexibility
2.3 Community
3 The Basics
3.1 Installing R and RStudio
3.2 Understanding the Console
3.2.1 Script Editor
3.2.2 Workspace Environment
3.2.3 Console
3.2.4 Misc. Displays
3.2.5 Workspace Options and Shortcuts
3.3 Getting Help
3.3.1 General Help
3.3.2 Getting Help on Functions
3.3.3 Getting Help from the Web
3.4 Working with Packages
3.4.1 Installing Packages
3.4.2 Loading Packages
3.4.3 Getting Help on Packages
3.4.4 Useful Packages
3.5 Assignment and Evaluation
3.6 R as a Calculator
3.6.1 Vectorization
3.7 Styling Guide
3.7.1 Notation and Naming
3.7.2 Organization
3.7.3 Syntax
Part II Working with Different Types of Data in R
4 Dealing with Numbers
4.1 Integer vs. Double
4.1.1 Creating Integer and Double Vectors
4.1.2 Converting Between Integer and Double Values
4.2 Generating Sequence of Non-random Numbers
4.2.1 Specifying Numbers Within a Sequence
4.2.2 Generating Regular Sequences
4.3 Generating Sequence of Random Numbers
4.3.1 Uniform Numbers
4.3.2 Normal Distribution Numbers
4.3.3 Binomial Distribution Numbers
4.3.4 Poisson Distribution Numbers
4.3.5 Exponential Distribution Numbers
4.3.6 Gamma Distribution Numbers
4.4 Setting the Seed for Reproducible Random Numbers
4.5 Comparing Numeric Values
4.5.1 Comparison Operators
4.5.2 Exact Equality
4.5.3 Floating Point Comparison
4.6 Rounding Numbers
5 Dealing with Character Strings
5.1 Character String Basics
5.1.1 Creating Strings
5.1.2 Converting to Strings
5.1.3 Printing Strings
5.1.4 Counting String Elements and Characters
5.2 String Manipulation with Base R
5.2.1 Case Conversion
5.2.2 Simple Character Replacement
5.2.3 String Abbreviations
5.2.4 Extract/Replace Substrings
5.3 String Manipulation with stringr
5.3.1 Basic Operations
5.3.2 Duplicate Characters Within a String
5.3.3 Remove Leading and Trailing Whitespace
5.3.4 Pad a String with Whitespace
5.4 Set Operations for Character Strings
5.4.1 Set Union
5.4.2 Set Intersection
5.4.3 Identifying Different Elements
5.4.4 Testing for Element Equality
5.4.5 Testing for Exact Equality
5.4.6 Identifying If Elements Are Contained in a String
5.4.7 Sorting a String
6 Dealing with Regular Expressions
6.1 Regex Syntax
6.1.1 Metacharacters
6.1.2 Sequences
6.1.3 Character Classes
6.1.4 POSIX Character Classes
6.1.5 Quantifiers
6.2 Regex Functions
6.2.1 Main Regex Functions in R
6.2.2 Regex Functions in stringr
6.3 Additional Resources
7 Dealing with Factors
7.1 Creating, Converting and Inspecting Factors
7.2 Ordering Levels
7.3 Revalue Levels
7.4 Dropping Levels
8 Dealing with Dates
8.1 Getting Current Date and Time
8.2 Converting Strings to Dates
8.2.1 Convert Strings to Dates
8.2.2 Create Dates by Merging Data
8.3 Extract and Manipulate Parts of Dates
8.4 Creating Date Sequences
8.5 Calculations with Dates
8.6 Dealing with Time Zones and Daylight Savings
8.7 Additional Resources
Part III Managing Data Structures in R
9 Data Structure Basics
9.1 Identifying the Structure
9.2 Attributes
10 Managing Vectors
10.1 Creating Vectors
10.2 Adding On To Vectors
10.3 Adding Attributes to Vectors
10.4 Subsetting Vectors
10.4.1 Subsetting with Positive Integers
10.4.2 Subsetting with Negative Integers
10.4.3 Subsetting with Logical Values
10.4.4 Subsetting with Names
10.4.5 Simplifying vs. Preserving
11 Managing Lists
11.1 Creating Lists
11.2 Adding On To Lists
11.3 Adding Attributes to Lists
11.4 Subsetting Lists
11.4.1 Subset List and Preserve Output as a List
11.4.2 Subset List and Simplify Output
11.4.3 Subset List to Get Elements Out of a List
11.4.4 Subset List with a Nested List
12 Managing Matrices
12.1 Creating Matrices
12.2 Adding On To Matrices
12.3 Adding Attributes to Matrices
12.4 Subsetting Matrices
13 Managing Data Frames
13.1 Creating Data Frames
13.2 Adding On To Data Frames
13.3 Adding Attributes to Data Frames
13.4 Subsetting Data Frames
14 Dealing with Missing Values
14.1 Testing for Missing Values
14.2 Recoding Missing Values
14.3 Excluding Missing Values
Part IV Importing, Scraping, and Exporting Data with R
15 Importing Data
15.1 Reading Data from Text Files
15.1.1 Base R Functions
15.1.2 readr Package
15.2 Reading Data from Excel Files
15.2.1 xlsx Package
15.2.2 readxl Package
15.3 Load Data from Saved R Object File
15.4 Additional Resources
16 Scraping Data
16.1 Importing Tabular and Excel Files Stored Online
16.2 Scraping HTML Text
16.2.1 Scraping HTML Nodes
16.2.2 Scraping Specific HTML Nodes
16.2.3 Cleaning Up
16.3 Scraping HTML Table Data
16.3.1 Scraping HTML Tables with rvest
16.3.2 Scraping HTML Tables with XML
16.4 Working with APIs
16.4.1 Prerequisites?
16.4.2 Existing API Packages
16.4.3 httr for All Things Else
16.5 Additional Resources
17 Exporting Data
17.1 Writing Data to Text Files
17.1.1 Base R Functions
17.1.2 readr Package
17.2 Writing Data to Excel Files
17.2.1 xlsx Package
17.2.2 r2excel Package
17.3 Saving Data as an R Object File
17.4 Additional Resources
Part V Creating Efficient and Readable Code in R
18 Functions
18.1 Function Components
18.2 Arguments
18.3 Scoping Rules
18.4 Lazy Evaluation
18.5 Returning Multiple Outputs from a Function
18.6 Dealing with Invalid Parameters
18.7 Saving and Sourcing Functions
18.8 Additional Resources
19 Loop Control Statements
19.1 Basic Control Statements (i.e. if, for, while, etc.)
19.1.1 If Statement
19.1.2 if... else Statement
19.1.3 For Loop
19.1.4 While Loop
19.1.5 Repeat Loop
19.1.6 Break Function to Exit a Loop
19.1.7 Next Function to Skip an Iteration in a Loop
19.2 Apply Family
19.2.1 apply() for Matrices and Data Frames
19.2.2 lapply() for Lists... Output as a List
19.2.3 sapply() for Lists... Output Simplified
19.2.4 tapply() for Vectors
19.3 Other Useful "Loop-Like" Functions
19.4 Additional Resources
20 Simplify Your Code with %>%
20.1 Pipe (%>%) Operator
20.1.1 Nested Option
20.1.2 Multiple Object Option
20.1.3 %>% Option
20.2 Additional Functions
20.3 Additional Pipe Operators
20.4 Additional Resources
Part VI Shaping and Transforming Your Data with R
21 Reshaping Your Data with tidyr
21.1 Making Wide Data Long
21.2 Making Long Data Wide
21.3 Splitting a Single Column into Multiple Columns
21.4 Combining Multiple Columns into a Single Column
21.5 Additional tidyr Functions
21.6 Sequencing Your tidyr Operations
21.7 Additional Resources
22 Transforming Your Data with dplyr
22.1 Selecting Variables of Interest
22.2 Filtering Rows
22.3 Grouping Data by Categorical Variables
22.4 Performing Summary Statistics on Variables
22.5 Arranging Variables by Value
22.6 Joining Data Sets
22.7 Creating New Variables
22.8 Additional Resources
Index
Brad Boehmke, Ph.D., is an Operations Research Analyst at Headquarters Air Force Materiel Command, Studies and Analyses Division. He is also Assistant Professor in the Operational Sciences Department at the Air Force Institute of Technology. Dr. Boehmke's research interests are in the areas of cost analysis, economic modeling, decision analysis, and developing applied modeling applications through the R statistical language.