
Machine Learning Engineering in Action [Paperback]

  • Format: Paperback / softback, 300 pages, height x width x depth: 234x186x34 mm, weight: 960 g
  • Publication date: 14-Apr-2022
  • Publisher: Manning Publications
  • ISBN-10: 1617298719
  • ISBN-13: 9781617298714
  • Paperback
  • Price: 61,73 €
  • Delivery time is 3-4 weeks if the book is in stock at the publisher's warehouse. If the publisher needs to print a new run, delivery may take longer.
Machine Learning Engineering in Action lays out an approach to building deployable, maintainable production machine learning systems. You will adopt software development standards that deliver better code management, and make it easier to test, scale, and even reuse your machine learning code!

You will learn how to plan and scope your project, manage cross-team logistics that avoid fatal communication failures, and design your code's architecture for improved resilience. You will even discover when not to use machine learning and the alternative approaches that might be cheaper and more effective. When you're done working through this toolbox guide, you will be able to reliably deliver cost-effective solutions for organizations big and small alike.

Following established processes and methodology maximizes the likelihood that your machine learning projects will survive and succeed for the long haul. By adopting standard, reproducible practices, your projects will be maintainable over time and easy for new team members to understand and adapt.

Reviews

A nice view on practical data science and machine learning. Great reading for newbies, some interesting views for seasoned practitioners. Johannes Verwijnen

A must read for those looking to balance the planning and experimentation lifecycle. Jesús Antonino Juárez Guerrero

A practical book to help engineers understand the workflow of machine learning projects. Xiangbo Mao

Do not implement your ML model into production without reading this book! Lokesh Kumar

Preface xi
Acknowledgments xiii
About This Book xv
About The Author xviii
About The Cover Illustration xix
Part 1 An Introduction To Machine Learning Engineering
1 What is a machine learning engineer? 3(23)
1.1 Why ML engineering? 5(3)
1.2 The core tenets of ML engineering 8(16)
Planning 8(2)
Scoping and research 10(3)
Experimentation 13(2)
Development 15(3)
Deployment 18(3)
Evaluation 21(3)
1.3 The goals of ML engineering 24(2)
2 Your data science could use some engineering 26(12)
2.1 Augmenting a complex profession with processes to increase project success 27(2)
2.2 A foundation of simplicity 29(2)
2.3 Co-opting principles of Agile software engineering 31(4)
Communication and cooperation 33(2)
Embracing and expecting change 35(1)
2.4 The foundation of ML engineering 35(3)
3 Before you model: Planning and scoping a project 38(38)
3.1 Planning: You want me to predict what?! 42(18)
Basic planning for a project 47(6)
That first meeting 53(3)
Plan for demos-lots of demos 56(2)
Experimentation by solution building: Wasting time for pride's sake 58(2)
3.2 Experimental scoping: Setting expectations and boundaries 60(16)
What is experimental scoping? 61(1)
Experimental scoping for the ML team: Research 62(2)
Experimental scoping for the ML team: Experimentation 64(12)
4 Before you model: Communication and logistics of projects 76(48)
4.1 Communication: Defining the problem 79(22)
Understanding the problem 80(14)
Setting critical discussion boundaries 94(7)
4.2 Don't waste our time: Meeting with cross-functional teams 101(7)
Experimental update meeting: Do we know what we're doing here? 102(1)
SME review/prototype review: Can we solve this? 103(2)
Development progress review(s): Is this thing going to work? 105(1)
MVP review: Did you build what we asked for? 106(1)
Preproduction review: We really hope we didn't screw this up 107(1)
4.3 Setting limits on your experimentation 108(8)
Set a time limit 109(3)
Can you put this into production? Would you want to maintain it? 112(1)
TDD vs. RDD vs. PDD vs. CDD for ML projects 113(3)
4.4 Planning for business rules chaos 116(4)
Embracing chaos by planning for it 117(2)
Human-in-the-loop design 119(1)
What's your backup plan? 119(1)
4.5 Talking about results 120(4)
5 Experimentation in action: Planning and researching an ML project 124(35)
5.1 Planning experiments 126(11)
Perform basic research and planning 126(4)
Forget the blogs-read the API docs 130(5)
Draw straws for an internal hackathon 135(1)
Level the playing field 136(1)
5.2 Performing experimental prep work 137(22)
Performing data analysis 139(7)
Moving from script to reusable code 146(8)
One last note on building reusable code for experimentation 154(5)
6 Experimentation in action: Testing and evaluating a project 159(38)
6.1 Testing ideas 162(35)
Setting guidelines in code 163(9)
Running quick forecasting tests 172(18)
Whittling down the possibilities 190(1)
Evaluating prototypes properly 191(2)
Making a call on the direction to go in 193(3)
So...what's next? 196(1)
7 Experimentation in action: Moving from prototype to MVP 197(31)
7.1 Tuning: Automating the annoying stuff 199(16)
Tuning options 201(5)
Hyperopt primer 206(2)
Using Hyperopt to tune a complex forecasting problem 208(7)
7.2 Choosing the right tech for the platform and the team 215(13)
Why Spark? 216(2)
Handling tuning from the driver with SparkTrials 218(4)
Handling tuning from the workers with a pandas_udf 222(4)
Using new paradigms for teams: Platforms and technologies 226(2)
8 Experimentation in action: Finalizing an MVP with MLflow and runtime optimization 228(15)
8.1 Logging: Code, metrics, and results 229(8)
MLflow tracking 230(2)
Please stop printing and log your information 232(2)
Version control, branch strategies, and working with others 234(3)
8.2 Scalability and concurrency 237(8)
What is concurrency? 239(1)
What you can (and can't) run asynchronously 239(4)
Part 2 Preparing For Production: Creating Maintainable ML 243(156)
9 Modularity for ML: Writing testable and legible code 245(24)
9.1 Understanding monolithic scripts and why they are bad 248(7)
How monoliths come into being 249(1)
Walls of text 249(3)
Considerations for monolithic scripts 252(3)
9.2 Debugging walls of text 255(2)
9.3 Designing modular ML code 257(7)
9.4 Using test-driven development for ML 264(5)
10 Standards of coding and creating maintainable ML code 269(31)
10.1 ML code smells 270(3)
10.2 Naming, structure, and code architecture 273(5)
Naming conventions and structure 273(1)
Trying to be too clever 274(2)
Code architecture 276(2)
10.3 Tuple unpacking and maintainable alternatives 278(4)
Tuple unpacking example 278(2)
A solid alternative to tuple unpacking 280(2)
10.4 Blind to issues: Eating exceptions and other bad practices 282(6)
Try/catch with the precision of a shotgun 283(2)
Exception handling with laser precision 285(1)
Handling errors the right way 286(2)
10.5 Use of global mutable objects 288(4)
How mutability can burn you 288(2)
Encapsulation to prevent mutable side effects 290(2)
10.6 Excessively nested logic 292(8)
11 Model measurement and why it's so important 300(34)
11.1 Measuring model attribution 302(14)
Measuring prediction performance 302(10)
Clarifying correlation vs. causation 312(4)
11.2 Leveraging A/B testing for attribution calculations 316(18)
A/B testing 101 317(2)
Evaluating continuous metrics 319(6)
Using alternative displays and tests 325(4)
Evaluating categorical metrics 329(5)
12 Holding on to your gains by watching for drift 334(19)
12.1 Detecting drift 335(12)
What influences drift? 336(11)
12.2 Responding to drift 347(6)
What can we do about it? 348(2)
Responding to drift 350(3)
13 ML development hubris 353(46)
13.1 Elegant complexity vs. overengineering 355(9)
Lightweight scripted style (imperative) 357(4)
An overengineered mess 361(3)
13.2 Unintentional obfuscation: Could you read this if you didn't write it? 364(15)
The flavors of obfuscation 365(13)
Troublesome coding habits recap 378(1)
13.3 Premature generalization, premature optimization, and other bad ways to show how smart you are 379(11)
Generalization and frameworks: Avoid them until you can't 379(3)
Optimizing too early 382(8)
13.4 Do you really want to be the canary? Alpha testing and the dangers of the open source coal mine 390(3)
13.5 Technology-driven development vs. solution-driven development 393(6)
Part 3 Developing Production Machine Learning Code
14 Writing production code 399(39)
14.1 Have you met your data? 401(11)
Make sure you have the data 403(1)
Check your data provenance 404(4)
Find a source of truth and align on it 408(2)
Don't embed data cleansing into your production code 410(2)
14.2 Monitoring your features 412(5)
14.3 Monitoring everything else in the model life cycle 417(4)
14.4 Keeping things as simple as possible 421(5)
Simplicity in problem definitions 423(1)
Simplicity in implementation 424(2)
14.5 Wireframing ML projects 426(6)
14.6 Avoiding cargo cult ML behavior 432(6)
15 Quality and acceptance testing 438(33)
15.1 Data consistency 439(8)
Training and inference skew 440(1)
A brief intro to feature stores 441(1)
Process over technology 442(3)
The dangers of a data silo 445(2)
15.2 Fallbacks and cold starts 447(6)
Leaning heavily on prior art 448(2)
Cold-start woes 450(3)
15.3 End user vs. internal use testing 453(7)
Biased testing 456(1)
Dogfooding 457(2)
SME evaluation 459(1)
15.4 Model interpretability 460(11)
Shapley additive explanations 461(2)
Using shap 463(8)
16 Production infrastructure 471(39)
16.1 Artifact management 472(10)
MLflow's model registry 474(2)
Interfacing with the model registry 476(6)
16.2 Feature stores 482(8)
What a feature store is used for 483(2)
Using a feature store 485(4)
Evaluating a feature store 489(1)
16.3 Prediction serving architecture 490(20)
Determining serving needs 493(7)
Bulk external delivery 500(2)
Microbatch streaming 502(1)
Real-time server-side 503(4)
Integrated models (edge deployment) 507(3)
Appendix A Big O(no) and how to think about runtime performance 510(30)
Appendix B Setting up a development environment 540(7)
Index 547
Ben Wilson has worked as a professional data scientist for more than ten years. He currently works as a resident solutions architect at Databricks, where he focuses on machine learning production architecture with companies ranging from 5-person startups to the global Fortune 100. Ben is the creator and lead developer of the Databricks Labs AutoML project, a Scala- and Python-based toolkit that simplifies machine learning feature engineering, model tuning, and pipeline-enabled modelling.