Preface |
|
xiii | |
Acknowledgments |
|
xvi | |
About This Book |
|
xviii | |
About The Author |
|
xxiii | |
About The Cover Illustration |
|
xxiv | |
|
|
1 | (22) |
|
1.1 Defining the styles of telemetry |
|
|
4 | (7) |
|
Defining centralized logging |
|
|
4 | (2) |
|
|
6 | (2) |
|
Defining distributed tracing |
|
|
8 | (2) |
|
|
10 | (1) |
|
1.2 How telemetry is consumed by different teams |
|
|
11 | (4) |
|
Telemetry use by Operations, DevOps, and SRE teams |
|
|
11 | (1) |
|
Telemetry use by Security and Compliance teams |
|
|
12 | (1) |
|
Telemetry use by Software Engineering and SRE teams |
|
|
13 | (1) |
|
Telemetry use by Customer Support teams |
|
|
13 | (1) |
|
Telemetry use by business intelligence |
|
|
14 | (1) |
|
1.3 Challenges facing telemetry systems |
|
|
15 | (5) |
|
Chronic underinvestment harms decision-making |
|
|
15 | (2) |
|
Diverse needs resist standardization |
|
|
17 | (1) |
|
Information spills and cleaning them up to avoid legal problems |
|
|
18 | (1) |
|
Court orders break your assumptions |
|
|
19 | (1) |
|
|
20 | (3) |
Part 1 Telemetry System Architecture |
|
23 | (170) |
|
2 The Emitting stage: Creating and submitting telemetry |
|
|
27 | (25) |
|
2.1 Emitting from production code |
|
|
29 | (14) |
|
Emitting telemetry into a log file |
|
|
31 | (2) |
|
Emitting telemetry into the system log |
|
|
33 | (4) |
|
Emitting telemetry into standard output |
|
|
37 | (3) |
|
Formatting telemetry for emissions |
|
|
40 | (3) |
|
2.2 Emitting from hardware |
|
|
43 | (4) |
|
|
43 | (2) |
|
Ingesting telemetry from a Cisco ASA firewall |
|
|
45 | (2) |
|
2.3 Emitting from as-a-Service systems |
|
|
47 | (5) |
|
Emitting events from SaaS systems |
|
|
47 | (2) |
|
Emitting events from IaaS systems |
|
|
49 | (3) |
|
3 The Shipping stage: Moving and storing telemetry |
|
|
52 | (22) |
|
3.1 Emitter/shipper functions, telemetry from production code |
|
|
54 | (15) |
|
Shipping directly into storage |
|
|
54 | (3) |
|
Shipping through queues and streams |
|
|
57 | (10) |
|
|
67 | (2) |
|
3.2 Shipping between SaaS systems |
|
|
69 | (2) |
|
3.3 Tipping points in Shipping-stage architecture |
|
|
71 | (3) |
|
4 The Shipping stage: Unifying diverse telemetry formats |
|
|
74 | (33) |
|
4.1 Shipping locally-emitted telemetry |
|
|
75 | (8) |
|
Shipping telemetry from a log file |
|
|
76 | (3) |
|
Shipping telemetry from the system logger |
|
|
79 | (2) |
|
Shipping telemetry from standard output |
|
|
81 | (2) |
|
4.2 Unifying diverse emitting formats |
|
|
83 | (24) |
|
Encoding telemetry into strings |
|
|
84 | (5) |
|
Picking a shipping format |
|
|
89 | (11) |
|
Converting Syslog to JSON or other object-encoding formats |
|
|
100 | (4) |
|
Designing with cardinality in mind |
|
|
104 | (3) |
|
5 The Presentation stage: Displaying telemetry |
|
|
107 | (31) |
|
5.1 Displaying telemetry in metrics systems |
|
|
109 | (9) |
|
Making pretty pictures with telemetry |
|
|
110 | (2) |
|
Feeding the graphs with aggregation functions |
|
|
112 | (2) |
|
Using aggregations with pdf_pages |
|
|
114 | (4) |
|
5.2 Displaying telemetry in centralized logging systems |
|
|
118 | (9) |
|
Selecting needed features in a display system for centralized logging |
|
|
119 | (2) |
|
Demonstrating centralized logging display |
|
|
121 | (6) |
|
5.3 Displaying telemetry in security systems |
|
|
127 | (4) |
|
5.4 Displaying telemetry in distributed tracing systems |
|
|
131 | (4) |
|
5.5 Displaying telemetry in large organizations |
|
|
135 | (3) |
|
6 Marking up and enriching telemetry |
|
|
138 | (36) |
|
6.1 Markup in the Emitting stage |
|
|
141 | (5) |
|
6.2 Markup and enrichment in the Shipping stage |
|
|
146 | (16) |
|
Applying context-related telemetry in the Shipping stage |
|
|
147 | (3) |
|
Extracting and enriching telemetry in-flight |
|
|
150 | (6) |
|
Converting field types during the Shipping stage |
|
|
156 | (6) |
|
6.3 Enrichment in the Presentation stage |
|
|
162 | (3) |
|
6.4 How telemetry style affects markup and enrichment |
|
|
165 | (9) |
|
Markup and enrichment with centralized logging |
|
|
166 | (1) |
|
Markup and enrichment with SIEM systems |
|
|
167 | (2) |
|
Markup and enrichment with metrics |
|
|
169 | (1) |
|
Markup and enrichment with distributed tracing systems |
|
|
170 | (4) |
|
|
174 | (19) |
|
7.1 How multitenant architectures come about |
|
|
175 | (5) |
|
Evolving multitenancy in an early-stage startup |
|
|
175 | (1) |
|
Evolving multitenancy in a culture of free sharing |
|
|
176 | (2) |
|
Evolving multitenancy in a culture of strong separation |
|
|
178 | (2) |
|
7.2 Designing multitenant telemetry systems |
|
|
180 | (13) |
|
Multitenancy in the Shipping stage |
|
|
181 | (8) |
|
Multitenancy in the Presentation stage |
|
|
189 | (4) |
Part 2 Use Cases Revisited: Applying Architecture Concepts |
|
193 | (82) |
|
8 Growing cloud-based startup |
|
|
195 | (31) |
|
8.1 Telemetry at the small-company stage |
|
|
197 | (4) |
|
Describing the small company's telemetry system |
|
|
198 | (1) |
|
Analyzing the small company's telemetry system |
|
|
199 | (2) |
|
8.2 Telemetry at the medium-size company stage |
|
|
201 | (5) |
|
Describing the medium-size company's telemetry system |
|
|
201 | (3) |
|
Analyzing the medium-size company's telemetry system |
|
|
204 | (2) |
|
8.3 Telemetry at the large-company stage |
|
|
206 | (7) |
|
Describing the large company's telemetry system |
|
|
209 | (1) |
|
Analyzing the large company's telemetry system |
|
|
210 | (3) |
|
8.4 Telemetry at the enterprise stage |
|
|
213 | (10) |
|
8.5 Looking back at all this growth |
|
|
223 | (3) |
|
|
226 | (22) |
|
9.1 Telemetry use in small organizations |
|
|
227 | (3) |
|
9.2 Telemetry use in medium-size organizations |
|
|
230 | (3) |
|
9.3 Telemetry use in large organizations |
|
|
233 | (6) |
|
9.4 Telemetry use in enterprise organizations |
|
|
239 | (9) |
|
10 Long-established business IT |
|
|
248 | (27) |
|
10.1 Telemetry use in medium-size organizations |
|
|
250 | (5) |
|
Telemetry use in office IT |
|
|
251 | (3) |
|
Telemetry use in production systems |
|
|
254 | (1) |
|
10.2 Telemetry use in large organizations |
|
|
255 | (7) |
|
10.3 Telemetry use in global organizations |
|
|
262 | (13) |
|
Telemetry use in the Booking and Passenger Manifest department |
|
|
265 | (4) |
|
Telemetry use in the Loyalty Programs department |
|
|
269 | (6) |
Part 3 Techniques For Handling Telemetry |
|
275 | (212) |
|
11 Optimizing for regular expressions at scale |
|
|
277 | (30) |
|
11.1 Anchoring expressions for speed |
|
|
279 | (6) |
|
11.2 Building expressions to fail fast |
|
|
285 | (5) |
|
11.3 Digging into the Cisco ASA firewall telemetry |
|
|
290 | (7) |
|
11.4 Refining emissions to speed regular-expression performance |
|
|
297 | (8) |
|
11.5 Additional regular-expression resources |
|
|
305 | (2) |
|
12 Standardized logging and event formats |
|
|
307 | (28) |
|
12.1 Implementing structured logging in your code |
|
|
309 | (5) |
|
12.2 Implementing standards in your code |
|
|
314 | (11) |
|
12.3 Implementing standards in the Shipping stage |
|
|
325 | (10) |
|
13 Using more nonfile emitting techniques |
|
|
335 | (22) |
|
13.1 Designing for socket- and datagram-based emitters |
|
|
336 | (8) |
|
13.2 Emitting and shipping for container- and serverless-based code |
|
|
344 | (6) |
|
Emitting and shipping from container-based code |
|
|
345 | (2) |
|
Emitting and shipping from serverless-based code |
|
|
347 | (3) |
|
13.3 Encrypting UDP-based telemetry |
|
|
350 | (7) |
|
14 Managing cardinality in telemetry |
|
|
357 | (27) |
|
14.1 Identifying cardinality problems |
|
|
359 | (7) |
|
Cardinality in time-series databases |
|
|
360 | (4) |
|
Cardinality in logging databases |
|
|
364 | (2) |
|
14.2 Lowering the cost of cardinality |
|
|
366 | (18) |
|
Use logging standards to contain cardinality |
|
|
366 | (7) |
|
Using storage-side methods to tame cardinality |
|
|
373 | (6) |
|
Make cardinality someone else's problem |
|
|
379 | (5) |
|
15 Ensuring telemetry integrity |
|
|
384 | (27) |
|
15.1 Getting telemetry out of reach of an attacker |
|
|
386 | (8) |
|
Move telemetry too fast to catch |
|
|
386 | (3) |
|
Use ACLs to enforce write-only telemetry |
|
|
389 | (4) |
|
Durable telemetry when using SaaS providers |
|
|
393 | (1) |
|
15.2 Making telemetry harder to mess with |
|
|
394 | (17) |
|
Using access control requirements to defend against attacks |
|
|
395 | (2) |
|
Ensuring configuration integrity in your telemetry systems |
|
|
397 | (3) |
|
|
400 | (11) |
|
16 Redacting and reprocessing telemetry |
|
|
411 | (28) |
|
16.1 Identifying toxic data and where it comes from |
|
|
412 | (4) |
|
16.2 Redacting toxic information spills |
|
|
416 | (7) |
|
16.3 Reprocessing telemetry to support upgrades |
|
|
423 | (6) |
|
16.4 Isolating toxic data to reduce cleanup costs |
|
|
429 | (10) |
|
17 Building policies for telemetry retention and aggregation |
|
|
439 | (24) |
|
17.1 Creating a retention policy |
|
|
440 | (8) |
|
Building a policy for centralized logging |
|
|
443 | (2) |
|
Building a policy for metrics |
|
|
445 | (1) |
|
Building a policy for distributed tracing |
|
|
446 | (1) |
|
Building a policy for SIEM systems |
|
|
447 | (1) |
|
17.2 Creating an aggregation policy |
|
|
448 | (9) |
|
17.3 Using sampling to reduce costs and increase retention |
|
|
457 | (6) |
|
18 Surviving legal processes |
|
|
463 | (24) |
|
18.1 Defining the eDiscovery process |
|
|
466 | (3) |
|
18.2 Dealing with records-retention requests |
|
|
469 | (8) |
|
Examining an ELK-based centralized logging system |
|
|
471 | (3) |
|
Examining a Sumo Logic-based centralized logging system |
|
|
474 | (3) |
|
18.3 Dealing with document-production requests |
|
|
477 | (5) |
|
Telemetry in the collection phase |
|
|
478 | (2) |
|
Telemetry in the review phase |
|
|
480 | (1) |
|
Telemetry in the production phase |
|
|
481 | (1) |
|
18.4 Working with lawyers |
|
|
482 | (5) |
Appendix A Telemetry Storage Systems |
|
487 | (12) |
Appendix B Recommendation Checklist Reference |
|
499 | (21) |
Appendix C Exercise Answers |
|
520 | (5) |
Index |
|
525 | |