
Some copies of this book have a printing error that causes the figures to be unreadable. If you have received one of these copies, please contact the Pragmatic Bookshelf at , and we will replace it for you.

Algorithms and data structures are much more than abstract concepts. Mastering them enables you to write code that runs faster and more efficiently, which is particularly important for today’s web and mobile apps. This book takes a practical approach to data structures and algorithms, with techniques and real-world scenarios that you can use in your daily production code. Graphics and examples make these computer science concepts understandable and relevant. You can use these techniques with any language; examples in the book are in JavaScript, Python, and Ruby.

Use Big O notation, the primary tool for evaluating algorithms, to measure and articulate the efficiency of your code, and modify your algorithm to make it faster. Find out how your choice of arrays, linked lists, and hash tables can dramatically affect the code you write. Use recursion to solve tricky problems and create algorithms that run exponentially faster than the alternatives. Dig into advanced data structures such as binary trees and graphs to help scale specialized applications such as social networks and mapping software. You’ll even encounter a single keyword that can give your code a turbo boost. Jay Wengrow brings to this book the key teaching practices he developed as a web development bootcamp founder and educator.

Use these techniques today to make your code faster and more scalable.

Table of contents :
Cover
Table of Contents
Preface
Who Is This Book For?
What’s New in the Second Edition
What’s in This Book?
How to Read This Book
Code Examples
Online Resources
Acknowledgments
Connecting
1. Why Data Structures Matter
Data Structures
The Array: The Foundational Data Structure
Measuring Speed
Reading
Searching
Insertion
Deletion
Sets: How a Single Rule Can Affect Efficiency
Wrapping Up
Exercises
2. Why Algorithms Matter
Ordered Arrays
Searching an Ordered Array
Binary Search
Binary Search vs. Linear Search
Wrapping Up
Exercises
3. O Yes! Big O Notation
Big O: How Many Steps Relative to N Elements?
The Soul of Big O
An Algorithm of the Third Kind
Logarithms
O(log N) Explained
Practical Examples
Wrapping Up
Exercises
4. Speeding Up Your Code with Big O
Bubble Sort
Bubble Sort in Action
The Efficiency of Bubble Sort
A Quadratic Problem
A Linear Solution
Wrapping Up
Exercises
5. Optimizing Code with and Without Big O
Selection Sort
Selection Sort in Action
The Efficiency of Selection Sort
Ignoring Constants
Big O Categories
Wrapping Up
Exercises
6. Optimizing for Optimistic Scenarios
Insertion Sort
Insertion Sort in Action
The Efficiency of Insertion Sort
The Average Case
A Practical Example
Wrapping Up
Exercises
7. Big O in Everyday Code
Mean Average of Even Numbers
Word Builder
Array Sample
Average Celsius Reading
Clothing Labels
Count the Ones
Palindrome Checker
Get All the Products
Password Cracker
Wrapping Up
Exercises
8. Blazing Fast Lookup with Hash Tables
Hash Tables
Hashing with Hash Functions
Building a Thesaurus for Fun and Profit, but Mainly Profit
Hash Table Lookups
Dealing with Collisions
Making an Efficient Hash Table
Hash Tables for Organization
Hash Tables for Speed
Wrapping Up
Exercises
9. Crafting Elegant Code with Stacks and Queues
Stacks
Abstract Data Types
Stacks in Action
The Importance of Constrained Data Structures
Queues
Queues in Action
Wrapping Up
Exercises
10. Recursively Recurse with Recursion
Recurse Instead of Loop
The Base Case
Reading Recursive Code
Recursion in the Eyes of the Computer
Filesystem Traversal
Wrapping Up
Exercises
11. Learning to Write in Recursive
Recursive Category: Repeatedly Execute
Recursive Category: Calculations
Top-Down Recursion: A New Way of Thinking
The Staircase Problem
Anagram Generation
Wrapping Up
Exercises
12. Dynamic Programming
Unnecessary Recursive Calls
The Little Fix for Big O
The Efficiency of Recursion
Overlapping Subproblems
Dynamic Programming through Memoization
Dynamic Programming through Going Bottom-Up
Wrapping Up
Exercises
13. Recursive Algorithms for Speed
Partitioning
Quicksort
The Efficiency of Quicksort
Quicksort in the Worst-Case Scenario
Quickselect
Sorting as a Key to Other Algorithms
Wrapping Up
Exercises
14. Node-Based Data Structures
Linked Lists
Implementing a Linked List
Reading
Searching
Insertion
Deletion
Efficiency of Linked List Operations
Linked Lists in Action
Doubly Linked Lists
Queues as Doubly Linked Lists
Wrapping Up
Exercises
15. Speeding Up All the Things with Binary Search Trees
Trees
Binary Search Trees
Searching
Insertion
Deletion
Binary Search Trees in Action
Binary Search Tree Traversal
Wrapping Up
Exercises
16. Keeping Your Priorities Straight with Heaps
Priority Queues
Heaps
Heap Properties
Heap Insertion
Looking for the Last Node
Heap Deletion
Heaps vs. Ordered Arrays
The Problem of the Last Node…Again
Arrays as Heaps
Heaps as Priority Queues
Wrapping Up
Exercises
17. It Doesn't Hurt to Trie
Tries
Storing Words
Trie Search
The Efficiency of Trie Search
Trie Insertion
Building Autocomplete
Completing Autocomplete
Tries with Values: A Better Autocomplete
Wrapping Up
Exercises
18. Connecting Everything with Graphs
Graphs
Directed Graphs
Object-Oriented Graph Implementation
Graph Search
Depth-First Search
Breadth-First Search
The Efficiency of Graph Search
Weighted Graphs
Dijkstra’s Algorithm
Wrapping Up
Exercises
19. Dealing with Space Constraints
Big O of Space Complexity
Trade-Offs Between Time and Space
The Hidden Cost of Recursion
Wrapping Up
Exercises
20. Techniques for Code Optimization
Prerequisite: Determine Your Current Big O
Start Here: The Best-Imaginable Big O
Magical Lookups
Recognizing Patterns
Greedy Algorithms
Change the Data Structure
Wrapping Up
Parting Thoughts
Exercises
A1. Exercise Solutions
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17
Chapter 18
Chapter 19
Chapter 20
Index
– A –
– B –
– C –
– D –
– E –
– F –
– G –
– H –
– I –
– J –
– K –
– L –
– M –
– N –
– O –
– P –
– Q –
– R –
– S –
– T –
– U –
– V –
– W –

Early Praise for A Common-Sense Guide to Data Structures and Algorithms, Second Edition

If you—like me—got into coding without the benefit of a “traditional” computer science education, this book is a fantastic way to pick up the basics of how to think about algorithms, as well as how to use and implement a range of common data structures. All that, presented in plain language with a minimum of jargon, and a fun, pleasant tone! A great read for folks looking to pick up foundational programming knowledge.

➤ John Anderson, VP, Technology, Infinity Interactive

I’ve been coding for well over 30 years and one thing I’ve learned is to ALWAYS return to basics. A Common-Sense Guide to Data Structures and Algorithms has been a great way for me to re-assert all the assumptions I’ve made in the past, and re-cement my understanding of the core skills I’ll use in the future. Don’t buy this book if you think it’ll help you on whiteboarding tests. Those tests suck anyway. Buy this book to continue a path to real algorithmic thinking. Whether you’re early in your career or further along like me, you want a full toolbox of data structures and the common (and not so common) algorithms that complement them. Even better, you’ll learn how, when, and why to optimize your code. You’ll make it work, make it fast, then make it elegant—and learn the trade-offs along the way.

➤ Scott Hanselman, Microsoft Programmer, Professor, Blogger, Podcaster

Despite being a software professional for fifteen years, I’ve learnt a great deal from this updated edition. This is also the book I wish I’d had twenty years ago, when learning these concepts as an undergraduate. It’s like I have a new superpower tuning me in to the opportunities hash tables can provide to improve the Big O of code. Forget what it looks like, forget what it feels like, forget what you think it is, forget what habits you’ve accumulated: forget all that stuff; it won’t give you the most efficient Big O!

➤ Nigel Lowry, Principal Consultant, Lemmata

The second edition of this book includes many worthy additions to an already excellent resource for learning data structures and algorithms. Additional commonsense explanations of topics like dynamic programming, and questions to check your understanding at the end of each chapter, make this book invaluable to developers with or without a computer science background.

➤ Jason Pike, Senior Software Engineer, KEYSYS Consulting

The perfect introduction to algorithms and structures for beginners. Highly recommended!

➤ Brian Schau, Lead Developer, Schau Consulting

A Common-Sense Guide to Data Structures and Algorithms, Second Edition
Level Up Your Core Programming Skills

Jay Wengrow

The Pragmatic Bookshelf Raleigh, North Carolina

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and The Pragmatic Programmers, LLC was aware of a trademark claim, the designations have been printed in initial capital letters or in all capitals. The Pragmatic Starter Kit, The Pragmatic Programmer, Pragmatic Programming, Pragmatic Bookshelf, PragProg and the linking g device are trademarks of The Pragmatic Programmers, LLC. Every precaution was taken in the preparation of this book. However, the publisher assumes no responsibility for errors or omissions, or for damages that may result from the use of information (including program listings) contained herein. Our Pragmatic books, screencasts, and audio books can help you and your team create better software and have more fun. Visit us at https://pragprog.com. The team that produced this book includes: Publisher: Andy Hunt VP of Operations: Janet Furlow Executive Editor: Dave Rankin Development Editor: Brian MacDonald Copy Editor: Katharine Dvorak Indexing: Potomac Indexing, LLC Layout: Gilson Graphics For sales, volume licensing, and support, please contact [email protected] For international rights, please contact [email protected]

Copyright © 2020 The Pragmatic Programmers, LLC. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher. ISBN-13: 978-1-68050-722-5 Encoded using the finest acid-free high-entropy binary digits. Book version: P1.0—August 2020


Preface

Data structures and algorithms are more than abstract concepts. Mastering them enables you to write code that is efficient, leading to software that runs faster and consumes less memory. This is a big deal for today’s software applications, which exist on increasingly mobile platforms and handle increasingly greater amounts of data.

The problem with most resources on these subjects, though, is that they are...well...obtuse. Most texts go heavy on the math jargon, and if you are not a mathematician, it can be difficult to grasp what on Earth is going on. Even books that claim to make algorithms “easy” seem to assume that the reader has an advanced math degree. Because of this, too many people shy away from these concepts, feeling that they are not “smart” enough to understand them.

The truth, though, is that everything about data structures and algorithms boils down to common sense. Mathematical notation itself is simply a particular language, and everything in math can also be explained with common-sense terminology. In this book, I use that common-sense language (plus a lot of diagrams!) to explain these concepts in simple, and dare I say, enjoyable ways.

Once you understand these concepts, you will be equipped to write code that is efficient, fast, and elegant. You will be able to weigh the pros and cons of various code alternatives, and be able to make educated decisions as to which code is best for the given situation.

In this book, I go out of my way to make these concepts real and practical with ideas that you can make use of today. Sure, you’ll learn some really cool computer science along the way. But this book is about taking that seemingly abstract stuff and making it directly practical. You’ll be writing better code and faster software by the time you’re done reading this book.


Who Is This Book For?

This book is ideal for several audiences:

• You are a computer science student who wants a text that explains data structures and algorithms in plain English. This book can serve as a supplement to whatever “classic” textbook you happen to be using.

• You are a beginning developer who knows basic programming, but wants to learn the fundamentals of computer science to write better code and increase your programming knowledge and skills.

• You are a self-taught developer who has never studied formal computer science (or a developer who did but forgot everything!) and wants to leverage the power of data structures and algorithms to write more scalable and elegant code.

Whoever you may be, I tried to write this book so it can be accessed and enjoyed by people of all skill levels.

What’s New in the Second Edition

Why a second edition? In the few years since the original edition was published, I’ve had the opportunity to teach these topics to various audiences. Over time, I continued to refine my explanations as well as discover additional topics that I thought were exciting and important. There was also quite a bit of demand for hands-on exercises that would help people gain practice with these concepts.

Accordingly, the second edition has the following new features:

1. Revised material. I’ve made significant revisions to the original chapters for the sake of clarity. While I thought that the first edition did a pretty good job in making these complex topics easy to understand, I found that there was room to make certain areas even clearer. Many sections of the original chapters have been entirely rewritten, and I’ve added brand-new sections as well. I feel that these revisions alone have enhanced the book to make it worthy of a new edition.

2. New chapters and topics. The second edition contains six new chapters that cover topics I’m particularly excited about. The book has always blended practice with theory, but I added even more material that you can take straight to the bank. The chapters Big O in Everyday Code and Techniques for Code Optimization focus exclusively


on day-to-day code, and how knowledge of data structures and algorithms can help you write more efficient software.

I went all out when it came to recursion. While the previous edition contained a chapter on this topic, I devoted an entirely new chapter, Learning to Write in Recursive, to teach you how to write recursive code, which can be confusing for beginners. I haven’t seen this explained anywhere else, and I think it’s a unique and valuable addition. I also added the chapter, Dynamic Programming, which is a popular subject, and one critical to making recursive code more efficient.

There are many data structures out there, and it’s difficult to select which to include and which to leave out. However, there’s been increased demand to learn about heaps and tries, and I’ve found them to be fascinating as well. Hence the chapters, Keeping Your Priorities Straight with Heaps and It Doesn't Hurt to Trie.

3. Exercises and solutions. Each chapter now has a number of exercises that allow you to gain hands-on practice with each topic within the book. I also added detailed solutions, which are available in an appendix at the back of the book. This is a significant enhancement that transforms the book into a more complete learning experience.

What’s in This Book?

As you may have guessed, this book talks quite a bit about data structures and algorithms. More specifically, the book is laid out as follows:

In Why Data Structures Matter and Why Algorithms Matter, I explain what data structures and algorithms are and explore the concept of time complexity—which is used to determine how fast an algorithm is. In the process, I also talk a great deal about arrays, sets, and binary search.

In O Yes! Big O Notation, I unveil Big O Notation and explain it in terms that are easy to understand. We use this notation throughout the book, so this chapter is pretty important.

In Speeding Up Your Code with Big O, Optimizing Code with and Without Big O, and Optimizing for Optimistic Scenarios, we delve further into Big O Notation and use it to make our day-to-day code faster. Along the way, I cover various sorting algorithms, including Bubble Sort, Selection Sort, and Insertion Sort.

In Big O in Everyday Code, you apply all that you learned about Big O Notation and analyze the efficiency of code from the real world.


In Blazing Fast Lookup with Hash Tables and Crafting Elegant Code, I discuss a few additional data structures, including hash tables, stacks, and queues. I show how they impact the speed and elegance of our code and how we can use them to solve real-world problems.

Recursively Recurse with Recursion introduces recursion, an anchor concept in the world of computer science. We break it down in this chapter and see how it can be a great tool for certain situations.

Learning to Write in Recursive teaches you how to write recursive code, which can otherwise be quite confusing. Dynamic Programming shows you how to optimize recursive code and prevent it from spiraling out of control. And Recursive Algorithms for Speed shows you how to use recursion as the foundation for turbo-fast algorithms like Quicksort and Quickselect, and takes your algorithm-development skills up a few notches.

The following chapters, Node-Based Data Structures, Speeding Up All the Things, Keeping Your Priorities Straight with Heaps, It Doesn't Hurt to Trie, and Connecting Everything with Graphs, explore node-based data structures including the linked list, the binary tree, the heap, the trie, and the graph, and show how each is ideal for various applications.

Dealing with Space Constraints explores space complexity, which is important when programming for devices with relatively small amounts of disk space, or when dealing with big data.

The final chapter, Techniques for Code Optimization, walks you through various practical techniques for optimizing the efficiency of code, and gives you new ideas for improving the code you write every day.

How to Read This Book

You’ve got to read this book in order. There are books out there where you can read each chapter independently and skip around a bit, but this isn’t one of them. Each chapter assumes you’ve read the previous ones, and the book is carefully constructed so you can ramp up your understanding as you proceed.

That being said, there are certain chapters in the latter half of the book that don’t depend entirely on each other. The diagram on page xvii depicts which chapters are prerequisites for other chapters. For example, you could technically skip from Chapter 10 to Chapter 13 if you wanted to. (Oh! And this diagram is based on a data structure called a tree. You’re going to learn about it in Chapter 15.)


Another important note: to make this book easy to understand, I don’t always reveal everything about a particular concept when I first introduce it. Sometimes, the best way to break down a complex concept is to reveal a small piece of it, and only reveal the next piece when the first piece has sunken in. If I


define a particular term as such-and-such, don’t take that as the textbook definition until you’ve completed the entire section on that topic. It’s a trade-off: to make the book digestible, I’ve chosen to oversimplify certain concepts at first and clarify them over time, rather than ensure that every sentence is completely, academically, accurate. But don’t worry too much, because by the end, you’ll see the entire accurate picture.

Code Examples

The concepts in this book aren’t specific to one particular programming language. Because of this, I chose to use a variety of languages to demo the examples in this book. The examples are written in Ruby, Python, and JavaScript, so having a basic understanding of these languages would be helpful.

That being said, I’ve tried to write the examples in such a way that even if you’re not familiar with the language of a given example, you should be able to follow along. To help with this, I don’t always follow the popular idioms for each language where I feel that an idiom may confuse someone new to that particular language.

I understand that having the book switch languages frequently can come with a certain mental switching cost. However, I felt that it was important to keep the book language-agnostic. And again, I’ve tried to keep the code in all languages easy to read and straightforward.

There are some longer code snippets under the headings that read “Code Implementation.” I certainly encourage you to study these code samples, but you don’t necessarily have to understand every last line in order to proceed to the next section of the book. If these long pieces of code are bogging you down, just skip (or skim) them for now.

Finally, it’s important to note that not every code snippet is “production ready.” My greatest focus has been to clarify the concept at hand, and while I did also try to make the code generally complete, I may not have accounted for every edge case. There’s certainly room for you to optimize the code further—so feel free to go crazy with that!

Online Resources

This book has its own web page,¹ on which you can find more information about the book and help improve it by reporting errata, including content suggestions and typos.

1. https://pragprog.com/titles/jwdsal2


Acknowledgments

While the task of writing a book may seem like a solitary one, this book simply could not have happened without the many people who have supported me in my journey writing it. I’d like to personally thank all of you.

To my wonderful wife, Rena—thank you for the time and emotional support you’ve given to me. You took care of everything while I hunkered down like a recluse and wrote. To my adorable kids, Tuvi, Leah, Shaya, and Rami—thank you for your patience as I wrote my book on “algorizms.” And yes—it’s finally done.

To my parents, Mr. and Mrs. Howard and Debbie Wengrow—thank you for initially sparking my interest in computer programming and helping me pursue it. Little did you know that getting me a computer tutor for my ninth birthday would set the foundation for my career—and now this book. To my wife’s parents, Mr. and Mrs. Paul and Kreindel Pinkus—thank you for your continued support of my family and me. Your wisdom and warmth mean so much to me.

When I first submitted my manuscript to the Pragmatic Bookshelf, I thought it was good. However, through the expertise, suggestions, and demands of all the wonderful people who work there, the book has become something much, much better than I could have written on my own. To my editor, Brian MacDonald—you’ve shown me how a book should be written, and your insights have sharpened each chapter; this book has your imprint all over it. To the original managing editor, Susannah Pfalzer, and executive editor, Dave Rankin—you’ve given me the vision for what this book could be, taking my theory-based manuscript and transforming it into a book that can be applied to the everyday programmer. To the publishers, Andy Hunt and Dave Thomas—thank you for believing in this book and making the Pragmatic Bookshelf the most wonderful publishing company to write for.

To the extremely talented software developer and artist, Colleen McGuckin—thank you for taking my chicken scratch and transforming it into beautiful digital imagery. This book would be nothing without the spectacular visuals that you’ve created with such skill and attention to detail.

I’ve been fortunate that so many experts have reviewed this book. Your feedback has been extremely helpful and has made sure that this book can be as accurate as possible. I’d like to thank all of you for your contributions.


The reviewers for the first edition were: Alessandro Bahgat, Ivo Balbaert, Alberto Boschetti, Javier Collado, Mohamed Fouad, Derek Graham, Neil Hainer, Peter Hampton, Rod Hilton, Jeff Holland, Jessica Janiuk, Aaron Kalair, Stephan Kämper, Arun S. Kumar, Sean Lindsay, Nigel Lowry, Joy McCaffrey, Daivid Morgan, Jasdeep Narang, Stephen Orr, Kenneth Parekh, Jason Pike, Sam Rose, Frank Ruiz, Brian Schau, Tibor Simic, Matteo Vaccari, Stephen Wolff, and Peter W. A. Wood. The reviewers for the second edition are: Rinaldo Bonazzo, Mike Browne, Craig Castelaz, Jacob Chae, Zulfikar Dharmawan, Ashish Dixit, Dan Dybas, Emily Ekhdal, Derek Graham, Rod Hilton, Jeff Holland, Grant Kazan, Sean Lindsay, Nigel Lowry, Dary Merckens, Kevin Mitchell, Nouran Mhmoud, Daivid Morgan, Brent Morris, Emanuele Origgi, Jason Pike, Ayon Roy, Brian Schau, Mitchell Volk, and Peter W. A. Wood. In addition to the official reviewers, I’d also like to thank all the beta book readers who provided feedback as I continued to write and edit the book. Your suggestions, comments, and questions have been invaluable. I’d also like to thank all the staff, students, and alumni at Actualize for your support. This book was originally an Actualize project, and you’ve all contributed in various ways. I’d like to particularly thank Luke Evans for giving me the idea to write this book. Thank you all for making this book a reality.

Connecting

I enjoy connecting with my readers, and invite you to find me on LinkedIn.² I’d gladly accept your connection request—just send a message that you’re a reader of this book. I look forward to hearing from you!

Jay Wengrow
[email protected]

May, 2020

2. https://www.linkedin.com/in/jaywengrow


CHAPTER 1

Why Data Structures Matter

When people first learn to code, their focus is—and should be—on getting their code to run properly. Their code is measured using one simple metric: does the code actually work?

As software engineers gain more experience, though, they begin to learn about additional layers and nuances regarding the quality of their code. They learn that there can be two snippets of code that both accomplish the same task, but that one snippet is better than the other.

There are numerous measures of code quality. One important measure is code maintainability. Maintainability of code involves aspects such as the readability, organization, and modularity of one’s code. However, there’s another aspect of high-quality code, and that is code efficiency. For example, you can have two code snippets that both achieve the same goal, but one runs faster than the other.

Take a look at these two functions, both of which print all the even numbers from 2 to 100:

def print_numbers_version_one():
    number = 2
    while number ...

...

    elsif search_value > value_at_midpoint
      lower_bound = midpoint + 1
    end
  end

  # If we've narrowed the bounds until they've reached each other, that
  # means that the value we're searching for is not contained within
  # this array:
  return nil
end

Let’s break this down. As with the linear_search method, binary_search accepts the array and the search_value as arguments. Here’s an example of how to call this method:

p binary_search([3, 17, 75, 80, 202], 22)

The method first establishes the range of indexes in which the search_value might be found. We do this with:

lower_bound = 0
upper_bound = array.length - 1

Because, when starting our search, the search_value might be found anywhere within the entire array, we establish the lower_bound as the first index and the upper_bound as the last index.

The essence of the search takes place within the while loop:

while lower_bound ...

...

            if list[i] > list[i+1]:
                list[i], list[i+1] = list[i+1], list[i]
                sorted = False

        unsorted_until_index -= 1

    return list
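Pieced together from the tail shown above and the line-by-line walkthrough that follows, the complete bubble_sort function looks roughly like this (my reconstruction; the book's exact listing may differ slightly in comments and spacing):

def bubble_sort(list):
    # Track the rightmost index of the array that has not yet been sorted:
    unsorted_until_index = len(list) - 1
    # Track whether the array is fully sorted:
    sorted = False

    while not sorted:
        # Assume the array is sorted until we encounter a swap:
        sorted = True

        for i in range(unsorted_until_index):
            # Compare each pair of adjacent values, swapping them
            # if they're out of order:
            if list[i] > list[i+1]:
                list[i], list[i+1] = list[i+1], list[i]
                sorted = False

        # The value bubbled up to the right is now in its correct spot:
        unsorted_until_index -= 1

    return list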

To use this function, we can pass an unsorted array to it, like so:

print(bubble_sort([65, 55, 45, 35, 25, 15, 10]))

This function will then return the sorted array. Let’s break the function down line by line to see how it works. I’ll explain each line by first providing the explanation, followed by the line of code itself. The first thing we do is create a variable called unsorted_until_index. This keeps track of the rightmost index of the array that has not yet been sorted. When we first start the algorithm, the array is completely unsorted, so we initialize this variable to be the final index in the array:


unsorted_until_index = len(list) - 1

We also create a variable called sorted that will keep track of whether the array is fully sorted. Of course, when our code first runs, it isn’t, so we set it to False:

sorted = False

We begin a while loop that continues to run as long as the array is not sorted. Each round of this loop represents a pass-through of the array:

while not sorted:

Next, we preliminarily establish sorted to be True:

sorted = True

The approach here is that in each pass-through, we’ll assume the array is sorted until we encounter a swap, in which case we’ll change the variable back to False. If we get through an entire pass-through without having to make any swaps, sorted will remain True, and we’ll know that the array is completely sorted.

Within the while loop, we begin a for loop in which we point to each pair of values in the array. We use the variable i as our first pointer, and it starts from the beginning of the array and goes until the index that has not yet been sorted:

for i in range(unsorted_until_index):

Within this loop, we compare each pair of adjacent values and swap those values if they’re out of order. We also change sorted to False if we have to make a swap:

if list[i] > list[i+1]:
    list[i], list[i+1] = list[i+1], list[i]
    sorted = False

At the end of each pass-through, we know that the value we bubbled up all the way to the right is now in its correct position. Because of this, we decrement the unsorted_until_index by 1, since the index it was already pointing to is now sorted:

unsorted_until_index -= 1

The while loop ends once sorted is True, meaning the array is completely sorted. Once this is the case, we return the sorted array:

return list

The Efficiency of Bubble Sort

The Bubble Sort algorithm contains two significant kinds of steps:


• Comparisons: two numbers are compared with one another to determine which is greater.

• Swaps: two numbers are swapped with one another in order to sort them.

Let’s start by determining how many comparisons take place in Bubble Sort. Our example array has five elements. Looking back, you can see that in our first pass-through, we had to make four comparisons between sets of two numbers. In our second pass-through, we only had to make three comparisons. This is because we didn’t have to compare the final two numbers, since we knew that the final number was in the correct spot due to the first pass-through. In our third pass-through, we made two comparisons, and in our fourth pass-through, we made just one comparison.

So, that’s: 4 + 3 + 2 + 1 = 10 comparisons.

To put this in a way that would hold true for arrays of all sizes, we’d say that for N elements, we make (N - 1) + (N - 2) + (N - 3) … + 1 comparisons.

Now that we’ve analyzed the number of comparisons that take place in Bubble Sort, let’s analyze the swaps. In a worst-case scenario, where the array is sorted in descending order (the exact opposite of what we want), we’d actually need a swap for each comparison. So, we’d have 10 comparisons and 10 swaps in such a scenario for a grand total of 20 steps.

Let’s look at the big picture. With an array containing five values in reverse order, we make 4 + 3 + 2 + 1 = 10 comparisons. Along with the 10 comparisons, we also have 10 swaps, totaling 20 steps.

For such an array with 10 values, we get 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 45 comparisons, and another 45 swaps. That’s a total of 90 steps. With an array containing 20 values, we’d have: 19 + 18 + 17 + 16 + 15 + 14 + 13 + 12 + 11 + 10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 190 comparisons, and approximately 190 swaps, for a total of 380 steps.
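If you'd like to verify these totals, a short helper reproduces them (a sketch of my own, not from the book; the function name is made up for illustration):

def bubble_sort_max_steps(n):
    # (N - 1) + (N - 2) + ... + 1 comparisons, and in the worst case
    # one swap for every comparison, so double the total:
    comparisons = n * (n - 1) // 2
    return comparisons * 2

for n in [5, 10, 20, 40, 80]:
    print(n, bubble_sort_max_steps(n))  # prints 20, 90, 380, 1560, 6320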


Notice the inefficiency here. As the number of elements increases, the number of steps grows dramatically. We can see this clearly in the following table:

N Data Elements | Max # of Steps
5               | 20
10              | 90
20              | 380
40              | 1560
80              | 6320

If you look at the growth of steps as N increases, you’ll see that it’s growing by approximately N². Take a look at the following table:

N Data Elements | # of Bubble Sort Steps | N²
5               | 20                     | 25
10              | 90                     | 100
20              | 380                    | 400
40              | 1560                   | 1600
80              | 6320                   | 6400

Let’s express the time complexity of Bubble Sort with Big O Notation. Remember, Big O always answers the key question: if there are N data elements, how many steps will the algorithm take? Because for N values, Bubble Sort takes roughly N² steps, in Big O, we say that Bubble Sort has an efficiency of O(N²).

O(N²) is considered to be a relatively inefficient algorithm, since as the data increases, the steps increase dramatically. Look at this graph, which compares O(N²) against the faster O(N):


Note how O(N²) curves sharply upward in terms of number of steps as the data grows. Compare this with O(N), which plots along a simple, diagonal line.

One last note: O(N²) is also referred to as quadratic time.
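To see that contrast concretely, a couple of lines of Python print the two growth rates side by side (my own sketch, not from the book):

# O(N) grows in step with N; O(N^2) grows with N squared:
for n in [10, 100, 1000]:
    print(n, n ** 2)  # prints 10 100, then 100 10000, then 1000 1000000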

A Quadratic Problem

Here’s a practical example of where we can replace a slow O(N²) algorithm with a speedy O(N) one. Let’s say you’re working on a JavaScript application that analyzes the ratings people give to products, where users leave ratings from 1 to 10. Specifically, you’re writing a function that checks whether an array of ratings contains any duplicate numbers. This will be used in more complex calculations in other parts of the software.

For example, the array [1, 5, 3, 9, 1, 4] has two instances of the number 1, so we’d return true to indicate that the array has a case of duplicate numbers.

One of the first approaches that may come to mind is the use of nested loops, as follows:

function hasDuplicateValue(array) {
  for(let i = 0; i < array.length; i++) {
    for(let j = 0; j < array.length; j++) {
      if(i !== j && array[i] === array[j]) {
        return true;
      }
    }
  }
  return false;
}

In this function, we iterate through each value of the array using the variable i. As we focus on each value in i, we then run a second loop that looks through all the values in the array—using j—and checks if the values at positions i and j are the same. If they are, it means we’ve encountered duplicate values and we return true. If we get through all of the looping and we haven’t encountered any duplicates, we return false, since we know that there are no duplicates in the array. While this certainly works, is it efficient? Now that we know a bit about Big O Notation, let’s take a step back and see what Big O would say about this function. Remember that Big O expresses how many steps the algorithm takes relative to N data values. To apply this to our situation, we’d ask ourselves: for N


values in the array provided to our hasDuplicateValue function, how many steps would our algorithm take in a worst-case scenario?

To answer the preceding question, we need to analyze what steps our function takes, as well as what the worst-case scenario would be. The preceding function has one type of step, namely comparisons. It repeatedly compares array[i] and array[j] to see if they are equal and therefore represent a duplicate pair. In a worst-case scenario, the array contains no duplicates, which would force our code to complete all of the loops and exhaust every possible comparison before returning false.

Based on this, we can conclude that for N values in the array, our function would perform N² comparisons. This is because we perform an outer loop that must iterate N times to get through the entire array, and for each iteration, we must iterate another N times with our inner loop. That’s N steps * N steps, which is N² steps, leaving us with an algorithm of O(N²).

We can actually prove that our function takes N² steps by adding some code to our function that tracks the algorithm’s number of steps:

function hasDuplicateValue(array) {
  let steps = 0; // count of steps
  for(let i = 0; i < array.length; i++) {
    for(let j = 0; j < array.length; j++) {
      steps++; // increment number of steps
      if(i !== j && array[i] === array[j]) {
        return true;
      }
    }
  }
  console.log(steps); // print number of steps if no duplicates
  return false;
}

This added code will print the number of steps taken when there are no duplicates. If we run hasDuplicateValue([1, 4, 5, 2, 9]), for example, we’ll see an output of 25 in the JavaScript console, indicating that there were 25 comparisons for the 5 elements in the array. If we test this for other values, we’ll see that the output is always the size of the array squared. This is classic O(N²).

Very often (but not always), when an algorithm nests one loop inside another, the algorithm is O(N²). So, whenever you see a nested loop, O(N²) alarm bells should go off in your head.

Now, the fact that our function is O(N²) should give us pause. This is because O(N²) is considered a relatively slow algorithm. Whenever you encounter a


slow algorithm, it’s worth spending some time to consider whether there are any faster alternatives. There may not be any better alternatives, but let’s first make sure.

A Linear Solution

Following is another implementation of the hasDuplicateValue function that doesn’t rely on nested loops. It’s a bit clever, so let’s first look at how it works and then we’ll see if it’s any more efficient than our first implementation.

function hasDuplicateValue(array) {
  let existingNumbers = [];
  for(let i = 0; i < array.length; i++) {
    if(existingNumbers[array[i]] === 1) {
      return true;
    } else {
      existingNumbers[array[i]] = 1;
    }
  }
  return false;
}

Here’s what this function does. It creates an array called existingNumbers, which starts out empty. Then, we use a loop to check each number in the array. As it encounters each number, it places an arbitrary value (we’ve chosen to use a 1) in the existingNumbers array at the index of the number we’re encountering. For example, let’s say our array is [3, 5, 8]. When we encounter the 3, we place a 1 at index 3 of existingNumbers. So, the existingNumbers array will now be the rough equivalent of this: [undefined, undefined, undefined, 1]

There’s now a 1 at index 3 of existingNumbers, to indicate and remember for the future that we’ve already encountered a 3 in our given array. When our loop then encounters the 5 from the given array, it adds a 1 to index 5 of existingNumbers: [undefined, undefined, undefined, 1, undefined, 1]

Finally, when we reach the 8, existingNumbers will now look like this: [undefined, undefined, undefined, 1, undefined, 1, undefined, undefined, 1]

Essentially, we’re using the indexes of existingNumbers to remember which numbers from the array we’ve seen so far.


Now, here’s the real trick. Before the code stores a 1 in the appropriate index, it first checks to see whether that index already has a 1 as its value. If it does, this means we’ve already encountered that number, meaning we found a duplicate. If this is the case, we simply return true and cut the function short. If we get to the end of the loop without having returned true, it means there are no duplicates and we return false.

To determine the efficiency of this new algorithm in terms of Big O, we once again need to determine the number of steps the algorithm takes in a worst-case scenario. Here, the significant type of step is looking at each number and checking whether the value of its index in existingNumbers is a 1:

if(existingNumbers[array[i]] === 1)

(In addition to the comparisons, we also make insertions into the existingNumbers array, but we’re considering that kind of step trivial in this analysis. More on this in the next chapter.)

In terms of the worst-case scenario, such a scenario would occur when the array contains no duplicates, in which case our function must complete the entire loop. This new algorithm appears to make N comparisons for N data elements. This is because there’s only one loop, and it simply iterates for as many numbers as there are in the array. We can test out this theory by tracking the steps in the JavaScript console:

function hasDuplicateValue(array) {
  let steps = 0;
  let existingNumbers = [];
  for(let i = 0; i < array.length; i++) {
    steps++;
    if(existingNumbers[array[i]] === 1) {
      return true;
    } else {
      existingNumbers[array[i]] = 1;
    }
  }
  console.log(steps);
  return false;
}

If we run hasDuplicateValue([1, 4, 5, 2, 9]) now, we’ll see that the output in the JavaScript console is 5, which is the same as the size of our array. We’d find this to be true across arrays of all sizes. This algorithm, then, is O(N).


We know that O(N) is much faster than O(N²), so by using this second approach, we’ve optimized our hasDuplicateValue function significantly. This is a huge speed boost.

(There is actually one disadvantage with this new implementation, namely that this approach will consume more memory than the first approach. Don’t worry about this for now; we’ll discuss this at length in Dealing with Space Constraints.)
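The same idea of spending a little memory to save a lot of time translates directly to other languages. For instance, here is a rough Python sketch of the same technique, using a set to remember the values seen so far (my own illustration, not a listing from the book):

def has_duplicate_value(array):
    # Remember each value we've seen so far:
    existing_numbers = set()
    for value in array:
        if value in existing_numbers:
            # We've seen this value before, so there's a duplicate:
            return True
        existing_numbers.add(value)
    return False

print(has_duplicate_value([1, 5, 3, 9, 1, 4]))  # True
print(has_duplicate_value([1, 4, 5, 2, 9]))     # False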

Wrapping Up

It’s clear that having a solid understanding of Big O Notation can enable you to identify slow code and select the faster of two competing algorithms. However, there are situations in which Big O Notation will have us believe that two algorithms have the same speed, while one is actually faster. In the next chapter, you’re going to learn how to evaluate the efficiencies of various algorithms even when Big O isn’t nuanced enough to do so.

Exercises

The following exercises provide you with the opportunity to practice with speeding up your code. The solutions to these exercises are found in the section, Chapter 4, on page 441.

1. Replace the question marks in the following table to describe how many steps occur for a given number of data elements across various types of Big O:

N Elements | O(N) | O(log N) | O(N²)
100        | 100  | ?        | ?
2000       | ?    | ?        | ?

2. If we have an O(N²) algorithm that processes an array and find that it takes 256 steps, what is the size of the array?

3. Use Big O Notation to describe the time complexity of the following function. It finds the greatest product of any pair of two numbers within a given array:

def greatestProduct(array):
    greatestProductSoFar = array[0] * array[1]

    for i, iVal in enumerate(array):
        for j, jVal in enumerate(array):
            if i != j and iVal * jVal > greatestProductSoFar:
                greatestProductSoFar = iVal * jVal

    return greatestProductSoFar


4. The following function finds the greatest single number within an array, but has an efficiency of O(N²). Rewrite the function so that it becomes a speedy O(N):

def greatestNumber(array):
    for i in array:
        # Assume for now that i is the greatest:
        isIValTheGreatest = True

        for j in array:
            # If we find another value that is greater than i,
            # i is not the greatest:
            if j > i:
                isIValTheGreatest = False

        # If, by the time we checked all the other numbers, i
        # is still the greatest, it means that i is the greatest number:
        if isIValTheGreatest:
            return i


CHAPTER 5

Optimizing Code with and Without Big O

We’ve seen that Big O Notation is a great tool for comparing algorithms and determining which algorithm should be used for a given situation. However, it’s certainly not the only tool. In fact, there may be times when two competing algorithms are described in the same way using Big O, yet one algorithm is faster than the other.

In this chapter, you’re going to learn how to discern between two algorithms that seem to have the same efficiency, and how to select the faster of the two.

Selection Sort

In the previous chapter, we explored a sorting algorithm known as Bubble Sort, which had an efficiency of O(N²). We’re now going to dig into another sorting algorithm called Selection Sort, and see how it measures up to Bubble Sort.

The steps of Selection Sort are as follows:

1. We check each cell of the array from left to right to determine which value is least. As we move from cell to cell, we keep track of the lowest value we’ve encountered so far. (We’ll do this by storing its index in a variable.) If we encounter a cell that contains a value that is even lower than the one in our variable, we replace it so that the variable now points to the new index. See the following diagram:


2. Once we’ve determined which index contains the lowest value, we swap its value with the value we began the pass-through with. This would be index 0 in the first pass-through, index 1 in the second pass-through, and so on. The diagram here illustrates making the swap of the first pass-through.

3. Each pass-through consists of Steps 1 and 2. We repeat the pass-throughs until we reach a pass-through that would start at the end of the array. By this point, the array will have been fully sorted. (A short code sketch of these steps follows this list.)
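Before walking through an example, here is a compact sketch of those steps in Python (my own illustration; the book's full implementation, in JavaScript, appears later in this chapter under Code Implementation: Selection Sort):

def selection_sort(array):
    for i in range(len(array) - 1):
        # Step 1: find the index of the lowest value in the rest of the array:
        lowest_number_index = i
        for j in range(i + 1, len(array)):
            if array[j] < array[lowest_number_index]:
                lowest_number_index = j

        # Step 2: swap the lowest value into the position we started from:
        if lowest_number_index != i:
            array[i], array[lowest_number_index] = array[lowest_number_index], array[i]

    return array

print(selection_sort([4, 2, 7, 1, 3]))  # [1, 2, 3, 4, 7]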

Selection Sort in Action

Let’s walk through the steps of Selection Sort using the example array, [4, 2, 7, 1, 3].

We begin our first pass-through: We set things up by inspecting the value at index 0. By definition, it’s the lowest value in the array we’ve encountered so far (as it’s the only value we’ve encountered so far), so we keep track of its index in a variable:

Step 1: We compare the 2 with the lowest value so far (which happens to be 4):

The 2 is even less than the 4, so it becomes the lowest value so far:


Step 2: We compare the next value—the 7—with the lowest value so far. The 7 is greater than the 2, so 2 remains our lowest value:

Step 3: We compare the 1 with the lowest value so far:

Because the 1 is even less than the 2, 1 becomes our new lowest value:

Step 4: We compare the 3 to the lowest value so far, which is the 1. We’ve reached the end of the array, and we’ve determined that 1 is the lowest value out of the entire array:

Step 5: Because 1 is the lowest value, we swap it with whatever value is at index 0—the index we began this pass-through with:


Since we’ve moved the lowest value to the beginning of the array, that means the lowest value is now in its correct spot:

We are now ready to begin our second pass-through. Setup: The first cell—index 0—is already sorted, so this pass-through begins at the next cell, which is index 1. The value at index 1 is the number 2, and it is the lowest value we’ve encountered in this pass-through so far:

Step 6: We compare the 7 with the lowest value so far. The 2 is less than the 7, so 2 remains our lowest value:

Step 7: We compare the 4 with the lowest value so far. The 2 is less than the 4, so 2 remains our lowest value:

Step 8: We compare the 3 with the lowest value so far. The 2 is less than the 3, so 2 remains our lowest value:


We’ve reached the end of the array. Since the lowest value from this passthrough was already in its correct spot, we don’t need to perform a swap. This ends our second pass-through, leaving us with:

We now begin the third pass-through. Setup: We begin at index 2, which contains the value 7. The 7 is the lowest value we’ve encountered so far in this pass-through:

Step 9: We compare the 4 with the 7:

We note that 4 is our new lowest value:

Step 10: We encounter the 3, which is even lower than the 4:


The 3 becomes our new lowest value:

Step 11: We’ve reached the end of the array, so we swap the 3 with the value we started our pass-through with, which is the 7:

We now know that the 3 is in the correct place within the array:

While you and I can both see that the entire array is correctly sorted at this point, the computer does not know this yet, so it must begin a fourth pass-through.

Setup: We begin the pass-through with index 3. The 4 is the lowest value so far:

Step 12: We compare the 7 with the 4:

The 4 remains the lowest value we’ve encountered in this pass-through so far, so we don’t need to swap it, since it’s already in the correct place.


Because all the cells besides the last one are correctly sorted, that must mean the last cell is also in the correct order, and our entire array is properly sorted:

Code Implementation: Selection Sort

Here’s a JavaScript implementation of Selection Sort:

function selectionSort(array) {
  for(let i = 0; i < array.length - 1; i++) {
    let lowestNumberIndex = i;
    for(let j = i + 1; j < array.length; j++) {
      if(array[j] < array[lowestNumberIndex]) {
        lowestNumberIndex = j;
      }
    }

    if(lowestNumberIndex != i) {
      let temp = array[i];
      array[i] = array[lowestNumberIndex];
      array[lowestNumberIndex] = temp;
    }
  }
  return array;
}

Let’s break this down line by line. We begin a loop that represents each pass-through. It uses the variable i to point to each value of the array, and goes up through the second-to-last value:

for(let i = 0; i < array.length - 1; i++) {

It doesn’t need to run for the last value itself, since the array will be fully sorted by that point. Next, we begin keeping track of the index containing the lowest value we encounter so far:

let lowestNumberIndex = i;

This lowestNumberIndex will be 0 at the beginning of the first pass-through, 1 at the beginning of the second, and so on. The reason we specifically keep track of the index is because we’ll need access to both the lowest value and its index in the rest of our code, and we can use the index to reference both. (We can check the lowest value by calling array[lowestNumberIndex]).


Within each pass-through, we check the remaining values of the array to see if there might be a lower value than the current lowest value:

for(let j = i + 1; j < array.length; j++) {

Indeed, if we find a lower value, we store this new value’s index in lowestNumberIndex:

if(array[j] < array[lowestNumberIndex]) {
  lowestNumberIndex = j;
}

By the end of the inner loop, we’ll have found the index of the lowest number from this pass-through.

If the lowest value from this pass-through is already in its correct place (which would happen when the lowest value is the first value we encounter in the pass-through), we don’t need to do anything. But if the lowest value is not in its correct place, we need to perform a swap. Specifically, we swap the lowest value with the value at i, which was the index we started the pass-through with:

if(lowestNumberIndex != i) {
  let temp = array[i];
  array[i] = array[lowestNumberIndex];
  array[lowestNumberIndex] = temp;
}

Finally, we return the sorted array:

return array;

The Efficiency of Selection Sort

Selection Sort contains two types of steps: comparisons and swaps. That is, we compare each value with the lowest number we’ve encountered in each pass-through, and we swap the lowest number into its correct position.

Looking back at our example array that contains five elements, we had to make a total of 10 comparisons. Let’s break it down in the following table:

Pass-Through #    # of Comparisons
1                 4 comparisons
2                 3 comparisons
3                 2 comparisons
4                 1 comparison

That’s a grand total of 4 + 3 + 2 + 1 = 10 comparisons.


To put it in a way that works for arrays of all sizes, we’d say that for N elements, we make (N - 1) + (N - 2) + (N - 3) … + 1 comparisons.

As for swaps, though, we only need to make a maximum of one swap per pass-through. This is because in each pass-through, we make either one or zero swaps, depending on whether the lowest number of that pass-through is already in the correct position. Contrast this with Bubble Sort, where in a worst-case scenario—an array in descending order—we have to make a swap for each and every comparison.

Here’s a side-by-side comparison of Bubble Sort and Selection Sort:

N Elements    Max # of Steps in Bubble Sort    Max # of Steps in Selection Sort
5             20                               14 (10 comparisons + 4 swaps)
10            90                               54 (45 comparisons + 9 swaps)
20            380                              209 (190 comparisons + 19 swaps)
40            1560                             819 (780 comparisons + 39 swaps)
80            6320                             3239 (3160 comparisons + 79 swaps)

From this comparison, it’s clear Selection Sort takes about half the number of steps Bubble Sort does, indicating that Selection Sort is twice as fast.
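If you’d like to verify these maximum step counts, they follow directly from the reasoning above. Here’s a quick sketch (in Python, and not code from the book) that computes them:

# Worst-case step counts, derived from the analysis above: both sorts make
# (N - 1) + (N - 2) + ... + 1 comparisons, but Bubble Sort may swap on every
# comparison, while Selection Sort swaps at most once per pass-through.
def bubble_sort_max_steps(n):
    comparisons = n * (n - 1) // 2
    return comparisons + comparisons

def selection_sort_max_steps(n):
    comparisons = n * (n - 1) // 2
    return comparisons + (n - 1)

for n in [5, 10, 20, 40, 80]:
    print(n, bubble_sort_max_steps(n), selection_sort_max_steps(n))

Running this prints the same numbers as the table above.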

Ignoring Constants

But here’s the funny thing: in the world of Big O Notation, Selection Sort and Bubble Sort are described in exactly the same way.

Again, Big O Notation answers the key question: if there are N data elements, how many steps will the algorithm take? Because Selection Sort takes roughly half of N² steps, it would seem reasonable that we’d describe the efficiency of Selection Sort as being O(N² / 2). That is, for N data elements, there are N² / 2 steps. The following table bears this out:

N Elements    N² / 2              Max # of Steps in Selection Sort
5             5² / 2 = 12.5       14
10            10² / 2 = 50        54
20            20² / 2 = 200       209
40            40² / 2 = 800       819
80            80² / 2 = 3200      3239


In reality, however, Selection Sort is described in Big O as O(N²), just like Bubble Sort. This is because of a major rule of Big O that I’m now introducing for the first time:

Big O Notation ignores constants.

This is simply a mathematical way of saying that Big O Notation never includes regular numbers that aren’t an exponent. We simply drop these regular numbers from the expression.

In our case, then, even though the algorithm takes N² / 2 steps, we drop the “/ 2” because it’s a regular number, and express the efficiency as O(N²).

Here are a few more examples: For an algorithm that takes N / 2 steps, we’d call it O(N). An algorithm that takes N² + 10 steps would be expressed as O(N²), since we drop the 10, which is a regular number. With an algorithm that takes 2N steps (meaning N * 2), we drop the regular number and call it O(N). Even O(100N), which is 100 times slower than O(N), is also referred to as O(N).

Offhand, it would seem that this rule would render Big O Notation entirely useless, as you can have two algorithms that are described in the same exact way with Big O, and yet one can be 100 times faster than the other. And that’s exactly what we’re seeing here with Selection Sort and Bubble Sort. Both are described in Big O as O(N²), but Selection Sort is actually twice as fast as Bubble Sort. So, what gives?

Big O Categories

This leads us to the next concept within Big O: Big O Notation only concerns itself with general categories of algorithm speeds.

As an analogy, let’s talk about physical buildings. There are, of course, many different types of buildings. There are one-floor single-family homes, and two-floor single-family homes, and three-floor single-family homes. There are high-rise apartment buildings with varying numbers of floors. And there are skyscrapers with various heights and shapes.

If we were to compare two buildings, one of which is a single-family home and one of which is a skyscraper, it becomes almost moot to mention how many floors each one has. Because the two buildings are so incredibly different in their sizes and functions, we don’t need to say, “This one is a two-story home, while the other is a one-hundred-floor skyscraper.” We may as well just call one a house and the other a skyscraper. Calling them by their general categories is enough to signify their vast differences.

The same applies to algorithm efficiencies. If we compare, say, an O(N) algorithm with an O(N²) algorithm, the two efficiencies are so different that it doesn’t really matter whether the O(N) algorithm is actually O(2N), or O(N / 2) or even O(100N).

Now, here’s why O(N) and O(N²) are considered two separate categories, while O(N) and O(100N) are part of the same category. Remember The Soul of Big O, on page 37. Big O Notation doesn’t care merely about the number of steps an algorithm takes. It cares about the long-term trajectory of the algorithm’s steps as the data increases. O(N) tells a story of straight growth—that the steps increase in a straight line according to some proportion of the data. This is true even when the steps are 100N. O(N²) tells a different story—one of quadratic growth. Quadratic growth is a completely different category compared to any form of O(N).

This point is really driven home when we consider that O(N²) will, at some point in data growth, become slower than O(N) multiplied by any factor. In the following graph, you can see how O(N²) becomes slower than various factors of N:
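If you’d like to see that crossover numerically rather than on a graph, here’s a small sketch in Python (the factor of 100 is just an illustrative constant, not something from the book):

# Compare 100N (an O(N) algorithm with a large constant) against N squared:
for n in [10, 50, 100, 150, 200]:
    print(n, 100 * n, n ** 2)

# Up to N = 100, 100N is the larger of the two; beyond that point,
# N squared takes more steps and keeps pulling further ahead.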


Therefore, when comparing two efficiencies that belong to two different categories of Big O, it’s enough to identify them by their general category. Talking about O(2N) when compared to O(N²) is like talking about a two-story house compared to a skyscraper. We may as well just say that O(2N) is part of the general category of O(N).

All the types of Big O we’ve encountered, whether it’s O(1), O(log N), O(N), O(N²), or the types we’ll encounter later in this book, are general categories of Big O that are widely different from each other. Multiplying or dividing the number of steps by a regular number doesn’t make them change to another category.

However, when two algorithms fall under the same classification of Big O, it doesn’t necessarily mean that both algorithms have the same speed. After all, Bubble Sort is twice as slow as Selection Sort even though both are O(N²). So, while Big O is perfect for contrasting algorithms that fall under different classifications of Big O, when two algorithms fall under the same classification, further analysis is required to determine which algorithm is faster.

A Practical Example

Let’s return to the very first code example from Chapter 1, with minor changes:

def print_numbers_version_one(upperLimit):
    number = 2
    while number <= upperLimit:

CHAPTER 6

Optimizing for Optimistic Scenarios

We then begin an inner while loop, which runs as long as position is greater than or equal to 0:

while position >= 0:

We then perform our comparison. That is, we check whether the value at position is greater than the temp_value:

if array[position] > temp_value:

If it is, we shift that left value to the right:

array[position + 1] = array[position]

We then decrement position by 1 to compare the next left value against the temp_value in the next round of the while loop:

position = position - 1

If at any point we encounter a value at position that is less than or equal to the temp_value, we can get ready to end our pass-through, since it’s time to move the temp_value into the gap:

else:
    break


The final step of each pass-through is moving the temp_value into the gap:

array[position + 1] = temp_value

After all pass-throughs have been completed, we return the sorted array:

return array
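Putting these fragments together, the complete routine looks roughly like the sketch below. It’s assembled from the pieces above; the function name insertion_sort and the outer for loop aren’t shown in this excerpt, so treat those details as assumptions:

def insertion_sort(array):
    # Each pass-through lifts out the value at index, leaving a gap there:
    for index in range(1, len(array)):
        temp_value = array[index]
        position = index - 1

        # Compare the values to the left of the gap against temp_value:
        while position >= 0:
            if array[position] > temp_value:
                # Shift the larger value one cell to the right:
                array[position + 1] = array[position]
                position = position - 1
            else:
                # The value at position is not greater than temp_value,
                # so the pass-through can end:
                break

        # Move temp_value into the gap:
        array[position + 1] = temp_value

    return array

print(insertion_sort([4, 2, 7, 1, 3]))   # [1, 2, 3, 4, 7]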

The Efficiency of Insertion Sort

Four types of steps occur in Insertion Sort: removals, comparisons, shifts, and insertions. To analyze the efficiency of Insertion Sort, we need to tally up each of these steps.

First, let’s dig into comparisons. A comparison takes place each time we compare a value to the left of the gap with the temp_value. In a worst-case scenario, where the array is sorted in reverse order, we have to compare every number to the left of temp_value with temp_value in each pass-through. This is because each value to the left of temp_value will always be greater than temp_value, so the pass-through will only end when the gap reaches the left end of the array.

During the first pass-through, in which temp_value is the value at index 1, a maximum of one comparison is made, since there is only one value to the left of the temp_value. On the second pass-through, a maximum of two comparisons are made, and so on. On the final pass-through, we need to compare the temp_value with every single value in the array besides temp_value itself. In other words, if there are N elements in the array, a maximum of N - 1 comparisons are made in the final pass-through.

We can, therefore, formulate the total number of comparisons as: 1 + 2 + 3 + … + (N - 1) comparisons.

In our example array that contains five elements, that’s a maximum of: 1 + 2 + 3 + 4 = 10 comparisons.

For an array containing 10 elements, there would be: 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = 45 comparisons. For an array containing 20 elements, there would be a total of 190 comparisons, and so on.

When examining this pattern, it emerges that for an array containing N elements, there are approximately N² / 2 comparisons. (10² / 2 is 50, and 20² / 2 is 200. We’ll look at this pattern more closely in the next chapter.)


Let’s continue analyzing the other types of steps.

Shifts occur each time we move a value one cell to the right. When an array is sorted in reverse order, there will be as many shifts as there are comparisons, since every comparison will force us to shift a value to the right.

Let’s add up comparisons and shifts for a worst-case scenario:

  N² / 2 comparisons
+ N² / 2 shifts
____________________
  N² steps

Removing and inserting the temp_value from the array happens once per pass-through. Since there are always N - 1 pass-throughs, we can conclude that there are N - 1 removals and N - 1 insertions.

So, now we’ve got:

  N² comparisons and shifts combined
  N - 1 removals
+ N - 1 insertions
____________________
  N² + 2N - 2 steps

You’ve already learned one major rule of Big O: that Big O ignores constants. With this rule in mind, we’d—at first glance—simplify this to O(N² + N).

However, there is another major rule of Big O that I’ll reveal now: Big O Notation only takes into account the highest order of N when we have multiple orders added together.

That is, if we have an algorithm that takes N⁴ + N³ + N² + N steps, we only consider N⁴ to be significant—and just call it O(N⁴). Why is this? Look at the following table:

N      N²        N³           N⁴
2      4         8            16
5      25        125          625
10     100       1,000        10,000
100    10,000    1,000,000    100,000,000


As N increases, N⁴ becomes so much more significant than any other order of N that the smaller orders are considered trivial. For example, when looking at the bottom row of the table, when we add N⁴ + N³ + N² + N, we get a total of 101,010,100. But we may as well round that down to 100,000,000, which is accomplished by ignoring those lower orders of N.

We can apply this same concept to Insertion Sort. Even though we’ve already simplified Insertion Sort down to N² + N steps, we simplify the expression further by throwing out the lower order, reducing it to O(N²).

It emerges that in a worst-case scenario, Insertion Sort has the same time complexity as Bubble Sort and Selection Sort. They’re all O(N²).

I noted in the previous chapter that although Bubble Sort and Selection Sort are both O(N²), Selection Sort is faster because Selection Sort has N² / 2 steps compared with Bubble Sort’s N² steps. At first glance, then, we’d say that Insertion Sort is as slow as Bubble Sort, since it too takes about N² steps.

If I stop the book here, you’d walk away thinking that Selection Sort is the best choice out of the three, since it’s twice as fast as either Bubble Sort or Insertion Sort. But it’s actually not that simple.

The Average Case

Indeed, in a worst-case scenario, Selection Sort is faster than Insertion Sort. However, it is critical we also take into account the average-case scenario. Why? By definition, the cases that occur most frequently are average scenarios. Take a look at this simple bell curve:


Best- and worst-case scenarios happen relatively infrequently. In the real world, average scenarios are what occur most of the time. Take a randomly sorted array, for example. What are the odds that the values will occur in perfect ascending or descending order? It’s much more likely that the values will be all over the place.

Let’s examine Insertion Sort, then, in the context of all scenarios.

We’ve looked at how Insertion Sort performs in a worst-case scenario—where the array is sorted in descending order. In the worst case, we saw that in each pass-through, we compare and shift every value we encounter. (We calculated this to be a total of N² comparisons and shifts.)

In the best-case scenario, where the data is already sorted in ascending order, we end up making just one comparison per pass-through and not a single shift, since each value is already in its correct place.

Where data is randomly sorted, however, we’ll have pass-throughs in which we compare and shift all of the data, some of the data, or possibly none of the data. If you look at the preceding walk-through example in Insertion Sort in Action, on page 80, you’ll notice that in the first and third pass-throughs, we compare and shift all the data we encounter. In the fourth pass-through, we compare and shift just some of it, and in the second pass-through, we make just one comparison and shift no data at all. (This variance occurs because some pass-throughs compare all the data to the left of the temp_value, while other pass-throughs end early, due to encountering a value that is less than the temp_value.)

So, in the worst-case scenario, we compare and shift all the data, and in the best-case scenario, we shift none of the data (and just make one comparison per pass-through). For the average scenario, we can say that in the aggregate, we probably compare and shift about half the data. Thus, if Insertion Sort takes N² steps for the worst-case scenario, we’d say that it takes about N² / 2 steps for the average scenario. (In terms of Big O, however, both scenarios are O(N²).)

Let’s dive into some specific examples. The array, [1, 2, 3, 4] is already presorted, which is the best case. The worst case for the same data would be [4, 3, 2, 1], and an example of an average case might be [1, 3, 4, 2].


In the worst case ([4, 3, 2, 1]), there are six comparisons and six shifts, for a total of 12 steps. In an average case of [1, 3, 4, 2], there are four comparisons and two shifts, for a total of six steps. In the best case ([1, 2, 3, 4]), there are three comparisons and zero shifts.

We can now see that the performance of Insertion Sort varies greatly based on the scenario. In the worst-case scenario, Insertion Sort takes N² steps. In an average scenario, it takes N² / 2 steps. And in the best-case scenario, it takes about N steps. You can see these three types of performance in the following graph:

Contrast this with Selection Sort. Selection Sort takes N² / 2 steps in all cases, from worst to average to best-case scenarios. This is because Selection Sort doesn’t have any mechanism for ending a pass-through early at any point. Each pass-through compares every value to the right of the chosen index no matter what.

Here’s a table that compares Selection Sort and Insertion Sort:

                  Best Case    Average Case    Worst Case
Selection Sort    N² / 2       N² / 2          N² / 2
Insertion Sort    N            N² / 2          N²

So, which is better: Selection Sort or Insertion Sort? The answer is: well, it depends. In an average case—where an array is randomly sorted—they perform similarly. If you have reason to assume you’ll be dealing with data that is mostly sorted, Insertion Sort will be a better choice. If you have reason to assume you’ll be dealing with data that is mostly sorted in reverse order, Selection Sort will be faster. If you have no idea what the data will be like, that’s essentially an average case, and both will be equal.


A Practical Example

Suppose you are writing a JavaScript application, and somewhere in your code you find that you need to get the intersection between two arrays. The intersection is a list of all the values that occur in both of the arrays. For example, if you have the arrays, [3, 1, 4, 2] and [4, 5, 3, 6], the intersection would be a third array, [3, 4], since both of those values are common to the two arrays.

Here’s one possible implementation:

function intersection(firstArray, secondArray){
  let result = [];

  for (let i = 0; i < firstArray.length; i++) {
    for (let j = 0; j < secondArray.length; j++) {
      if (firstArray[i] == secondArray[j]) {
        result.push(firstArray[i]);
      }
    }
  }

  return result;
}

Here, we are running nested loops. In the outer loop, we iterate over each value in the first array. As we point to each value in the first array, we then run an inner loop that checks each value of the second array to see if it can find a match with the value being pointed to in the first array.

There are two types of steps taking place in this algorithm: comparisons and insertions. We compare every value of the two arrays against each other, and we insert matching values into the array result. Let’s start by seeing how many comparisons there are.

If the two arrays are of equal size, and we say that N is the size of either array, the number of comparisons performed is N². This is because we compare each element of the first array to each element of the second array. Thus, if we have two arrays that each contain five elements, we’d end up making 25 comparisons. So, this intersection algorithm has an efficiency of O(N²).

The insertions, at most, would take N steps (if the two arrays happened to be identical). This is a lower order compared to N², so we’d still consider the algorithm to be O(N²). If the arrays are different sizes—say N and M—we’d say that the efficiency of this function is O(N * M). (More on this in Big O in Everyday Code.)

Is there any way we can improve this algorithm?


This is where it’s important to consider scenarios beyond the worst case. In the current implementation of the intersection function, we make N² comparisons in all scenarios, no matter whether the arrays are identical or the arrays do not share a single common value. However, where the two arrays share common values, we really shouldn’t have to check every value of the second array against a value of the first array. Let’s see why:

In this example, as soon as we find a common value (the 8), there’s really no reason to complete the second loop. What are we checking for at this point? We’ve already determined that the second array contains the same 8 that the first array does, and can add it to result. We’re performing an unnecessary step.

To fix this, we can add a single word to our implementation:

function intersection(firstArray, secondArray){
  let result = [];

  for (let i = 0; i < firstArray.length; i++) {
    for (let j = 0; j < secondArray.length; j++) {
      if (firstArray[i] == secondArray[j]) {
        result.push(firstArray[i]);
        break;
      }
    }
  }

  return result;
}

With the addition of the break, we can cut the inner loop short and save steps (and therefore time).


It’s still true that in a worst-case scenario, where the two arrays do not contain a single shared value, we have no choice but to perform N² comparisons. But now, in the best-case scenario, where the two arrays are identical, we only have to perform N comparisons. In an average case, where the two arrays are different but share some values, the performance will be somewhere between N and N². This is a significant optimization to our intersection function, since our first implementation would make N² comparisons in all cases.

Wrapping Up

Being able to discern between best-, average-, and worst-case scenarios is a key skill in choosing the best algorithm for your needs, as well as taking existing algorithms and optimizing them further to make them significantly faster. Remember, while it’s good to be prepared for the worst case, average cases are what happen most of the time.

Now that we’ve covered the important concepts related to Big O Notation, let’s apply our knowledge to practical algorithms. In the next chapter, we’re going to take a look at various everyday algorithms that may appear in real codebases, and identify the time complexity of each one in terms of Big O.

Exercises

The following exercises provide you with the opportunity to practice with optimizing for best- and worst-case scenarios. The solutions to these exercises are found in the section, Chapter 6, on page 442.

1. Use Big O Notation to describe the efficiency of an algorithm that takes 3N² + 2N + 1 steps.

2. Use Big O Notation to describe the efficiency of an algorithm that takes N + log N steps.

3. The following function checks whether an array of numbers contains a pair of two numbers that add up to 10.

function twoSum(array) {
  for (let i = 0; i < array.length; i++) {
    for (let j = 0; j < array.length; j++) {
      if (i !== j && array[i] + array[j] === 10) {
        return true;
      }
    }
  }

  return false;


}

What are the best-, average-, and worst-case scenarios? Then, express the worst-case scenario in terms of Big O Notation.

4. The following function returns whether or not a capital “X” is present within a string.

function containsX(string) {
  let foundX = false;

  for(let i = 0; i < string.length; i++) {
    if (string[i] === "X") {
      foundX = true;
    }
  }

  return foundX;
}

What is this function’s time complexity in terms of Big O Notation? Then, modify the code to improve the algorithm’s efficiency for best- and average-case scenarios.


CHAPTER 7

Big O in Everyday Code

In the previous chapters, you learned how to use Big O notation to express the time complexity of code. As you’ve seen, there are quite a few details that go into Big O analysis. In this chapter, we’ll use everything you’ve learned so far to analyze the efficiency of practical code samples that might be found in real-world codebases.

Determining the efficiency of our code is the first step in optimizing it. After all, if we don’t know how fast our code is, how would we know if our modifications would make it faster?

Additionally, once we know how our code is categorized in terms of Big O Notation, we can make a judgment call as to whether it may need optimization in the first place. For example, an algorithm that is O(N²) is generally considered to be a “slow” algorithm. So, if we’ve determined that our algorithm falls into such a category, we should take pause and wonder if there are ways to optimize it.

Of course, O(N²) may be the best we can do for a given problem. However, knowing that our algorithm is considered slow can signal to us to dig deeper and analyze whether faster alternatives are available.

In the future chapters of this book, you’re going to learn many techniques for optimizing our code for speed. But the first step of optimization is being able to determine how fast our code currently is.

So, let’s begin.

Mean Average of Even Numbers

The following Ruby method accepts an array of numbers and returns the mean average of all its even numbers. How would we express its efficiency in terms of Big O?


def average_of_even_numbers(array)
  # The mean average of even numbers will be defined as the sum of
  # the even numbers divided by the count of even numbers, so we
  # keep track of both the sum and the count:
  sum = 0.0
  count_of_even_numbers = 0

  # We iterate through each number in the array, and if we encounter
  # an even number, we modify the sum and the count:
  array.each do |number|
    if number.even?
      sum += number
      count_of_even_numbers += 1
    end
  end

  # We return the mean average:
  return sum / count_of_even_numbers
end

Here’s how to break the code down to determine its efficiency.

Remember that Big O is all about answering the key question: if there are N data elements, how many steps will the algorithm take? Therefore, the first thing we want to do is determine what the “N” data elements are. In this case, the algorithm is processing the array of numbers passed into this method. These, then, would be the “N” data elements, with N being the size of the array.

Next, we have to determine how many steps the algorithm takes to process these N values.

We can see that the guts of the algorithm is the loop that iterates over each number inside the array, so we’ll want to analyze that first. Since the loop iterates over each of the N elements, we know that the algorithm takes at least N steps.

Looking inside the loop, though, we can see that a varying number of steps occur within each round of the loop. For each and every number, we check whether the number is even. Then, if the number is even, we perform two more steps: we modify the sum variable, and we modify the count_of_even_numbers variable. So, we execute three more steps for even numbers than we do for odd numbers.

As you’ve learned, Big O focuses primarily on worst-case scenarios. In our case, the worst case is when all the numbers are even, in which case we perform three steps during each round of the loop. Because of this, we can say that for N data elements, our algorithm takes 3N steps. That is, for each of the N numbers, our algorithm executes three steps.

Now, our method performs a few other steps outside of the loop as well. Before the loop, we initialize the two variables and set them to 0. Technically, these are two steps. After the loop, we perform another step: the division of sum / count_of_even_numbers. Technically, then, our algorithm takes three extra steps in addition to the 3N steps, so the total number of steps is 3N + 3.

However, you also learned that Big O notation ignores constant numbers, so instead of calling our algorithm O(3N + 3), we simply call it O(N).

Word Builder

The next example is an algorithm that collects every combination of two-character strings built from an array of single characters. For example, given the array: ["a", "b", "c", "d"], we’d return a new array containing the following string combinations:

[
  'ab', 'ac', 'ad', 'ba', 'bc', 'bd',
  'ca', 'cb', 'cd', 'da', 'db', 'dc'
]

Following is a JavaScript implementation of this algorithm. Let’s see if we can figure out its Big O efficiency:

function wordBuilder(array) {
  let collection = [];

  for(let i = 0; i < array.length; i++) {
    for(let j = 0; j < array.length; j++) {
      if (i !== j) {
        collection.push(array[i] + array[j]);
      }
    }
  }

  return collection;
}

Here we’re running one loop nested inside another. The outer loop iterates over each character in the array, keeping track of the index i. For each index i, we run an inner loop that iterates again over each character in the same array using the index j. Within this inner loop, we concatenate the characters at i and j, with the exception of when i and j are pointing to the same index.


To determine the efficiency of our algorithm, we once again need to determine what the N data elements are. In our case, as in the previous example, N is the number of items inside the array passed to the function.

The next step is to determine the number of steps our algorithm takes relative to the N data elements. In our case, the outer loop iterates over all N elements, and for each element, the inner loop iterates again over all N elements, which amounts to N steps multiplied by N steps. This is the classic case of O(N²), and is often what nested-loop algorithms turn out to be.

Now, what would happen if we modified our algorithm to compute each combination of three-character strings? That is, for our example array of ["a", "b", "c", "d"], our function would return the array:

[
  'abc', 'abd', 'acb',
  'acd', 'adb', 'adc',
  'bac', 'bad', 'bca',
  'bcd', 'bda', 'bdc',
  'cab', 'cad', 'cba',
  'cbd', 'cda', 'cdb',
  'dab', 'dac', 'dba',
  'dbc', 'dca', 'dcb'
]

Here’s an implementation that uses three nested loops. What is its time complexity?

function wordBuilder(array) {
  let collection = [];

  for(let i = 0; i < array.length; i++) {
    for(let j = 0; j < array.length; j++) {
      for(let k = 0; k < array.length; k++) {
        if (i !== j && j !== k && i !== k) {
          collection.push(array[i] + array[j] + array[k]);
        }
      }
    }
  }

  return collection;
}

In this algorithm, for N data elements, we have N steps of the i loop multiplied by the N steps of the j loop multiplied by the N steps of the k loop. This is N * N * N, which is N³ steps, which is described as O(N³).

If we had four or five nested loops, we’d have algorithms that are O(N⁴) and O(N⁵), respectively. Let’s see how these all appear on a graph on page 99.


Optimizing code from a speed of O(N³) to O(N²) would be a big win, since the code becomes dramatically faster as the data grows.

Array Sample

In the next example, we create a function that takes a small sample of an array. We expect to have very large arrays, so our sample is just the first, middlemost, and last value from the array.

Here is a Python implementation of this function. See if you can identify its efficiency in Big O:

def sample(array):
    first = array[0]
    middle = array[int(len(array) / 2)]
    last = array[-1]
    return [first, middle, last]

In this case again, the array passed into this function is the primary data, so we can say that N is the number of elements in this array. However, our function ends up taking the same number of steps no matter what N is. Reading from the beginning, midpoint, and last indexes of an array each takes one step no matter the size of the array. Similarly, finding the array’s length and dividing it by 2 also takes one step. Since the number of steps is constant—that is, it remains the same no matter what N is—this algorithm is considered O(1).


Average Celsius Reading

Here’s another example that involves mean averages. Let’s say we’re building weather-forecasting software. To determine the temperature of a city, we take temperature readings from many thermometers across the city, and we calculate the mean average of those temperatures.

We’d also like to display the temperatures in both Fahrenheit and Celsius, but our readings are initially only provided to us in Fahrenheit.

To get the average Celsius temperature, our algorithm does two things: first, it converts all the readings from Fahrenheit to Celsius. Then, it calculates the mean average of all the Celsius numbers.

Following is some Ruby code that accomplishes this. What is its Big O?

def average_celsius(fahrenheit_readings)
  # Collect Celsius numbers here:
  celsius_numbers = []

  # Convert each reading to Celsius and add to array:
  fahrenheit_readings.each do |fahrenheit_reading|
    celsius_conversion = (fahrenheit_reading - 32) / 1.8
    celsius_numbers.push(celsius_conversion)
  end

  # Get sum of all Celsius numbers:
  sum = 0.0
  celsius_numbers.each do |celsius_number|
    sum += celsius_number
  end

  # Return mean average:
  return sum / celsius_numbers.length
end

First, we can say that N is the number of fahrenheit_readings passed into our method.

Inside the method, we run two loops. The first converts the readings to Celsius, and the second sums all the Celsius numbers. Since we have two loops that each iterate over all N elements, we have N + N, which is 2N (plus a few constant steps). Because Big O Notation drops the constants, this gets reduced to O(N).

Don’t get thrown off by the fact that in the earlier Word Builder example, two loops led to an efficiency of O(N²). There, the loops were nested, which led to N steps multiplied by N steps. In our case, however, we simply have two loops, one after the other. This is N steps plus N steps (2N), which is a mere O(N).


Clothing Labels

Suppose we’re writing software for a clothing manufacturer. Our code accepts an array of newly produced clothing items (stored as strings), and creates text for every possible label we’ll need.

Specifically, our labels should contain the item name plus its size, ranging from 1 to 5. For example, if we have the array, ["Purple Shirt", "Green Shirt"], we want to produce label text for those shirts like this:

[
  "Purple Shirt Size: 1",
  "Purple Shirt Size: 2",
  "Purple Shirt Size: 3",
  "Purple Shirt Size: 4",
  "Purple Shirt Size: 5",
  "Green Shirt Size: 1",
  "Green Shirt Size: 2",
  "Green Shirt Size: 3",
  "Green Shirt Size: 4",
  "Green Shirt Size: 5"
]

Here is Python code that will create this text for an entire array of clothing items:

def mark_inventory(clothing_items):
    clothing_options = []

    for item in clothing_items:
        # For sizes 1 through 5 (Python ranges go UP TO second
        # number, but do not include it):
        for size in range(1, 6):
            clothing_options.append(item + " Size: " + str(size))

    return clothing_options

Let’s determine this algorithm’s efficiency. The clothing_items are the primary data being processed, so N is the number of clothing_items.

This code contains nested loops, so it’s tempting to declare this algorithm to be O(N²). However, we need to analyze this case a little more carefully. While code containing nested loops often is O(N²), in this case, it’s not.

Nested loops that result in O(N²) occur when each loop revolves around N. In our case, however, while our outer loop runs N times, our inner loop runs a constant five times. That is, this inner loop will always run five times no matter what N is.


So, while our outer loop runs N times, the inner loop runs five times for each of the N strings. While this means our algorithm runs 5N times, this is reduced to O(N), since Big O notation ignores constants.

Count the Ones

Here’s another algorithm where the Big O is different from what it seems at first glance.

This function accepts an array of arrays, where the inner arrays contain 1’s and 0’s. The function then returns how many 1’s there are. So, for this example input:

[
  [0, 1, 1, 1, 0],
  [0, 1, 0, 1, 0, 1],
  [1, 0]
]

our function will return 7, since there are seven 1’s. Here’s the function in Python:

def count_ones(outer_array):
    count = 0

    for inner_array in outer_array:
        for number in inner_array:
            if number == 1:
                count += 1

    return count

What’s the Big O of this algorithm?

Again, it’s easy to notice the nested loops and jump to the conclusion that it’s O(N²). However, the two loops are iterating over two completely different things. The outer loop is iterating over the inner arrays, and the inner loop is iterating over the actual numbers. At the end of the day, our inner loop only runs for as many numbers as there are in total.

Because of this, we can say that N represents how many numbers there are. And since our algorithm simply processes each number, the function’s time complexity is O(N).

Palindrome Checker

A palindrome is a word or phrase that reads the same both forward and backward. Some examples include “racecar,” “kayak,” and “deified.”


Here’s a JavaScript function that determines whether a string is a palindrome:

function isPalindrome(string) {
  // Start the leftIndex at index 0:
  let leftIndex = 0;

  // Start rightIndex at last index of array:
  let rightIndex = string.length - 1;

  // Iterate until leftIndex reaches the middle of the array:
  while (leftIndex < string.length / 2) {
    // If the character on the left doesn't equal the character
    // on the right, the string is not a palindrome:
    if (string[leftIndex] !== string[rightIndex]) {
      return false;
    }

    // Move leftIndex one to the right:
    leftIndex++;

    // Move rightIndex one to the left:
    rightIndex--;
  }

  // If we got through the entire loop without finding any
  // mismatches, the string must be a palindrome:
  return true;
}

Let’s determine the Big O of this algorithm. In this case, N is the size of the string passed to this function. The guts of the algorithm takes place within the while loop. Now, this loop is somewhat interesting because it only runs until it reaches the midpoint of the string. That would mean that the loop runs N / 2 steps. However, Big O ignores constants. Because of this, we drop the division by 2, and our algorithm is O(N).

Get All the Products

Our next example is an algorithm that accepts an array of numbers and returns the product of every combination of two numbers.

For example, if we passed in the array, [1, 2, 3, 4, 5], the function returns:

[2, 3, 4, 5, 6, 8, 10, 12, 15, 20]

This is because we first multiply the 1 by the 2, 3, 4, and 5. Then we multiply the 2 by the 3, 4, and 5. Next, we multiply the 3 by the 4 and the 5. And finally, we multiply the 4 by the 5.


Note something interesting: when we multiply, say, the 2 by the other numbers, we only have to multiply it by the numbers that are to the right of it. We don’t have to go back and multiply 2 by the 1, because that was already covered back when we multiplied the 1 by the 2. So, each number only needs to be multiplied by the remaining numbers to the right of it.

Here’s a JavaScript implementation of this algorithm:

function twoNumberProducts(array) {
  let products = [];

  // Outer loop:
  for(let i = 0; i < array.length - 1; i++) {
    // Inner loop, in which j always begins one index
    // to the right of i:
    for(let j = i + 1; j < array.length; j++) {
      products.push(array[i] * array[j]);
    }
  }

  return products;
}

Let’s break this down. N is the number of items in the array passed to this function.

We run the outer loop N times. (We actually run it N - 1 times, but we’ll drop that constant.) The inner loop, though, is different. Since j always begins one index to the right of i, the inner loop’s number of steps decreases each time that it’s launched by the outer loop.

Let’s see how many times the inner loop runs for our example array, which contains five elements:

When i is 0, the inner loop runs while j is 1, 2, 3, and 4.
When i is 1, the inner loop runs while j is 2, 3, and 4.
When i is 2, the inner loop runs while j is 3, and 4.
When i is 3, the inner loop runs while j is 4.

When all is said and done, the inner loop runs: 4 + 3 + 2 + 1 times.

To put this in terms of N, we can say that the inner loop runs approximately: N + (N - 1) + (N - 2) + (N - 3) … + 1 times.

This formula always turns out to compute to about N² / 2. We can visualize this in the diagram on page 105. For the purposes of the diagram, we’ll say that N is 8, so there are 8², or 64, squares.


If you work your way from the top row to the bottom, you’ll see that the top row has all N squares shaded gray. The next row has N - 1 squares shaded gray, and the one after that has N - 2 gray squares. This pattern continues until the bottom row, which has just one shaded square.

You can also see at a glance that half of the squares are shaded. This demonstrates that the pattern of N + (N - 1) + (N - 2) + (N - 3)… + 1 is equivalent to N² / 2.

We’ve figured out, then, that the inner loop runs for N² / 2 steps. But because Big O ignores constants, we express this as O(N²).
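As a quick sanity check (a sketch of our own in Python, rather than anything from the book), you can count how many times the inner loop actually runs and compare that tally to N² / 2:

def inner_loop_runs(n):
    # Count the inner-loop iterations of twoNumberProducts for n elements.
    runs = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            runs += 1
    return runs

for n in [5, 8, 100]:
    print(n, inner_loop_runs(n), n ** 2 / 2)   # e.g., 100 -> 4950 vs. 5000.0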

Dealing with Multiple Datasets

Now, what happens if instead of computing the product of every two numbers from a single array, we instead compute the product of every number from one array by every number of a second array?

For example, if we had the array, [1, 2, 3] and the array, [10, 100, 1000], we’d compute the products as:

[10, 100, 1000, 20, 200, 2000, 30, 300, 3000]

Our code would be similar to the previous snippet, with some slight modifications:

function twoNumberProducts(array1, array2) {
  let products = [];

  for(let i = 0; i < array1.length; i++) {
    for(let j = 0; j < array2.length; j++) {
      products.push(array1[i] * array2[j]);
    }
  }

  return products;
}


Let’s analyze the time complexity of this function.

First, what is N? This is the first hurdle, as we now have two datasets, namely, the two arrays. It’s tempting to lump everything together and say that N is the total number of items of both arrays combined. However, this is problematic for the following reason:

Here’s a tale of two scenarios. In Scenario 1, there are two arrays of size 5. In Scenario 2, there’s one array of size 9, and another of size 1. In both scenarios, we’d end up saying that N is 10, since 5 + 5 = 10 and 9 + 1 = 10. However, the efficiency of both scenarios is very different.

In Scenario 1, our code takes 25 (5 * 5) steps. Because N is 10, this is equivalent to (N / 2)² steps. In Scenario 2, though, our code takes 9 (9 * 1) steps, which is close to about N steps. This is dramatically faster than Scenario 1!

So, we don’t want to consider N to be the total number of integers from both arrays, since we’d never be able to pin down the efficiency in terms of Big O Notation, as it varies based on the different scenarios.

We’re in a bit of a bind here. We have no choice but to express the time complexity as O(N * M), where N is the size of one array, and M is the size of the other. This is a new concept: whenever we have two distinct datasets that have to interact with each other through multiplication, we have to identify both sources separately when we describe the efficiency in terms of Big O.

While this is the correct way of expressing this algorithm in terms of Big O Notation, it’s a little less useful than other expressions of Big O. Comparing an O(N * M) algorithm to algorithms that only have an N (and not an M) is a little like comparing apples to oranges.

However, we do know that there’s a specific range in which O(N * M) lies. That is, if N and M are the same, it’s equivalent to O(N²). And if they’re not the same, and we arbitrarily assign the smaller number to be M, even if M is as low as 1, we end up with O(N). In a sense then, O(N * M) can be construed as a range between O(N) and O(N²).

Is this great? No, but it’s the best we can do.


Password Cracker

You’re a hacker (an ethical one, of course) who’s trying to figure out someone’s password. You decide on a brute-force approach, and write some code that produces every possible string of a given length. Here’s the code you whipped up:

def every_password(n)
  (("a" * n)..("z" * n)).each do |str|
    puts str
  end
end

The way this works is you pass in a number to the function, which becomes the variable n. If n is 3, for example, the code "a" * n produces the string "aaa". The code then sets a loop between all possible strings within the range of "aaa" and "zzz". Running this code will print:

aaa
aab
aac
aad
aae
...
zzx
zzy
zzz

If n is 4, your code will print all possible strings of length 4:

aaaa
aaab
aaac
aaad
aaae
...
zzzx
zzzy
zzzz

If you try running this code even for a mere length of 5, you may be waiting quite some time for it to finish. This is a slow algorithm! But how do we express it in terms of Big O? Let’s break it down.


If we simply print each letter from the alphabet once, it would take 26 steps. When we print every two-character combination, we end up with 26 characters multiplied by 26 characters. When printing every three-character combination, we end up with 26 * 26 * 26 combinations. Do you see the pattern?

Length    Combinations
1         26
2         26²
3         26³
4         26⁴

If we look at this in terms of N, it emerges that if N is the length of each string, the number of combinations is 26ᴺ. Therefore, in Big O Notation, we express this as O(26ᴺ). This is an utterly glacial algorithm!

The truth is that even an algorithm that is a “mere” O(2ᴺ) is incredibly slow. Let’s see how it looks on a graph compared to some of the other algorithms we’ve seen so far:

As you can see, O(2ᴺ) gets even slower than O(N³) at a point.

In a certain sense, O(2ᴺ) is the opposite of O(log N). With an algorithm of O(log N) (like binary search), each time the data is doubled, the algorithm takes one additional step. With an algorithm of O(2ᴺ), each time we add one element of data, the algorithm doubles in steps!


In our password cracker, each time we increase N by one, the number of steps gets multiplied by 26. This takes an incredible amount of time, which is why brute force is such an inefficient way to crack a password.
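To get a feel for how quickly 26ᴺ blows up, here’s a tiny sketch in Python (not from the book):

# Each additional character multiplies the number of candidate passwords by 26:
for n in range(1, 7):
    print(n, 26 ** n)

# By length 6, we're already past 300 million combinations.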

Wrapping Up

Congratulations! You’re now a Big O pro. You can analyze all sorts of algorithms and categorize their time complexities. Armed with this knowledge, you’ll be able to methodically optimize your code for speed.

Speaking of which, in the next chapter, we’ll discover a new data structure that is one of the most useful and common tools for speeding up algorithms. And I’m talking about some serious speed.

Exercises

The following exercises provide you with the opportunity to practice with algorithms in practical situations. The solutions to these exercises are found in the section, Chapter 7, on page 443.

1. Use Big O Notation to describe the time complexity of the following function. The function returns true if the array is a “100-Sum Array,” and false if it is not. A “100-Sum Array” meets the following criteria:

• Its first and last numbers add up to 100.
• Its second and second-to-last numbers add up to 100.
• Its third and third-to-last numbers add up to 100, and so on.

Here is the function:

def one_hundred_sum?(array)
  left_index = 0
  right_index = array.length - 1

  while left_index < array.length / 2
    if array[left_index] + array[right_index] != 100
      return false
    end

    left_index += 1
    right_index -= 1
  end

  return true
end


2. Use Big O Notation to describe the time complexity of the following function. It merges two sorted arrays together to create a new sorted array containing all the values from both arrays:

def merge(array_1, array_2)
  new_array = []
  array_1_pointer = 0
  array_2_pointer = 0

  # Run the loop until we've reached end of both arrays:
  while array_1_pointer < array_1.length ||
    array_2_pointer < array_2.length

    # If we already reached the end of the first array,
    # add item from second array:
    if !array_1[array_1_pointer]
      new_array << array_2[array_2_pointer]

CHAPTER 8

Blazing Fast Lookup with Hash Tables

Hash Tables

menu = { "french fries" => 0.75, "hamburger" => 2.5, "hot dog" => 1.5, "soda" => 0.6 }


A hash table is a list of paired values. The first item in each pair is called the key, and the second item is called the value. In a hash table, the key and value have some significant association with one another. In this example, the string, "french fries" is the key, and 0.75 is the value. They are paired together to indicate that french fries cost 75 cents. In Ruby, you can look up a key’s value using this syntax: menu["french fries"]

This would return the value 0.75. Looking up a value in a hash table has an efficiency of O(1) on average, as it usually takes just one step. Let’s see why.

Hashing with Hash Functions

Do you remember those secret codes you used as a kid to create and decipher messages?

For example, here’s a simple way to map letters to numbers:

A = 1
B = 2
C = 3
D = 4
E = 5

and so on.

According to this code, ACE converts to 135, CAB converts to 312, DAB converts to 412, and BAD converts to 214. This process of taking characters and converting them to numbers is known as hashing. And the code that is used to convert those letters into particular numbers is called a hash function.

There are many other hash functions besides this one. Another example of a hash function may be to take each letter’s corresponding number and return the sum of all the numbers. If we did that, BAD would become the number 7 following a two-step process:


Step 1: First, BAD converts to 214.

Step 2: We then take each of these digits and get their sum:

2 + 1 + 4 = 7

Another example of a hash function may be to return the product of all the letters’ corresponding numbers. This would convert the word BAD into the number 8:

Step 1: First, BAD converts to 214.

Step 2: We then take the product of these digits:

2 * 1 * 4 = 8

In our examples for the remainder of this chapter, we’re going to stick with this last version of the hash function. Real-world hash functions are more complex than this, but this “multiplication” hash function will keep our examples clear and simple.

The truth is that a hash function needs to meet only one criterion to be valid: a hash function must convert the same string to the same number every single time it’s applied. If the hash function can return inconsistent results for a given string, it’s not valid.

Examples of invalid hash functions include functions that use random numbers or the current time as part of their calculation. With these functions, BAD might convert to 12 one time, and 106 another time.

With our “multiplication” hash function, however, BAD will always convert to 8. That’s because B is always 2, A is always 1, and D is always 4. And 2 * 1 * 4 is always 8. There’s no way around this.

Note that with this hash function, DAB will also convert into 8 just as BAD will. This will actually cause some issues that I’ll address later.

Armed with the concept of hash functions, we can now understand how a hash table actually works.
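Before moving on, here’s what this “multiplication” hash function might look like in code (a sketch in Python that isn’t from the book; it assumes the word contains only letters and maps a through z to 1 through 26):

def multiplication_hash(word):
    # Multiply together each letter's position in the alphabet (a = 1, b = 2, ...).
    product = 1
    for letter in word.lower():
        product *= ord(letter) - ord("a") + 1
    return product

print(multiplication_hash("bad"))   # 2 * 1 * 4 = 8
print(multiplication_hash("dab"))   # also 8 -- the collision mentioned above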

Building a Thesaurus for Fun and Profit, but Mainly Profit

On nights and weekends, you’re single-handedly working on a stealth startup that will take over the world. It’s…a thesaurus app. But this isn’t any old thesaurus app—this is Quickasaurus. And you know that it will totally disrupt the billion-dollar thesaurus market. When a user looks up a word in Quickasaurus, it returns just one synonym, instead of every possible synonym as old-fashioned thesaurus apps do.


Since every word has an associated synonym, this is a great use case for a hash table. After all, a hash table is a list of paired items. Let’s get started. We can represent our thesaurus using a hash table: thesaurus = {}

Under the hood, a hash table stores its data in a bunch of cells in a row, similar to an array. Each cell has a corresponding number. For example:

(We left off index 0 since nothing would be stored there given our “multiplication” hash function.) Let’s add our first entry into the hash table: thesaurus["bad"] = "evil"

In code, our hash table now looks like this: {"bad" => "evil"}

Let’s explore how the hash table stores this data. First, the computer applies the hash function to the key. Again, we’ll be using the “multiplication” hash function described previously. So, this would compute as: BAD = 2 * 1 * 4 = 8 Since our key ("bad") hashes into 8, the computer places the value ("evil") into cell 8:

Now, let’s add another key-value pair: thesaurus["cab"] = "taxi"

Again, the computer hashes the key: CAB = 3 * 1 * 2 = 6


Since the resulting value is 6, the computer stores the value ("taxi") inside cell 6.

Let’s add one more key-value pair: thesaurus["ace"] = "star"

To sum up what’s happening here: for every key-value pair, each value is stored at the index of the key, after the key has been hashed. ACE hashes into 15, since ACE = 1 * 3 * 5 = 15, so "star" gets placed into cell 15:

In code, our hash table currently looks like this: {"bad" => "evil", "cab" => "taxi", "ace" => "star"}

Hash Table Lookups

When we look up items from a hash table, we use a key to find its associated value. Let’s see how this works with our Quickasaurus example hash table.

Suppose we want to look up the value associated with the key "bad". In our code, we’d say:

thesaurus["bad"]

To find the value associated with "bad", the computer executes two simple steps:

1. The computer hashes the key we’re looking up: BAD = 2 * 1 * 4 = 8.
2. Since the result is 8, the computer looks inside cell 8 and returns the value stored there. In this case, that is the string, "evil".

Let’s take a step back and look at the big picture here. In a hash table, the placement of each value is determined by its key. That is, by hashing the key itself, we compute the index number where the key’s associated value should be placed.


Because the key determines the placement of the value, we use this principle to make lookups a cinch. When we have any key and want to find its value, the key itself tells us where the value will be found. Just as we hashed the key to insert the value in the appropriate cell, we can hash the key again to find where we previously put that value. It now becomes clear why looking up a value in a hash table is typically O(1): it’s a process that takes a constant amount of time. The computer hashes the key, turns it into a number, and jumps to the index with that number to retrieve the value stored there. We can now understand why a hash table would yield faster lookups for our restaurant menu than an array. With an array, when we look up the price of a menu item, we would have to search through each cell until we find it. For an unordered array, this would take up to O(N), and for an ordered array, this would take up to O(log N). Using a hash table, however, we can now use the actual menu items as keys, allowing us to do a hash table lookup of O(1). And that’s the beauty of a hash table.

One-Directional Lookups

It’s important to point out that the ability to find any value within the hash table in a single step only works if we know the value’s key. If we tried to find a particular value without knowing its key, we’d still have to resort to searching each and every key-value pair within the hash table, which is O(N).

Similarly, we can only do O(1) lookups when using a key to find the value. If, on the other hand, we want to use a value to find its associated key, we cannot take advantage of the hash table’s fast lookup ability.

This is because the whole premise of the hash table is that the key determines the value’s location. But this premise only works in one direction: we use the key to find the value. The value does not determine the key’s location, so we have no way to easily find any key without combing through all of them.

Come to think of it, where are the keys stored? In the previous diagrams, we only saw how the values are stored in the hash table. While this detail may vary from language to language, some languages store the keys next to the values themselves. This is useful in case of collisions, which I’ll discuss in the next section.

In any case, there’s another aspect of the hash table’s one-directional nature that’s worth noting. Each key can exist only once in the hash table, but there can be multiple instances of a value.


If we think about the menu example from the beginning of this chapter, we can’t have the hamburger listed twice (nor would we want to, as it only has one price). However, we could have multiple foods that cost $2.50. In many languages, if we try to store a key-value pair where the key already exists, it simply overwrites the old value while keeping the same key.
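In Ruby, for example, re-assigning an existing key simply replaces its value:

menu = {"hamburger" => 2.50}
menu["hamburger"] = 3.00 # the key already exists, so its old value is overwritten
menu                     # => {"hamburger" => 3.0}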

Dealing with Collisions

Hash tables are awesome, but are not without complications. Continuing our thesaurus example: what happens if we want to add the following entry into our thesaurus?

thesaurus["dab"] = "pat"

First, the computer would hash the key: DAB = 4 * 1 * 2 = 8. Then, it would try to add "pat" to our hash table's cell 8:

Uh-oh. Cell 8 is already filled with "evil"—literally! Trying to add data to a cell that is already filled is known as a collision. Fortunately, there are ways around it. One classic approach for handling collisions is known as separate chaining. When a collision occurs, instead of placing a single value in the cell, it places in it a reference to an array. Let’s look more carefully at a subsection of our hash table’s underlying data storage:

In our example, the computer wants to add "pat" to cell 8, but it already contains "evil". So, it replaces the contents of cell 8 with an array as shown on page 120.


This array contains subarrays where the first value is the word, and the second value is its synonym. Let's walk through how a hash table lookup works in this case. If we look up:

thesaurus["dab"]

the computer takes the following steps:

1. It hashes the key: DAB = 4 * 1 * 2 = 8.
2. It looks up cell 8. The computer takes note that cell 8 contains an array of arrays rather than a single value.
3. It searches through the array linearly, looking at index 0 of each subarray until it finds our key ("dab"). It then returns the value at index 1 of the correct subarray.

Let's walk through these steps visually. We hash DAB into 8, so the computer inspects that cell:


Since cell 8 contains an array of subarrays, we begin a linear search through each subarray, starting at the first one. We inspect index 0 of the first subarray:

It does not contain the key we are looking for ("dab"), so we move on to index 0 of the next subarray:

We found "dab", which would indicate that the value at index 1 of that subarray ("pat") is the value we're looking for.

In a scenario where the computer hits upon a cell that references an array, its search can take some extra steps, as it needs to conduct a linear search within an array of multiple values. If somehow all of our data ended up within a single cell of our hash table, our hash table would be no better than an array. So, it actually turns out that the worst-case performance for a hash table lookup is O(N). Because of this, it is critical that a hash table is designed in such a way that it will have few collisions, and therefore, typically perform lookups in O(1) time rather than O(N) time.
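To make the mechanics concrete, here is a minimal sketch of separate chaining in Ruby. This is not the book's implementation; the class name, cell count, and hash function below are assumptions made for illustration:

class ChainedHashTable
  def initialize(cell_count = 16)
    @cells = Array.new(cell_count)
  end

  # A toy hash function: multiply the letters' alphabet positions together,
  # then wrap around to fit the number of cells:
  def index_for(key)
    key.downcase.chars.map { |char| char.ord - 'a'.ord + 1 }.reduce(:*) % @cells.length
  end

  def []=(key, value)
    # Each cell holds an array of [key, value] subarrays.
    # (For simplicity, this sketch doesn't overwrite an existing key.)
    @cells[index_for(key)] ||= []
    @cells[index_for(key)] << [key, value]
  end

  def [](key)
    bucket = @cells[index_for(key)]
    return nil unless bucket
    # A linear search within the cell; this is where the O(N) worst case lives:
    pair = bucket.find { |stored_key, _value| stored_key == key }
    pair && pair[1]
  end
end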


Luckily, most programming languages implement hash tables and handle these details for us. However, by understanding how it all works under the hood, we can appreciate how hash tables eke out O(1) performance. Let’s see how hash tables can be set up to avoid frequent collisions.

Making an Efficient Hash Table

Ultimately, a hash table's efficiency depends on three factors:

• How much data we're storing in the hash table
• How many cells are available in the hash table
• Which hash function we're using

It makes sense why the first two factors are important. If you have a lot of data and only a few cells, there will be many collisions and the hash table will lose its efficiency. Let's explore, however, why the hash function itself is important for efficiency.

Let's say we're using a hash function that always produces a value that falls in the range from 1 to 9. An example of this is a hash function that converts letters into their corresponding numbers and keeps adding the resulting digits together until it ends up with a single digit. For example:

PUT = 16 + 21 + 20 = 57

Because 57 contains more than one digit, the hash function breaks up the 57 into 5 + 7:

5 + 7 = 12

12 also contains more than one digit, so it breaks up the 12 into 1 + 2:

1 + 2 = 3

In the end, PUT hashes into 3. This hash function by its very nature will always return a number 1 through 9. Let's return to our example hash table:

With this hash function, the computer would never even use cells 10 through 16 even though they exist. All data would be stuffed into cells 1 through 9.
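As a sketch, that digit-summing hash function might look like this in Ruby (the book gives it only in prose, so the code is an illustrative assumption):

def digit_sum_hash(word)
  # Convert letters to their alphabet positions and add them up:
  sum = word.downcase.chars.map { |char| char.ord - 'a'.ord + 1 }.sum

  # Keep adding the digits together until only a single digit remains,
  # so the result always lands between 1 and 9:
  sum = sum.digits.sum while sum > 9
  sum
end

digit_sum_hash("put") # => 3 (16 + 21 + 20 = 57 -> 5 + 7 = 12 -> 1 + 2 = 3)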


A good hash function, therefore, is one that distributes its data across all available cells. The more we can spread out our data, the fewer collisions we will have.

The Great Balancing Act

You learned that a hash table's efficiency goes up as its number of collisions goes down. In theory, then, the best way to avoid collisions would be to have a hash table with a large number of cells. Imagine we want to store just five items in our hash table. A hash table with 1,000 cells would seem to be wonderful for our case, since odds are there would be no collisions.

However, while avoiding collisions is important, we have to balance that with avoiding memory hogging as well. Although a hash table with 1,000 cells for our five pieces of data is great for avoiding collisions, we'd be using up 1,000 cells to store just five pieces of data, and that's a poor use of memory.

This is the balancing act that a hash table must perform. A good hash table strikes a balance of avoiding collisions while not consuming lots of memory. To accomplish this, computer scientists have developed the following rule of thumb: for every 7 data elements stored in a hash table, it should have 10 cells. So, if you're planning on storing 14 elements, you'd want to have 20 available cells, and so on.

This ratio of data to cells is called the load factor. Using this terminology, we'd say that the ideal load factor is 0.7 (7 elements / 10 cells). If you initially stored 7 pieces of data in a hash table, the computer might allocate a hash table with 10 cells. When you begin to add more data, though, the computer will expand the hash table by adding more cells and changing the hash function so that the new data will be distributed evenly across the new cells.

Again, most of the internals of a hash table are managed by the computer language you're using. It decides how big the hash table needs to be, what hash function to use, and when it's time to expand the hash table. You have the right to assume that your programming language has implemented its hash table to allow for peak performance. Now that we've seen how hash tables work, it's clear that they have a superior lookup efficiency of O(1). We're going to use this knowledge shortly to optimize our code for speed.
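Before moving on, here is the load-factor rule of thumb expressed as a quick hypothetical check (not from the book):

def needs_more_cells?(item_count, cell_count)
  # The ideal load factor is 0.7: about 7 items for every 10 cells.
  item_count.to_f / cell_count > 0.7
end

needs_more_cells?(7, 10)  # => false - exactly at the ideal load factor
needs_more_cells?(14, 10) # => true  - time to expand to roughly 20 cells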


But first, let's take a quick tour of the many different use cases for hash tables when it comes to simple data organization.

Hash Tables for Organization

Because hash tables keep data in pairs, they are useful in many scenarios for organizing data. Some data exists naturally in paired form. The fast-food menu and thesaurus scenarios from this chapter are classic examples of this. The menu contains each food item paired with its price. The thesaurus contains each word paired with its synonym. In fact, in Python, hash tables are known as dictionaries since a dictionary is a common form of paired data: it's a list of words with their respective definitions.

Other examples of naturally paired data can include tallies, such as political candidates and the number of votes each received:

{"Candidate A" => 1402021, "Candidate B" => 2321443, "Candidate C" => 432}

An inventory tracking system, which keeps track of how much of each item is in supply, is another tally example:

{"Yellow Shirt" => 1203, "Blue Jeans" => 598, "Green Felt Hat" => 65}

Hash tables are such a natural fit for paired data that we can even use them to simplify conditional logic in certain instances. Say we encounter a function that returns the meaning of common HTTP status code numbers:

def status_code_meaning(number)
  if number == 200
    return "OK"
  elsif number == 301
    return "Moved Permanently"
  elsif number == 401
    return "Unauthorized"
  elsif number == 404
    return "Not Found"
  elsif number == 500
    return "Internal Server Error"
  end
end

If we think about this code, we’ll realize that the conditional logic revolves around paired data, namely, the status code numbers and their respective meanings.


By using a hash table, we can completely eliminate the conditional logic:

STATUS_CODES = {200 => "OK", 301 => "Moved Permanently", 401 => "Unauthorized",
                404 => "Not Found", 500 => "Internal Server Error"}

def status_code_meaning(number)
  return STATUS_CODES[number]
end
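Either version is called the same way; for example:

status_code_meaning(404) # => "Not Found"
status_code_meaning(200) # => "OK"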

Another common use for hash tables is to represent objects that have various attributes. For example, here's a representation of a dog:

{"Name" => "Fido", "Breed" => "Pug", "Age" => 3, "Gender" => "Male"}

As you can see, attributes are a kind of paired data, since the attribute name becomes the key, and the actual attribute becomes the value. We can create an entire list of dogs if we place multiple hash tables inside an array:

[
  {"Name" => "Fido", "Breed" => "Pug", "Age" => 3, "Gender" => "Male"},
  {"Name" => "Lady", "Breed" => "Poodle", "Age" => 6, "Gender" => "Female"},
  {"Name" => "Spot", "Breed" => "Dalmatian", "Age" => 2, "Gender" => "Male"}
]

Hash Tables for Speed

While hash tables are a perfect fit for paired data, they can also be used to make your code faster—even if your data doesn't exist as pairs. And this is where things really get exciting.

Here's a simple array:

array = [61, 30, 91, 11, 54, 38, 72]

If you want to search for a number in this array, how many steps would it take? Because the array is unordered, you would have to perform a linear search, which would take N steps. You learned this all the way back at the beginning of the book.

However, what would happen if we ran some code that would convert these numbers into a hash table that looked like this?

hash_table = {61 => true, 30 => true, 91 => true, 11 => true, 54 => true, 38 => true, 72 => true}


Here, we've stored each number as a key and assigned the boolean true as the associated value for each number. Now, if I asked you to search this hash table for a certain number as a key, how many steps would it take? Well, with the simple code:

hash_table[72]

I could look up the number 72 in a single step. That is, by doing a hash table lookup using 72 as the key, I can determine in one step whether the 72 is present in the hash table. The reasoning is straightforward: if 72 is a key in the hash table, I'd get back true, since the 72 has true as its value. On the other hand, if the 72 is not a key in the hash table, I'd get back nil. (Various languages return different values when a key is not present in a hash table. Ruby returns nil.) Since doing a hash table lookup takes just one step, I can therefore find any number in the hash table (as a key) in one step.

Can you see the magic? By converting an array into a hash table in this way, we can go from O(N) searches to O(1) searches.

Here's what's interesting about using a hash table in this way. Even though hash tables are often used for naturally paired data, our data here is not paired. We just care about a list of single numbers. While we did assign a value to each key, it doesn't really matter what the value is. We used true as the value for each key, but any arbitrary value (that is "truthy") would achieve the same results.

The trick here is that by placing each number in the hash table as a key, we can later look up each of those keys in one step. If our lookup returns any value, it means the key itself must be in the hash table. If we get back nil, then the key must not be in the hash table.

I refer to using a hash table in this way as "using it as an index." (It's my own term.) An index at the back of a book tells you whether the topic can be found in the book, instead of you having to flip through all the pages to find it. Here as well, we created the hash table to serve as a kind of index; in our case, it's an index that tells us whether a specific item is contained within the original array.

Let's use this technique to boost the speed of a very practical algorithm.


Array Subset

Let's say we need to determine whether one array is a subset of another array. Take these two arrays, for example:

["a", "b", "c", "d", "e", "f"]
["b", "d", "f"]

The second array, ["b", "d", "f"], is a subset of the first array, ["a", "b", "c", "d", "e", "f"], because every value of the second array is contained within the first array. However, if our arrays were:

["a", "b", "c", "d", "e", "f"]
["b", "d", "f", "h"]

the second array is not a subset of the first array, because the second array contains the value "h", which does not exist within the first array.

How would we write a function that compares two arrays and lets us know if one is a subset of the other? One way we can do this is by using nested loops. Essentially, we'd iterate through every element of the smaller array, and for each element in the smaller array, we'd then begin a second loop that iterates through each element of the larger array. If we ever find an element in the smaller array that isn't contained within the larger array, our function will return false. If the code gets past the loops, it means it never encountered a value in the smaller array that wasn't contained within the larger array, so it returns true.

Here's a JavaScript implementation of this approach:

function isSubset(array1, array2) {
  let largerArray;
  let smallerArray;

  // Determine which array is smaller:
  if(array1.length > array2.length) {
    largerArray = array1;
    smallerArray = array2;
  } else {
    largerArray = array2;
    smallerArray = array1;
  }

  // Iterate through smaller array:
  for(let i = 0; i < smallerArray.length; i++) {

    // Assume temporarily that the current value from
    // smaller array is not found in larger array:
    let foundMatch = false;


    // For each value in smaller array, iterate through
    // larger array:
    for(let j = 0; j < largerArray.length; j++) {

      // If the two values are equal, it means the current
      // value in smaller array is present in the larger array:
      if(smallerArray[i] === largerArray[j]) {
        foundMatch = true;
        break;
      }
    }

    // If the current value in smaller array doesn't exist
    // in larger array, return false:
    if(foundMatch === false) { return false; }
  }

  // If we get to the end of the loops, it means that all
  // values from smaller array are present in larger array:
  return true;
}

When we analyze the efficiency of this algorithm, we find that it's O(N * M), since it runs for the number of items in the first array multiplied by the number of items in the second array.

Now, let's harness the power of a hash table to dramatically improve the efficiency of our algorithm. Let's ditch our original approach and start again from scratch. In our new approach, after we've determined which array is larger and which is smaller, we're going to run a single loop through the larger array, and store each value inside of a hash table:

let hashTable = {};
for(const value of largerArray) {
  hashTable[value] = true;
}

In this code snippet, we create an empty hash table inside the hashTable variable. Then, we iterate through each value in the largerArray, and add the item from the array to the hash table. We add the item itself as a key, and true as the value.

For the earlier example, ["a", "b", "c", "d", "e", "f"], once we've run it through this loop, we end up with a hash table that looks like this:

{"a": true, "b": true, "c": true, "d": true, "e": true, "f": true}


This becomes our "index" that will allow us to conduct O(1) lookups of these items later on.

Now, here's the brilliant part. Once the first loop is complete and we now have this hash table to work with, we can then begin a second (non-nested) loop that iterates through the smaller array:

for(const value of smallerArray) {
  if(!hashTable[value]) { return false; }
}

This loop looks at each item in the smallerArray and checks to see whether it exists as a key inside the hashTable. Remember, the hashTable stores all the items from largerArray as its keys. So, if we find an item in hashTable, it means the item is also in largerArray. And if we don't find an item in hashTable, it means it's also not inside the largerArray.

So, for each item in smallerArray, we check whether it's a key in hashTable. If it's not, that means the item isn't contained within the largerArray, and the smallerArray is therefore not a subset of the larger array, and we return false. (However, if we get past this loop, it means the smaller array is a subset of the larger one.)

Let's put this all together in one complete function:

function isSubset(array1, array2) {
  let largerArray;
  let smallerArray;
  let hashTable = {};

  // Determine which array is smaller:
  if(array1.length > array2.length) {
    largerArray = array1;
    smallerArray = array2;
  } else {
    largerArray = array2;
    smallerArray = array1;
  }

  // Store all items from largerArray inside hashTable:
  for(const value of largerArray) {
    hashTable[value] = true;
  }

  // Iterate through each item in smallerArray and return false
  // if we encounter an item not inside hashTable:
  for(const value of smallerArray) {
    if(!hashTable[value]) { return false; }
  }


  // If we got this far in our code without returning false,
  // it means that all the items in the smallerArray
  // must be contained within largerArray:
  return true;
}

Now, how many steps did this algorithm take? We iterated through each item of the larger array once in order to build the hash table. And we iterated through each item of the smaller array, taking just one step per item to look up the item in the hash table. Remember, a hash table lookup takes just one step.

If we say that N is the total number of items of both arrays combined, our algorithm is O(N), since we touched each item just once. That is, we spent one step on each item from the larger array followed by one step on each item from the smaller array. That's a huge win over our first algorithm, which was O(N * M).

This technique of using a hash table as an "index" comes up frequently in algorithms that require multiple searches within an array. That is, if your algorithm will need to keep searching for values inside an array, each search would itself take up to N steps. By creating a hash table "index" of the array, we reduce each search to only one step.

As I pointed out, what makes this technique particularly interesting is that when using a hash table as an "index," we aren't even dealing with naturally paired data. Instead, we just want to know whether the key itself is in the hash table. When we use the key to perform a lookup in the hash table and receive any value (no matter how arbitrary it is), it means the key must be present in the hash table.

Wrapping Up

Hash tables are indispensable when it comes to building efficient software. With their O(1) reads and insertions, they're a difficult data structure to beat.

Until now, our analysis of various data structures revolved around their efficiency and speed. But did you know that some data structures provide advantages other than speed? In the next lesson, we're going to explore two data structures that can help improve code elegance and maintainability.


Exercises

The following exercises provide you with the opportunity to practice with hash tables. The solutions to these exercises are found in the section, Chapter 8, on page 444.

1. Write a function that returns the intersection of two arrays. The intersection is a third array that contains all values that are contained within both of the first two arrays. For example, the intersection of [1, 2, 3, 4, 5] and [0, 2, 4, 6, 8] is [2, 4]. Your function should have a complexity of O(N). (If your programming language has a built-in way of doing this, don't use it. The idea is to build the algorithm yourself.)

2. Write a function that accepts an array of strings and returns the first duplicate value it finds. For example, if the array is ["a", "b", "c", "d", "c", "e", "f"], the function should return "c", since it's duplicated within the array. (You can assume that there's one pair of duplicates within the array.) Make sure the function has an efficiency of O(N).

3. Write a function that accepts a string that contains all the letters of the alphabet except one and returns the missing letter. For example, the string "the quick brown box jumps over a lazy dog" contains all the letters of the alphabet except the letter "f". The function should have a time complexity of O(N).

4. Write a function that returns the first non-duplicated character in a string. For example, the string "minimum" has two characters that only exist once—the "n" and the "u"—so your function should return the "n", since it occurs first. The function should have an efficiency of O(N).


CHAPTER 9

Crafting Elegant Code with Stacks and Queues

Until now, our discussion around data structures has focused primarily on how they affect the performance of various operations. However, having a variety of data structures in your programming arsenal also allows you to create code that is simpler and easier to read.

In this chapter, you're going to discover two new data structures: stacks and queues. The truth is that these two structures are not entirely new. They're simply arrays with restrictions. Yet, these restrictions are exactly what make them so elegant.

More specifically, stacks and queues are elegant tools for handling temporary data. From operating system architecture to printing jobs to traversing data, stacks and queues serve as temporary containers that can be used to form beautiful algorithms.

Think of temporary data like the food orders in a diner. What each customer orders is important until the meal is made and delivered; then you throw the order slip away. You don't need to keep that information around. Temporary data is information that doesn't have any meaning after it's processed, so you can throw it away once you're done with it.

Stacks and queues handle this kind of temporary data, but have a special focus on the order in which the data is handled, as you'll now learn.

Stacks

A stack stores data in the same way arrays do—it's simply a list of elements. The one catch is that stacks have the following three constraints:


• Data can be inserted only at the end of a stack.
• Data can be deleted only from the end of a stack.
• Only the last element of a stack can be read.

You can think of a stack as an actual stack of dishes; you can't look at the face of any dish other than the one at the top. Similarly, you can't add any dish except to the top of the stack, nor can you remove any dish besides the one at the top. (At least, you shouldn't.)

In fact, most computer science literature refers to the end of the stack as its top, and the beginning of the stack as its bottom. Our diagrams will reflect this terminology by viewing stacks as vertical arrays, like so:

As you can see, the first item in the array becomes the bottom of the stack, while the last item becomes the stack’s top. While the restrictions of a stack seem—well—restrictive, we’ll see shortly how they are to our benefit. To see a stack in action, let’s start with an empty stack. Inserting a new value into a stack is also called pushing onto the stack. Think of it as adding a dish onto the top of the dish stack. Let’s push a 5 onto the stack:

Again, there’s nothing fancy going on here. All we’re really doing is just inserting a data element into the end of an array.


Now, let’s push a 3 onto the stack:

Next, let’s push a 0 onto the stack:

Note that we’re always adding data to the top (that is, the end) of the stack. If we wanted to insert the 0 into the bottom or middle of the stack, we couldn’t, because that is the nature of a stack: data can only be added to the top. Removing elements from the top of the stack is called popping from the stack. Because of a stack’s restrictions, we can only pop data from the top. Let’s pop some elements from our example stack. First, we pop the 0:

Our stack now contains only two elements: the 5 and the 3. Next, we pop the 3:

Our stack now only contains the 5:


A handy acronym used to describe stack operations is LIFO, which stands for “Last In, First Out.” All this means is that the last item pushed onto a stack is always the first item popped from it. It’s sort of like students who slack off—they’re always the last to arrive to class, but the first to leave.

Abstract Data Types

Most programming languages don't actually come with the stack as a built-in data type or class. Instead, it's up to you to implement it yourself. This is a stark contrast with arrays, which are available in most languages.

To create a stack, then, you generally have to use one of the built-in data structures to actually hold the data. Here is one way to implement a stack using Ruby, which uses an array under the hood:

class Stack
  def initialize
    @data = []
  end

  def push(element)
    @data << element
  end

  def pop
    @data.pop
  end

  def read
    @data.last
  end
end
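To see the class in use, here is a quick illustrative snippet (not from the book) that mirrors the push and pop walkthrough above:

stack = Stack.new
stack.push(5)
stack.push(3)
stack.push(0)

stack.read # => 0 - only the top of the stack can be read
stack.pop  # => 0 - the last item pushed is the first popped (LIFO)
stack.read # => 3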


The lint method accepts a string containing JavaScript code and iterates over each character with:

text.each_char do |char|

If we encounter an opening brace, we push it onto the stack:

if is_opening_brace?(char)
  @stack.push(char)

Note that we're using a helper method called is_opening_brace?, which checks whether the character is an opening brace:

["(", "[", "{"].include?(char)

When we encounter a closing brace, we pop the top of the stack and store it in a variable called popped_opening_brace:

popped_opening_brace = @stack.pop

Our stack only stores opening braces, so whatever we popped will be some sort of opening brace, assuming there was something on the stack to pop. However, it's possible the stack was empty, in which case the result of our pop will be nil. If this is the case, it means we've encountered Syntax Error Type #2:

if !popped_opening_brace
  return "#{char} doesn't have opening brace"
end

For simplicity's sake, we return a basic string containing an error message when we encounter any errors during our linting process.

Assuming that we did pop an opening brace off the stack, we then check it to see whether it matches the current closing brace. If it doesn't match, this is Syntax Error Type #3:

if is_not_a_match(popped_opening_brace, char)
  return "#{char} has mismatched opening brace"
end

(The is_not_a_match helper method is defined later in our code.)

Finally, when our code finishes parsing the line, we check whether there are any opening braces left in the stack with @stack.read. If there are, it means we have an opening brace that was never closed, and we produce an error message. This is Syntax Error Type #1:

if @stack.read
  return "#{@stack.read} does not have closing brace"
end
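Putting the pieces of this walkthrough together, a minimal sketch of the Linter class might look like the following. The method names is_opening_brace? and is_not_a_match come from the walkthrough above; the is_closing_brace? helper and the overall structure are assumptions filled in for illustration:

class Linter
  def initialize
    # Uses the Stack class defined earlier in the chapter:
    @stack = Stack.new
  end

  def lint(text)
    text.each_char do |char|
      if is_opening_brace?(char)
        @stack.push(char)
      elsif is_closing_brace?(char)
        popped_opening_brace = @stack.pop

        # Syntax Error Type #2: a closing brace with no opening brace:
        if !popped_opening_brace
          return "#{char} doesn't have opening brace"
        end

        # Syntax Error Type #3: a closing brace that doesn't match
        # the most recently opened brace:
        if is_not_a_match(popped_opening_brace, char)
          return "#{char} has mismatched opening brace"
        end
      end
    end

    # Syntax Error Type #1: an opening brace that was never closed:
    if @stack.read
      return "#{@stack.read} does not have closing brace"
    end

    # No syntax errors were found:
    return true
  end

  private

  def is_opening_brace?(char)
    ["(", "[", "{"].include?(char)
  end

  def is_closing_brace?(char)
    [")", "]", "}"].include?(char)
  end

  def is_not_a_match(opening_brace, closing_brace)
    closing_brace != {"(" => ")", "[" => "]", "{" => "}"}[opening_brace]
  end
end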


Finally, we return true if the JavaScript contains no errors. We can then use our Linter class as follows:

linter = Linter.new
puts linter.lint("( var x = { y: [1, 2, 3] } )")

In this example, the JavaScript code is correct, so we just get back true. However, if we input a line that has an error, such as a missing opening parenthesis:

"var x = { y: [1, 2, 3] })"

we’ll get the error message, ) doesn't have opening brace. In this example, we used a stack to implement our linter with a really neat algorithm. But if a stack actually uses an array under the hood, why bother with a stack? Couldn’t we have accomplished the same task using an array?

The Importance of Constrained Data Structures

By definition, if a stack is just a constrained version of an array, that means an array can do anything a stack can do. If so, what advantage does a stack provide?

Constrained data structures like the stack (and the queue, which we'll encounter shortly) are important for several reasons. First, when we work with a constrained data structure, we can prevent potential bugs. The linting algorithm, for example, only works if we exclusively remove items from the top of the stack. If a programmer inadvertently writes code that removes items from the middle of the array, the algorithm will break down. By using a stack, we're forced into only removing items from the top, as it's impossible to get the stack to remove any other item.

Second, data structures like stacks give us a new mental model for tackling problems. The stack, for example, gives us the whole idea of a Last-In-First-Out process. We can then apply this LIFO mindset to solve all sorts of problems, such as the linter just described.

And once we're familiar with the stack and its LIFO nature, the code we write using the stack becomes familiar and elegant to other developers who read our code. As soon as someone sees a stack being used within an algorithm, they immediately know that the algorithm is working with a LIFO-based process.


Stack Wrap-Up

Stacks are ideal for processing any data that should be handled last in, first out. The "undo" function in a word processor, for example, is a great use case for a stack. As the user types, we keep track of each keystroke by pushing the keystroke onto the stack. Then, when the user hits the "undo" key, we pop the most recent keystroke from the stack and eliminate it from the document. At this point, their next-to-most-recent keystroke is now sitting at the top of the stack, ready to be undone if need be.
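As a tiny illustrative sketch (hypothetical code, not from the book), the undo idea maps directly onto the Stack class from earlier in the chapter:

keystrokes = Stack.new
"hello".each_char { |char| keystrokes.push(char) }

# The user hits "undo" and the most recent keystroke comes off first:
keystrokes.pop  # => "o"
keystrokes.read # => "l" - the next keystroke in line to be undone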

Queues

A queue is another data structure designed to process temporary data. It's like a stack in many ways, except that it processes data in a different order. Like a stack, a queue is also an abstract data type.

You can think of a queue as a line of people at the movie theater. The first one in the line is the first one to leave the line and enter the theater. With queues, the first item added to the queue is the first item to be removed. That's why computer scientists apply the acronym "FIFO" to queues: First In, First Out.

As with a line of people, a queue is usually depicted horizontally. It's also common to refer to the beginning of the queue as its front, and the end of the queue as its back.

Like stacks, queues are arrays with three restrictions (it's just a different set of restrictions):

• Data can be inserted only at the end of a queue. (This is identical behavior to the stack.)
• Data can be deleted only from the front of a queue. (This is the opposite behavior of the stack.)
• Only the element at the front of a queue can be read. (This, too, is the opposite of the behavior of the stack.)

Let's see a queue in action, beginning with an empty queue. First, we insert a 5 (a common term for inserting into a queue is enqueue, but we'll use the terms insert and enqueue interchangeably):


Next, we insert a 9:

Next, we insert a 100:

As of now, the queue has functioned just like a stack. However, removing data happens in the reverse, as we remove data from the front of the queue. (Removing an element from a queue is also known as dequeuing.) If we want to remove data, we must start with the 5, since it’s at the front of the queue:

Next, we remove the 9:

Our queue now only contains one element, the 100.

Queue Implementation

I mentioned that the queue is an abstract data type. Like many other abstract data types, it doesn't come implemented in many programming languages. Here is a Ruby implementation of a queue:

class Queue
  def initialize
    @data = []
  end

  def enqueue(element)
    @data << element
  end

  def dequeue
    @data.shift
  end

  def read
    @data.first
  end
end
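A quick illustrative use of the class above (not from the book), mirroring the enqueue and dequeue walkthrough:

queue = Queue.new
queue.enqueue(5)
queue.enqueue(9)
queue.enqueue(100)

queue.dequeue # => 5 - the first item added is the first removed (FIFO)
queue.read    # => 9 - the 9 is now at the front of the queue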

1.

def add_until_100(array)
  return 0 if array.length == 0
  if array[0] + add_until_100(array[1, array.length - 1]) > 100
    return add_until_100(array[1, array.length - 1])
  else
    return array[0] + add_until_100(array[1, array.length - 1])
  end
end

2. The following function uses recursion to calculate the Nth number from a mathematical sequence known as the "Golomb sequence." It's terribly inefficient, though! Use memoization to optimize it. (You don't have to actually understand how the Golomb sequence works to do this exercise.)

def golomb(n)
  return 1 if n == 1
  return 1 + golomb(n - golomb(golomb(n - 1)))
end

3. Here is a solution to the "Unique Paths" problem from an exercise in the previous chapter. Use memoization to improve its efficiency:

def unique_paths(rows, columns)
  return 1 if rows == 1 || columns == 1
  return unique_paths(rows - 1, columns) + unique_paths(rows, columns - 1)
end


CHAPTER 13

Recursive Algorithms for Speed

We've seen that understanding recursion unlocks all sorts of new algorithms, such as traversing a filesystem or producing anagrams. In this chapter, you're going to learn that recursion is also the key to algorithms that can make our code run much, much faster.

In previous chapters, we've encountered a number of sorting algorithms, including Bubble Sort, Selection Sort, and Insertion Sort. In real life, however, none of these methods are actually used to sort arrays. Most computer languages have built-in sorting functions for arrays that save us the time and effort of implementing our own. And in many of these languages, the sorting algorithm that is employed under the hood is Quicksort.

The reason we're going to dig into Quicksort (even though it's already built into many computer languages) is because by studying how it works, you can learn how to use recursion to greatly speed up an algorithm, and you can do the same for other practical algorithms of the real world.

Quicksort is an extremely fast sorting algorithm that is particularly efficient for average scenarios. While in worst-case scenarios (that is, inversely sorted arrays), it performs similarly to Insertion Sort and Selection Sort, it is much faster for average scenarios—which are what occur most of the time.

Quicksort relies on a concept called partitioning, so we'll jump into that first.

Partitioning

To partition an array is to take a random value from the array—which is then called the pivot—and make sure that every number that is less than the pivot ends up to the left of the pivot, and that every number greater than the pivot ends up to the right of the pivot. We accomplish partitioning through a simple algorithm that will be described in the following example.


Let’s say we have the following array:

For consistency’s sake, we’ll always select the rightmost value to be our pivot (although we can technically choose other values). In this case, the number 3 is our pivot. We indicate this by circling it:

We then assign “pointers”—one to the left-most value of the array, and one to the rightmost value of the array, excluding the pivot itself:

We're now ready to begin the actual partition, which follows these steps. Don't worry—the steps will become clearer when we walk through our example shortly.

1. The left pointer continuously moves one cell to the right until it reaches a value that is greater than or equal to the pivot, and then stops.
2. Then, the right pointer continuously moves one cell to the left until it reaches a value that is less than or equal to the pivot, and then stops. The right pointer will also stop if it reaches the beginning of the array.
3. Once the right pointer has stopped, we reach a crossroads. If the left pointer has reached (or gone beyond) the right pointer, we move on to Step 4. Otherwise, we swap the values that the left and right pointers are pointing to, and then go back to repeat Steps 1, 2, and 3 again.
4. Finally, we swap the pivot with the value that the left pointer is currently pointing to.

When we're done with a partition, we are now assured that all values to the left of the pivot are less than the pivot, and all values to the right of the pivot are greater than it. And that means the pivot itself is now in its correct place within the array, although the other values are not yet necessarily completely sorted.


Let’s apply this to our example: Step 1: Compare the left pointer (now pointing to 0) to our pivot (the value 3):

Since 0 is less than the pivot, the left pointer moves on in the next step. Step 2: The left pointer moves on:

We compare the left pointer (the 5) to our pivot. Is the 5 lower than the pivot? It’s not, so the left pointer stops, and we activate the right pointer in our next step. Step 3: Compare the right pointer (6) to our pivot. Is the value greater than the pivot? It is, so our pointer will move on in the next step. Step 4: The right pointer moves on:

We compare the right pointer (1) to our pivot. Is the value greater than the pivot? It’s not, so our right pointer stops. Step 5: Since both pointers have stopped, we swap the values of the two pointers:


We then activate our left pointer again in the next step. Step 6: The left pointer moves on:

We compare the left pointer (2) to our pivot. Is the value less than the pivot? It is, so the left pointer moves on. Step 7: The left pointer moves on to the next cell. Note that at this point, both the left and right pointers are pointing to the same value:

We compare the left pointer to our pivot. Because our left pointer is pointing to a value that is greater than our pivot, it stops. At this point, since our left pointer has reached our right pointer, we are done with moving pointers. Step 8: For our final step of the partition, we swap the value that the left pointer is pointing to with the pivot:

Although our array isn’t completely sorted, we have successfully completed a partition. That is, since our pivot was the number 3, all numbers that are less than 3 are to its left, while all numbers greater than 3 are to its right. This also means, by definition, that the 3 is now in its correct place within the array.

Code Implementation: Partitioning

Following is an implementation of a SortableArray class in Ruby that includes a partition! method that partitions the array as we've described:


class SortableArray
  attr_reader :array

  def initialize(array)
    @array = array
  end

  def partition!(left_pointer, right_pointer)
    # We always choose the right-most element as the pivot.
    # We keep the index of the pivot for later use:
    pivot_index = right_pointer

    # We grab the pivot value itself:
    pivot = @array[pivot_index]

    # We start the right pointer immediately to the left of the pivot:
    right_pointer -= 1

    while true
      # Move the left pointer to the right as long as it
      # points to a value that is less than the pivot:
      while @array[left_pointer] < pivot do
        left_pointer += 1
      end

      # Move the right pointer to the left as long as it
      # points to a value that is greater than the pivot:
      while @array[right_pointer] > pivot do
        right_pointer -= 1
      end

      # We've now reached the point where we've stopped
      # moving both the left and right pointers.
      # We check whether the left pointer has reached
      # or gone beyond the right pointer. If it has,
      # we break out of the loop so we can swap the pivot later
      # on in our code:
      if left_pointer >= right_pointer
        break

      # If the left pointer is still to the left of the right
      # pointer, we swap the values of the left and right pointers:
      else
        @array[left_pointer], @array[right_pointer] = @array[right_pointer], @array[left_pointer]

        # We move the left pointer over to the right, gearing up
        # for the next round of left and right pointer movements:
        left_pointer += 1
      end
    end


    # As the final step of the partition, we swap the value
    # of the left pointer with the pivot:
    @array[left_pointer], @array[pivot_index] = @array[pivot_index], @array[left_pointer]

    # We return the left_pointer for the sake of the quicksort method,
    # which will appear later in this chapter:
    return left_pointer
  end
end

Let's break this code down a bit. The partition! method accepts the starting points of the left and right pointers as parameters:

def partition!(left_pointer, right_pointer)

When this method is first called on an array, these pointers will point to the left and right ends of the array, respectively. However, we'll see that Quicksort will call this method on subsections of the array as well. Because of this, we can't assume the left and right pointers are always the two extremities of the array, so they need to become method arguments. This point will be clearer when I explain the complete Quicksort algorithm.

Next, we select our pivot, which is always the rightmost element in the range we're dealing with:

pivot_index = right_pointer
pivot = @array[pivot_index]

Once our pivot has been identified, we move the right_pointer to the item immediately left of the pivot:

right_pointer -= 1

We then begin a loop that will run until the left_pointer and right_pointer meet. Within this loop, we use another loop to keep moving the left_pointer to the right until it reaches an item that is greater than or equal to the pivot:

while @array[left_pointer] < pivot do
  left_pointer += 1
end

Similarly, we move the right_pointer to the left until it hits an item that is less than or equal to the pivot:

while @array[right_pointer] > pivot do
  right_pointer -= 1
end


Once the left_pointer and right_pointer have stopped moving, we check whether the two pointers have met:

if left_pointer >= right_pointer
  break

If they have, we exit the loop and get ready to swap the pivot, which we'll get to momentarily. However, if the two pointers have not yet met, we swap the values at the two pointers:

@array[left_pointer], @array[right_pointer] = @array[right_pointer], @array[left_pointer]

Finally, once the two pointers have met, we swap the pivot with the value at the left_pointer:

@array[left_pointer], @array[pivot_index] = @array[pivot_index], @array[left_pointer]

The method returns the left_pointer, as this will be needed by the Quicksort algorithm (which I’ll explain shortly).
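To see the method in action on the example array from this section, here is a small illustrative snippet (not from the book):

array = [0, 5, 2, 1, 6, 3]
sortable = SortableArray.new(array)
pivot_index = sortable.partition!(0, array.length - 1)

sortable.array # => [0, 1, 2, 3, 6, 5] - everything left of the 3 is smaller, everything right is larger
pivot_index    # => 3 - the pivot's final index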

Quicksort

The Quicksort algorithm is a combination of partitions and recursion. It works as follows:

1. Partition the array. The pivot is now in its proper place.
2. Treat the subarrays to the left and right of the pivot as their own arrays, and recursively repeat Steps 1 and 2. That means we'll partition each subarray and end up with even smaller sub-subarrays to the left and right of each subarray's pivot. We then partition those sub-subarrays, and so on and so forth.
3. When we have a subarray that has zero or one elements, that is our base case and we do nothing.

Let's return to our example. We began with the array of [0, 5, 2, 1, 6, 3] and ran a single partition on the entire array. Since Quicksort begins with such a partition, that means we're already partly through the Quicksort process. We left off with:


As you can see, the value 3 was the original pivot. Now that the pivot is in the correct place, we need to sort whatever is to the left and right of the pivot. Note that in our example, it just so happens that the numbers to the left of the pivot are already sorted, but the computer doesn’t know that yet. The next step after the partition is to treat everything to the left of the pivot as its own array and partition it. We’ll obscure the rest of the array for now, as we’re not focusing on it at the moment:

Now, of this [0, 1, 2] subarray, we’ll make the rightmost element the pivot. So, that would be the number 2:

We’ll establish our left and right pointers:

And now we’re ready to partition this subarray. Let’s continue from after Step 8, where we left off previously. Step 9: We compare the left pointer (0) to the pivot (2). Since the 0 is less than the pivot, we continue moving the left pointer. Step 10: We move the left pointer one cell to the right, and it now happens to be pointing to the same value the right pointer is pointing to:

We compare the left pointer to the pivot. Since the value 1 is less than the pivot, we move on.


Step 11: We move the left pointer one cell to the right, which just happens to be the pivot:

At this point, the left pointer is pointing to a value that is equal to the pivot (since it is the pivot!), and so the left pointer stops. Note how the left pointer managed to sneak past the right pointer. That’s okay, though. The algorithm is designed to work even with such an occurrence. Step 12: Now, we activate the right pointer. However, because the right pointer’s value (1) is less than the pivot, it stays still. Since our left pointer has passed our right pointer, we’re done moving pointers altogether in this partition. Step 13: Next, we swap the pivot with the left pointer’s value. Now, it just so happens that the left pointer is pointing to the pivot itself, so we swap the pivot with itself, which results in no change at all. At this point, the partition is complete and the pivot (2) is now in its correct spot:

We now have a subarray of [0, 1] to the left of the pivot (the 2) and no subarray to its right. The next step is to recursively partition the subarray to the pivot’s left, which again, is [0, 1]. We don’t have to deal with any subarray to the right of the pivot since no such subarray exists. Because all we’ll focus on in the next step is the subarray [0, 1], we’ll block out the rest of the array so it looks like this:

To partition the subarray, [0, 1], we’ll make the rightmost element (the 1) the pivot. Where will we put the left and right pointers? Well, the left pointer will


point to the 0, but the right pointer will also point to the 0, since we always start the right pointer at one cell to the left of the pivot. This gives us this:

We’re now ready to begin the partition. Step 14: We compare the left pointer (0) with the pivot (1):

It’s less than the pivot, so we move on. Step 15: We shift the left pointer one cell to the right. It now points to the pivot:

Since the left pointer’s value (1) is not lower than the pivot (since it is the pivot), the left pointer stops moving. Step 16: We compare the right pointer with the pivot. Since it’s pointing to a value that is less than the pivot, we don’t move it anymore. And since the left pointer has passed the right pointer, we’re done moving pointers for this partition. Step 17: We now swap the left pointer with the pivot. Again, in this case, the left pointer is actually pointing to the pivot itself, so the swap doesn’t actually change anything. The pivot is now in its proper place, and we’re done with this partition. That leaves us with this:


Next up, we need to partition the subarray to the left of the most recent pivot. In this case, that subarray is [0]—an array of just one element. An array of zero or one elements is our base case, so we don’t do anything. The element is just considered to be in its proper place automatically. So, now we’ve got:

We started out by treating 3 as our pivot, and recursively partitioned the subarray to its left ([0, 1, 2]). As promised, we now need to come back to recursively partitioning the subarray to the right of the 3, which is [6, 5]. We’ll obscure the [0, 1, 2, 3], since we’ve already sorted those, and now we’re only focusing on the [6, 5]:

In the next partition, we’ll treat the rightmost element (the 5) as the pivot. That gives us:

When setting up our next partition, our left and right pointers both end up pointing to the 6:

Step 18: We compare the left pointer (6) with the pivot (5). Since 6 is greater than the pivot, the left pointer doesn’t move further. Step 19: The right pointer is pointing to the 6 as well, so we would theoretically move on to the next cell to the left. However, there are no more cells to the left of the 6, so the right pointer stops moving. Since the left pointer has reached the right pointer, we’re done moving pointers altogether for this partition. That means we’re ready for the final step.


Step 20: We swap the pivot with the value of the left pointer:

Our pivot (5) is now in its correct spot, leaving us with:

Next up, we technically need to recursively partition the subarray to the left and right of the [5, 6] subarray. Since there is no subarray to its left, that means we only need to partition the subarray to the right. Since the subarray to the right of the 5 is a single element of [6], that’s our base case and we do nothing—the 6 is automatically considered to be in its proper place:

And we’re done!

Code Implementation: Quicksort

Following is a quicksort! method we can add to the previous SortableArray class that would successfully complete the Quicksort:

def quicksort!(left_index, right_index)
  # Base case: the subarray has 0 or 1 elements:
  if right_index - left_index <= 0
    return
  end
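Based on the three Quicksort steps described earlier, the rest of the method would presumably partition the current range and then recurse on the subarrays to either side of the pivot. Here is a minimal sketch of that continuation, assuming partition! returns the pivot's final index as shown above:

  # Partition the range of elements and grab the index of the pivot:
  pivot_index = partition!(left_index, right_index)

  # Recursively call this quicksort! method on whatever
  # is to the left of the pivot:
  quicksort!(left_index, pivot_index - 1)

  # Recursively call this quicksort! method on whatever
  # is to the right of the pivot:
  quicksort!(pivot_index + 1, right_index)
end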