Coding Interview University
I originally created this as a short to-do list of study topics for becoming a software engineer, but it grew to the large list you see today. After going through this study plan, I got hired as a Software Development Engineer at Amazon! You probably won't have to study as much as I did. Anyway, everything you need is here.
I studied about 8-12 hours a day, for several months. This is my story: Why I studied full-time for 8 months for a Google interview
Please Note: You won't need to study as much as I did. I wasted a lot of time on things I didn't need to know. More info about that below. I'll help you get there without wasting your precious time.
The items listed here will prepare you well for a technical interview at just about any software company, including the giants: Amazon, Facebook, Google, and Microsoft.
Best of luck to you!
What is it?
This is my multi-month study plan for becoming a software engineer for a large company.
Required:
A little experience with coding (variables, loops, methods/functions, etc)
Patience
Time
Note this is a study plan for software engineering, not web development. Large software companies like Google, Amazon, Facebook and Microsoft view software engineering as different from web development. For example, Amazon has Frontend Engineers (FEE) and Software Development Engineers (SDE). These are 2 separate roles and the interviews for them will not be the same, as each has its own competencies. These companies require computer science knowledge for software development/engineering roles.
Table of Contents
The Study Plan
Topics of Study
Trees:
balanced search trees (general concept, not details)
traversals: preorder, inorder, postorder, BFS, DFS
Sorting:
selection
insertion
heapsort
quicksort
merge sort
Graphs:
directed
undirected
adjacency matrix
adjacency list
traversals: BFS, DFS
Getting the Job
---------------- Everything below this point is optional ----------------
Optional Extra Topics & Resources
System Design, Scalability, Data Handling (if you have 4+ years experience)
Balanced search trees:
AVL trees
Splay trees
Red/black trees
2-3 search trees
2-3-4 Trees (aka 2-4 trees)
N-ary (K-ary, M-ary) trees
B-Trees
Why use it?
If you want to work as a software engineer for a large company, these are the things you have to know.
If you missed out on getting a degree in computer science, like I did, this will catch you up and save four years of your life.
When I started this project, I didn't know a stack from a heap, didn't know Big-O anything, or anything about trees, or how to traverse a graph. If I had to code a sorting algorithm, I can tell ya it would have been terrible. Every data structure I had ever used was built into the language, and I didn't know how they worked under the hood at all. I never had to manage memory unless a process I was running would give an "out of memory" error, and then I'd have to find a workaround. I used a few multidimensional arrays in my life and thousands of associative arrays, but I never created data structures from scratch.
It's a long plan. It may take you months. If you are familiar with a lot of this already it will take you a lot less time.
How to use it
Everything below is an outline, and you should tackle the items in order from top to bottom.
I'm using GitHub's special markdown flavor, including task lists to track progress.
If you don't want to use git
On this page, click the Code button near the top, then click "Download ZIP". Unzip the file and you can work with the text files.
If you open the files in a code editor that understands markdown, you'll see everything formatted nicely.
If you're comfortable with git
Create a new branch so you can check off items like this (just put an x in the brackets): [x]
Fork the repo and follow the commands below:
Fork the GitHub repo https://github.com/jwasham/coding-interview-university by clicking on the Fork button.
Clone to your local repo:
git clone git@github.com:<your_github_username>/coding-interview-university.git
git checkout -b progress
git remote add jwasham https://github.com/jwasham/coding-interview-university
git fetch --all
After you have marked boxes with an x, commit and push your changes:
git add .
git commit -m "Marked x"
git rebase jwasham/main
git push --set-upstream origin progress
git push --force
Don't feel you aren't smart enough
Successful software engineers are smart, but many have an insecurity that they aren't smart enough.
A Note About Video Resources
Some videos are available only by enrolling in a Coursera or EdX class. These are called MOOCs. Sometimes the classes are not in session, so you have no access and may have to wait a couple of months.
It would be great to replace the online course resources with free and always-available public sources, such as YouTube videos (preferably university lectures), so that people can study these anytime, not just when a specific online course is in session.
Choose a Programming Language
You'll need to choose a programming language for the coding interviews you do, but you'll also need to find a language that you can use to study computer science concepts.
Preferably the language would be the same, so that you only need to be proficient in one.
For this Study Plan
When I did the study plan, I used 2 languages for most of it: C and Python
C: Very low level. Allows you to deal with pointers and memory allocation/deallocation, so you feel the data structures and algorithms in your bones. In higher level languages like Python or Java, these are hidden from you. In day to day work, that's terrific, but when you're learning how these low-level data structures are built, it's great to feel close to the metal.
C is everywhere. You'll see examples in books, lectures, videos, everywhere while you're studying.
The C Programming Language, 2nd Edition
This is a short book, but it will give you a great handle on the C language and if you practice it a little you'll quickly get proficient. Understanding C helps you understand how programs and memory work.
You don't need to go super deep in the book (or even finish it). Just get to where you're comfortable reading and writing in C.
Python: Modern and very expressive. I learned it because it's just super useful and also allows me to write less code in an interview.
This is my preference. You do what you like, of course.
You may not need it, but here are some sites for learning a new language:
For your Coding Interview
You can use a language you are comfortable in to do the coding part of the interview, but for large companies, these are solid choices:
C++
Java
Python
You could also use these, but read around first. There may be caveats:
JavaScript
Ruby
Here is an article I wrote about choosing a language for the interview: Pick One Language for the Coding Interview. This is the original article my post was based on: Choosing a Programming Language for Interviews
You need to be very comfortable in the language and be knowledgeable.
Read more about choices:
See language-specific resources here
Books for Data Structures and Algorithms
This book will form your foundation for computer science.
Just choose one, in a language that you will be comfortable with. You'll be doing a lot of reading and coding.
C
Algorithms in C, Parts 1-5 (Bundle), 3rd Edition
Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms
Python
Data Structures and Algorithms in Python
by Goodrich, Tamassia, Goldwasser
I loved this book. It covered everything and more.
Pythonic code
my glowing book report: https://startupnextdoor.com/book-report-data-structures-and-algorithms-in-python/
Java
Your choice:
Goodrich, Tamassia, Goldwasser
Sedgewick and Wayne:
Free Coursera course that covers the book (taught by the authors!):
C++
Your choice:
Goodrich, Tamassia, and Mount
Interview Prep Books
You don't need to buy a bunch of these. Honestly "Cracking the Coding Interview" is probably enough, but I bought more to give myself more practice. But I always do too much.
I bought both of these. They gave me plenty of practice.
Programming Interviews Exposed: Coding Your Way Through the Interview, 4th Edition
Answers in C++ and Java
This is a good warm-up for Cracking the Coding Interview
Not too difficult. Most problems may be easier than what you'll see in an interview (from what I've read)
Cracking the Coding Interview, 6th Edition
answers in Java
If you have tons of extra time:
Choose one:
Don't Make My Mistakes
This list grew over many months, and yes, it got out of hand.
Here are some mistakes I made so you'll have a better experience. And you'll save months of time.
1. You Won't Remember it All
I watched hours of videos and took copious notes, and months later there was much I didn't remember. I spent 3 days going through my notes and making flashcards, so I could review. I didn't need all of that knowledge.
Please, read so you won't make my mistakes:
Retaining Computer Science Knowledge.
2. Use Flashcards
To solve the problem, I made a little flashcards site where I could add flashcards of 2 types: general and code. Each card has different formatting. I made a mobile-first website, so I could review on my phone or tablet, wherever I am.
Make your own for free:
I DON'T RECOMMEND using my flashcards. There are too many and many of them are trivia that you don't need.
But if you don't want to listen to me, here you go:
Keep in mind I went overboard and have cards covering everything from assembly language and Python trivia to machine learning and statistics. It's way too much for what's required.
Note on flashcards: The first time you recognize you know the answer, don't mark it as known. You have to see the same card and answer it several times correctly before you really know it. Repetition will put that knowledge deeper in your brain.
An alternative to using my flashcard site is Anki, which has been recommended to me numerous times. It uses a repetition system to help you remember. It's user-friendly, available on all platforms and has a cloud sync system. It costs $25 on iOS but is free on other platforms.
My flashcard database in Anki format: https://ankiweb.net/shared/info/25173560 (thanks @xiewenya).
Some students have mentioned formatting issues with white space that can be fixed by doing the following: open deck, edit card, click cards, select the "styling" radio button, add the member "white-space: pre;" to the card class.
3. Do Coding Interview Questions While You're Learning
THIS IS VERY IMPORTANT.
Start doing coding interview questions while you're learning data structures and algorithms.
You need to apply what you're learning to solving problems, or you'll forget. I made this mistake.
Once you've learned a topic, and feel somewhat comfortable with it, for example, linked lists:
Open one of the coding interview books (or coding problem websites, listed below)
Do 2 or 3 questions regarding linked lists.
Move on to the next learning topic.
Later, go back and do another 2 or 3 linked list problems.
Do this with each new topic you learn.
Keep doing problems while you're learning all this stuff, not after.
You're not being hired for knowledge, but how you apply the knowledge.
There are many resources for this, listed below. Keep going.
4. Focus
There are a lot of distractions that can take up valuable time. Focus and concentration are hard. Turn on some music without lyrics and you'll be able to focus pretty well.
What you won't see covered
These are prevalent technologies but not part of this study plan:
SQL
JavaScript
HTML, CSS, and other front-end technologies
The Daily Plan
This course goes over a lot of subjects. Each will probably take you a few days, or maybe even a week or more. It depends on your schedule.
Each day, take the next subject in the list, watch some videos about that subject, and then write an implementation of that data structure or algorithm in the language you chose for this course.
You can see my code here:
You don't need to memorize every algorithm. You just need to be able to understand it enough to be able to write your own implementation.
Coding Question Practice
Why is this here? I'm not ready to interview.
Why you need to practice doing programming problems:
Problem recognition, and where the right data structures and algorithms fit in
Gathering requirements for the problem
Talking your way through the problem like you will in the interview
Coding on a whiteboard or paper, not a computer
Coming up with time and space complexity for your solutions (see Big-O below)
Testing your solutions
There is a great intro for methodical, communicative problem solving in an interview. You'll get this from the programming interview books, too, but I found this outstanding: Algorithm design canvas
Write code on a whiteboard or paper, not a computer. Test with some sample inputs. Then type it and test it out on a computer.
If you don't have a whiteboard at home, pick up a large drawing pad from an art store. You can sit on the couch and practice. This is my "sofa whiteboard". I added the pen in the photo just for scale. If you use a pen, you'll wish you could erase. Gets messy quick. I use a pencil and eraser.
Coding question practice is not about memorizing answers to programming problems.
Coding Problems
Don't forget your key coding interview books here.
Solving Problems:
Coding Interview Question Videos:
Super for walkthroughs of problem solutions
Nick White - LeetCode Solutions (187 Videos)
Good explanations of solution and the code
You can watch several in a short time
Neetcode - BLIND 75 LeetCode Solutions
Good explanations of solution and the python code
Also check out the Excel sheet that lists all the questions
GitHub links for all the solution code
Challenge sites:
My favorite coding problem site. It's worth the subscription money for the 1-2 months you'll likely be preparing.
See Nick White and FisherCoder Videos above for code walk-throughs.
Let's Get Started
Alright, enough talk, let's learn!
But don't forget to do coding problems from above while you learn!
Algorithmic complexity / Big-O / Asymptotic analysis
Well, that's about enough of that.
When you go through "Cracking the Coding Interview", there is a chapter on this, and at the end there is a quiz to see if you can identify the runtime complexity of different algorithms. It's a super review and test.
Data Structures
- Linked Lists
C Code (video) - not the whole video, just portions about Node struct and memory allocation
Linked List vs Arrays:
Gotcha: you need pointer to pointer knowledge: (for when you pass a pointer to a function that may change the address where that pointer points) This page is just to get a grasp on ptr to ptr. I don't recommend this list traversal style. Readability and maintainability suffer due to cleverness.
Implement (I did with tail pointer & without):
size() - returns number of data elements in list
empty() - bool returns true if empty
value_at(index) - returns the value of the nth item (starting at 0 for first)
push_front(value) - adds an item to the front of the list
pop_front() - remove front item and return its value
push_back(value) - adds an item at the end
pop_back() - removes end item and returns its value
front() - get value of front item
back() - get value of end item
insert(index, value) - insert value at index, so current item at that index is pointed to by new item at index
erase(index) - removes node at given index
value_n_from_end(n) - returns the value of the node at nth position from the end of the list
reverse() - reverses the list
remove_value(value) - removes the first item in the list with this value
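To make a few of the operations above concrete, here is a minimal Python sketch of a singly linked list without a tail pointer. The class names `Node` and `LinkedList` are my own, and I'm assuming `value_n_from_end(1)` means the last item, since the list above doesn't pin that down. It only covers part of the list; the rest is left as practice.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class LinkedList:
    """Singly linked list without a tail pointer (illustrative sketch)."""

    def __init__(self):
        self.head = None
        self._size = 0

    def size(self):
        return self._size

    def empty(self):
        return self._size == 0

    def push_front(self, value):
        # O(1): the new node simply points at the old head.
        self.head = Node(value, self.head)
        self._size += 1

    def push_back(self, value):
        # O(n) without a tail pointer: walk to the last node first.
        node = Node(value)
        if self.head is None:
            self.head = node
        else:
            last = self.head
            while last.next is not None:
                last = last.next
            last.next = node
        self._size += 1

    def pop_front(self):
        if self.head is None:
            raise IndexError("pop from empty list")
        value = self.head.value
        self.head = self.head.next
        self._size -= 1
        return value

    def value_at(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        node = self.head
        for _ in range(index):
            node = node.next
        return node.value

    def value_n_from_end(self, n):
        # Assumption: n = 1 refers to the last item in the list.
        return self.value_at(self._size - n)

    def reverse(self):
        # Re-point each node at its predecessor, one node at a time.
        prev, node = None, self.head
        while node is not None:
            nxt = node.next
            node.next = prev
            prev = node
            node = nxt
        self.head = prev
```

Writing the same thing in C is where the pointer-to-pointer gotcha mentioned above shows up, because `push_front` has to change the caller's head pointer.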
Doubly-linked List
No need to implement
- Stack
Will not implement. Implementing with array is trivial
- Queue
Implement using linked-list, with tail pointer:
enqueue(value) - adds value at position at tail
dequeue() - returns value and removes least recently added element (front)
empty()
Implement using fixed-sized array:
enqueue(value) - adds item at end of available storage
dequeue() - returns value and removes least recently added element
empty()
full()
Cost:
a bad implementation using linked list where you enqueue at head and dequeue at tail would be O(n) because you'd need the next to last element, causing a full traversal each dequeue
enqueue: O(1) (amortized, linked list and array [probing])
dequeue: O(1) (linked list and array)
empty: O(1) (linked list and array)
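As a rough illustration of the fixed-sized array version, here is a sketch that uses a circular buffer so that both enqueue and dequeue stay O(1). The class name `ArrayQueue` and the choice of exceptions are my own assumptions, not part of the plan.

```python
class ArrayQueue:
    """Fixed-capacity queue backed by a circular buffer (illustrative sketch)."""

    def __init__(self, capacity):
        self._items = [None] * capacity
        self._head = 0    # index of the least recently added element
        self._count = 0   # number of elements currently stored

    def empty(self):
        return self._count == 0

    def full(self):
        return self._count == len(self._items)

    def enqueue(self, value):
        if self.full():
            raise OverflowError("queue is full")
        # The next free slot wraps around the end of the array.
        tail = (self._head + self._count) % len(self._items)
        self._items[tail] = value
        self._count += 1

    def dequeue(self):
        if self.empty():
            raise IndexError("dequeue from empty queue")
        value = self._items[self._head]
        self._items[self._head] = None
        self._head = (self._head + 1) % len(self._items)
        self._count -= 1
        return value
```

Tracking a head index plus a count (instead of head and tail indexes) makes it easy to tell a full queue from an empty one.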
- Hash table
Videos:
Online Courses:
distributed hash tables:
Implement with array using linear probing
hash(k, m) - m is size of hash table
add(key, value) - if key already exists, update value
exists(key)
get(key)
remove(key)
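Here is one possible sketch of the interface above, using linear probing over a fixed-size array. It assumes Python's built-in `hash()` as the underlying hash function, uses a tombstone marker for removed slots (one common approach, not the only one), and does not resize, so `add` raises when the table is full. The names `HashTable`, `_DELETED`, and `_find` are hypothetical.

```python
class HashTable:
    """Open addressing with linear probing; fixed size, no resizing (sketch)."""

    _DELETED = object()  # tombstone left behind by remove()

    def __init__(self, m=8):
        self._slots = [None] * m

    def _hash(self, k, m):
        # m is the size of the hash table.
        return hash(k) % m

    def _probe(self, key):
        # Visit every slot once, starting at the key's home position.
        m = len(self._slots)
        start = self._hash(key, m)
        for i in range(m):
            yield (start + i) % m

    def _find(self, key):
        for idx in self._probe(key):
            slot = self._slots[idx]
            if slot is None:
                return None  # a never-used slot ends the probe sequence
            if slot is not self._DELETED and slot[0] == key:
                return idx
        return None

    def add(self, key, value):
        first_free = None
        for idx in self._probe(key):
            slot = self._slots[idx]
            if slot is None:
                if first_free is None:
                    first_free = idx
                break
            if slot is self._DELETED:
                if first_free is None:
                    first_free = idx  # reusable, but keep looking for the key
            elif slot[0] == key:
                self._slots[idx] = (key, value)  # key exists: update value
                return
        if first_free is None:
            raise OverflowError("hash table is full")
        self._slots[first_free] = (key, value)

    def exists(self, key):
        return self._find(key) is not None

    def get(self, key):
        idx = self._find(key)
        if idx is None:
            raise KeyError(key)
        return self._slots[idx][1]

    def remove(self, key):
        idx = self._find(key)
        if idx is not None:
            self._slots[idx] = self._DELETED
```

The tombstone matters: if `remove` simply cleared the slot, a later lookup for a key that had probed past it would stop too early.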
More Knowledge
- Binary search
Implement:
binary search (on sorted array of integers)
binary search using recursion
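Both variants can be sketched in a few lines of Python. Returning -1 when the target is missing is my own convention here, not something the plan specifies.

```python
def binary_search(items, target):
    """Iterative binary search on a sorted list; returns an index or -1."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


def binary_search_recursive(items, target, lo=0, hi=None):
    """Same search, expressed recursively over the range [lo, hi]."""
    if hi is None:
        hi = len(items) - 1
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if items[mid] == target:
        return mid
    if items[mid] < target:
        return binary_search_recursive(items, target, mid + 1, hi)
    return binary_search_recursive(items, target, lo, mid - 1)
```

In a language with fixed-width integers such as C, computing the midpoint as `lo + (hi - lo) / 2` avoids overflow; Python integers don't have that problem.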
- Bitwise operations
Bits cheat sheet - you should know many of the powers of 2 (from 2^1 to 2^16, and 2^32)
Get a really good understanding of manipulating bits with: &, |, ^, ~, >>, <<
Good intro: Bit Manipulation (video)
Swap values:
Absolute value:
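A few of these tricks can be sketched in Python. The function names are mine, and `abs_32` assumes its input fits in a signed 32-bit integer, because Python integers are arbitrary precision and the trick depends on the sign bit sitting at position 31.

```python
def xor_swap(a, b):
    """Swap two integers without a temporary variable."""
    a ^= b
    b ^= a
    a ^= b
    return a, b


def abs_32(x):
    """Branch-free absolute value, assuming x fits in a signed 32-bit int."""
    mask = x >> 31          # 0 for non-negative x, -1 (all ones) for negative x
    return (x ^ mask) - mask


def is_power_of_two(x):
    # A power of two has exactly one set bit, so clearing it leaves zero.
    return x > 0 and (x & (x - 1)) == 0


def count_set_bits(x):
    """Count the 1 bits of a non-negative integer."""
    if x < 0:
        raise ValueError("expects a non-negative integer")
    count = 0
    while x:
        x &= x - 1          # clears the lowest set bit
        count += 1
    return count
```

In everyday Python you would just write `a, b = b, a` and `abs(x)`; the point here is to see what the operators do.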
Trees
- Trees - Notes & Background
basic tree construction
traversal
manipulation algorithms
BFS(breadth-first search) and DFS(depth-first search) (video)
BFS notes:
level order (BFS, using queue)
time complexity: O(n)
space complexity: best: O(1), worst: O(n/2)=O(n)
DFS notes:
time complexity: O(n)
space complexity: best: O(log n) (avg. height of tree), worst: O(n)
inorder (DFS: left, self, right)
postorder (DFS: left, right, self)
preorder (DFS: self, left, right)
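The four traversals in the notes above can be sketched like this. `TreeNode` is a hypothetical minimal node class, and the DFS variants are written as generators to keep them short.

```python
from collections import deque


class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def level_order(root):
    """BFS: visit nodes level by level using a queue."""
    result = []
    queue = deque([root]) if root is not None else deque()
    while queue:
        node = queue.popleft()
        result.append(node.value)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return result


def inorder(node):
    """DFS: left, self, right."""
    if node is not None:
        yield from inorder(node.left)
        yield node.value
        yield from inorder(node.right)


def preorder(node):
    """DFS: self, left, right."""
    if node is not None:
        yield node.value
        yield from preorder(node.left)
        yield from preorder(node.right)


def postorder(node):
    """DFS: left, right, self."""
    if node is not None:
        yield from postorder(node.left)
        yield from postorder(node.right)
        yield node.value
```

The BFS queue holds at most one level of the tree at a time, and the recursive DFS versions use stack space proportional to the tree height, which is where the space bounds above come from.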
- Binary search trees: BSTs
C/C++:
Implement:
insert // insert value into tree
get_node_count // get count of values stored
print_values // prints the values in the tree, from min to max
delete_tree
is_in_tree // returns true if given value exists in the tree
get_height // returns the height in nodes (single node's height is 1)
get_min // returns the minimum value stored in the tree
get_max // returns the maximum value stored in the tree
is_binary_search_tree
delete_value
get_successor // returns next-highest value in tree after given value, -1 if none
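Here is a partial sketch of some of these functions in Python. It assumes integer values, ignores duplicate inserts, and leaves `delete_value` out because it needs more cases than fit in a short example. `BSTNode` is a hypothetical name.

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Insert value and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return BSTNode(value)
    if value < root.value:
        root.left = insert(root.left, value)
    elif value > root.value:
        root.right = insert(root.right, value)
    return root


def is_in_tree(root, value):
    while root is not None:
        if value == root.value:
            return True
        root = root.left if value < root.value else root.right
    return False


def get_height(root):
    # Height in nodes: an empty tree is 0, a single node is 1.
    if root is None:
        return 0
    return 1 + max(get_height(root.left), get_height(root.right))


def get_min(root):
    while root.left is not None:
        root = root.left
    return root.value


def is_binary_search_tree(root, low=None, high=None):
    """Every node must lie strictly between the bounds set by its ancestors."""
    if root is None:
        return True
    if low is not None and root.value <= low:
        return False
    if high is not None and root.value >= high:
        return False
    return (is_binary_search_tree(root.left, low, root.value)
            and is_binary_search_tree(root.right, root.value, high))


def get_successor(root, value):
    """Next-highest value in the tree after the given value, -1 if none."""
    successor = -1
    while root is not None:
        if value < root.value:
            successor = root.value   # candidate; look for a smaller one
            root = root.left
        else:
            root = root.right
    return successor
```

Checking only that each node is larger than its left child and smaller than its right child is a common mistake in `is_binary_search_tree`; the bounds have to come from all ancestors.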
- Heap / Priority Queue / Binary Heap
visualized as a tree, but is usually linear in storage (array, linked list)
Implement a max-heap:
insert
sift_up - needed for insert
get_max - returns the max item, without removing it
get_size() - return number of elements stored
is_empty() - returns true if heap contains no elements
extract_max - returns the max item, removing it
sift_down - needed for extract_max
remove(x) - removes item at index x
heapify - create a heap from an array of elements, needed for heap_sort
heap_sort() - take an unsorted array and turn it into a sorted array in-place using a max heap or min heap
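As a sketch of how these pieces fit together, here is an array-backed max-heap in Python. I made `sift_down` a module-level function (my own design choice) so that `heapify` and `heap_sort` can reuse it on a plain list; `remove(x)` is left as an exercise.

```python
class MaxHeap:
    """Array-backed binary max-heap (illustrative sketch)."""

    def __init__(self):
        self._data = []

    def get_size(self):
        return len(self._data)

    def is_empty(self):
        return not self._data

    def get_max(self):
        return self._data[0]

    def insert(self, value):
        self._data.append(value)
        self._sift_up(len(self._data) - 1)

    def extract_max(self):
        data = self._data
        if not data:
            raise IndexError("extract from empty heap")
        data[0], data[-1] = data[-1], data[0]
        top = data.pop()
        sift_down(data, 0, len(data))
        return top

    def _sift_up(self, i):
        data = self._data
        while i > 0:
            parent = (i - 1) // 2
            if data[parent] >= data[i]:
                break
            data[parent], data[i] = data[i], data[parent]
            i = parent


def sift_down(data, i, size):
    """Push data[i] down until both children within data[:size] are smaller."""
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < size and data[left] > data[largest]:
            largest = left
        if right < size and data[right] > data[largest]:
            largest = right
        if largest == i:
            return
        data[i], data[largest] = data[largest], data[i]
        i = largest


def heapify(data):
    """Turn a list into a max-heap in place, starting from the last parent."""
    for i in range(len(data) // 2 - 1, -1, -1):
        sift_down(data, i, len(data))


def heap_sort(data):
    """Sort in place: repeatedly move the max to the end of the shrinking heap."""
    heapify(data)
    for end in range(len(data) - 1, 0, -1):
        data[0], data[end] = data[end], data[0]
        sift_down(data, 0, end)
```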
Sorting
Notes:
Implement sorts & know best case/worst case, average complexity of each:
no bubble sort - it's terrible - O(n^2), except when n <= 16
Stability in sorting algorithms ("Is Quicksort stable?")
Which algorithms can be used on linked lists? Which on arrays? Which on both?
I wouldn't recommend sorting a linked list, but merge sort is doable.
For heapsort, see Heap data structure above. Heap sort is great, but not stable
Merge sort code:
Quick sort code:
Implement:
Mergesort: O(n log n) average and worst case
Quicksort: O(n log n) average case, O(n^2) worst case
Selection sort and insertion sort are both O(n^2) average and worst case
For heapsort, see Heap data structure above
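Here is a minimal sketch of merge sort and quicksort in Python. The quicksort uses the last element as pivot (Lomuto partitioning), which is just one of several common schemes and hits the O(n^2) worst case on already sorted input.

```python
def merge_sort(items):
    """Return a new sorted list; stable because ties take from the left half."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def quicksort(items, lo=0, hi=None):
    """Sort items[lo..hi] in place using the last element as the pivot."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:
        return
    pivot = items[hi]
    boundary = lo  # everything left of boundary is smaller than the pivot
    for k in range(lo, hi):
        if items[k] < pivot:
            items[k], items[boundary] = items[boundary], items[k]
            boundary += 1
    items[boundary], items[hi] = items[hi], items[boundary]
    quicksort(items, lo, boundary - 1)
    quicksort(items, boundary + 1, hi)
```

Note the trade-off: this merge sort allocates new lists, while this quicksort works in place but is not stable.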
Not required, but I recommend them:
As a summary, here is a visual representation of 15 sorting algorithms. If you need more detail on this subject, see "Sorting" section in Additional Detail on Some Subjects
Graphs
Graphs can be used to represent many problems in computer science, so this section is long, like trees and sorting were.
Notes:
There are 4 basic ways to represent a graph in memory:
objects and pointers
adjacency matrix
adjacency list
adjacency map
Familiarize yourself with each representation and its pros & cons
BFS and DFS - know their computational complexity, their trade offs, and how to implement them in real code
When asked a question, look for a graph-based solution first, then move on if none
MIT(videos):
Skiena Lectures - great intro:
Graphs (review and more):
Full Coursera Course:
I'll implement:
DFS with adjacency list (recursive)
DFS with adjacency list (iterative with stack)
DFS with adjacency matrix (recursive)
DFS with adjacency matrix (iterative with stack)
BFS with adjacency list
BFS with adjacency matrix
single-source shortest path (Dijkstra)
minimum spanning tree
DFS-based algorithms (see Aduni videos above):
check for cycle (needed for topological sort, since we'll check for cycle before starting)
topological sort
count connected components in a graph
list strongly connected components
check for bipartite graph
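To give a feel for the adjacency-list versions, here is a sketch where the graph is a plain dict mapping each vertex to a list of neighbors. That representation and the function names are my assumptions for illustration.

```python
from collections import deque


def bfs(graph, start):
    """Breadth-first order of the vertices reachable from start."""
    order, seen, queue = [start], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


def dfs_recursive(graph, node, seen=None):
    """Depth-first order, using the call stack."""
    if seen is None:
        seen = set()
    seen.add(node)
    order = [node]
    for neighbor in graph[node]:
        if neighbor not in seen:
            order.extend(dfs_recursive(graph, neighbor, seen))
    return order


def dfs_iterative(graph, start):
    """Depth-first order, using an explicit stack."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so neighbors are visited in listed order.
        for neighbor in reversed(graph[node]):
            if neighbor not in seen:
                stack.append(neighbor)
    return order


def count_components(graph):
    """Count connected components of an undirected graph."""
    seen, count = set(), 0
    for node in graph:
        if node not in seen:
            count += 1
            seen.update(bfs(graph, node))
    return count
```

With an adjacency list, both traversals run in O(V + E); with an adjacency matrix, scanning each row makes them O(V^2).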
Even More Knowledge
- Recursion
Stanford lectures on recursion & backtracking:
When is it appropriate to use it?
How is tail recursion better than not?
- Dynamic Programming
You probably won't see any dynamic programming problems in your interview, but it's worth being able to recognize a problem as being a candidate for dynamic programming.
This subject can be pretty difficult, as each DP-soluble problem must be defined as a recurrence relation, and coming up with it can be tricky.
I suggest looking at many examples of DP problems until you have a solid understanding of the pattern involved.
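As one example of the pattern, take the classic "fewest coins to make an amount" problem (my choice of example, not one the plan names). The recurrence is best(0) = 0 and best(a) = 1 + min(best(a - c)) over every coin c <= a. Here it is solved bottom-up with a table and top-down with memoization; returning -1 for an unreachable amount is my own convention.

```python
from functools import lru_cache


def min_coins(coins, amount):
    """Bottom-up: fill a table of answers for every amount up to the target."""
    unreachable = float("inf")
    best = [0] + [unreachable] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return best[amount] if best[amount] != unreachable else -1


def min_coins_memo(coins, amount):
    """Top-down: the same recurrence, with results cached per amount."""
    @lru_cache(maxsize=None)
    def best(a):
        if a == 0:
            return 0
        options = [best(a - c) for c in coins if c <= a]
        options = [o for o in options if o >= 0]  # drop unreachable amounts
        return 1 + min(options) if options else -1

    return best(amount)
```

A greedy approach (always take the largest coin) fails for coins like 1, 3, 4 and amount 6, which is a good sign that a problem wants DP.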
Videos:
List of individual DP problems (each is short): Dynamic Programming (video)
Yale Lecture notes:
- Design patterns
Learn these patterns:
strategy
singleton
adapter
prototype
decorator
visitor
factory, abstract factory
facade
observer
proxy
delegate
command
state
memento
iterator
composite
flyweight
Book: Head First Design Patterns
I know the canonical book is "Design Patterns: Elements of Reusable Object-Oriented Software", but Head First is great for beginners to OO.
- Combinatorics (n choose k) & Probability
Khan Academy:
Course layout:
Just the videos - 41 (each is simple and short):
- NP, NP-Complete and Approximation Algorithms
Know about the most famous classes of NP-complete problems, such as traveling salesman and the knapsack problem, and be able to recognize them when an interviewer asks you about them in disguise.
Know what NP-complete means.
Peter Norvig discusses near-optimal solutions to traveling salesman problem:
Pages 1048 - 1140 in CLRS if you have it.
- Processes and Threads
Computer Science 162 - Operating Systems (25 videos):
for processes and threads see videos 1-11
Covers:
Processes, Threads, Concurrency issues
Difference between processes and threads
Processes
Threads
Locks
Mutexes
Semaphores
Monitors
How do they work?
Deadlock
Livelock
CPU activity, interrupts, context switching
Modern concurrency constructs with multicore processors
Process resource needs (memory: code, static storage, stack, heap, and also file descriptors, i/o)
Thread resource needs (shares the above (minus stack) with other threads in the same process, but each has its own program counter, stack pointer, registers, and stack)
Forking is really copy-on-write: parent and child share memory pages read-only until one of them writes to a page, and only then is that page copied.
Context switching
How is context switching initiated by the operating system and underlying hardware?
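To make the locks item concrete, here is a small sketch using Python's standard `threading` module. Several threads increment a shared counter, and the lock makes each read-modify-write step atomic with respect to the other threads. The function name and parameters are hypothetical.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Without the lock, two threads could read the same old value
            # and one of the increments would be lost.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Threads share the counter because they live in the same process; separate processes would each get their own copy and need some form of inter-process communication instead.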
- Testing
To cover:
how unit testing works
what are mock objects
what is integration testing
what is dependency injection
Dependency injection:
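Here is a small sketch tying these ideas together, using `unittest.mock.Mock` from the Python standard library. `SignupService` and its `mailer` collaborator are made-up names for illustration. The service receives its dependency through the constructor (dependency injection), so the test can pass in a mock object instead of something that sends real email.

```python
from unittest.mock import Mock


class SignupService:
    def __init__(self, mailer):
        # The mailer is injected, not created here, so tests can swap it out.
        self._mailer = mailer

    def register(self, email):
        if "@" not in email:
            raise ValueError("invalid email")
        self._mailer.send(email, "Welcome!")
        return True


def test_register_sends_welcome_mail():
    mailer = Mock()  # mock object standing in for the real mailer
    service = SignupService(mailer)
    assert service.register("a@example.com") is True
    mailer.send.assert_called_once_with("a@example.com", "Welcome!")


def test_register_rejects_bad_email():
    mailer = Mock()
    service = SignupService(mailer)
    try:
        service.register("not-an-email")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    mailer.send.assert_not_called()
```

These are unit tests because they exercise one class in isolation. An integration test would wire `SignupService` to a real mailer (or a test mail server) and check that the pieces work together.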
- String searching & manipulations
If you need more detail on this subject, see "String Matching" section in Additional Detail on Some Subjects.
- Tries
Note there are different kinds of tries. Some have prefixes, some don't, and some use string instead of bits to track the path
I read through code, but will not implement
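If you do want to see the idea in code, here is a minimal character-based trie sketch using nested dicts. The `_END` sentinel marking the end of a word is my own device, and this is only one of the several kinds of tries mentioned above.

```python
class Trie:
    """Character trie built from nested dicts (illustrative sketch)."""

    _END = object()  # sentinel key: a complete word ends at this node

    def __init__(self):
        self._root = {}

    def insert(self, word):
        node = self._root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def _walk(self, prefix):
        # Follow the path for prefix; None means the path does not exist.
        node = self._root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and self._END in node

    def starts_with(self, prefix):
        return self._walk(prefix) is not None
```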
Short course videos:
- Floating Point Numbers
- Endianness
Big And Little Endian Inside/Out (video)
Very technical talk for kernel devs. Don't worry if most is over your head.
The first half is enough.
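A quick way to see the difference is to lay out the same 32-bit value in both byte orders, using Python's built-in `int.to_bytes` and `int.from_bytes`. The value `0x12345678` is arbitrary.

```python
import sys

value = 0x12345678

# Big endian stores the most significant byte first.
big = value.to_bytes(4, byteorder="big")

# Little endian stores the least significant byte first.
little = value.to_bytes(4, byteorder="little")

print(big.hex())      # 12345678
print(little.hex())   # 78563412

# Reading the bytes back with the matching byte order recovers the value.
assert int.from_bytes(big, byteorder="big") == value
assert int.from_bytes(little, byteorder="little") == value

# The byte order of the machine running this script: "little" or "big".
print(sys.byteorder)
```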
- Networking
If you have networking experience or want to be a reliability engineer or operations engineer, expect questions
Otherwise, this is just good to know
Final Review
This section will have shorter videos that you can watch pretty quickly to review most of the important concepts.
It's nice if you want a refresher often.
Series of short 2-3 minute subject videos (23 videos)
Series of short 2-5 minute subject videos - Michael Sambol (18 videos):
Update Your Resume
See Resume prep information in the books: "Cracking The Coding Interview" and "Programming Interviews Exposed"
I don't know how important this is (you can do your own research) but here is an article on making your resume ATS Compliant:
Note by the author: "This is for a US-focused resume. CVs for India and other countries have different expectations, although many of the points will be the same."
"Step-by-step resume guide" by Tech Interview Handbook
Detailed guide on how to set up your resume from scratch, write effective resume content, optimize it, and test your resume
Find a Job
Interview Process & General Interview Prep
How to Get a Job at the Big 4:
Cracking The Coding Interview Set 1:
Cracking the Facebook Coding Interview:
Prep Courses:
Software Engineer Interview Unleashed (paid course):
Learn how to make yourself ready for software engineer interviews from a former Google interviewer.
Python for Data Structures, Algorithms, and Interviews (paid course):
A Python centric interview prep course which covers data structures, algorithms, mock interviews and much more.
Intro to Data Structures and Algorithms using Python (Udacity free course):
A free Python centric data structures and algorithms course.
Data Structures and Algorithms Nanodegree! (Udacity paid Nanodegree):
Get hands-on practice with over 100 data structures and algorithm exercises and guidance from a dedicated mentor to help prepare you for interviews and on-the-job scenarios.
Grokking the Behavioral Interview (Educative free course):
Many times, it’s not your technical competency that holds you back from landing your dream job, it’s how you perform on the behavioral interview.
Mock Interviews:
Gainlo.co: Mock interviewers from big companies - I used this and it helped me relax for the phone screen and on-site interview
Pramp: Mock interviews from/with peers - peer-to-peer model of practice interviews
interviewing.io: Practice mock interview with senior engineers - anonymous algorithmic/systems design interviews with senior engineers from FAANG
Be thinking of these for when the interview comes
Think of about 20 interview questions you'll get, along the lines of the items below. Have at least one answer for each. Have a story, not just data, about something you accomplished.
Why do you want this job?
What's a tough problem you've solved?
Biggest challenges faced?
Best/worst designs seen?
Ideas for improving an existing product
How do you work best, as an individual and as part of a team?
Which of your skills or experiences would be assets in the role and why?
What did you most enjoy at [job x / project y]?
What was the biggest challenge you faced at [job x / project y]?
What was the hardest bug you faced at [job x / project y]?
What did you learn at [job x / project y]?
What would you have done better at [job x / project y]?
If you find it hard to come up with good answers to these types of interview questions, here are some ideas:
Have questions for the interviewer
Some of mine (I already may know the answers, but want their opinion or team perspective):
How large is your team?
What does your dev cycle look like? Do you do waterfall/sprints/agile?
Are rushes to deadlines common? Or is there flexibility?
How are decisions made in your team?
How many meetings do you have per week?
Do you feel your work environment helps you concentrate?
What are you working on?
What do you like about it?
What is the work life like?
How is the work/life balance?
Once You've Got The Job
Congratulations!
Keep learning.
You're never really done.
*****************************************************************************************************
*****************************************************************************************************
Everything below this point is optional. It is NOT needed for an entry-level interview.
However, by studying these, you'll get greater exposure to more CS concepts, and will be better prepared for
any software engineering job. You'll be a much more well-rounded software engineer.
*****************************************************************************************************
*****************************************************************************************************
Additional Books
These are here so you can dive into a topic you find interesting.
The Unix Programming Environment
An oldie but a goodie
The Linux Command Line: A Complete Introduction
A modern option
A gentle introduction to design patterns
Design Patterns: Elements of Reusable Object-Oriented Software
AKA the "Gang Of Four" book, or GOF
The canonical design patterns book
Algorithm Design Manual (Skiena)
As a review and problem recognition
The algorithm catalog portion is well beyond the scope of difficulty you'll get in an interview
This book has 2 parts:
Class textbook on data structures and algorithms
Pros:
Is a good review as any algorithms textbook would be
Nice stories from his experiences solving problems in industry and academia
Code examples in C
Cons:
Can be as dense or impenetrable as CLRS, and in some cases, CLRS may be a better alternative for some subjects
Chapters 7, 8, 9 can be painful to try to follow, as some items are not explained well or require more brain than I have
Don't get me wrong: I like Skiena, his teaching style, and mannerisms, but I may not be Stony Brook material
Algorithm catalog:
This is the real reason you buy this book.
This book is better as an algorithm reference, and not something you read cover to cover.
Can rent it on Kindle
Answers:
Write Great Code: Volume 1: Understanding the Machine
The book was published in 2004, and is somewhat outdated, but it's a terrific resource for understanding a computer in brief
The author invented HLA, so take mentions and examples in HLA with a grain of salt. Not widely used, but decent examples of what assembly looks like
These chapters are worth the read to give you a nice foundation:
Chapter 2 - Numeric Representation
Chapter 3 - Binary Arithmetic and Bit Operations
Chapter 4 - Floating-Point Representation
Chapter 5 - Character Representation
Chapter 6 - Memory Organization and Access
Chapter 7 - Composite Data Types and Memory Objects
Chapter 9 - CPU Architecture
Chapter 10 - Instruction Set Architecture
Chapter 11 - Memory Architecture and Organization
Important: Reading this book will only have limited value. This book is a great review of algorithms and data structures, but won't teach you how to write good code. You have to be able to code a decent solution efficiently
AKA CLR, sometimes CLRS, because Stein was late to the game
Computer Architecture, Sixth Edition: A Quantitative Approach
For a richer, more up-to-date (2017), but longer treatment
System Design, Scalability, Data Handling
You can expect system design questions if you have 4+ years of experience.
Scalability and System Design are very large subjects with many topics and resources, since there is a lot to consider when designing a software/hardware system that can scale. Expect to spend quite a bit of time on this.
Considerations:
Scalability
Distill large data sets to single values
Transform one data set to another
Handling obscenely large amounts of data
System design
feature sets
interfaces
class hierarchies
designing a system under certain constraints
simplicity and robustness
tradeoffs
performance analysis and optimization
START HERE: The System Design Primer
System Design Interview - There are a lot of resources in this one. Look through the articles and examples. I put some of them below
Consensus Algorithms:
Scalability:
You don't need all of these. Just pick a few that interest you.
Short series:
See "Messaging, Serialization, and Queueing Systems" way below for info on some of the technologies that can glue services together
For even more, see "Mining Massive Datasets" video series in the Video Series section
Practicing the system design process: Here are some ideas to try working through on paper, each with some documentation on how it was handled in the real world:
review: The System Design Primer
flow:
Understand the problem and scope:
Define the use cases, with interviewer's help
Suggest additional features
Remove items that interviewer deems out of scope
Assume high availability is required, add as a use case
Think about constraints:
Ask how many requests per month
Ask how many requests per second (they may volunteer it or make you do the math)
Estimate reads vs. writes percentage
Keep 80/20 rule in mind when estimating
How much data written per second
Total storage required over 5 years
How much data read per second
Abstract design:
Layers (service, data, caching)
Infrastructure: load balancing, messaging
Rough overview of any key algorithm that drives the service
Consider bottlenecks and determine solutions
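Here's a minimal sketch of the constraint math from the flow above. The function name `estimate_load`, the 30-day month, and all of the traffic numbers are made-up assumptions for illustration, not figures from any real system.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000, assuming a 30-day month


def estimate_load(requests_per_month, read_fraction, bytes_per_write, years=5):
    """Back-of-envelope numbers for a hypothetical service."""
    requests_per_second = requests_per_month / SECONDS_PER_MONTH
    reads_per_second = requests_per_second * read_fraction
    writes_per_second = requests_per_second - reads_per_second
    write_bytes_per_second = writes_per_second * bytes_per_write
    # Total storage is everything written over the whole period.
    storage_bytes = write_bytes_per_second * SECONDS_PER_MONTH * 12 * years
    return {
        "requests_per_second": requests_per_second,
        "reads_per_second": reads_per_second,
        "writes_per_second": writes_per_second,
        "write_bytes_per_second": write_bytes_per_second,
        "storage_bytes": storage_bytes,
    }


if __name__ == "__main__":
    # Assumed inputs: 2.592 billion requests/month, 90% reads, 1 KB per write.
    for name, value in estimate_load(2_592_000_000, 0.9, 1000).items():
        print(f"{name}: {value:,.0f}")
```

With those assumed inputs, the math works out to 1,000 requests per second, 100 writes per second, and about 15.6 TB of storage over 5 years. In an interview you would do the same arithmetic by hand, rounding aggressively.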
Additional Learning
I added these topics to help you become a well-rounded software engineer, and to be aware of certain
technologies and algorithms, so you'll have a bigger toolbox.
- Emacs and vi(m)
Familiarize yourself with a Unix-based code editor
emacs:
- Information theory (videos)
More about Markov processes:
See more in MIT 6.050J Information and Entropy series below
- Parity & Hamming Code (videos)
Hamming Code:
- Entropy
Also see videos below
Make sure to watch information theory videos first
- Cryptography
Also see videos below
Make sure to watch information theory videos first
- Compression
Make sure to watch information theory videos first
- Parallel Programming
- Bloom Filter
Given a Bloom filter with m bits and k hashing functions, both insertion and membership testing are O(k)
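To make the O(k) claim concrete, here's a minimal Bloom filter sketch. The class and method names are my own, and deriving the k hash functions by salting SHA-256 with an index is an assumption chosen for simplicity, not the only (or fastest) way to do it.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m = m  # number of bits
        self.k = k  # number of hash functions
        self.bits = bytearray(m)  # one byte per bit, for clarity over compactness

    def _positions(self, item):
        # Simulate k hash functions by salting one hash with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # k hash computations and k bit writes: O(k)
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # k hash computations and up to k bit reads: O(k)
        # False means "definitely not present"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter(m=1024, k=3)
    seen.add("apple")
    print(seen.might_contain("apple"))
```

Note that neither operation depends on how many items have been added. The price is false positives: `might_contain` can return True for an item that was never added, but it never returns False for one that was.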
- Locality-Sensitive Hashing
Used to determine the similarity of documents
The opposite of hashes like MD5 or SHA, which are used to determine whether 2 documents/strings are exactly the same
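Here's a minimal sketch of the idea using MinHash, one common scheme for estimating document similarity. The function names, the 3-word shingles, and the 64 salted SHA-256 hashes are all assumptions for illustration. This shows only the signature step; a full LSH index would also group signatures into bands so similar documents land in the same bucket, which is skipped here.

```python
import hashlib


def shingles(text, size=3):
    """Break text into a set of overlapping word groups."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(max(1, len(words) - size + 1))}


def _hash(seed, item):
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=64):
    """For each salted hash function, keep the smallest hash over all items."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching slots, which approximates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog"
    doc_b = "the quick brown fox leaps over the lazy dog"
    sig_a = minhash_signature(shingles(doc_a))
    sig_b = minhash_signature(shingles(doc_b))
    print(estimated_similarity(sig_a, sig_b))
```

Unlike MD5 or SHA on the whole document, where changing one word changes everything, similar documents here end up with signatures that agree in many slots.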
- van Emde Boas Trees
- Augmented Data Structures
- Balanced search trees
Know at least one type of balanced binary tree (and know how it's implemented):
"Among balanced search trees, AVL and 2/3 trees are now passé, and red-black trees seem to be more popular. A particularly interesting self-organizing data structure is the splay tree, which uses rotations to move any accessed key to the root." - Skiena
Of these, I chose to implement a splay tree. From what I've read, you won't implement a balanced search tree in your interview. But I wanted exposure to coding one up and let's face it, splay trees are the bee's knees. I did read a lot of red-black tree code
Splay tree: insert, search, delete functions
If you end up implementing a red/black tree, try just these:
Search and insertion functions, skipping delete
I want to learn more about B-Tree since it's used so widely with very large data sets
AVL trees
In practice: From what I can tell, these aren't used much in practice, but I could see where they would be: The AVL tree is another structure supporting O(log n) search, insertion, and removal. It is more rigidly balanced than red–black trees, leading to slower insertion and removal but faster retrieval. This makes it attractive for data structures that may be built once and loaded without reconstruction, such as language dictionaries (or program dictionaries, such as the opcodes of an assembler or interpreter)
Splay trees
In practice: Splay trees are typically used in the implementation of caches, memory allocators, routers, garbage collectors, data compression, ropes (replacement of string used for long text strings), in Windows NT (in the virtual memory, networking and file system code) etc
MIT Lecture: Splay Trees:
Gets very mathy, but watch the last 10 minutes for sure.
Red/black trees
These are a translation of a 2-3 tree (see below).
In practice: Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures which provide worst-case guarantees; for example, many data structures used in computational geometry can be based on red–black trees, and the Completely Fair Scheduler used in current Linux kernels uses red–black trees. In Java 8, HashMap was modified so that a bucket with many colliding keys (for example, due to poor hash codes) is stored as a red–black tree instead of a linked list
2-3 search trees
In practice: 2-3 trees have faster inserts at the expense of slower searches (since the height is greater compared to AVL trees).
You would use a 2-3 tree very rarely because its implementation involves different types of nodes. Instead, people use red-black trees.
2-3-4 Trees (aka 2-4 trees)
In practice: For every 2-4 tree, there are corresponding red–black trees with data elements in the same order. The insertion and deletion operations on 2-4 trees are also equivalent to color-flipping and rotations in red–black trees. This makes 2-4 trees an important tool for understanding the logic behind red–black trees, and this is why many introductory algorithm texts introduce 2-4 trees just before red–black trees, even though 2-4 trees are not often used in practice.
N-ary (K-ary, M-ary) trees
note: the N or K is the branching factor (max branches)
a binary tree is a 2-ary tree, with branching factor = 2
2-3 trees are 3-ary
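Here's a minimal sketch of what the branching factor means in code. The class name `NaryNode` and the `max_children` parameter are my own names for illustration.

```python
class NaryNode:
    def __init__(self, value, max_children):
        self.value = value
        self.max_children = max_children  # the N (or K) in N-ary: the branching factor
        self.children = []

    def add_child(self, value):
        if len(self.children) >= self.max_children:
            raise ValueError(f"node already has {self.max_children} children")
        child = NaryNode(value, self.max_children)
        self.children.append(child)
        return child


def height(node):
    """Number of edges on the longest path from node down to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


if __name__ == "__main__":
    root = NaryNode("root", max_children=3)  # a 3-ary tree
    first = root.add_child("a")
    root.add_child("b")
    root.add_child("c")
    first.add_child("a1")
    print(height(root))
```

Setting `max_children=2` gives you the shape of a binary tree, just with a list of children instead of `left` and `right` fields.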
B-Trees
Fun fact: it's a mystery, but the B could stand for Boeing, Balanced, or Bayer (co-inventor).
In practice: B-Trees are widely used in databases, and most modern filesystems use B-Trees (or variants). In filesystems, the B-Tree allows quick random access to an arbitrary block in a particular file. The basic problem is turning the file block i address into a disk block (or perhaps a cylinder-head-sector) address
MIT 6.851 - Memory Hierarchy Models (video) - covers cache-oblivious B-Trees, very interesting data structures - the first 37 minutes are very technical, may be skipped (B is block size, cache line size)
- k-D Trees
Great for finding the number of points in a rectangle or higher-dimensional object
A good fit for k-nearest neighbors
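Here's a minimal k-d tree sketch showing a build and a single nearest-neighbor lookup (the k = 1 case of k-nearest neighbors). The nested-tuple node layout and the function names are my own assumptions, and re-sorting at every level keeps the code short at the cost of build speed.

```python
def build_kdtree(points, depth=0):
    """Build a k-d tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the dimensions by depth
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2  # the median point becomes this node
    return (
        points[mid],
        build_kdtree(points[:mid], depth + 1),
        build_kdtree(points[mid + 1:], depth + 1),
    )


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    point, left, right = node
    if best is None or squared_distance(point, target) < squared_distance(best, target):
        best = point
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Only search the far side if the splitting plane is closer than the best match.
    if diff ** 2 < squared_distance(best, target):
        best = nearest(far, target, depth + 1, best)
    return best


if __name__ == "__main__":
    tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
    print(nearest(tree, (9, 2)))
```

The pruning check before descending into the far side is what lets a lookup skip large parts of the tree instead of comparing against every point.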
- Skip lists
"These are somewhat of a cult data structure" - Skiena
- Disjoint Sets & Union Find
- Treap
Combination of a binary search tree and a heap
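Here's a minimal sketch of how the two ideas combine: keys follow binary search tree order, while random priorities follow max-heap order and are restored with rotations after each insert. The names are my own, and using a max-heap on `random.random()` priorities is one common choice, not the only one.

```python
import random


class TreapNode:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()  # random priority keeps the tree balanced on average
        self.left = None
        self.right = None


def rotate_right(node):
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot


def rotate_left(node):
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot


def insert(node, key):
    """Insert key as in a BST, then rotate up while the heap order is violated."""
    if node is None:
        return TreapNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    elif key > node.key:
        node.right = insert(node.right, key)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node  # duplicate keys are ignored


def in_order(node):
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)


if __name__ == "__main__":
    root = None
    for key in [5, 2, 8, 1, 9, 3, 7]:
        root = insert(root, key)
    print(in_order(root))
```

Rotations preserve the in-order sequence of keys, so the tree stays a valid search tree while the priorities settle into heap order.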
- Machine Learning
Courses:
Great starter course: Machine Learning - videos only - see videos 12-18 for a review of linear algebra (14 and 15 are duplicates)
Additional Detail on Some Subjects
I added these to reinforce some ideas already presented above, but didn't want to include them
above because it's just too much. It's easy to overdo it on a subject.
You want to get hired in this century, right?
SOLID
I - Interface segregation principle | clients should not be forced to implement interfaces they don't use
D - Dependency inversion principle | Reduce the dependency in composition of objects.
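Here's a minimal sketch of dependency inversion. All class names are hypothetical: the high-level `Notifier` depends on a small abstraction (`MessageSender`) rather than on a concrete sender, so the concrete piece can be swapped without touching it.

```python
from abc import ABC, abstractmethod


class MessageSender(ABC):
    """The abstraction that both sides depend on."""

    @abstractmethod
    def send(self, text):
        ...


class InMemorySender(MessageSender):
    """A stand-in implementation; a real one might send email or SMS."""

    def __init__(self):
        self.sent = []

    def send(self, text):
        self.sent.append(text)


class Notifier:
    """High-level code: knows only about MessageSender, not a concrete class."""

    def __init__(self, sender):
        self.sender = sender

    def notify(self, user, event):
        self.sender.send(f"{user}: {event}")


if __name__ == "__main__":
    sender = InMemorySender()
    Notifier(sender).notify("ada", "build passed")
    print(sender.sent)
```

Keeping `MessageSender` down to a single method also fits the interface segregation idea: an implementer is never forced to provide methods it doesn't use.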
More Dynamic Programming (videos)
MIT Probability (mathy, and goes slowly, which is good for mathy things) (videos):
String Matching
Knuth-Morris-Pratt (KMP):
Boyer–Moore string search algorithm
Coursera: Algorithms on Strings
starts off great, but by the time it gets past KMP it gets more complicated than it needs to be
nice explanation of tries
can be skipped
Sorting
Stanford lectures on sorting:
Steven Skiena lectures on sorting:
Video Series
Sit back and enjoy.
Computer Science Courses
Algorithms implementation
Papers
2003: The Google File System
replaced by Colossus in 2012
2004: MapReduce: Simplified Data Processing on Large Clusters
mostly replaced by Cloud Dataflow?
2007: Dynamo: Amazon’s Highly Available Key-value Store
The Dynamo paper kicked off the NoSQL revolution
More papers: 1,000 papers