How to choose and use appropriate algorithms and data structures to help solve data science problems. Key concepts such as algorithmic complexity and recursion.
© 2022 Mike Gelbart & Arman Seyed-Ahmadi
Software licensed under the MIT License, non-software content licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information.
Find Panopto lecture recordings here.
# | Topic | Optional supplementary material |
---|---|---|
1 | Time complexity, big O, space complexity [lecture, activity] | Introduction to Big O notation, Runestone Chapter 3 |
2 | Searching, sorting, hash tables [lecture, activity] | Binary search video, watch until 31:00 (9 min), notes on Profiling and Timing Code, Runestone Chapter 6.1-6.8 |
3 | Recursive algorithms [lecture, activity] | Runestone Chapter 5 |
4 | Recursive data structures [lecture, activity] | Runestone Chapter 7 |
5 | Graphs, graph searches [lecture, activity] | NetworkX tutorial, AIspace graph searches, Graph traversals video; brilliant.org on Graph Theory, Traversals, and BSTs, Runestone Chapter 8 |
6 | Sparse matrices [lecture, activity] | |
7 | Discrete optimization & scheduling [lecture, activity] | Linear programming and discrete optimization with Python using PuLP |
8 | Dynamic programming, code vectorization [lecture, no activity] | Dynamic programming |
There will be one lab assignment per week. We will follow the standard MDS lab deadlines.
Quizzes will be open book, meaning you may consult course materials, online sources, Python, etc. However, communication with other people during the quiz is strictly forbidden. See the MDS quiz instructions here. For the dates/times of the quizzes, see the MDS calendar.
The course's conda environment file is here. To create the environment: conda env create -f dsci512env.yml
(you only need to do this once). To activate the environment: conda activate dsci512env
(you need this for each new terminal window). In short, run conda install -c conda-forge nb_conda_kernels
in your base environment (this only has to be done once across all courses, so you may have done it already), then launch Jupyter Lab from your base environment, and finally select the dsci512env
kernel from within Jupyter.
By the end of the course, students are expected to be able to:
- Analyze the scalability and trade-offs of various basic algorithms and data structures using Big-O notation.
- Read and interpret recursive functions.
- Select appropriate data structures, such as graphs, given a data set.
- Map certain real-world problems to discrete optimization problems.
- Diagnose rate-limiting aspects of slow Python code and enumerate various options for speeding it up.
- Problem Solving with Algorithms and Data Structures using Python
- Visualizing Algorithms
- 500 Data Structures and Algorithms practice problems and their solutions
- Recursion practice problems
- P vs. NP and the Computational Complexity Zoo (video)
- Goodrich, Tamassia, Goldwasser. Data Structures and Algorithms in Python.
- Sanjoy Dasgupta, Christos Papadimitriou and Umesh Vazirani, Algorithms, available here.
The content of this course was mainly developed by Mike Gelbart, with minor modifications and updates made by Arman Seyed-Ahmadi during the 2022 offering of the course.