Faster pandas


Free Online Course: Faster pandas, provided by LinkedIn Learning, is an online course with 1-2 hours of material. The course is taught in English by Miki Tebeka and is free of charge. Upon completion, you can receive an e-certificate from LinkedIn Learning.

Overview
  • Learn how to make your pandas code quicker and more efficient. This course covers vectorization, common mistakes, pandas performance, saving memory, Numba, Cython, and more.
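The course's own code is not reproduced here. The following minimal sketch is hypothetical and is not taken from the course: the DataFrame, its `price` and `quantity` columns, and both helper functions are invented for illustration. It contrasts a row-by-row `iterrows` loop with a single column-wise (vectorized) operation, and adds a boolean-indexing filter.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical example data: 10,000 rows of prices and quantities.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "price": rng.uniform(1.0, 100.0, size=10_000),
    "quantity": rng.integers(1, 10, size=10_000),
})


def total_loop(frame: pd.DataFrame) -> pd.Series:
    """Row-by-row version: one Python-level iteration per row (slow)."""
    totals = []
    for _, row in frame.iterrows():
        totals.append(row["price"] * row["quantity"])
    return pd.Series(totals, index=frame.index)


def total_vectorized(frame: pd.DataFrame) -> pd.Series:
    """Vectorized version: a single multiplication over whole columns."""
    return frame["price"] * frame["quantity"]


# Boolean indexing: build a True/False mask, then select the matching rows.
expensive = df[df["price"] > 90.0]

if __name__ == "__main__":
    # Timings depend on the machine; the vectorized version is typically much faster.
    print("loop:      ", timeit.timeit(lambda: total_loop(df), number=1))
    print("vectorized:", timeit.timeit(lambda: total_vectorized(df), number=1))
```

Both functions return the same values. The vectorized version, however, hands the whole multiplication to NumPy in a single call instead of creating a Series for every row.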

Syllabus
  Introduction
    • pandas and performance
    • What you should know
    • Working with the files on GitHub
  1. Overview
    • Why performance matters
    • Setting goals
    • Measuring performance
    • Profiling
    • Challenge: Identify bottleneck
    • Solution: Identify bottleneck
  2. Vectorization
    • What is vectorization?
    • Boolean indexing
    • Understanding ufuncs
    • Challenge: Selecting and manipulating data
    • Solution: Selecting and manipulating data
  3. Common Mistakes
    • The limitations of appending
    • The limitations of object dtype
    • The limitations of row iteration
    • Understanding the isin function
    • Parsing time once
    • Challenge: Query a DataFrame
    • Solution: Query a DataFrame
  4. pandas Performance
    • Using built-in functions
    • Understanding eval and query
    • Understanding the join function
    • Challenge: Join and query
    • Solution: Join and query
  5. Saving Memory
    • Why is memory important?
    • Measuring memory
    • Loading parts of data
    • Categorical data
    • Challenge: Reducing memory
    • Solution: Reducing memory
  6. Fast Serialization
    • Various formats and why not CSV
    • Optimizing with SQL
    • Optimizing with HDF5
    • Challenge: Bike ride duration
    • Solution: Bike ride duration
  7. Numba and Cython
    • What is Numba?
    • Using Numba
    • What's Cython?
    • Writing Cython code
    • Compiling Cython
    • %%cython magic
    • Challenge: Cython speedup
    • Solution: Cython speedup
  8. Alternative DataFrames
    • Overview of alternative DataFrames
    • Using Dask
    • Using Vaex
    • Challenge: Vaex vs. pandas
    • Solution: Vaex vs. pandas
  Conclusion
    • Next steps
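The "Loading parts of data" and "Categorical data" lessons listed in the syllabus deal with memory use. The following minimal sketch is not taken from the course. It assumes a small hypothetical rides CSV built in memory (the `ride_id`, `city`, and `duration` columns are invented). The sketch loads only the needed columns with `usecols`, stores the low-cardinality `city` column with the `category` dtype, and compares the result with `memory_usage(deep=True)`.

```python
import io

import pandas as pd

# Hypothetical CSV data; in practice this would be a path to a real file.
cities = ["London", "Paris", "Tokyo", "New York"] * 25_000
csv_text = "ride_id,city,duration\n" + "".join(
    f"{i},{city},{i % 60}\n" for i, city in enumerate(cities)
)

# Baseline: load every column and keep "city" as Python strings (object dtype).
full = pd.read_csv(io.StringIO(csv_text), dtype={"city": "object"})

# Leaner: load only the needed columns, with compact dtypes.
slim = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["city", "duration"],  # skip the ride_id column entirely
    # category stores each distinct city name once, plus small integer codes
    dtype={"city": "category", "duration": "int16"},
)

# deep=True also counts the memory held by the Python string objects.
full_bytes = full.memory_usage(deep=True).sum()
slim_bytes = slim.memory_usage(deep=True).sum()

if __name__ == "__main__":
    print(f"all columns, object strings:      {full_bytes:,} bytes")
    print(f"selected columns, compact dtypes: {slim_bytes:,} bytes")
```

Passing `dtype` to `read_csv` applies the compact types while the file is parsed, so the larger representation is never kept in the resulting DataFrame.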