Debugging skills beyond the IDEs of C++, Java, Python, Rust, JavaScript, and Go, and beyond gdb and windbg. Applicable to distributed systems.
What you'll learn:
- Thinking from first principles when dealing with complex systems. Reducing indicators down to the root cause instead of implementing workarounds.
- Methods to isolate the root cause without rushing to attach a debugger. Building observable systems and dealing with legacy code with minimal intrusion.
- Debugging as a practical, deduction-based process. Not getting distracted by too much irrelevant information or by customer pressure to reach a resolution.
- Factors to consider in production system analysis. Especially applicable to niche problems such as those found in distributed systems.
- Examples of production firefighting that highlight the need for open-minded thinking and knowledge across domains, from embedded systems to distributed systems.
- Understanding systems as permutations of components. Applicable across all software, from embedded to databases to distributed systems, written in any language.
- Factors to consider when dealing with systems whose components are owned by different teams.
- Different engineering communication channels that dictate the scope of debugging.
Debugging is more than just attaching a debugger to a running program. Identifying the correct root cause is a skill. In addition, the complexity of distributed systems and multi-language stacks makes debugging even harder.
The key to a long career in software is the ability to build large systems. Large-scale systems cannot be created on a single machine using a single programming language. Hence one has to evolve into a generalist engineer to lead such efforts. Irrespective of role, understanding complexity and the ability to navigate it during production firefighting is a growth accelerator in the industry.
The course takes a generic view of workflows leading to frequently occurring debugging problems in large systems. Intentionally no tool details or deep dives are included. Instead, a guidance framework is provided for the students to explore further in their day job or software projects.
Distributed systems are a challenge to debug. Not understanding a problem can easily lead one to debug the wrong services and stacks. It is always a communication problem first. Focus on building a generic debugging mental framework instead of becoming dependent on tools like windbg and gdb as the primary response to any situation.
This course is for people who consider themselves problem solvers ahead of their designation and qualifications. For example, if you believe only developers should debug or only support should talk to customers, then this course is incompatible with your ideas.
Each section presents the debugging considerations for a particular scenario.
Questions the course addresses
How does the domain play a role in debugging?
What considerations do software engineers make while debugging?
How to deal with stressful production debugging scenarios?
How to approach SRE?
Can all scenarios be automated for ease of debugging?
How do monitoring and observability help in debugging?
Can bugs be identified at design time itself?
What makes distributed systems hard to debug?
Why is debugging a generic skill, unlike programming language expertise?
Are there software techniques beyond windbg and gdb?
How is debugging different from reverse engineering?
Course Updates:
[Sept 2022]: Course with rich details and elaborate real-life examples
[Oct 2022]: Subtitles fixed
Overview