Python Course Part 9: Quality Control — Testing & Type Hints

There's a line every project eventually crosses — the one that separates a script (works on my machine, today, if I squint) from software (works reliably, survives refactors, can be changed without fear). Two habits get you across that line: type hints and automated tests.
In Part 8 we wrote a script that hits a live API. It runs, but how would you know if a refactor silently broke the star-counting logic? You'd have to run it, eyeball the output, and hope. That doesn't scale. Today we replace hope with checks that run automatically.
We'll add type hints, run mypy to catch type mistakes without executing anything, and write real pytest tests. The key technique — and the thing that makes a network-touching script testable at all — is separating pure logic from side effects.
By the end you'll see this:
$ mypy repo_stats.py
Success: no issues found in 1 source file
$ pytest -q
....... [100%]
7 passed in 0.03s
1. Type Hints
Python is dynamically typed: you never have to declare types. But since version 3.5 you can optionally annotate function parameters and return values, and since 3.6 variables as well. These annotations are called type hints:
def greet(name: str) -> str:
    return f"Hello, {name}"

attempts: int = 0
ratio: float = 0.85
Read name: str as "name is expected to be a string," and -> str as "this function returns a string." For collections and optional values, the modern syntax (Python 3.10+) is clean:
def total(numbers: list[int]) -> int:
    return sum(numbers)

# str | None means "a string, or None"; the value might be missing:
def find_language(repo: dict) -> str | None:
    return repo.get("language")
Here's the crucial part, and the thing that surprises people from compiled languages:
🛑 Dev Callout: Gradual Typing — Hints Don't Run
In C# or Java, types are enforced by the compiler; code that violates them won't build. Python's hints are not enforced at runtime: the interpreter records them but never checks them. This code runs without complaint:
def total(numbers: list[int]) -> int:
    return sum(numbers)

total([1.5, 2.5])  # Python runs this and returns 4.0; the hint is just a note
(A call like total("not a list at all") does fail at runtime, but only because sum() cannot add a string to 0, not because the hint was checked.)
This is gradual typing: you add hints where they help and leave them off where they don't, mixing typed and untyped code freely. The hints become useful in two ways: your editor uses them for autocomplete and inline warnings, and a separate tool called mypy reads them to catch mismatches before you run the program. Think of hints as checkable documentation: comments that a tool can verify never went stale.
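To see for yourself that hints are recorded but never checked, a minimal sketch along these lines can help. It reuses the total function from above; the float arguments are just an illustrative way to violate the hint.

```python
def total(numbers: list[int]) -> int:
    return sum(numbers)

# The hints are stored on the function as ordinary metadata...
print(total.__annotations__)

# ...but nothing checks them when the function is called.
result = total([1.5, 2.5])  # floats, despite the list[int] hint
print(result)  # 4.0
```

The call succeeds because sum() is happy to add floats. Only a tool that reads the annotations, such as your editor or mypy, would notice the mismatch.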
2. Catching Bugs with mypy
mypy is a static type checker. It reads your hints and flags code that contradicts them — no execution required. Install it into your virtual environment (Part 8):
pip install mypy
Point it at a file:
mypy repo_stats.py
If you wrote total("not a list"), mypy would tell you before you ever ran it:
error: Argument 1 to "total" has incompatible type "str"; expected "list[int]"
That's a whole class of bug — the "I passed the wrong thing" bug — caught at your desk instead of in production.
3. Designing for Testability: Pure Functions
Here's the problem with testing the Part 8 script directly: fetch_repos() calls the live GitHub API. A test that depends on the network is slow, flaky, and fails when GitHub has a bad day — that's testing GitHub, not your code.
The fix is a design principle: separate the part that talks to the world from the part that thinks. The function that does I/O (the HTTP call) should be thin. The functions that compute things should be pure — same input, same output, no network, no files, no surprises. Pure functions are trivial to test, because you just hand them data and check what comes back. (This is also why Part 4 pushed return over print — a function that returns its result is one you can assert on.)
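As a small illustration of the return-over-print point, here is a sketch contrasting the two styles. Both function names are made up for this example and are not part of the Part 8 script.

```python
def print_total(stars: list[int]) -> None:
    """Impure: the result goes to the terminal and nothing comes back."""
    print(f"Total stars: {sum(stars)}")


def format_total(stars: list[int]) -> str:
    """Pure: same input, same output, and the caller receives the result."""
    return f"Total stars: {sum(stars)}"


# The pure version can be checked with a plain assert.
assert format_total([10, 5]) == "Total stars: 15"

# The printing version returns None, so there is nothing to assert on
# unless the test also captures terminal output.
assert print_total([10, 5]) is None
```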
Let's refactor the repo logic into a clean, fully-typed, pure module. Note how we reuse the @dataclass from Part 7 to give our data a real type.
Create repo_stats.py:
from dataclasses import dataclass


@dataclass
class Repo:
    name: str
    stars: int
    is_fork: bool
    language: str | None = None


def non_forks(repos: list[Repo]) -> list[Repo]:
    """Return only the repositories that are not forks."""
    return [r for r in repos if not r.is_fork]


def total_stars(repos: list[Repo]) -> int:
    """Sum the stars across the given repositories."""
    return sum(r.stars for r in repos)


def top_repos(repos: list[Repo], limit: int = 3) -> list[Repo]:
    """Return the `limit` most-starred repositories, highest first."""
    return sorted(repos, key=lambda r: r.stars, reverse=True)[:limit]
Every function takes data and returns data. None of them touches the network. The actual requests.get() call would live in a separate, thin function that builds Repo objects from the JSON and hands them to these — keeping the messy part small and the testable part large.
4. Writing Tests with pytest
pytest is the de facto standard testing framework in the Python community (the standard library ships unittest, but most projects reach for pytest). Install it:
pip install pytest
The conventions are refreshingly minimal:
- Put tests in files named test_*.py.
- Write each test as a function named test_*.
- Use a plain assert statement to check expectations; there are no special assertion methods to memorize.
Create test_repo_stats.py next to the module:
from repo_stats import Repo, non_forks, total_stars, top_repos


def test_non_forks_filters_out_forks():
    repos = [
        Repo("a", 10, is_fork=False),
        Repo("b", 5, is_fork=True),
        Repo("c", 8, is_fork=False),
    ]
    result = non_forks(repos)
    assert len(result) == 2
    assert all(not r.is_fork for r in result)


def test_total_stars_sums_correctly():
    repos = [Repo("a", 10, False), Repo("b", 5, False)]
    assert total_stars(repos) == 15


def test_total_stars_of_empty_list_is_zero():
    # Edge cases are where bugs hide: always test the empty case.
    assert total_stars([]) == 0


def test_top_repos_orders_by_stars_descending():
    repos = [Repo("a", 1, False), Repo("b", 99, False), Repo("c", 50, False)]
    top = top_repos(repos, limit=2)
    assert [r.name for r in top] == ["b", "c"]
Run the whole suite from your terminal:
pytest -q
.... [100%]
4 passed in 0.02s
Each green dot is a passing test. Break the code on purpose — change reverse=True to reverse=False in top_repos — and watch pytest fail loudly, telling you exactly which assertion broke and what it expected. That instant feedback is the entire point.
5. One Test, Many Cases: parametrize
When you want to run the same test logic over several inputs, copy-pasting test functions is wasteful. pytest's parametrize decorator runs one test once per row of data:
import pytest
from repo_stats import Repo, total_stars


@pytest.mark.parametrize("stars, expected", [
    ([], 0),
    ([10], 10),
    ([10, 20, 30], 60),
])
def test_total_stars_cases(stars, expected):
    repos = [Repo(f"repo{i}", s, False) for i, s in enumerate(stars)]
    assert total_stars(repos) == expected
That's three distinct test cases — empty, single, and several — from one function body. pytest reports each row separately, so a failure points you straight at the input that broke.
This is the "type safety + unit tests on your API script" the curriculum promised: the network-touching part stays thin, the logic is pure, typed, and covered. With mypy green and pytest green, you can refactor with confidence — the tools will tell you the instant something slips.
What's Next?
You now have the full toolkit of a working Python developer: clean syntax, control flow, data structures, functions, error handling, modern files, comprehensions, classes, third-party packages, live APIs, types, and tests.
It's time to put all of it into one real project. In Part 10: The Capstone, we'll build a complete command-line tool that connects to an AI API to summarize text and review code — pulling together virtual environments, an SDK, environment-variable secrets, functions, error handling, type hints, and argument parsing into something you'd actually keep in your toolbox.