Unit Testing: What to Test and What to Skip
What separates tests that protect real behavior from tests that just inflate coverage and become a burden when you need to refactor.
You spend hours writing tests for getters and setters. Coverage hits 95%. Feels productive. Then a requirement changes, you refactor three lines of business logic, and the entire suite breaks — not because behavior changed, but because you were testing implementation, not behavior. Congratulations, you just created maintenance debt disguised as coverage.
This post is about the decision that separates tests worth the cost from tests that just become a burden.
What a unit test actually protects
A unit test doesn't protect code. It protects behavior. That distinction sounds semantic, but it completely changes what you write.
When you test code — "this method returns the value of property x" — you're testing something any IDE already checks. When you test behavior — "given this order with accumulated discounts over $500, the final amount should include free shipping" — you're documenting a business rule that no compiler will ever verify for you.
The right question before writing any test: "if I delete this test, what loses its guarantee?" If the answer is "nothing, the type checker already handles that," the test probably isn't worth writing.
What's worth testing
Domain logic with multiple paths
Any function with conditionals, edge cases, boundaries, or combinations of state is a candidate. Discount calculations, form validations, data transformations, authorization rules — these are where production bugs are born.
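# Customer is assumed to expose a `type` attribute such as "premium".
def calculate_discount(amount: float, customer: Customer) -> float:
    # Premium customers get 15% at or above 500, otherwise 10%.
    if customer.type == "premium" and amount >= 500:
        return amount * 0.15
    if customer.type == "premium":
        return amount * 0.10
    # Everyone else gets 5% at or above 1000, otherwise nothing.
    if amount >= 1000:
        return amount * 0.05
    return 0.0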
This function deserves tests. Not one, but several: one for each combination of type and amount that produces a different result. If you only write the happy path, you leave the other three branches unguarded, along with the boundaries at exactly 500 and exactly 1000.
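One way to cover those combinations is a small table of cases, sketched below. The `Customer` dataclass is a hypothetical stand-in (the real type isn't shown here), and the test uses plain `assert` so it runs under pytest or on its own.

```python
import math
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical stand-in: only the field the function reads.
    type: str


def calculate_discount(amount: float, customer: Customer) -> float:
    if customer.type == "premium" and amount >= 500:
        return amount * 0.15
    if customer.type == "premium":
        return amount * 0.10
    if amount >= 1000:
        return amount * 0.05
    return 0.0


# (customer type, amount, expected discount): one row per branch and boundary.
CASES = [
    ("premium", 500.00, 75.00),    # boundary: exactly 500 gets 15%
    ("premium", 499.99, 49.999),   # just under 500 falls back to 10%
    ("premium", 1000.00, 150.00),  # premium rule wins over the 1000 rule
    ("regular", 1000.00, 50.00),   # boundary: exactly 1000 gets 5%
    ("regular", 999.99, 0.0),      # just under 1000 gets nothing
    ("regular", 0.0, 0.0),         # zero amount
]


def test_calculate_discount_branches_and_boundaries():
    for customer_type, amount, expected in CASES:
        actual = calculate_discount(amount, Customer(type=customer_type))
        # isclose avoids false failures from float rounding.
        assert math.isclose(actual, expected, abs_tol=1e-9), (customer_type, amount, actual)
```

Each row reads as a business rule, so when a rule changes, the row that needs updating is obvious.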
Pure functions with non-obvious inputs
Pure functions are the easiest to test and the most valuable to cover. No side effects, no mocks needed, no complex setup — you pass input, verify output.
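import re

def format_phone(phone: str) -> str:
    # Keep only the digits, whatever punctuation or spacing came in.
    digits = re.sub(r'\D', '', phone)
    if len(digits) not in (10, 11):
        raise ValueError(f"Invalid phone number: {phone}")
    # 11 digits: 2-digit area code plus a 5-4 split.
    if len(digits) == 11:
        return f"({digits[:2]}) {digits[2:7]}-{digits[7:]}"
    # 10 digits: 2-digit area code plus a 4-4 split.
    return f"({digits[:2]}) {digits[2:6]}-{digits[6:]}"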
Test with: formatted input, unformatted, with spaces, with mixed characters, with 9 digits, with 12. Each variant is a real case someone will pass to this function in production.
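A sketch of what that list of inputs could look like as tests. The phone numbers are made-up samples, and `rejects` is a small hypothetical helper so the example doesn't depend on a test framework.

```python
import re


def format_phone(phone: str) -> str:
    digits = re.sub(r"\D", "", phone)
    if len(digits) not in (10, 11):
        raise ValueError(f"Invalid phone number: {phone}")
    if len(digits) == 11:
        return f"({digits[:2]}) {digits[2:7]}-{digits[7:]}"
    return f"({digits[:2]}) {digits[2:6]}-{digits[6:]}"


def rejects(phone: str) -> bool:
    """Return True if format_phone raises ValueError for this input."""
    try:
        format_phone(phone)
    except ValueError:
        return True
    return False


def test_format_phone_accepts_common_variants():
    assert format_phone("(11) 98765-4321") == "(11) 98765-4321"   # already formatted
    assert format_phone("11987654321") == "(11) 98765-4321"       # unformatted, 11 digits
    assert format_phone("11 3456 7890") == "(11) 3456-7890"       # spaces, 10 digits
    assert format_phone("tel: 11.3456.7890") == "(11) 3456-7890"  # mixed characters


def test_format_phone_rejects_wrong_lengths():
    assert rejects("123456789")     # 9 digits
    assert rejects("123456789012")  # 12 digits
    assert rejects("")              # no digits at all
```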
Explicit edge cases
Numeric boundaries, empty strings, empty lists, null/None values, month-end dates, negative values where they're unexpected. These don't appear in happy-path tests, but they're where most production bugs live.
The rule is simple: if you asked "what happens if...?" during implementation, that's a test case.
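As an illustration of turning a "what happens if...?" question into test cases, here is a minimal sketch built around month-end dates. `add_one_month` is a hypothetical helper written for this example, and clamping to the last day of the target month is an assumed business rule.

```python
import calendar
from datetime import date


def add_one_month(d: date) -> date:
    """Move a date one month forward, clamping to the last day of that month."""
    year = d.year + (d.month // 12)   # December rolls over into the next year
    month = d.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def test_add_one_month_edge_cases():
    # "What happens on January 31?" February has no day 31.
    assert add_one_month(date(2023, 1, 31)) == date(2023, 2, 28)
    # "What about a leap year?"
    assert add_one_month(date(2024, 1, 31)) == date(2024, 2, 29)
    # "What happens in December?"
    assert add_one_month(date(2023, 12, 31)) == date(2024, 1, 31)
    # The ordinary case still needs one row.
    assert add_one_month(date(2023, 3, 15)) == date(2023, 4, 15)
```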
Documented regressions
When a bug reaches production, the first step before fixing it is writing the test that reproduces it. Then fix it. That test guarantees the bug never comes back unnoticed — and it documents the expected behavior for anyone who comes after you.
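A regression test can be as small as the sketch below. The function, the bug, and the numbers are all hypothetical: imagine a coupon above 100% once produced a negative order total, and the clamp is the fix.

```python
def apply_coupon(total: float, percent: float) -> float:
    """Hypothetical function; the clamp on `percent` is the fix for the imagined bug."""
    percent = max(0.0, min(percent, 100.0))
    return total * (1 - percent / 100)


def test_coupon_over_100_percent_never_produces_negative_total():
    # Regression: a 120% coupon once produced a negative total (hypothetical incident).
    # Written first, this test fails against the unclamped version.
    assert apply_coupon(50.0, 120.0) == 0.0


def test_regular_coupon_still_works():
    # Guard against the fix breaking the normal path.
    assert apply_coupon(200.0, 25.0) == 150.0
```

Naming the test after the behavior rather than after a ticket number keeps it readable once the incident is forgotten.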
What's not worth testing
Trivial getters, setters, and properties
class Product:
    def __init__(self, name: str, price: float):
        self._name = name
        self._price = price

    @property
    def name(self) -> str:
        return self._name  # testing this is a waste
There's no logic here. The type checker verifies the type, the runtime verifies access. A getter test will only break when you rename the property — and at that point, the test gives you zero information about what actually broke.
Framework and library code
If you're testing whether your ORM saves correctly to the database, you're testing the ORM, not your code. Trust that SQLAlchemy, Django ORM, Prisma — whatever you're using — works. Your tests should cover what you wrote on top of those tools.
The same applies to simple serialization: if you have a name field and you're testing that the serialized JSON contains "name": "value", you're testing the serialization library.
Excessive mocking
When a test has more mock setup than logic being tested, that's a sign you're testing implementation rather than behavior. If you need to mock five dependencies to test one function, that is a reason to question the function's design, not evidence that the test is useful.
Mocks have their place: network calls, disk I/O, external services. But mocking a discount service to test another service that uses it can mean you're not testing the real flow — and the bug will be exactly in the integration you abstracted away.
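A minimal sketch of that boundary, using `unittest.mock.Mock` from the standard library. Everything here is hypothetical: `checkout_total`, the `get_rate` method on the shipping client, and the simplified 25% premium rule chosen to keep the arithmetic exact. The external shipping service is mocked; the discount logic runs for real.

```python
from unittest.mock import Mock

PREMIUM_RATE = 0.25  # simplified, hypothetical rule for this example


def calculate_discount(amount: float, is_premium: bool) -> float:
    # Real domain logic: deliberately NOT mocked in the test below.
    return amount * PREMIUM_RATE if is_premium else 0.0


def checkout_total(amount: float, is_premium: bool, shipping_client) -> float:
    discount = calculate_discount(amount, is_premium)
    # The shipping client stands for a network call to an external service.
    shipping = shipping_client.get_rate(amount - discount)
    return amount - discount + shipping


def test_checkout_total_uses_real_discount_and_mocked_shipping():
    shipping_client = Mock()
    shipping_client.get_rate.return_value = 10.0

    total = checkout_total(400.0, True, shipping_client)

    # 400 - 100 discount + 10 shipping
    assert total == 310.0
    # The external service was asked about the discounted amount.
    shipping_client.get_rate.assert_called_once_with(300.0)
```

If the discount rule changes, this test fails for the right reason, because the real rule took part in the calculation.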
Code that only exists as boilerplate
Controllers that just forward requests to services, DTOs without validation, adapters that only convert formats. No logic, nothing to test. Forcing coverage here is coverage theater.
The coverage metric is a well-intentioned lie
100% coverage doesn't mean the suite is good. It means every line was executed at least once. You can have full coverage and test no edge cases, no error paths, no non-trivial state combinations.
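A small sketch of how that happens. `average` is a hypothetical function; the first test alone executes every line of it, so a line-coverage tool would report 100%, yet the empty-list case is never exercised.

```python
def average(values: list[float]) -> float:
    return sum(values) / len(values)


def test_average_happy_path():
    # Executes every line of average(), so line coverage reports 100%.
    assert average([2.0, 4.0]) == 3.0


def test_average_of_empty_list_raises():
    # The case the first test never exercised: an empty list divides by zero.
    try:
        average([])
    except ZeroDivisionError:
        return
    raise AssertionError("expected ZeroDivisionError for an empty list")
```

The second test adds nothing to the coverage number, and it is the one that documents a decision someone has to make: should an empty list raise, or return a default?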
A coverage report is useful for finding code that never runs and obvious gaps in the suite. As a quality target, the number leads teams to write tests with no meaningful assertions just to hit the percentage.
The number that matters isn't coverage — it's confidence. Can you refactor without fear? Can you swap your validation library and trust the tests will catch any behavioral regression? If yes, the suite is doing its job.
Integration tests vs unit tests: it's not a competition
Many teams burn energy on the wrong debate. The point isn't which type is better — it's using each where it has an advantage.
Unit tests are fast and precise: great for domain logic with many paths, where you want to test every combination in milliseconds. Integration tests verify that parts work together: great for end-to-end flows, repositories talking to databases, API handlers.
The mistake is using unit tests where integration would be more appropriate (mocking the database when the repository test is what matters) or integration tests where unit would be faster (spinning up the full application context to test a formatting function).
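For the repository case, a sketch using an in-memory SQLite database from the standard library instead of a mocked connection. `UserRepository`, its schema, and its method names are hypothetical, and SQLite stands in for whatever database the real project uses.

```python
import sqlite3


class UserRepository:
    """Hypothetical repository; the schema and method names are illustrative."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (email TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, email: str, name: str) -> None:
        self.conn.execute("INSERT INTO users (email, name) VALUES (?, ?)", (email, name))
        self.conn.commit()

    def find_name_by_email(self, email: str):
        row = self.conn.execute(
            "SELECT name FROM users WHERE email = ?", (email,)
        ).fetchone()
        return row[0] if row else None


def test_repository_round_trip_and_uniqueness():
    repo = UserRepository(sqlite3.connect(":memory:"))
    repo.add("ada@example.com", "Ada")
    assert repo.find_name_by_email("ada@example.com") == "Ada"
    assert repo.find_name_by_email("missing@example.com") is None
    # The real database enforces the primary key; a mock would not.
    try:
        repo.add("ada@example.com", "Duplicate")
    except sqlite3.IntegrityError:
        return
    raise AssertionError("expected IntegrityError for a duplicate email")
```

The SQL, the parameter binding, and the constraint are all exercised here, which is exactly what a mocked connection would skip.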
Frequently asked questions
How much coverage should I have?
Depends on the type of code. Domain logic: high — 80–90% makes sense. Infrastructure and adapters: lower — code that only delegates doesn't need aggressive coverage. Project-wide coverage as a single number is an average that hides where you're doing well and where you're not.
Should I write tests before or after the code?
TDD has real value for complex domain code — writing the test first forces you to think about the interface and edge cases before implementation. But it's not a law. Writing tests after understanding the problem is also valid. What's not valid is writing no tests at all because "there's no time right now."
What do I do with legacy code without tests?
Don't try to add coverage retrospectively everywhere. Prioritize: when you're about to change a function, write the test that documents current behavior first. That's the safety net that matters. It's not worth spending hours testing code that won't be touched.
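Such a test is sometimes called a characterization test: it pins what the code does today, quirks included, before you touch it. A minimal sketch with a hypothetical `legacy_slug` function:

```python
def legacy_slug(title: str) -> str:
    # Hypothetical legacy function, left exactly as found.
    return title.strip().lower().replace(" ", "-")


def test_legacy_slug_current_behavior():
    assert legacy_slug("  Hello World ") == "hello-world"
    # Quirks are pinned too: double spaces become double hyphens,
    # and punctuation is kept as-is.
    assert legacy_slug("Hello  World") == "hello--world"
    assert legacy_slug("What's New?") == "what's-new?"
```

Whether the double hyphen is a bug is a separate decision. The test only guarantees that your change doesn't alter it by accident.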
When is a failing test good news?
Any time it breaks because of an unintentional behavioral change. The test did its job. When it breaks because of an implementation change that didn't alter behavior — renamed an internal variable, refactored structure — that's a sign the test was testing implementation details, not the contract.
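The difference can be sketched with a hypothetical `Cart` class. The first test checks the contract and survives any internal rewrite that keeps totals correct; the second reaches into `_items` and breaks as soon as the storage changes, even if every total is still right.

```python
class Cart:
    """Hypothetical cart; `_items` is an internal detail."""

    def __init__(self):
        self._items = []  # list of (price, quantity) tuples

    def add(self, price: float, quantity: int = 1) -> None:
        self._items.append((price, quantity))

    def total(self) -> float:
        return sum(price * quantity for price, quantity in self._items)


def test_total_reflects_added_items():
    # Contract: what callers of Cart actually rely on.
    cart = Cart()
    cart.add(10.0, 2)
    cart.add(5.0)
    assert cart.total() == 25.0


def test_items_are_stored_as_tuples():
    # Implementation detail: fails if _items becomes a dict or a list of
    # objects, although no caller would notice the difference.
    cart = Cart()
    cart.add(10.0, 2)
    assert cart._items == [(10.0, 2)]
```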
Good tests are the ones you want to write
The clearest sign of a healthy suite isn't the number of tests or coverage — it's how much friction the team feels about writing more. If tests are seen as bureaucracy, something is wrong: either you're testing the wrong things, the setup is too painful, or tests break on every trivial refactor.
Before writing a test that depends on a regular expression in a validator, I use the Regex Tester to confirm the pattern actually matches what I expect — faster than running the full test suite to find out the regex was wrong to begin with.
Good tests are cheap to write, fast to run, and only break when behavior changes. If yours aren't, the suite isn't protecting you — it's slowing you down.