Why Snapshot Testing Sucks: Or, How I Learned to Stop Updating Snapshots and Write Better Tests
Imagine this: you're deep into developing a new feature. The sprint is almost over, and your team lead reminds you to ensure full test coverage. You know you should write proper unit tests, but you're feeling pressed for time. Instead, you rely on snapshot tests. They're quick and easy, and they still test everything, right? Resigned, you generate the snapshots, commit the results, and move on, just hoping nothing breaks - even though you know you haven't really tested the code.
Fast forward a week. A pull request lands in your codebase, and the CI pipeline fails. A snapshot test broke. You open the diff and see... a handful of seemingly random, tiny changes. A stray div appeared. An aria-label was added. Nothing that screams "game-breaking bug." You shrug, update the snapshots, and move on.
Sound familiar? Let's talk about why this sucks.
What Snapshot Testing Promises
At its core, snapshot testing seems brilliant. It's supposed to:
- Catch unexpected changes: Ensure your UI or API output doesn't change unexpectedly.
- Be easy to implement: A few lines of code, and boom, your component's output is saved and tested against.
- Increase confidence: Every diff is an opportunity to catch bugs.
In theory, it's like having a hawk-eyed reviewer for your code. But in practice? It's more like a pedantic roommate who complains if you move the saltshaker an inch.
The Reality of Snapshot Testing
1. The "Everything Is Fine" Updates
You've added a small feature - say, a new button - and a snapshot test breaks.
"Awesome," you think. "It's working!" You update the snapshot and move on. But did you really check every line of that diff? Do you trust your future self to scrutinize a 200-line snapshot to find the single change you care about?
Let's face it: you probably didn't. Most developers treat snapshot diffs like terms of service agreements - skim and click "I agree." This makes the test almost meaningless.
2. Death by Noise
A co-worker changes a margin in a CSS file. Suddenly, 15 snapshots are failing. None of the failures are actual bugs, but now someone has to trudge through every single one and verify they're all "acceptable" changes.
Snapshot tests don't just catch regressions; they also catch intentional changes. In practice, this means:
- Developers get annoyed and stop trusting tests.
- Teams waste time verifying or updating snapshots.
Instead of being helpful, the tests turn into a time sink - the equivalent of debugging a fire alarm that blares every time you toast bread.
3. The False Sense of Security
Here's the kicker: Snapshot tests often give a false sense of security.
Imagine your component outputs a massive JSON object. Your snapshot test dutifully captures it all: key-value pairs, nested structures, and even random whitespace. But the bug you're looking for? It's buried deep in the output, invisible amidst the noise.
You think your snapshot test is comprehensive. In reality, it's just brittle and opaque. What you really need are targeted tests that verify specific behaviors, not a firehose of output.
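As a rough illustration of what "targeted" can mean here, the sketch below checks three specific facts about a payload instead of comparing the whole object against a stored copy. The buildOrderSummary function, its fields, and the expectEqual helper are all hypothetical and exist only for this example.

```typescript
// Hypothetical payload builder standing in for "a component that outputs a massive JSON object".
interface LineItem {
  sku: string;
  quantity: number;
  unitPriceCents: number;
}

interface OrderSummary {
  id: string;
  items: LineItem[];
  totalCents: number;
  metadata: Record<string, string>;
}

function buildOrderSummary(items: LineItem[]): OrderSummary {
  const totalCents = items.reduce(function (sum, item) {
    return sum + item.quantity * item.unitPriceCents;
  }, 0);
  return {
    id: "order-1",
    items: items,
    totalCents: totalCents,
    metadata: { currency: "USD", source: "web", schemaVersion: "3" },
  };
}

// Tiny assertion helper so the sketch needs no test framework.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(label + ": expected " + String(expected) + ", got " + String(actual));
  }
}

const summary = buildOrderSummary([
  { sku: "A-1", quantity: 2, unitPriceCents: 500 },
  { sku: "B-7", quantity: 1, unitPriceCents: 1250 },
]);

// Targeted assertions: each one names the behavior it protects,
// so a failure points at the bug instead of at a wall of diff.
expectEqual(summary.totalCents, 2250, "total is the sum of quantity * unit price");
expectEqual(summary.items.length, 2, "every line item is kept");
expectEqual(summary.metadata.currency, "USD", "currency is reported");
```

If the total calculation regresses, the first assertion fails with a message that says so. Adding an unrelated metadata field changes nothing, whereas a whole-object snapshot would fail in both cases and look the same either way.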
Working in Large Teams and Monorepos
Snapshot testing can be especially problematic in large teams or monorepos.
- Knock-On Effects: A change to a root-level dependency, like a shared component library, can trigger failures across dozens (or hundreds) of snapshot tests. For example, updating a button component to include an additional aria-label for accessibility might result in failed tests throughout the monorepo. These failures often represent minor, intentional markup changes, but the effort required to review and update them can grind development to a halt. Similarly, innocuous changes like adding a data-* attribute for analytics or debugging shouldn't cause tests to fail, but with snapshot tests, they often do.
- Flaky Tests: Snapshots are brittle, and their propensity to break over minor, intentional updates makes them unreliable. This issue is compounded when using libraries like styled-components, where dynamically generated class names or IDs are frequently part of the output. While it's possible to serialize and stabilize these IDs, you shouldn't need to go to such lengths for basic testing.
- Coordination Overhead: In a monorepo, changes often require cross-team collaboration. Imagine a shared dropdown component is updated to support new keyboard navigation. This improvement could inadvertently cause snapshots to fail in unrelated projects, requiring multiple teams to coordinate fixes and updates, further complicating the workflow. The resulting friction and frustration make teams less likely to improve shared components in the future.
To address these issues, consider replacing snapshot tests with more targeted, behavior-driven tests that focus on specific functionality rather than capturing entire outputs. This reduces noise and makes it easier to manage large-scale changes.
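The knock-on effect described above can be reproduced in a few lines. This is a toy sketch, not a real test setup: renderButtonV1 and renderButtonV2 are hypothetical stand-ins for a shared component before and after an analytics attribute is added, and the regex-based hasButtonWithText is a deliberately crude substitute for a proper DOM query.

```typescript
// Hypothetical shared button renderer, before and after a data-* attribute is added.
function renderButtonV1(label: string): string {
  return '<button class="btn">' + label + "</button>";
}

function renderButtonV2(label: string): string {
  return '<button class="btn" data-analytics-id="cta">' + label + "</button>";
}

// Snapshot-style check: exact equality against previously stored output.
const storedSnapshot = renderButtonV1("Save");
const snapshotStillMatches = renderButtonV2("Save") === storedSnapshot;

// Behavior-style check: is there a button whose visible text is the label?
// (A toy regex for illustration only; a real test would query the DOM.)
function hasButtonWithText(html: string, text: string): boolean {
  const match = /<button\b[^>]*>([^<]*)<\/button>/.exec(html);
  return match !== null && match[1] === text;
}

const behaviorHoldsV1 = hasButtonWithText(renderButtonV1("Save"), "Save");
const behaviorHoldsV2 = hasButtonWithText(renderButtonV2("Save"), "Save");
```

Here snapshotStillMatches ends up false even though nothing a user can see has changed, while both behavior checks stay true. Multiply that by every consumer of the shared component and you get the monorepo-wide failure described above.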
How UI Should Be Tested
Effective UI testing focuses on replicating how users actually interact with your application. At the heart of this is proper unit testing. Well-written unit tests ensure that your components work as expected in isolation. They target specific behaviors and outputs, making it easier to identify and address bugs without relying on brittle or opaque snapshot tests. These tests provide clarity, are easier to maintain, and help developers confidently refactor code without fear of breaking unrelated functionality.
When designing your tests, think about how a user engages with the application. Focus on key interactions, such as button clicks or form submissions, and verify the outcomes align with expectations. This approach ensures your tests reflect real-world use and provide meaningful coverage.
Once you've established strong unit tests, you can layer on integration and end-to-end tests for broader coverage, ensuring every level of the application behaves cohesively under realistic scenarios.
Avoid Overusing Test IDs
While data-testid attributes can be useful in some cases, relying on them should be a last resort. As outlined in the Testing Library Guiding Principles, your tests should resemble how users interact with your software. Users don't see test IDs; they click buttons, fill out forms, and navigate based on visible content.
When data-testid becomes necessary, it's still better than querying DOM structure or CSS class names, which are prone to frequent changes. For a deeper dive into making your UI tests resilient to change, check out Kent C. Dodds' blog post.
Best Practices for UI Testing
- Query by Text or Role: Use selectors that reflect what users see, like button labels or ARIA roles. For example, instead of querying a button by its test ID, use its visible text: screen.getByText('Submit'). Leveraging getByRole is particularly effective for writing accessible code. For instance, screen.getByRole('button', { name: 'Submit' }) only matches when the element is exposed as a button with that accessible name, so the test checks accessibility along with visible content, which benefits users relying on assistive technologies. However, note that getByRole can sometimes have a performance impact, especially in large DOM trees, as discussed in this GitHub issue.
- Simulate Real User Actions: Instead of manually triggering events, use tools like Testing Library's fireEvent or userEvent to simulate real-world interactions. userEvent is usually the closer match to real use, since it simulates a full interaction rather than dispatching a single event.
- Focus on Behavior: Your tests should validate the functionality, not the implementation. For example, test that submitting a form triggers the correct API call and shows a success message, not that a specific div appears in the DOM.
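To make the "Focus on Behavior" point concrete, here is a DOM-free sketch of the form example. It assumes a hypothetical SignupApi interface and submitSignupForm handler, uses a hand-rolled fake in place of a mocking library, and is kept synchronous for brevity (a real API call would be asynchronous). In a Testing Library test, the same assertions would sit behind queries such as screen.getByRole.

```typescript
// Hypothetical API client and form handler; neither comes from a real library.
interface SignupApi {
  createAccount(email: string): { ok: boolean };
}

// Returns the message the user would see after submitting.
function submitSignupForm(api: SignupApi, rawEmail: string): string {
  const result = api.createAccount(rawEmail.trim());
  return result.ok ? "Account created" : "Something went wrong";
}

// Hand-rolled fake that records every call it receives.
function createFakeApi(ok: boolean): { api: SignupApi; calls: string[] } {
  const calls: string[] = [];
  const api: SignupApi = {
    createAccount(email: string) {
      calls.push(email);
      return { ok: ok };
    },
  };
  return { api: api, calls: calls };
}

const happyPath = createFakeApi(true);
const successMessage = submitSignupForm(happyPath.api, "  ada@example.com ");

const failingPath = createFakeApi(false);
const failureMessage = submitSignupForm(failingPath.api, "ada@example.com");

// Behavior 1: exactly one API call, with the cleaned-up email.
if (happyPath.calls.length !== 1 || happyPath.calls[0] !== "ada@example.com") {
  throw new Error("expected one createAccount call with the trimmed email");
}
// Behavior 2: the user sees a success message when the call succeeds.
if (successMessage !== "Account created") {
  throw new Error("expected a success message");
}
// Behavior 3: the user sees an error message when the call fails.
if (failureMessage !== "Something went wrong") {
  throw new Error("expected a failure message");
}
```

None of these checks mention markup, so the form can be restructured freely as long as it still calls the API once and tells the user what happened.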
Considerations for i18n
One challenge with queries like getByText is handling internationalization (i18n). If your application supports multiple languages, visible text queries may fail when running tests in different locales. To mitigate this, consider:
- Using ARIA roles and attributes as fallback selectors.
- Implementing localization-aware utilities to adapt queries based on language.
While these approaches add complexity, they help ensure your tests remain robust across diverse use cases.
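One possible shape for a localization-aware utility is sketched below. Everything in it is an assumption made for illustration: the catalog, the translate helper, the RenderedElement model, and findByRoleAndKey are hypothetical and only mimic the idea of looking elements up by message key rather than by hard-coded English text.

```typescript
type Locale = "en" | "de";
type MessageKey = "form.submit" | "form.cancel";

// Hypothetical translation catalog shared by the app and its tests.
const catalog: Record<Locale, Record<MessageKey, string>> = {
  en: { "form.submit": "Submit", "form.cancel": "Cancel" },
  de: { "form.submit": "Absenden", "form.cancel": "Abbrechen" },
};

function translate(locale: Locale, key: MessageKey): string {
  return catalog[locale][key];
}

// Toy stand-in for rendered UI: elements described by role and accessible name.
interface RenderedElement {
  role: string;
  name: string;
}

function renderForm(locale: Locale): RenderedElement[] {
  return [
    { role: "button", name: translate(locale, "form.submit") },
    { role: "button", name: translate(locale, "form.cancel") },
  ];
}

// Locale-aware query: resolve the expected text from the message key first.
function findByRoleAndKey(
  elements: RenderedElement[],
  locale: Locale,
  role: string,
  key: MessageKey
): RenderedElement | undefined {
  const expectedName = translate(locale, key);
  return elements.filter(function (element) {
    return element.role === role && element.name === expectedName;
  })[0];
}

const germanForm = renderForm("de");

// A hard-coded English query finds nothing in the German UI...
const hardCodedMatches = germanForm.filter(function (element) {
  return element.name === "Submit";
}).length;

// ...while the key-based query works in either locale.
const germanSubmit = findByRoleAndKey(germanForm, "de", "button", "form.submit");
const englishSubmit = findByRoleAndKey(renderForm("en"), "en", "button", "form.submit");
```

In a Testing Library test the same idea would amount to passing the translated string as the name option, for example screen.getByRole('button', { name: translate(locale, 'form.submit') }), where translate is whatever lookup your i18n setup provides.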
A Better Way
Snapshot testing isn't inherently evil, but it's frequently overused and misapplied. To make it more effective and manageable, focus on these impactful strategies:
- Test Specific Behaviors: Focus on what actually matters. Instead of snapshotting an entire component, test specific outputs or behaviors. Example: Instead of snapshotting a whole modal, write tests that check if the modal's title and button labels are correct.
- Keep Snapshots Small: If you must use snapshot tests, keep them scoped. Don't snapshot a massive DOM tree when you only care about one button.
- Review Snapshots Carefully: Treat snapshot diffs like code reviews. If you can't verify a snapshot change quickly, it's probably too big.
- Use Visual Regression Testing: For UI-heavy projects, tools like Percy or Chromatic can be better alternatives. They provide visual diffs, which are easier to understand than raw snapshot files.
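One way to read "Keep Snapshots Small" is to project the output down to the fields you care about before comparing. The sketch below assumes a hypothetical ModalView shape and a modalContract projection, and uses plain JSON.stringify in place of a snapshot library.

```typescript
// Hypothetical view model for a confirmation modal.
interface ModalView {
  title: string;
  body: string;
  buttons: { label: string; variant: string }[];
  layout: { widthPx: number; animationMs: number; zIndex: number };
}

function buildConfirmModal(itemName: string): ModalView {
  return {
    title: "Delete " + itemName + "?",
    body: "This action cannot be undone.",
    buttons: [
      { label: "Cancel", variant: "secondary" },
      { label: "Delete", variant: "danger" },
    ],
    layout: { widthPx: 480, animationMs: 150, zIndex: 1000 },
  };
}

// Projection: keep only what this test is meant to protect.
function modalContract(modal: ModalView): { title: string; buttonLabels: string[] } {
  return {
    title: modal.title,
    buttonLabels: modal.buttons.map(function (button) {
      return button.label;
    }),
  };
}

const original = buildConfirmModal("report.pdf");

// Same modal after a purely presentational tweak.
const restyled: ModalView = {
  ...original,
  layout: { widthPx: 520, animationMs: 200, zIndex: 1000 },
};

const expectedSnapshot = JSON.stringify({
  title: "Delete report.pdf?",
  buttonLabels: ["Cancel", "Delete"],
});

// The scoped snapshot ignores the layout change; a full snapshot does not.
const scopedMatchesAfterRestyle = JSON.stringify(modalContract(restyled)) === expectedSnapshot;
const fullMatchesAfterRestyle = JSON.stringify(restyled) === JSON.stringify(original);
```

After the restyle, scopedMatchesAfterRestyle is true and fullMatchesAfterRestyle is false. The scoped version is also short enough to review line by line, which is the point of the "Review Snapshots Carefully" item.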
Break the Cycle
Snapshot testing is like bubble wrap: satisfying at first, but useless when overdone. It promises safety but often delivers clutter and complacency.
The next time you find yourself updating snapshots without a second thought, pause and ask yourself: "Am I actually improving this codebase?" If the answer is no, it's time to rethink your approach.
Testing should make your life easier, not harder. It's okay to admit that snapshot testing sucks - because it often does. By embracing accessible and behavior-driven testing practices, you can improve both your codebase and the user experience. With a little thought and restraint, you can tame snapshot testing and make it work for you, not against you.