Version Compatibility

This lesson is based on a real production incident that affected customers.

Let’s examine a real-world example to understand why thorough reviews need to consider the version compatibility of dependencies.

The Scenario

A developer bumped the version of a networking component (Calico) in one of the clusters, but the target version was not compatible with the rest of the system. This caused a cascading failure across the system that lasted a few hours. The problem was not caught during code review because:

  1. The target version was never tested in a lower environment.
  2. A more recent version had been tested in the lower environment, but that does not guarantee that the versions in between are compatible.
calico.tf

locals {
-  calico_version = "3.28.3"
+  calico_version = "3.29.3"
}
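A review can catch this gap more reliably if the configuration accepts only versions that were validated in a lower environment. The sketch below is a minimal Terraform example and is not the team's actual setup. It assumes `calico_version` is moved from `locals` into an input variable, and it treats the list of validated versions as a hypothetical value the team maintains by hand. With this guard in place, changing the value to 3.29.3 fails validation until someone records that exact version as validated.

```hcl
# Hypothetical guard: only Calico versions that were deployed and validated
# in a lower environment (for example, a smokebox cluster) are accepted.
variable "calico_version" {
  type        = string
  description = "Calico version to deploy to this cluster."
  default     = "3.28.3"

  validation {
    # Exact match on purpose: validating a newer version says nothing
    # about the versions in between.
    condition     = contains(["3.28.3"], var.calico_version)
    error_message = "The calico_version value must be a version that has been validated in a lower environment."
  }
}
```

The list can only be as reliable as the testing behind it. What the guard does is turn an untested target version into an explicit review question rather than a silent one-line change.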


Key Lessons

1. Understanding Version Upgrades

  • Check the version compatibility of every dependency the upgrade touches
  • Be especially cautious with major version upgrades
  • Verify that changes don’t break existing integrations
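To make major version upgrades stand out in a plan, a change can compare the major component of the target version with the version that is currently running. The sketch below is minimal and uses a Terraform `check` block, which requires Terraform 1.5 or later. It assumes a hypothetical `current_calico_version` local that records what is deployed today. A failed `check` assertion is reported as a warning and does not block the run. That behavior suits this purpose, because the goal is to prompt extra scrutiny rather than enforce a policy.

```hcl
locals {
  # Hypothetical: the Calico version currently running in this cluster.
  current_calico_version = "3.28.3"
  calico_version         = "3.29.3"

  # The major version is the first dot-separated component, e.g. "3" in "3.29.3".
  current_calico_major = split(".", local.current_calico_version)[0]
  target_calico_major  = split(".", local.calico_version)[0]
}

check "calico_major_version" {
  assert {
    condition     = local.target_calico_major == local.current_calico_major
    error_message = "This change crosses a Calico major version. Review the upgrade notes and every integration point before applying."
  }
}
```

For the 3.28.3 to 3.29.3 bump in this incident, the check passes because both versions are 3.x. That result is itself a reminder: a minor bump can still break compatibility, and no version-number heuristic replaces testing the exact target version.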

2. Testing Strategy

  • Test the exact target version in lower environments before promoting it to customer-facing environments
  • Test integration points between systems
  • Always have a rollback strategy in case of issues
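One layout that supports both step-by-step promotion and quick rollback pins the version separately for each environment. A version then reaches production only through an explicit change, and reverting the config means restoring the last known-good value. The sketch below is minimal, and the environment names and the `environment` variable are hypothetical.

```hcl
variable "environment" {
  type        = string
  description = "Environment this configuration is applied to."

  validation {
    condition     = contains(["smokebox", "staging", "production"], var.environment)
    error_message = "The environment value must be smokebox, staging, or production."
  }
}

locals {
  # Promote one environment at a time. Production keeps the last known-good
  # version until the new one has been validated in the environments below it.
  calico_versions = {
    smokebox   = "3.29.3" # under test
    staging    = "3.28.3"
    production = "3.28.3" # last known-good
  }

  calico_version = local.calico_versions[var.environment]
}
```

Reverting the config is only part of a rollback plan. Whether Calico downgrades cleanly from the new version, and how long the downgrade takes, should also be rehearsed in a lower environment.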

3. Code Review Best Practices

  • Reviewers should understand the dependency graph of the system
  • Ask questions about compatibility with the rest of the system
  • Verify how the tests were conducted and what was validated
  • Ask about the rollback strategy

Tips for Reviewers

1. Ask System-Wide Backwards Compatibility Questions

  • How does this version upgrade affect the rest of the system?
  • Does it break any existing integrations?
  • Are there any downstream effects we should test?
  • Example: “Have we tested the new version of Calico in a lower environment?”

2. Verify Testing Strategy

  • What environments are we testing in?
  • What did we monitor during the testing?
  • Example: “Have we tested the new version of Calico in one of the smokebox clusters?”

3. Question Failure Scenarios

  • Did we encounter any issues during testing?
  • Do we have a rollback strategy in case of issues in production?
  • How long would a rollback take with that strategy?
  • Example: “How do we roll back if we run into issues in production? How long do we expect the rollback to take?”

Common Pitfalls to Avoid

1. Focusing Only on Code

  • ❌ “The code looks good, approved.”
  • ✅ “The code looks good, but let’s test in a lower environment first.”

2. Missing Integration Points

  • ❌ “This is a version upgrade, should be backwards compatible.”
  • ✅ “We should make sure the version upgrade is backwards compatible with all the integration points.”

3. Focusing on Deployment Without a Rollback Strategy

  • ❌ “Let’s test on staging and promote to production.”
  • ✅ “We should test on staging and also come up with a rollback strategy in case of issues before rolling out to production.”

Remember: A good code review promotes both comprehensive testing and a rollback strategy for version upgrades, especially for major version upgrades!
