Version Compatibility
This lesson is based on a real production incident that affected customers.
Let’s examine a real-world example to see how a thorough review, one that considers the version compatibility of dependencies, can prevent this kind of outage.
The Scenario
A developer bumped the version of a networking component in one of the clusters, but the target version was not compatible with the rest of the system. This caused a cascading failure across the system that lasted a few hours. It was not caught during code review because:
- The target version was not tested in a lower environment.
- An even more recent version had been tested in the lower environment, but that does not guarantee that the versions in between are compatible.
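The gap described above can also be made visible to Terraform itself. The following is a minimal sketch, not part of the original calico.tf: the list `calico_versions_tested_in_lower_env`, the resource name `calico_version_gate`, and the value `3.30.0` (a placeholder for "some newer release that was tested") are all assumptions for illustration. It assumes Terraform 1.4 or later, where the built-in `terraform_data` resource and `precondition` blocks are available.

```hcl
# Hypothetical guard: fail the plan when the exact target version
# has not been tested in a lower environment.
locals {
  calico_version = "3.29.3"

  # Assumption: versions that were exercised end to end in a lower environment.
  # "3.30.0" is a placeholder for a newer release; testing it says nothing
  # about 3.29.3, so 3.29.3 is deliberately absent from this list.
  calico_versions_tested_in_lower_env = ["3.28.3", "3.30.0"]
}

resource "terraform_data" "calico_version_gate" {
  input = local.calico_version

  lifecycle {
    precondition {
      # True only when the exact target version appears in the tested list.
      condition     = contains(local.calico_versions_tested_in_lower_env, local.calico_version)
      error_message = "The target calico_version has not been tested in a lower environment."
    }
  }
}
```

With these values, `terraform plan` would be expected to fail on the precondition, because only an exact match counts as tested. A guard like this does not replace reviewer questions; it only keeps the list of tested versions in the same change that a reviewer reads.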
calico.tf
locals {
- calico_version = "3.28.3"
+ calico_version = "3.29.3"
}
Key Lessons
1. Understanding Version Upgrades
- Be aware of the version compatibility of the dependencies
- Be even more cautious of major version upgrades
- Verify changes don’t break existing integrations
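To make the size of an upgrade explicit in review, the bump can be classified from the two version strings. This is a rough sketch that assumes versions follow a MAJOR.MINOR.PATCH layout; the local names `current_calico_version`, `target_calico_version`, and `calico_bump_kind` are hypothetical and do not come from the original configuration.

```hcl
locals {
  # Values taken from the calico.tf diff in The Scenario.
  current_calico_version = "3.28.3"
  target_calico_version  = "3.29.3"

  # Split "MAJOR.MINOR.PATCH" into its three string components.
  current_parts = split(".", local.current_calico_version)
  target_parts  = split(".", local.target_calico_version)

  # Compare components from most to least significant.
  calico_bump_kind = (
    local.current_parts[0] != local.target_parts[0] ? "major" :
    (local.current_parts[1] != local.target_parts[1] ? "minor" : "patch")
  )
}

output "calico_bump_kind" {
  # Evaluates to "minor" for 3.28.3 -> 3.29.3.
  value = local.calico_bump_kind
}
```

For the change in this incident the result is "minor", which is a reminder that the compatibility questions above apply to minor bumps as well as major ones.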
2. Testing Strategy
- Test in lower environments before promoting to customer-facing environments
- Test integration points between systems
- Always have a rollback strategy in case of issues
3. Code Review Best Practices
- Reviewers should understand the dependency graph of the system
- Ask questions about compatibility with the rest of the system
- Verify how the tests were conducted and what was validated
- Ask about the rollback strategy
Tips for Reviewers
1. Ask System-Wide Backwards Compatibility Questions
- How does this version upgrade affect the rest of the system?
- Does it break any existing integrations?
- Are there any downstream effects we should test?
- Example: “Have we tested the new version of Calico in a lower environment?”
2. Verify Testing Strategy
- What environments are we testing in?
- What did we monitor during the testing?
- Example: “Have we tested the new version of Calico in one of the smokebox clusters?”
3. Question Failure Scenarios
- Have we come across any issues during the testing?
- Do we have a rollback strategy in case of issues in production?
- How long does a typical rollback take with that strategy?
- Example: “How do we roll back if we come across any issues in production? How long do we anticipate the rollback to take?”
Common Pitfalls to Avoid
1. Focusing Only on Code
- ❌ “The code looks good, approved.”
- ✅ “The code looks good, but let’s test in a lower environment first.”
2. Missing Integration Points
- ❌ “This is a version upgrade, should be backwards compatible.”
- ✅ “We should make sure that the version upgrade is backwards compatible for all the integration points.”
3. Focusing on Deployment Without a Rollback Strategy
- ❌ “Let’s test on staging and promote to production.”
- ✅ “We should test on staging and also come up with a rollback strategy in case of issues before rolling out to production.”
Remember: A good code review promotes both comprehensive testing and a rollback strategy for version upgrades, especially for major version upgrades!