Infrastructure Migration
This lesson is based on a real sandbox incident that affected customers.
Let’s examine how to properly review infrastructure migration changes, using a real-world example where service connectivity issues arose during an EKS cluster migration.
The Scenario
A team was migrating services from a legacy EKS cluster to a new one. The changes were reviewed and deployed, but they caused connectivity issues because:
- Network policies were missing in the new environment.
- IAM roles weren’t properly configured.
- Testing was insufficient before the cutover.
- The review process didn’t catch configuration gaps.
- Service dependencies weren’t fully validated.
Before
service-config.yaml
# Legacy cluster configuration
serviceUrl: "https://entity-sandbox-wallets-api-eks:8233"
networkPolicy:
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: identity
iamRole:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::123:role/legacy-role"
PR Comment
Choose the comment that you think is the most constructive and helpful.
Key Lessons
1. Infrastructure Dependencies
- Verify network policies
- Check IAM role configurations
- Validate service connectivity
- Test with real traffic patterns
2. Migration Process
- Follow standardized runbooks
- Document validation steps
- Test in all environments
- Verify all dependencies
3. Review Best Practices
- Check configuration completeness
- Verify security settings
- Require test evidence
- Consider service dependencies
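The "check configuration completeness" lesson can be partly automated. The following is a minimal sketch, not the team’s actual tooling: it assumes the service config has already been loaded into a Python dict, and the `REQUIRED_PATHS` list, `lookup`, and `missing_paths` are hypothetical names chosen for illustration, based on the keys in the legacy `service-config.yaml` above.

```python
from typing import Any

# Hypothetical list of key paths a migrated service config must define.
# Derived from the legacy config in this lesson; adjust for real services.
REQUIRED_PATHS = [
    ("serviceUrl",),
    ("networkPolicy", "ingress"),
    ("iamRole", "serviceAccount", "annotations", "eks.amazonaws.com/role-arn"),
]


def lookup(config: dict, path: tuple) -> Any:
    """Walk nested dicts along path; return None if any key is missing."""
    node = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_paths(config: dict) -> list[str]:
    """Return the required paths that are absent or empty in config."""
    return [" > ".join(path) for path in REQUIRED_PATHS if not lookup(config, path)]


if __name__ == "__main__":
    # Placeholder config for a new cluster where only the URL was ported.
    new_cluster_config = {
        "serviceUrl": "https://wallets-api.example.internal:8233",
        "networkPolicy": {},
    }
    for path in missing_paths(new_cluster_config):
        print(f"missing: {path}")
```

A check like this only proves that the keys exist. It does not prove that the network policy or IAM role is correct, so it complements test evidence rather than replacing it.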
Tips for Reviewers
1. Ask Configuration Questions
- Are network policies complete?
- Is IAM properly configured?
- How will this be tested?
- Example: “How have you verified the network policies?”
2. Verify Migration Steps
- Is the runbook being followed?
- Are all environments considered?
- Is there a rollback plan?
- Example: “Can you share the validation steps?”
3. Document Requirements
- List required configurations
- Note dependent services
- Document testing steps
- Example: “This service requires network policies for X, Y, Z”
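When a reviewer asks “How will this be tested?”, one possible form of evidence is a connectivity report produced before the cutover. The sketch below is an assumption about how such a check might look, not a description of the team’s runbook: `check_dependencies` is a hypothetical helper, and the host names in the usage example are placeholders. It only attempts a TCP connection, so it can reveal a blocking network policy but says nothing about IAM permissions.

```python
import socket


def check_dependencies(
    dependencies: dict[str, tuple[str, int]], timeout: float = 2.0
) -> dict[str, bool]:
    """Attempt a TCP connection to each dependency; map name -> reachable."""
    results = {}
    for name, (host, port) in dependencies.items():
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Placeholder dependency list; real names come from the service's docs.
    report = check_dependencies(
        {
            "wallets-api": ("wallets-api.example.internal", 8233),
            "identity": ("identity.example.internal", 443),
        }
    )
    for name, reachable in report.items():
        print(f"{name}: {'ok' if reachable else 'UNREACHABLE'}")
```

Running this from a pod in the new cluster, rather than from a laptop, matters: network policies apply to traffic inside the cluster, so the check has to originate where the service will actually run.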
Common Pitfalls to Avoid
1. Focusing Only on Code
- ❌ “The YAML looks good, let’s merge.”
- ✅ “The YAML looks good, but let’s verify the infrastructure requirements.”
2. Insufficient Testing
- ❌ “We’ll test it after deployment.”
- ✅ “Let’s verify all configurations and connectivity before deploying.”
3. Missing Dependencies
- ❌ “It’s just a URL change.”
- ✅ “This URL change affects service connectivity. Let’s verify all dependencies.”
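To see why “it’s just a URL change” can be misleading, it helps to diff the whole legacy config against the new one instead of reading only the changed line. The following is a rough sketch under stated assumptions: both configs are already loaded as Python dicts, `flatten` and `config_diff` are hypothetical helpers, and the new-cluster URL is a placeholder rather than a value from the incident.

```python
def flatten(config: dict, prefix: tuple = ()) -> dict:
    """Flatten nested dicts into {key-path tuple: leaf value}."""
    flat = {}
    for key, value in config.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value  # lists and scalars are treated as leaves
    return flat


def config_diff(legacy: dict, new: dict) -> dict:
    """Report key paths removed, added, or changed between two configs."""
    old_flat, new_flat = flatten(legacy), flatten(new)
    shared = old_flat.keys() & new_flat.keys()
    return {
        "removed": sorted(old_flat.keys() - new_flat.keys()),
        "added": sorted(new_flat.keys() - old_flat.keys()),
        "changed": sorted(k for k in shared if old_flat[k] != new_flat[k]),
    }


if __name__ == "__main__":
    legacy = {
        "serviceUrl": "https://entity-sandbox-wallets-api-eks:8233",
        "networkPolicy": {
            "ingress": [
                {"from": [{"namespaceSelector": {"matchLabels": {"name": "identity"}}}]}
            ]
        },
        "iamRole": {
            "serviceAccount": {
                "annotations": {
                    "eks.amazonaws.com/role-arn": "arn:aws:iam::123:role/legacy-role"
                }
            }
        },
    }
    # Placeholder: a migration PR that only carried over the URL.
    new = {"serviceUrl": "https://wallets-api.new-cluster.example:8233"}
    for kind, paths in config_diff(legacy, new).items():
        for path in paths:
            print(f"{kind}: {' > '.join(path)}")
```

In this example the diff reports the URL as changed and the network policy and IAM role annotation as removed, which is the kind of gap a line-by-line read of the PR can miss.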
Remember: A good infrastructure review considers all dependencies and requires thorough validation. Understanding the full impact of changes and maintaining proper documentation helps prevent issues before they affect services!