Cross-Service Impact

This lesson is based on a real production incident that affected multiple services and their users, including Circle Mint customers and Web3 developers.

Let’s examine how to properly review changes that affect critical user permissions and access controls, using a real-world example where a data corruption incident blocked developers from upgrading to mainnet and customers from completing onboarding.

The Scenario

A data corruption incident in the identity service affected the authorized representative (AR) flags, which led to:

  1. Mint users unable to continue onboarding after login.
  2. Developers unable to upgrade to mainnet.
  3. Diameter users unable to see AR flags.
  4. Periodic review submission data potentially impacted.
  5. Extended resolution time due to complex data dependencies.

While the immediate trigger was an untracked PR that accidentally removed an SQL WHERE clause, the broader root causes were:

  1. Lack of proper error handling and fallbacks in dependent services.
  2. No validation of affected row counts in SQL updates.
  3. Missing monitoring for permission state changes.
  4. Insufficient safeguards against unintended data modifications.
  5. No automated checks for critical permission changes.

The incident started with a seemingly simple PR to fix a button in Diameter for re-designating ARs, but cascaded into a system-wide issue due to these underlying gaps.

The Code Change

update_ar.sql
-- Original query with proper WHERE clause
UPDATE user_records
SET is_authorized_representative = false
WHERE entity_id = :entityId
  AND email != :newArEmail;
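One of the root causes listed above is that nothing validated how many rows an update touched. A minimal sketch of such a guard follows. The `Tx` interface and the `guardedUpdate` helper are hypothetical and synchronous for brevity; a real database driver will expose a different, usually asynchronous, API.

```typescript
// Hypothetical, synchronous transaction interface used only for this sketch.
// A real database driver will expose a different (usually async) API.
interface Tx {
  execute(sql: string, params: Record<string, unknown>): { rowCount: number };
  commit(): void;
  rollback(): void;
}

// Runs a write inside a transaction and rolls back when it touches more rows
// than the caller expects. Returns the number of affected rows on success.
function guardedUpdate(
  tx: Tx,
  sql: string,
  params: Record<string, unknown>,
  maxExpectedRows: number,
): number {
  const { rowCount } = tx.execute(sql, params);
  if (rowCount > maxExpectedRows) {
    tx.rollback();
    throw new Error(
      `Update touched ${rowCount} rows, expected at most ${maxExpectedRows}; rolled back`,
    );
  }
  tx.commit();
  return rowCount;
}
```

With a guard like this, re-designating an AR for one entity would be called with a small `maxExpectedRows` (roughly the number of users on that entity), so an update that suddenly matched every row in the table would be rolled back instead of committed.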

Impact on Services

Dependent services had no error handling or fallbacks for corrupted AR flags. The mainnet upgrade button, for example, disables itself whenever the permission is false or missing:

mainnet-access.ts
const MainnetButton = () => {
  const { data } = useQuery(USER_PERMISSIONS_QUERY);

  // Missing data (loading, error) and a false flag all take this branch.
  if (!data?.canUpgradeToMainnet) {
    return (
      <Tooltip content="Only the owner can upgrade this account to a production environment">
        <Button disabled>Upgrade to Mainnet</Button>
      </Tooltip>
    );
  }

  return <Button>Upgrade to Mainnet</Button>;
};
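The component above cannot tell "permission denied" apart from "permission unknown". One way to make that distinction explicit is sketched below. The `PermissionsResult` shape and the `resolveMainnetAccess` helper are assumptions for illustration, loosely modelled on the result objects that query hooks commonly return; they are not taken from the incident codebase.

```typescript
// Assumed shape of a query result; not taken from the incident codebase.
interface PermissionsResult {
  loading: boolean;
  error?: unknown;
  data?: { canUpgradeToMainnet?: boolean | null } | null;
}

type MainnetAccess = "loading" | "unavailable" | "denied" | "allowed";

// Maps a query result to an explicit state so that "we could not determine
// permissions" is never rendered as "you are not the owner".
function resolveMainnetAccess(result: PermissionsResult): MainnetAccess {
  if (result.loading) return "loading";
  const flag = result.data?.canUpgradeToMainnet;
  // Error, missing payload, or a null/undefined flag: permission state is unknown.
  if (result.error !== undefined || typeof flag !== "boolean") return "unavailable";
  return flag ? "allowed" : "denied";
}
```

The UI could then show a retry or support message for `"unavailable"` and reserve the "Only the owner can upgrade" tooltip for `"denied"`. Note the limit of this approach: a flag that was wrongly set to `false` in the database still resolves to `"denied"`, so a client-side fallback complements data-layer safeguards and does not replace them.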

Why Wasn’t This Caught?

Several factors contributed to this issue making it to production:

  1. Untracked Change: The PR was created outside the normal process, bypassing initial review gates.
  2. Missing Safety Checks: No automated validation of SQL changes that affect all records.
  3. Limited Error Handling: Services assumed the AR flags would always be valid.
  4. No Monitoring: No alerts for unusual permission changes or AR flag modifications.
  5. Testing Gaps: Test data didn’t represent production scale.
  6. Review Focus: Code reviews focused on the UI change, not the underlying data modification.
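The "Missing Safety Checks" item could be addressed, at least partly, by an automated check on changed SQL files. The sketch below is deliberately naive: `findUnscopedWrites` is a hypothetical helper that assumes statements are separated by `;` and that string literals contain neither `;`, `--`, nor the word `where`. A production check would use a real SQL parser.

```typescript
// Naive static check: returns UPDATE/DELETE statements that have no WHERE clause.
// Assumption: statements are separated by ";" and string literals contain no
// ";", "--", or the word "where". A real check should use a SQL parser.
function findUnscopedWrites(sqlText: string): string[] {
  const withoutComments = sqlText
    .replace(/\/\*[\s\S]*?\*\//g, " ") // strip block comments
    .replace(/--.*$/gm, " "); // strip line comments
  return withoutComments
    .split(";")
    .map((statement) => statement.replace(/\s+/g, " ").trim())
    .filter((statement) => /^(update|delete)\b/i.test(statement))
    .filter((statement) => !/\bwhere\b/i.test(statement));
}
```

Run in CI against the SQL files touched by a PR, a non-empty result could fail the build or require an explicit override from a second reviewer.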

Key Lessons

1. Access Control Fundamentals

  • Validate data integrity constraints
  • Add comprehensive monitoring
  • Consider cross-service impact
  • Document service dependencies
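For the monitoring point, one possible signal is the number of distinct entities whose AR flags change within a short window. The sketch below assumes a hypothetical `ArFlagChange` audit event and leaves the threshold to the caller; neither comes from the incident write-up.

```typescript
// Hypothetical audit event emitted whenever an AR flag changes.
interface ArFlagChange {
  entityId: string;
  changedAtMs: number; // epoch milliseconds
}

// Returns true when the number of distinct entities whose AR flags changed
// inside the window [nowMs - windowMs, nowMs] exceeds maxEntities.
function isArChangeSpike(
  events: ArFlagChange[],
  nowMs: number,
  windowMs: number,
  maxEntities: number,
): boolean {
  const touched: Record<string, true> = {};
  events.forEach((event) => {
    const age = nowMs - event.changedAtMs;
    if (age >= 0 && age <= windowMs) touched[event.entityId] = true;
  });
  return Object.keys(touched).length > maxEntities;
}
```

Counting entities instead of rows keeps a legitimate re-designation (several rows, one entity) from triggering the alert, while an update that spans many entities at once would.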

2. Testing Strategy

  • Test all permission scenarios
  • Verify data integrity
  • Include cross-service tests
  • Monitor data consistency

3. Review Best Practices

  • Require multiple reviewers
  • Use automated code analysis
  • Consider data safety
  • Plan for service disruptions

Tips for Reviewers

1. Ask Data-Focused Questions

  • Is data integrity maintained?
  • How are constraints enforced?
  • What services are affected?
  • Example: “How do we ensure exactly one AR per entity?”
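The example question can be turned into a check that runs against a snapshot of the data. This sketch assumes the invariant really is "exactly one AR per entity" and uses a hypothetical `UserRecord` shape based on the columns named in `update_ar.sql`.

```typescript
// Assumed record shape, based on the columns named in update_ar.sql.
interface UserRecord {
  entityId: string;
  email: string;
  isAuthorizedRepresentative: boolean;
}

// Returns the IDs of entities that do not have exactly one AR, sorted.
function findArViolations(records: UserRecord[]): string[] {
  const arCounts: Record<string, number> = {};
  records.forEach((record) => {
    const current = arCounts[record.entityId] ?? 0;
    arCounts[record.entityId] = current + (record.isAuthorizedRepresentative ? 1 : 0);
  });
  return Object.keys(arCounts)
    .filter((entityId) => arCounts[entityId] !== 1)
    .sort();
}
```

An entity with zero ARs (the state this incident produced) and an entity with two ARs are both reported, so the same check covers over-clearing and under-clearing.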

2. Verify Testing Approach

  • Are all scenarios tested?
  • Is data validation verified?
  • Are there integration tests?
  • Example: “Can we test AR flag consistency?”
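One way to answer the consistency question is a test that compares snapshots before and after the update and asserts that only the target entity changed. The sketch below uses an in-memory stand-in for the query; `clearArFlag`, `changedEntityIds`, and the sample data are all hypothetical.

```typescript
interface UserRecord {
  entityId: string;
  email: string;
  isAuthorizedRepresentative: boolean;
}

// In-memory stand-in for the UPDATE: clears the AR flag on rows matching `where`.
function clearArFlag(
  records: UserRecord[],
  where: (record: UserRecord) => boolean,
): UserRecord[] {
  return records.map((record) =>
    where(record) ? { ...record, isAuthorizedRepresentative: false } : record,
  );
}

// Lists entities whose AR flags differ between two snapshots of the same rows.
function changedEntityIds(before: UserRecord[], after: UserRecord[]): string[] {
  const changed: Record<string, true> = {};
  before.forEach((record, index) => {
    const updated = after[index];
    if (
      updated !== undefined &&
      updated.isAuthorizedRepresentative !== record.isAuthorizedRepresentative
    ) {
      changed[record.entityId] = true;
    }
  });
  return Object.keys(changed).sort();
}

const snapshotBefore: UserRecord[] = [
  { entityId: "e1", email: "old@e1.test", isAuthorizedRepresentative: true },
  { entityId: "e1", email: "new@e1.test", isAuthorizedRepresentative: true },
  { entityId: "e2", email: "ar@e2.test", isAuthorizedRepresentative: true },
];

// Scoped like the original query: only entity e1, excluding the new AR.
const scopedResult = clearArFlag(
  snapshotBefore,
  (r) => r.entityId === "e1" && r.email !== "new@e1.test",
);
// Unscoped, as if the WHERE clause were missing: every row matches.
const unscopedResult = clearArFlag(snapshotBefore, () => true);
```

Here `changedEntityIds(snapshotBefore, scopedResult)` contains only `"e1"`, while `changedEntityIds(snapshotBefore, unscopedResult)` also contains `"e2"`. A test asserting that the result equals `["e1"]` would fail for the unscoped version. The fixture needs at least two entities; with a single entity the two versions differ only in whether the new AR keeps its flag, which this cross-entity assertion would not notice.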

3. Document Requirements

  • List service dependencies
  • Note data constraints
  • Document recovery procedures
  • Example: “Document AR flag requirements”

Common Pitfalls to Avoid

1. Focusing Only on Happy Path

  • ❌ “The access check works for valid cases.”
  • ✅ “The access check maintains data integrity and handles errors.”

2. Insufficient Validation

  • ❌ “The flag is updated correctly.”
  • ✅ “The flag update maintains the one-AR-per-entity constraint.”

3. Missing Cross-Service Impact

  • ❌ “The change works in our service.”
  • ✅ “The change is verified across all dependent services.”

Remember: A good access control review considers data integrity, cross-service impact, and proper validation. Understanding the full scope of a permission change and putting safeguards in place help prevent widespread issues.