Skip to Content
LearnCode Review5 WhysComplex SQL Queries

Complex SQL Queries

This learning is based on a real production incident that affected customers.

Let’s examine how to ensure design and code reviews properly consider scalability and downstream product impact, using a real-world example where an un-optimized SQL query and an underscaled database caused hundreds of failed wallet transactions in the Paymaster Gas Station (PGS) service.

The Scenario

A single customer generated ~14,000 transactions within minutes. Each transaction triggered a complex 24-hour policy-check query. The writer Aurora Postgres instance (db.t4g.medium, 2 vCPU/4 GiB) could not sustain the read-heavy sequential scans, causing CPU saturation, elevated latencies, and 942 transaction failures. Tech-Ops mitigated by manually scaling the instance to db.r6i.xlarge (4 vCPU/32 GiB), after which service health recovered.

-- Simplified version of the expensive policy-check query SELECT sender, SUM(network_fee) AS daily_fee FROM user_operations_transactions WHERE paymaster_id = $1 AND created_at >= NOW() - INTERVAL '24 hours' GROUP BY sender HAVING SUM(network_fee) > $2;

The incident revealed that the design team had noted scalability risk and even spiked a Redis-based replacement, but the work was de-prioritized in favour of new features. Periodic load testing covered average traffic but not worst-case daily-growth scenarios, and query/DB scaling recommendations were never executed.

PR Comment

Choose the comment that you think is the most constructive and helpful.

Click here to learn more

Key Lessons

1. Understanding Product Impact

  • Consider how changes affect the entire system
  • Map out dependencies and downstream effects
  • Verify changes don’t break existing functionality
  • Document product impact in PR descriptions

2. Testing Strategy

  • Don’t just update unit tests without understanding why
  • Include end-to-end testing for critical flows
  • Test integration points between systems
  • Verify data transformations maintain required formats

3. Code Review Best Practices

  • Reviewers should understand the product context
  • Ask questions about system-wide impact
  • Verify test coverage matches the change scope
  • Consider both technical and product implications

Tips for Reviewers

1. Ask Product-Focused Questions

  • How does this change affect the user experience?
  • What systems depend on this data?
  • Are there any downstream effects we should test?
  • Example: “How does this affect the payment confirmation flow?“

2. Verify Testing Strategy

  • Are the tests comprehensive enough?
  • Do they cover all affected systems?
  • Is there end-to-end testing for critical flows?
  • Example: “Let’s add integration tests for the email system”

3. Document Dependencies

  • List systems that might be affected
  • Note any data format requirements
  • Document integration points
  • Example: “Payment emails expect user data in format X”

Common Pitfalls to Avoid

1. Focusing Only on Code

  • ❌ “The code looks good, let’s merge.”
  • ✅ “The code looks good, but let’s verify the product impact.”

2. Assuming Tests Are Sufficient

  • ❌ “The tests pass, so it’s good.”
  • ✅ “The tests pass, but let’s verify the end-to-end flow.”

3. Missing Integration Points

  • ❌ “This is just a data change.”
  • ✅ “This data change affects multiple systems, let’s verify them all.”

Remember: A good code review considers both technical correctness and product impact. Understanding how changes affect the entire system helps prevent issues before they reach production!

Last updated on