Decoupling Security Friction: The Async Sidecar Pattern For Logins

The Bottom Line Up Front (BLUF): Security friction is a revenue-killer: in a high-growth environment, rigid security checks in the critical path of login cost conversions. To solve this, I built an asynchronous “Credential Sidecar” in Python that decouples risk assessment from the login flow. By shifting from a synchronous “Block/Allow” model to an event-driven “Allow/Audit/Nudge” framework, we identified unhealthy credentials for 19% of our user base without adding latency to the login path.

The Background

In a previous high-growth role, we weren’t under active attack, but the mere existence of a public-facing login portal meant we were a persistent target. Rather than waiting for the inevitable first wave of “Account Takeover” support tickets to force a reactive posture, I wanted to proactively secure the perimeter before threat actors even turned the door handle.

The Problem: We actively chose not to deploy traditional front-door defenses like aggressive IP rate-limiting or mandatory CAPTCHAs. These “solutions” introduce unacceptable friction into the legitimate user journey. A disruptive, global forced password reset was also a non-starter.
The Constraint: We lacked the budget allocation for heavyweight, enterprise-grade threat intelligence platforms (e.g., Okta ThreatInsight).

I needed an architecture that let users log in freely while silently verifying the integrity of their credentials in the background. Here is the Python sidecar I built to work within those constraints.

The Problem: The “Block or Breach” Dilemma

Traditional security tools force you to make a binary choice:

Block Aggressively: You stop the fraud, but you also block legitimate users (False Positives).
Allow Freely: You maximize growth, but you let in “Credential Stuffing” attacks.

I needed a third option: Allow the login, but verify the password.

The Build: The “Pragmatic” Sidecar (MVP)

With no budget for a full enterprise suite, I kept the first version deliberately small: a “Pragmatic MVP” built as a simple Python sidecar.

What I Actually Built (The MVP):

A lightweight container that runs a “Double-Blind” Credential Check asynchronously.

Hash & Check: We hash the password locally with SHA-1 and send only the first five hex characters of the digest to the HIBP Pwned Passwords range API (k-Anonymity). The API returns every known hash suffix for that prefix, and we compare against our own suffix locally, so neither the password nor its full hash is sent to HIBP. A match here does not mean the user’s account is compromised; it simply means their chosen password exists in a known public data breach. This acts as an additional health check on top of standard complexity requirements.
Context Check: We check whether the user’s email also appears in known breach data.
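
The hash-and-split step can be sketched as follows. This is a minimal illustration, not the production code: the function names split_sha1 and suffix_in_range_body are hypothetical, and it assumes the range response body is plain text with one "SUFFIX:COUNT" entry per line.

```python
import hashlib


def split_sha1(password: str) -> tuple[str, str]:
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest.

    The 5-character prefix is the only part sent to the range API;
    the 35-character suffix stays local for comparison.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def suffix_in_range_body(suffix: str, body: str) -> bool:
    """Check a range response body whose lines look like 'SUFFIX:COUNT'."""
    wanted = suffix.upper()
    for line in body.splitlines():
        candidate, _, _count = line.partition(":")
        if candidate.strip().upper() == wanted:
            return True
    return False
```

Comparing whole suffixes line by line, rather than searching the raw response text, keeps the match exact and independent of letter case.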

The Decision Logic:

Scenario A (Email + Password Both Breached): Theoretical Ideal. In a strict environment, you revoke the active session and force a reset.
Scenario B (Password Breached, Email Safe): The Architectural Blueprint. The user is reusing an unhealthy password. Action: Queue an asynchronous In-App Notification (“Time to rotate your password”) rather than forcing an immediate reset.

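import requests

# This runs asynchronously via a worker (e.g., Celery/SQS).
# Architectural separation: the login context computes the SHA-1 digest and hands the
# worker only its 5-character prefix and 35-character suffix, so the plaintext
# password never leaves the primary login context.
# check_local_cache, check_email_breach_db, and revoke_session_token are
# application-specific helpers that are not shown here.

HIBP_RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"

def check_credential_risk_async(email, password_hash_prefix, password_hash_suffix):
    # The range API returns uppercase hex, so normalize before comparing.
    prefix = password_hash_prefix.upper()
    suffix = password_hash_suffix.upper()

    # 1. Check Password Integrity (k-Anonymity)
    # We cache the top 100k common hashes locally to save API calls
    if check_local_cache(suffix):
        password_leaked = True
    else:
        # A timeout keeps a slow upstream from stalling the worker.
        response = requests.get(HIBP_RANGE_URL.format(prefix=prefix), timeout=5)
        response.raise_for_status()
        # Each response line is "SUFFIX:COUNT"; compare whole suffixes, not substrings.
        leaked_suffixes = {line.split(":", 1)[0] for line in response.text.splitlines()}
        password_leaked = suffix in leaked_suffixes

    # 2. Check Email Integrity
    email_leaked = check_email_breach_db(email)

    # 3. Decision Matrix (Post-Login Actions)
    if password_leaked and email_leaked:
        # ASYNC LOCK: User is already logged in, so we must REVOKE the session.
        revoke_session_token(email)
        return "CRITICAL: Session Revoked"
    elif password_leaked:
        # UX Pivot: Don't email them (it causes panic).
        # Disabled here; enabling it requires passing a user_id into this function.
        # send_in_app_notification(user_id, "Password Check: Weak")
        return "HIGH: In-App Nudge Queued"

    return "CLEAN"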

Bonus: The “Lean” Enhancements

You don’t need to rebuild Splunk to have good monitoring. Here are three things I added in an afternoon:

The PII Check: A simple function that checks if first_name or last_name is inside the password string. (You’d be shocked how many people use “Ali123”).
2FA Geography: If a user passes 2FA, I log the location. If the next 2FA attempt is from a different continent 5 minutes later, I alert the Admin (not the user).
Lean Alerting: Instead of a heavy SOC deployment, we routed critical risk scores directly to a dedicated alerting channel. Simple, high-signal, and zero-overhead.
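
The first two checks are small enough to sketch in a few lines. This is illustrative only: the function names are hypothetical, the three-character minimum name length is my assumption to avoid matching very short names by accident, and the continent labels are assumed to come from whatever GeoIP lookup is already in place.

```python
from datetime import datetime, timedelta


def password_contains_name(password: str, first_name: str, last_name: str,
                           min_len: int = 3) -> bool:
    """True if either name (case-insensitive) appears inside the password."""
    lowered = password.lower()
    for name in (first_name, last_name):
        name = name.strip().lower()
        # Skip empty or very short names (assumed threshold, see lead-in).
        if len(name) >= min_len and name in lowered:
            return True
    return False


def is_suspicious_2fa_hop(prev_continent: str, prev_time: datetime,
                          new_continent: str, new_time: datetime,
                          window: timedelta = timedelta(minutes=5)) -> bool:
    """True if two 2FA events come from different continents within `window`."""
    close_in_time = abs(new_time - prev_time) <= window
    return prev_continent != new_continent and close_in_time
```

The name check belongs in the login context alongside the hashing step, since it needs the plaintext password; the geography check only needs logged metadata and can run in the sidecar.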

Pros & Cons: The “Build vs Buy” Reality

Pros (The Pragmatic Win):

Cost: ~$50/mo vs $50k/yr for Enterprise ITDR.
Velocity: Zero latency impact on login (Async).
Control: I own the “Risk Score” logic, not a black-box vendor.

Cons (The Trade-off):

Maintenance: I have to maintain the Sidecar and the Breach DB connections.
Signal Gaps: Without advanced AI models, I miss subtle “slow” attacks that vendor tools catch.
Scale: Managing a “Breach DB” cache at scale is non-trivial.
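
One way to keep the cache manageable is to cache range responses per prefix with a time-to-live, instead of mirroring a breach corpus. To be clear, this is a different technique from the top-100k hash cache described earlier, and the RangeCache class and its parameters are hypothetical.

```python
import time
from typing import Callable


class RangeCache:
    """Cache range-API response bodies per 5-character prefix, with a TTL."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 86400.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # function that calls the range API
        self._ttl = ttl_seconds      # assumed freshness window
        self._clock = clock          # injectable for testing
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, prefix: str) -> str:
        now = self._clock()
        hit = self._entries.get(prefix)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # still fresh: no API call
        body = self._fetch(prefix)
        self._entries[prefix] = (now, body)
        return body
```

Because a prefix is five hex characters, the cache can never hold more than 16^5 (about one million) entries, which bounds its size even without eviction.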

The Outcome: The 19% Reality

Even with this “Pragmatic MVP,” we surfaced a metric that fundamentally changed our approach: When we back-tested our active accounts against the hash DB, we found a 19.4% match rate.

Nearly 1 in 5 users were using unhealthy passwords. A match didn’t mean a hacker was in their account, just that their lock was fundamentally weak. Blocking all of them, however, would have triggered a support nightmare and destroyed our user velocity, which supported my hypothesis that static rules are a liability. Instead of a binary block, I architected a “Nudge Protocol”:

Eliminate Front-Door Friction: Instead of degrading the user experience with CAPTCHAs or complex IP filtering, we focused our threat detection entirely on deterministic, background password auditing.
Business-Aligned Remediation: I collaborated directly with GTM and R&D to define the response. Knowing that forced resets would cause unacceptable churn, we architected an asynchronous, in-app nudge. The architectural goal was to drive adoption upon deployment without interrupting the login flow.
45-Day Time-to-Value: We went from initial scope to surfacing the 19% risk metric in just 45 days. While the final mitigation phase was paused pending the rollout of core notification infrastructure, the detection architecture successfully proved the thesis: You don’t need to block the user to audit the password.

The Lesson: You don’t need a multi-million dollar budget to have enterprise-grade security. You just need to audit the password and protect the user’s velocity.

Posted in Blog by Ali Kaba