Last Modified: February 4th, 2026

# ADR 013: Autocomplete Platform Selection

By
Brent Sakata

Title: Platform Selection for Autocomplete Functionality in IBP Orders

Status: Proposed


# Context

# What is the background to this decision?

IBP Orders requires autocomplete functionality across multiple use cases within the UI for various entities such as orders and users based on different attributes. Currently, IBP does not have a platform-level search solution to support these requirements.

Key Business Drivers:

  • User Experience: Best practices recommend p99 response times < 200ms for autocomplete. Nielsen group studies show that 0.1 seconds creates the feeling of instantaneous response—critical for direct manipulation UI patterns.
  • Feature Requirements: The solution must support:
    • Wildcard/contains searching (prefix searching alone is insufficient)
    • Multiple entity indexes (Orders and Recipients/Users)
    • Complex search queries with filtering, grouping, and sorting capabilities
    • High performance at scale

Technical Challenges:

  • Database-level solutions (PostgreSQL pg_trgm, MySQL n-gram) are CPU-intensive and do not scale adequately for the required performance targets and search requirements
  • Existing database infrastructure cannot reliably meet p99 < 50ms server response time targets needed to achieve the 200ms end-to-end goal

# Decision

# What decision have you made?

We will deploy self-hosted Typesense as the autocomplete platform for IBP Orders, running on AWS ECS with a High-Availability (HA) cluster configuration. (Pre prod will be a single task)

Infrastructure

graph TB  
    subgraph "Client Layer"  
        User[👤 End Users]  
        CF[☁️ Cloudflare CDN]  
        Angular[🅰️ Angular App]  
    end  
     
    subgraph "AWS Cloud - us-east-1"  
        subgraph "Edge/API Layer"  
            APIGW[🚪 API Gateway<br/>REST API]  
            LambdaAuth[λ IBP Authorizer<br/>Token Validation<br/>User Identity]  
        end  
         
        subgraph "Processing Layer"  
            LambdaProxy[λ Proxy Function<br/>Add Tenant Filters<br/>Query Transformation]  
        end  
         
        subgraph "VPC - [10.0.0.0/16](http://10.0.0.0/16)"  
            subgraph "Load Balancing"  
                ALB[⚖️ Application Load Balancer<br/>Internal<br/>Health Checks]  
            end  
             
            subgraph "Availability Zone 1a"  
                ECSTask1[🐳 ECS Fargate Task 1<br/>Typesense Node<br/>1 vCPU, 4 GB RAM<br/>Leader/Follower]  
                EFS1[📁 EFS Mount]  
            end  
             
            subgraph "Availability Zone 1b"  
                ECSTask2[🐳 ECS Fargate Task 2<br/>Typesense Node<br/>1 vCPU, 4 GB RAM<br/>Follower]  
                EFS2[📁 EFS Mount]  
            end  
             
            subgraph "Availability Zone 1c"  
                ECSTask3[🐳 ECS Fargate Task 3<br/>Typesense Node<br/>1 vCPU, 4 GB RAM<br/>Follower]  
                EFS3[📁 EFS Mount]  
            end  
             
            EFSStorage[(💾 EFS File System<br/>Shared Storage<br/>Multi-AZ)]  
        end  
         
        subgraph "Supporting Services"  
            CloudWatch[📊 CloudWatch<br/>Logs & Metrics]  
            Secrets[🔐 Secrets Manager<br/>API Keys]  
            VPCLink[🔗 VPC Link<br/>Private Integration]  
        end  
    end  
     
    subgraph "Raft Cluster Communication"  
        Raft[⚡ Raft Consensus Protocol<br/>Port 8107<br/>Leader Election & Replication]  
    end  
     
    %% User Flow  
    User -->|HTTPS| CF  
    CF -->|Cached Assets| Angular  
    Angular -->|API Requests<br/>+ Auth Token| APIGW  
     
    %% API Gateway Flow  
    APIGW -->|Authorize Request| LambdaAuth  
    LambdaAuth -->|Validated<br/>User Context| APIGW  
    APIGW -->|Forward Request<br/>+ User Info| LambdaProxy  
     
    %% Lambda Proxy Flow  
    LambdaProxy -->|Add tenant_id filter<br/>Transform query| VPCLink  
    VPCLink -->|Private Network| ALB  
     
    %% Load Balancer Flow  
    ALB -->|Round Robin<br/>Health Check| ECSTask1  
    ALB -->|Round Robin<br/>Health Check| ECSTask2  
    ALB -->|Round Robin<br/>Health Check| ECSTask3  
     
    %% EFS Storage  
    ECSTask1 -.->|Mount /data| EFS1  
    ECSTask2 -.->|Mount /data| EFS2  
    ECSTask3 -.->|Mount /data| EFS3  
    EFS1 -.->|Multi-AZ Replication| EFSStorage  
    EFS2 -.->|Multi-AZ Replication| EFSStorage  
    EFS3 -.->|Multi-AZ Replication| EFSStorage  
     
    %% Raft Communication  
    ECSTask1 <-->|Raft Peering<br/>8107| Raft  
    ECSTask2 <-->|Raft Peering<br/>8107| Raft  
    ECSTask3 <-->|Raft Peering<br/>8107| Raft  
     
    %% Supporting Services
    LambdaProxy -.->|Get Config| Secrets  
    ECSTask1 -.->|Send Logs/Metrics| CloudWatch  
    ECSTask2 -.->|Send Logs/Metrics| CloudWatch  
    ECSTask3 -.->|Send Logs/Metrics| CloudWatch  
    ALB -.->|Send Metrics| CloudWatch  
     
    %% Response Flow (dotted lines for clarity)  
    ECSTask1 -.->|Search Results| ALB  
    ECSTask2 -.->|Search Results| ALB  
    ECSTask3 -.->|Search Results| ALB  
    ALB -.->|Response| LambdaProxy  
    LambdaProxy -.->|Filtered Results| APIGW  
    APIGW -.->|JSON Response| Angular  
     
    %% Styling  
    classDef userLayer fill:#e1f5ff,stroke:#01579b,stroke-width:2px  
    classDef apiLayer fill:#fff3e0,stroke:#e65100,stroke-width:2px  
    classDef computeLayer fill:#f3e5f5,stroke:#4a148c,stroke-width:2px  
    classDef storageLayer fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px  
    classDef networkLayer fill:#fce4ec,stroke:#880e4f,stroke-width:2px  
    classDef supportLayer fill:#f5f5f5,stroke:#424242,stroke-width:2px  
     
    class User,CF,Angular userLayer
    class APIGW,LambdaAuth apiLayer
    class LambdaProxy,ECSTask1,ECSTask2,ECSTask3 computeLayer
    class EFSStorage,EFS1,EFS2,EFS3 storageLayer
    class ALB,VPCLink networkLayer
    class CloudWatch,Secrets,Raft supportLayer

Autocomplete Flow

sequenceDiagram  
    actor User  
    participant CF as Cloudflare<br/>(Angular App)  
    participant APIGW as API Gateway  
    participant Auth as Lambda Authorizer  
    participant Proxy as Lambda Proxy  
    participant ECS as ECS Fargate<br/>(Typesense)  
     
    User->>CF: Types in search box  
    CF->>CF: Debounce input (300ms)  
    CF->>APIGW: POST /search<br/>Authorization: Bearer {token}<br/>Body: {query: "prod"}  
     
    APIGW->>Auth: Validate token  
    Auth->>Auth: Verify JWT<br/>Extract user_id, tenant_id  
    Auth-->>APIGW: Auth context<br/>{user_id, tenant_id, role}  
     
    alt Authentication Failed  
        APIGW-->>CF: 401 Unauthorized  
        CF-->>User: Show error  
    else Authentication Successful  
        APIGW->>Proxy: Forward request<br/>+ auth context  
         
        Proxy->>Proxy: Build Typesense query<br/>filter_by: tenant_id=123  
         
        Proxy->>ECS: POST /collections/products/documents/search<br/>X-TYPESENSE-API-KEY: {admin_key}<br/>Body: {q: "prod", filter_by: "tenant_id:=123"}  
         
        ECS->>ECS: Search index with filter  
        ECS-->>Proxy: Search results<br/>[{id: 1, name: "Product A"}]  
         
        Proxy-->>APIGW: 200 OK<br/>Filtered results  
        APIGW-->>CF: Results  
        CF-->>User: Display suggestions  
    end

# Rationale

# Why did you choose this decision?

This decision prioritizes long-term cost efficiency, operational control, and alignment with existing AWS infrastructure while maintaining excellent performance.

1. Performance Excellence

  • Self-hosted Typesense delivers p99 latency of 20-50ms (same AZ) for wildcard searches—well below the 50ms server-side target
  • Supports all required query types: prefix, contains, fuzzy matching, and complex filtering
  • Performance scales linearly with infrastructure; same latency characteristics as cloud version
  • Cache hit rates of 70-85% on typical autocomplete patterns further improve user experience
  • HA cluster eliminates single points of failure for internal tool reliability

2. Exceptional Cost Optimization

Configuration Fargate ALB Storage NAT/VPC Other Total
Small (1 vCPU, 4 GB × 3) $128 $25 $12-15 $10-37 $8 $182-212
Medium (2 vCPU, 8 GB × 3) $255 $30 $24-30 $15-42 $17 $341-374
Large (4 vCPU, 16 GB × 3) $510 $35 $32-40 $20-45 $25 $610-650

Supports 3.5M records efficiently with predictable costs for foreseeable growth to 10M+ records

3. Infrastructure Alignment

  • Leverages existing AWS account, VPC, and IAM infrastructure
  • ECS deployment aligns with current DevOps tooling and CI/CD pipelines
  • Data remains in-house within controlled AWS environment
  • Direct integration with existing observability stack (CloudWatch, DataDog, etc.)

4. Operational Control

  • Full control over performance tuning, ranking algorithms, and index configuration
  • Can implement custom analyzers and filters specific to IBP Orders domain
  • Data remains within organizational control—no third-party dependency
  • Direct access to all operational metrics and logs for debugging
  • Ability to scale independently of SaaS provider tiers

5. Feature Set Alignment

  • Provides typo tolerance (2 typos), faceted search, complex filtering, and sorting—all required capabilities
  • Supports all IBP Order use cases: wildcard searching, multiple entity indexes, and complex grouping
  • Open-source foundation enables custom feature development if needed
  • No artificial limits on query throughput or data size

6. Future Flexibility and Scalability

  • Linear scaling path: add more powerful instances or multi-region HA without architectural changes
  • Open-source nature prevents vendor lock-in; can fork or migrate to OpenSearch if business needs diverge
  • Skills and infrastructure investments directly benefit AWS ecosystem knowledge
  • Option to implement advanced features (custom ranking, ML-based search, etc.) without SaaS limitations

7. Team Fit

  • Leverages existing AWS and DevOps expertise within organization
  • Infrastructure-as-Code (Terraform/CloudFormation) templates integrate with current practices
  • Straightforward API comparable to cloud alternatives reduces integration complexity
  • Excellent documentation and active open-source community support

# Implications

# What are the implications of this decision?

1. People/Training

  • Infrastructure expertise required: Team members (DevOps/SRE) need familiarity with ECS, auto-scaling, and infrastructure monitoring
  • Developers need 4-8 hours for Typesense API and integration best practices (same as cloud option)
  • Recommend 40-80 hours of architectural and implementation work upfront
  • Plan 1-2 hours/month ongoing maintenance per DevOps engineer

2. Process Adjustments

  • Infrastructure as Code: Develop Terraform/CloudFormation templates for cluster provisioning, backup, and disaster recovery
  • Data Pipeline: Establish ETL process to sync Order and User entities to Typesense (daily or event-driven)
  • Relevance Tuning: Initially configure relevance settings for different entity types; plan quarterly reviews based on user feedback
  • Monitoring & Alerting: Integrate with CloudWatch, DataDog, or similar; set alerts for cluster health, disk space, memory usage, and query latency p99 exceeding 100ms
  • Backup Strategy: Implement automated daily snapshots to S3; document recovery procedures
  • Security: Manage API key rotation, network policies, and RBAC within VPC

3. Tooling

  • AWS ECS: Container orchestration for Typesense deployment
  • Terraform/CloudFormation: Infrastructure as Code for reproducible deployments
  • Typesense JavaScript client library for backend/frontend integration
  • AWS Systems Manager: Secrets management for API keys and credentials
  • CloudWatch/DataDog: Monitoring, logging, and alerting
  • Data sync tooling: Develop ETL using Lambda, Glue, or managed message queues (SQS/SNS)
  • Optional: Add front-end autocomplete component library (e.g., instantsearch.js-compatible solutions)

4. Risks and Mitigation

Risk Severity Mitigation
Initial Setup Complexity Medium Allocate 40-80 hours for architecture, IaC development, and deployment automation; leverage AWS best practices and existing tooling
Operational Overhead Medium Assign responsibility to DevOps/SRE team; document runbooks for common tasks (scaling, backups, incident response); plan 1-2 hours/month baseline maintenance
Infrastructure Failures Medium-High Implement HA cluster across multiple AZs; automated failover via ECS service discovery; regular disaster recovery drills (quarterly)
Cluster Underprovisioning Medium Monitor QPS, latency, and resource utilization weekly; establish clear scaling triggers (e.g., p99 latency > 100ms or CPU > 75%); scale up proactively
Data Sync Latency Low-Medium Implement event-driven index updates for time-sensitive entities (Orders); schedule batch sync for less critical data (Users); monitor index staleness; target < 5 min for Orders
Search Volume Growth Exceeds Capacity Low Scaling from r6g.large to r6g.xlarge adds ~$200/month but handles 3× throughput; plan cost impact in quarterly budgeting
Operational Knowledge Silos Low Pair programming on infrastructure setup; documentation in wiki; cross-train at least 2 team members on cluster management

# Trade-Offs

# What are the pros and cons of this decision?

Benefits:

  • Superior Long-Term Cost: $200-400/month regardless of search volume; breaks even vs. Typesense Cloud after ~10 months; dramatically cheaper than Algolia ($280-830/month) at scale
  • Excellent Performance: 20-50ms p99 latency consistently beats requirements; scales linearly with infrastructure investment
  • Full Operational Control: Custom tuning, data sovereignty, and zero vendor lock-in; infrastructure remains within organizational control
  • Scalability: Linear scaling path without tier constraints; can handle 10M+ records and 1000+ QPS with vertical scaling
  • AWS Integration: Native AWS services (ECS, CloudWatch, IAM, VPC) simplify operations; leverages existing DevOps expertise
  • Future Flexibility: Open-source foundation enables forking, custom development, or seamless migration to OpenSearch if needed
  • No Vendor Lock-In: Skills and investments transfer directly to broader DevOps/AWS ecosystem
  • Modern Features: Built-in typo tolerance, faceted search, and complex query support with option for custom enhancements

Drawbacks:

  • Higher Initial Setup Time: 40-80 hours upfront for architecture, IaC, and cluster provisioning (vs. 3-6 hours for cloud)
  • Operational Overhead: Requires 1-2 hours/month maintenance (monitoring, patching, scaling decisions); lower than OpenSearch but higher than cloud solutions
  • Infrastructure Complexity: Must manage HA across AZs, failover, backup/recovery, security patches, and incident response
  • Operational Risk: Infrastructure failures become team responsibility; requires runbook documentation and incident response training
  • DevOps Expertise Required: Team must have AWS, ECS, and observability stack competency; knowledge silos create risk
  • Scaling Complexity: Requires proactive monitoring and planned scaling; unexpected traffic spikes may impact latency until scaled (vs. automatic scaling in cloud)
  • No Built-in Analytics: Must develop custom tracking for search patterns, zero-result queries, and user behavior
  • Single Region (Initial): Cross-region latency 60-120ms; multi-region HA requires significant additional infrastructure

# Key Evaluation Metrics

# How will success be measured?

Define clear criteria to determine if this decision solves the intended problems:

Metric Target How Measured Review Cadence
p99 Query Latency < 50ms CloudWatch metrics / custom dashboards Weekly
End-to-End Response Time < 200ms Client-side instrumentation (RUM) Weekly
Index Freshness < 5 minutes for Orders, < 1 hour for Users Sync pipeline monitoring Daily
Cluster Availability > 99.5% ECS service health / custom monitoring Weekly
Infrastructure Health CPU 40-70%, Memory 50-75%, Disk > 20% free CloudWatch alarms Real-time
Search Success Rate < 5% zero-result queries (tunable by entity type) Custom query analytics Bi-weekly
Deployment Time Infrastructure setup + integration < 12 weeks Project tracking Completion metrics
Monthly Search Volume Baseline within 3 months Custom instrumentation Monthly
Cost vs. Budget Actual cost within ±10% of $200-400/month forecast AWS billing integration Monthly
Mean Time to Recovery (MTTR) < 30 minutes for common failures Incident tracking Quarterly review
DevOps Team Satisfaction Operational burden reasonable (1-2 hrs/month) Team feedback Quarterly retrospectives

Scaling Decision Gate (Quarterly Review):

  • If p99 latency > 100ms or CPU > 80%: Upgrade to next tier (e.g., t3.medium → r6g.large)
  • If search volume growth > 50% YoY: Plan vertical scaling; evaluate multi-region HA if global expansion needed
  • If infrastructure costs exceed budget by > 15%: Review query patterns and optimize indexing strategy

# Cost Analysis - Self-Hosted Typesense

# Infrastructure Costs

Configuration Fargate ALB Storage NAT/VPC Other Total
Small (1 vCPU, 4 GB × 3) $128 $25 $12-15 $10-37 $8 $182-212
Medium (2 vCPU, 8 GB × 3) $255 $30 $24-30 $15-42 $17 $341-374
Large (4 vCPU, 16 GB × 3) $510 $35 $32-40 $20-45 $25 $610-650

# Scaling Cost Impact

# Comprehensive Cost & Capacity Comparison

# Small Cluster

Solution Monthly Cost Dataset Queries/Day Documents Concurrent Users
Typesense Cloud (Prod-1) $91 10-30 GB 5,000-20,000 < 10M 50-200
ECS Fargate (1vCPU, 4GB × 3) $182 10-30 GB 5,000-20,000 < 10M 50-200
ECS Fargate + Spot $120 10-30 GB 5,000-20,000 < 10M 50-200
EC2 (3 × t3.medium) $154 10-30 GB 5,000-20,000 < 10M 50-200
EC2 (3 × r6g.large) $225 30-50 GB 10,000-30,000 10M-20M 100-300
OpenSearch Small $900 50-200 GB 20,000-100,000 20M-100M 200-1,000

# Medium Cluster

Solution Monthly Cost Dataset Queries/Day Documents Concurrent Users
Typesense Cloud (Prod-2) $182 30-80 GB 20,000-100,000 10M-40M 200-800
ECS Fargate (2vCPU, 8GB × 3) $341 30-80 GB 20,000-100,000 10M-40M 200-800
ECS Fargate + Spot $230 30-80 GB 20,000-100,000 10M-40M 200-800
EC2 (3 × r6g.large) $313 30-80 GB 20,000-100,000 10M-40M 200-800
EC2 (3 × r6g.xlarge) $435 50-120 GB 50,000-200,000 20M-80M 500-1,500
OpenSearch Medium $1,725 500GB-1.5TB 100,000-500,000 100M-500M 1,000-5,000

# Large Cluster

Solution Monthly Cost Dataset Queries/Day Documents Concurrent Users
Typesense Cloud (Prod-4) $365 80-150 GB 100,000-300,000 40M-100M 800-2,000
ECS Fargate (4vCPU, 16GB × 3) $610 80-150 GB 100,000-300,000 40M-100M 800-2,000
ECS Fargate + Spot $410 80-150 GB 100,000-300,000 40M-100M 800-2,000
EC2 (3 × r6g.xlarge) $540 80-150 GB 100,000-300,000 40M-100M 800-2,000
EC2 (3 × r6g.2xlarge) $870 100-200 GB 200,000-500,000 80M-200M 1,500-3,000
OpenSearch Large $3,400+ 2TB-5TB 500,000-2M 500M-2B 5,000-20,000

# Conclusion

# What is the final recommendation?

Deploy self-hosted Typesense on AWS ECS as IBP Orders' autocomplete platform.

This decision prioritizes long-term value creation and operational control while maintaining excellent performance:

  1. Superior Economics: $200-400/month infrastructure cost with minimal maintenance (fargate instance is managed by AWS)
  2. Operational Control: Full transparency and customization; data remains within organizational control
  3. Technical Soundness: Exceeds all performance and feature requirements (< 50ms p99 latency); scales linearly to 10M+ records
  4. AWS Alignment: Leverages existing infrastructure, expertise, and tooling; no vendor lock-in
  5. Opensearch Evaluation: Evaluate moving to Opensearch once IBP Search is implemented

Why benefits outweigh challenges:

  • $200-300/month fixed cost is dramatically cheaper than Algolia ($280-830/month) and AWS Opensearch and provides better long-term value than Typesense Cloud
  • Performance targets (< 50ms p99) are exceeded; HA configuration ensures reliability for internal tool
  • DevOps overhead (1-2 hours/month) is reasonable given cost savings and organizational AWS expertise
  • Open-source foundation and AWS integration enable future optimization without vendor constraints

# Success Criteria

  • Infrastructure deployed and tested across 2 AZs with automated failover
  • Launch Orders autocomplete with p99 < 50ms in production
  • Achieve < 5% zero-result search queries after relevance tuning
  • Maintain > 99.5% uptime (HA validation during testing)
  • Infrastructure cost tracking within 10% of $200-400/month forecast
  • Operational team reports manageable 1-2 hours/month maintenance burden
  • Feature-complete delivery within 12 weeks from decision

# References (Optional)