Last Modified: February 4th, 2026

# ADR 013: Autocomplete Platform Selection

Brent Sakata

Title: Platform Selection for Autocomplete Functionality in IBP Orders

Status: Proposed

# Context

# What is the background to this decision?

IBP Orders requires autocomplete across multiple UI use cases, covering entities such as orders and users and matching on different attributes. IBP does not currently have a platform-level search solution to support these requirements.

Key Business Drivers:

- User Experience: Best practices recommend p99 response times < 200ms for autocomplete. Nielsen Norman Group research indicates that a response within 0.1 seconds feels instantaneous, which is critical for direct-manipulation UI patterns.
- Feature Requirements: The solution must support:
  - Wildcard/contains searching (prefix searching alone is insufficient)
  - Multiple entity indexes (Orders and Recipients/Users)
  - Complex search queries with filtering, grouping, and sorting capabilities
  - High performance at scale

Technical Challenges:

- Database-level solutions (PostgreSQL pg_trgm, MySQL n-gram) are CPU-intensive and do not scale adequately for the required performance targets and search requirements.
- Existing database infrastructure cannot reliably meet the p99 < 50ms server response time needed to achieve the 200ms end-to-end goal.

# Decision

# What decision have you made?

We will deploy self-hosted Typesense as the autocomplete platform for IBP Orders, running on AWS ECS in a high-availability (HA) cluster configuration. Pre-production will run a single task.

Infrastructure

graph TB  
    subgraph "Client Layer"  
        User[👤 End Users]  
        CF[☁️ Cloudflare CDN]  
        Angular[🅰️ Angular App]  
    end  
     
    subgraph "AWS Cloud - us-east-1"  
        subgraph "Edge/API Layer"  
            APIGW[🚪 API Gateway<br/>REST API]  
            LambdaAuth[λ IBP Authorizer<br/>Token Validation<br/>User Identity]  
        end  
         
        subgraph "Processing Layer"  
            LambdaProxy[λ Proxy Function<br/>Add Tenant Filters<br/>Query Transformation]  
        end  
         
        subgraph "VPC - 10.0.0.0/16"  
            subgraph "Load Balancing"  
                ALB[⚖️ Application Load Balancer<br/>Internal<br/>Health Checks]  
            end  
             
            subgraph "Availability Zone 1a"  
                ECSTask1[🐳 ECS Fargate Task 1<br/>Typesense Node<br/>1 vCPU, 4 GB RAM<br/>Leader/Follower]  
                EFS1[📁 EFS Mount]  
            end  
             
            subgraph "Availability Zone 1b"  
                ECSTask2[🐳 ECS Fargate Task 2<br/>Typesense Node<br/>1 vCPU, 4 GB RAM<br/>Follower]  
                EFS2[📁 EFS Mount]  
            end  
             
            subgraph "Availability Zone 1c"  
                ECSTask3[🐳 ECS Fargate Task 3<br/>Typesense Node<br/>1 vCPU, 4 GB RAM<br/>Follower]  
                EFS3[📁 EFS Mount]  
            end  
             
            EFSStorage[(💾 EFS File System<br/>Shared Storage<br/>Multi-AZ)]  
        end  
         
        subgraph "Supporting Services"  
            CloudWatch[📊 CloudWatch<br/>Logs & Metrics]  
            Secrets[🔐 Secrets Manager<br/>API Keys]  
            VPCLink[🔗 VPC Link<br/>Private Integration]  
        end  
    end  
     
    subgraph "Raft Cluster Communication"  
        Raft[⚡ Raft Consensus Protocol<br/>Port 8107<br/>Leader Election & Replication]  
    end  
     
    %% User Flow  
    User -->|HTTPS| CF  
    CF -->|Cached Assets| Angular  
    Angular -->|API Requests<br/>+ Auth Token| APIGW  
     
    %% API Gateway Flow  
    APIGW -->|Authorize Request| LambdaAuth  
    LambdaAuth -->|Validated<br/>User Context| APIGW  
    APIGW -->|Forward Request<br/>+ User Info| LambdaProxy  
     
    %% Lambda Proxy Flow  
    LambdaProxy -->|Add tenant_id filter<br/>Transform query| VPCLink  
    VPCLink -->|Private Network| ALB  
     
    %% Load Balancer Flow  
    ALB -->|Round Robin<br/>Health Check| ECSTask1  
    ALB -->|Round Robin<br/>Health Check| ECSTask2  
    ALB -->|Round Robin<br/>Health Check| ECSTask3  
     
    %% EFS Storage  
    ECSTask1 -.->|Mount /data| EFS1  
    ECSTask2 -.->|Mount /data| EFS2  
    ECSTask3 -.->|Mount /data| EFS3  
    EFS1 -.->|Multi-AZ Replication| EFSStorage  
    EFS2 -.->|Multi-AZ Replication| EFSStorage  
    EFS3 -.->|Multi-AZ Replication| EFSStorage  
     
    %% Raft Communication  
    ECSTask1 <-->|Raft Peering<br/>8107| Raft  
    ECSTask2 <-->|Raft Peering<br/>8107| Raft  
    ECSTask3 <-->|Raft Peering<br/>8107| Raft  
     
    %% Supporting Services
    LambdaProxy -.->|Get Config| Secrets  
    ECSTask1 -.->|Send Logs/Metrics| CloudWatch  
    ECSTask2 -.->|Send Logs/Metrics| CloudWatch  
    ECSTask3 -.->|Send Logs/Metrics| CloudWatch  
    ALB -.->|Send Metrics| CloudWatch  
     
    %% Response Flow (dotted lines for clarity)  
    ECSTask1 -.->|Search Results| ALB  
    ECSTask2 -.->|Search Results| ALB  
    ECSTask3 -.->|Search Results| ALB  
    ALB -.->|Response| LambdaProxy  
    LambdaProxy -.->|Filtered Results| APIGW  
    APIGW -.->|JSON Response| Angular  
     
    %% Styling  
    classDef userLayer fill:#e1f5ff,stroke:#01579b,stroke-width:2px  
    classDef apiLayer fill:#fff3e0,stroke:#e65100,stroke-width:2px  
    classDef computeLayer fill:#f3e5f5,stroke:#4a148c,stroke-width:2px  
    classDef storageLayer fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px  
    classDef networkLayer fill:#fce4ec,stroke:#880e4f,stroke-width:2px  
    classDef supportLayer fill:#f5f5f5,stroke:#424242,stroke-width:2px  
     
    class User,CF,Angular userLayer
    class APIGW,LambdaAuth apiLayer
    class LambdaProxy,ECSTask1,ECSTask2,ECSTask3 computeLayer
    class EFSStorage,EFS1,EFS2,EFS3 storageLayer
    class ALB,VPCLink networkLayer
    class CloudWatch,Secrets,Raft supportLayer
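
The three-node Raft layout in the diagram can be illustrated with a minimal sketch. It assumes Typesense's comma-separated `ip:peering_port:api_port` peer-list format, the peering port 8107 shown above, and Typesense's default API port 8108 (not stated in this ADR). The IP addresses are hypothetical; on Fargate the real addresses would have to come from service discovery.

```python
PEERING_PORT = 8107  # Raft peering port shown in the diagram
API_PORT = 8108      # assumed: Typesense's default API port, not stated in this ADR


def build_nodes_string(node_ips: list[str]) -> str:
    """Return the comma-separated peer list in ip:peering_port:api_port form."""
    return ",".join(f"{ip}:{PEERING_PORT}:{API_PORT}" for ip in node_ips)


def quorum_size(node_count: int) -> int:
    """Raft needs a strict majority of nodes to elect a leader and commit writes."""
    return node_count // 2 + 1


def tolerated_failures(node_count: int) -> int:
    """Number of nodes that can be lost while a majority is still available."""
    return node_count - quorum_size(node_count)


# Hypothetical private IPs, one per Availability Zone inside 10.0.0.0/16.
ips = ["10.0.1.10", "10.0.2.10", "10.0.3.10"]
nodes = build_nodes_string(ips)
print(nodes)
print("quorum:", quorum_size(len(ips)), "tolerated failures:", tolerated_failures(len(ips)))
```

With three nodes the quorum is two, so the cluster can lose one node (or one AZ) and keep serving writes. A single-task pre-production setup has no such tolerance.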

Autocomplete Flow

sequenceDiagram  
    actor User  
    participant CF as Cloudflare<br/>(Angular App)  
    participant APIGW as API Gateway  
    participant Auth as Lambda Authorizer  
    participant Proxy as Lambda Proxy  
    participant ECS as ECS Fargate<br/>(Typesense)  
     
    User->>CF: Types in search box  
    CF->>CF: Debounce input (300ms)  
    CF->>APIGW: POST /search<br/>Authorization: Bearer {token}<br/>Body: {query: "prod"}  
     
    APIGW->>Auth: Validate token  
    Auth->>Auth: Verify JWT<br/>Extract user_id, tenant_id  
    Auth-->>APIGW: Auth context<br/>{user_id, tenant_id, role}  
     
    alt Authentication Failed  
        APIGW-->>CF: 401 Unauthorized  
        CF-->>User: Show error  
    else Authentication Successful  
        APIGW->>Proxy: Forward request<br/>+ auth context  
         
        Proxy->>Proxy: Build Typesense query<br/>filter_by: tenant_id:=123  
         
        Proxy->>ECS: GET /collections/products/documents/search<br/>X-TYPESENSE-API-KEY: {admin_key}<br/>Params: q=prod, filter_by=tenant_id:=123  
         
        ECS->>ECS: Search index with filter  
        ECS-->>Proxy: Search results<br/>[{id: 1, name: "Product A"}]  
         
        Proxy-->>APIGW: 200 OK<br/>Filtered results  
        APIGW-->>CF: Results  
        CF-->>User: Display suggestions  
    end
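
The proxy step in the flow above, where the tenant filter is added server-side, might look roughly like the following sketch. It assumes an integer `tenant_id` taken from the authorizer context (as in the `tenant_id:=123` example) and a hypothetical searchable field `name`. The function name and `per_page` default are illustrative, not part of this ADR.

```python
def build_search_params(query: str, tenant_id: int, per_page: int = 10) -> dict:
    """Build Typesense search parameters with a mandatory tenant filter.

    tenant_id must come from the authorizer context, never from the request body,
    so a caller cannot search another tenant's documents.
    """
    # Reject anything that is not a plain integer (bool is a subclass of int).
    if not isinstance(tenant_id, int) or isinstance(tenant_id, bool):
        raise ValueError("tenant_id must be an integer taken from the auth context")
    return {
        "q": query,
        "query_by": "name",  # hypothetical searchable field
        "filter_by": f"tenant_id:={tenant_id}",
        "per_page": per_page,
    }


params = build_search_params("prod", 123)
print(params["filter_by"])
```

Keeping the filter construction in one small function makes it straightforward to unit-test that every outgoing query is tenant-scoped.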

# Rationale

# Why did you choose this decision?

This decision prioritizes long-term cost efficiency, operational control, and alignment with existing AWS infrastructure while maintaining excellent performance.

1. Performance Excellence

- Self-hosted Typesense delivers p99 latency of 20-50ms (same AZ) for wildcard searches, within the 50ms server-side target
- Supports all required query types: prefix, contains, fuzzy matching, and complex filtering
- Performance scales linearly with infrastructure, with the same latency characteristics as the cloud version
- Cache hit rates of 70-85% on typical autocomplete patterns further improve user experience
- HA cluster eliminates single points of failure, supporting reliability for an internal tool

2. Exceptional Cost Optimization

| Configuration | Fargate | ALB | Storage | NAT/VPC | Other | Total (per month) |
|---|---|---|---|---|---|---|
| Small (1 vCPU, 4 GB × 3) | $128 | $25 | $12-15 | $10-37 | $8 | $182-212 |
| Medium (2 vCPU, 8 GB × 3) | $255 | $30 | $24-30 | $15-42 | $17 | $341-374 |
| Large (4 vCPU, 16 GB × 3) | $510 | $35 | $32-40 | $20-45 | $25 | $610-650 |

Supports 3.5M records efficiently with predictable costs for foreseeable growth to 10M+ records

3. Infrastructure Alignment

- Leverages existing AWS account, VPC, and IAM infrastructure
- ECS deployment aligns with current DevOps tooling and CI/CD pipelines
- Data remains in-house within controlled AWS environment
- Direct integration with existing observability stack (CloudWatch, Datadog, etc.)

4. Operational Control

- Full control over performance tuning, ranking algorithms, and index configuration
- Can implement custom analyzers and filters specific to the IBP Orders domain
- Data remains within organizational control, with no third-party dependency
- Direct access to all operational metrics and logs for debugging
- Ability to scale independently of SaaS provider tiers

5. Feature Set Alignment

- Provides typo tolerance (up to 2 typos), faceted search, complex filtering, and sorting, all of which are required capabilities
- Supports all IBP Orders use cases: wildcard searching, multiple entity indexes, and complex grouping
- Open-source foundation enables custom feature development if needed
- No artificial limits on query throughput or data size

6. Future Flexibility and Scalability

- Linear scaling path: add more powerful instances or multi-region HA without architectural changes
- Open-source nature prevents vendor lock-in; can fork or migrate to OpenSearch if business needs diverge
- Skills and infrastructure investments directly benefit AWS ecosystem knowledge
- Option to implement advanced features (custom ranking, ML-based search, etc.) without SaaS limitations

7. Team Fit

- Leverages existing AWS and DevOps expertise within the organization
- Infrastructure-as-Code (Terraform/CloudFormation) templates integrate with current practices
- Straightforward API comparable to cloud alternatives reduces integration complexity
- Excellent documentation and active open-source community support

# Implications

# What are the implications of this decision?

1. People/Training

- Infrastructure expertise required: Team members (DevOps/SRE) need familiarity with ECS, auto-scaling, and infrastructure monitoring
- Developers need 4-8 hours to learn the Typesense API and integration best practices (same as the cloud option)
- Budget 40-80 hours of architectural and implementation work upfront
- Plan 1-2 hours/month of ongoing maintenance per DevOps engineer

2. Process Adjustments

- Infrastructure as Code: Develop Terraform/CloudFormation templates for cluster provisioning, backup, and disaster recovery
- Data Pipeline: Establish an ETL process to sync Order and User entities to Typesense (daily or event-driven)
- Relevance Tuning: Initially configure relevance settings for different entity types; plan quarterly reviews based on user feedback
- Monitoring & Alerting: Integrate with CloudWatch, Datadog, or similar; set alerts for cluster health, disk space, memory usage, and p99 query latency exceeding 100ms
- Backup Strategy: Implement automated daily snapshots to S3; document recovery procedures
- Security: Manage API key rotation, network policies, and RBAC within the VPC
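
The Data Pipeline item can be sketched as follows, assuming Typesense's bulk import endpoint, which accepts JSON Lines and supports `action=upsert`. The collection name `orders`, the source row shape, and all field names are hypothetical. The sketch only builds the request body and leaves the HTTP call to whichever client the sync job uses.

```python
import json

# Assumed endpoint shape; "orders" is a hypothetical collection name.
IMPORT_PATH = "/collections/orders/documents/import?action=upsert"


def to_typesense_doc(order: dict) -> dict:
    """Map a source Order row to a flat search document (field names are hypothetical)."""
    return {
        "id": str(order["order_id"]),       # Typesense document ids are strings
        "tenant_id": order["tenant_id"],    # needed for the proxy's tenant filter
        "name": order["name"],
        "updated_at": order["updated_at"],  # epoch seconds, useful for staleness checks
    }


def build_import_body(orders: list[dict]) -> str:
    """Serialise documents as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(to_typesense_doc(o), sort_keys=True) for o in orders)


orders = [
    {"order_id": 1001, "tenant_id": 123, "name": "Order A", "updated_at": 1767225600},
    {"order_id": 1002, "tenant_id": 123, "name": "Order B", "updated_at": 1767229200},
]
body = build_import_body(orders)
print(body)
```

Using upsert keeps the sync idempotent, so an event-driven update for Orders and a periodic batch for Users can share the same code path.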

3. Tooling

- AWS ECS: Container orchestration for Typesense deployment
- Terraform/CloudFormation: Infrastructure as Code for reproducible deployments
- Typesense JavaScript client library: Backend/frontend integration
- AWS Secrets Manager: Secrets management for API keys and credentials
- CloudWatch/Datadog: Monitoring, logging, and alerting
- Data sync tooling: Develop ETL using Lambda, Glue, or managed message queues (SQS/SNS)
- Optional: Add a front-end autocomplete component library (e.g., instantsearch.js-compatible solutions)

4. Risks and Mitigation

| Risk | Severity | Mitigation |
|---|---|---|
| Initial Setup Complexity | Medium | Allocate 40-80 hours for architecture, IaC development, and deployment automation; leverage AWS best practices and existing tooling |
| Operational Overhead | Medium | Assign responsibility to DevOps/SRE team; document runbooks for common tasks (scaling, backups, incident response); plan 1-2 hours/month baseline maintenance |
| Infrastructure Failures | Medium-High | Implement HA cluster across multiple AZs; automated failover via ECS service discovery; regular disaster recovery drills (quarterly) |
| Cluster Underprovisioning | Medium | Monitor QPS, latency, and resource utilization weekly; establish clear scaling triggers (e.g., p99 latency > 100ms or CPU > 75%); scale up proactively |
| Data Sync Latency | Low-Medium | Implement event-driven index updates for time-sensitive entities (Orders); schedule batch sync for less critical data (Users); monitor index staleness; target < 5 min for Orders |
| Search Volume Growth Exceeds Capacity | Low | Scaling from r6g.large to r6g.xlarge adds ~$200/month but handles 3× throughput; plan cost impact in quarterly budgeting |
| Operational Knowledge Silos | Low | Pair programming on infrastructure setup; documentation in wiki; cross-train at least 2 team members on cluster management |

# Trade-Offs

# What are the pros and cons of this decision?

Benefits:

- ✅ Superior Long-Term Cost: $200-400/month regardless of search volume; breaks even vs. Typesense Cloud after ~10 months; dramatically cheaper than Algolia ($280-830/month) at scale
- ✅ Excellent Performance: 20-50ms p99 latency consistently meets requirements; scales linearly with infrastructure investment
- ✅ Full Operational Control: Custom tuning, data sovereignty, and zero vendor lock-in; infrastructure remains within organizational control
- ✅ Scalability: Linear scaling path without tier constraints; can handle 10M+ records and 1000+ QPS with vertical scaling
- ✅ AWS Integration: Native AWS services (ECS, CloudWatch, IAM, VPC) simplify operations; leverages existing DevOps expertise
- ✅ Future Flexibility: Open-source foundation enables forking, custom development, or migration to OpenSearch if needed
- ✅ No Vendor Lock-In: Skills and investments transfer directly to the broader DevOps/AWS ecosystem
- ✅ Modern Features: Built-in typo tolerance, faceted search, and complex query support with the option for custom enhancements

Drawbacks:

- ❌ Higher Initial Setup Time: 40-80 hours upfront for architecture, IaC, and cluster provisioning (vs. 3-6 hours for cloud)
- ❌ Operational Overhead: Requires 1-2 hours/month maintenance (monitoring, patching, scaling decisions); lower than OpenSearch but higher than cloud solutions
- ❌ Infrastructure Complexity: Must manage HA across AZs, failover, backup/recovery, security patches, and incident response
- ❌ Operational Risk: Infrastructure failures become team responsibility; requires runbook documentation and incident response training
- ❌ DevOps Expertise Required: Team must have AWS, ECS, and observability stack competency; knowledge silos create risk
- ❌ Scaling Complexity: Requires proactive monitoring and planned scaling; unexpected traffic spikes may impact latency until scaled (vs. automatic scaling in cloud)
- ❌ No Built-in Analytics: Must develop custom tracking for search patterns, zero-result queries, and user behavior
- ❌ Single Region (Initial): Cross-region latency 60-120ms; multi-region HA requires significant additional infrastructure

# Key Evaluation Metrics

# How will success be measured?

Define clear criteria to determine if this decision solves the intended problems:

| Metric | Target | How Measured | Review Cadence |
|---|---|---|---|
| p99 Query Latency | < 50ms | CloudWatch metrics / custom dashboards | Weekly |
| End-to-End Response Time | < 200ms | Client-side instrumentation (RUM) | Weekly |
| Index Freshness | < 5 minutes for Orders, < 1 hour for Users | Sync pipeline monitoring | Daily |
| Cluster Availability | > 99.5% | ECS service health / custom monitoring | Weekly |
| Infrastructure Health | CPU 40-70%, Memory 50-75%, Disk > 20% free | CloudWatch alarms | Real-time |
| Search Success Rate | < 5% zero-result queries (tunable by entity type) | Custom query analytics | Bi-weekly |
| Deployment Time | Infrastructure setup + integration < 12 weeks | Project tracking | Completion metrics |
| Monthly Search Volume | Baseline within 3 months | Custom instrumentation | Monthly |
| Cost vs. Budget | Actual cost within ±10% of $200-400/month forecast | AWS billing integration | Monthly |
| Mean Time to Recovery (MTTR) | < 30 minutes for common failures | Incident tracking | Quarterly review |
| DevOps Team Satisfaction | Operational burden reasonable (1-2 hrs/month) | Team feedback | Quarterly retrospectives |
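
Two of the metrics above, p99 query latency and the zero-result rate, could be computed from a query log roughly as in this sketch. It uses the nearest-rank percentile definition, which is one of several common conventions and may differ slightly from what CloudWatch reports. The sample data is made up for illustration.

```python
def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def zero_result_rate(result_counts: list[int]) -> float:
    """Fraction of queries that returned no hits."""
    if not result_counts:
        raise ValueError("no queries")
    return sum(1 for c in result_counts if c == 0) / len(result_counts)


# Illustrative data only: 100 latency samples (ms) and hit counts for 4 queries.
latencies_ms = [float(v) for v in range(1, 101)]
hits = [3, 0, 5, 0]

p99 = percentile(latencies_ms, 99)
rate = zero_result_rate(hits)
print("p99 within 50ms target:", p99 < 50)
print("zero-result rate within 5% target:", rate < 0.05)
```

Whichever percentile definition is chosen, it helps to use the same one for the 50ms target and the 100ms alert threshold so the two are comparable.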

Scaling Decision Gate (Quarterly Review):

- If p99 latency > 100ms or CPU > 80%: Upgrade to next tier (e.g., t3.medium → r6g.large)
- If search volume growth > 50% YoY: Plan vertical scaling; evaluate multi-region HA if global expansion needed
- If infrastructure costs exceed budget by > 15%: Review query patterns and optimize indexing strategy
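
The decision gate above can be expressed as a small rule function, sketched here under the assumption that the four inputs are already aggregated for the quarter. The `QuarterlySnapshot` type and its field names are hypothetical. The thresholds are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class QuarterlySnapshot:
    """Hypothetical container for the inputs to the quarterly review."""
    p99_latency_ms: float
    cpu_percent: float
    yoy_volume_growth_percent: float
    cost_over_budget_percent: float


def scaling_actions(s: QuarterlySnapshot) -> list[str]:
    """Return the actions triggered by the thresholds in the decision gate."""
    actions = []
    if s.p99_latency_ms > 100 or s.cpu_percent > 80:
        actions.append("upgrade to next tier")
    if s.yoy_volume_growth_percent > 50:
        actions.append("plan vertical scaling")
    if s.cost_over_budget_percent > 15:
        actions.append("review query patterns and indexing strategy")
    return actions


snapshot = QuarterlySnapshot(
    p99_latency_ms=120, cpu_percent=60,
    yoy_volume_growth_percent=10, cost_over_budget_percent=0,
)
print(scaling_actions(snapshot))
```

Encoding the gate as data-in, actions-out keeps the thresholds in one place, which makes them easy to adjust if the review cadence or targets change.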

# Cost Analysis - Self-Hosted Typesense

# Infrastructure Costs

| Configuration | Fargate | ALB | Storage | NAT/VPC | Other | Total (per month) |
|---|---|---|---|---|---|---|
| Small (1 vCPU, 4 GB × 3) | $128 | $25 | $12-15 | $10-37 | $8 | $182-212 |
| Medium (2 vCPU, 8 GB × 3) | $255 | $30 | $24-30 | $15-42 | $17 | $341-374 |
| Large (4 vCPU, 16 GB × 3) | $510 | $35 | $32-40 | $20-45 | $25 | $610-650 |
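
As a sanity check on the table above, the component columns can be summed per configuration. This is a minimal sketch that copies the low and high figures from the table; nothing else is assumed.

```python
# (low, high) monthly USD per component, copied from the table above.
COSTS = {
    "Small": {"Fargate": (128, 128), "ALB": (25, 25), "Storage": (12, 15),
              "NAT/VPC": (10, 37), "Other": (8, 8)},
    "Medium": {"Fargate": (255, 255), "ALB": (30, 30), "Storage": (24, 30),
               "NAT/VPC": (15, 42), "Other": (17, 17)},
    "Large": {"Fargate": (510, 510), "ALB": (35, 35), "Storage": (32, 40),
              "NAT/VPC": (20, 45), "Other": (25, 25)},
}


def total_range(components: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low ends and the high ends of every component range."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


totals = {name: total_range(parts) for name, parts in COSTS.items()}
for name, (low, high) in totals.items():
    print(f"{name}: ${low}-{high}/month")
```

The component sums come to $183-213 (Small), $341-374 (Medium), and $622-655 (Large). Medium matches the Total column exactly, Small is $1 higher, and Large is $5-12 higher, so the Total column is best read as approximate.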

# Scaling Cost Impact

# Comprehensive Cost & Capacity Comparison

# Small Cluster

| Solution | Monthly Cost | Dataset | Queries/Day | Documents | Concurrent Users |
|---|---|---|---|---|---|
| Typesense Cloud (Prod-1) | $91 | 10-30 GB | 5,000-20,000 | < 10M | 50-200 |
| ECS Fargate (1 vCPU, 4 GB × 3) | $182 | 10-30 GB | 5,000-20,000 | < 10M | 50-200 |
| ECS Fargate + Spot | $120 | 10-30 GB | 5,000-20,000 | < 10M | 50-200 |
| EC2 (3 × t3.medium) | $154 | 10-30 GB | 5,000-20,000 | < 10M | 50-200 |
| EC2 (3 × r6g.large) | $225 | 30-50 GB | 10,000-30,000 | 10M-20M | 100-300 |
| OpenSearch Small | $900 | 50-200 GB | 20,000-100,000 | 20M-100M | 200-1,000 |

# Medium Cluster

| Solution | Monthly Cost | Dataset | Queries/Day | Documents | Concurrent Users |
|---|---|---|---|---|---|
| Typesense Cloud (Prod-2) | $182 | 30-80 GB | 20,000-100,000 | 10M-40M | 200-800 |
| ECS Fargate (2 vCPU, 8 GB × 3) | $341 | 30-80 GB | 20,000-100,000 | 10M-40M | 200-800 |
| ECS Fargate + Spot | $230 | 30-80 GB | 20,000-100,000 | 10M-40M | 200-800 |
| EC2 (3 × r6g.large) | $313 | 30-80 GB | 20,000-100,000 | 10M-40M | 200-800 |
| EC2 (3 × r6g.xlarge) | $435 | 50-120 GB | 50,000-200,000 | 20M-80M | 500-1,500 |
| OpenSearch Medium | $1,725 | 500 GB-1.5 TB | 100,000-500,000 | 100M-500M | 1,000-5,000 |

# Large Cluster

| Solution | Monthly Cost | Dataset | Queries/Day | Documents | Concurrent Users |
|---|---|---|---|---|---|
| Typesense Cloud (Prod-4) | $365 | 80-150 GB | 100,000-300,000 | 40M-100M | 800-2,000 |
| ECS Fargate (4 vCPU, 16 GB × 3) | $610 | 80-150 GB | 100,000-300,000 | 40M-100M | 800-2,000 |
| ECS Fargate + Spot | $410 | 80-150 GB | 100,000-300,000 | 40M-100M | 800-2,000 |
| EC2 (3 × r6g.xlarge) | $540 | 80-150 GB | 100,000-300,000 | 40M-100M | 800-2,000 |
| EC2 (3 × r6g.2xlarge) | $870 | 100-200 GB | 200,000-500,000 | 80M-200M | 1,500-3,000 |
| OpenSearch Large | $3,400+ | 2-5 TB | 500,000-2M | 500M-2B | 5,000-20,000 |

# Conclusion

# What is the final recommendation?

Deploy self-hosted Typesense on AWS ECS as IBP Orders' autocomplete platform.

This decision prioritizes long-term value creation and operational control while maintaining excellent performance:

- Superior Economics: $200-400/month infrastructure cost with minimal maintenance (Fargate compute is managed by AWS)
- Operational Control: Full transparency and customization; data remains within organizational control
- Technical Soundness: Meets all performance and feature requirements (< 50ms p99 latency); scales linearly to 10M+ records
- AWS Alignment: Leverages existing infrastructure, expertise, and tooling; no vendor lock-in
- OpenSearch Evaluation: Evaluate moving to OpenSearch once IBP Search is implemented

Why benefits outweigh challenges:

- $200-400/month fixed cost is dramatically cheaper than Algolia ($280-830/month) and AWS OpenSearch, and provides better long-term value than Typesense Cloud
- Performance targets (< 50ms p99) are met; HA configuration ensures reliability for an internal tool
- DevOps overhead (1-2 hours/month) is reasonable given cost savings and organizational AWS expertise
- Open-source foundation and AWS integration enable future optimization without vendor constraints

# Success Criteria

- ✅ Infrastructure deployed and tested across 3 AZs with automated failover
- ✅ Launch Orders autocomplete with p99 < 50ms in production
- ✅ Achieve < 5% zero-result search queries after relevance tuning
- ✅ Maintain > 99.5% uptime (HA validation during testing)
- ✅ Infrastructure cost tracking within 10% of $200-400/month forecast
- ✅ Operational team reports manageable 1-2 hours/month maintenance burden
- ✅ Feature-complete delivery within 12 weeks from decision

# References (Optional)

- Performance Benchmarks: Tables 1-11 in solution_analysis.md provide detailed comparative analysis
- Typesense Documentation: https://typesense.org/docs/ and https://typesense.org/docs/guide/high-availability.html
- AWS EC2 Pricing: https://aws.amazon.com/ec2/pricing/on-demand/
- Nielsen Norman Group Response Time Study: https://www.nngroup.com/articles/website-response-times/
- Cost Comparison Models: See solution_analysis.md TCO calculations
- Operational Complexity Analysis: Detailed in the Operational Complexity section of solution_analysis.md
- AWS ECS Best Practices: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/
- Terraform AWS Provider: https://registry.terraform.io/providers/hashicorp/aws/