Optimization & Scaling
Overview
Pinchtab multi-instance architecture is production-ready. Phase 7 focuses on optimizing performance, establishing baseline metrics, and validating system behavior under high load.
Goals
- Establish performance baselines (latency, memory, throughput)
- Optimize critical paths (connection pooling, caching, batch operations)
- Validate scalability (100+ concurrent instances)
- Document resource requirements and limits
Section 1: Benchmarking & Metrics
Objectives
Capture baseline performance metrics across single and multi-instance scenarios.
Tasks
1.1 Request Latency Baseline
-
Measure endpoint latency (p50, p95, p99)
/navigate- first request (Chrome init) vs subsequent/snapshot- element retrieval time/actions(click, type, etc.) - per action cost/tabsaggregate - response time with N instances
-
Tools & Methods
- Use
ab(Apache Bench) for load testing - Use
vegeta(Go HTTP load testing) for sustained load - Capture detailed timing with custom harness
- Document environment: CPU, RAM, disk speed
- Use
-
Acceptance Criteria
- Single instance:
/navigate< 2s (first request) - Single instance:
/navigate< 200ms (subsequent) - 10 instances:
/tabsaggregate < 500ms - All operations p99 < 5s
- Single instance:
1.2 Memory per Instance
-
Measure baseline memory footprint
- Idle Chrome process (headless)
- Idle Chrome process (headed)
- Memory after 10 navigations
- Memory after 100 navigations
- Memory leak detection over 1 hour
-
Tools & Methods
- Use
pmapor/proc/[pid]/statuson Linux - Use
Instruments.apporActivity Monitoron macOS - Track RSS (resident set size) and VSZ (virtual size)
- Profile heap with
pprof
- Use
-
Acceptance Criteria
- Idle instance: < 150 MB RSS
- 10 instances: < 2 GB total RSS
- No memory growth > 5 MB/hour idle
- No memory growth > 20 MB/100 navigations
1.3 Concurrent Instance Limits
-
Identify breaking points
- How many instances can run concurrently? (CPU bound? File descriptor bound? Memory?)
- How many concurrent requests per instance?
- Orchestrator overhead as instance count grows
-
Tools & Methods
- Create instances in loop, monitor system metrics
- Run concurrent navigation requests across instances
- Use
ulimitand/proc/sys/fs/file-maxto check limits - Profile orchestrator CPU usage
-
Acceptance Criteria
- 100 instances should be sustainable with 16GB RAM
- Orchestrator should handle 1000 concurrent requests
- Port allocation should handle allocation/release in < 10ms
- No file descriptor leaks (connections cleaned up properly)
Section 2: Optimizations & Improvements
Objectives
Implement targeted performance improvements based on bottleneck analysis.
Tasks
2.1 Connection Pooling
-
HTTP connection reuse
- Orchestrator to instances: Keep-Alive connections
- Bridge handlers: Connection pools for concurrent requests
- Measure impact on latency (target: 10-20% improvement)
-
Implementation Strategy
- Update
proxyToInstance()to use persistent HTTP client - Configure http.Client with max idle connections, timeouts
- Consider connection limits per instance
- Update
-
Testing
- Baseline: 100 sequential requests to single endpoint
- After: 100 sequential requests (measure improvement)
- Concurrent: 50 concurrent requests, measure connection reuse
-
Acceptance Criteria
- Connection pool should reduce latency by 10-20%
- No connection leaks (verify with
ss -tna | grep ESTABLISHED) - Graceful connection cleanup on instance shutdown
2.2 Cache Improvements
-
Query result caching
- Cache
/tabsaggregate results (5-10s TTL) - Cache
/instanceslisting (2-5s TTL) - Cache profile data (1-5m TTL)
- Cache
-
Implementation Strategy
- Add simple in-memory cache with sync.RWMutex
- Invalidate cache on instance create/delete/update
- Log cache hits/misses for monitoring
-
Testing
- Measure reduction in aggregation time
- Verify cache invalidation on changes
- Test with 50+ instances
-
Acceptance Criteria
/tabsresponse time with cache < 50ms (vs 500ms without)- Cache eviction doesn’t break consistency
- Stale reads acceptable for 5-10s window
2.3 Batch Navigation Support
-
Bulk navigation endpoint
POST /instances/{id}/batch-navigate- navigate multiple URLsPOST /batch-navigate- navigate across instances
-
Implementation Strategy
- Accept array of URLs
- Return array of tab IDs
- Consider sequential vs parallel (with concurrency limits)
-
Testing
- Navigate 50 URLs on single instance
- Measure latency improvement vs sequential calls
- Verify all tabs created correctly
-
Acceptance Criteria
- 50-URL batch < 20s (vs 50 * 200ms = 10s sequential + overhead)
- All tabs successfully created
- Error handling for partial failures
Section 3: Scaling Tests & Validation
Objectives
Validate system behavior under realistic production load scenarios.
Tasks
3.1 Stress Test: 100+ Concurrent Instances
-
Test scenario
- Create 100 instances in rapid sequence
- Verify all reach “running” status
- Verify port allocation doesn’t fail
- Verify Chrome initialization completes for all
-
Monitoring
- Track system metrics: CPU, RAM, file descriptors, disk I/O
- Monitor orchestrator response times
- Log any errors or warnings
-
Implementation
- Build test script: Create 100 instances, pause, verify all running
- Run 3x and average results
- Document peak resource usage
-
Acceptance Criteria
- All 100 instances successfully created
- No port allocation conflicts
- CPU usage < 80% during test
- RAM usage < 12 GB
- All instances responsive after test
3.2 Multi-Agent Concurrent Load
-
Test scenario
- Simulate 5-10 agents, each using 2-3 instances
- Each agent navigates different URLs concurrently
- Run for 5 minutes
- Measure latency, errors, resource usage
-
Monitoring
- Track latency distribution (p50, p95, p99)
- Monitor for race conditions or conflicts
- Verify instance isolation (no data leakage)
-
Implementation
- Build multi-agent test harness
- Use goroutines to simulate agents
- Log actions and timing
-
Acceptance Criteria
- p99 latency < 5s for all operations
- 0 errors from isolation violations
- CPU usage stable (not growing over time)
- No connection leaks
3.3 Resource Limits Testing
-
Test orchestrator limits
- What happens at 1000 instances? (Should gracefully degrade or error)
- What happens with port range exhaustion? (Clear error, not crash)
- What happens with memory pressure? (Can’t create more instances)
-
Test instance limits
- How many tabs per instance? (Chrome limit, usually 500+)
- What happens over 500 tabs? (Error or performance degradation)
- How many concurrent requests per instance?
-
Implementation
- Build test for each limit
- Document expected behavior and error messages
- Verify graceful degradation
-
Acceptance Criteria
- All limit conditions handled gracefully (no crashes)
- Clear error messages to users
- Documentation of limits
- Possible auto-scaling recommendations
Success Metrics
| Metric | Current | Target | Priority |
|---|---|---|---|
/navigate latency (p99) | TBD | < 5s | High |
| Instance memory (idle) | TBD | < 150MB | High |
| 100 instances creation time | TBD | < 2 min | High |
| Cache hit ratio | N/A | > 80% | Medium |
| Connection reuse efficiency | N/A | > 90% | Medium |
| Error rate under load | TBD | < 0.1% | High |
Timeline Estimate
- Benchmarking (3.1): 2-3 hours
- Connection pooling (2.1): 1-2 hours
- Cache improvements (2.2): 1-2 hours
- Batch navigation (2.3): 1-2 hours
- Stress testing (3.1-3.3): 2-3 hours
- Documentation & analysis: 1-2 hours
Total: 9-14 hours
Deliverables
-
docs/performance-benchmarks.md- Baseline metrics and analysis -
docs/performance-tuning.md- Optimization recommendations -
scripts/benchmark.sh- Automated benchmarking harness -
scripts/stress-test.sh- 100+ instance stress test -
scripts/multi-agent-load.sh- Concurrent agent simulation - Updated
TESTING.mdwith performance test procedures - Performance dashboard or metrics export (optional)
Notes
- Focus on realistic production scenarios, not synthetic micro-benchmarks
- Profile actual bottlenecks before optimizing (avoid premature optimization)
- Document all findings, even if results are “good enough”
- Consider infrastructure costs (more instances = more resource usage)
- Plan for monitoring/observability from the start