AutoSolver Architecture
Overview
The AutoSolver system provides modular, semantic-first browser automation for Pinchtab. It evolves the existing internal/solver framework (PR #395) into a general-purpose automation agent capable of handling CAPTCHAs, login flows, signup flows, multi-step navigation, and onboarding sequences.
Design Principles
-
Isolation First — The autosolver module (
internal/autosolver/) has zero coupling to chromedp or the bridge runtime. All browser interactions go throughPageandActionExecutorinterfaces. -
Semantic First — The
pinchtab/semanticpackage is the primary intelligence layer. LLM is used only as a last-resort fallback. -
Pluggable Architecture — Solvers are registered at runtime via a
Registry. External solvers (Capsolver, 2Captcha) are optional plugins enabled via configuration. -
Behavior > Spoofing — Solvers interact with pages through legitimate browser actions (clicking, typing) rather than API hacks or fake tokens.
-
Extensible Beyond CAPTCHA — The
IntentTypesystem supports login, signup, onboarding, and navigation flows alongside CAPTCHA solving.
Architecture Diagram
┌─────────────────────────────────────────────────────┐│ AutoSolver ││ ││ ┌──────────┐ ┌──────────┐ ┌──────────────┐ ││ │ Registry │───▶│Core Loop │───▶│ Fallback │ ││ │ (solvers)│ │(detect + │ │ Chain: │ ││ └──────────┘ │ dispatch)│ │ built-in → │ ││ └────┬─────┘ │ semantic → │ ││ │ │ external → │ ││ ▼ │ LLM │ ││ ┌────────────────┐ └──────────────┘ ││ │ Interfaces │ ││ │ Page │ ││ │ ActionExecutor│ ││ │ SemanticEngine│ ││ │ LLMProvider │ ││ └────────┬───────┘ │└───────────────────────┼──────────────────────────────┘ │ (interface boundary) ┌───────────────┼───────────────┐ ▼ ▼ ▼┌──────────────┐ ┌─────────────┐ ┌──────────────┐│adapters/ │ │semantic/ │ │external/ ││pinchtab.go │ │adapter.go │ │capsolver.go ││(chromedp) │ │(semantic pkg)│ │twocaptcha.go │└──────────────┘ └─────────────┘ └──────────────┘Module Structure
internal/autosolver/├── interfaces.go # Page, ActionExecutor, Solver, SemanticEngine, LLMProvider├── types.go # Result, Intent, Config, enums├── autosolver.go # Core orchestrator with fallback chain├── heuristics.go # Title-based intent detection fallback├── registry.go # Instance-level solver registry with priority ordering├── autosolver_test.go # Core loop tests (7 test cases)├── registry_test.go # Registry tests (8 test cases)├── adapters/│ └── pinchtab.go # Bridge adapter (ONLY chromedp import)├── semantic/│ └── adapter.go # Wraps pinchtab/semantic ElementMatcher├── external/│ ├── capsolver.go # Capsolver API skeleton│ └── twocaptcha.go # 2Captcha API skeleton├── llm/│ ├── llm.go # LLM provider skeleton with structured prompts│ └── trim.go # HTML trimming for token efficiency└── solvers/ ├── cloudflare.go # Cloudflare Turnstile (new interface, no chromedp) └── legacy.go # Compatibility shim for existing solver.SolverCore Interfaces
Page
Read-only view of the current browser page:
type Page interface {
URL() string
Title() string
HTML() (string, error)
Screenshot() ([]byte, error)
}
ActionExecutor
Performs browser actions with human-like behavior:
type ActionExecutor interface {
Click(ctx context.Context, x, y float64) error
Type(ctx context.Context, text string) error
WaitFor(ctx context.Context, selector string, timeout time.Duration) error
Evaluate(ctx context.Context, expr string, result interface{}) error
Navigate(ctx context.Context, url string) error
}
Solver
Handles a specific class of challenge:
type Solver interface {
Name() string
Priority() int // Lower = tried first
CanHandle(ctx context.Context, page Page) (bool, error)
Solve(ctx context.Context, page Page, executor ActionExecutor) (*Result, error)
}
Priority ranges:
| Range | Category |
|---|---|
| 0–99 | Built-in solvers (Cloudflare, etc.) |
| 100–199 | Semantic-based solvers |
| 200–299 | External API solvers (Capsolver, 2Captcha) |
| 900+ | LLM fallback |
Fallback Chain
The core loop executes this chain per attempt:
1. Detect intent (semantic engine → title heuristics)2. If intent = normal → return solved3. Find matching solvers (CanHandle = true, sorted by priority)4. Try each solver: a. Built-in (cloudflare, priority 10) b. External (capsolver priority 200, twocaptcha priority 210)5. If all fail AND LLM enabled: a. Trim HTML to ~4KB b. Build structured prompt with attempt history c. Execute LLM-suggested action6. Retry with exponential backoff (500ms → 10s cap)7. Stop after MaxAttempts (default: 8)Configuration
Config File (config.json)
{
"autoSolver": {
"enabled": true,
"maxAttempts": 8,
"solvers": ["cloudflare", "semantic", "capsolver", "twocaptcha"],
"llmProvider": "openai",
"llmFallback": false,
"external": {
"capsolverKey": "CAP-xxx",
"twoCaptchaKey": "xxx"
}
}
}
External provider API keys are configured only in autoSolver.external in the
config file.
Extension Guide
Adding a New Solver
- Create a new file in
internal/autosolver/solvers/:
package solvers
type MySolver struct{}
func (s *MySolver) Name() string { return "myservice" }
func (s *MySolver) Priority() int { return 150 }
func (s *MySolver) CanHandle(ctx context.Context, page autosolver.Page) (bool, error) {
// Check if this solver can handle the current page
return strings.Contains(page.Title(), "my-challenge"), nil
}
func (s *MySolver) Solve(ctx context.Context, page autosolver.Page, executor autosolver.ActionExecutor) (*autosolver.Result, error) {
// Implement solving logic using Page + ActionExecutor
result := &autosolver.Result{SolverUsed: "myservice"}
// ...
return result, nil
}
- Register it with the AutoSolver:
as := autosolver.New(cfg, semanticEngine, nil)
as.Registry().Register(&solvers.MySolver{})
Using with Pinchtab Bridge
// Create Page + Executor from a bridge tab
page, executor, err := adapters.NewFromBridge(bridge, tabID)
// Run the autosolver
as := autosolver.New(autosolver.DefaultConfig(), semanticAdapter, nil)
as.Registry().MustRegister(&solvers.Cloudflare{})
result, err := as.Solve(ctx, page, executor)
if result.Solved {
log.Printf("Solved by %s in %d attempts", result.SolverUsed, result.Attempts)
}
Comparison with browser-use
| Aspect | browser-use | Pinchtab AutoSolver |
|---|---|---|
| Decision engine | LLM per step | Semantic first, LLM fallback |
| DOM handling | Full DOM/screenshot each step | Trimmed HTML, a11y tree |
| Cost | High (LLM every step) | Low (LLM only on failure) |
| Speed | Slow (LLM latency) | Fast (local semantic matching) |
| Determinism | Low (LLM non-deterministic) | High (rule-based + semantic) |
| Modularity | Monolithic | Interface-driven, pluggable |
Backward Compatibility
The existing internal/solver package (PR #395) is not modified. The CloudflareSolver in bridge/cloudflare.go continues to work as-is. A LegacyAdapter shim (solvers/legacy.go) wraps old solver.Solver implementations to work with the new autosolver.Solver interface.