AI-Powered Browser Swarm Automation
Complete User Guide & Tutorial
Control dozens of mobile-sized browsers with AI agents. Each browser has a unique fingerprint, persistent sessions, and full DOM-based automation — all from one desktop app.
Version 1.0 • April 2026
SwarmAI is a desktop application that manages multiple mobile-sized browser instances controlled by AI agents. Instead of writing scripts or macros, you describe what you want in natural language — the AI reads the page DOM, identifies elements, and performs clicks, typing, and scrolling automatically.
Each browser runs as a persistent Chromium context with a unique device fingerprint, making it appear as a real mobile device. Run dozens of browsers simultaneously from one desktop.
| Component | Requirement |
|---|---|
| Operating System | Windows 10/11 (64-bit) |
| RAM | 8 GB minimum, 16 GB recommended for 10+ browsers |
| CPU | 4+ cores recommended |
| Disk | 500 MB for app + ~100 MB per browser profile |
| Internet | Required for LLM API calls and browsing |
| LLM API Key | Anthropic, OpenAI, Gemini, DeepSeek, Grok, or custom |
SwarmAI uses Playwright's Chromium. On first launch, it automatically installs the correct Chromium version. If you need to reinstall manually:
playwright install chromium
Browser sessions are fully persistent. Cookies, logins, local storage, and history are preserved across restarts. Each browser maintains its own isolated profile.
| Element | Function |
|---|---|
| Title Bar | Shows browser name and selection badge |
| URL Bar | Navigate to any URL, shows current page address |
| Screencast | Live view of the browser via CDP screencast |
| Nav Bar | Back, Forward, Refresh, Home buttons |
SwarmAI requires an LLM API key to power its AI agent. The AI reads the browser DOM and decides what actions to take.
| Provider | Recommended Models | Notes |
|---|---|---|
| Anthropic | Claude Sonnet 4, Claude Haiku | Best overall accuracy. Supports prompt caching. |
| OpenAI | GPT-4o, GPT-4o-mini | Fast response times, good accuracy. |
| Google Gemini | Gemini 2.0 Flash | Cost-effective. Uses OpenAI-compatible endpoint. |
| DeepSeek | DeepSeek Chat | Budget option for simple tasks. |
| Grok | Grok-2 | xAI |
| Custom | Any OpenAI-compatible API | Use with any OpenAI-format provider. |
sk-ant-)sk-)If you have an OpenAI-compatible API endpoint (e.g., local model, proxy, third-party):
http://localhost:11434/v1 for Ollama)SwarmAI has a split-panel layout: Chat Panel on the left and Browser Grid on the right.
| Button | Function |
|---|---|
| + Browser | Create a new browser instance |
| Browser List | Select, rename, or manage existing browsers |
| Mirror | Toggle mirror mode (green = ON). Forwards input to all selected browsers. |
| Hidden | Toggle hidden browsers visibility |
| Proxy | Open proxy management panel |
| Settings | Open settings (API keys, agent config, display) |
| Scale | Adjust browser panel and font sizes |
The chat panel has four tabs:
Live execution log. Shows commands, AI reasoning, tool calls, and results in a styled HTML view.
Manage saved commands, AI personas, AI rules, and site cards.
View structured data extracted by the AI agent in Extract Mode.
Raw debug/diagnostic logs for troubleshooting.
| Toggle | Function |
|---|---|
| Stealth | Human-like delays and jitter for anti-detection. |
| Loop | Repeat the command N times with an interval. |
| Screenshot | Cycle: OFF → AUTO → ALWAYS. Sends page screenshots to the AI. |
| Extract | When ON, the AI returns structured data instead of a text response. |
| Target | Select which browsers receive the command. |
Let's walk through your very first SwarmAI task, step by step.
Go to google.com and search for "best AI tools 2026"| Command | What It Does |
|---|---|
Sign up on instagram.com with email test@example.com | Opens Instagram, fills sign-up form |
Go to twitter.com, log in, and like 3 posts in my feed | Logs in, scrolls feed, likes posts |
Open amazon.com and search for "wireless earbuds" | Navigates to Amazon, searches product |
Go to reddit.com and extract the top 10 post titles | Scrapes data using Extract mode |
Click the red Stop button during execution. The AI agent will halt immediately.
SwarmAI uses the browser-use library for agent logic. For each step, the agent follows this cycle:
| Action | Description |
|---|---|
click(index) | Click an element by DOM index |
input_text(index, text) | Type text into an input field |
scroll(direction) | Scroll up or down the page |
go_to_url(url) | Navigate to a specific URL |
go_back | Go back to previous page |
wait(seconds) | Pause execution for page loads |
extract_data | Extract structured data from the page |
done | Mark task finished |
| Feature | Description |
|---|---|
| Judge Mode | LLM evaluates whether the task was actually completed |
| Auto Re-plan | Automatically re-plans when progress stalls (after 2 steps) |
| Loop Detection | Detects repetitive actions and breaks the loop |
| Failure Recovery | Returns partial results after max failures (3) |
SwarmAI is built for scale. Run dozens of browsers simultaneously, each with its own identity.
Use the Target toggle and select all browsers. Each browser gets its own independent AI agent.
Click the selection badge on each browser to target. Commands run only on selected browsers.
| Browsers | Recommended PC | Notes |
|---|---|---|
| 1-5 | Any modern PC, 8 GB RAM | Runs smoothly |
| 5-15 | 16 GB RAM, decent CPU | Reduce screencast quality if needed |
| 15-30 | 32 GB RAM, 8+ core CPU | Hide unused browser panels |
| 30+ | 64 GB RAM recommended | Use headless mode for browsers you don't need to watch |
SwarmAI spoofs 15 fingerprint vectors for each browser, making them appear as unique real mobile devices.
| Vector | Description |
|---|---|
| User-Agent | Mobile browser user-agent string (iPhone, Galaxy, Pixel, etc.) |
| Screen Size | Device-specific screen resolution and viewport |
| WebGL | WebGL renderer and vendor info spoofing |
| Canvas | Canvas fingerprint randomization |
| GPU | GPU vendor and renderer info |
| Timezone | Timezone matching proxy or target location |
| Language | Browser language headers |
| Platform | navigator.platform spoofing |
| Fonts | Available fonts list matching the device |
| Touch | Touch event support (mobile emulation) |
SwarmAI includes 25+ mobile device profiles: iPhone 15 Pro, Galaxy S24, Pixel 8, and many more. Each browser is assigned a random profile on creation.
Mirror Mode lets you manually control browsers using your mouse and keyboard, with all selected browsers receiving the same input simultaneously.
| PC Input | Browser Action |
|---|---|
| Left click | Touch tap at position |
| Click and drag | Swipe gesture |
| Mouse wheel | Scroll on page |
| Keyboard typing | Text input |
Loop Mode repeats the same command multiple times with optional delays between cycles. Essential for repetitive tasks like engagement, monitoring, or data collection.
Loop: 10 cycles, 5 minute interval
Command: "Go to instagram.com, scroll feed, like 3 posts"
Result: Every 5 minutes, each browser opens Instagram,
likes 3 posts. Repeats 10 times over ~50 minutes.
Stealth Mode makes the AI's actions appear more human-like by introducing natural variations.
| Feature | Description |
|---|---|
| Click Jitter | Random offset on click coordinates |
| Speed Variation | Random variation in action timing |
| Reading Pauses | Random pauses between actions simulating reading |
| Action Delays | Variable delays between consecutive actions |
Save frequently-used commands for one-click execution:
Personas customize the AI agent's behavior. Only one persona can be active at a time.
Examples: Speed Runner (fast, skip verifications), Careful (verify before/after each action), Social Media Expert (navigate social apps expertly).
Rules are constraints always injected into the AI's system prompt. Multiple rules can be active simultaneously.
Examples: "Never click on ads", "Always close popups first", "Use search instead of scrolling", "Skip sponsored content".
Site Cards provide context about specific websites to help the AI navigate more accurately. When you send a command and the browser is on a matching site, the card's instructions are injected into the system prompt.
Extract Mode forces the AI agent to return structured data instead of completing a task. This is browser-use's native extraction mechanism.
Command: "Go to reddit.com/r/technology and extract the top 10 post titles, authors, and upvote counts"
Result (in Extracts tab):
{
"items": [
{"title": "...", "author": "u/...", "upvotes": 1234},
...
],
"summary": "Extracted 10 posts from r/technology"
}
Assign unique proxy IP addresses to each browser for maximum anonymity and anti-detection.
host:port:user:pass| Type | Description |
|---|---|
| HTTP/HTTPS | Standard web proxies, most common |
| SOCKS5 | Full tunnel proxy, better anonymity |
Control SwarmAI remotely from your phone using a Telegram bot. Run tasks, take screenshots, check status, and manage your browser swarm — all from Telegram chat.
/newbot to start the creation process7123456789:AAHxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx123456789/help| Command | Description |
|---|---|
/help | Show all commands and interactive keyboard buttons |
/browsers | List all browsers with their status (idle/busy) |
/select N | Set browser #N as the default target for commands |
/status | Show status of all browsers and running agents |
/run [text] | Execute an AI task on the selected browser |
/stop | Stop all running agents immediately |
/ss [N|all] | Take a screenshot of browser #N or all browsers |
/loop N [M] cmd | Repeat command N times with M minute interval |
/presets [N] | List saved presets, or run preset #N |
/settings | Show current SwarmAI settings |
/set key val | Change a setting remotely (e.g., /set provider openai) |
/list [what] | List available providers, models, languages, or personas |
/doctor | Run diagnostics and show system info |
You: /run Go to instagram.com and like 3 posts
Bot: ✅ Task sent to Browser #1
Bot: [Step 1] Navigating to instagram.com...
Bot: [Step 5] Task complete. Liked 3 posts.
You: /ss
Bot: [Screenshot of Browser #1]
You: /ss all
Bot: [Screenshot of Browser #1]
Bot: [Screenshot of Browser #2]
Bot: [Screenshot of Browser #3]
You: /loop 5 10 Like 3 posts on instagram.com
Bot: 🔄 Loop started: 5 cycles, 10 min interval
Bot: [Cycle 1/5] Running...
Bot: [Cycle 1/5] Done
Bot: ⏲ Next cycle in 10 minutes...
You can also just type plain text (without /run) and the bot will treat it as a task:
You: Search for "best laptops 2026" on Google
Bot: ✅ Task sent to Browser #1
| Setting | Default | Description |
|---|---|---|
| Provider | Anthropic | LLM provider selection |
| API Key | — | Your provider's API key |
| Model | — | Specific model to use |
| Prompt Caching | ON | Cache system prompts (Anthropic only) |
| Setting | Default | Range | Description |
|---|---|---|---|
| Max Steps | 30 | 10-100 | Maximum actions per task |
| Action Delay | 0s | 0-5s | Pause between actions |
| Setting | Default | Description |
|---|---|---|
| Language | English | UI language |
| Browser Panel Size | 100% | Scale browser panels (40-200%) |
| Font Scale | 100% | UI text size (50-200%) |
| Homepage | google.com | Default page for new browsers |
| Symptom | Solution |
|---|---|
| Browser won't start | Run playwright install chromium in terminal. |
| Black/blank screencast | Restart the browser. Check if CDP port is available. |
| Browser crashes frequently | Check available RAM. Reduce number of active browsers. |
| Slow performance | Hide browser panels you don't need. Close unused tabs. |
| Symptom | Solution |
|---|---|
| Agent does nothing | Check API key. Verify internet. Check Activity for errors. |
| Agent clicks wrong elements | Enable Screenshot mode (AUTO or ALWAYS). |
| Agent stuck in loop | Click Stop. Try rephrasing your command. |
| "Max steps reached" | Increase Max Steps in Settings (up to 100). |
A typical task (10-15 steps) costs ~$0.01-0.03 with Claude Sonnet, ~$0.005-0.01 with GPT-4o-mini. Screenshots add ~$0.01-0.02 each. Token costs are shown in the status bar.
Internet is required for LLM APIs and web browsing. You can use a local model via Ollama with the Custom provider to reduce external API dependency.
No hard limit. Practical limits depend on RAM and CPU. Typical users run 5-30 browsers per PC.
API keys are stored locally in settings.json in your AppData folder. Never sent to SwarmAI servers.
Yes. All cookies, logins, local storage, and history are saved per browser profile and persist across app restarts.