SwarmAI SwarmAI

Select Language

🇺🇸 English
🇰🇷 한국어
🇯🇵 日本語
🇨🇳 中文
🇪🇸 Español
🇫🇷 Français
🇩🇪 Deutsch
🇧🇷 Português
🇻🇳 Tiếng Việt
🇹🇭 ไทย
SwarmAI Logo

SwarmAI

AI-Powered Browser Swarm Automation
Complete User Guide & Tutorial

Control dozens of mobile-sized browsers with AI agents. Each browser has a unique fingerprint, persistent sessions, and full DOM-based automation — all from one desktop app.

Version 1.0 • April 2026

Table of Contents

1. Introduction & System Requirements

What is SwarmAI?

SwarmAI is a desktop application that manages multiple mobile-sized browser instances controlled by AI agents. Instead of writing scripts or macros, you describe what you want in natural language — the AI reads the page DOM, identifies elements, and performs clicks, typing, and scrolling automatically.

Each browser runs as a persistent Chromium context with a unique device fingerprint, making it appear as a real mobile device. Run dozens of browsers simultaneously from one desktop.

How It Works

You type a command → AI reads the page DOM (elements + structure)

AI decides an action (click, type, scroll, etc.) → Executes via CDP

AI observes the result → Decides next action → Repeats until done

System Requirements

ComponentRequirement
Operating SystemWindows 10/11 (64-bit)
RAM8 GB minimum, 16 GB recommended for 10+ browsers
CPU4+ cores recommended
Disk500 MB for app + ~100 MB per browser profile
InternetRequired for LLM API calls and browsing
LLM API KeyAnthropic, OpenAI, Gemini, DeepSeek, Grok, or custom

2. Installation & First-Time Setup

Step 1: Download & Install

  1. Download the latest installer from the SwarmAI dashboard after signing in
  2. Run the installer and follow on-screen instructions
  3. Launch SwarmAI from the Start Menu or desktop shortcut
  4. Sign in with your Google account when the login screen appears

Step 2: Chromium Auto-Install

SwarmAI uses Playwright's Chromium. On first launch, it automatically installs the correct Chromium version. If you need to reinstall manually:

playwright install chromium

Step 3: Configure Your API Key

  1. Click the Settings button (gear icon) in the top toolbar
  2. Select your LLM provider (e.g., Anthropic, OpenAI)
  3. Paste your API key
  4. Select a model from the dropdown
  5. Click Save
You can change the AI provider and model at any time in Settings. No restart required.

3. Creating Your First Browser

Creating a Browser Instance

  1. Click the + Browser button in the top toolbar
  2. A new mobile-sized browser appears in the browser grid
  3. Each browser is assigned a unique device fingerprint automatically
  4. The browser opens to the configured homepage (default: Google)

Persistent Sessions

Browser sessions are fully persistent. Cookies, logins, local storage, and history are preserved across restarts. Each browser maintains its own isolated profile.

To log in once and have the session persist forever, simply log in via the browser and close SwarmAI normally. The session is automatically saved.

Browser Panel Controls

ElementFunction
Title BarShows browser name and selection badge
URL BarNavigate to any URL, shows current page address
ScreencastLive view of the browser via CDP screencast
Nav BarBack, Forward, Refresh, Home buttons

4. Configuring LLM Providers & API Keys

SwarmAI requires an LLM API key to power its AI agent. The AI reads the browser DOM and decides what actions to take.

Supported Providers

ProviderRecommended ModelsNotes
AnthropicClaude Sonnet 4, Claude HaikuBest overall accuracy. Supports prompt caching.
OpenAIGPT-4o, GPT-4o-miniFast response times, good accuracy.
Google GeminiGemini 2.0 FlashCost-effective. Uses OpenAI-compatible endpoint.
DeepSeekDeepSeek ChatBudget option for simple tasks.
GrokGrok-2xAI
CustomAny OpenAI-compatible APIUse with any OpenAI-format provider.

Getting an API Key

Anthropic (Recommended)

  1. Go to console.anthropic.com
  2. Create an account or sign in
  3. Navigate to API Keys in the dashboard
  4. Click Create Key and copy it (starts with sk-ant-)
  5. Add credits to your account (API is pay-per-use)

OpenAI

  1. Go to platform.openai.com
  2. Create an account or sign in
  3. Navigate to API Keys
  4. Click Create new secret key and copy it (starts with sk-)

Entering Your API Key

  1. Click the Settings button (gear icon) in the top toolbar
  2. In AI Model section, select your provider
  3. Paste your API key
  4. Select a model from the dropdown
  5. Click Save
Enable Prompt Caching (Anthropic only) to significantly reduce API costs — saves up to 90% on repeated requests.

Using a Custom Provider

If you have an OpenAI-compatible API endpoint (e.g., local model, proxy, third-party):

  1. Select Custom from the provider dropdown
  2. Enter the base URL (e.g., http://localhost:11434/v1 for Ollama)
  3. Enter your API key (if required)
  4. Type your model name manually

5. Interface Overview

SwarmAI has a split-panel layout: Chat Panel on the left and Browser Grid on the right.

Top Toolbar

ButtonFunction
+ BrowserCreate a new browser instance
Browser ListSelect, rename, or manage existing browsers
MirrorToggle mirror mode (green = ON). Forwards input to all selected browsers.
HiddenToggle hidden browsers visibility
ProxyOpen proxy management panel
SettingsOpen settings (API keys, agent config, display)
ScaleAdjust browser panel and font sizes

Chat Panel (Left Side)

The chat panel has four tabs:

Activity

Live execution log. Shows commands, AI reasoning, tool calls, and results in a styled HTML view.

Presets

Manage saved commands, AI personas, AI rules, and site cards.

Extracts

View structured data extracted by the AI agent in Extract Mode.

Log

Raw debug/diagnostic logs for troubleshooting.

Command Input Bar (Bottom)

ToggleFunction
StealthHuman-like delays and jitter for anti-detection.
LoopRepeat the command N times with an interval.
ScreenshotCycle: OFF → AUTO → ALWAYS. Sends page screenshots to the AI.
ExtractWhen ON, the AI returns structured data instead of a text response.
TargetSelect which browsers receive the command.

6. Your First Task — Quick Start

Let's walk through your very first SwarmAI task, step by step.

Prerequisites Checklist

Example: Search on Google

  1. Make sure a browser is selected (check the Target toggle)
  2. Click the text input field at the bottom of the chat panel
  3. Type: Go to google.com and search for "best AI tools 2026"
  4. Press Enter (or click the Send button)
  5. Watch the Activity tab — the AI will navigate to Google, type the query, and press search
You can watch the AI's actions in real-time on the browser screencast and in the Activity tab.

More Example Commands

CommandWhat It Does
Sign up on instagram.com with email test@example.comOpens Instagram, fills sign-up form
Go to twitter.com, log in, and like 3 posts in my feedLogs in, scrolls feed, likes posts
Open amazon.com and search for "wireless earbuds"Navigates to Amazon, searches product
Go to reddit.com and extract the top 10 post titlesScrapes data using Extract mode

Stopping a Task

Click the red Stop button during execution. The AI agent will halt immediately.

7. Core Features Deep Dive

How the AI Agent Works

SwarmAI uses the browser-use library for agent logic. For each step, the agent follows this cycle:

  1. Observe — Extract the page DOM, identify interactive elements
  2. Think — Analyze the page state, plan the best action
  3. Act — Execute one action: click, type, scroll, navigate
  4. Repeat — Loop until task complete or max steps reached

Available Actions

ActionDescription
click(index)Click an element by DOM index
input_text(index, text)Type text into an input field
scroll(direction)Scroll up or down the page
go_to_url(url)Navigate to a specific URL
go_backGo back to previous page
wait(seconds)Pause execution for page loads
extract_dataExtract structured data from the page
doneMark task finished

Smart Agent Features

FeatureDescription
Judge ModeLLM evaluates whether the task was actually completed
Auto Re-planAutomatically re-plans when progress stalls (after 2 steps)
Loop DetectionDetects repetitive actions and breaks the loop
Failure RecoveryReturns partial results after max failures (3)

8. Multi-Browser Swarm Setup

SwarmAI is built for scale. Run dozens of browsers simultaneously, each with its own identity.

Creating Multiple Browsers

  1. Click + Browser repeatedly to create as many browsers as needed
  2. Each browser gets a unique fingerprint and CDP port automatically
  3. Browsers appear in the grid, each showing a live screencast

Sending Commands to Multiple Browsers

Option 1: All Browsers

Use the Target toggle and select all browsers. Each browser gets its own independent AI agent.

Option 2: Selected Browsers

Click the selection badge on each browser to target. Commands run only on selected browsers.

Performance Considerations

BrowsersRecommended PCNotes
1-5Any modern PC, 8 GB RAMRuns smoothly
5-1516 GB RAM, decent CPUReduce screencast quality if needed
15-3032 GB RAM, 8+ core CPUHide unused browser panels
30+64 GB RAM recommendedUse headless mode for browsers you don't need to watch
Each Chromium instance uses ~150-300 MB RAM. Close unused tabs and enable panel hiding to save resources.

9. Device Fingerprinting

SwarmAI spoofs 15 fingerprint vectors for each browser, making them appear as unique real mobile devices.

Fingerprint Vectors

VectorDescription
User-AgentMobile browser user-agent string (iPhone, Galaxy, Pixel, etc.)
Screen SizeDevice-specific screen resolution and viewport
WebGLWebGL renderer and vendor info spoofing
CanvasCanvas fingerprint randomization
GPUGPU vendor and renderer info
TimezoneTimezone matching proxy or target location
LanguageBrowser language headers
Platformnavigator.platform spoofing
FontsAvailable fonts list matching the device
TouchTouch event support (mobile emulation)

Device Profiles

SwarmAI includes 25+ mobile device profiles: iPhone 15 Pro, Galaxy S24, Pixel 8, and many more. Each browser is assigned a random profile on creation.

Fingerprints are set once per browser and persist. To change a fingerprint, delete and recreate the browser.

10. Mirror Mode

Mirror Mode lets you manually control browsers using your mouse and keyboard, with all selected browsers receiving the same input simultaneously.

Enabling Mirror Mode

  1. Click Mirror in the top toolbar (turns green)
  2. Select the browsers you want to mirror
  3. Click on any browser panel — your input is forwarded to all selected browsers

Controls in Mirror Mode

PC InputBrowser Action
Left clickTouch tap at position
Click and dragSwipe gesture
Mouse wheelScroll on page
Keyboard typingText input
Mirror Mode and AI agent commands can conflict. Turn off Mirror Mode before sending AI commands.

11. Loop & Repeat Mode

Loop Mode repeats the same command multiple times with optional delays between cycles. Essential for repetitive tasks like engagement, monitoring, or data collection.

Setting Up a Loop

  1. Click Loop button in the command bar
  2. Set Count: how many times to repeat (1-999)
  3. Set Interval: minutes between cycles (0-999)
  4. Type your command and send — it repeats automatically

Example: Engagement Loop

Loop: 10 cycles, 5 minute interval
Command: "Go to instagram.com, scroll feed, like 3 posts"

Result: Every 5 minutes, each browser opens Instagram,
likes 3 posts. Repeats 10 times over ~50 minutes.
Use longer intervals (5-15 min) with Stealth Mode for the most natural behavior patterns.

12. Stealth Mode

Stealth Mode makes the AI's actions appear more human-like by introducing natural variations.

What Stealth Mode Does

FeatureDescription
Click JitterRandom offset on click coordinates
Speed VariationRandom variation in action timing
Reading PausesRandom pauses between actions simulating reading
Action DelaysVariable delays between consecutive actions

When to Use Stealth Mode

For testing/debugging, turn Stealth Mode OFF for fast, precise actions.

13. Presets, Personas & Rules

Saved Commands

Save frequently-used commands for one-click execution:

  1. Go to Presets tab → Saved Commands
  2. Enter a name and command text
  3. Click Add
  4. Click Play next to any saved command to execute

AI Personas

Personas customize the AI agent's behavior. Only one persona can be active at a time.

Examples: Speed Runner (fast, skip verifications), Careful (verify before/after each action), Social Media Expert (navigate social apps expertly).

AI Rules

Rules are constraints always injected into the AI's system prompt. Multiple rules can be active simultaneously.

Examples: "Never click on ads", "Always close popups first", "Use search instead of scrolling", "Skip sponsored content".

Site Cards

Site Cards provide context about specific websites to help the AI navigate more accurately. When you send a command and the browser is on a matching site, the card's instructions are injected into the system prompt.

14. Extract Mode & Data Collection

Extract Mode forces the AI agent to return structured data instead of completing a task. This is browser-use's native extraction mechanism.

How to Use Extract Mode

  1. Toggle Extract ON in the command bar
  2. Type a command describing what data to extract
  3. The AI navigates to the data source and returns structured JSON
  4. Results appear in the Extracts tab

Example

Command: "Go to reddit.com/r/technology and extract the top 10 post titles, authors, and upvote counts"

Result (in Extracts tab):
{
  "items": [
    {"title": "...", "author": "u/...", "upvotes": 1234},
    ...
  ],
  "summary": "Extracted 10 posts from r/technology"
}
Extract Mode works best with specific descriptions of what data you want. Be precise about fields and quantities.

15. Proxy Management

Assign unique proxy IP addresses to each browser for maximum anonymity and anti-detection.

Setting Up Proxies

  1. Click Proxy in the top toolbar
  2. Paste your proxy list (one per line): host:port:user:pass
  3. Select assignment mode: Manual or Auto-distribute
  4. Proxies are assigned to browsers and persist across sessions

Supported Proxy Types

TypeDescription
HTTP/HTTPSStandard web proxies, most common
SOCKS5Full tunnel proxy, better anonymity
For best results, use residential rotating proxies. Datacenter proxies are more likely to be detected.

16. Telegram Bot Remote Control

Control SwarmAI remotely from your phone using a Telegram bot. Run tasks, take screenshots, check status, and manage your browser swarm — all from Telegram chat.

Setting Up the Telegram Bot

Step 1: Create a Bot with @BotFather

  1. Open Telegram on your phone or desktop
  2. Search for @BotFather (official Telegram bot creator)
  3. Send /newbot to start the creation process
  4. Enter a display name for your bot (e.g., "My SwarmAI Bot")
  5. Enter a username ending in "bot" (e.g., "my_swarmai_bot")
  6. BotFather will give you a bot token — copy it. It looks like: 7123456789:AAHxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Keep your bot token secret. Anyone with this token can control your bot. Never share it publicly.

Step 2: Find Your Chat ID

  1. Search for @userinfobot on Telegram
  2. Send any message to it (e.g., "hello")
  3. It replies with your Chat ID — a number like 123456789
  4. Copy this number
The Chat ID ensures only you can control the bot. Others who find your bot won't be able to use it.

Step 3: Configure in SwarmAI

  1. Open SwarmAI and click Settings (gear icon)
  2. Scroll down to the Telegram Bot section
  3. Paste your Bot Token in the token field
  4. Enter your Chat ID in the Chat ID field
  5. Enable Auto-start if you want the bot to start automatically with SwarmAI
  6. Click Save
  7. The bot status indicator will turn green when connected

Step 4: Test the Connection

  1. Open Telegram and go to your bot (search by the username you created)
  2. Send /help
  3. You should receive a list of available commands with a keyboard
If the bot doesn't respond: (1) Check that SwarmAI is running, (2) Verify the bot token and Chat ID are correct, (3) Check your internet connection.

All Telegram Commands

CommandDescription
/helpShow all commands and interactive keyboard buttons
/browsersList all browsers with their status (idle/busy)
/select NSet browser #N as the default target for commands
/statusShow status of all browsers and running agents
/run [text]Execute an AI task on the selected browser
/stopStop all running agents immediately
/ss [N|all]Take a screenshot of browser #N or all browsers
/loop N [M] cmdRepeat command N times with M minute interval
/presets [N]List saved presets, or run preset #N
/settingsShow current SwarmAI settings
/set key valChange a setting remotely (e.g., /set provider openai)
/list [what]List available providers, models, languages, or personas
/doctorRun diagnostics and show system info

Usage Examples

Running a Task

You: /run Go to instagram.com and like 3 posts
Bot: ✅ Task sent to Browser #1
Bot: [Step 1] Navigating to instagram.com...
Bot: [Step 5] Task complete. Liked 3 posts.

Taking Screenshots

You: /ss
Bot: [Screenshot of Browser #1]

You: /ss all
Bot: [Screenshot of Browser #1]
Bot: [Screenshot of Browser #2]
Bot: [Screenshot of Browser #3]

Looping a Task

You: /loop 5 10 Like 3 posts on instagram.com
Bot: 🔄 Loop started: 5 cycles, 10 min interval
Bot: [Cycle 1/5] Running...
Bot: [Cycle 1/5] Done
Bot: ⏲ Next cycle in 10 minutes...

Quick Text Commands

You can also just type plain text (without /run) and the bot will treat it as a task:

You: Search for "best laptops 2026" on Google
Bot: ✅ Task sent to Browser #1
The bot sends real-time progress updates as the AI agent works. You'll see each step, tool call, and result in the chat.

17. Settings Reference

AI Model

SettingDefaultDescription
ProviderAnthropicLLM provider selection
API KeyYour provider's API key
ModelSpecific model to use
Prompt CachingONCache system prompts (Anthropic only)

Agent

SettingDefaultRangeDescription
Max Steps3010-100Maximum actions per task
Action Delay0s0-5sPause between actions

Display

SettingDefaultDescription
LanguageEnglishUI language
Browser Panel Size100%Scale browser panels (40-200%)
Font Scale100%UI text size (50-200%)
Homepagegoogle.comDefault page for new browsers

18. Troubleshooting & FAQ

Browser Issues

SymptomSolution
Browser won't startRun playwright install chromium in terminal.
Black/blank screencastRestart the browser. Check if CDP port is available.
Browser crashes frequentlyCheck available RAM. Reduce number of active browsers.
Slow performanceHide browser panels you don't need. Close unused tabs.

AI Agent Issues

SymptomSolution
Agent does nothingCheck API key. Verify internet. Check Activity for errors.
Agent clicks wrong elementsEnable Screenshot mode (AUTO or ALWAYS).
Agent stuck in loopClick Stop. Try rephrasing your command.
"Max steps reached"Increase Max Steps in Settings (up to 100).

Frequently Asked Questions

Q: How much does the LLM API cost?

A typical task (10-15 steps) costs ~$0.01-0.03 with Claude Sonnet, ~$0.005-0.01 with GPT-4o-mini. Screenshots add ~$0.01-0.02 each. Token costs are shown in the status bar.

Q: Can I use SwarmAI offline?

Internet is required for LLM APIs and web browsing. You can use a local model via Ollama with the Custom provider to reduce external API dependency.

Q: How many browsers can I run?

No hard limit. Practical limits depend on RAM and CPU. Typical users run 5-30 browsers per PC.

Q: Is my API key stored securely?

API keys are stored locally in settings.json in your AppData folder. Never sent to SwarmAI servers.

Q: Do browser sessions persist?

Yes. All cookies, logins, local storage, and history are saved per browser profile and persist across app restarts.