SwarmAI

AI-Powered Browser Swarm Automation
Complete User Guide & Tutorial

Control dozens of mobile-sized browsers with AI agents. Each browser has a unique fingerprint, persistent sessions, and full DOM-based automation — all from one desktop app.

Version 1.0 • April 2026

1 Introduction & System Requirements
2 Installation & First-Time Setup
3 Creating Your First Browser
4 Configuring LLM Providers & API Keys
5 Interface Overview
6 Your First Task — Quick Start
7 Core Features Deep Dive
8 Multi-Browser Swarm Setup
9 Device Fingerprinting
10 Mirror Mode
11 Loop & Repeat Mode
12 Stealth Mode
13 Presets, Personas & Rules
14 Extract Mode & Data Collection
15 Proxy Management
16 Telegram Bot Remote Control
17 Settings Reference
18 Troubleshooting & FAQ

1. Introduction & System Requirements

What is SwarmAI?

SwarmAI is a desktop application that manages multiple mobile-sized browser instances controlled by AI agents. Instead of writing scripts or macros, you describe what you want in natural language — the AI reads the page DOM, identifies elements, and performs clicks, typing, and scrolling automatically.

Each browser runs as a persistent Chromium context with a unique device fingerprint, making it appear as a real mobile device. Run dozens of browsers simultaneously from one desktop.

How It Works

You type a command → AI reads the page DOM (elements + structure)
↓
AI decides an action (click, type, scroll, etc.) → Executes via CDP
↓
AI observes the result → Decides next action → Repeats until done

System Requirements

Component	Requirement
Operating System	Windows 10/11 (64-bit)
RAM	8 GB minimum, 16 GB recommended for 10+ browsers
CPU	4+ cores recommended
Disk	500 MB for app + ~100 MB per browser profile
Internet	Required for LLM API calls and browsing
LLM API Key	Anthropic, OpenAI, Gemini, DeepSeek, Grok, or custom

2. Installation & First-Time Setup

Step 1: Download & Install

Download the latest installer from the SwarmAI dashboard after signing in
Run the installer and follow on-screen instructions
Launch SwarmAI from the Start Menu or desktop shortcut
Sign in with your Google account when the login screen appears

Step 2: Chromium Auto-Install

SwarmAI uses Playwright's Chromium. On first launch, it automatically installs the correct Chromium version. If you need to reinstall manually:

playwright install chromium

Step 3: Configure Your API Key

Click the Settings button (gear icon) in the top toolbar
Select your LLM provider (e.g., Anthropic, OpenAI)
Paste your API key
Select a model from the dropdown
Click Save

You can change the AI provider and model at any time in Settings. No restart required.

3. Creating Your First Browser

Creating a Browser Instance

Click the + Browser button in the top toolbar
A new mobile-sized browser appears in the browser grid
Each browser is assigned a unique device fingerprint automatically
The browser opens to the configured homepage (default: Google)

Persistent Sessions

Browser sessions are fully persistent. Cookies, logins, local storage, and history are preserved across restarts. Each browser maintains its own isolated profile.

To stay logged in across restarts, log in once via the browser and close SwarmAI normally. The session is saved automatically, although a site may still expire its own login after some time.
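Per-browser persistence of this kind is what Playwright's persistent context provides. The sketch below is an illustration only, not SwarmAI's actual code: the `profiles/` folder layout and both function names are assumptions. `launch_persistent_context` is the real Playwright call and needs `pip install playwright` plus an installed Chromium.

```python
from pathlib import Path


def profile_dir(base: Path, browser_name: str) -> Path:
    """One isolated user-data directory per browser (hypothetical layout)."""
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in browser_name)
    return base / "profiles" / safe


def open_persistent(base: Path, browser_name: str):
    """Launch Chromium so cookies and local storage are written to disk."""
    # Imported lazily so the path helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    pw = sync_playwright().start()
    context = pw.chromium.launch_persistent_context(
        user_data_dir=str(profile_dir(base, browser_name)),
        headless=False,
    )
    return pw, context


print(profile_dir(Path("data"), "Browser #1"))
```

Reopening the same directory later restores the stored cookies and local storage, which is why a normal shutdown matters: Chromium needs the chance to flush the profile to disk.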

Browser Panel Controls

Element	Function
Title Bar	Shows browser name and selection badge
URL Bar	Navigate to any URL, shows current page address
Screencast	Live view of the browser via CDP screencast
Nav Bar	Back, Forward, Refresh, Home buttons

4. Configuring LLM Providers & API Keys

SwarmAI requires an LLM API key to power its AI agent. The AI reads the browser DOM and decides what actions to take.

Supported Providers

Provider	Recommended Models	Notes
Anthropic	Claude Sonnet 4, Claude Haiku	Best overall accuracy. Supports prompt caching.
OpenAI	GPT-4o, GPT-4o-mini	Fast response times, good accuracy.
Google Gemini	Gemini 2.0 Flash	Cost-effective. Uses OpenAI-compatible endpoint.
DeepSeek	DeepSeek Chat	Budget option for simple tasks.
Grok	Grok-2	Provided by xAI.
Custom	Any OpenAI-compatible API	Use with any OpenAI-format provider.

Getting an API Key

Anthropic (Recommended)

Go to console.anthropic.com
Create an account or sign in
Navigate to API Keys in the dashboard
Click Create Key and copy it (starts with sk-ant-)
Add credits to your account (API is pay-per-use)

OpenAI

Go to platform.openai.com
Create an account or sign in
Navigate to API Keys
Click Create new secret key and copy it (starts with sk-)

Entering Your API Key

Click the Settings button (gear icon) in the top toolbar
In the AI Model section, select your provider
Paste your API key
Select a model from the dropdown
Click Save

Enable Prompt Caching (Anthropic only) to reduce API costs: the repeated part of each request is served from cache, saving up to 90% on those tokens.

Using a Custom Provider

If you have an OpenAI-compatible API endpoint (e.g., local model, proxy, third-party):

Select Custom from the provider dropdown
Enter the base URL (e.g., http://localhost:11434/v1 for Ollama)
Enter your API key (if required)
Type your model name manually

5. Interface Overview

SwarmAI has a split-panel layout: Chat Panel on the left and Browser Grid on the right.

Top Toolbar

Button	Function
+ Browser	Create a new browser instance
Browser List	Select, rename, or manage existing browsers
Mirror	Toggle mirror mode (green = ON). Forwards input to all selected browsers.
Hidden	Toggle hidden browsers visibility
Proxy	Open proxy management panel
Settings	Open settings (API keys, agent config, display)
Scale	Adjust browser panel and font sizes

Chat Panel (Left Side)

The chat panel has four tabs:

Activity

Live execution log. Shows commands, AI reasoning, tool calls, and results in a styled HTML view.

Presets

Manage saved commands, AI personas, AI rules, and site cards.

Extracts

View structured data extracted by the AI agent in Extract Mode.

Log

Raw debug/diagnostic logs for troubleshooting.

Command Input Bar (Bottom)

Toggle	Function
Stealth	Human-like delays and jitter for anti-detection.
Loop	Repeat the command N times with an interval.
Screenshot	Cycle: OFF → AUTO → ALWAYS. Sends page screenshots to the AI.
Extract	When ON, the AI returns structured data instead of a text response.
Target	Select which browsers receive the command.

6. Your First Task — Quick Start

Let's walk through your very first SwarmAI task, step by step.

Prerequisites Checklist

☑ SwarmAI installed and running
☑ Signed in with Google account
☑ At least one browser created (click + Browser)
☑ LLM API key configured in Settings

Example: Search on Google

Make sure a browser is selected (check the Target toggle)
Click the text input field at the bottom of the chat panel
Type: Go to google.com and search for "best AI tools 2026"
Press Enter (or click the Send button)
Watch the Activity tab — the AI will navigate to Google, type the query, and press search

You can watch the AI's actions in real-time on the browser screencast and in the Activity tab.

More Example Commands

Command	What It Does
`Sign up on instagram.com with email test@example.com`	Opens Instagram, fills sign-up form
`Go to twitter.com, log in, and like 3 posts in my feed`	Logs in, scrolls feed, likes posts
`Open amazon.com and search for "wireless earbuds"`	Navigates to Amazon, searches product
`Go to reddit.com and extract the top 10 post titles`	Scrapes data using Extract mode

Stopping a Task

Click the red Stop button during execution. The AI agent will halt immediately.

7. Core Features Deep Dive

How the AI Agent Works

SwarmAI uses the browser-use library for agent logic. For each step, the agent follows this cycle:

Observe — Extract the page DOM, identify interactive elements
Think — Analyze the page state, plan the best action
Act — Execute one action: click, type, scroll, navigate
Repeat — Loop until task complete or max steps reached
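
The cycle above can be sketched as a plain loop. This is a simplified illustration, not the code of SwarmAI or browser-use: `observe`, `decide`, and `act` are hypothetical callables standing in for DOM extraction, the LLM call, and CDP execution, and the action dictionaries are an assumed format.

```python
def run_agent(observe, decide, act, max_steps=30):
    """Observe, think, act; stop on 'done' or when max_steps is reached."""
    history = []
    for step in range(1, max_steps + 1):
        state = observe()                 # Observe: read the current page state
        action = decide(state, history)   # Think: choose exactly one action
        history.append(action)
        if action["name"] == "done":
            return {"completed": True, "steps": step, "history": history}
        act(action)                       # Act: execute it in the browser
    return {"completed": False, "steps": max_steps, "history": history}


# Toy stand-ins: the "page" is a click counter and the "model" clicks until it hits 3.
page = {"clicks": 0}


def observe():
    return dict(page)


def decide(state, history):
    if state["clicks"] >= 3:
        return {"name": "done"}
    return {"name": "click", "index": 0}


def act(action):
    page["clicks"] += 1


result = run_agent(observe, decide, act)
print(result["completed"], result["steps"])  # prints: True 4
```

The fourth step is the one where the agent observes that the goal is met and emits `done`, which is why a task always costs at least one step more than the number of actions it performs.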

Available Actions

Action	Description
`click(index)`	Click an element by DOM index
`input_text(index, text)`	Type text into an input field
`scroll(direction)`	Scroll up or down the page
`go_to_url(url)`	Navigate to a specific URL
`go_back`	Go back to previous page
`wait(seconds)`	Pause execution for page loads
`extract_data`	Extract structured data from the page
`done`	Mark task finished

Smart Agent Features

Feature	Description
Judge Mode	LLM evaluates whether the task was actually completed
Auto Re-plan	Automatically re-plans when progress stalls (after 2 steps without progress)
Loop Detection	Detects repetitive actions and breaks the loop
Failure Recovery	Returns partial results after max failures (3)
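
As an illustration of the loop-detection idea, one simple approach is to compare the most recent actions. This is a sketch under stated assumptions, not the detector SwarmAI or browser-use actually ships: the window of 3 identical actions and the tuple format for actions are both hypothetical.

```python
def is_stuck(history, window=3):
    """Return True when the last `window` actions are all identical."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return all(action == recent[0] for action in recent)


print(is_stuck([("click", 4), ("click", 4), ("click", 4)]))
print(is_stuck([("click", 4), ("scroll", "down"), ("click", 4)]))
```

A real detector would likely also compare page state, since clicking the same "Next" button three times can be legitimate progress.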

8. Multi-Browser Swarm Setup

SwarmAI is built for scale. Run dozens of browsers simultaneously, each with its own identity.

Creating Multiple Browsers

Click + Browser repeatedly to create as many browsers as needed
Each browser gets a unique fingerprint and CDP port automatically
Browsers appear in the grid, each showing a live screencast

Sending Commands to Multiple Browsers

Option 1: All Browsers

Use the Target toggle and select all browsers. Each browser gets its own independent AI agent.

Option 2: Selected Browsers

Click the selection badge on each browser to target. Commands run only on selected browsers.

Performance Considerations

Browsers	Recommended PC	Notes
1-5	Any modern PC, 8 GB RAM	Runs smoothly
5-15	16 GB RAM, decent CPU	Reduce screencast quality if needed
15-30	32 GB RAM, 8+ core CPU	Hide unused browser panels
30+	64 GB RAM recommended	Use headless mode for browsers you don't need to watch

Each Chromium instance uses ~150-300 MB RAM. Close unused tabs and enable panel hiding to save resources.
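
The per-instance figure makes a quick capacity estimate possible. The helper below is plain arithmetic on the 150-300 MB range quoted above. It covers Chromium only, not the operating system, the SwarmAI app, or heavy pages, so treat the result as a lower bound.

```python
def estimate_ram_gb(browsers, mb_low=150, mb_high=300):
    """Return a (low, high) Chromium memory estimate in GB."""
    return browsers * mb_low / 1024, browsers * mb_high / 1024


low, high = estimate_ram_gb(30)
print(f"30 browsers: {low:.1f}-{high:.1f} GB")  # prints: 30 browsers: 4.4-8.8 GB
```

The recommendations in the table leave considerable headroom above these numbers.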

9. Device Fingerprinting

SwarmAI spoofs 15 fingerprint vectors for each browser so that each one appears to be a unique, real mobile device. The table below lists 10 of them.

Fingerprint Vectors

Vector	Description
User-Agent	Mobile browser user-agent string (iPhone, Galaxy, Pixel, etc.)
Screen Size	Device-specific screen resolution and viewport
WebGL	WebGL renderer and vendor info spoofing
Canvas	Canvas fingerprint randomization
GPU	GPU vendor and renderer info
Timezone	Timezone matching proxy or target location
Language	Browser language headers
Platform	navigator.platform spoofing
Fonts	Available fonts list matching the device
Touch	Touch event support (mobile emulation)

Device Profiles

SwarmAI includes 25+ mobile device profiles: iPhone 15 Pro, Galaxy S24, Pixel 8, and many more. Each browser is assigned a random profile on creation.

Fingerprints are set once per browser and persist. To change a fingerprint, delete and recreate the browser.
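
The "assign once, then persist" behavior can be pictured as a small lookup that only picks a random profile when a browser has none yet. This is a hypothetical sketch: the JSON store, the function name, and the three-entry profile list (names taken from this guide) are assumptions, not SwarmAI's storage format.

```python
import json
import random
from pathlib import Path

PROFILES = ["iPhone 15 Pro", "Galaxy S24", "Pixel 8"]  # subset, for illustration


def get_or_assign_profile(store: Path, browser_id: str, rng=random) -> str:
    """Pick a random profile on first use; return the stored one afterwards."""
    assigned = json.loads(store.read_text()) if store.exists() else {}
    if browser_id not in assigned:
        assigned[browser_id] = rng.choice(PROFILES)
        store.write_text(json.dumps(assigned, indent=2))
    return assigned[browser_id]
```

Calling `get_or_assign_profile(Path("fingerprints.json"), "browser-1")` twice returns the same profile both times, which mirrors why changing a fingerprint requires deleting and recreating the browser.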

10. Mirror Mode

Mirror Mode lets you manually control browsers using your mouse and keyboard, with all selected browsers receiving the same input simultaneously.

Enabling Mirror Mode

Click Mirror in the top toolbar (turns green)
Select the browsers you want to mirror
Click on any browser panel — your input is forwarded to all selected browsers

Controls in Mirror Mode

PC Input	Browser Action
Left click	Touch tap at position
Click and drag	Swipe gesture
Mouse wheel	Scroll on page
Keyboard typing	Text input

Mirror Mode and AI agent commands can conflict. Turn off Mirror Mode before sending AI commands.

11. Loop & Repeat Mode

Loop Mode repeats the same command multiple times with optional delays between cycles. Essential for repetitive tasks like engagement, monitoring, or data collection.

Setting Up a Loop

Click the Loop button in the command bar
Set Count: how many times to repeat (1-999)
Set Interval: minutes between cycles (0-999)
Type your command and send — it repeats automatically

Example: Engagement Loop

Loop: 10 cycles, 5 minute interval
Command: "Go to instagram.com, scroll feed, like 3 posts"

Result: Every 5 minutes, each browser opens Instagram and
likes 3 posts. This repeats 10 times over ~50 minutes.

Use longer intervals (5-15 min) with Stealth Mode for the most natural behavior patterns.
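
The Count and Interval settings amount to a simple schedule. The sketch below assumes the wait happens between cycles and not after the last one. How SwarmAI times its intervals internally is not documented here, and `run_loop` and `run_task` are hypothetical names.

```python
import time


def run_loop(command, count, interval_minutes, run_task, sleep=time.sleep):
    """Run `command` `count` times, waiting `interval_minutes` between cycles."""
    results = []
    for cycle in range(1, count + 1):
        results.append(run_task(command))
        if cycle < count:  # no wait after the final cycle
            sleep(interval_minutes * 60)
    return results


# Dry run: record the waits instead of sleeping.
waits = []
run_loop("like 3 posts", count=10, interval_minutes=5,
         run_task=lambda cmd: "ok", sleep=waits.append)
print(sum(waits) / 60)  # total waiting time in minutes, prints: 45.0
```

Under that assumption, 10 cycles at a 5-minute interval mean 45 minutes of waiting plus the time the tasks themselves take.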

12. Stealth Mode

Stealth Mode makes the AI's actions appear more human-like by introducing natural variations.

What Stealth Mode Does

Feature	Description
Click Jitter	Random offset on click coordinates
Speed Variation	Random variation in action timing
Reading Pauses	Random pauses between actions simulating reading
Action Delays	Variable delays between consecutive actions
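
Click jitter and timing variation can be pictured with two small helpers. All numbers here (a 3-pixel offset, a 0.8-second base delay, 50% variation) are assumptions chosen for illustration. SwarmAI's actual ranges are not documented in this guide.

```python
import random


def jitter_click(x, y, max_offset=3, rng=random):
    """Shift a click target by a small random offset on each axis."""
    return (x + rng.randint(-max_offset, max_offset),
            y + rng.randint(-max_offset, max_offset))


def human_delay(base=0.8, variation=0.5, rng=random):
    """Seconds to wait: base scaled by a random factor in [1-variation, 1+variation]."""
    return base * rng.uniform(1 - variation, 1 + variation)


rng = random.Random(0)  # seeded only to make the demo repeatable
print(jitter_click(100, 200, rng=rng), round(human_delay(rng=rng), 2))
```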

When to Use Stealth Mode

Social media automation: reduces the chance of detection by platform algorithms
Account management: makes automated behavior harder to detect
Extended sessions: produces more natural interaction patterns

For testing/debugging, turn Stealth Mode OFF for fast, precise actions.

13. Presets, Personas & Rules

Saved Commands

Save frequently-used commands for one-click execution:

Go to Presets tab → Saved Commands
Enter a name and command text
Click Add
Click Play next to any saved command to execute

AI Personas

Personas customize the AI agent's behavior. Only one persona can be active at a time.

Examples: Speed Runner (fast, skip verifications), Careful (verify before/after each action), Social Media Expert (navigate social apps expertly).

AI Rules

Rules are constraints always injected into the AI's system prompt. Multiple rules can be active simultaneously.

Examples: "Never click on ads", "Always close popups first", "Use search instead of scrolling", "Skip sponsored content".

Site Cards

Site Cards provide context about specific websites to help the AI navigate more accurately. When you send a command and the browser is on a matching site, the card's instructions are injected into the system prompt.
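
Personas, rules, and site cards all end up as text in the system prompt. A rough sketch of how such a prompt could be assembled follows. The prompt wording, the function name, and the domain-matching rule are assumptions, not SwarmAI's actual template.

```python
from urllib.parse import urlparse


def build_system_prompt(base, persona, rules, site_cards, current_url):
    """Combine the base prompt, one persona, all rules, and matching site cards."""
    parts = [base]
    if persona:
        parts.append(f"Persona: {persona}")
    if rules:
        parts.append("Rules:\n" + "\n".join(f"- {rule}" for rule in rules))
    host = urlparse(current_url).hostname or ""
    for domain, card in site_cards.items():
        # A card applies to its domain and any subdomain (assumed matching rule).
        if host == domain or host.endswith("." + domain):
            parts.append(f"Site notes for {domain}: {card}")
    return "\n\n".join(parts)


prompt = build_system_prompt(
    base="You control a mobile browser.",
    persona="Careful: verify before and after each action.",
    rules=["Never click on ads", "Always close popups first"],
    site_cards={"reddit.com": "Post titles are links inside article elements."},
    current_url="https://www.reddit.com/r/technology",
)
print(prompt)
```

Because every active rule and matching card is sent with each request, short and specific entries keep token usage down.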

14. Extract Mode & Data Collection

Extract Mode forces the AI agent to return structured data instead of completing a task. This is browser-use's native extraction mechanism.

How to Use Extract Mode

Toggle Extract ON in the command bar
Type a command describing what data to extract
The AI navigates to the data source and returns structured JSON
Results appear in the Extracts tab

Example

Command: "Go to reddit.com/r/technology and extract the top 10 post titles, authors, and upvote counts"

Result (in Extracts tab):
{
  "items": [
    {"title": "...", "author": "u/...", "upvotes": 1234},
    ...
  ],
  "summary": "Extracted 10 posts from r/technology"
}

Extract Mode works best with specific descriptions of what data you want. Be precise about fields and quantities.
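
Once a result is in the shape shown above, it converts easily to a spreadsheet-friendly format. The sketch below assumes the `items` key from the example. The sample rows are made up for illustration, and `items_to_csv` is not a SwarmAI function.

```python
import csv
import io
import json

# Made-up sample in the same shape as the example result.
raw = """{"items": [
  {"title": "Example post A", "author": "u/alice", "upvotes": 1234},
  {"title": "Example post B", "author": "u/bob", "upvotes": 56}
], "summary": "Extracted 2 posts"}"""


def items_to_csv(extract_json: str) -> str:
    """Turn the 'items' list of an extract result into CSV text."""
    items = json.loads(extract_json)["items"]
    fields = list(items[0].keys()) if items else []
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


print(items_to_csv(raw))
```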

15. Proxy Management

Assign unique proxy IP addresses to each browser for maximum anonymity and anti-detection.

Setting Up Proxies

Click Proxy in the top toolbar
Paste your proxy list (one per line): host:port:user:pass
Select assignment mode: Manual or Auto-distribute
Proxies are assigned to browsers and persist across sessions
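
The `host:port:user:pass` format is easy to validate before pasting. The sketch below parses each line and hands proxies out round-robin. Round-robin is an assumption about what Auto-distribute does, and both function names are hypothetical.

```python
from itertools import cycle


def parse_proxy(line):
    """Split 'host:port:user:pass'; the password may itself contain colons."""
    host, port, user, password = line.strip().split(":", 3)
    return {"host": host, "port": int(port), "user": user, "password": password}


def auto_distribute(browser_ids, proxy_lines):
    """Assign proxies to browsers in order, reusing the list when it runs out."""
    proxies = [parse_proxy(line) for line in proxy_lines if line.strip()]
    if not proxies:
        return {}
    return dict(zip(browser_ids, cycle(proxies)))


mapping = auto_distribute(
    ["browser-1", "browser-2", "browser-3"],
    ["10.0.0.1:8080:user1:pass1", "10.0.0.2:8080:user2:pa:ss2"],
)
for browser, proxy in mapping.items():
    print(browser, proxy["host"], proxy["port"])
```

A line with fewer than four fields raises `ValueError`, which is a quick way to spot a malformed entry in a long list.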

Supported Proxy Types

Type	Description
HTTP/HTTPS	Standard web proxies, most common
SOCKS5	Full tunnel proxy, better anonymity

For best results, use residential rotating proxies. Datacenter proxies are more likely to be detected.

16. Telegram Bot Remote Control

Control SwarmAI remotely from your phone using a Telegram bot. Run tasks, take screenshots, check status, and manage your browser swarm — all from Telegram chat.

Setting Up the Telegram Bot

Step 1: Create a Bot with @BotFather

Open Telegram on your phone or desktop
Search for @BotFather (official Telegram bot creator)
Send /newbot to start the creation process
Enter a display name for your bot (e.g., "My SwarmAI Bot")
Enter a username ending in "bot" (e.g., "my_swarmai_bot")
BotFather will give you a bot token — copy it. It looks like: 7123456789:AAHxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Keep your bot token secret. Anyone with this token can control your bot. Never share it publicly.

Step 2: Find Your Chat ID

Search for @userinfobot on Telegram
Send any message to it (e.g., "hello")
It replies with your Chat ID — a number like 123456789
Copy this number

The Chat ID ensures only you can control the bot. Others who find your bot won't be able to use it.
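
Conceptually, the Chat ID check is a filter on every incoming update. In the Telegram Bot API, an update's sender chat is found at `message.chat.id`. The function below is a sketch of that filter, not SwarmAI's implementation.

```python
def is_authorized(update: dict, allowed_chat_id: int) -> bool:
    """Accept an update only if it comes from the configured chat."""
    message = update.get("message") or {}
    chat = message.get("chat") or {}
    return chat.get("id") == allowed_chat_id


owner = {"message": {"chat": {"id": 123456789}, "text": "/status"}}
stranger = {"message": {"chat": {"id": 555}, "text": "/run something"}}
print(is_authorized(owner, 123456789), is_authorized(stranger, 123456789))
```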

Step 3: Configure in SwarmAI

Open SwarmAI and click Settings (gear icon)
Scroll down to the Telegram Bot section
Paste your Bot Token in the token field
Enter your Chat ID in the Chat ID field
Enable Auto-start if you want the bot to start automatically with SwarmAI
Click Save
The bot status indicator will turn green when connected

Step 4: Test the Connection

Open Telegram and go to your bot (search by the username you created)
Send /help
You should receive a list of available commands with a keyboard

If the bot doesn't respond: (1) Check that SwarmAI is running, (2) Verify the bot token and Chat ID are correct, (3) Check your internet connection.

All Telegram Commands

Command	Description
`/help`	Show all commands and interactive keyboard buttons
`/browsers`	List all browsers with their status (idle/busy)
`/select N`	Set browser #N as the default target for commands
`/status`	Show status of all browsers and running agents
`/run [text]`	Execute an AI task on the selected browser
`/stop`	Stop all running agents immediately
`/ss [N|all]`	Take a screenshot of browser #N or all browsers
`/loop N [M] cmd`	Repeat command N times with M minute interval
`/presets [N]`	List saved presets, or run preset #N
`/settings`	Show current SwarmAI settings
`/set key val`	Change a setting remotely (e.g., `/set provider openai`)
`/list [what]`	List available providers, models, languages, or personas
`/doctor`	Run diagnostics and show system info
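
The optional interval in `/loop N [M] cmd` means the second number has to be told apart from the start of the command. One plausible reading is sketched below. The default interval of 0 and the tie-breaking rule are assumptions, since the bot's actual parser is not documented here.

```python
def parse_loop(text):
    """Parse '/loop N [M] cmd' into (count, interval_minutes, command)."""
    parts = text.split()
    if len(parts) < 3 or parts[0] != "/loop" or not parts[1].isdigit():
        raise ValueError("usage: /loop N [M] cmd")
    count = int(parts[1])
    # Treat a second number as the interval only if a command follows it.
    if parts[2].isdigit() and len(parts) > 3:
        interval, rest = int(parts[2]), parts[3:]
    else:
        interval, rest = 0, parts[2:]
    return count, interval, " ".join(rest)


print(parse_loop("/loop 5 10 Like 3 posts on instagram.com"))
print(parse_loop("/loop 5 Like 3 posts"))
```

Under this reading, a command that itself begins with a number needs an explicit interval in front of it to avoid being misparsed.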

Usage Examples

Running a Task

You: /run Go to instagram.com and like 3 posts
Bot: ✅ Task sent to Browser #1
Bot: [Step 1] Navigating to instagram.com...
Bot: [Step 5] Task complete. Liked 3 posts.

Taking Screenshots

You: /ss
Bot: [Screenshot of Browser #1]

You: /ss all
Bot: [Screenshot of Browser #1]
Bot: [Screenshot of Browser #2]
Bot: [Screenshot of Browser #3]

Looping a Task

You: /loop 5 10 Like 3 posts on instagram.com
Bot: 🔄 Loop started: 5 cycles, 10 min interval
Bot: [Cycle 1/5] Running...
Bot: [Cycle 1/5] Done
Bot: ⏲ Next cycle in 10 minutes...

Quick Text Commands

You can also just type plain text (without /run) and the bot will treat it as a task:

You: Search for "best laptops 2026" on Google
Bot: ✅ Task sent to Browser #1

The bot sends real-time progress updates as the AI agent works. You'll see each step, tool call, and result in the chat.

17. Settings Reference

AI Model

Setting	Default	Description
Provider	Anthropic	LLM provider selection
API Key	—	Your provider's API key
Model	—	Specific model to use
Prompt Caching	ON	Cache system prompts (Anthropic only)

Agent

Setting	Default	Range	Description
Max Steps	30	10-100	Maximum actions per task
Action Delay	0s	0-5s	Pause between actions

Display

Setting	Default	Description
Language	English	UI language
Browser Panel Size	100%	Scale browser panels (40-200%)
Font Scale	100%	UI text size (50-200%)
Homepage	google.com	Default page for new browsers

18. Troubleshooting & FAQ

Browser Issues

Symptom	Solution
Browser won't start	Run `playwright install chromium` in terminal.
Black/blank screencast	Restart the browser. Check if CDP port is available.
Browser crashes frequently	Check available RAM. Reduce number of active browsers.
Slow performance	Hide browser panels you don't need. Close unused tabs.

AI Agent Issues

Symptom	Solution
Agent does nothing	Check API key. Verify internet. Check Activity for errors.
Agent clicks wrong elements	Enable Screenshot mode (AUTO or ALWAYS).
Agent stuck in loop	Click Stop. Try rephrasing your command.
"Max steps reached"	Increase Max Steps in Settings (up to 100).

Frequently Asked Questions

Q: How much does the LLM API cost?

A typical task (10-15 steps) costs ~$0.01-0.03 with Claude Sonnet, ~$0.005-0.01 with GPT-4o-mini. Screenshots add ~$0.01-0.02 each. Token costs are shown in the status bar.
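
For budgeting, the per-task and per-screenshot figures above can be combined into a rough range. The helper below is plain arithmetic on the Claude Sonnet numbers quoted in this answer. Actual costs depend on page size, model, and caching.

```python
def estimate_cost(tasks, per_task=(0.01, 0.03),
                  screenshots_per_task=0, per_screenshot=(0.01, 0.02)):
    """Return a (low, high) cost estimate in USD for `tasks` tasks."""
    low = tasks * (per_task[0] + screenshots_per_task * per_screenshot[0])
    high = tasks * (per_task[1] + screenshots_per_task * per_screenshot[1])
    return round(low, 2), round(high, 2)


print(estimate_cost(100))                          # prints: (1.0, 3.0)
print(estimate_cost(100, screenshots_per_task=2))  # prints: (3.0, 7.0)
```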

Q: Can I use SwarmAI offline?

No. Internet is required for web browsing and for cloud LLM APIs. Running a local model via Ollama with the Custom provider removes the dependency on an external LLM API, but the browsers still need a connection.

Q: How many browsers can I run?

No hard limit. Practical limits depend on RAM and CPU. Typical users run 5-30 browsers per PC.

Q: Is my API key stored securely?

API keys are stored locally in settings.json in your AppData folder. They are never sent to SwarmAI servers.

Q: Do browser sessions persist?

Yes. All cookies, logins, local storage, and history are saved per browser profile and persist across app restarts.
