Agentic Browser Automation Without Giving AI Your Login

A beginner-friendly tutorial for building scheduled or queue-driven agentic browser automation with Selenium, noVNC manual login, and a narrow API wrapper so agents can draft website forms without seeing credentials or controlling your PC.

The useful split: humans keep the login, AI sends structured data, and the browser runs in a contained service.

Quick Answer

If you only need a one-off browsing session, use the browser already built into ChatGPT, Codex, Claude Desktop, or your favorite AI tool.

This tutorial is for a different problem: agentic automation that needs to run on a schedule or from a queue. Think daily form drafts, recurring portal updates, internal admin chores, or a workflow that should keep running on a server while your laptop is closed.

scheduler or queue
  -> agentic workflow
  -> your small wrapper API
  -> Selenium WebDriver
  -> browser container
  -> logged-in website

You log in yourself through a visible noVNC browser. The AI only sends structured requests to your wrapper, such as "create this draft form" or "take a screenshot." That keeps login out of the AI conversation and avoids letting a computer-use agent operate your personal machine.

The pattern is useful when a site requires login, has no public API, and you want an agentic workflow that can run on a small server instead of depending on your PC being awake 24/7.

The Problem

Many useful workflows are trapped behind websites.

Maybe it is a partner portal, an internal admin page, a school form, a membership site, or some boring dashboard that only exists in a browser. You want AI to prepare the data and fill the form, but the website does not offer an API.

The tempting shortcut is dangerous:

Give the AI my login
Ask it to open the website
Let it drive my browser

That mixes too many responsibilities. The AI sees secrets. The AI can click anywhere. Your laptop becomes part of the production workflow. If the workflow should run overnight or while you are away, your personal computer now has to stay on.

A better split is:

Human owns login.
Browser container owns the session.
Wrapper API owns the allowed actions.
Agent owns the recurring workflow.

That is the design this tutorial builds.

Why Not Just Use ChatGPT's Browser?

Built-in AI browsers are good for interactive work. You are present. You ask a question. The tool opens pages, reads, clicks, and reports back.

Agentic automation has a different shape. It may run every morning at 8:00. It may pick up jobs from a queue. It may retry after a website timeout. It may need to create a draft while you are asleep and notify you only when the browser session needs login.

| Use case | Built-in AI browser | Wrapper API plus Selenium |
| --- | --- | --- |
| One-off research or browsing | Good fit | Usually unnecessary |
| Daily or event-driven automation | Weak fit | Good fit |
| Runs while your PC is off | No | Yes, on a server |
| Stable tool contract for an agent | Limited | Explicit HTTP API |
| Login handling | Easy to mix with the chat | Human logs in through noVNC |

The point is not that built-in browsers are bad. They solve the interactive case. This pattern solves the automation case.

A Daily Automation Example

Imagine a private vendor portal with no API. Every weekday morning, you want an agentic workflow to prepare draft requests based on yesterday's operational notes.

8:00 AM cron trigger
  -> fetch yesterday's notes from your database or inbox
  -> AI writes the request subject and details
  -> agent calls POST /portal/draft-request with dry_run=true
  -> wrapper fills the logged-in browser form
  -> wrapper returns screenshot and status
  -> agent notifies you if review, login, or submission approval is needed

That is different from asking a chat tool to browse once. The agent has a recurring job, a retry path, a tool contract, and a way to stop when human authentication or approval is required.
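The middle of that flow, turning notes into a structured wrapper call, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the author's implementation: `summarize_notes` is a hypothetical stand-in for the AI step, `call_wrapper` is a hypothetical injected function that would perform the HTTP POST, and the length limits mirror the draft request limits listed in Step 4.

```python
from typing import Callable, Optional

SUBJECT_MAX = 120    # limits mirror the draft request input in Step 4
DETAILS_MAX = 4000


def summarize_notes(notes: list[str]) -> tuple[str, str]:
    """Hypothetical stand-in for the AI step: returns (subject, details)."""
    subject = f"Daily request draft ({len(notes)} notes)"
    details = "\n".join(f"- {note}" for note in notes)
    return subject, details


def build_payload(notes: list[str]) -> dict:
    """Turn raw notes into the structured body the wrapper expects."""
    subject, details = summarize_notes(notes)
    return {
        "subject": subject[:SUBJECT_MAX],
        "details": details[:DETAILS_MAX],
        "priority": "normal",
        "dry_run": True,  # the scheduled job never saves on its own
    }


def run_daily_job(notes: list[str],
                  call_wrapper: Callable[[dict], dict]) -> Optional[dict]:
    """Run one scheduled pass; returns the wrapper response, or None if idle."""
    if not notes:
        return None  # nothing happened yesterday, so no draft is needed
    return call_wrapper(build_payload(notes))
```

Injecting `call_wrapper` keeps the job testable without a browser: a test can pass a function that records the payload and returns a canned response.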

For a first version, keep the final action human-reviewed. Let the agent create drafts. Let the wrapper return screenshots. Let a human approve the final submit step when the action matters.

What You Are Building

For the tutorial, the target website is still a private vendor portal with a "New Request" form. The form has a subject, details, priority, and a "Save Draft" button. The real website could be different, but the architecture stays the same.

| Piece | Job |
| --- | --- |
| Scheduler or queue | Starts the automation daily or when a job arrives |
| Agentic workflow | Prepares data, calls tools, handles retry and review states |
| Wrapper API | Converts safe HTTP requests into Selenium actions |
| Selenium container | Runs Chrome and exposes WebDriver on 4444 |
| noVNC | Lets you see and control the browser on 7900 |

The official Docker Selenium README documents the standalone browser images, WebDriver port, noVNC port, and the common --shm-size 2g setup. The tutorial below wraps that browser in a smaller API so your AI does not talk to WebDriver directly.

What You Need

  • Docker and Docker Compose
  • A website account you are allowed to automate
  • A basic terminal
  • An agentic workflow runner that can make HTTP requests, such as n8n, cron plus a worker, or an agent framework

Step 1: Start A Visible Browser Container

Create a folder:

bash
mkdir browser-automation-wrapper
cd browser-automation-wrapper

Create docker-compose.yml:

yaml
services:
  selenium:
    image: selenium/standalone-chrome:latest
    shm_size: "2g"
    ports:
      - "127.0.0.1:4444:4444"
      - "127.0.0.1:7900:7900"
    environment:
      SE_VNC_PASSWORD: "change-this-password"

Start it:

bash
docker compose up -d

Open the live browser:

http://127.0.0.1:7900/?autoconnect=1&resize=scale&password=change-this-password

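At this point noVNC usually shows an empty desktop, because the standalone image opens a browser window only when a WebDriver session starts. Once the wrapper from Step 5 has opened a session, use that browser window to log in to your target website yourself. Do not ask AI to log in. Do not paste your password into an AI chat. The logged-in session then lives inside the container.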

Step 2: Keep WebDriver Private

WebDriver is powerful. If someone can reach it, they can control the browser. That is why the compose file binds 4444 and 7900 to 127.0.0.1.

The wrapper is the gate: approved actions pass through, arbitrary browser control does not.

| Do | Do Not |
| --- | --- |
| Bind Selenium to localhost | Expose 4444 to the public internet |
| Use noVNC only for manual login and debugging | Give noVNC access to an AI agent |
| Expose a narrow wrapper API | Let AI send arbitrary WebDriver commands |

This is the main safety boundary. The AI should not receive a general browser-control endpoint. It should receive a small API with only the actions you approve.
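One way to keep that boundary from eroding is a small check on the port mappings before deploying. The helper below is a hypothetical sketch, not part of Docker or Selenium: it only understands the IPv4 short syntax used in this tutorial's compose file (`IP:HOST_PORT:CONTAINER_PORT`) and treats anything else as unsafe.

```python
import ipaddress


def is_loopback_only(port_mapping: str) -> bool:
    """Return True if a Compose short-syntax port mapping binds to loopback.

    Accepts strings such as "127.0.0.1:4444:4444". A mapping without a host
    IP ("4444:4444") publishes on all interfaces, so it is reported as unsafe.
    IPv6 and long-syntax mappings are not handled and also return False.
    """
    parts = port_mapping.split(":")
    if len(parts) != 3:
        return False
    try:
        return ipaddress.ip_address(parts[0]).is_loopback
    except ValueError:
        return False


# The two mappings from this tutorial's compose file pass the check.
for mapping in ("127.0.0.1:4444:4444", "127.0.0.1:7900:7900"):
    assert is_loopback_only(mapping)
```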

Step 3: Define The Allowed Workflow

Before writing code, describe the exact task. For the example portal:

Allowed action:
  Create a draft request in the vendor portal.

Inputs:
  subject
  details
  priority
  dry_run

Rules:
  Never log in.
  Never change account settings.
  Never click Save Draft unless dry_run is false, and never submit the final form.
  Return needs_login if the browser is on the login page.
  Return a screenshot path for debugging.

This is where the wrapper becomes safer than a computer-use agent. The agent cannot decide to click account settings because your API does not expose that action.
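Those rules are small enough to write down as a pure function before any Selenium code exists. The sketch below is illustrative only: the step names are hypothetical labels, not Selenium calls, and the function simply returns which steps the wrapper would be allowed to perform for a given state.

```python
def plan_actions(on_login_page: bool, dry_run: bool) -> list[str]:
    """Return the ordered steps the wrapper may perform for one request."""
    if on_login_page:
        # The wrapper never attempts to log in; it reports and stops.
        return ["return_needs_login"]

    steps = ["fill_subject", "fill_details", "select_priority"]
    if not dry_run:
        # Saving the draft is the only click the workflow allows.
        steps.append("click_save_draft")
    steps.append("take_screenshot")
    return steps
```

Anything that is not in the returned list, such as opening account settings or pressing a final submit button, has no code path at all. That is the property the wrapper in Step 5 should preserve.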

Step 4: Ask AI To Generate The Wrapper

Here is a prompt you can give to your coding AI. Replace the fake portal URL and selector names with your real site details after you inspect the page.

Build a small FastAPI application that wraps Selenium Remote WebDriver.

Goal:
Create a narrow API for filling a logged-in website form. The website does not
provide an API. The user will log in manually through noVNC. The app must never
handle usernames, passwords, 2FA codes, or cookies directly.

Stack:
- Python 3.12
- FastAPI
- Selenium Python package
- Remote WebDriver at SELENIUM_URL, default http://selenium:4444/wd/hub

Endpoints:
- GET /health
- POST /portal/draft-request
- GET /debug/screenshot

Security requirements:
- Require Authorization: Bearer $AUTOMATION_API_TOKEN on every endpoint except /health.
- Do not accept arbitrary CSS selectors, URLs, or JavaScript from the API caller.
- Keep selectors in a server-side SELECTORS dictionary.
- If the browser is on a login page, return {"status": "needs_login"}.
- Default dry_run to true. In dry_run mode, fill fields but do not click Save Draft.
- Log actions, but never log secrets or full cookies.

Draft request input:
- subject: string, max 120 chars
- details: string, max 4000 chars
- priority: one of low, normal, high
- dry_run: boolean, default true

Selenium behavior:
- Reuse one browser session while the app is alive.
- Use WebDriverWait and expected_conditions, not blind sleeps.
- Navigate to https://example-portal.invalid/requests/new.
- Fill subject, details, and priority.
- Click Save Draft only when dry_run is false.
- Return status, current_url, page_title, and screenshot_file.

Operational behavior:
- Provide Dockerfile and docker-compose.yml.
- In docker-compose.yml, bind wrapper API to 127.0.0.1:8088 for local testing.
- Selenium and noVNC must not be exposed publicly.
- Add clear comments only where intent is not obvious.

That prompt matters more than the framework. It tells the AI where the boundary is: no login handling, no arbitrary browser control, no public WebDriver.
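The bearer-token requirement in that prompt is worth checking by hand in whatever the AI generates. A minimal sketch of one reasonable check, assuming the header arrives as a plain string: `token_is_valid` is a hypothetical helper name, and `hmac.compare_digest` from the standard library is used so the comparison time does not depend on how many leading characters match.

```python
import hmac
from typing import Optional


def token_is_valid(authorization_header: Optional[str], expected_token: str) -> bool:
    """Check an Authorization header of the form 'Bearer <token>'."""
    if not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # Constant-time comparison of the presented and expected tokens.
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected_token.encode("utf-8"))
```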

Step 5: Minimal Wrapper Example

A simple generated wrapper might look like this; what your coding AI produces will differ in the details. Create form-wrapper/main.py:

python
import os
from typing import Literal

from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel, Field
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

SELENIUM_URL = os.getenv("SELENIUM_URL", "http://selenium:4444/wd/hub")
API_TOKEN = os.environ["AUTOMATION_API_TOKEN"]
PORTAL_DRAFT_URL = "https://example-portal.invalid/requests/new"

SELECTORS = {
    "subject": "#request_subject",
    "details": "#request_details",
    "priority": "#request_priority",
    "save_draft": "button[data-action='save-draft']",
}

app = FastAPI()
driver = None


class DraftRequest(BaseModel):
    subject: str = Field(max_length=120)
    details: str = Field(max_length=4000)
    priority: Literal["low", "normal", "high"] = "normal"
    dry_run: bool = True


def require_token(authorization: str | None) -> None:
    if authorization != f"Bearer {API_TOKEN}":
        raise HTTPException(status_code=401, detail="Unauthorized")


def browser():
    global driver
    if driver is None:
        options = webdriver.ChromeOptions()
        driver = webdriver.Remote(command_executor=SELENIUM_URL, options=options)
    return driver


def on_login_page(current_url: str, title: str) -> bool:
    text = f"{current_url} {title}".lower()
    return "login" in text or "sign-in" in text or "signin" in text


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/portal/draft-request")
def draft_request(payload: DraftRequest, authorization: str | None = Header(default=None)):
    require_token(authorization)

    page = browser()
    wait = WebDriverWait(page, 20)
    page.get(PORTAL_DRAFT_URL)

    if on_login_page(page.current_url, page.title):
        return {"status": "needs_login", "current_url": page.current_url}

    # Wait for the form to render, then fill each field using server-side selectors.
    subject = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, SELECTORS["subject"])))
    subject.clear()
    subject.send_keys(payload.subject)

    details = page.find_element(By.CSS_SELECTOR, SELECTORS["details"])
    details.clear()
    details.send_keys(payload.details)

    # Assumes priority is a native <select> whose option values are low, normal, high.
    Select(page.find_element(By.CSS_SELECTOR, SELECTORS["priority"])).select_by_value(payload.priority)

    # Save Draft is the only click this endpoint can perform, and only when dry_run is false.
    if not payload.dry_run:
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, SELECTORS["save_draft"]))).click()

    screenshot_file = "/tmp/latest-screenshot.png"
    page.save_screenshot(screenshot_file)

    return {
        "status": "filled" if payload.dry_run else "draft_saved",
        "current_url": page.current_url,
        "page_title": page.title,
        "screenshot_file": screenshot_file,
    }


@app.get("/debug/screenshot")
def screenshot(authorization: str | None = Header(default=None)):
    require_token(authorization)
    page = browser()
    screenshot_file = "/tmp/latest-screenshot.png"
    page.save_screenshot(screenshot_file)
    return {"status": "ok", "screenshot_file": screenshot_file}

This is intentionally narrow. It cannot browse arbitrary URLs. It cannot run arbitrary JavaScript. It cannot fill arbitrary selectors chosen by the AI. You edit the server-side selectors after inspecting the target page.
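One part of the wrapper that deserves a second look is login detection. The `on_login_page` check matches the word "login" anywhere in the URL or title, so a query string or page title that merely mentions it would be reported as `needs_login`. A stricter variant, sketched below, looks only at URL path segments. The segment names are assumptions to adjust per site, and on sites that render the login form at the same URL as the target page, a server-side selector for the login form is likely a better signal.

```python
from urllib.parse import urlsplit

# Assumption: path segments that indicate a login page on the target site.
LOGIN_PATH_SEGMENTS = {"login", "signin", "sign-in"}


def looks_like_login_url(current_url: str) -> bool:
    """Return True if any path segment of the URL names a login page."""
    path = urlsplit(current_url).path.lower()
    segments = [segment for segment in path.split("/") if segment]
    return any(segment in LOGIN_PATH_SEGMENTS for segment in segments)
```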

Create form-wrapper/Dockerfile:

dockerfile
FROM python:3.12-slim

WORKDIR /app
RUN pip install --no-cache-dir fastapi uvicorn selenium pydantic

COPY main.py /app/main.py
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8088"]

Update docker-compose.yml:

yaml
services:
  selenium:
    image: selenium/standalone-chrome:latest
    shm_size: "2g"
    ports:
      - "127.0.0.1:4444:4444"
      - "127.0.0.1:7900:7900"
    environment:
      SE_VNC_PASSWORD: "change-this-password"

  form-wrapper:
    build: ./form-wrapper
    depends_on:
      - selenium
    ports:
      - "127.0.0.1:8088:8088"
    environment:
      SELENIUM_URL: "http://selenium:4444/wd/hub"
      AUTOMATION_API_TOKEN: "${AUTOMATION_API_TOKEN}"

Create .env:

bash
AUTOMATION_API_TOKEN=replace-this-with-a-long-random-token

Restart with the wrapper:

bash
docker compose up -d --build

Step 6: Test It Without AI

Load the token into your terminal:

bash
set -a
. ./.env
set +a

Call the wrapper:

bash
curl -X POST http://127.0.0.1:8088/portal/draft-request \
  -H "Authorization: Bearer $AUTOMATION_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "subject": "Quarterly vendor check-in",
    "details": "Please prepare a draft request for the next vendor review.",
    "priority": "normal",
    "dry_run": true
  }'

If the response says needs_login, open noVNC and log in manually. Then run the same request again.

Keep dry_run enabled until you have watched the browser fill the correct fields. Only then should you allow the wrapper to click "Save Draft." For sensitive workflows, keep the final submit button outside automation and let a human review.
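If your workflow runner is written in Python, the same request can be built with the standard library alone. This is a sketch equivalent to the curl call above, not a required client: `build_draft_request` and `send` are hypothetical helper names, and the URL assumes the local port binding from the compose file.

```python
import json
import urllib.request

WRAPPER_URL = "http://127.0.0.1:8088/portal/draft-request"


def build_draft_request(token: str, subject: str, details: str,
                        priority: str = "normal",
                        dry_run: bool = True) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({
        "subject": subject,
        "details": details,
        "priority": priority,
        "dry_run": dry_run,
    }).encode("utf-8")
    return urllib.request.Request(
        WRAPPER_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the wrapper's JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `send(build_draft_request(token, "Quarterly vendor check-in", "..."))` with the token read from `AUTOMATION_API_TOKEN` should return the same JSON as the curl command. The generous timeout leaves room for the wrapper's 20-second waits.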

Step 7: Let The Agent Call The Wrapper

After the wrapper works by hand, your agentic workflow can call it like any other HTTP API. This could be an n8n workflow, a cron job, a small worker process, a queue consumer, or an agent framework that supports HTTP tools.

You can create draft requests through this API:

POST http://127.0.0.1:8088/portal/draft-request
Authorization: Bearer $AUTOMATION_API_TOKEN
Content-Type: application/json

Rules:
- Never ask for my website password.
- Never ask for 2FA codes.
- Never call Selenium or noVNC directly.
- If the API returns needs_login, tell me to log in through noVNC.
- Use dry_run=true unless I explicitly approve saving the draft.

The AI now handles language and structure. Your wrapper handles browser mechanics. The logged-in browser handles the website. Those boundaries make the workflow easier to reason about.

The important agent behavior is the stop condition. If the wrapper returns needs_login, the agent should not try to solve login by asking for credentials. It should notify you, pause the job, and continue only after you have logged in through noVNC.
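That stop condition can be made explicit in the workflow code rather than left to the model's judgment. A minimal sketch, assuming the three status values returned by the example wrapper (`needs_login`, `filled`, `draft_saved`): the decision labels and the retry limit are hypothetical and would map onto whatever your runner supports.

```python
RETRY_LIMIT = 3  # assumption: tune to how flaky the target site is


def next_step(response: dict, attempt: int) -> str:
    """Decide what the workflow does after one wrapper call.

    `attempt` is the 1-based number of the call that produced `response`.
    """
    status = response.get("status")
    if status == "needs_login":
        # Never try to solve login; hand control back to the human.
        return "pause_and_notify"
    if status in ("filled", "draft_saved"):
        return "done"
    # Unknown or missing status: retry a bounded number of times, then stop.
    if attempt < RETRY_LIMIT:
        return "retry"
    return "pause_and_notify"
```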

Step 8: Move It To A Server

For 24/7 workflows, the server runs the browser; your laptop is only for login and debugging.

Local is the safest first test. For 24/7 automation, move the same Docker Compose stack to a small VPS.

| Local setup | Server setup |
| --- | --- |
| Use 127.0.0.1:7900 directly | Use an SSH tunnel to reach noVNC |
| Wrapper API bound to localhost | Expose wrapper through HTTPS, VPN, or Tailscale |
| Your laptop must stay on | The VPS keeps the browser session alive |

On a remote server, do not expose noVNC publicly. Tunnel to it when you need to log in or debug:

bash
ssh -L 7900:127.0.0.1:7900 user@your-server

Then open this on your own machine:

http://127.0.0.1:7900/?autoconnect=1&resize=scale&password=change-this-password

If your AI platform needs to call the wrapper from outside the server, expose only the wrapper API, protect it with HTTPS and a strong bearer token, and consider IP allowlists or a private network. Keep 4444 and 7900 private.

Session Persistence

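The beginner version keeps the browser session alive while the Selenium container stays running. That is often enough for a first workflow. One caveat: Selenium Grid ends sessions that sit idle past its session timeout, which the Docker Selenium images expose as SE_NODE_SESSION_TIMEOUT (300 seconds by default at the time of writing, so check the README for your image version). A job that runs once a day will likely need a longer timeout, or it will find a fresh, logged-out browser each morning.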

You can mount a Chrome profile volume later if you need login to survive container restarts, but test it carefully. Some websites expire cookies, require fresh 2FA, or invalidate sessions after IP changes. Chrome profile locks can also make restarts annoying if the browser exits uncleanly.

Start with the simpler rule:

Keep the container running.
Return needs_login when the session expires.
Let the human re-authenticate through noVNC.
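On the wrapper side, this rule implies noticing when the remote session has gone away and starting a new one, instead of failing every later request. The sketch below shows one way to do that under stated assumptions: `SessionHolder` is a hypothetical helper, `make_driver` would be a function that calls `webdriver.Remote(...)`, and the broad `except Exception` is used only so the sketch does not depend on Selenium's exception classes.

```python
from typing import Any, Callable, Optional


class SessionHolder:
    """Keep one driver alive and replace it when the remote session is gone."""

    def __init__(self, make_driver: Callable[[], Any]) -> None:
        self._make_driver = make_driver
        self._driver: Optional[Any] = None

    def get(self) -> tuple[Any, bool]:
        """Return (driver, is_fresh). A fresh driver has no login cookies yet."""
        if self._driver is not None:
            try:
                # Reading current_url issues a WebDriver command, which
                # fails once the remote session no longer exists.
                self._driver.current_url
                return self._driver, False
            except Exception:
                self._driver = None
        self._driver = self._make_driver()
        return self._driver, True
```

When `is_fresh` is true, the browser starts logged out, so the next navigation should land on the login page and the wrapper can return `needs_login` as usual.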

Guardrails Worth Keeping

| Risk | Guardrail |
| --- | --- |
| AI sees credentials | Human logs in through noVNC; wrapper never accepts passwords |
| AI clicks the wrong thing | Wrapper exposes allowlisted actions, not full browser control |
| Website layout changes | Use explicit selectors, screenshots, and needs_review responses |
| Automation submits bad data | Default to dry_run and save drafts before final submit |
| Public remote-control endpoint | Keep WebDriver and noVNC bound to localhost or private network |

When Not To Use This

Do not use this pattern when the website has a proper API. Use the API. It will be more stable, easier to monitor, and less likely to break when the page changes.

Also avoid this pattern when the site forbids automation, uses CAPTCHA as a clear anti-automation boundary, or handles actions where one bad click is expensive. In those cases, AI can still prepare the data, but a human should perform the browser action.

The Useful Boundary

The important idea is not Selenium. Selenium is replaceable. The useful idea is the boundary.

AI should not own your login.
AI should not own your whole browser.
AI can own structured draft data.
Your API should decide what browser actions are allowed.

That is a calmer way to automate websites without APIs. You still get AI-generated drafts, 24/7 server-side execution, and visible browser debugging. But you do not have to hand an AI agent your password or let it operate your personal computer.

License

Article text © 2026 Mark Huang. Licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) unless otherwise noted. Article text is licensed for non-commercial sharing with attribution to the original article URL. Commercial use requires prior written permission and must clearly cite the original source.

Code snippets, screenshots, third-party assets, and site source code may have separate terms.

Suggested attribution: Based on "Agentic Browser Automation Without Giving AI Your Login" by Mark Huang, originally published at https://markhuang.ai/blog/ai-browser-automation-without-sharing-login.
