Agentic Browser Automation Without Giving AI Your Login

Figure: The useful split. Humans keep the login, AI sends structured data, and the browser runs in a contained service.

Quick Answer

If you only need a one-off browsing session, use the browser already built into ChatGPT, Codex, Claude Desktop, or your favorite AI tool.

This tutorial is for a different problem: agentic automation that needs to run on a schedule or from a queue. Think daily form drafts, recurring portal updates, internal admin chores, or a workflow that should keep running on a server while your laptop is closed.

scheduler or queue
  -> agentic workflow
  -> your small wrapper API
  -> Selenium WebDriver
  -> browser container
  -> logged-in website

You log in yourself through a visible noVNC browser. The AI only sends structured requests to your wrapper, such as "create this draft form" or "take a screenshot." That keeps login out of the AI conversation and avoids letting a computer-use agent operate your personal machine.

The pattern is useful when a site requires login, has no public API, and you want an agentic workflow that can run on a small server instead of depending on your PC being awake 24/7.

The Problem

Many useful workflows are trapped behind websites.

Maybe it is a partner portal, an internal admin page, a school form, a membership site, or some boring dashboard that only exists in a browser. You want AI to prepare the data and fill the form, but the website does not offer an API.

The tempting shortcut is dangerous:

Give the AI my login
Ask it to open the website
Let it drive my browser

That mixes too many responsibilities. The AI sees secrets. The AI can click anywhere. Your laptop becomes part of the production workflow. If the workflow should run overnight or while you are away, your personal computer now has to stay on.

A better split is:

Human owns login.
Browser container owns the session.
Wrapper API owns the allowed actions.
Agent owns the recurring workflow.

That is the design this tutorial builds.

Why Not Just Use ChatGPT's Browser?

Built-in AI browsers are good for interactive work. You are present. You ask a question. The tool opens pages, reads, clicks, and reports back.

Agentic automation has a different shape. It may run every morning at 8:00. It may pick up jobs from a queue. It may retry after a website timeout. It may need to create a draft while you are asleep and notify you only when the browser session needs login.

Use case	Built-in AI browser	Wrapper API plus Selenium
One-off research or browsing	Good fit	Usually unnecessary
Daily or event-driven automation	Weak fit	Good fit
Runs while your PC is off	No	Yes, on a server
Stable tool contract for an agent	Limited	Explicit HTTP API
Login handling	Easy to mix with the chat	Human logs in through noVNC

The point is not that built-in browsers are bad. They solve the interactive case. This pattern solves the automation case.

A Daily Automation Example

Imagine a private vendor portal with no API. Every weekday morning, you want an agentic workflow to prepare draft requests based on yesterday's operational notes.

8:00 AM cron trigger
  -> fetch yesterday's notes from your database or inbox
  -> AI writes the request subject and details
  -> agent calls POST /portal/draft-request with dry_run=true
  -> wrapper fills the logged-in browser form
  -> wrapper returns screenshot and status
  -> agent notifies you if review, login, or submission approval is needed

That is different from asking a chat tool to browse once. The agent has a recurring job, a retry path, a tool contract, and a way to stop when human authentication or approval is required.

For a first version, keep the final action human-reviewed. Let the agent create drafts. Let the wrapper return screenshots. Let a human approve the final submit step when the action matters.
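The morning job above can be sketched in a few functions. This is a minimal illustration, assuming the wrapper built later in this tutorial is listening on 127.0.0.1:8088. The function names (`build_payload`, `call_wrapper`, `next_step`) and the notification wording are hypothetical; the endpoint path, the bearer header, and the status values match the wrapper shown in Step 5.

```python
import json
import urllib.request

WRAPPER_URL = "http://127.0.0.1:8088/portal/draft-request"  # assumed local address


def build_payload(subject: str, details: str, priority: str = "normal") -> dict:
    """Shape AI-written text into the wrapper's input, trimmed to its limits."""
    return {
        "subject": subject.strip()[:120],
        "details": details.strip()[:4000],
        "priority": priority,
        "dry_run": True,  # the scheduled job only prepares drafts
    }


def call_wrapper(payload: dict, token: str, url: str = WRAPPER_URL) -> dict:
    """POST the payload to the wrapper and return its JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def next_step(result: dict) -> str:
    """Map a wrapper response to what the job should do next."""
    status = result.get("status")
    if status == "needs_login":
        return "notify: log in through noVNC, then rerun the job"
    if status == "filled":
        return "notify: draft filled, review the screenshot"
    if status == "draft_saved":
        return "done"
    return "notify: unexpected response, needs review"
```

A cron entry or queue worker would call `next_step(call_wrapper(build_payload(subject, details), token))` and forward any `notify:` result to whatever channel you already use.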

What You Are Building

For the tutorial, the target website is still a private vendor portal with a "New Request" form. The form has a subject, details, priority, and a "Save Draft" button. The real website could be different, but the architecture stays the same.

Piece	Job
Scheduler or queue	Starts the automation daily or when a job arrives
Agentic workflow	Prepares data, calls tools, handles retry and review states
Wrapper API	Converts safe HTTP requests into Selenium actions
Selenium container	Runs Chrome and exposes WebDriver on `4444`
noVNC	Lets you see and control the browser on `7900`

The official Docker Selenium README documents the standalone browser images, WebDriver port, noVNC port, and the common --shm-size 2g setup. The tutorial below wraps that browser in a smaller API so your AI does not talk to WebDriver directly.

What You Need

Docker and Docker Compose
A website account you are allowed to automate
A basic terminal
An agentic workflow runner that can make HTTP requests, such as n8n, cron plus a worker, or an agent framework

Step 1: Start A Visible Browser Container

Create a folder:

bash

mkdir browser-automation-wrapper
cd browser-automation-wrapper

Create docker-compose.yml:

yaml

services:
  selenium:
    image: selenium/standalone-chrome:latest
    shm_size: "2g"
    ports:
      - "127.0.0.1:4444:4444"
      - "127.0.0.1:7900:7900"
    environment:
      SE_VNC_PASSWORD: "change-this-password"

Start it:

bash

docker compose up -d

Open the live browser:

http://127.0.0.1:7900/?autoconnect=1&resize=scale&password=change-this-password

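Use that browser to log in to your target website yourself. Do not ask AI to log in. Do not paste your password into an AI chat. The standalone image shows an empty desktop until a WebDriver session opens Chrome, so the actual login happens after the wrapper's first request in Step 6. From then on, the logged-in session lives inside the container.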
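Before going further, it can help to confirm that the container is accepting WebDriver sessions. Selenium Grid serves a JSON status document at `/status` with a `value.ready` flag. The sketch below reads that flag; the helper names and the timeout are illustrative choices, not part of the original setup.

```python
import json
import urllib.request

STATUS_URL = "http://127.0.0.1:4444/status"  # the port bound in docker-compose.yml


def is_ready(status_document: dict) -> bool:
    """Return True when the Grid status payload reports ready."""
    value = status_document.get("value") or {}
    return bool(value.get("ready", False))


def fetch_status(url: str = STATUS_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the Grid status document."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `is_ready(fetch_status())` on the machine that runs Docker should return `True` a few seconds after `docker compose up -d`.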

Step 2: Keep WebDriver Private

WebDriver is powerful. If someone can reach it, they can control the browser. That is why the compose file binds 4444 and 7900 to 127.0.0.1.

Figure: The wrapper is the gate. Approved actions pass through; arbitrary browser control does not.

Do	Do Not
Bind Selenium to localhost	Expose `4444` to the public internet
Use noVNC only for manual login and debugging	Give noVNC access to an AI agent
Expose a narrow wrapper API	Let AI send arbitrary WebDriver commands

This is the main safety boundary. The AI should not receive a general browser-control endpoint. It should receive a small API with only the actions you approve.
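One way to keep this boundary from eroding is a small check on the published ports. The sketch below takes port mapping strings as they appear in docker-compose.yml and reports any that are not bound to 127.0.0.1. The function name and the plain string parsing are assumptions made for illustration; it understands only the short `host_ip:host_port:container_port` syntax and treats everything else as risky.

```python
def exposed_mappings(port_mappings: list[str]) -> list[str]:
    """Return the mappings that publish a port beyond 127.0.0.1."""
    risky = []
    for mapping in port_mappings:
        parts = mapping.split(":")
        # "127.0.0.1:4444:4444" has three parts. "4444:4444" and "4444" omit
        # the host IP, which Docker treats as all interfaces.
        if len(parts) != 3 or parts[0] != "127.0.0.1":
            risky.append(mapping)
    return risky
```

For the compose file in Step 1, `exposed_mappings(["127.0.0.1:4444:4444", "127.0.0.1:7900:7900"])` returns an empty list.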

Step 3: Define The Allowed Workflow

Before writing code, describe the exact task. For the example portal:

Allowed action:
  Create a draft request in the vendor portal.

Inputs:
  subject
  details
  priority
  dry_run

Rules:
  Never log in.
  Never change account settings.
  Never click Save Draft unless dry_run is false.
  Never click the final submit button; a human does that.
  Return needs_login if the browser is on the login page.
  Return a screenshot path for debugging.

This is where the wrapper becomes safer than a computer-use agent. The agent cannot decide to click account settings because your API does not expose that action.
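The same idea in code: a minimal sketch of an allowlist, where the only actions that exist are the ones registered on the server. The registry, the `dispatch` helper, and the placeholder handler are hypothetical and stand in for the real FastAPI routes built in Step 5. The length limits are the ones used in the prompt in Step 4.

```python
from typing import Callable

ALLOWED_PRIORITIES = {"low", "normal", "high"}


def create_draft_request(subject: str, details: str, priority: str = "normal",
                         dry_run: bool = True) -> dict:
    """Placeholder handler: validates inputs and reports what it would do."""
    if len(subject) > 120:
        raise ValueError("subject must be at most 120 characters")
    if len(details) > 4000:
        raise ValueError("details must be at most 4000 characters")
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError("priority must be low, normal, or high")
    return {"action": "create_draft_request", "dry_run": dry_run}


# The allowlist: nothing outside this dictionary can be requested.
ACTIONS: dict[str, Callable[..., dict]] = {
    "create_draft_request": create_draft_request,
}


def dispatch(action: str, **inputs) -> dict:
    """Run an allowlisted action, or refuse anything else."""
    handler = ACTIONS.get(action)
    if handler is None:
        return {"status": "rejected", "reason": f"unknown action: {action}"}
    return handler(**inputs)
```

A request such as `dispatch("change_account_settings")` is rejected because no handler with that name was ever registered, not because a filter caught it.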

Step 4: Ask AI To Generate The Wrapper

Here is a prompt you can give to your coding AI. Replace the fake portal URL and selector names with your real site details after you inspect the page.

Build a small FastAPI application that wraps Selenium Remote WebDriver.

Goal:
Create a narrow API for filling a logged-in website form. The website does not
provide an API. The user will log in manually through noVNC. The app must never
handle usernames, passwords, 2FA codes, or cookies directly.

Stack:
- Python 3.12
- FastAPI
- Selenium Python package
- Remote WebDriver at SELENIUM_URL, default http://selenium:4444/wd/hub

Endpoints:
- GET /health
- POST /portal/draft-request
- GET /debug/screenshot

Security requirements:
- Require Authorization: Bearer $AUTOMATION_API_TOKEN on every endpoint except /health.
- Do not accept arbitrary CSS selectors, URLs, or JavaScript from the API caller.
- Keep selectors in a server-side SELECTORS dictionary.
- If the browser is on a login page, return {"status": "needs_login"}.
- Default dry_run to true. In dry_run mode, fill fields but do not click Save Draft.
- Log actions, but never log secrets or full cookies.

Draft request input:
- subject: string, max 120 chars
- details: string, max 4000 chars
- priority: one of low, normal, high
- dry_run: boolean, default true

Selenium behavior:
- Reuse one browser session while the app is alive.
- Use WebDriverWait and expected_conditions, not blind sleeps.
- Navigate to https://example-portal.invalid/requests/new.
- Fill subject, details, and priority.
- Click Save Draft only when dry_run is false.
- Return status, current_url, page_title, and screenshot_file.

Operational behavior:
- Provide Dockerfile and docker-compose.yml.
- In docker-compose.yml, bind wrapper API to 127.0.0.1:8088 for local testing.
- Selenium and noVNC must not be exposed publicly.
- Add clear comments only where intent is not obvious.

That prompt matters more than the framework. It tells the AI where the boundary is: no login handling, no arbitrary browser control, no public WebDriver.

Step 5: Minimal Wrapper Example

A simple generated wrapper might look like this. Create form-wrapper/main.py:

python

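import hmac
import os
from typing import Literal

from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel, Field
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

SELENIUM_URL = os.getenv("SELENIUM_URL", "http://selenium:4444/wd/hub")
API_TOKEN = os.environ["AUTOMATION_API_TOKEN"]
PORTAL_DRAFT_URL = "https://example-portal.invalid/requests/new"

# Selectors live on the server so the API caller can never choose them.
SELECTORS = {
    "subject": "#request_subject",
    "details": "#request_details",
    "priority": "#request_priority",
    "save_draft": "button[data-action='save-draft']",
}

app = FastAPI()
driver = None  # one shared WebDriver session, created on first use


class DraftRequest(BaseModel):
    subject: str = Field(max_length=120)
    details: str = Field(max_length=4000)
    priority: Literal["low", "normal", "high"] = "normal"
    dry_run: bool = True


def require_token(authorization: str | None) -> None:
    """Reject the request unless it carries the expected bearer token."""
    expected = f"Bearer {API_TOKEN}".encode()
    supplied = (authorization or "").encode()
    # compare_digest avoids leaking how much of the token matched via timing.
    if not hmac.compare_digest(supplied, expected):
        raise HTTPException(status_code=401, detail="Unauthorized")


def browser():
    """Return the shared session, starting a new one if the old one has died."""
    global driver
    if driver is not None:
        try:
            _ = driver.title  # cheap round trip that fails on a dead session
        except WebDriverException:
            driver = None
    if driver is None:
        # A fresh session is not logged in, so the next request
        # will report needs_login until a human signs in again.
        options = webdriver.ChromeOptions()
        driver = webdriver.Remote(command_executor=SELENIUM_URL, options=options)
    return driver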


def on_login_page(current_url: str, title: str) -> bool:
    text = f"{current_url} {title}".lower()
    return "login" in text or "sign-in" in text or "signin" in text


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/portal/draft-request")
def draft_request(payload: DraftRequest, authorization: str | None = Header(default=None)):
    require_token(authorization)

    page = browser()
    wait = WebDriverWait(page, 20)
    page.get(PORTAL_DRAFT_URL)

    if on_login_page(page.current_url, page.title):
        return {"status": "needs_login", "current_url": page.current_url}

    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, SELECTORS["subject"]))).clear()
    page.find_element(By.CSS_SELECTOR, SELECTORS["subject"]).send_keys(payload.subject)

    page.find_element(By.CSS_SELECTOR, SELECTORS["details"]).clear()
    page.find_element(By.CSS_SELECTOR, SELECTORS["details"]).send_keys(payload.details)

    Select(page.find_element(By.CSS_SELECTOR, SELECTORS["priority"])).select_by_value(payload.priority)

    if not payload.dry_run:
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, SELECTORS["save_draft"]))).click()

    screenshot_file = "/tmp/latest-screenshot.png"
    page.save_screenshot(screenshot_file)

    return {
        "status": "filled" if payload.dry_run else "draft_saved",
        "current_url": page.current_url,
        "page_title": page.title,
        "screenshot_file": screenshot_file,
    }


@app.get("/debug/screenshot")
def screenshot(authorization: str | None = Header(default=None)):
    require_token(authorization)
    page = browser()
    screenshot_file = "/tmp/latest-screenshot.png"
    page.save_screenshot(screenshot_file)
    return {"status": "ok", "screenshot_file": screenshot_file}

This is intentionally narrow. It cannot browse arbitrary URLs. It cannot run arbitrary JavaScript. It cannot fill arbitrary selectors chosen by the AI. You edit the server-side selectors after inspecting the target page.
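The `on_login_page` check above matches substrings anywhere in the URL and title. That is simple, but it can misfire, for example on a query string such as `?next=/login`. A slightly stricter sketch compares only the URL path against known login paths. The path list is an assumption; replace it with the paths your site actually redirects to.

```python
from urllib.parse import urlparse

# Hypothetical login paths; adjust after watching a real redirect in noVNC.
LOGIN_PATH_PREFIXES = ("/login", "/signin", "/sign-in", "/sso")


def on_login_path(current_url: str) -> bool:
    """Return True when the URL path is, or sits under, a known login path."""
    path = urlparse(current_url).path.lower().rstrip("/")
    return any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in LOGIN_PATH_PREFIXES
    )
```

The trade-off is that a path-only check misses sites that render the login form on the original URL, so keeping the title check as a fallback may still be worthwhile.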

Create form-wrapper/Dockerfile:

dockerfile

FROM python:3.12-slim

WORKDIR /app
RUN pip install --no-cache-dir fastapi uvicorn selenium pydantic

COPY main.py /app/main.py
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8088"]

Update docker-compose.yml:

yaml

services:
  selenium:
    image: selenium/standalone-chrome:latest
    shm_size: "2g"
    ports:
      - "127.0.0.1:4444:4444"
      - "127.0.0.1:7900:7900"
    environment:
      SE_VNC_PASSWORD: "change-this-password"

  form-wrapper:
    build: ./form-wrapper
    depends_on:
      - selenium
    ports:
      - "127.0.0.1:8088:8088"
    environment:
      SELENIUM_URL: "http://selenium:4444/wd/hub"
      AUTOMATION_API_TOKEN: "${AUTOMATION_API_TOKEN}"

Create .env:

bash

AUTOMATION_API_TOKEN=replace-this-with-a-long-random-token

Restart with the wrapper:

bash

docker compose up -d --build

Step 6: Test It Without AI

Load the token into your terminal:

bash

set -a
. ./.env
set +a

Call the wrapper:

bash

curl -X POST http://127.0.0.1:8088/portal/draft-request \
  -H "Authorization: Bearer $AUTOMATION_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "subject": "Quarterly vendor check-in",
    "details": "Please prepare a draft request for the next vendor review.",
    "priority": "normal",
    "dry_run": true
  }'

If the response says needs_login, open noVNC and log in manually. Then run the same request again.

Keep dry_run enabled until you have watched the browser fill the correct fields. Only then should you allow the wrapper to click "Save Draft." For sensitive workflows, keep the final submit button outside automation and let a human review.

Step 7: Let The Agent Call The Wrapper

After the wrapper works by hand, your agentic workflow can call it like any other HTTP API. This could be an n8n workflow, a cron job, a small worker process, a queue consumer, or an agent framework that supports HTTP tools. Give the agent tool instructions along these lines:

You can create draft requests through this API:

POST http://127.0.0.1:8088/portal/draft-request
Authorization: Bearer $AUTOMATION_API_TOKEN
Content-Type: application/json

Rules:
- Never ask for my website password.
- Never ask for 2FA codes.
- Never call Selenium or noVNC directly.
- If the API returns needs_login, tell me to log in through noVNC.
- Use dry_run=true unless I explicitly approve saving the draft.

The AI now handles language and structure. Your wrapper handles browser mechanics. The logged-in browser handles the website. Those boundaries make the workflow easier to reason about.

The important agent behavior is the stop condition. If the wrapper returns needs_login, the agent should not try to solve login by asking for credentials. It should notify you, pause the job, and continue only after you have logged in through noVNC.
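A minimal sketch of that stop condition, assuming your agent runner lets you wrap the tool call in ordinary Python. `call` and `notify` are hypothetical callables you would supply; the retry count and backoff are illustrative values.

```python
import time
from typing import Callable


def run_job(call: Callable[[], dict], notify: Callable[[str], None],
            max_attempts: int = 3, backoff_seconds: float = 5.0) -> str:
    """Call the wrapper, retrying transient failures and pausing on needs_login."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = call()
        except OSError as error:
            # Network errors and timeouts are worth a retry.
            if attempt == max_attempts:
                notify(f"Wrapper unreachable after {attempt} attempts: {error}")
                return "failed"
            time.sleep(backoff_seconds * attempt)
            continue

        if result.get("status") == "needs_login":
            # Stop condition: never ask for credentials, hand over to the human.
            notify("Session expired. Log in through noVNC, then resume the job.")
            return "paused"
        return "completed"
    return "failed"
```

The job returns `"paused"` without retrying when login is needed, because repeating the request cannot fix an expired session.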

Step 8: Move It To A Server

Figure: For 24/7 workflows, the server runs the browser; your laptop is only for login and debugging.

Local is the safest first test. For 24/7 automation, move the same Docker Compose stack to a small VPS.

Local setup	Server setup
Use `127.0.0.1:7900` directly	Use an SSH tunnel to reach noVNC
Wrapper API bound to localhost	Expose wrapper through HTTPS, VPN, or Tailscale
Your laptop must stay on	The VPS keeps the browser session alive

On a remote server, do not expose noVNC publicly. Tunnel to it when you need to log in or debug:

bash

ssh -L 7900:127.0.0.1:7900 user@your-server

Then open this on your own machine:

http://127.0.0.1:7900/?autoconnect=1&resize=scale&password=change-this-password

If your AI platform needs to call the wrapper from outside the server, expose only the wrapper API, protect it with HTTPS and a strong bearer token, and consider IP allowlists or a private network. Keep 4444 and 7900 private.
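After deploying, a quick probe from a machine outside the server can confirm that only the wrapper is reachable. The sketch below attempts a TCP connection; the function name and timeout are illustrative.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from your laptop without the SSH tunnel, `port_is_open("your-server", 4444)` and `port_is_open("your-server", 7900)` should both return `False`, where `your-server` is the same placeholder used in the SSH command.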

Session Persistence

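The beginner version keeps the browser session alive while the Selenium container stays running and the WebDriver session stays open. That is often enough for a first workflow. Be aware that Docker Selenium closes idle sessions after a timeout. The README documents this as SE_NODE_SESSION_TIMEOUT, 300 seconds by default at the time of writing, so a job that runs once a day will likely need a longer timeout or will return needs_login each morning. Check the README for the current default.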

You can mount a Chrome profile volume later if you need login to survive container restarts, but test it carefully. Some websites expire cookies, require fresh 2FA, or invalidate sessions after IP changes. Chrome profile locks can also make restarts annoying if the browser exits uncleanly.

Start with the simpler rule:

Keep the container running.
Return needs_login when the session expires.
Let the human re-authenticate through noVNC.

Guardrails Worth Keeping

Risk	Guardrail
AI sees credentials	Human logs in through noVNC; wrapper never accepts passwords
AI clicks the wrong thing	Wrapper exposes allowlisted actions, not full browser control
Website layout changes	Use explicit selectors, screenshots, and `needs_review` responses
Automation submits bad data	Default to `dry_run` and save drafts before final submit
Public remote-control endpoint	Keep WebDriver and noVNC bound to localhost or private network
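The table mentions a `needs_review` response, which the minimal wrapper in Step 5 does not implement. One way to produce it, sketched under the assumption that the wrapper reads each field back after filling it (for example with Selenium's `get_attribute("value")`), is to compare the read-back values with what was requested. The function name and response shape are hypothetical.

```python
def review_status(expected: dict[str, str], actual: dict[str, str]) -> dict:
    """Compare requested field values with what the page actually contains."""
    mismatched = sorted(
        name for name, value in expected.items()
        if actual.get(name) != value
    )
    if mismatched:
        # A layout change often shows up as a field that did not take its value.
        return {"status": "needs_review", "mismatched_fields": mismatched}
    return {"status": "filled"}
```

Returning the mismatched field names alongside the screenshot gives the human reviewer a starting point when a selector has silently stopped matching.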

When Not To Use This

Do not use this pattern when the website has a proper API. Use the API. It will be more stable, easier to monitor, and less likely to break when the page changes.

Also avoid this pattern when the site forbids automation, uses CAPTCHA as a clear anti-automation boundary, or handles actions where one bad click is expensive. In those cases, AI can still prepare the data, but a human should perform the browser action.

The Useful Boundary

The important idea is not Selenium. Selenium is replaceable. The useful idea is the boundary.

AI should not own your login.
AI should not own your whole browser.
AI can own structured draft data.
Your API should decide what browser actions are allowed.

That is a calmer way to automate websites without APIs. You still get AI-generated drafts, 24/7 server-side execution, and visible browser debugging. But you do not have to hand an AI agent your password or let it operate your personal computer.
