How to Build an AI Agent That Actually Handles Boring Tasks for You

Ah, AI agents… the hottest trend in tech right now. Everyone's hyped about them being the future of work. After all, they can do it all and will automate most tasks to give us more time, right? Well… sort of.

The reality? Most agents get blocked by websites or get lost while trying to execute tasks. To actually make one that works, you need a best-in-class tech stack. Only the right combination of tools can turn an AI agent into a real task-automation machine.

Follow this tutorial and learn how to craft an AI agent that can truly automate tasks for you!

Why Most AI Agents Don’t Deliver

The dream of having AI automate tasks for us is exactly why AI agents were invented in the first place. It’s why “agentic AI” became a trend, and why the hype is still sky-high.

Imagine a world where all the tedious, repetitive stuff gets handled by AI so we can save time. Sounds perfect, right?

That way, we could focus on what really matters: stacking V-Bucks in Fortnite or grinding runes in Elden Ring.

Jokes aside, if you’ve ever played around with an AI agent like OpenAI Operator or tried building one yourself, you already know the sad truth: AI agents rarely live up to expectations!

These are some of the main reasons AI agents flop:

- They can’t interact with websites or desktop apps like a real human would.
- LLMs powering them can be unpredictable, giving different results on the same input.
- Even when they do use a browser, anti-bot techniques like CAPTCHAs stop them cold.
- Unlike humans, AI agents often lack common sense reasoning and struggle to adapt when faced with situations beyond their programming.

The problem isn’t the idea of AI agents.
Instead, it’s the tech stack you use to build them.

So let’s stop wasting time and figure out how to build an AI agent that can actually automate browser tasks for you.

Make an AI Agent Automate the Stuff You Hate Doing: Step-by-Step Tutorial

In this section, you’ll walk through building an AI agent that can handle one of the most boring (yet critical) tasks out there: job hunting!

The resulting AI agent will be smart enough to:

- Visit Google
- Discover job platforms
- Browse listings based on your desired positions and preferences
- Extract interesting jobs
- Export them into a clean JSON file

And if you want to take it further, you’ll also find resources on how to feed it your CV so the agent can learn your profile and automatically apply to the best matches, all without you lifting a finger.

:::warning
⚠️ Important: This is just an example! As explained toward the end of this guide, the same agent can be adapted to almost any browser-based workflow by simply changing the task description.
:::

Let’s dive in!

Prerequisites

To follow along with this tutorial, make sure you have:

- An LLM API key (we’ll use Gemini, since it’s basically free to use via API, but OpenAI, Anthropic, Ollama, Groq, and others work as well).
- A Bright Data account with the Browser API enabled (don’t worry about setup yet, as you’ll be guided through it in this tutorial).
- Python ≥ 3.11 installed locally.

To speed things up, we’ll also assume you already have a Python project set up with a uv virtual environment in place.

Step #1: Install Browser Use

As mentioned earlier, most AI agents flop because they hit the wall of tech limitations 🧱. The models alone just aren’t enough. So what’s one of the best tools to build AI agents that can actually get things done inside a browser? 👉 Browser Use!

Never heard of it? No worries!
Catch up with this video or take a look at its official docs:

https://www.youtube.com/watch?v=zGkVKix_CRU&embedable=true

First things first, activate your uv venv and install the browser-use package from PyPI:

    uv pip install browser-use

Under the hood, this library runs on Playwright, so you’ll also need to grab the Chromium binaries it depends on. To do so, run:

    uvx playwright install chromium --with-deps --no-shell

Boom! 💥 You’re now set up with an agentic browser automation powerhouse.

Step #2: Integrate the LLM

AI agents won’t do much without AI (shocker, right? 😅), so your agent needs a language model to do its thinking. Browser Use supports a long list of LLM providers, but we’ll focus on Gemini, the one highlighted on the official browser-use GitHub page.

Why Gemini? Because it’s one of the few LLMs with API access and rate limits generous enough to make it essentially free to play with. 🆓

Grab your Gemini API key and store it in a .env file in your project folder like this:

    GEMINI_API_KEY=

Next, create an agent.py file, which will contain the AI agent definition logic. Start by loading the environment variables from .env using python-dotenv (which comes with browser-use):

    from dotenv import load_dotenv

    # Read the environment variables from the .env file
    load_dotenv()

Then, define your LLM integration:

    from browser_use import ChatGoogle

    # The LLM powering the AI agent
    llm = ChatGoogle(model="gemini-2.5-flash")

Amazing! You’ve got your AI engine ready. 🧠

Time to define and build the rest of your agent’s logic…

Step #3: Describe the Browser-Based Task to Automate

How you describe the task to your agent is everything. The LLM you configured in Browser Use only works as well as your instructions, so spend time crafting a prompt that’s clear and detailed, but not overly complicated.

This is the most important step in your implementation. So check out guides on prompt design and follow the Browser Use best practices to maximize results. You might need a few rounds of trial and error.
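One way to make those rounds of trial and error cheaper is to keep the prompt's shape fixed (one goal sentence, then numbered steps) and change only one piece at a time. The sketch below is a hypothetical helper, not part of Browser Use: build_task and its parameters are names made up for illustration, and all it does is assemble a plain string that you could later hand to the agent as its task.

```python
def build_task(goal: str, steps: list[str]) -> str:
    """Render a goal sentence followed by numbered steps as one prompt string.

    Hypothetical helper for illustration only; Browser Use just needs a string.
    """
    numbered = [f"{i}. {step.strip()}" for i, step in enumerate(steps, start=1)]
    return "\n".join([goal.strip(), *numbered])


if __name__ == "__main__":
    # Example values are placeholders; swap in your own goal and steps.
    draft = build_task(
        "Search on Google for data analyst jobs in Boston.",
        [
            "Choose a job posting page.",
            "Return all results as a JSON list.",
        ],
    )
    print(draft)
```

Keeping the steps in a list makes it easy to add, remove, or reword a single instruction between runs and see which change actually improved the agent's behavior.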
Since this is just an example, let's keep it simple and describe the browser job-hunting task like this:

    task = """
    Search on Google for software engineer jobs in New York.
    1. Choose a job posting page.
    2. On the chosen site, filter for jobs published within the last 24 hours.
    3. For each job listing, extract the key details, including the job posting URL and the apply URL (if available).
    4. Return all results as a JSON list.
    """

As you can see, you’re giving your agent a lot of freedom, which is totally fine considering how capable and flexible Browser Use is! 💪

💡 Tip: In a real-world setup, you should read preferences from a configuration file and inject them into your prompt. This makes your agent customizable for different searches. Think varying job titles, locations, required skills, company preferences, remote vs on-site, and more. For a similar approach, read our guide on building a LinkedIn job hunting AI assistant.

Step #4: Define and Run the Agent

Use Browser Use to spin up an AI agent controlled by your configured LLM that can tackle the task you defined earlier:

    from browser_use import Agent

    agent = Agent(
        llm=llm,
        task=task,
    )

Fire up your agent like this:

    history = agent.run_sync()

Perfect! Now all that’s left is to grab the output from your AI agent and export it to JSON (or any format you need). 💾

Step #5: Export the Output to JSON

Grab the output from your agent (which should be a clean JSON list of jobs) and dump it to a .json file:

    import json

    output_data = history.structured_output
    with open("jobs.json", "w", encoding="utf-8") as f:
        json.dump(output_data, f, ensure_ascii=False, indent=4)

Here we go! Mission complete. Boring task handler agent at your service! 🫡

Step #6: Address the Agent Limitations

Browser Use is incredible, but not magical, unfortunately…

If you try to run your browser-based handler AI agent now, it’ll probably get blocked.
That may occur because of a Google reCAPTCHA. (See how to automate reCAPTCHA solving.)

If it somehow bypasses that, there’s still the Indeed human verification page powered by Cloudflare.

These failures are especially common if you run the script on a server or in headless mode, which, let’s be honest, is exactly what you want. No one wants a machine tied up for minutes while it handles a task! 😣

So yeah, all this work gets you an AI agent that fails… just like all the others 😢. Was that a waste of time? Nope, as the tutorial isn’t over yet!

There’s still the most important step. The one that actually makes this whole thing work. 🤩

Step #7: Integrate Agent Browser

Your agent fails because the sites it interacts with can detect it as an automated bot. How does that happen? Tons of reasons, including:

- Browser fingerprinting: The browser session created by default in Playwright is super generic and doesn’t look like a real user.
- Rate limiters: Your agent ends up making too many requests in a short time (classic for automation, not humans), which triggers suspicion instantly.
- IP reputation: The more automation scripts you run from your IP, the more solutions like Cloudflare flag you as a potential bot, increasing the chances of a CAPTCHA or other verification.

So, what’s the solution? A browser that:

- Runs human-like sessions, mimicking real user behavior.
- Can solve CAPTCHAs automatically if they appear.
- Integrates with a proxy network with millions of rotating IPs to avoid rate limits.
- Runs in the cloud for infinite scalability.
- Integrates seamlessly with AI.

Is this a dream? Nope!
It exists, and it’s called Agent Browser (aka Browser API)!

https://www.youtube.com/watch?v=T59GCkpk5zY&embedable=true

Follow the official Agent Browser integration guide, and you’ll end up on a page like this:

Copy your connection URL (highlighted in red) and add it to your .env file like so:

    BRIGHT_DATA_BROWSER_AGENT_URL=

Then, read it in agent.py and define the Browser object to instruct Browser Use to connect to the remote browser:

    import os
    from browser_use import Browser

    BRIGHT_DATA_BROWSER_AGENT_URL = os.getenv("BRIGHT_DATA_BROWSER_AGENT_URL")

    browser = Browser(
        cdp_url=BRIGHT_DATA_BROWSER_AGENT_URL
    )

Next, pass the browser object to your agent:

    agent = Agent(
        llm=llm,
        task=task,
        browser=browser, #