Today, we’re releasing the first set of building blocks that will help developers and enterprises build useful and reliable agents. We view agents as systems that independently accomplish tasks on behalf of users. Over the past year, we’ve introduced new model capabilities—such as advanced reasoning, multimodal interactions, and new safety techniques—that have laid the foundation for our models to handle the complex, multi-step tasks required to build agents. However, customers have shared that turning these capabilities into production-ready agents can be challenging, often requiring extensive prompt iteration and custom orchestration logic without sufficient visibility or built-in support.
To address these challenges, we’re launching a new set of APIs and tools specifically designed to simplify the development of agentic applications:

- The **Responses API**, a new API primitive that combines the simplicity of Chat Completions with built-in tool use
- **Built-in tools** for web search, file search, and computer use
- The **Agents SDK**, an open-source framework for orchestrating single-agent and multi-agent workflows
These new tools streamline core agent logic, orchestration, and interactions, making it significantly easier for developers to get started with building agents. Over the coming weeks and months, we plan to release additional tools and capabilities to further simplify and accelerate building agentic applications on our platform.
## Introducing the Responses API
The Responses API is our new API primitive for leveraging OpenAI’s built-in tools to build agents. It combines the simplicity of Chat Completions with the tool-use capabilities of the Assistants API. As model capabilities continue to evolve, we believe the Responses API will provide a more flexible foundation for developers building agentic applications. With a single Responses API call, developers will be able to solve increasingly complex tasks using multiple tools and model turns.
To start, the Responses API will support new built-in tools like web search, file search, and computer use. These tools are designed to work together to connect models to the real world, making them more useful in completing tasks. It also brings with it several usability improvements including a unified item-based design, simpler polymorphism, intuitive streaming events, and SDK helpers like `response.output_text` to easily access the model’s text output.
The Responses API is designed for developers who want to easily combine OpenAI models and built-in tools into their apps, without the complexity of integrating multiple APIs or external vendors. The API also makes it easier to store data on OpenAI so developers can evaluate agent performance using features such as tracing and evaluations. As a reminder, we do not train our models on business data by default, even when the data is stored on OpenAI. The API is available to all developers starting today and is not charged separately—tokens and tools are billed at standard rates specified on our pricing page. Check out the Responses API quickstart guide to learn more.
## What this means for existing APIs
## Introducing built-in tools in the Responses API
Developers can now get fast, up-to-date answers with clear and relevant citations from the web. In the Responses API, web search is available as a tool when using gpt-4o and gpt-4o-mini, and can be paired with other tools or function calls.
```javascript
const response = await openai.responses.create({
  model: "gpt-4o",
  tools: [{ type: "web_search_preview" }],
  input: "What was a positive news story that happened today?",
});

console.log(response.output_text);
```
During early testing, we’ve seen developers build with web search for a variety of use cases including shopping assistants, research agents, and travel booking agents—any application that requires timely information from the web.
For example, Hebbia leverages the web search tool to help asset managers, private equity and credit firms, and law practices quickly extract actionable insights from extensive public and private datasets. By integrating real-time search capabilities into their research workflows, Hebbia delivers richer, context-specific market intelligence and continuously improves the precision and relevance of their analyses, outperforming current benchmarks.
Web search in the API is powered by the same model used for ChatGPT search. On SimpleQA, a benchmark that evaluates the accuracy of LLMs in answering short, factual questions, GPT‑4o search preview and GPT‑4o mini search preview score 90% and 88% respectively.
##### SimpleQA Accuracy (higher is better)
Responses generated with web search in the API include links to sources, such as news articles and blog posts, giving users a way to learn more. With these clear, inline citations, users can engage with information in a new way, while content owners gain new opportunities to reach a broader audience.
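As a sketch of how these inline citations can be consumed programmatically: per the API reference, web-search citations arrive as `url_citation` annotations attached to the message’s output text. The payload below is illustrative, not a real API response.

```python
# Extract inline web-search citations from a Responses API message item.
# The payload shape (output_text content parts carrying url_citation
# annotations) follows the API reference; treat it as illustrative.

def extract_citations(message_item: dict) -> list[dict]:
    """Return the url_citation annotations attached to a message's text parts."""
    citations = []
    for part in message_item.get("content", []):
        if part.get("type") != "output_text":
            continue
        for ann in part.get("annotations", []):
            if ann.get("type") == "url_citation":
                citations.append({"title": ann.get("title"), "url": ann.get("url")})
    return citations

# A trimmed example of what a web-search message item can look like:
sample_item = {
    "type": "message",
    "content": [{
        "type": "output_text",
        "text": "One positive story today ...",
        "annotations": [{
            "type": "url_citation",
            "title": "Example headline",
            "url": "https://example.com/story",
            "start_index": 0,
            "end_index": 24,
        }],
    }],
}

for c in extract_citations(sample_item):
    print(f"{c['title']}: {c['url']}")
```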
Any website or publisher can choose to appear in web search in the API.
The web search tool is available to all developers in preview in the Responses API. We are also giving developers direct access to our fine-tuned search models in the Chat Completions API via `gpt-4o-search-preview` and `gpt-4o-mini-search-preview`. Pricing starts at $30 per thousand queries for GPT‑4o search and $25 per thousand queries for GPT‑4o mini search. Check out web search in the Playground and learn more in our docs.
Developers can now easily retrieve relevant information from large volumes of documents using the improved file search tool. With support for multiple file types, query optimization, metadata filtering, and custom reranking, it can deliver fast, accurate search results. And again, with the Responses API, it takes only a few lines of code to integrate.
```javascript
const productDocs = await openai.vectorStores.create({
  name: "Product Documentation",
  file_ids: [file1.id, file2.id, file3.id],
});

const response = await openai.responses.create({
  model: "gpt-4o-mini",
  tools: [{
    type: "file_search",
    vector_store_ids: [productDocs.id],
  }],
  input: "What is deep research by OpenAI?",
});

console.log(response.output_text);
```
The file search tool can be used for a variety of real-world use cases, including enabling a customer support agent to easily access FAQs, helping a legal assistant to quickly reference past cases for a qualified professional, and assisting a coding agent to query technical documentation. For example, Navan uses file search in its AI-powered travel agent to quickly provide their users with precise answers from knowledge-base articles (like their company’s travel policy). With built-in query optimization and reranking, they are able to set up a powerful RAG (retrieval-augmented generation) pipeline without extra tuning or configuration. With dedicated vector stores for each user group, Navan is able to tailor answers to individual account settings and user roles, saving time for customers and their staff while helping provide accurate, personalized support.
This tool is available in the Responses API to all developers. Usage is priced at $2.50 per thousand queries and file storage at $0.10/GB/day, with the first GB free. The tool continues to be available in the Assistants API. Finally, we’ve also added a new search endpoint to Vector Store API objects to directly query your data for use in other applications and APIs. Learn more in our docs and start testing in the Playground.
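A back-of-the-envelope calculation using the rates above ($2.50 per thousand queries, $0.10/GB/day of storage, first GB free); the usage figures in the example are hypothetical.

```python
# Rough monthly file search cost, using the published rates.
QUERY_RATE = 2.50 / 1000   # dollars per query
STORAGE_RATE = 0.10        # dollars per GB per day
FREE_GB = 1.0              # first GB of storage is free

def monthly_file_search_cost(queries: int, stored_gb: float, days: int = 30) -> float:
    query_cost = queries * QUERY_RATE
    billable_gb = max(stored_gb - FREE_GB, 0.0)
    storage_cost = billable_gb * STORAGE_RATE * days
    return round(query_cost + storage_cost, 2)

# e.g. 10,000 queries against 5 GB of indexed files over a 30-day month:
# $25.00 in queries + $12.00 in storage (4 billable GB) = $37.00
print(monthly_file_search_cost(10_000, 5.0))
```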
To build agents capable of completing tasks on a computer, developers can now use the computer use tool in the Responses API, powered by the same Computer-Using Agent (CUA) model that enables Operator. This research preview model set a new state-of-the-art record, achieving 38.1% success on OSWorld for full computer use tasks, 58.1% on WebArena, and 87% on WebVoyager for web-based interactions.
The built-in computer use tool captures mouse and keyboard actions generated by the model, making it possible for developers to automate computer use tasks by directly translating these actions into executable commands within their environments.
```javascript
const response = await openai.responses.create({
  model: "computer-use-preview",
  tools: [{
    type: "computer_use_preview",
    display_width: 1024,
    display_height: 768,
    environment: "browser",
  }],
  truncation: "auto",
  input: "I'm looking for a new camera. Help me find the best one.",
});

console.log(response.output);
```
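Each action the model returns then has to be mapped onto a real automation backend. The sketch below shows one way to do that dispatch; the `Executor` class is a hypothetical stand-in for something like a browser-automation driver, and only a few of the tool’s documented action types are handled.

```python
# Minimal sketch of the execution side of the computer use loop: the
# model emits structured actions, and your code translates each one
# into a call on a real backend (here, a stand-in that records calls).

class Executor:
    """Hypothetical stand-in for a real backend, e.g. a browser driver."""
    def __init__(self):
        self.log = []

    def click(self, x, y):
        self.log.append(f"click({x},{y})")

    def type_text(self, text):
        self.log.append(f"type({text!r})")

    def scroll(self, x, y, dx, dy):
        self.log.append(f"scroll({dx},{dy})")

def execute_action(action: dict, backend: Executor) -> None:
    kind = action["type"]
    if kind == "click":
        backend.click(action["x"], action["y"])
    elif kind == "type":
        backend.type_text(action["text"])
    elif kind == "scroll":
        backend.scroll(action["x"], action["y"],
                       action["scroll_x"], action["scroll_y"])
    else:
        raise ValueError(f"unhandled action: {kind}")

backend = Executor()
for action in [
    {"type": "click", "x": 120, "y": 300},
    {"type": "type", "text": "best mirrorless camera"},
]:
    execute_action(action, backend)

print(backend.log)
```

In a real loop you would execute each action, take a fresh screenshot, and send it back to the model until the task completes.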
Developers can use the computer use tool to automate browser-based workflows like performing quality assurance on web apps or executing data-entry tasks across legacy systems. For example, Unify is a system of action for growing revenue that uses agents to identify intent, research accounts, and engage with buyers. Using OpenAI’s computer use tool, Unify’s agents can access information that was previously unreachable via APIs—such as enabling a property management company to verify through online maps if a business has expanded its real estate footprint. This research acts as a custom signal to trigger personalized outreach—empowering go-to-market teams to engage buyers with precision and scale.
As another example, Luminai integrated the computer use tool to automate complex operational workflows for large enterprises with legacy systems that lack API availability and standardized data. In a recent pilot with a major community service organization, Luminai automated the application processing and user enrollment process in just days—something traditional robotic process automation (RPA) struggled to achieve after months of effort.
Before launching CUA in Operator, we conducted extensive safety testing and red teaming, addressing three key areas of risk: misuse, model errors, and frontier risks. To address risks associated with expanding Operator’s capabilities to local operating systems through CUA in the API, we performed additional safety evaluations and red teaming. We also added mitigations for developers, including safety checks to guard against prompt injections, confirmation prompts for sensitive tasks, tools to help developers isolate their environments, and enhanced detection of potential policy violations. While these mitigations help reduce risk, the model is still susceptible to inadvertent mistakes, especially in non-browser environments. For example, CUA’s performance on OSWorld, a benchmark designed to measure the performance of AI agents on real-world tasks, is currently at 38.1%, indicating that the model is not yet highly reliable for automating tasks on operating systems. Human oversight is recommended in these scenarios. More details about our API-specific safety work can be found in our updated system card.
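A minimal sketch of the confirmation-prompt pattern described above, assuming a `pending_safety_checks` field on the computer call (as in the computer use docs); the `confirm` callback and the sample payload are hypothetical.

```python
# Gate a computer_call behind explicit human acknowledgement whenever
# the model flags pending safety checks. Deny by default.

def gate_computer_call(call: dict, confirm) -> bool:
    """Return True if the call may be executed, False if it was blocked."""
    checks = call.get("pending_safety_checks", [])
    if not checks:
        return True
    messages = [c.get("message", c.get("code", "")) for c in checks]
    # confirm() is application-specific: a dialog, a review queue, etc.
    return confirm(messages)

risky_call = {
    "type": "computer_call",
    "action": {"type": "click", "x": 10, "y": 10},
    "pending_safety_checks": [
        {"code": "malicious_instructions",
         "message": "The page may be trying to redirect the agent."}
    ],
}

# Blocked unless a human explicitly approves:
print(gate_computer_call(risky_call, confirm=lambda msgs: False))
print(gate_computer_call({"type": "computer_call"}, confirm=lambda msgs: False))
```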
| Benchmark type | Benchmark | OpenAI CUA | Previous SOTA (computer use, universal interface) | Previous SOTA (web browsing agents) | Human |
| --- | --- | --- | --- | --- | --- |
| Computer use | OSWorld | 38.1% | 22.0% | - | 72.4% |
| Browser use | WebArena | 58.1% | 36.2% | 57.1% | 78.2% |
| Browser use | WebVoyager | 87.0% | 56.0% | 87.0% | - |
Evaluation details are described in our docs.
Starting today, the computer use tool is available as a research preview in the Responses API for select developers in usage tiers 3–5. Usage is priced at $3/1M input tokens and $12/1M output tokens. Learn more in our docs and check out the sample application illustrating how to build with this tool.
## Introducing the Agents SDK

In addition to building the core logic of agents and giving them access to tools so they are useful, developers also need to orchestrate agentic workflows. Our new open-source Agents SDK simplifies orchestrating multi-agent workflows and offers significant improvements over Swarm, an experimental SDK we released last year that was widely adopted by the developer community and successfully deployed by multiple customers.
Improvements include:

- **Agents**: easily configurable LLMs with clear instructions and built-in tools
- **Handoffs**: intelligently transfer control between agents
- **Guardrails**: configurable safety checks for input and output validation
- **Tracing & observability**: visualize agent execution traces to debug and optimize performance
```python
from agents import Agent, Runner, WebSearchTool, function_tool, guardrail

@function_tool
def submit_refund_request(item_id: str, reason: str):
    # Your refund logic goes here
    return "success"

support_agent = Agent(
    name="Support & Returns",
    instructions="You are a support agent who can submit refunds [...]",
    tools=[submit_refund_request],
)

shopping_agent = Agent(
    name="Shopping Assistant",
    instructions="You are a shopping assistant who can search the web [...]",
    tools=[WebSearchTool()],
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="Route the user to the correct agent.",
    handoffs=[shopping_agent, support_agent],
)

output = Runner.run_sync(
    starting_agent=triage_agent,
    input="What shoes might work best with my outfit so far?",
)
```
The Agents SDK is suitable for various real-world applications, including customer support automation, multi-step research, content generation, code review, and sales prospecting. For instance, Coinbase used the Agents SDK to quickly prototype and deploy AgentKit, a toolkit enabling AI agents to interact seamlessly with crypto wallets and various on-chain activities. In just a few hours, Coinbase integrated custom actions from their Developer Platform SDK into a fully functional agent. AgentKit’s streamlined architecture simplified the process of adding new agent actions, letting developers focus more on meaningful integrations and less on navigating complex agent setups.
In a couple of days, Box was able to quickly create agents that leverage web search and the Agents SDK to enable enterprises to search, query, and extract insights from unstructured data stored within Box and public internet sources. This approach allows customers to not only access the latest information, but also search their internal, proprietary data in a safe and secure way that obeys their internal permissions and security policies. For example, a financial services firm can build a custom agent that calls on the Box AI agent to integrate their internal market analysis stored in Box with real-time news and economic data from the web, providing their analysts with a comprehensive view for investment decisions.
The Agents SDK works with the Responses API and Chat Completions API. The SDK will also work with models from other providers, as long as they provide a Chat Completions style API endpoint. Developers can immediately integrate it into their Python codebases, with Node.js support coming soon. Learn more in our docs.
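Concretely, a “Chat Completions style” endpoint is one that accepts a POST of a model name and a `messages` array at a `/chat/completions` path. The stdlib-only sketch below builds such a request for an arbitrary provider; the base URL, API key, and model name are all placeholders.

```python
# Build (but do not send) a Chat Completions style request for any
# provider that exposes a compatible endpoint. All identifiers below
# are placeholders, not real endpoints or credentials.
import json
import urllib.request

def chat_completions_request(base_url: str, api_key: str,
                             model: str, messages: list) -> urllib.request.Equest if False else urllib.request.Request:
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_completions_request(
    "https://api.example-provider.com/v1",   # placeholder provider URL
    "sk-...",                                # placeholder key
    "example-model",                         # placeholder model name
    [{"role": "user", "content": "Hello"}],
)
print(req.full_url)
```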
In designing the Agents SDK, our team was inspired by the excellent work of others in the community including Pydantic, Griffe and MkDocs. We’re committed to continuing to build the Agents SDK as an open source framework so others in the community can expand on our approach.
## What’s next: building the platform for agents
We believe agents will soon become integral to the workforce, significantly enhancing productivity across industries. As companies increasingly seek to leverage AI for complex tasks, we’re committed to providing the building blocks that enable developers and enterprises to effectively create autonomous systems that deliver real-world impact.
With today’s releases, we’re introducing the first building blocks to empower developers and enterprises to more easily build, deploy, and scale reliable, high-performing AI agents. As model capabilities become increasingly agentic, we’ll continue investing in deeper integrations across our APIs and new tools to help deploy, evaluate, and optimize agents in production. Our goal is to give developers a seamless platform experience for building agents that can help with a variety of tasks across any industry. We’re excited to see what developers build next. To get started, explore our docs and stay tuned for more updates soon.