> ## Documentation Index
> Fetch the complete documentation index at: https://onecli.sh/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Vertex AI: Route Agents to Claude & Gemini on GCP

> Route agent requests to Vertex AI models on Google Cloud, including Claude and Gemini. Supports service account and ADC authentication.

## Overview

OneCLI connects AI agents to Google Cloud's Vertex AI platform. Agents can call models hosted on Vertex AI, including Anthropic's Claude models (offered through Vertex AI Model Garden) and Google's Gemini models. The gateway automatically injects Google Cloud credentials into requests to the Vertex AI API.

This is useful for agents that need to call AI models through your own Google Cloud project, using your billing and quotas.

## Setup

<Steps>
  <Step title="Prepare your credentials">
    You need one of the following:

    * **Service account key**: Create a service account in the Google Cloud console, grant it the Vertex AI User role (`roles/aiplatform.user`), and download its JSON key file.
    * **Authorized user credentials**: Run `gcloud auth application-default login` on your machine and use the credentials file it generates (by default `~/.config/gcloud/application_default_credentials.json` on Linux and macOS).
  </Step>

  <Step title="Connect in OneCLI">
    Open the OneCLI dashboard, go to **Connections** > **Vertex AI**, and provide:

    * **Credentials**: paste the JSON key or upload the credentials file
    * **Project ID**: your Google Cloud project ID
    * **Region**: the Vertex AI region (e.g., `us-central1`)
  </Step>
</Steps>
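
Before pasting a credentials file into the dashboard, it can help to confirm which of the two kinds you have. The sketch below is a local sanity check only, not part of OneCLI: it reads the `type` field that Google credentials JSON files carry (`service_account` or `authorized_user`) and checks for the fields each kind normally includes. The function name `credential_kind` is a hypothetical helper introduced for illustration.

```python
import json


def credential_kind(raw: str) -> str:
    """Classify a Google credentials JSON document by its "type" field.

    Hypothetical helper for a local sanity check; OneCLI does not require it.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("credentials file must contain a JSON object")

    kind = data.get("type")
    if kind == "service_account":
        # Downloaded service account keys include these fields.
        required = {"project_id", "private_key", "client_email"}
    elif kind == "authorized_user":
        # Files written by `gcloud auth application-default login` include these.
        required = {"client_id", "client_secret", "refresh_token"}
    else:
        raise ValueError(f"unsupported credential type: {kind!r}")

    missing = sorted(required - data.keys())
    if missing:
        raise ValueError(f"{kind} credentials are missing fields: {missing}")
    return kind
```

Calling `credential_kind(open(path).read())` on your file returns the kind as a string, or raises `ValueError` if the file does not look like either supported type.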

## What agents can do

* Send prompts to Claude models hosted on Vertex AI (Claude Sonnet, Claude Haiku, Claude Opus)
* Send prompts to Gemini models
* Use streaming or non-streaming inference
* Pass structured messages with text and image inputs
* Call models with tool/function calling enabled
* Access any Model Garden model that is enabled in your project
* Run batch prediction jobs
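
For orientation, the capabilities above map onto Vertex AI's publisher-model REST endpoints. The following is a minimal sketch of how those URLs are formed, assuming a regional endpoint (the `global` location uses a different host). The helper name `vertex_url` and the `MODEL_ID` placeholder are illustrative, and how an agent is pointed at the OneCLI gateway is not covered here.

```python
def vertex_url(project: str, region: str, publisher: str, model: str,
               stream: bool = False) -> str:
    """Build a Vertex AI publisher-model endpoint URL (illustrative helper).

    Claude models are served under the "anthropic" publisher with the
    rawPredict / streamRawPredict methods; Gemini models are served under
    the "google" publisher with generateContent / streamGenerateContent.
    """
    methods = {
        "anthropic": ("rawPredict", "streamRawPredict"),
        "google": ("generateContent", "streamGenerateContent"),
    }
    if publisher not in methods:
        raise ValueError(f"publisher not covered by this sketch: {publisher!r}")

    method = methods[publisher][1 if stream else 0]
    host = f"{region}-aiplatform.googleapis.com"  # regional endpoint assumed
    return (
        f"https://{host}/v1/projects/{project}/locations/{region}"
        f"/publishers/{publisher}/models/{model}:{method}"
    )


# Example with placeholder values; substitute a model ID enabled in your project.
print(vertex_url("my-project", "us-central1", "anthropic", "MODEL_ID"))
```

The project ID and region here are the same values you enter when connecting Vertex AI in the dashboard.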

## Controlling access with rules

Use OneCLI's [rules engine](/guides/rules) to limit what agents can do with Vertex AI. For example, you can restrict agents to specific model endpoints, or rate-limit inference calls to control costs. Rules are evaluated before credential injection, so a blocked request never reaches Vertex AI.
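
The ordering described above (rule check first, credential injection second) can be pictured with a toy model. This is a conceptual sketch only: it is not OneCLI's rule syntax or implementation, and `gate`, `allowed_models`, and `inject_and_forward` are hypothetical names used for illustration. See the rules guide for the actual configuration.

```python
from typing import Callable, Iterable


def gate(model: str, allowed_models: Iterable[str],
         inject_and_forward: Callable[[str], str]) -> str:
    """Toy model of rule evaluation happening before credential injection.

    Hypothetical sketch: a request for a model outside the allowlist is
    rejected before the injection step is ever invoked.
    """
    if model not in set(allowed_models):
        return "blocked"  # credentials are never attached to this request
    return inject_and_forward(model)


# Record which requests reach the injection step.
reached = []


def fake_upstream(model: str) -> str:
    reached.append(model)
    return "forwarded"


print(gate("model-a", ["model-a"], fake_upstream))
print(gate("model-b", ["model-a"], fake_upstream))
print(reached)
```

In this toy run only `model-a` reaches `fake_upstream`; the request for `model-b` stops at the rule check.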

## Using with MCP servers

If your agent uses an MCP server that needs Vertex AI credentials locally, see the [Vertex AI credential stubs](/guides/credential-stubs/vertex-ai) guide to set up placeholder files that the gateway fills in at request time.
