Ollama Integration Example

📹 Watch the demo video to see this example in action!

Note: This video was recorded before the project was renamed from automcp to gen-mcp. The functionality remains the same—just replace automcp with genmcp in commands.

Overview

This example demonstrates how to expose Ollama, a popular local language model runtime, as MCP tools using gen-mcp. By wrapping Ollama’s functionality, you can enable AI assistants to interact with your local language models seamlessly—no custom server code required.

What You’ll Learn

- How to expose Ollama’s HTTP API and CLI as MCP tools using gen-mcp configuration files
- How tool definitions, input schemas, and invocations fit together
- How to run the MCP server and test it from an MCP client

Prerequisites

- Ollama installed locally
- The genmcp CLI installed
- At least one model pulled (for example, ollama pull llama2)

Two Integration Approaches

gen-mcp supports two methods for integrating with Ollama, HTTP-based and CLI-based, each with its own advantages.

HTTP-Based Integration

The HTTP approach calls Ollama’s REST API directly:

Advantages:

- Full access to Ollama’s REST API, including chat and model management endpoints
- Structured JSON requests and responses
- No shell quoting or process management to worry about

Use when: You want production-grade integration with complete feature access
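
For example, listing downloaded models is a single GET request against Ollama’s REST API, the same endpoint the tags tool below wraps:

curl http://localhost:11434/api/tags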

CLI-Based Integration

The CLI approach executes ollama commands directly:

Advantages:

- Minimal setup: any command that works in your terminal can become a tool
- No need to know Ollama’s HTTP API
- Output is exactly what the command prints, which is convenient for quick experiments

Use when: You need quick prototyping or prefer command-line interaction
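
For example, the generate_completion tool defined later in this guide simply shells out to a command you could run yourself:

ollama run llama2 "Explain quantum computing in simple terms"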

HTTP-Based Integration Tutorial

Let’s walk through creating an HTTP-based Ollama integration step-by-step.

Step 1: Start Ollama

Ensure Ollama is running locally:

ollama serve

Ollama will start on http://localhost:11434 by default. You can verify it’s running:

curl http://localhost:11434
# Should return: "Ollama is running"

Step 2: Understanding the Configuration

Here’s the complete ollama-http.yaml configuration file that defines the MCP tools:

mcpFileVersion: "0.1.0"
name: ollama
version: "1.0.0"
runtime:
  transportProtocol: streamablehttp
  streamableHttpConfig:
    port: 8009
tools:
- name: generate
  title: "Generate a response"
  description: "Generates a response for a given prompt."
  inputSchema:
    type: object
    properties:
      model:
        type: string
        description: "The name of the model to use."
      prompt:
        type: string
        description: "The prompt to generate a response for."
      system:
        type: string
        description: "A system message to override the model's default behavior."
      stream:
        type: boolean
        description: "Whether to stream the response. Must be false."
    required:
    - model
    - prompt
    - stream
  invocation:
    http:
      method: POST
      url: http://localhost:11434/api/generate

- name: chat
  title: "Generate a chat response"
  description: "Generates a response for a chat-based conversation."
  inputSchema:
    type: object
    properties:
      model:
        type: string
        description: "The name of the model to use."
      messages:
        type: array
        items:
          type: object
          properties:
            role:
              type: string
              description: "The role: 'user' or 'assistant'."
            content:
              type: string
              description: "The message content."
          required:
          - role
          - content
      stream:
        type: boolean
        description: "Whether to stream. Must be false."
    required:
    - model
    - messages
    - stream
  invocation:
    http:
      method: POST
      url: http://localhost:11434/api/chat

- name: tags
  title: "List downloaded models"
  description: "Lists all downloaded models."
  inputSchema:
    type: object
    properties: {}
  invocation:
    http:
      method: GET
      url: http://localhost:11434/api/tags

- name: pull_model
  title: "Pull model"
  description: "Download a model from the ollama library."
  inputSchema:
    type: object
    properties:
      model:
        type: string
        description: "The name of the model to pull."
      stream:
        type: boolean
        description: "Must be false."
    required:
    - model
    - stream
  invocation:
    http:
      method: POST
      url: http://localhost:11434/api/pull

- name: running_models
  title: "Get running models"
  description: "List models currently loaded into memory."
  inputSchema:
    type: object
  invocation:
    http:
      method: GET
      url: http://localhost:11434/api/ps

Step 3: Configuration Breakdown

Let’s understand each section:

Runtime Configuration

runtime:
  transportProtocol: streamablehttp
  streamableHttpConfig:
    port: 8009
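
This block configures the MCP server itself, not Ollama: gen-mcp serves the tools over the MCP streamable HTTP transport on port 8009. MCP clients connect to this port, while each tool’s invocation URL points at Ollama’s own API on port 11434.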

Tool Definition Structure

Each tool follows this pattern:

- name: generate              # Unique tool identifier
  title: "Generate a response" # Human-readable name
  description: "..."           # What the tool does (for LLM understanding)
  inputSchema:                 # JSON Schema for input validation
    type: object
    properties:
      model:
        type: string
        description: "..."
    required:
    - model
  invocation:                  # How to execute the tool
    http:
      method: POST
      url: http://localhost:11434/api/generate

Key concepts:

- name is the unique identifier an MCP client uses to call the tool.
- description is written for the LLM, which decides when and how to use the tool based on it.
- inputSchema is standard JSON Schema; arguments are validated against it before the tool runs.
- invocation maps a validated call onto an HTTP request (or, in the CLI variant below, a shell command).
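
The tool’s arguments are forwarded as the JSON body of the POST, which is why each inputSchema mirrors the corresponding Ollama endpoint. A generate call is therefore roughly equivalent to:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Explain quantum computing in simple terms",
  "stream": false
}'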

Step 4: Run the MCP Server

Start the gen-mcp server with your configuration:

genmcp run -f ollama-http.yaml

You should see:

INFO    Starting MCP server on port 8009
INFO    Loaded 5 tools from ollama-http.yaml

Step 5: Test Your Integration

You can now connect an MCP client (like Claude Desktop or any MCP-compatible tool) to http://localhost:8009 and use the Ollama tools.
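
If you don’t have a client handy, the MCP Inspector (a separate tool; this assumes Node.js and npx are available) provides a quick web UI for connecting to the server and invoking tools:

npx @modelcontextprotocol/inspector

Then point it at http://localhost:8009 using the streamable HTTP transport.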

Example tool calls:

List available models:

{
  "tool": "tags"
}

Generate a completion:

{
  "tool": "generate",
  "arguments": {
    "model": "llama2",
    "prompt": "Explain quantum computing in simple terms",
    "stream": false
  }
}
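
Hold a multi-turn conversation with the chat tool, following the messages schema defined above:

{
  "tool": "chat",
  "arguments": {
    "model": "llama2",
    "messages": [
      {"role": "user", "content": "What is MCP?"},
      {"role": "assistant", "content": "MCP is the Model Context Protocol."},
      {"role": "user", "content": "How does it relate to tools?"}
    ],
    "stream": false
  }
}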

CLI-Based Integration Tutorial

The CLI approach is simpler but more limited. Here’s the complete configuration:

CLI Configuration File

mcpFileVersion: 0.1.0
name: Ollama
version: 0.0.1
runtime:
  streamableHttpConfig:
    port: 7008
  transportProtocol: streamablehttp
tools:
- name: start_ollama
  title: Start Ollama
  description: Start ollama. Only run if not already started.
  inputSchema:
    type: object
  invocation:
    cli:
      command: nohup ollama start > /dev/null 2>&1 &

- name: check_ollama_running
  title: Check if Ollama is Running
  description: Check if Ollama is running.
  inputSchema:
    type: object
  invocation:
    cli:
      command: curl http://localhost:11434 || echo "ollama is not running"

- name: pull_model
  title: Pull model
  description: Pull a model so that Ollama can use it
  inputSchema:
    type: object
    properties:
      model:
        type: string
        description: The name of the model to pull
  invocation:
    cli:
      command: ollama pull {model}

- name: list_models
  title: List models
  description: List all models that Ollama has currently pulled.
  inputSchema:
    type: object
  invocation:
    cli:
      command: ollama list

- name: generate_completion
  title: Generate completion
  description: Generate a completion from Ollama
  inputSchema:
    type: object
    properties:
      model:
        type: string
        description: The model to use
      prompt:
        type: string
        description: The prompt to give the model
  invocation:
    cli:
      command: 'ollama run {model} {prompt}'
      templateVariables:
        prompt:
          format: '"{prompt}"'

CLI Configuration Explained

The key difference is the invocation type:

invocation:
  cli:
    command: ollama pull {model}
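
gen-mcp substitutes each {placeholder} with the corresponding validated argument before running the command. The templateVariables block in generate_completion additionally wraps the prompt in quotes so it reaches the shell as a single argument; with model set to llama2 and prompt set to Explain quantum computing, the command expands to:

ollama run llama2 "Explain quantum computing"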

Running the CLI Integration

genmcp run -f ollama-cli.yaml

Understanding Input Schema Validation

Input schemas ensure tools receive valid data before execution. Here’s how they work:

Basic Schema

inputSchema:
  type: object
  properties:
    model:
      type: string
      description: "Model name"
  required:
  - model

Validation behavior:

- Calls that omit a required property (here, model) are rejected before the tool is invoked.
- Arguments must match their declared types; a non-string model fails validation.
- Property descriptions are exposed to MCP clients, helping the LLM construct valid arguments.
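
For example, this call would be rejected by the generate tool’s schema because the required model and stream fields are missing:

{
  "tool": "generate",
  "arguments": {
    "prompt": "Explain quantum computing in simple terms"
  }
}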

Array Schema

inputSchema:
  type: object
  properties:
    messages:
      type: array
      items:
        type: object
        properties:
          role:
            type: string
          content:
            type: string
        required:
        - role
        - content

This validates complex nested structures like chat messages.
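
A conforming messages value looks like this; each element must carry both role and content:

[
  {"role": "user", "content": "Hello!"},
  {"role": "assistant", "content": "Hi, how can I help?"},
  {"role": "user", "content": "List my models."}
]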

Common Patterns and Best Practices

Pattern 1: Tool Chaining

Design tools to work together:

- name: check_ollama_running
  description: "Check if Ollama is running. Run before other tools."

- name: pull_model
  description: "Download a model. Run check_ollama_running first."

Pattern 2: Safe Defaults

Require explicit values for constrained parameters instead of relying on defaults:

inputSchema:
  properties:
    stream:
      type: boolean
      description: "Must be false for MCP compatibility"
  required:
  - stream

Pattern 3: Clear Descriptions

Help LLMs understand tool usage:

description: "Download a model from the ollama library. This may take several minutes for large models. Always check if the model exists first using tags."

Troubleshooting

Issue: “Connection refused” error

Solution: Ensure Ollama is running:

ollama serve

Issue: “Model not found” error

Solution: Pull the model first:

ollama pull llama2

Issue: Tools not appearing in MCP client

Solution: Check the server is running and the port matches your client configuration.

Next Steps

Resources