Knowledge Base

Upload documents and enable grounded, accurate agent responses.

The Knowledge Base is OpenAgent's RAG (Retrieval-Augmented Generation) system. It lets your agents answer questions based on your own documents rather than relying solely on the LLM's training data.

How it works

Document Upload
         │
         ▼
┌─────────────────┐
│   Document      │  ← parse text from PDF, DOCX, XLSX, etc.
│   Parsing       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Text          │  ← split into overlapping chunks
│   Splitting     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Embedding     │  ← convert chunks to vectors
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Vector Store  │  ← store vectors + metadata
└────────┬────────┘
         │
      At query time:
         │
         ▼
┌─────────────────┐
│   Semantic      │  ← embed query, find similar chunks
│   Search        │
└────────┬────────┘
         │
         ▼
    Top-K chunks injected into agent context
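To make the flow concrete, here is a minimal in-memory sketch of the same stages in Python. It is not OpenAgent's implementation. The `embed` function is a toy bag-of-words stand-in for a real embedding model such as text-embedding-3-small. Chunks are fixed word windows rather than overlapping token chunks. `VectorStore`, `add_document`, and `search` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class StoredChunk:
    text: str
    source: str  # metadata stored alongside the vector
    vector: Counter


class VectorStore:
    """Hypothetical in-memory vector store used only for this illustration."""

    def __init__(self) -> None:
        self.chunks: list[StoredChunk] = []

    def add_document(self, source: str, text: str, chunk_words: int = 50) -> None:
        # Upload time: text is assumed to be already parsed. Split it into word
        # windows, embed each window, and store the vector with its metadata.
        words = text.split()
        for start in range(0, len(words), chunk_words):
            chunk = " ".join(words[start:start + chunk_words])
            self.chunks.append(StoredChunk(chunk, source, embed(chunk)))

    def search(self, query: str, top_k: int = 5) -> list[tuple[float, StoredChunk]]:
        # Query time: embed the query and rank every stored chunk by similarity.
        query_vector = embed(query)
        scored = [(cosine(query_vector, c.vector), c) for c in self.chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = VectorStore()
store.add_document("refunds.md", "Refunds are issued within 14 days of a return request.")
store.add_document("shipping.md", "Orders ship within 2 business days from our warehouse.")
best_score, best_chunk = store.search("how long do refunds take", top_k=1)[0]
print(best_chunk.source)  # refunds.md
```

Replacing `embed` with a real embedding model changes the vectors, not the shape of the flow. Chunks are embedded once at upload time, and only the query is embedded at search time.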

Supported Document Formats

Format        Extension       Notes
PDF           .pdf            Text extraction + OCR for scanned pages
Word          .docx, .doc     Preserves headings and structure
Excel         .xlsx, .xls     Tabular data per sheet
CSV / TSV     .csv, .tsv      Row-by-row ingestion
PowerPoint    .pptx           Slide text and notes
Plain Text    .txt            Direct ingestion
Markdown      .md, .mdx       Code-aware splitting

Scanned PDFs are processed with OCR automatically. Accuracy depends on scan quality.

Creating a Knowledge Base

Name your knowledge base

Go to Knowledge Bases → New Knowledge Base. Choose a descriptive name (e.g., "Product Documentation", "HR Handbook").

Choose an embedding model

Select the embedding model used to convert text chunks to vectors. All documents in a knowledge base must use the same embedding model, because vectors produced by different models are not comparable. Choose carefully: changing the model later requires re-indexing every document.

Recommended: text-embedding-3-small (OpenAI) — fast, affordable, and high quality.

Upload documents

Click Upload and select one or more files. Indexing starts automatically.

Monitor indexing progress

Each file shows a status indicator:

  • Queued — waiting to be processed
  • Indexing — currently being chunked and embedded
  • Ready — available for agent queries
  • Failed — an error occurred (check the error details)
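If you upload files from a script, you may want to wait until each file leaves the Queued and Indexing states before testing the agent. A minimal polling sketch follows. `get_status` is a hypothetical callable, for example a wrapper around whatever API call reports a file's status. Treating Ready and Failed as final states is an assumption, and the timeout and polling interval are arbitrary defaults.

```python
import time
from enum import Enum
from typing import Callable


class FileStatus(Enum):
    QUEUED = "Queued"
    INDEXING = "Indexing"
    READY = "Ready"
    FAILED = "Failed"


# Assumed final states; Queued and Indexing mean "check again later".
TERMINAL_STATES = {FileStatus.READY, FileStatus.FAILED}


def wait_for_indexing(
    get_status: Callable[[], FileStatus],  # hypothetical: fetches the file's current status
    timeout_s: float = 300.0,
    poll_interval_s: float = 2.0,
) -> FileStatus:
    """Poll until the file is Ready or Failed, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"file still {status.value} after {timeout_s} seconds")
        time.sleep(poll_interval_s)
```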

Attach to an agent

Open your agent settings and select this knowledge base under Knowledge Base. The agent will now search it automatically for every user message.

Chunking Strategy

OpenAgent splits documents into overlapping chunks before embedding. The default settings work well for most documents, but you can tune them:

Parameter        Default       Description
chunk_size       512 tokens    Maximum tokens per chunk
chunk_overlap    50 tokens     Overlap between adjacent chunks to preserve context
split_strategy   recursive     How to split: recursive, sentence, paragraph

When to adjust:

  • Technical documentation — larger chunks (1024) to keep code examples intact
  • FAQ-style content — smaller chunks (256) so each Q&A is a discrete unit
  • Dense tables — use paragraph strategy to keep rows together
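The interaction between `chunk_size` and `chunk_overlap` can be sketched as a sliding window. Each new chunk starts `chunk_size - chunk_overlap` tokens after the previous one, so adjacent chunks share `chunk_overlap` tokens. This is a simplified illustration that assumes the text is already tokenized. It is closer to a plain fixed-size split than to a `recursive` strategy, which typically tries natural boundaries such as paragraphs and sentences first. `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(
    tokens: list[str], chunk_size: int = 512, chunk_overlap: int = 50
) -> list[list[str]]:
    """Fixed-size sliding-window split; adjacent chunks share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


tokens = [f"t{i}" for i in range(10)]
for chunk in split_with_overlap(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
# ['t0', 't1', 't2', 't3']
# ['t3', 't4', 't5', 't6']
# ['t6', 't7', 't8', 't9']
```

With `chunk_size=4` and `chunk_overlap=1`, ten tokens produce three chunks, and each pair of neighbours shares one token at the boundary.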

Retrieval Configuration

When an agent queries the knowledge base, you can configure:

Parameter              Default     Description
top_k                  5           Number of chunks to retrieve
similarity_threshold   0.7         Minimum similarity score (0–1)
reranking              disabled    Rerank results with a cross-encoder for better precision

Raising top_k gives the agent more context but uses more tokens per request. Start with the default and increase if agents miss relevant information.
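The sketch below shows one plausible way the three settings combine. It filters candidates by `similarity_threshold`, optionally reorders them with a reranker, then keeps the first `top_k`. The input format (pairs of score and chunk text), the inclusive threshold comparison, and the `rerank` callback are assumptions for illustration. A real cross-encoder would score each query and chunk pair with a model.

```python
from typing import Callable, Optional

# Hypothetical reranker signature: (query, candidate chunks) -> chunks in new order.
Reranker = Callable[[str, list[str]], list[str]]


def retrieve(
    query: str,
    scored_chunks: list[tuple[float, str]],  # (similarity score, chunk text) from vector search
    top_k: int = 5,
    similarity_threshold: float = 0.7,
    rerank: Optional[Reranker] = None,
) -> list[str]:
    # 1. Discard chunks below the minimum similarity score.
    candidates = [(s, c) for s, c in scored_chunks if s >= similarity_threshold]
    # 2. Order the survivors by similarity, best first.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [chunk for _, chunk in candidates]
    # 3. Optionally let a reranker reorder them before truncating.
    if rerank is not None:
        ranked = rerank(query, ranked)
    # 4. Keep at most top_k chunks for the agent's context.
    return ranked[:top_k]
```

Reranking before truncating lets the reranker promote a chunk that vector search placed just below the `top_k` cutoff.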

Multi-Knowledge Base Agents

A single agent can query multiple knowledge bases. For example, a customer support agent might search both:

  • Product Documentation — for technical questions
  • Company Policy — for billing and return questions

When multiple knowledge bases are attached, OpenAgent performs parallel searches and merges the results, ranking by relevance score.
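A minimal sketch of the parallel search and merge follows. It assumes each knowledge base is exposed as a function that returns `(score, chunk)` pairs for a query. `search_all` and the knowledge-base callables are hypothetical. The sketch also assumes that scores from different knowledge bases are on a comparable scale, which is more likely when they share an embedding model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical: each knowledge base is a function returning (score, chunk) pairs.
SearchFn = Callable[[str], list[tuple[float, str]]]


def search_all(
    query: str, knowledge_bases: dict[str, SearchFn], top_k: int = 5
) -> list[tuple[float, str, str]]:
    """Query every knowledge base concurrently, then merge the results by score."""
    with ThreadPoolExecutor() as pool:
        # Launch all searches at once.
        futures = {name: pool.submit(search, query) for name, search in knowledge_bases.items()}
        # Collect results, tagging each chunk with the knowledge base it came from.
        merged = [
            (score, name, chunk)
            for name, future in futures.items()
            for score, chunk in future.result()
        ]
    merged.sort(key=lambda item: item[0], reverse=True)
    return merged[:top_k]
```

Tagging each result with its knowledge base name keeps the source available for citations after the merge.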

Keeping Knowledge Fresh

  • Manual re-upload — delete the old file and upload the updated version
  • API-based updates — use the REST API to programmatically sync documents from your CMS, database, or file system
  • Scheduled sync — configure a sync job to pull from a URL or S3 bucket on a schedule
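For API-based or scheduled syncs, a common approach is to re-index only what changed. The sketch below compares SHA-256 content hashes of local files against the hashes recorded at the last sync. It returns the paths to re-upload and the paths to delete. How you record those hashes is left out, as are the upload and delete calls themselves, because they depend on REST API endpoints that are not described here. `plan_sync` is a hypothetical helper.

```python
import hashlib


def plan_sync(
    local_files: dict[str, bytes], indexed_hashes: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Return (paths to upload or re-upload, paths to delete from the knowledge base)."""
    current = {path: hashlib.sha256(data).hexdigest() for path, data in local_files.items()}
    # New files have no recorded hash; changed files have a different one.
    to_upload = sorted(path for path, digest in current.items() if indexed_hashes.get(path) != digest)
    # Files that disappeared from the source should be removed from the index.
    to_delete = sorted(path for path in indexed_hashes if path not in current)
    return to_upload, to_delete
```

Comparing content hashes rather than modification times avoids re-indexing files that were touched but not actually changed.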

Best Practices

Organize by topic — separate knowledge bases for different domains keep retrieval precise and let you attach only the relevant base to each agent.

Prefer clean text — if possible, export documents as plain text or markdown rather than converting complex PDFs. Better input → better retrieval.

Include metadata — add document title and creation date to help the agent cite sources correctly.
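One way that metadata can support citations is to prefix each retrieved chunk with its source before the chunk is injected into the agent's context. The model can then refer to sources by number. The `Chunk` fields and the `[n] Title (date)` layout are assumptions for illustration, not OpenAgent's actual context format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    text: str
    title: str     # document title, e.g. "HR Handbook"
    created: date  # document creation date


def build_context(chunks: list[Chunk]) -> str:
    """Number each chunk and prefix it with its source so answers can cite [n]."""
    blocks = [
        f"[{n}] {chunk.title} ({chunk.created.isoformat()})\n{chunk.text}"
        for n, chunk in enumerate(chunks, start=1)
    ]
    return "\n\n".join(blocks)
```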

Test retrieval — use the Search tab in the knowledge base view to test what chunks are returned for representative queries before deploying to production.
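You can automate the same check for regression testing. Keep a small set of representative queries, each paired with the document that should answer it, and measure how often that document appears among the top-k results. `hit_rate` and the `search` callable it receives are hypothetical. Wire `search` to whatever returns source names for a query in your setup.

```python
from typing import Callable

# Hypothetical: search(query, top_k) returns the source names of the retrieved chunks.
SearchFn = Callable[[str, int], list[str]]


def hit_rate(search: SearchFn, cases: list[tuple[str, str]], top_k: int = 5) -> float:
    """Fraction of (query, expected_source) cases whose expected source is retrieved."""
    if not cases:
        raise ValueError("provide at least one test case")
    hits = sum(1 for query, expected in cases if expected in search(query, top_k))
    return hits / len(cases)
```

Tracking this number before and after you change `chunk_size`, `top_k`, or the embedding model shows whether the change helped retrieval.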
