Knowledge sources

Knowledge sources are where the agent looks for answers. Add the right sources, in priority order, and the bot has far less room to hallucinate.

The hierarchy

The agent consults sources in this order. Earlier sources outrank later ones.

  1. Pinned answers — short, hand-written responses to specific questions.
  2. Your docs site — the URLs the crawler indexes.
  3. Manually uploaded markdown — internal notes, FAQs, runbooks.
  4. Linked GitHub repos — README, docs folder, code comments.
  5. Past resolved tickets — successful resolutions get folded back into the corpus.
  6. The model's general knowledge — used only when nothing above applies, and the agent flags low confidence.
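
The ordering above can be pictured as a simple priority lookup. This is only an illustrative sketch, not DuggAI's implementation: the `SourcePriority` enum and the `pick_source` function are hypothetical names, and it assumes each source has already produced at most one candidate answer.

```python
from enum import IntEnum


class SourcePriority(IntEnum):
    # Lower value = consulted earlier = outranks later sources.
    PINNED_ANSWER = 1
    DOCS_SITE = 2
    MANUAL_MARKDOWN = 3
    GITHUB_REPO = 4
    RESOLVED_TICKET = 5
    GENERAL_KNOWLEDGE = 6


def pick_source(candidates: dict[SourcePriority, str]) -> tuple[SourcePriority, str, bool]:
    """Return (source, answer, low_confidence) for the highest-priority source.

    `candidates` maps each source that produced an answer to that answer.
    The low-confidence flag is set only when the model's general
    knowledge is the sole thing left to fall back on.
    """
    if not candidates:
        raise ValueError("no source produced an answer")
    best = min(candidates)  # smallest enum value wins
    return best, candidates[best], best is SourcePriority.GENERAL_KNOWLEDGE
```

In this sketch, a pinned answer always wins over a docs-site answer for the same question, and the low-confidence flag appears only for the last tier.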

Adding a docs site

  1. From dashboard → Knowledge → Sources, paste your docs URL.
  2. Pick crawl depth (default 5). Higher = more pages, longer to index.
  3. Optional: add include / exclude path patterns (e.g. exclude /blog/).
  4. Click Crawl. Initial crawl typically completes in 5–30 minutes.
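
To reason about what include / exclude patterns will do before starting a crawl, a rough local check can help. The sketch below assumes glob-style patterns matched against the URL path, with excludes taking precedence; the actual pattern syntax and precedence used by the crawler are not specified here, and `should_crawl` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def should_crawl(url: str, include=(), exclude=()) -> bool:
    """Decide whether a URL passes the include / exclude path patterns.

    Assumptions (not confirmed product behavior): patterns are shell-style
    globs applied to the URL path, and an exclude match always wins.
    """
    path = urlparse(url).path or "/"
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    if include:
        # With an include list, only matching paths are crawled.
        return any(fnmatchcase(path, pattern) for pattern in include)
    return True


# Example: skip the blog, keep everything else.
print(should_crawl("https://docs.example.com/blog/launch", exclude=["/blog/*"]))
print(should_crawl("https://docs.example.com/guides/setup", exclude=["/blog/*"]))
```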

Re-indexing

DuggAI re-crawls indexed sites every 24 hours by default. You can trigger an immediate re-crawl from the source row, useful right after a docs update. Clusters attached to old content auto-update when their source page changes.

Pinned answers

When the bot keeps getting one specific question wrong, write a pinned answer. Format:

  • Question pattern — natural language phrasing of the question.
  • Answer — the markdown reply.

Pinned answers outrank everything else. Use them for:

  • Pricing questions (where the truth is in your billing system, not docs).
  • Status / availability (e.g. "is feature X live yet").
  • Anything sensitive where you want the exact wording every time.
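
As a mental model, a pinned answer is a two-field record. The sketch below is illustrative only: the `PinnedAnswer` class and its field names are hypothetical, and the example question and reply are placeholder content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedAnswer:
    question_pattern: str  # natural-language phrasing of the question
    answer: str            # markdown reply, meant to be used word for word

    def __post_init__(self):
        # Reject empty entries early; an empty pin would never be useful.
        if not self.question_pattern.strip() or not self.answer.strip():
            raise ValueError("question_pattern and answer must be non-empty")


# Placeholder example for a pricing question.
pricing = PinnedAnswer(
    question_pattern="How much does the Pro plan cost?",
    answer="Pricing can change. See your **Billing** page for the current rate.",
)
```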

GitHub repos

Connect a GitHub repo and DuggAI indexes the README, anything in docs/, and top-level inline comments. This is useful for OSS projects where the repo is the documentation. The index re-syncs on every push.

Manual markdown

Paste markdown directly when you have internal knowledge that isn't (and shouldn't be) public. Common uses: troubleshooting runbooks, escalation playbooks, "known broken" lists.

Don't paste secrets
Manual markdown is searchable by the agent and can be cited in replies. Don't put API keys, customer data, or anything you wouldn't want quoted back to a user.
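
One way to catch obvious mistakes is to scan a markdown file locally before pasting it. The sketch below is a heuristic pre-check, not a DuggAI feature: the patterns are a small, assumed sample and will miss many secret formats, so treat an empty result as "nothing obvious found" rather than "safe".

```python
import re

# Heuristic patterns only; names and coverage are illustrative assumptions.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "key_assignment": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*\S{8,}"
    ),
}


def find_possible_secrets(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Running `find_possible_secrets(open("runbook.md").read())` on a hypothetical `runbook.md` and reviewing each reported line by hand is usually enough to catch pasted credentials.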

How retrieval actually works

Each source is chunked, embedded, and stored in a per-project vector index. On every message, the agent retrieves the top chunks across all sources, ranks them by relevance and source priority, and includes the highest-ranked ones in the prompt. The model is explicitly instructed to cite sources and to say "I don't know" when no chunk is sufficient.
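
The ranking step can be sketched as a score that blends relevance with source priority. How DuggAI actually combines the two is not documented here, so the formula below (cosine similarity minus a small per-tier penalty) is an assumption, as are the `Chunk` class, `rank_chunks`, and the `priority_weight` parameter.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_rank: int        # 1 = pinned answer ... 6 = general knowledge
    embedding: list[float]  # vector produced by some embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_chunks(query_embedding, chunks, top_k=3, priority_weight=0.05):
    """Return the top_k chunks by relevance, nudged by source priority.

    Assumed scoring: similarity minus a small penalty for each tier
    below the top one. Relevance still dominates for large gaps.
    """
    def score(chunk: Chunk) -> float:
        penalty = priority_weight * (chunk.source_rank - 1)
        return cosine(query_embedding, chunk.embedding) - penalty

    return sorted(chunks, key=score, reverse=True)[:top_k]
```

With this scoring, two equally relevant chunks are ordered by source tier, while a clearly more relevant docs chunk can still beat an off-topic pinned answer.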

What if the agent ignores a source

This usually means the chunk wasn't retrieved: the source isn't indexed yet, the question's wording doesn't match the source text, or the page is behind auth. Check the conversation detail in the dashboard, which shows the chunks the agent considered.