Discord moderation

The agent reads every message in your watched channels and acts only on the rules you write. It quotes the exact phrase that broke the exact rule — no vibes, no thresholds, no confidence score to tune.

Moderation is one of the three things DuggAI does in a Discord server, alongside customer support and code fixes. Turn it on per server and the same bot that answers support questions also watches for rule-breaking messages. You decide whether it bans on its own or just queues a recommendation for you.

How it decides what to moderate

There is no keyword list and no toxicity score. The agent reads each message against the rules you wrote in plain English. When it thinks a message breaks a rule, it has to do two things before any action is recorded:

Quote the exact rule it believes was broken, copied verbatim from your rule set.
Quote the exact span of the message that broke it.

A verifier then checks that both quotes are real — that the rule quote actually exists in the rule set the agent claimed, and that the message span actually exists in the message. If either is invented, the decision is rejected before it reaches you or the user. That is what keeps the bot from banning on a hunch: it can only act on something it can point at. In the dashboard, the offending span is highlighted inside the full message so you see precisely what it caught.
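To make the verifier's job concrete, here is a minimal Python sketch of the two checks. It is not DuggAI's implementation. `Flag`, `parse_rules`, `verify`, and the `"flag_for_review"` / `"auto_ban"` keys are hypothetical names. It assumes rules are stored one per line and that a rule quote counts as real when it appears verbatim inside a single rule line. The returned offsets are what a dashboard would need to highlight the span.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Flag:
    """A proposed moderation decision (hypothetical shape, for illustration)."""
    rule_set: str       # hypothetical key: "flag_for_review" or "auto_ban"
    rule_quote: str     # rule text the agent says was broken
    message_quote: str  # span of the message the agent says broke it


def parse_rules(text: str) -> list[str]:
    """Split a rule box into one rule per non-blank line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def verify(flag: Flag, rule_sets: dict[str, list[str]], message: str):
    """Return (start, end) of the quoted span for highlighting, or None to reject."""
    rules = rule_sets.get(flag.rule_set, [])
    # The rule quote must appear verbatim inside a rule from the set it claims.
    if not flag.rule_quote or not any(flag.rule_quote in rule for rule in rules):
        return None
    # The message quote must appear verbatim in the message (empty quotes are rejected).
    if not flag.message_quote:
        return None
    start = message.find(flag.message_quote)
    if start == -1:
        return None
    return start, start + len(flag.message_quote)


rule_sets = {
    "flag_for_review": parse_rules("No harassment of other members.\n"),
    "auto_ban": parse_rules("No promoting tokens, contract addresses, or referral links.\n"),
}
message = "gm! use my referral link for free tokens"
print(verify(Flag("auto_ban", "No promoting tokens", "referral link"), rule_sets, message))  # (11, 24)
```

Any invented quote, whether a rule that isn't in the claimed box or a phrase the user never wrote, makes `verify` return `None`, so nothing downstream ever sees the decision.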

Same reasoning model as support

Moderation uses the same explicit-reasoning approach as the support agent; see How the agent decides for the support side. What differs here is the literal-quote requirement and the two separate rule sets described below.

Two rule sets

You write rules in two boxes, and which box a rule lives in decides what happens when it matches.

Rule set	What happens on a match	Use it for
Flag for review (propose ban)	The message lands in your review queue with Ban, Mute, and Dismiss buttons. Nothing happens to the user until you decide.	Judgment calls — harassment, off-topic, NSFW outside the right channel. Anything you want a human to confirm.
Auto-ban (optional, off by default)	The bot bans immediately on a verified match, no queue. The action still gets logged so you can review or revert it after the fact.	Only unambiguous spam — crypto pumps, token/airdrop shilling, unapproved invite-link spam, drop-shipping bots.

Auto-ban only reads the auto-ban box

The verifier forces an auto-ban to literal-quote a rule from the auto-ban set specifically. It cannot reach into your “flag for review” rules to justify an immediate ban. So a rule you only want reviewed can never trigger an automatic ban, even by mistake. Keep anything that needs judgment out of the auto-ban box.
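The routing rule above can be sketched as follows. This is a hypothetical model, not DuggAI's code: `route`, `quoted_from`, `Outcome`, and the set names are illustrative. It also assumes that an auto-ban claim which fails verification is simply discarded rather than downgraded to a review flag. The key property is that the auto-ban branch only ever consults the auto-ban rules.

```python
from __future__ import annotations

from enum import Enum


class Outcome(Enum):
    BAN_NOW = "ban immediately, then log"
    QUEUE = "send to the review queue"
    REJECT = "discard the decision"


def quoted_from(rule_quote: str, rules: list[str]) -> bool:
    """True if rule_quote appears verbatim inside one of the given rules."""
    return bool(rule_quote) and any(rule_quote in rule for rule in rules)


def route(claimed_set: str, rule_quote: str, flag_rules: list[str],
          auto_ban_rules: list[str], auto_ban_enabled: bool) -> Outcome:
    """Decide what a verified quote leads to (hypothetical set names)."""
    if claimed_set == "auto_ban":
        # Only the auto-ban box can justify an immediate ban. Flag-for-review
        # rules are never consulted here, so they cannot trigger one by mistake.
        if auto_ban_enabled and quoted_from(rule_quote, auto_ban_rules):
            return Outcome.BAN_NOW
        return Outcome.REJECT
    if claimed_set == "flag_for_review" and quoted_from(rule_quote, flag_rules):
        return Outcome.QUEUE
    return Outcome.REJECT
```

Because `flag_rules` is never passed to the auto-ban check, a judgment-call rule can at most put a message in the queue.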

Turning it on

Enable the moderation use case
During onboarding, pick Discord Moderation on the use-cases step. Already onboarded? Turn it on under Settings → Use cases. This is what reveals the moderation setup.
Connect the Discord bot
Moderation runs through the same bot as support, and the install grants the ban, kick, and timeout permissions it needs in the same consent screen. See Install the Discord bot. Two things still matter: the bot's role has to sit above the highest role of every member it polices, and if you installed before moderation launched, re-run the install once to pick up the new permissions.
Write your rules
In the moderation setup, paste your rules into Flag for review. Plain English works — write it the way you'd write a server-rules post. One rule per line is easiest for the agent to quote cleanly.
Optionally enable auto-ban
Flip on Auto-ban only if you have spam categories that are never a judgment call, then list them in the auto-ban box. You can leave this off entirely and review everything by hand.

Start in review-only

For the first week, leave auto-ban off and let everything flow through the queue. You'll see exactly what the agent would have done and can tune your wording before you trust it to act on its own.

Reviewing in the inbox

Moderation isn't a separate page — flagged messages land in your Inbox next to support tickets, under the moderation filter. Open one and you get the full message with the violating span highlighted, the exact rule it matched, a link to jump straight to the message in Discord, and three actions:

Ban — approve the ban. (Shows as Unban once a user is already banned, so you can reverse it.)
Mute — a softer call than a ban when the message is borderline.
Dismiss — the agent was wrong or it doesn't warrant action; the message is left alone.

You can filter the queue by status:

Status	Meaning
To review	A flag-for-review match waiting on your decision.
Banned	A ban that went through — auto-ban or one you approved.
Failed	The bot tried to ban but Discord refused (see below).
Dismissed	Resolved without action, including bans you later reverted.
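One way to read the statuses and inbox actions together is as a small state machine. The sketch below is an interpretation of the table, not DuggAI's code. `Status`, `initial`, `apply`, and the action strings are hypothetical. It assumes Dismiss is only offered while a flag is still To review. Mute is left out because this page doesn't say which status a muted flag ends up in.

```python
from __future__ import annotations

from enum import Enum


class Status(Enum):
    TO_REVIEW = "To review"
    BANNED = "Banned"
    FAILED = "Failed"
    DISMISSED = "Dismissed"


def initial(auto_ban: bool, discord_accepted: bool = True) -> Status:
    """Status a new decision starts in: auto-bans skip the queue entirely."""
    if not auto_ban:
        return Status.TO_REVIEW
    return Status.BANNED if discord_accepted else Status.FAILED


def apply(status: Status, action: str, discord_accepted: bool = True) -> Status:
    """Apply a hypothetical inbox action ("ban", "dismiss", "unban")."""
    if action == "ban" and status in (Status.TO_REVIEW, Status.FAILED):
        # Re-running a failed ban is just another attempt at the same action.
        return Status.BANNED if discord_accepted else Status.FAILED
    if action == "dismiss" and status is Status.TO_REVIEW:
        return Status.DISMISSED
    if action == "unban" and status is Status.BANNED:
        # A reverted ban counts as resolved without action.
        return Status.DISMISSED
    raise ValueError(f"cannot {action!r} from {status.value!r}")
```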

Why a ban shows as Failed

Discord refuses a ban when the target outranks the bot (the server owner, or anyone whose highest role sits at or above the bot's highest role) or when the bot's Ban Members permission was removed after install. Fix the role order or the permission in your server, then re-run the action from the inbox.
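If you want to predict a Failed ban before it happens, the checks Discord applies can be approximated as below. This is a simplified sketch. `Member` and `ban_refusal_reason` are hypothetical helpers, not part of DuggAI or any Discord library. It compares raw role positions and ignores tie-breaking between roles at the same position. The two bit values are Discord's `BAN_MEMBERS` and `ADMINISTRATOR` permission flags.

```python
from __future__ import annotations

from dataclasses import dataclass

BAN_MEMBERS = 1 << 2    # Discord permission flag for banning members
ADMINISTRATOR = 1 << 3  # Discord permission flag that implies all others


@dataclass
class Member:
    """Hypothetical snapshot of a guild member, for illustration only."""
    user_id: int
    top_role_position: int  # position of the member's highest role (@everyone is 0)
    permissions: int = 0    # effective guild-level permission bitfield


def ban_refusal_reason(bot: Member, target: Member, owner_id: int) -> str | None:
    """Return why Discord would likely refuse the ban, or None if it should go through."""
    if not bot.permissions & (BAN_MEMBERS | ADMINISTRATOR):
        return "bot is missing the Ban Members permission"
    if target.user_id == owner_id:
        return "target is the server owner"
    # The bot can only act on members whose highest role is strictly below its own.
    if target.top_role_position >= bot.top_role_position:
        return "target's highest role is at or above the bot's"
    return None
```

Administrator does not override the role hierarchy, which is why moving the bot's role up is usually the fix rather than granting more permissions.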

Every decision is auditable

Each moderation decision is logged with the message, the matched rule, the action taken, and the model, token count, cost, and latency that produced it. Auto-bans are logged the same as proposals, so even the actions you didn't touch are reviewable and reversible.
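A log entry carries roughly the fields listed above. The sketch below shows one possible shape and two things you might do with an exported log: pull out the auto-bans nobody confirmed, and total the cost. `AuditEntry`, the action labels, the field names, and the USD unit are assumptions for illustration. This page does not document an export format.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit record; field names and units are assumptions."""
    message: str
    matched_rule: str
    action: str        # hypothetical labels, e.g. "auto_ban", "propose_ban"
    model: str
    tokens: int
    cost_usd: float
    latency_ms: int


def needs_human_look(entries: list[AuditEntry]) -> list[AuditEntry]:
    """Auto-bans were never confirmed by a person, so surface them for review."""
    return [e for e in entries if e.action == "auto_ban"]


def total_cost(entries: list[AuditEntry]) -> float:
    return sum(e.cost_usd for e in entries)


def to_json_line(entry: AuditEntry) -> str:
    """Serialize one entry as a JSON line, e.g. for your own archive."""
    return json.dumps(asdict(entry), sort_keys=True)
```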

Writing rules that work

Be concrete. “No promoting tokens, contract addresses, or referral links” quotes cleanly. “No spam” gives the agent nothing specific to point at.
One idea per line. Keeps the rule quote tight and the highlight readable.
Reserve auto-ban for the obvious. If you'd ever want to glance at it first, it belongs in flag-for-review, not auto-ban.
Iterate from the queue. When the agent misses or over-flags, the fix is almost always a sharper rule, not a setting.
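Before pasting rules in, you could run a rough lint that mirrors the advice above. It flags very short rules, lines that pack in more than one sentence, and rules that appear in both boxes (which would let a judgment call trigger an auto-ban). This is a heuristic of our own, not a DuggAI feature. `lint_rules` and the `MIN_WORDS` threshold are hypothetical, and the sentence count is a crude punctuation check.

```python
from __future__ import annotations

import re

MIN_WORDS = 4  # arbitrary threshold for this sketch, not a DuggAI setting


def lint_rules(flag_box: str, auto_ban_box: str) -> list[str]:
    """Return human-readable warnings about the two rule boxes."""
    boxes = {"flag for review": flag_box, "auto-ban": auto_ban_box}
    parsed = {
        name: [line.strip() for line in text.splitlines() if line.strip()]
        for name, text in boxes.items()
    }
    warnings = []
    for name, rules in parsed.items():
        for rule in rules:
            if len(rule.split()) < MIN_WORDS:
                warnings.append(f"{name}: {rule!r} may be too vague to quote usefully")
            # More than one sentence terminator on a line usually means more than one idea.
            if len(re.findall(r"[.!?](?:\s|$)", rule)) > 1:
                warnings.append(f"{name}: {rule!r} looks like more than one rule")
    for rule in sorted(set(parsed["flag for review"]) & set(parsed["auto-ban"])):
        warnings.append(f"{rule!r} is in both boxes, so it can trigger an auto-ban")
    return warnings
```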