Discord moderation
The agent reads every message in your watched channels and acts only on the rules you write. For every action it quotes the exact rule and the exact phrase that broke it: no vibes, no thresholds, no confidence score to tune.
Moderation is one of the three things DuggAI does in a Discord server, alongside customer support and code fixes. Turn it on per server and the same bot that answers support questions also watches for rule-breaking messages. You decide whether it bans on its own or just queues a recommendation for you.
How it decides what to moderate
There is no keyword list and no toxicity score. The agent reads each message against the rules you wrote in plain English. When it thinks a message breaks a rule, it has to do two things before any action is recorded:
- Quote the exact rule it believes was broken, copied verbatim from your rule set.
- Quote the exact span of the message that broke it.
A verifier then checks that both quotes are real — that the rule quote actually exists in the rule set the agent claimed, and that the message span actually exists in the message. If either is invented, the decision is rejected before it reaches you or the user. That is what keeps the bot from banning on a hunch: it can only act on something it can point at. In the dashboard, the offending span is highlighted inside the full message so you see precisely what it caught.
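The verification step above can be sketched roughly as follows. This is a minimal illustration, not DuggAI's implementation: the `Decision` type and the `verify` function are hypothetical names, and the sketch assumes a rule quote must equal one whole rule line and a span must be a literal, non-empty substring of the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """A proposed moderation decision (hypothetical shape, for illustration)."""
    rule_quote: str    # rule text the agent claims was broken
    message_span: str  # part of the message the agent claims broke it


def verify(decision: Decision, rules: list[str], message: str) -> bool:
    """Accept a decision only if both quotes really exist.

    Assumption: the rule quote must match one configured rule verbatim, and
    the span must be a non-empty literal substring of the message.
    """
    rule_is_real = decision.rule_quote in rules
    span_is_real = bool(decision.message_span) and decision.message_span in message
    return rule_is_real and span_is_real


rules = ["No promoting tokens, contract addresses, or referral links"]
message = "gm all, buy $MOON now at 0xabc before it pumps"

grounded = Decision(rules[0], "buy $MOON now at 0xabc")
invented = Decision("No crypto talk", "buy $MOON now")

print(verify(grounded, rules, message))  # True
print(verify(invented, rules, message))  # False: the rule quote is not in the rule set
```

Because the check is a plain lookup rather than a score, a decision either points at real text or is dropped; there is nothing to tune.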
Two rule sets
You write rules in two boxes, and which box a rule lives in decides what happens when it matches.
| Rule set | What happens on a match | Use it for |
|---|---|---|
| Flag for review (propose ban) | The message lands in your review queue with Ban, Mute, and Dismiss buttons. Nothing happens to the user until you decide. | Judgment calls — harassment, off-topic, NSFW outside the right channel. Anything you want a human to confirm. |
| Auto-ban (optional, off by default) | The bot bans immediately on a verified match, no queue. The action still gets logged so you can review or revert it after the fact. | Only unambiguous spam — crypto pumps, token/airdrop shilling, unapproved invite-link spam, drop-shipping bots. |
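As a rough sketch of how the box a rule lives in could decide the outcome, consider the function below. The name `decide_action` and the returned labels are hypothetical, and the sketch assumes that while auto-ban is switched off, the auto-ban box is simply not consulted; the product may handle that case differently.

```python
from typing import Optional


def decide_action(
    rule_quote: str,
    flag_rules: list[str],
    auto_ban_rules: list[str],
    auto_ban_enabled: bool = False,  # auto-ban is off by default
) -> Optional[str]:
    """Map a verified rule quote to an outcome based on its rule set."""
    if auto_ban_enabled and rule_quote in auto_ban_rules:
        return "ban_now"           # immediate ban, still logged for later review
    if rule_quote in flag_rules:
        return "queue_for_review"  # nothing happens until a human decides
    return None                    # assumption: no active rule set contains it


flag_rules = ["No harassment or personal attacks"]
auto_ban_rules = ["No token or airdrop shilling"]

print(decide_action("No harassment or personal attacks", flag_rules, auto_ban_rules))
# queue_for_review
print(decide_action("No token or airdrop shilling", flag_rules, auto_ban_rules,
                    auto_ban_enabled=True))
# ban_now
```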
Turning it on
- Enable the moderation use case. During onboarding, pick Discord Moderation on the use-cases step. Already onboarded? Turn it on under Settings → Use cases. This is what reveals the moderation setup.
- Connect the Discord bot. Moderation runs through the same bot as support, and the install grants the ban, kick, and timeout permissions it needs in the same consent screen. See Install the Discord bot. Two things still matter: the bot's role has to sit above the members it polices, and if you installed before moderation launched, re-run the install once to pick up the new permissions.
- Write your rules. In the moderation setup, paste your rules into Flag for review. Plain English works: write it the way you'd write a server-rules post. One rule per line is easiest for the agent to quote cleanly.
- Optionally enable auto-ban. Flip on Auto-ban only if you have spam categories that are never a judgment call, then list them in the auto-ban box. You can leave this off entirely and review everything by hand.
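The two install caveats in the steps above (missing permissions and role order) can be checked mechanically. The sketch below uses Discord's documented permission bit flags for kick, ban, and timeout; the helper names are hypothetical, and how DuggAI itself performs these checks is not stated here. It also ignores the Administrator permission, which would grant everything.

```python
# Discord permission bit flags (from Discord's permissions documentation).
KICK_MEMBERS = 1 << 1
BAN_MEMBERS = 1 << 2
MODERATE_MEMBERS = 1 << 40  # required for timeouts

REQUIRED = {
    "KICK_MEMBERS": KICK_MEMBERS,
    "BAN_MEMBERS": BAN_MEMBERS,
    "MODERATE_MEMBERS": MODERATE_MEMBERS,
}


def missing_permissions(granted: int) -> list[str]:
    """Return the names of required permissions absent from a permission bitfield."""
    return [name for name, bit in REQUIRED.items() if not granted & bit]


def can_moderate(bot_top_role_position: int, member_top_role_position: int) -> bool:
    """The bot can act on a member only if its highest role sits strictly above theirs."""
    return bot_top_role_position > member_top_role_position


# A bot installed before moderation launched might hold only part of the set.
print(missing_permissions(BAN_MEMBERS))  # ['KICK_MEMBERS', 'MODERATE_MEMBERS']
print(can_moderate(bot_top_role_position=5, member_top_role_position=3))  # True
print(can_moderate(bot_top_role_position=3, member_top_role_position=3))  # False
```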
Reviewing in the inbox
Moderation isn't a separate page — flagged messages land in your Inbox next to support tickets, under the moderation filter. Open one and you get the full message with the violating span highlighted, the exact rule it matched, a link to jump straight to the message in Discord, and three actions:
- Ban — approve the ban. (Shows as Unban once a user is already banned, so you can reverse it.)
- Mute — a softer call than a ban when the message is borderline.
- Dismiss — the agent was wrong or it doesn't warrant action; the message is left alone.
You can filter the queue by status:
| Status | Meaning |
|---|---|
| To review | A flag-for-review match waiting on your decision. |
| Banned | A ban that went through — auto-ban or one you approved. |
| Failed | The bot tried to ban but Discord refused (see below). |
| Dismissed | Resolved without action, including bans you later reverted. |
Every decision is auditable
Each moderation decision is logged with the message, the matched rule, and the action taken, along with the model that produced it and its token count, cost, and latency. Auto-bans are logged the same as proposals, so even the actions you didn't touch are reviewable and reversible.
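To make the shape of such a log entry concrete, here is a hedged sketch. The field names, units (USD, milliseconds), and action labels are assumptions chosen for illustration, and the values are made up; the real log format may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModerationLogEntry:
    """One audited decision (hypothetical field names and units)."""
    message: str        # full text of the moderated message
    matched_rule: str   # rule quoted verbatim from the rule set
    action: str         # e.g. "proposed_ban" or "auto_ban" (assumed labels)
    model: str          # model that produced the decision
    token_count: int
    cost_usd: float     # assumed unit
    latency_ms: int     # assumed unit


# Illustrative values only.
entry = ModerationLogEntry(
    message="gm all, buy $MOON now at 0xabc before it pumps",
    matched_rule="No promoting tokens, contract addresses, or referral links",
    action="auto_ban",
    model="example-model",
    token_count=412,
    cost_usd=0.0007,
    latency_ms=850,
)

# Auto-bans and proposals share one record shape, so both can be reviewed alike.
print(json.dumps(asdict(entry), indent=2))
```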
Writing rules that work
- Be concrete. “No promoting tokens, contract addresses, or referral links” quotes cleanly. “No spam” gives the agent nothing specific to point at.
- One idea per line. Keeps the rule quote tight and the highlight readable.
- Reserve auto-ban for the obvious. If you'd ever want to glance at it first, it belongs in flag-for-review, not auto-ban.
- Iterate from the queue. When the agent misses or over-flags, the fix is almost always a sharper rule, not a setting.