How we built iid-mcp: an MCP server that puts a private SailPoint IdentityIQ codebase in front of Claude Code, Copilot, and Claude Desktop, so they stop guessing.
At Instrumental Identity, we write a lot of SailPoint IdentityIQ code: rules, workflows, tasks, plugins, and XML configs that an importer validates against a DTD with more than a thousand elements. So when the capable LLM coding assistants arrived, we did the obvious thing and asked one to write a rule.
What came back looked completely plausible and was subtly wrong. It called a method that doesn’t exist on SailPointContext. It invented a helper in com.identityworksllc.iiq.common that nobody ever wrote. It set a type= attribute on a <Rule> that the importer rejects on sight. The code compiled in the model’s head and fell over in ours.
The reason is simple. The model has read a lot of public Java, but it’s never seen the libraries we actually build against: Instrumental Identity’s iiqcommon, the iiq-common-library, our in-house plugin set, and SailPoint’s own shipped API surface. So it does what models do when they don’t know. It produces a confident average of everything it’s seen and presents it as fact. For IIQ work that average is worse than useless, because it’s wrong in ways that take someone who knows the platform to catch.
The fix isn’t a better prompt. The fix is to stop letting the model lean on memory and hand it the real source instead. That’s what iid-mcp does.
What it is
iid-mcp is a Model Context Protocol server. MCP is the standard that lets an AI client (Claude Code, Claude Desktop, GitHub Copilot in agent mode) call tools you define, over a well-specified protocol, with no bespoke glue per client. You stand up one server, expose a set of tools, and every MCP-aware assistant can use them.
iid-mcp exposes seven tools. Two are live: they reach into Instrumental Identity’s GitLab and search or fetch real library source on demand. The other five are reference tools, serving curated, queryable knowledge built straight from SailPoint’s own shipped artifacts: the database DDL, the Javadoc, the example configs, the XML DTD.
| Tool | What it does |
|---|---|
| search_iiq(query, max_results) | GitLab blob search across every in-scope project, in parallel |
| fetch_iiq_file(project, path, ref) | Fetch one file’s full contents from GitLab |
| get_iiq_patterns() | Return the hand-curated patterns reference |
| get_iiq_schema(table?) | IIQ database schema: table inventory, or full column detail for one table |
| get_iiq_api(name?) | IIQ + Instrumental Identity Javadoc: package index, a package’s classes, or one class |
| get_iiq_examples(key?) | SailPoint’s bundled config examples by rule type, workflow, form |
| get_iiq_dtd(element?) | The IIQ XML DTD: element index, or one element’s children and attributes |
The whole thing is about 1,200 lines of Python plus the generated reference data. It’s small on purpose. The hard part was never the code. It was deciding what to feed the model and how to keep that data honest.
The shape of the problem
There are two different jobs hiding inside “help me write IIQ code,” and they want different tools.
The first is find me real usage. When we’re writing against a helper, we want to see how it’s actually called in code that ships, not how the model imagines it’s called. That’s a search problem against private repos, and it has to be live, because the libraries change on the order of days.
The second is tell me the legal shape of this thing. What columns does spt_identity have on 8.5? What does SailPointContext.getObjects return? Which children is a <Workflow> allowed to have? None of that changes between Tuesday and Thursday; it changes when SailPoint ships a new version. That’s a reference problem, and the right move is to precompute it from the source of truth and serve it fast.
We built both halves into one server because, from the agent’s point of view, they’re the same task: ground this artifact in reality before you generate it. The server’s own instructions spell out the ordering. Load the patterns, pull the matching example, check the API signature, verify the XML shape against the DTD, then search the live code to confirm. Generate last.
Live search over GitLab
You configure what the server can see with one environment variable, a comma-separated list of project and group paths. A group path expands to every non-archived project inside it, recursively. The default scope resolves to roughly 38 projects, cached in memory for the life of the process.
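To make the scope setting concrete, here is a minimal sketch of parsing such a variable. The variable name `IID_MCP_GITLAB_SCOPE` and both function names are hypothetical, since the post does not name them, and expanding a group path into its projects (which needs the GitLab API) is left out.

```python
import os

# Hypothetical variable name; the post only says scope comes from one env var.
SCOPE_ENV_VAR = "IID_MCP_GITLAB_SCOPE"


def parse_scope(raw: str) -> list:
    """Split a comma-separated list of GitLab paths, dropping blanks and duplicates."""
    seen = set()
    paths = []
    for item in raw.split(","):
        # Tolerate stray whitespace and leading/trailing slashes.
        path = item.strip().strip("/")
        if path and path not in seen:
            seen.add(path)
            paths.append(path)
    return paths


def load_scope(environ=os.environ) -> list:
    """Read the scope list from the environment; empty if the variable is unset."""
    return parse_scope(environ.get(SCOPE_ENV_VAR, ""))
```

Each returned path would then be resolved as either a single project or a group to expand, and the resolved list cached for the life of the process.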
search_iiq runs a GitLab blob search against every project in scope at once. Concurrency is bounded in two places: how many per-project requests are in flight within a single call, and how many search calls run at once across the whole process. Callers over the limit queue rather than getting rejected.
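One plausible way to get both bounds is a pair of semaphores, one shared across the process and one created per call. This is a sketch under assumptions, not the server's code: the class name, the limit values, and the `search_one` callable are all illustrative.

```python
import asyncio

# Illustrative limits; the real values are not stated in the post.
PER_CALL_LIMIT = 8
GLOBAL_CALL_LIMIT = 4


class SearchLimiter:
    """Bounds concurrency per search call and across all search calls.

    Construct this inside the running event loop so the semaphore binds to it.
    """

    def __init__(self, per_call=PER_CALL_LIMIT, global_calls=GLOBAL_CALL_LIMIT):
        self._per_call = per_call
        self._global = asyncio.Semaphore(global_calls)

    async def run(self, projects, search_one):
        # Callers over the process-wide limit wait here instead of being rejected.
        async with self._global:
            per_call = asyncio.Semaphore(self._per_call)

            async def bounded(project):
                # At most `per_call` project requests are in flight for this call.
                async with per_call:
                    return await search_one(project)

            # gather preserves the order of `projects` in its results.
            return await asyncio.gather(*(bounded(p) for p in projects))
```

Because waiting on a semaphore is just an awaited acquire, the queueing behavior falls out for free: an over-limit caller is suspended, not failed.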
The most important design decision in the whole project lives here, and we got it wrong the first time. The original search_iiq gathered all per-project searches and returned the combined hits. Then production logs showed three of four search calls failing outright. The cause was one repo, iiq-common-library, whose blob search legitimately runs around 11 seconds while every other project returns in under 3.4 seconds. Under load it would tip past the timeout, and that single ReadTimeout, propagating up through asyncio.gather, took the entire search down with it. The 37 healthy projects had already returned useful hits, and the caller saw nothing but an exception.
So the rule now is: a per-project failure is data, not an exception. Each project’s search is wrapped to catch its own errors and return them instead of raising. The aggregator splits outcomes into hits and a structured errors list. You get the 37 projects that worked plus a note saying which one timed out and why. Partial results stay useful. The same instinct shows up again in the cache and in the transport layer: infrastructure trouble degrades the answer, it never destroys it.
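The pattern can be sketched in a few lines of asyncio. The names here (`SearchOutcome`, `search_project_safely`, `search_all`) and the timeout value are assumptions for illustration, and `search_one` stands in for whatever coroutine actually calls GitLab.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SearchOutcome:
    hits: list = field(default_factory=list)
    errors: list = field(default_factory=list)


async def search_project_safely(project, search_one, timeout):
    """Run one project's search; return its error as data instead of raising."""
    try:
        hits = await asyncio.wait_for(search_one(project), timeout=timeout)
        return project, hits, None
    except Exception as exc:  # cancellation is a BaseException and still propagates
        error = {"project": project, "error": type(exc).__name__, "detail": str(exc)}
        return project, [], error


async def search_all(projects, search_one, timeout=15.0):
    """Fan out across projects and split outcomes into hits and structured errors."""
    results = await asyncio.gather(
        *(search_project_safely(p, search_one, timeout) for p in projects)
    )
    outcome = SearchOutcome()
    for _project, hits, error in results:
        outcome.hits.extend(hits)
        if error is not None:
            outcome.errors.append(error)
    return outcome
```

Since no wrapped coroutine can raise an ordinary exception, `asyncio.gather` never sees one, and a single slow project can only ever add an entry to `errors`.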
A returned hit carries everything the agent needs to act: project, path, ref, line number, the matched snippet, and a web_url that deep-links straight to the line in GitLab. The usual loop is search, read the snippets, fetch the one or two that matter, write code grounded in what came back.
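For a sense of what such a hit might look like as a data structure, here is a hedged sketch. The `SearchHit` class and its method are hypothetical; the field names come from the list above, and the URL layout follows GitLab's usual `/-/blob/<ref>/<path>#L<line>` form. Whether the server computes the link or passes one through is not something the post says.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class SearchHit:
    project: str   # full project path, e.g. "group/subgroup/project"
    path: str      # file path inside the repository
    ref: str       # branch, tag, or commit the hit was found on
    line: int      # 1-based line number of the match
    snippet: str   # the matched text shown to the agent

    def web_url(self, base_url: str) -> str:
        """Build a deep link to the matching line in the GitLab web UI."""
        return "{}/{}/-/blob/{}/{}#L{}".format(
            base_url.rstrip("/"),
            self.project,
            quote(self.ref),
            quote(self.path),  # keeps "/" but escapes spaces and similar
            self.line,
        )
```

The project, path, and ref fields line up with the arguments of fetch_iiq_file, so a hit can be turned into a fetch without any extra lookup.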
The reference tools
The five reference tools share one idea. SailPoint already ships the ground truth for most of what an agent needs to know; it just ships it in formats built for a human with a browser, not a model with a tool call. So we wrote build scripts that parse those artifacts into markdown shaped for retrieval, and tools that serve slices of it on demand.
Schema (get_iiq_schema). Parses the bundled database DDL (create plus upgrade patch scripts for Oracle, SQL Server, MySQL, and PostgreSQL), applies the patches in order, and generates a reference covering IIQ 8.4 and 8.5 with per-database type differences and version diffs. It also flags a real footgun: Oracle and PostgreSQL put function-based UPPER() indexes on string columns, so a query that filters without wrapping the bind parameter in UPPER() silently table-scans. Those columns are marked, so the agent writes the indexed form the first time instead of the slow form you discover in production.
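The function-based index footgun is easy to reproduce in miniature. The sketch below uses SQLite's expression indexes as a stand-in for Oracle and PostgreSQL, with an invented table `demo_identity` rather than any real IIQ table, and assumes the indexed form compares `upper(column)` against an uppercased parameter.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with an index on upper(name) only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_identity (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE INDEX demo_identity_name_ci ON demo_identity (upper(name))")
    conn.executemany(
        "INSERT INTO demo_identity (name) VALUES (?)", [("Alice",), ("bob",)]
    )
    return conn


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan details for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


# Matches the indexed expression, so the planner can use the index.
INDEXED = "SELECT id FROM demo_identity WHERE upper(name) = upper(?)"
# Filters on the bare column, which no index covers: a full scan.
UNINDEXED = "SELECT id FROM demo_identity WHERE name = ?"
```

Comparing `query_plan(conn, INDEXED, ("alice",))` with `query_plan(conn, UNINDEXED, ("alice",))` shows the first plan naming the index and the second falling back to a scan, which is the same distinction the schema reference marks for the agent.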
API (get_iiq_api). Javadoc converted to per-package markdown and merged across three sources: SailPoint IIQ 8.5, iiqcommon, and iiq-common-library, about 940 classes across 93 packages. A class lookup spans all three, so the agent doesn’t need to know which library a helper lives in, and per-class sections get extracted on demand so a 569-class package never returns in one response.
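On-demand section extraction might look something like the following. This is a sketch that assumes a layout the post does not specify: one markdown file per package, with each class under its own `## ClassName` heading. Both function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def extract_class_section(package_markdown: str, class_name: str) -> Optional[str]:
    """Return only the '## ClassName' section of a package file, or None."""
    lines = package_markdown.splitlines()
    start = None
    for i, line in enumerate(lines):
        if line.startswith("## "):
            if start is not None:
                # The next class heading ends the section we were collecting.
                return "\n".join(lines[start:i]).strip()
            if line[3:].strip() == class_name:
                start = i
    if start is not None:
        return "\n".join(lines[start:]).strip()
    return None


def lookup_class(sources: Dict[str, str], class_name: str) -> Optional[Tuple[str, str]]:
    """Search every source's markdown so the caller need not know the library."""
    for source_name, markdown in sources.items():
        section = extract_class_section(markdown, class_name)
        if section is not None:
            return source_name, section
    return None
```

Returning the source name alongside the section lets the agent see which library a helper came from even though it did not have to ask.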
Examples (get_iiq_examples). SailPoint ships example configs with IIQ. When the agent is about to write a Correlation rule, it gets SailPoint’s own example for that exact type, so the contract and the idiom come from the vendor rather than from the model’s imagination.
DTD (get_iiq_dtd). Every IIQ artifact is XML, validated on import against sailpoint.dtd (1,108 elements on 8.5, regenerated straight from the matching jar). A lookup returns the legal children and attributes of an element before the agent generates one. This prevents the single most common authoring failure: XML that looks right and gets rejected at import time.
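To illustrate the kind of lookup involved, here is a deliberately simplified DTD parser. It is a sketch, not the server's build script: the regexes cover only common declaration forms, and the example elements in the usage note are made up rather than taken from sailpoint.dtd.

```python
import re

ELEMENT_RE = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+([^>]+)>")
ATTLIST_RE = re.compile(r"<!ATTLIST\s+([\w.:-]+)\s+([^>]+)>")
# name, type (keyword or enumeration), then a default declaration.
ATTR_RE = re.compile(
    r"""([\w.:-]+)\s+(CDATA|ID|IDREF|NMTOKEN|\([^)]*\))\s+"""
    r"""(#REQUIRED|#IMPLIED|(?:#FIXED\s+)?(?:"[^"]*"|'[^']*'))"""
)
NAME_RE = re.compile(r"[A-Za-z_][\w.:-]*")
KEYWORDS = {"EMPTY", "ANY", "PCDATA"}


def parse_dtd(text):
    """Map each element name to its legal children and attributes."""
    elements = {}
    for name, content in ELEMENT_RE.findall(text):
        children = []
        for child in NAME_RE.findall(content):
            if child not in KEYWORDS and child not in children:
                children.append(child)
        elements[name] = {"children": children, "attributes": {}}
    for name, body in ATTLIST_RE.findall(text):
        entry = elements.setdefault(name, {"children": [], "attributes": {}})
        for attr, attr_type, default in ATTR_RE.findall(body):
            entry["attributes"][attr] = {"type": attr_type, "default": default}
    return elements


def illegal_attributes(elements, element, attrs):
    """Return the attributes the DTD does not declare for this element."""
    declared = elements.get(element, {}).get("attributes", {})
    return [a for a in attrs if a not in declared]
```

Given a toy declaration such as `<!ELEMENT Widget (Label?, Part*)>` with an `ATTLIST` that declares only `name` and `mode`, calling `illegal_attributes(parsed, "Widget", ["name", "type"])` flags `type` before any XML is generated.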
Patterns (get_iiq_patterns). The one tool that isn’t generated. It’s a hand-curated reference of the things that come up constantly: logging idioms, task base classes, plugin anatomy, the namespace gotcha where iiqcommon and iiq-common-library share a package prefix but hold different classes. The curation rules are strict and we enforce them on ourselves: every entry is sourced, verified, concise, and current. Stale entries get removed rather than left to rot, because the whole point is that the agent trusts the file and skips re-verifying. A stale pattern is worse than no pattern; it turns “the model doesn’t know” into “the model is confidently wrong, and we told it to be.”
State, caching, and ops
The design rule is one sentence: state is externalized, the process is killable. There’s no local database, no session store, no on-disk cache. Everything lives in GitLab and Redis, which makes the container disposable and means moving from a Proxmox VM today to ECS or Fargate tomorrow is a deploy change, not a code change.
Caching matters because, with search fanning out in parallel, the slowest project IS the floor of total search time. GitLab responses get cached in Redis when configured, and the cache degrades gracefully in every direction: no Redis means a no-op pass-through, and an unreachable Redis mid-call gets swallowed and falls back to a live fetch. The server never fails a tool call because the cache is unhappy. An optimization that can take down the thing it’s optimizing is a liability.
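A cache wrapper with those properties can be quite small. In this sketch the `GracefulCache` name and the TTL are assumptions, and `backend` is any object with async `get(key)` and `set(key, value, ex=seconds)` methods, the shape an async Redis client typically offers. Passing `None` models the no-Redis configuration.

```python
import json


class GracefulCache:
    """Read-through cache that never lets a cache failure fail the caller."""

    def __init__(self, backend=None):
        self._backend = backend  # None means caching is simply disabled

    async def get_or_fetch(self, key, fetch, ttl_seconds=300):
        if self._backend is not None:
            try:
                cached = await self._backend.get(key)
                if cached is not None:
                    return json.loads(cached)
            except Exception:
                # Unreachable or misbehaving cache: fall through to a live fetch.
                pass

        # Errors from the live fetch itself are real and still propagate.
        value = await fetch()

        if self._backend is not None:
            try:
                await self._backend.set(key, json.dumps(value), ex=ttl_seconds)
            except Exception:
                # Failing to store is not a reason to fail the tool call.
                pass
        return value
```

The only exceptions that escape `get_or_fetch` are the ones raised by `fetch`, so the cache can make a call faster but never make it fail.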
Observability is structured logging, one JSON object per meaningful event, emitted to the host’s systemd journal rather than the default Docker log driver so a week of search-latency and error history survives the redeploy-on-every-push cycle. Secrets never reach the logs: tokens are recorded as a present/absent boolean and Redis credentials are redacted.
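As an illustration of that logging discipline, the sketch below formats one event as a JSON line while keeping secrets out. The function names and the key-naming convention (keys ending in `token` or `_url`) are assumptions made for the example, not the server's actual scheme.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Replace any user:password part of a URL with a fixed placeholder."""
    parts = urlsplit(url)
    if parts.password is None and not parts.username:
        return url
    host = parts.hostname or ""
    if parts.port:
        host += ":{}".format(parts.port)
    return urlunsplit(
        (parts.scheme, "***@" + host, parts.path, parts.query, parts.fragment)
    )


def format_event(event: str, **fields) -> str:
    """Render one log event as a single JSON object with secrets removed."""
    record = {"event": event}
    for key, value in fields.items():
        if key.endswith("token"):
            # Record only whether a token was supplied, never its value.
            record[key + "_present"] = bool(value)
        elif key.endswith("_url"):
            record[key] = redact_url(str(value))
        else:
            record[key] = value
    return json.dumps(record, sort_keys=True)
```

Printing the result of `format_event(...)` to stdout, one line per event, is enough for a journal-backed log pipeline to pick it up as structured data.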
The production server sits behind a Cloudflare Tunnel and Cloudflare Access, with browsers gated by M365 Entra SSO and programmatic clients using per-user service tokens so revocation is per-user instead of all-or-nothing. GitLab CI runs lint, test, build, and deploy on every push to main, backed by 118 tests that mock GitLab and Redis so the suite is deterministic and never touches the network.
One migration is worth recording. The server originally spoke MCP over SSE, a long-lived stream that went stale through Cloudflare on container restart or after an idle stretch. The connection looked alive and was dead. Streamable HTTP fixes this by construction: short-lived per-call requests against a single endpoint, nothing held open to go stale. We cut production over entirely, leaving SSE in the codebase behind a one-line switch.
The principles that drove the design
A few principles ended up driving most of the decisions, and they generalize past this one server.
- Ground the model in real source. Never trust its memory of your private code. Every tool exists to replace a confident guess with a fact the model can cite.
- Infrastructure degrades the answer, it never destroys it. A slow repo becomes one line in an error list. A dead cache becomes a live fetch. The tool call comes back.
- Externalize all state. A killable process is a deployable, redeployable, portable process.
- Curated knowledge has to stay honest or it’s worse than nothing. The model trusts what you give it and skips re-checking. That trust is the whole value, and it’s the liability the moment the data goes stale.
What’s next
IIQ is the first product, not the only one. The package layout already reserves space for sibling toolsets (SailPoint ISC, and later Evolveum Midpoint and Fischer Identity) that plug into the same scope, cache, transport, and observability machinery. Each new product brings its own GitLab paths and reference data and reuses everything else.
For now it’s doing the job it was built for. Claude Code sessions write IIQ code against real library source instead of a plausible hallucination of it, the cost of a wrong line shows up at authoring time instead of at import time, and nobody has to be the human who catches the method that doesn’t exist. That last part is the whole point.