AI vendor assessment template for shadow AI at work

Most AI vendor assessment guides assume you already know which vendors to assess. The harder, more common problem is the one before that — you do not, because half the tools in use were never procured through IT. This piece is the practitioner sequence for HR and procurement teams: how to surface the shadow-AI list first, then triage it, then run a 30-question assessment on the tools that matter.

Shadow AI is the catch-all phrase for AI tools your employees use without your IT or compliance team's awareness — most commonly free consumer chatbots, browser extensions, and SaaS plug-ins that quietly call a large language model. The risk is dual: data leakage on the way in, and unreviewed output on the way out. Responsible AI Studio (RAIS) publishes an AI Vendor Assessment tool that converts a shadow-AI discovery list into a triaged register with the EU AI Act, GDPR, and ISO 42001 obligations each vendor triggers.

Qualified review still required. Outputs are AI-generated starting-point documents — not a substitute for qualified legal or compliance advice.

Discovery: three sources for the shadow-AI list

The list of shadow AI tools in use across your organisation is almost certainly longer than your sanctioned list. Three discovery sources, run in parallel, find the tools each one would miss alone.

An anonymous employee survey with explicit amnesty surfaces the tools people have signed up for personally. The amnesty matters — without it, the survey returns a sanitised picture. Frame the question as "which AI tools have you found genuinely useful in the last quarter?" rather than "have you used unauthorised AI?". Volunteered information is the goal.

Browser and proxy logs, filtered for known AI domains and SaaS plug-ins, surface what runs from corporate devices. This is the IT-team-led source, and tends to find tools the survey missed because staff did not consciously think of them as AI — Grammarly, Notion AI features, Slack AI summaries, automated meeting transcription.
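A minimal sketch of the log-filtering step, assuming the proxy export can be reduced to a list of full URLs. The watchlist entries and the function names are illustrative assumptions, not part of any RAIS tool; a real watchlist would be maintained by the IT team.

```python
from collections import Counter
from typing import Iterable, Optional, Set
from urllib.parse import urlparse

# Hypothetical watchlist of AI-related domains (an assumption for illustration).
AI_DOMAINS = {"chat.openai.com", "claude.ai", "gemini.google.com", "grammarly.com"}


def matches_watchlist(host: str, watchlist: Set[str]) -> Optional[str]:
    """Return the watchlist entry that host equals or is a subdomain of."""
    host = host.lower().rstrip(".")
    for domain in watchlist:
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def count_ai_hits(urls: Iterable[str], watchlist: Set[str] = AI_DOMAINS) -> Counter:
    """Count requests per watchlisted domain; unparseable entries are skipped."""
    hits: Counter = Counter()
    for url in urls:
        host = urlparse(url).hostname  # None when the entry has no scheme/host
        if host is None:
            continue
        domain = matches_watchlist(host, watchlist)
        if domain is not None:
            hits[domain] += 1
    return hits
```

Matching on the full domain or a dot-prefixed suffix avoids false positives from look-alike hosts, and the per-domain counts feed the frequency-of-use lens in the triage step.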

Expense-report scans for AI subscriptions, free-trial-to-paid conversions, and add-on charges on existing SaaS bills catch the procurement tail. Many shadow tools are paid for on team or individual budgets and never hit IT procurement. The expense data is where they appear.

Each source finds tools the others miss. Running all three is the only route to a complete inventory, and the inventory is the input to every later step. For upstream context, see the employee AI guidelines rollout guide; for the policy work that should govern any tool you sanction, see "set the acceptable use rules before approving tools".
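Merging the three sources into one inventory can be sketched as follows. This assumes each source has already been reduced to a list of tool names; the function name and the simple name normalisation are illustrative assumptions, and real data would likely need a manual alias table as well.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_inventory(
    survey: Iterable[str],
    proxy_logs: Iterable[str],
    expenses: Iterable[str],
) -> Dict[str, List[str]]:
    """Map each normalised tool name to the discovery sources that found it."""
    sources = {"survey": survey, "proxy_logs": proxy_logs, "expenses": expenses}
    seen = defaultdict(set)
    for source_name, tools in sources.items():
        for tool in tools:
            # Lower-case and collapse whitespace so "ChatGPT " and "chatgpt" merge.
            key = " ".join(tool.lower().split())
            seen[key].add(source_name)
    return {tool: sorted(found) for tool, found in sorted(seen.items())}
```

Recording which sources found each tool is a deliberate choice: a tool that appears only in proxy logs is one staff may not think of as AI, which is useful context for the triage conversation.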

Triage: which shadow tools matter, which are noise

Not every tool on the list needs the full assessment. Triage on three lenses: data sensitivity, frequency of use, and business criticality. Score each tool 1–5 on each dimension. A grammar-checking browser extension used twice a week by one team on non-confidential text scores low across all three; an AI meeting-transcription tool used by leadership for board prep scores high.

The triage produces three bands. Sanction or replace covers the tools where the score justifies converting to a paid enterprise tier with a DPA, or replacing with a sanctioned alternative. Light-touch review covers the tools where a brief assessment and an acceptable-use note suffice. Allow with logging covers the genuinely low-stakes tools that do not need a full process but should appear on the inventory for the next review cycle.
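The three lenses and three bands can be sketched as a small scoring helper. The article does not state where the band boundaries sit, so the cut-offs below (11 and 7 on the summed 3 to 15 scale) are assumptions to be tuned by the reviewer, as are the class and function names.

```python
from dataclasses import dataclass

LENSES = ("data_sensitivity", "frequency_of_use", "business_criticality")


@dataclass
class TriageScore:
    tool: str
    data_sensitivity: int
    frequency_of_use: int
    business_criticality: int

    def __post_init__(self) -> None:
        # Each lens is scored 1-5, as described in the text.
        for name in LENSES:
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def total(self) -> int:
        return sum(getattr(self, name) for name in LENSES)


def band(score: TriageScore, sanction_at: int = 11, light_touch_at: int = 7) -> str:
    """Assign a triage band; the default cut-offs are illustrative assumptions."""
    if score.total >= sanction_at:
        return "sanction or replace"
    if score.total >= light_touch_at:
        return "light-touch review"
    return "allow with logging"
```

A simple sum treats the three lenses as equally important. A team that considers data sensitivity decisive might instead force any tool scoring 5 on that lens into the top band regardless of the total.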

The triage step is what makes the 30-question assessment tractable. Running it on every tool in the inventory is unrealistic; running it on the triaged "sanction or replace" band is the workable scope.

The 30-question vendor assessment

The 30-question framework covers six domains, five questions each. The domains are calibrated to the risks an AI vendor introduces specifically, beyond the standard SaaS due-diligence the procurement team already runs.

Security — encryption in transit and at rest, access controls, audit logging, incident-response timing, penetration-test cadence. Sample question: "What is the vendor's notification timeline for a security incident that affects customer data?"

Model lineage — which underlying model and version is in use, how often the model updates, whether the customer is notified before a model change, what happens to historical outputs when the model changes. Sample question: "Does the vendor notify customers in advance of underlying model changes that may affect output behaviour?"

Training data — what data the vendor's models were trained on, whether customer input data trains the model by default, opt-out mechanism, retention of customer prompts and outputs. Sample question: "Are customer prompts and outputs used to train the vendor's models by default?"

IP — ownership of model outputs, indemnification for copyright claims on output, terms-of-use limitations on commercial use. Sample question: "Does the vendor indemnify the customer against IP claims arising from model output?"

Sub-processors — which third parties process customer data, including the underlying model providers (OpenAI, Anthropic, Google, others), with EU AI Act Article 25 provider-versus-deployer classification for each. Sample question: "Which sub-processor model providers are listed, and what are their roles under EU AI Act Article 25?"

Incident response — definition of an AI-specific incident (not just a data breach — model misbehaviour, prompt injection, hallucination causing customer harm), notification timeline, remediation expectations. Sample question: "How does the vendor define an AI-specific incident, distinct from a data security incident?"

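Each question scores 0–3 against vendor-supplied evidence. The framework totals to a vendor risk score from 0 to 90 (30 questions at up to 3 points each); higher is better, since the go/no-go gate passes vendors at or above a threshold.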
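The arithmetic behind the 0 to 90 total can be sketched as below: six domains, five questions each, 0 to 3 points per question. The domain keys and function names are hypothetical labels chosen for illustration.

```python
from typing import Dict, List

# Hypothetical keys for the six domains named in the text.
DOMAINS = (
    "security",
    "model_lineage",
    "training_data",
    "ip",
    "sub_processors",
    "incident_response",
)
QUESTIONS_PER_DOMAIN = 5
MAX_PER_QUESTION = 3


def domain_totals(answers: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the question scores per domain, validating counts and ranges."""
    totals = {}
    for domain in DOMAINS:
        scores = answers[domain]
        if len(scores) != QUESTIONS_PER_DOMAIN:
            raise ValueError(f"{domain}: expected {QUESTIONS_PER_DOMAIN} scores")
        if any(not 0 <= s <= MAX_PER_QUESTION for s in scores):
            raise ValueError(f"{domain}: scores must be 0-{MAX_PER_QUESTION}")
        totals[domain] = sum(scores)
    return totals


def vendor_score(answers: Dict[str, List[int]]) -> int:
    """Total across all domains; ranges from 0 to 90."""
    return sum(domain_totals(answers).values())
```

Validating the question count and score range up front keeps a mistyped entry from silently pushing a domain above its 15-point maximum.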

Scoring rationale and the go/no-go gate

Domain scores are weighted because not every domain matters equally for every use case. A vendor scoring 14/15 on security but 4/15 on training data is workable for a customer-support assistant where staff paste no personal data; the same vendor is not workable for an HR-tools deployment where they will. HR and procurement readers do not need to construct the weights from scratch: the framework comes with default weights per use-case category, which the compliance reviewer can adjust.

The go/no-go gate is a single threshold per use case. A vendor below the threshold goes back for evidence supplementation; a vendor at or above the threshold proceeds to contracting. A vendor that misses a hard floor (for example, no DPA available at all) fails outright regardless of its other scores. The go/no-go gate is what turns the assessment from an academic exercise into a procurement decision.
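A hedged sketch of how weights, threshold, and hard floor could combine. The article does not publish the default weights or thresholds, so the weight values, the 0 to 100 normalisation, and the threshold of 65 used in the usage note are all assumptions for illustration.

```python
from typing import Dict

MAX_DOMAIN_SCORE = 15  # five questions at up to 3 points each


def weighted_score(totals: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted average of domain scores, normalised to a 0-100 scale."""
    total_weight = sum(weights.values())
    weighted = sum(weights[d] * totals[d] / MAX_DOMAIN_SCORE for d in weights)
    return 100 * weighted / total_weight


def gate(
    totals: Dict[str, int],
    weights: Dict[str, float],
    threshold: float,
    hard_floors_met: bool,
) -> str:
    """Apply the hard floor first, then the per-use-case threshold."""
    if not hard_floors_met:  # e.g. no DPA available at all
        return "fail"
    if weighted_score(totals, weights) >= threshold:
        return "proceed to contracting"
    return "request more evidence"


# Hypothetical weights per use case (assumptions, not published defaults).
SUPPORT_WEIGHTS = {"security": 2, "model_lineage": 1, "training_data": 0.5,
                   "ip": 1, "sub_processors": 1, "incident_response": 1}
HR_WEIGHTS = {"security": 2, "model_lineage": 1, "training_data": 3,
              "ip": 1, "sub_processors": 1, "incident_response": 1}
```

With these illustrative numbers, a vendor that is strong on security but weak on training data clears an assumed threshold of 65 under SUPPORT_WEIGHTS and falls short under HR_WEIGHTS, which mirrors the customer-support versus HR contrast in the text.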

Evidence request workflow

Asking the vendor for evidence rather than answers is the discipline that makes the assessment hold up under review. For each scored question, request supporting evidence — DPA text, model card, sub-processor list, security audit summary, incident-response runbook excerpt. File the evidence alongside the score in the register. Re-check on an annual cadence, or sooner if the vendor announces a material change. For the regulator-side expectations on AI vendor handling, see the third-party AI provider expectations in NYDFS guidance.
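One way to keep evidence next to each score and drive the re-check cadence is a small register record, sketched here. The field names, the 365-day default, and the material-change set are assumptions for illustration rather than a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import AbstractSet, Iterable, List


@dataclass
class EvidenceItem:
    vendor: str
    question: str
    score: int          # 0-3, as scored against the evidence
    evidence_ref: str   # hypothetical pointer, e.g. a file path or document ID
    reviewed_on: date

    def recheck_due(self, cadence_days: int = 365) -> date:
        """Date by which this item should be reviewed again."""
        return self.reviewed_on + timedelta(days=cadence_days)


def due_for_recheck(
    register: Iterable[EvidenceItem],
    today: date,
    material_change_vendors: AbstractSet[str] = frozenset(),
) -> List[EvidenceItem]:
    """Items past their cadence, plus all items for vendors with a material change."""
    return [
        item
        for item in register
        if item.vendor in material_change_vendors or item.recheck_due() <= today
    ]
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the check reproducible when an auditor asks what was due on a given date.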

The evidence pack is the artefact your internal auditor will ask for first, before they look at the scores. Score without evidence is opinion; score with evidence is assessment.

FAQ

Q1. What is shadow AI? Shadow AI is any AI tool an employee uses for work without your IT, security, or compliance team's knowledge — typically free consumer chatbots, browser extensions, or SaaS features powered by a large language model. Recent surveys suggest the majority of knowledge workers use one or more such tools.

Q2. Why is shadow AI a vendor-assessment problem? Because every shadow AI tool is, by definition, an unassessed vendor processing your data. The fix is not to ban it — most bans fail — but to convert the shadow list into an assessed vendor register and replace the highest-risk tools with sanctioned alternatives.

Q3. How do I discover shadow AI in my organisation? Three sources: an employee survey (with amnesty), browser/proxy logs filtered for known AI domains, and expense-report scans for AI subscriptions. Each method finds tools the others miss.

Q4. What does an AI vendor assessment cover? Security, model lineage, training data, IP, sub-processors, and incident response — weighted against ISO 42001 or NIST AI RMF. Responsible AI Studio (RAIS) generates a 30-question scored workbook plus an executive summary tailored to your jurisdiction.


Shadow AI is not solved by a ban memo. It is solved by surfacing the list, triaging what matters, and running an evidence-based assessment on the tools that warrant it. RAIS tools amplify your in-house expertise — the discovery workflow, the 30-question framework, the EU AI Act Article 25 classification — so HR, procurement, and compliance teams spend their time on judgement rather than scaffolding.

Run a scored AI vendor assessment → /tools/vendor-assessment
