Chapter 10: Testing Strategies for Plugin-Based Systems
The previous chapter closed with a point about testing as a security concern: you need to verify that permission boundaries hold, that a compromised or buggy plugin stays contained, and that changes to the host do not silently break the plugins depending on it. But security verification is only one slice of what testing a plugin system demands.
The deeper challenge is that a plugin architecture deliberately distributes responsibility. Plugin authors write code you do not control, against interfaces that evolve, in runtime environments that may differ from their development machines. The host application must remain correct as plugins are added, removed, and updated. Two plugins that each work in isolation may conflict when loaded together. A plugin that performs well on a developer laptop may degrade the host under real user load. None of these problems surface cleanly in a monolithic codebase where you control everything; in a plugin system, each of them requires deliberate test coverage.
The production systems studied each developed testing infrastructure that reflects their architecture. VS Code provides an Extension Development Host that mirrors the real editor without affecting it. Backstage's dependency injection container makes mocking services mechanical. NocoBase ships createTestApp and createTestPlugin utilities so that database-driven plugin tests can run against an in-memory SQLite instance. Babel tests transformations with AST snapshot comparisons. The pattern across all of them is the same: the testing infrastructure is purpose-built for the plugin model, not bolted on from a monolithic testing tradition.
10.1 Testing Architecture Overview
A comprehensive testing strategy for plugins operates at four levels, each catching failures the others cannot.
Unit tests cover a plugin's internal logic in isolation — the functions that calculate totals, format data, or implement business rules. They run without the host, without the SDK, without a browser. They are fast, deterministic, and written by the plugin author:
test('calculateTotal', () => {
  expect(calculateTotal([{ price: 10 }, { price: 20 }])).toBe(30);
});
Integration tests verify that the plugin interacts correctly with the SDK. They check that the plugin registers the routes it claims to register, that it subscribes to the events it should subscribe to, and that its handlers produce the right outputs. This is where a mock host proves its value — the plugin's setup and start functions run against a realistic SDK implementation, but entirely in memory, without a browser or a server.
End-to-end tests run the plugin inside a real host application and simulate real user interactions. Navigation, form submission, modal dialogs, cross-plugin event chains — these are the scenarios that reveal integration failures invisible to unit and integration tests. They are slower than the other layers and harder to maintain, so they are best reserved for critical paths rather than exhaustive coverage.
Performance tests measure load time, memory footprint, and CPU usage. A plugin that takes 800ms to initialise or leaks memory across repeated activation cycles degrades the host for every user. These measurements need baselines and thresholds — specific numbers to enforce, not vague adjectives like "acceptable". The Vite plugin testing pattern shows how build-time performance can be made measurable:
it('does not significantly slow build', async () => {
  const withoutPlugin = await measureBuildTime([]);
  const withPlugin = await measureBuildTime([myPlugin()]);
  const overhead = withPlugin - withoutPlugin;
  expect(overhead).toBeLessThan(500); // 500ms max overhead
});
The four layers are not a choice of one — they are a stack. Unit tests catch the most failures for the least cost; E2E tests catch the failures that only appear in a full system. A plugin that passes all four levels is much more likely to be reliable in production than one that only passes one.
10.2 Plugin-Specific Testing Challenges
Beyond the standard layered approach, plugin systems introduce testing problems that do not exist in monolithic applications.
Framework Abstraction Testing
A framework-agnostic plugin may run under React, Vue, Angular, or a web component adapter. The core logic should be identical across all of them, but the lifecycle integration differs. TinaCMS illustrates the problem: its usePlugin hook ties a plugin's lifetime to a React component's mount/unmount cycle. Testing that the plugin registers correctly when the component mounts and unregisters when it unmounts requires a React testing environment, not a plain Node.js test runner:
function usePlugin(plugin: Plugin) {
  React.useEffect(() => {
    cms.plugins.add(plugin);
    return () => cms.plugins.remove(plugin);
  }, [plugin]);
}
For plugins that support multiple frameworks, the integration tests should be parameterised across adapters. The same plugin behaviour — "registers a route on setup, removes it on teardown" — should be verified in each framework's testing environment. This is more test infrastructure to maintain, but it is the only reliable way to catch framework-specific lifecycle bugs before they reach production.
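The parameterised pattern can be sketched framework-free. In this minimal sketch, every name is hypothetical: each adapter models how a framework ties the plugin to its mount/unmount cycle, and the same assertions run against each one. In a real suite, each adapter's body would use that framework's testing utilities rather than direct function calls.

```typescript
// A stand-in for the host's plugin store.
type Registry = Set<string>;

interface Adapter {
  name: string;
  // Wires the plugin into a framework's mount cycle; returns an unmount function.
  mount(registry: Registry, pluginId: string): () => void;
}

const reactLikeAdapter: Adapter = {
  name: 'react-like',
  mount(registry, pluginId) {
    registry.add(pluginId); // effect on mount
    return () => registry.delete(pluginId); // cleanup on unmount
  },
};

// A second adapter with identical observable behaviour.
const vueLikeAdapter: Adapter = { ...reactLikeAdapter, name: 'vue-like' };

// The same behavioural contract, verified against every adapter.
for (const adapter of [reactLikeAdapter, vueLikeAdapter]) {
  const registry: Registry = new Set();
  const unmount = adapter.mount(registry, 'my-plugin');
  console.assert(registry.has('my-plugin'), `${adapter.name}: registered on mount`);
  unmount();
  console.assert(!registry.has('my-plugin'), `${adapter.name}: removed on unmount`);
}
```

The point of the shape is that the assertions never change; only the adapter under test does, so a lifecycle regression in any one framework integration fails the same shared contract.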
Sandbox Environment Testing
A plugin that runs correctly in the main thread may fail inside an iframe sandbox, a Web Worker, or VS Code's Extension Development Host. The APIs available in each environment differ, the communication model differs, and error propagation differs. Tests must simulate these environments accurately.
VS Code's Extension Development Host is the most complete example: it launches a real VS Code instance with the extension under development loaded, but separate from the developer's actual editor configuration. Tests can trigger extension commands, inspect editor state, and verify that activation events fire in the right order, all without risking the developer's working environment:
import * as path from 'path';
import { runTests } from '@vscode/test-electron';

const extensionDevelopmentPath = path.resolve(__dirname, '../../');
const extensionTestsPath = path.resolve(__dirname, './suite/index');

await runTests({
  extensionDevelopmentPath,
  extensionTestsPath,
  launchArgs: ['--disable-extensions'], // Isolate from other extensions
});
For iframe-based plugins, JSDOM does not replicate the postMessage boundary. Integration tests for iframe plugins need either a headless browser (Playwright in headless mode) or a purpose-built message channel stub that simulates the host's validation and routing logic.
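A message channel stub of that kind might look like the following minimal sketch. All names are hypothetical; the stub mimics the host side of the postMessage boundary, validating origin and message shape before routing to handlers, so iframe-plugin logic can be integration-tested without a browser.

```typescript
type PluginMessage = { type: string; payload?: unknown };

class MessageChannelStub {
  private handlers = new Map<string, (payload: unknown) => void>();

  constructor(private allowedOrigin: string) {}

  on(type: string, handler: (payload: unknown) => void): void {
    this.handlers.set(type, handler);
  }

  // Simulates the host's onmessage listener, including origin validation.
  // Returns true if the message was accepted and routed.
  receive(origin: string, data: unknown): boolean {
    if (origin !== this.allowedOrigin) return false; // rejected, as the host would
    const msg = data as PluginMessage;
    if (typeof msg?.type !== 'string') return false; // malformed message dropped
    this.handlers.get(msg.type)?.(msg.payload);
    return true;
  }
}

// Usage: verify the stub drops messages from the wrong origin.
const channel = new MessageChannelStub('https://host.example');
let received: unknown;
channel.on('plugin:ready', (p) => (received = p));

channel.receive('https://evil.example', { type: 'plugin:ready', payload: 1 });
console.assert(received === undefined); // rejected: wrong origin

channel.receive('https://host.example', { type: 'plugin:ready', payload: 1 });
console.assert(received === 1); // accepted: correct origin and shape
```

Because the stub encodes the host's actual validation rules, a plugin that passes against it is much less likely to fail against the real boundary.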
Inter-Plugin Communication Testing
Event-driven communication between plugins is easy to break silently. A plugin that emits 'user:created' and a second plugin that listens to it are coupled through a string and a payload shape — both invisible to the type checker unless typed event schemas are in place, and neither verifiable without loading both plugins simultaneously.
Tests for event-driven interactions need to load the relevant plugins together and verify the chain end-to-end. NocoBase's approach is direct: spy on the receiving plugin's method and verify it is called when the emitting plugin takes the triggering action:
it('should react to user creation', async () => {
  const aclPlugin = app.getPlugin('acl');
  const userPlugin = app.getPlugin('users');
  const spy = jest.spyOn(aclPlugin, 'assignDefaultRole');
  await userPlugin.createUser({ name: 'test' });
  expect(spy).toHaveBeenCalledWith(expect.objectContaining({ name: 'test' }));
});
The event bus itself should also be tested for ordering guarantees: if three plugins all subscribe to 'form:submitted', do they receive the event in the declared order? Does a subscriber that throws prevent subsequent subscribers from running? These guarantees need to be documented and tested, because plugin authors will rely on them.
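Both guarantees can be pinned down with a small executable sketch. The EventBus here is a hypothetical minimal implementation, not any particular host's bus; the interesting part is the pair of assertions, which document that subscribers run in subscription order and that one throwing subscriber does not halt delivery to the rest.

```typescript
class EventBus {
  private subs = new Map<string, Array<(p: unknown) => void>>();

  on(event: string, fn: (p: unknown) => void): void {
    const list = this.subs.get(event) ?? [];
    list.push(fn);
    this.subs.set(event, list);
  }

  emit(event: string, payload: unknown): void {
    for (const fn of this.subs.get(event) ?? []) {
      try {
        fn(payload); // isolate each subscriber: a throw must not halt delivery
      } catch {
        // a real host would report this to its error channel
      }
    }
  }
}

const bus = new EventBus();
const order: string[] = [];
bus.on('form:submitted', () => order.push('first'));
bus.on('form:submitted', () => {
  order.push('second');
  throw new Error('boom'); // must not prevent 'third' from running
});
bus.on('form:submitted', () => order.push('third'));

bus.emit('form:submitted', {});
console.assert(order.join(',') === 'first,second,third');
```

If the host's real bus makes different choices (for example, failing fast on a throwing subscriber), this is exactly the test that should encode that choice so it cannot drift silently.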
Permission System Testing
The security model described in chapter 9 only holds if the permission checks are actually tested. A permission check that is correct under normal conditions but silently bypassed in an edge case is not a security control — it is a false sense of security.
Integration tests for the permission system should verify both sides: that permitted operations succeed and that unpermitted operations are rejected with the correct error. Backstage's testing approach mocks the auth and permissions services, making it easy to test both the happy path and the denial path:
import request from 'supertest';

it('should deny access without permission', async () => {
  const mockPermissions = {
    authorize: jest.fn().mockResolvedValue([{ result: 'DENY' }]),
  };
  // `app` is assumed to be built in the test setup with mockPermissions
  // injected in place of the real permissions service.
  const response = await request(app).delete('/api/entities/123').expect(403);
  expect(response.body.error).toBe('Permission denied');
});
Testing the permission system also requires testing that permission declarations in the manifest are actually enforced — that a plugin which did not declare api:write cannot call sdk.services.apiClient.post. This is a host-level integration test, not a plugin-level test, but it is equally important.
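One way such a host-level test can work is sketched below, under the assumption (hypothetical here) that the host wraps SDK services in a permission guard derived from the manifest, so calls to undeclared capabilities fail closed. The test exercises both sides of the guard.

```typescript
type Manifest = { permissions: string[] };

// Wraps a service so every property access checks the manifest first.
function guardService<T extends object>(service: T, manifest: Manifest, perm: string): T {
  return new Proxy(service, {
    get(target, prop, receiver) {
      if (!manifest.permissions.includes(perm)) {
        throw new Error(`Permission '${perm}' not declared in manifest`);
      }
      return Reflect.get(target, prop, receiver);
    },
  });
}

const apiClient = { post: (_url: string) => 'ok' };

// Declared permission: the call goes through.
const declared = guardService(apiClient, { permissions: ['api:write'] }, 'api:write');
console.assert(declared.post('/entities') === 'ok');

// Undeclared permission: the call is rejected before it reaches the service.
const undeclared = guardService(apiClient, { permissions: [] }, 'api:write');
let blocked = false;
try {
  undeclared.post('/entities');
} catch {
  blocked = true;
}
console.assert(blocked);
```

Whatever mechanism the real host uses, the shape of the test is the same: assert the allow path and the deny path against the same service, differing only in the manifest.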
Lifecycle Testing
Lifecycle ordering failures are among the hardest bugs to reproduce. A plugin that depends on another plugin's services may work correctly when both happen to load in the right order and fail when load timing varies. The registry's topological sort (chapter 5) should prevent this, but the sort itself needs testing.
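A test for the sort can be stated without reference to any particular registry. The topoSort below is a minimal hypothetical implementation (depth-first, no cycle detection — a real registry must also reject cycles); the assertion that matters is the invariant: every plugin appears after all of its declared dependencies, regardless of registration order.

```typescript
type PluginDecl = { id: string; deps: string[] };

function topoSort(plugins: PluginDecl[]): string[] {
  const byId = new Map(plugins.map((p) => [p.id, p]));
  const visited = new Set<string>();
  const order: string[] = [];

  const visit = (id: string) => {
    if (visited.has(id)) return;
    visited.add(id);
    for (const dep of byId.get(id)?.deps ?? []) visit(dep); // dependencies first
    order.push(id);
  };

  plugins.forEach((p) => visit(p.id));
  return order;
}

// Registration order deliberately reverses dependency order.
const order = topoSort([
  { id: 'ui', deps: ['auth'] },
  { id: 'auth', deps: ['db'] },
  { id: 'db', deps: [] },
]);

// The invariant under test: dependencies always load first.
console.assert(order.indexOf('db') < order.indexOf('auth'));
console.assert(order.indexOf('auth') < order.indexOf('ui'));
```

A property-style variant of this test, shuffling the input order across many runs, is a cheap way to catch the "works when load timing happens to be right" class of bug.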
Database state adds another dimension. NocoBase plugins that run migrations on installation need tests that verify the migration runs correctly on a clean database, runs idempotently when the database already has the schema, and rolls back cleanly on failure:
describe('plugin installation', () => {
  let app;

  beforeEach(async () => {
    app = await createTestApp({ database: ':memory:' });
  });

  it('runs migrations on first install', async () => {
    await app.pm.add(MyPlugin, { enabled: true });
    await app.load();
    const collection = app.db.getCollection('myData');
    expect(collection).toBeDefined();
  });

  it('does not error on reinstall', async () => {
    // Install and load twice — the migration should be idempotent
    await app.pm.add(MyPlugin, { enabled: true });
    await app.load();
    await app.reload(); // Simulate restart with plugin already installed
    await expect(app.pm.get('my-plugin').isInstalled()).resolves.toBe(true);
  });
});
10.3 Test Environment Setup
The test infrastructure for a plugin system is more involved than for a typical application, because you need environments that simulate the host with varying degrees of fidelity.
Mock Host Application
The mock host is the workhorse of integration testing. It implements the full PluginSDK interface with in-memory fakes — no real network requests, no real database, no real browser. Plugin code calls sdk.routes.add and the mock records the call. Plugin code emits sdk.events.emit and the mock delivers the event to registered subscribers. Assertions check what the mock recorded:
const mockSDK: PluginSDK = {
  routes: { add: vi.fn(), remove: vi.fn() },
  events: { on: vi.fn(), off: vi.fn(), emit: vi.fn() },
  ui: { showModal: vi.fn(), showToast: vi.fn() },
  services: {
    apiClient: { get: vi.fn(), post: vi.fn() },
    auth: { getUser: vi.fn(), hasRole: vi.fn() },
    storage: { get: vi.fn(), set: vi.fn(), remove: vi.fn() },
  },
  plugin: { id: 'test-plugin', manifest: mockManifest },
};
NocoBase's createTestApp is a good model for what this looks like with more fidelity — a real application instance with in-memory infrastructure:
import { createTestApp, createTestPlugin } from '@nocobase/test';

beforeEach(async () => {
  app = await createTestApp({
    database: ':memory:',
    silent: true,
    cleanAfterEach: true,
  });
  plugin = await createTestPlugin(MyPlugin, { enabled: true });
  await app.pm.add(plugin);
  await app.load();
});
The database: ':memory:' option is worth highlighting. SQLite in-memory databases start clean for every test, are fast to create and destroy, and avoid the state pollution that is the most common source of flaky integration tests.
Plugin Test Harness
For plugins that render UI, the test harness needs a minimal DOM environment. JSDOM (used by Vitest and Jest by default) provides enough for testing component registration, event handling, and basic DOM operations. Framework-specific adapters bring their own testing utilities on top:
// Backstage service mocking through DI (sketch using the new backend system):
// deps are declared as service refs, and the test harness substitutes mock
// implementations (e.g. mockServices.logger.mock()) for them
import { createBackendPlugin, coreServices } from '@backstage/backend-plugin-api';

const plugin = createBackendPlugin({
  pluginId: 'test-plugin',
  register(env) {
    env.registerInit({
      deps: {
        logger: coreServices.logger,
        database: coreServices.database,
        config: coreServices.rootConfig,
      },
      async init({ logger, database, config }) {
        // Plugin code runs against whatever implementations — real or
        // mocked — the environment provides for these refs
      },
    });
  },
});
Backstage's DI model is particularly well-suited to testing: swapping a real database service for a mock database is the same operation as declaring any other dependency, and the TypeScript types ensure the mock implements the full service interface.
Automated Testing Pipeline
Plugins should be tested in CI against a matrix of host versions, Node.js versions, and database configurations. A failure in any cell of the matrix indicates either a compatibility problem or a test environment assumption that needs to be made explicit:
strategy:
  matrix:
    node-version: [18, 20, 22]
    plugin-host: [development, production]
    database: [sqlite, postgresql]
steps:
  - name: Test Plugin
    run: |
      npm run test:unit
      npm run test:integration
      npm run test:e2e
Compatibility testing against multiple host versions catches the class of breakage where a new host release changes behaviour that a plugin was inadvertently depending on:
describe('Plugin Compatibility', () => {
  const hostVersions = ['1.0.0', '1.1.0', '2.0.0'];

  hostVersions.forEach((version) => {
    it(`works with host v${version}`, async () => {
      const host = await createHost({ version });
      const plugin = await loadPlugin('./my-plugin');
      await expect(host.loadPlugin(plugin)).resolves.not.toThrow();
    });
  });
});
The host itself should have integration tests for all approved plugins that run on every host change, not just on plugin changes. This catches the case where a host change breaks an existing plugin even though the plugin's own test suite passes.
10.4 Testing Tools and Frameworks
Vitest for Unit and Integration Tests
Vitest is the natural choice for TypeScript plugin projects. Its compatibility with Vite's module resolution makes it straightforward to test both the plugin's runtime behaviour and its build-time transformations:
import { build } from 'vite';
import myPlugin from './my-plugin';

it('should transform code correctly', async () => {
  const result = await build({
    plugins: [myPlugin()],
    build: {
      write: false,
      rollupOptions: { input: 'test-input.js' },
    },
  });
  expect(result.output[0].code).toContain('expected-output');
});
For Babel-style transformation plugins, AST snapshot testing is the appropriate pattern. The test transforms a representative input and compares the output code against a stored snapshot. When a transformation changes intentionally, the snapshot is updated deliberately — when it changes accidentally, the snapshot diff catches the regression:
import { transformSync } from '@babel/core';
import myPlugin from './babel-plugin-my-transform';

it('transforms arrow functions', () => {
  const result = transformSync('const fn = () => {};', {
    plugins: [myPlugin],
    filename: 'test.js',
  });
  expect(result.code).toMatchSnapshot();
});
Testing Library for UI Plugins
For plugins that render UI components, Testing Library provides the right abstraction: test what the user sees and interacts with, not implementation details. The same approach works across React, Vue, and Angular via the respective Testing Library adapters:
import { render } from '@testing-library/react';
import { TinaCMS, TinaProvider, usePlugin } from 'tinacms';

// The CMS instance is created once, outside the component, so it is not
// recreated on every render.
const cms = new TinaCMS({ enabled: true });

function TestComponent() {
  usePlugin({
    __type: 'screen',
    name: 'Test Screen',
    Component: () => <div>Plugin UI</div>,
  });
  return <div>Test</div>;
}

it('should register screen plugin', () => {
  render(
    <TinaProvider cms={cms}>
      <TestComponent />
    </TinaProvider>
  );
  // Registering a screen plugin does not render its Component, so assert
  // against the plugin registry rather than the DOM.
  expect(cms.plugins.getType('screen').all()).toHaveLength(1);
});
Testing plugin UI in isolation — with a mock SDK rather than a real host — keeps these tests fast and focused. Tests that require a real host to verify a rendering detail are better placed in the E2E suite.
Playwright for End-to-End Testing
Playwright drives a real browser against a running host application. For browser-based plugins, this is the only reliable way to test across the full stack — real network requests, real CSS rendering, real browser APIs:
import { test, expect } from '@playwright/test';
test('plugin activates on command', async ({ page }) => {
  // URL and selectors are illustrative — point them at your running host
  await page.goto('http://localhost:3000');
  await page.keyboard.press('Control+Shift+P');
  await page.fill('[placeholder="Type a command"]', 'My Plugin: Test Command');
  await page.press('[placeholder="Type a command"]', 'Enter');
  await expect(page.locator('.notification')).toContainText('Plugin activated');
});
Playwright's multi-browser support — Chromium, Firefox, WebKit — is valuable for plugins that use browser APIs with varying support levels. Shadow DOM, Web Workers, and postMessage all have subtle cross-browser differences that only surface in a real browser.
Shared Testing Utilities for Plugin Authors
Publishing a testing utilities package alongside the SDK is an act of good citizenship. Plugin authors should not have to reconstruct mock SDK implementations from scratch. Backstage does this through @backstage/backend-test-utils, which provides mockServices pre-built for every core service:
import { mockServices, startTestBackend } from '@backstage/backend-test-utils';

const backend = await startTestBackend({
  features: [
    myPlugin, // the backend plugin under test
    mockServices.rootConfig.factory({
      data: {
        backend: { database: { client: 'better-sqlite3' } },
      },
    }),
  ],
});
The testing utilities package should include a createMockSDK factory, a typed event bus spy that records emissions and subscriptions for assertion, a permission mocker that can simulate granted and denied permissions, and lifecycle helpers that call setup, start, and stop in the correct order.
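As a sketch of one such utility, the event bus spy below records every subscription and emission while still delivering events, so a plugin's test can assert on what crossed the bus. The class and its API are hypothetical, not taken from any of the systems studied.

```typescript
type Handler = (payload: unknown) => void;

class EventBusSpy {
  readonly emitted: Array<{ event: string; payload: unknown }> = [];
  readonly subscriptions: string[] = [];
  private handlers = new Map<string, Handler[]>();

  on(event: string, handler: Handler): void {
    this.subscriptions.push(event); // record for later assertion
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  emit(event: string, payload: unknown): void {
    this.emitted.push({ event, payload }); // record before delivery
    (this.handlers.get(event) ?? []).forEach((h) => h(payload));
  }
}

// Usage in a test: assert the plugin emitted and subscribed as it claims.
const spy = new EventBusSpy();
spy.on('user:created', () => {});
spy.emit('user:created', { name: 'test' });

console.assert(spy.subscriptions.includes('user:created'));
console.assert(spy.emitted[0].event === 'user:created');
```

Because the spy implements the same interface as the real bus, it can be dropped into a mock SDK without the plugin under test knowing the difference.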
10.5 Quality Assurance Processes
Testing by plugin authors covers the plugin's own behaviour. The platform's QA processes cover the broader questions: does this plugin meet the platform's standards, does it interact safely with other plugins, and is it honest about its resource consumption?
Code Review
Plugin submissions should receive review on three dimensions. Security review checks for dangerous patterns, permission declarations that exceed what the plugin actually uses, and CSP compliance. Automated scanning handles the pattern matching; human review handles the intent. Performance review checks activation time, memory usage at steady state, and CPU consumption under the expected workload. API usage review verifies that the plugin calls SDK methods correctly, disposes its resources during stop, and handles error cases rather than letting exceptions propagate to the host.
Automated static analysis can catch the most obvious security problems before human review:
const DANGEROUS_PATTERNS = [
  /eval\(/,
  /Function\(/,
  /\.innerHTML\s*=/,
  /document\.write\(/,
  /window\[.*\]/,
];

function validateWebviewCSP(csp: string): boolean {
  const requiredPolicies = [
    "default-src 'none'",
    "script-src 'nonce-{nonce}'",
    "style-src 'unsafe-inline'",
  ];
  return requiredPolicies.every((policy) => csp.includes(policy));
}
Plugin Certification
Before a plugin is distributed — whether through a marketplace, a private registry, or an enterprise deployment — it should pass through a certification gate. The gate has an automated portion and a human portion. The automated portion runs the full test matrix: unit tests, integration tests, E2E tests, security scan, performance benchmarks. Only when all automated checks pass does the submission reach human review. NocoBase makes this explicit with a built-in validation command:
nocobase validate plugin ./my-plugin
# Checks: package.json structure, required lifecycle methods,
# database collection schemas, permission declarations, API endpoints
Human review focuses on what automation cannot catch: whether the plugin's stated purpose matches its actual behaviour, whether the permission declarations are appropriate rather than excessive, and whether the user experience is consistent with the platform's standards.
Performance Benchmarking
Every plugin should have measurable performance thresholds that are enforced in CI. The thresholds need to be specific because vague requirements are not enforceable:
describe('Plugin Performance', () => {
  it('loads within acceptable time', async () => {
    const start = performance.now();
    await loadPlugin('./my-plugin');
    const loadTime = performance.now() - start;
    expect(loadTime).toBeLessThan(100); // 100ms max load time
  });

  it('uses acceptable memory', async () => {
    const beforeMemory = process.memoryUsage().heapUsed;
    const plugin = await loadPlugin('./my-plugin');
    await plugin.initialize();
    const afterMemory = process.memoryUsage().heapUsed;
    expect(afterMemory - beforeMemory).toBeLessThan(10 * 1024 * 1024); // 10MB max
  });
});
});These thresholds should reflect the realistic runtime environment. A 100ms load time is reasonable for a feature plugin but generous for a simple analytics tracker. Thresholds calibrated to the specific plugin category catch genuine outliers without generating noise.
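Category-calibrated thresholds can be made concrete with a small lookup that the benchmark suite consults, as in this sketch. The category names and numbers are illustrative, not platform policy; the point is that each plugin is judged against its own category's budget rather than one global limit.

```typescript
// Per-category performance budgets (numbers illustrative).
const THRESHOLDS: Record<string, { loadMs: number; memoryMB: number }> = {
  feature: { loadMs: 100, memoryMB: 10 },
  theme: { loadMs: 50, memoryMB: 5 },
  analytics: { loadMs: 20, memoryMB: 2 },
};

// Throws when a measurement exceeds the budget for its category.
function assertWithinBudget(category: string, loadMs: number, memoryMB: number): void {
  const budget = THRESHOLDS[category];
  if (!budget) throw new Error(`Unknown plugin category: ${category}`);
  if (loadMs > budget.loadMs) {
    throw new Error(`Load time ${loadMs}ms exceeds ${budget.loadMs}ms budget`);
  }
  if (memoryMB > budget.memoryMB) {
    throw new Error(`Memory ${memoryMB}MB exceeds ${budget.memoryMB}MB budget`);
  }
}

// A lightweight analytics tracker is held to a tighter budget than a
// full feature plugin would be.
assertWithinBudget('analytics', 12, 1.5);
assertWithinBudget('feature', 80, 8);
```

In CI, the category would come from the plugin's manifest and the measurements from the benchmark run, so an over-budget plugin fails its pipeline rather than degrading the host in production.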
Compatibility Matrix and Plugin Interaction Testing
The compatibility matrix test (host version × Node.js version × database) catches version-specific failures early. But there is a second dimension to compatibility that the matrix does not cover: does this plugin work correctly alongside other plugins?
describe('Plugin Interactions', () => {
  it('does not conflict with popular plugins', async () => {
    const popularPlugins = ['auth-plugin', 'ui-theme', 'analytics'];
    const app = await createTestApp();
    for (const plugin of popularPlugins) {
      await app.loadPlugin(plugin);
    }
    await expect(app.loadPlugin('./my-plugin')).resolves.not.toThrow();
    for (const plugin of popularPlugins) {
      expect(app.getPlugin(plugin).isActive()).toBe(true);
    }
  });
});
A plugin that works in isolation but breaks the authentication plugin when loaded alongside it is a serious problem that only appears in this kind of combination test. The set of plugins to include in combination tests should reflect what is commonly co-installed in real deployments.
Conclusion
Testing a plugin system requires the same layered thinking as building one. Unit tests are fast and cheap; they cover internal logic but see nothing of the integration boundary. Integration tests with a mock host verify that the plugin interacts with the SDK correctly, without requiring a full running application. End-to-end tests verify the full stack, catching problems that only appear when all the pieces are assembled. Performance and compatibility tests ensure that a plugin that works correctly today still works correctly as the ecosystem evolves around it.
The investment in purpose-built testing infrastructure — mock hosts, event bus spies, NocoBase-style createTestApp utilities, VS Code-style Extension Development Hosts — pays back every time a plugin author finds a bug in their integration test suite rather than a support ticket from a user. The harder you make it to ship a broken plugin, the more trustworthy the ecosystem becomes for everyone who installs from it.
In the next chapter, we apply everything covered so far to a concrete domain: designing the plugin architecture for an e-commerce platform, where payment gateways, shipping calculators, and tax strategies each represent a real customisation point with real business consequences.