src/core/htmlImport converts an HTML string into a flat fragment of first-class PageNodes that callers splice directly into the live page tree.
The module has two consumers: the paste-HTML UI and the AI agent's insertHtml / replaceNodeHtml tools. Both call the same importHtml(source) entry point — no duplicated mapping logic.
- Single entry point: `importHtml(source)` → `{ nodes, rootIds, body?, stripped, styleCss }`.
- Pipeline: `parseHtml` → `harvestInlineStyles` + `collectStyleCss` → `stripUnsafe` → `walkAndMap`.
- Mapping is rule-driven (`HTML_TO_MODULE_RULES`). The catch-all `*` rule guarantees every element produces a node; nothing falls through.
- Every produced node is a real `PageNode`: selectable, draggable, deletable, and re-styleable in the canvas.
- HTML class names ride onto `node.classIds` during the pure walk; `insertImportedNodes` then links each name to a real registry class id (reusing a same-named class, binding to a parsed `<style>` rule of that name, or auto-creating a bare one) so the class renders and is editable.
- CSS is preserved. Inline `style="…"` lands on `node.inlineStyles` (the editor's per-node style layer); `<style>` blocks are returned as raw `styleCss`, which consumers parse via `cssToStyleRules` into registry rules shown in the Selectors panel. Only the security-denied property names are dropped.
- UX entry points: Spotlight Import HTML command and right-click Paste HTML here… on any container node.
src/core/htmlImport/
├── index.ts — public barrel; all exports below go through here
├── parseHtml.ts — DOMParser.parseFromString wrapper (browser-only; tests polyfill via happy-dom)
├── stripUnsafe.ts — removes <script>, on* attrs (counted), <style>+style= (harvested first); collectStyleCss returns the <style> CSS
├── inlineStyle.ts — harvests the full inline style="" bag (security-gated) before it is stripped
├── rules.ts — HTML_TO_MODULE_RULES declarative mapping table
└── walkAndMap.ts — DOM walker + importHtml() entry point
src/admin/modals/ImportHtml/
├── ImportHtmlModal.tsx — modal: CodeMirror HTML editor, DOM-tree preview, error alert, footer buttons
├── ImportHtmlModal.module.css
└── index.ts — barrel re-export
src/admin/spotlight/commands/importHtml.ts — Spotlight command editor.importHtml
src/__tests__/htmlImport/mapping.test.ts — per-rule unit tests
importHtml(source) runs these steps in sequence:
importHtml(source: string)
1. parseHtml(source) — new DOMParser().parseFromString(source, 'text/html')
Returns a DOM Document. Uses the global DOMParser;
no server-side DOM library is imported.
2. harvestInlineStyles(doc) — captures each element's full inline style="" bag
(camelCase, security-gated) BEFORE step 3 removes
the style attribute. Keyed by Element.
collectStyleCss(doc) — concatenates every <style> block's CSS BEFORE
step 3 removes the <style> elements.
3. stripUnsafe(doc) — mutates doc in place; returns StripReport
4. walkAndMap(doc, inlineStyles)— maps doc.body element children to PageNodes,
attaching each harvested inline bag to its
node's `inlineStyles`. Returns { nodes, rootIds }
→ { nodes, rootIds, body?, stripped, styleCss } (ImportResult)
interface ImportResult {
/** All produced nodes, keyed by id. A node may carry `inlineStyles` (the
* per-node `style=""` layer) populated from its inline style attribute. */
nodes: Record<string, PageNode>
/** IDs of the top-level nodes (direct children of doc.body), in document order. */
rootIds: string[]
/** Source <body> classes, safe HTML attribute props, and inline styles, kept off rootIds. */
body?: {
classIds?: string[]
props?: Record<string, unknown>
inlineStyles?: Record<string, string>
}
/** Counts of constructs removed by stripUnsafe (scripts, inline handlers). */
stripped: StripReport
/** Raw concatenated CSS from <style> blocks. Empty when the source had none. */
styleCss: string
}

Callers splice the fragment into the page tree via insertImportedNodes(parentId, fragment, opts?) in the editor store: one mutateActiveTreeAndSite call, one undo step. Any node.inlineStyles set during the walk rides along on the node verbatim (it is a first-class node field), so the publisher emits it as a style="…" attribute and the editor's inline-style layer (and BackgroundImageControl) shows it. Whole-site import applies fragment.body to the page's base.body root so body-level classes, safe HTML attributes, and inline styles preserve template-wide styling hooks without creating a fake child node.
<style> blocks → Selectors panel. importHtml does NOT parse CSS itself (that would couple @core/htmlImport to @core/siteImport and lose the site's breakpoint context). Instead it returns the raw styleCss; each consumer parses it with cssToStyleRules(styleCss, { breakpoints }) and passes the resulting { styleRules, conditions } to insertImportedNodes(parentId, fragment, { styleRules, conditions }). Class rules whose name matches a node's class= token bind to that node (the merge runs before class-name linking); ambient rules (body, a:hover, …) register globally. All appear in the Selectors panel. The whole-site Super Import path folds each page's <style> CSS in as a synthetic per-page source (<htmlPath>::inline) so it scopes, resolves url(…) assets, and detects conflicts exactly like a linked stylesheet.
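The consumer-side flow described above (import, parse the raw `styleCss`, then insert with the parsed rules) can be sketched as follows. This is a minimal illustration, not the editor's actual code: the three functions are passed in as a hypothetical `ImportDeps` bundle so the sketch runs without the store, and the types are simplified stand-ins that model only the fields named in this document.

```typescript
// Simplified stand-ins for the real types; only fields named in this doc are modelled.
interface ImportResult {
  nodes: Record<string, unknown>
  rootIds: string[]
  stripped: { scripts: number; inlineHandlers: number }
  styleCss: string
}

interface ParsedCss {
  styleRules: unknown
  conditions: unknown
}

// Hypothetical dependency bundle (an assumption for this sketch), so the
// call order can be exercised with fakes instead of the real editor store.
interface ImportDeps {
  importHtml(source: string): ImportResult
  cssToStyleRules(css: string, opts: { breakpoints: unknown }): ParsedCss
  insertImportedNodes(parentId: string, fragment: ImportResult, opts?: ParsedCss): void
}

function importAndInsert(
  deps: ImportDeps,
  parentId: string,
  source: string,
  breakpoints: unknown,
): ImportResult {
  // 1. importHtml stays CSS-agnostic and hands back the raw <style> text.
  const fragment = deps.importHtml(source)
  // 2. The consumer owns the breakpoint context, so it does the CSS parsing.
  const { styleRules, conditions } = deps.cssToStyleRules(fragment.styleCss, { breakpoints })
  // 3. One insert call carries both the nodes and the parsed rules.
  deps.insertImportedNodes(parentId, fragment, { styleRules, conditions })
  return fragment
}
```

Keeping the CSS parse outside `importHtml` is what lets each consumer supply its own breakpoints, as the paragraph above explains.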
HTML_TO_MODULE_RULES in src/core/htmlImport/rules.ts is a declarative array of ImportRule objects. The walker tests each element against the rules in order; the first match wins. The last rule is always *, so every element is guaranteed to match.
| Selector | Module | Props set | Recurse |
|---|---|---|---|
| `instatic-outlet` | `base.outlet` | none (the CMS content outlet) | No |
| `instatic-loop` | `base.loop` | `sourceId`, `filters.tableId`, `orderBy`, `direction`, `limit`, `offset`, `pagination`, `pageSize`, optional `tag` / `customTag` from `data-*` attrs | Yes |
| `h1`–`h6`, `p`, `span`, `small`, `strong`, `em` | `base.text` | `text` = `el.textContent`, `tag` = tag name | No |
| `a` with class `btn` | `base.button` | `label` = `el.textContent`, `href`, `target` | No |
| `a` (no `btn` class) | `base.link` | `text` = `el.textContent`, `href`, `target` | No |
| `img` | `base.image` | `src` = `src` attribute only | No |
| `form` | `base.form` | `mode`, `formId`, CMS data attrs, custom `action` / `method` | Yes |
| `label` | `base.label` unless wrapping elements, then `base.container` | `text`, `targetMode`, `targetId` | No for plain labels; yes for wrapper labels |
| `input` | `base.input`, `base.checkbox`, `base.radio`, `base.submit`, or `base.button` | Native form attrs modeled by the target module | No |
| `textarea` | `base.textarea` | `fieldId`, `name`, `id`, `placeholder`, `value`, validation attrs | No |
| `select` | `base.select` | `fieldId`, `name`, `id`, `required`, `disabled`, `multiple` | Yes |
| `optgroup` | `base.option-group` | `label`, `disabled` | Yes |
| `option` | `base.option` | `value`, `label`, `selected`, `disabled` | No |
| `button` | `base.button`, or `base.submit` when submit-type / inside a form without a type | `label`, `disabled` | No |
| `ul`, `ol` | `base.container` | `tag` = tag name | Yes |
| `div`, `section`, `article`, `main`, `header`, `footer`, `nav`, `aside` | `base.container` | `tag` = tag name | Yes |
| `area`, `base`, `br`, `col`, `embed`, `hr`, `link`, `meta`, `param`, `source`, `track`, `wbr` (void elements) | `base.container` | `tag: 'custom'`, `customTag` = tag name | No |
| `*` (catch-all) | `base.container` | `tag: 'custom'`, `customTag` = tag name | Yes |
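The first-match-wins lookup over an ordered rule table can be sketched as below. The `ImportRule` shape here (`tags`, `when`, `module`, `recurse`) and the `ElementLike` type are assumptions made for illustration; the real `ImportRule` in `rules.ts` may be shaped differently, and only a handful of rows from the table are reproduced.

```typescript
// Minimal structural stand-in for a DOM element (assumption for this sketch).
interface ElementLike {
  tagName: string
  classes: string[]
}

// Hypothetical rule shape; field names are illustrative, not the real ImportRule.
interface ImportRule {
  /** Tag names this rule accepts, or '*' for the catch-all. */
  tags: string[] | '*'
  /** Optional extra predicate, e.g. "has class btn". */
  when?: (el: ElementLike) => boolean
  module: string
  recurse: boolean
}

// A small excerpt of the table, in priority order.
const SAMPLE_RULES: ImportRule[] = [
  { tags: ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'span', 'small', 'strong', 'em'], module: 'base.text', recurse: false },
  { tags: ['a'], when: (el) => el.classes.indexOf('btn') !== -1, module: 'base.button', recurse: false },
  { tags: ['a'], module: 'base.link', recurse: false },
  { tags: ['img'], module: 'base.image', recurse: false },
  // Void elements sit before the catch-all so they never recurse.
  { tags: ['br', 'hr'], module: 'base.container', recurse: false },
  // Catch-all: must stay last so every element matches something.
  { tags: '*', module: 'base.container', recurse: true },
]

function matchRule(el: ElementLike, rules: ImportRule[]): ImportRule {
  const tag = el.tagName.toLowerCase()
  for (const rule of rules) {
    const tagMatches = rule.tags === '*' || rule.tags.indexOf(tag) !== -1
    if (tagMatches && (rule.when === undefined || rule.when(el))) return rule
  }
  // Unreachable while the table ends with '*'; kept as a guard for edited tables.
  throw new Error(`no rule matched <${tag}>; the table must end with '*'`)
}
```

Because order decides the winner, the more specific `a` rule with the `btn` predicate has to precede the plain `a` rule, and the void-element rule has to precede `*`.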
Key details:

- `<instatic-outlet>` → `base.outlet`. The custom element marks where matched content flows in a CMS template. It maps to a childless `base.outlet` node (any inner markup is ignored; the composer fills it). This rule lets the AI agent and hand-authored template HTML place the single content outlet inline via the normal import path. See templates.md and agent.md.
- `<instatic-loop>` → `base.loop`. The custom element lets the AI agent and hand-authored snippets create a real Loop through the same HTML import path. Children recurse normally and become loop variants. Loop configuration is read from attributes: `data-source-id`, `data-table-id` (stored as `filters.tableId`), `data-order-by`, `data-direction`, `data-limit`, `data-offset`, `data-pagination`, `data-page-size`, and optional `data-tag` / `data-custom-tag`. See loops.md and agent.md.
- `base.text` uses `tag` (not a separate `level` or heading prop); the tag name is passed through directly. Imported bare DOM text uses `tag: 'none'`, which publishes text without an element wrapper.
- Direct text inside a recursing container is preserved. The walker iterates `childNodes` (not just `children`): element children route through the rules, and each significant text node becomes a synthesized `base.text` child with `tag: 'none'` in document order. That no-wrapper text mode publishes back to bare text, so `<div class="num">98%</div>` and `<li>Buy milk</li>` import as containers holding their original text without adding selector-visible wrapper elements. Whitespace-only text (indentation between tags) is skipped; internal whitespace runs collapse to single spaces, and boundary spaces are kept when the text run sits between element siblings.
- `<body>` metadata is preserved separately. Classes, safe HTML attributes (`id`, ARIA, `data-*`, etc.), and harvested inline styles on `<body>` are returned as `fragment.body` rather than inserted into `rootIds`. Full-site import applies them to `base.body`; paste-style HTML import can ignore them without changing the fragment structure.
- `base.link` uses the prop `text` (not `label`). `base.button` uses `label` (not `text`). These match the module source.
- `base.image` captures `src` only. `alt` is not a per-instance prop; it comes from the media library asset.
- Form elements import as form primitives. Third-party `<form>` elements default to `base.form` in `custom` mode, so they do not become CMS submission endpoints until an author binds them to a data table. Published CMS-native forms can round-trip their `data-instatic-*` form metadata. Plain labels become `base.label`; labels that wrap controls become a `base.container` with `customTag: 'label'` so nested inputs are not dropped.
- Void elements (`<br>`, `<hr>`, etc.) have their own rule that sits before the catch-all. They map to `base.container` with `tag: 'custom'` + the real tag name, but with `recurse: false` so the produced node has no children. `<input>` is not part of this fallback anymore; it imports through the form-control rule. The canvas renderer (`ContainerEditor`) also guards against passing children (including the empty-container placeholder) to void element tags, because React throws if you do so.
- The catch-all (`*`) handles `li`, `figure`, `blockquote`, `table`, `dialog`, and anything else not listed. It uses `tag: 'custom'` + `customTag` so `resolveHtmlTag` in `base.container` emits the real element name. Using `tag: 'div'` + `customTag` would render `<div>` instead.
- The pure `walkAndMap` step copies element class names onto `node.classIds` (`Array.from(el.classList)`); it is registry-agnostic and infers no styles. The store action that splices the fragment in (`insertImportedNodes`) then converts those names to real class ids (see Class linking).
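The text-run handling described in the list above (skip whitespace-only runs, collapse internal whitespace, keep a single boundary space next to an element sibling) can be sketched as a pure function. `normalizeTextRun` and its two boolean parameters are hypothetical names for this illustration; how the real walker detects element siblings is not specified here.

```typescript
/**
 * Sketch of the whitespace policy for one DOM text node inside a recursing
 * container. Returns null when the run should not produce a base.text child.
 *
 * hasPrevElement / hasNextElement: whether an element sibling sits directly
 * before / after this text node (assumed inputs for this sketch).
 */
function normalizeTextRun(
  raw: string,
  hasPrevElement: boolean,
  hasNextElement: boolean,
): string | null {
  // Whitespace-only runs (indentation between tags) are skipped entirely.
  if (raw.trim() === '') return null

  // Internal whitespace runs collapse to single spaces.
  const collapsed = raw.replace(/\s+/g, ' ')

  // Keep one boundary space only where it separates the text from an element
  // sibling; parent-edge indentation is trimmed.
  const lead = hasPrevElement && collapsed.charAt(0) === ' ' ? ' ' : ''
  const trail = hasNextElement && collapsed.charAt(collapsed.length - 1) === ' ' ? ' ' : ''

  return lead + collapsed.trim() + trail
}

// Usage: the "Hello " run in <div>Hello <em>world</em></div> keeps its trailing space.
const helloRun = normalizeTextRun('Hello ', false, true)
// The pretty-printed "98%" in <div class="num">\n  98%\n</div> loses its indentation.
const numRun = normalizeTextRun('\n    98%\n  ', false, false)
```

Under these assumptions `helloRun` is `'Hello '` and `numRun` is `'98%'`, which matches the two cases the list calls out.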
stripUnsafe (src/core/htmlImport/stripUnsafe.ts) mutates the parsed document before the walker runs. CSS is harvested first (see the pipeline above), so <style> and style="…" are removed from the DOM but not dropped from the import:
| Construct | Treatment |
|---|---|
| `<script>` elements | Stripped; counted as `stripped.scripts` |
| Inline `on*` attributes (`onclick`, `onload`, …) | Stripped; counted as `stripped.inlineHandlers` |
| `<style>` elements | CSS harvested into `result.styleCss` (then parsed into registry rules); the element is removed |
| `style="…"` attributes | Declarations harvested onto `node.inlineStyles`; the attribute is removed |
| HTML comments and processing instructions | Stripped silently (no count) |
The AI agent should not use stripped constructs for behavior. If an edit needs JavaScript, it writes a real runtime script with write_code_asset({ type: "script", ... }) and verifies targeting with inspect_code_runtime instead of embedding <script> or onclick in an HTML import.
After insert, ImportHtmlModal builds a toast body from the added-selector count plus the non-zero stripped counts, e.g. "3 CSS selectors, stripped 2 <script>". If nothing notable happened, the toast shows only the node count.
Inline style="…" → node.inlineStyles. Before stripUnsafe removes a style attribute, harvestInlineStyles (inlineStyle.ts) reads the element's parsed CSSOM declaration and copies every declaration into a camelCase bag, dropping only property names rejected by isEmittableProperty (the publisher's security denylist — the same gate cssToStyleRules uses). A url(…) background is canonicalised to url('payload') form so the Super Import asset rewriter and the editor's BackgroundImageControl recognise it. The bag is attached to the produced node as node.inlineStyles — the editor's first-class per-node style="" layer — which the publisher emits verbatim and the user edits via the Properties panel's inline-style mode. In Super Import any url(…) is uploaded to the media library and rewritten to its media URL.
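The harvesting step above boils down to three small transforms: gate each property name, convert it to camelCase, and canonicalise any `url(…)` payload. The sketch below models declarations as plain `[property, value]` pairs instead of a CSSOM object so it runs anywhere. The denylist contents, the helper names, and the vendor-prefix casing are assumptions; the real gate is `isEmittableProperty`, whose list is not reproduced here.

```typescript
// Stand-in for the publisher's security gate. The two entries are
// illustrative only, not the real denylist.
const SAMPLE_DENIED = ['behavior', '-moz-binding']

function isEmittableSketch(property: string): boolean {
  return SAMPLE_DENIED.indexOf(property.toLowerCase()) === -1
}

// background-image → backgroundImage; custom properties (--foo) pass through.
function toCamelCase(property: string): string {
  if (property.slice(0, 2) === '--') return property
  return property.replace(/-([a-z])/g, (_match, letter: string) => letter.toUpperCase())
}

// url(x), url("x") and url( 'x' ) all become url('x').
// Payloads that themselves contain quotes are out of scope for this sketch.
function canonicaliseUrls(value: string): string {
  return value.replace(/url\(\s*(['"]?)(.*?)\1\s*\)/g, "url('$2')")
}

function harvestBag(declarations: Array<[string, string]>): Record<string, string> {
  const bag: Record<string, string> = {}
  for (const [property, value] of declarations) {
    // Only denied property names are dropped; everything else is kept.
    if (!isEmittableSketch(property)) continue
    bag[toCamelCase(property)] = canonicaliseUrls(value)
  }
  return bag
}

// Usage: a hero background plus one denied declaration.
const sampleBag = harvestBag([
  ['background-image', 'url("hero.png")'],
  ['behavior', 'url(x.htc)'],
  ['color', 'red'],
])
```

With the sample denylist, `sampleBag` holds `backgroundImage` and `color` only, and the background value is normalised to the single-quoted form.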
The importer is "approximate by construction". Several inputs do not survive the round-trip:
| Input | What happens | Why |
|---|---|---|
| `alt=""` on `<img>` | Dropped | `base.image` has no `alt` prop; alt text is stored on the media library asset |
| Safe HTML attributes not modeled by the matched module (`id`, ARIA attrs, `role`, custom attrs, `data-*`, etc.) | Preserved in `props.htmlAttributes` on base container/text/link/button/image nodes and editable in the Properties panel Attributes view. `class` names become registry classes, inline `style` declarations become `node.inlineStyles`, event handlers are stripped, reserved editor/runtime `data-*` names are not imported, and attributes already owned by the module (for example `href` on links and `src` on images) stay in their first-class module props. | The module schema owns modeled props; `htmlAttributes` is the safe escape hatch for extra authored attributes |
| Exact inline whitespace around mixed content (`<div>Hello <em>world</em></div>`) | Approximated | Each text run becomes a `base.text` child with `tag: 'none'` and whitespace collapsed to single spaces. True parent-edge indentation is trimmed, but a single boundary space is preserved around element siblings so `Hello <em>world</em>` does not become `Helloworld`. The text itself is preserved and publishes without an extra wrapper. |
| Whitespace-only text (newlines/indentation between tags) | Dropped | It carries no content; collapsing it instead would add empty text nodes to every pretty-printed snippet |
| Void elements (`<br>`, `<hr>`, etc.) | Imported as a childless `base.container` node with `tag: 'custom'` and the real tag name as `customTag`. No children, no empty-container placeholder. `<input>` imports as a form primitive instead. | React throws if children are rendered inside void element tags; the dedicated void-element rule (before the catch-all) sets `recurse: false` and the canvas renderer skips children entirely for void tags. |
These losses are deliberate. The importer is a structural bootstrap, not a fidelity snapshot.
Three entry points all open the same ImportHtmlModal:
- Spotlight palette: type "Import HTML" (`editor.importHtml` command, `code` icon). Opens with no parent pre-set (defaults to the page root) and an empty editor.
- DOM panel context menu: right-click any container node → Paste HTML here…. The clipboard is read and pre-fills the editor; `parentId` is pre-set to the right-clicked node.
- Canvas context menu: same as DOM panel, via `CanvasRoot.handlePasteHtml`.
The modal is a two-column dialog — HTML editor on the left (wider), tree preview on the right (narrower):
- CodeMirror HTML editor (left column): paste or type HTML. Error alerts appear inline in the column header when insertion fails.
- Tree preview (right column): 200 ms debounced DOM-style tree view using the same row components as the Layers panel. Updates as the user types.
- Insertion target: taken from the opener's `parentId`; Spotlight defaults to the page root. There is no parent picker in the modal.
- Insert button (footer): runs `importHtml`, calls `insertImportedNodes`, shows a success toast (with optional stripped-count detail), closes the modal.
After insert, every produced node is a normal canvas node. It can be selected, moved, re-styled, and deleted like any other node.
ImportHtmlModal is lazy-mounted in AdminCanvasEditorBody.tsx gated by
importHtmlModalOpen. While the modal chunk downloads, the body renders the
same dialog chrome with a two-column loading skeleton so opening the command
has immediate feedback:
{importHtmlModalOpen && (
<Suspense fallback={<ImportHtmlModalLoading />}>
<ImportHtmlModal />
</Suspense>
)}

The engine's style rule registry (site.styleRules) is keyed by a generated id, and every renderer resolves a node's classes by id (classNamesForClassIds → styleRules[classId].name). HTML, however, carries class names. The two layers reconcile in insertImportedNodes (src/admin/pages/site/store/slices/site/nodeActions.ts):
1. The pure `walkAndMap` step writes raw names onto `node.classIds` (it has no `SiteDocument`, so it cannot mint ids).
2. As the fragment is spliced into the live tree, `insertImportedNodes` walks every fragment node's `classIds` and, for each name:
   - links to an existing class of that name if one exists (so `class="hero"` reuses your `hero` class), or
   - auto-creates a bare (style-less) class for that name.
3. The node's `classIds` are rewritten to the resolved ids in the same `mutateActiveTreeAndSite` transaction (one undo step).
The result: imported markup renders its class attribute, the classes show up in the Selectors panel, and they are immediately styleable — by the user in the editor, or by the AI agent emitting a <style> block in the insertHtml payload (whose .foo {} rules pre-create the named classes with styles, so the link in step 2 finds them) or applyCss after the fact.
Skipping this linking step is the bug that made HTML-authored styles silently never apply: names on `classIds` never matched the id-keyed registry, so the renderer dropped them. Regression-gated by `src/__tests__/agent/executor.test.ts`.
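The name-to-id reconciliation can be sketched as below. This is an illustration under stated assumptions, not the code in `nodeActions.ts`: the `StyleRuleLike` and `NodeLike` shapes are simplified, `mintId` is a hypothetical id generator passed in by the caller, and the sketch mutates its inputs directly instead of running inside a `mutateActiveTreeAndSite` transaction.

```typescript
// Simplified stand-ins; the real StyleRule and PageNode carry more fields.
interface StyleRuleLike {
  id: string
  name: string
  styles: Record<string, string>
}

interface NodeLike {
  id: string
  classIds?: string[]
}

function linkClassNames(
  nodes: Record<string, NodeLike>,
  registry: Record<string, StyleRuleLike>, // keyed by generated id
  mintId: () => string, // hypothetical id generator
): void {
  // Build a name → id index. A null-prototype object avoids collisions with
  // inherited keys such as "constructor" when those are used as class names.
  const idByName: Record<string, string> = Object.create(null)
  for (const id of Object.keys(registry)) {
    const rule = registry[id]
    if (rule !== undefined) idByName[rule.name] = id
  }

  for (const nodeId of Object.keys(nodes)) {
    const node = nodes[nodeId]
    if (node === undefined || node.classIds === undefined) continue

    node.classIds = node.classIds.map((name) => {
      let id: string | undefined = idByName[name]
      if (id === undefined) {
        // No class of that name yet: auto-create a bare (style-less) one.
        id = mintId()
        registry[id] = { id, name, styles: {} }
        idByName[name] = id
      }
      // Otherwise the existing same-named class is reused.
      return id
    })
  }
}
```

After the call, every entry in `classIds` is a registry id, which is the form the renderers resolve.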
The importer preserves CSS across two layers, both gated by isEmittableProperty:
- Inline `style="…"` → the node's `inlineStyles` bag (per-node `style=""` layer). Full declarations, not just backgrounds.
- `<style>` blocks → parsed by the consumer (`cssToStyleRules`) into registry `StyleRule`s shown in the Selectors panel. A `.foo {}` rule binds to nodes carrying `class="foo"`; ambient selectors (`body`, `a:hover`, `.a .b`) register globally. First-wins on name/selector collisions with existing rules.
- Class names without a matching `<style>` rule still survive: `insertImportedNodes` auto-creates a bare (style-less) class for the name (see Class linking), styleable afterwards in the editor or by the agent.
importHtml itself stays CSS-agnostic: it returns raw styleCss and the consumer (which has the site's breakpoints and may import @core/siteImport) does the parsing. This avoids an htmlImport → siteImport import cycle and lets @media fold into the site's real breakpoints.
| Pattern | Use instead |
|---|---|
| Calling `walkAndMap` before `stripUnsafe` | Call `importHtml(source)`; it runs both in the correct order |
| Importing `parseHtml` or `walkAndMap` from inside `src/core/` via a deep path | Import through the barrel: `import { importHtml } from '@core/htmlImport'` |
| Adding a server-side DOM import to `parseHtml.ts` | If server-side parsing is needed, add a guarded dynamic import at the call site; `parseHtml.ts` must stay importable in the browser bundle without bundling a DOM library |
| Storing alt text on `base.image` nodes produced by the importer | `base.image` has no `alt` prop; alt lives on the media library asset |
- docs/features/modules.md: module definitions, `base.text`, `base.button`, `base.image`, `base.container`
- docs/features/agent.md: AI agent feature; `insertHtml` and `replaceNodeHtml` run through this pipeline
- docs/reference/page-tree.md: `NodeTree<PageNode>`, `createNode`, `ImportFragment` shape
- Source-of-truth files:
  - `src/core/htmlImport/index.ts`: public barrel + API documentation
  - `src/core/htmlImport/rules.ts`: `HTML_TO_MODULE_RULES` mapping table
  - `src/core/htmlImport/walkAndMap.ts`: `importHtml()`, `walkAndMap()`, `ImportResult`, `ImportFragment`
  - `src/core/htmlImport/stripUnsafe.ts`: `stripUnsafe()`, `StripReport`
  - `src/admin/modals/ImportHtml/ImportHtmlModal.tsx`: paste-HTML modal
  - `src/admin/spotlight/commands/importHtml.ts`: Spotlight command
- Gate tests:
  - `src/__tests__/htmlImport/mapping.test.ts`: 95 per-rule mapping tests