The woes of sanitizing SVGs

(muffin.ink)

96 points | by varun_ch 2 hours ago

17 comments

  • simonw 1 hour ago
    I'm glad this article includes the only credible fix for the HTTP leak problems: CSP.

    A useful thing I learned recently is that, while CSP headers are usually set using HTTP headers, you can also reliably set them directly in HTML - for example for HTML generated directly on a page where HTTP headers don't come into play:

      <iframe sandbox="allow-scripts" srcdoc="
        <meta http-equiv='Content-Security-Policy'
            content='default-src none; script-src unsafe-inline; style-src unsafe-inline;'>
        <!-- untrusted content here -->
      "></iframe>
    
    It feels like this shouldn't work, because JavaScript in the untrusted content could use the DOM to delete or alter that meta tag... but it turns out all modern browsers specifically lock that down, treating those CSP rules as permanent as soon as that meta tag has loaded before any malicious code has the chance to subvert them.

    I had Claude Code run some experiments to help demonstrate this a few weeks ago: https://github.com/simonw/research/tree/main/test-csp-iframe...

    • rafram 1 hour ago
      And any additional CSP directives can only narrow what's allowed. Also works with headers plus <meta> - <meta>s can restrict the CSP even more than what the headers specified, but they can't widen it.
    • recursive 1 hour ago
      I did not know about `srcdoc`, but it looks like that's still vulnerable to injection by using a double quote and </iframe> to escape the sandbox. If this is constructed in a hygienic way using DOM manipulation, it seems like it could work, but it definitely seems possible to screw up.
      • rafram 1 hour ago
        If you're constructing your unsandboxed parent document HTML using string concatenation, you might as well not use the sandboxed iframe at all. But presumably someone who bothers to sandbox untrusted content also knows about setAttribute(), or the srcdoc JS property.
      • simonw 46 minutes ago
        You can entity-encode the content in the srcdoc= attribute to robustly solve that problem, or populate it via the DOM.
      • tracker1 1 hour ago
        s/"/&quot;/g
    • tracker1 1 hour ago
      Nice, favorited... thinking this could be useful for an email reader to support css, but not scripts.
  • andybak 2 hours ago
    My first thought is "support a tiny subset of svg that probably still covers 90% of real-world use cases".

    I do feel that's there's two distinct types of svg - "bunch of paths with fills" and "clever dangerous stuff" where most real SVGs are of the former type.

    Fully expect this to be shot down by someone that's thought about this problem for longer than the 120 seconds I just spent. :)

    • afavour 2 hours ago
      I think you're right but the lack of industry standard for this kind of thing kills it. People want to be able to take the output of whatever tool they use that exports SVG and put it in a browser. Which isn't an unfair request. But you wouldn't have a guarantee it wouldn't filter out the tool using some obscure SVG functionality.

      I'd love to see an agreed standard like OpenGL vs OpenGL ES for SVG. SVG-ES. Everyone agrees on the static, non-scripted elements that should work.

      • varun_ch 1 hour ago
        The way linked SVGs render from within img tags is basically perfect for SVG images (which as I understand is not standardized but is largely the same across browsers). External resources and scripting are blocked while still rendering nearly all SVGs correctly. And of course, any CSS is scoped to the SVG.

        If someone formalizes this as a new format, please give it a new name! tvg tiny vector graphics? savg safe vector graphics?

        And keep the scope as simple as possible so it actually ships! Don’t try implementing a binary format or something.

        • sbrother 17 minutes ago
          Maybe I'm missing something as I am not a frontend developer, but when you embed SVGs in an img tag as part of a Phoenix LiveView or even just a static component, you no longer get the ability to dynamically change paths/fills/colors with events coming from the server. Even if it's as simple as having a shape that you want to fill with a brand/highlight color, which at least for me is a common use case.
        • hackeman300 1 hour ago
          Someone did this already and did call it tinyVG! https://tinyvg.tech/
        • ambicapter 1 hour ago
          .rvg, Restricted Vector Graphics?
    • RobotToaster 28 minutes ago
      You'd lose a lot of useful features, like SMIL animation.
      • andybak 23 minutes ago
        But you'd gain adoption. A fair trade.
    • hackeman300 1 hour ago
      Seems like someone already implemented your idea. https://tinyvg.tech/
    • varun_ch 2 hours ago
      I wonder if it would be best if this was at the browser level as some sort of new format. Otherwise surely it would be really slow/cumbersome to deal with these in ‘user space’
    • whycome 2 hours ago
      It always seems like any animated svg loses all of the animation after sanitizing
    • harperlee 2 hours ago
      Fwiw I just thought the same, parse (don’t validate) the bits you like and recreate / reject the input.
    • LorenPechtel 1 hour ago
      Yeah, I think that's the real answer.

      Look at what Microsoft did with Excel--the dangerous stuff is behind a switch.

      Thus, solution:

      Add two bits to the tag.

      SVG1 does not execute any sort of script.

      SVG2 does not follow links.

      SVG3 is actually SVG1 + SVG2 as these are bit flags, not numbers.

      Additional bits are reserved for future use if any other issues are found.

      The only real safety is in the engine, not by any sanitizer.

    • duped 2 hours ago
      So if you are building something where you control every SVG ever produced and rendered then this is totally reasonable.

      If you ever need to interface with other tools that generate SVG you now need to have a way of essentially transpiling SVG from the wild into your tamed SVGs. Oftentimes this is done by hand, by a software developer and designer (sometimes the same person).

      And this is for basic functionality that your designers expect and have trivial controls for in their vector editors, like "add a drop shadow."

      The article goes into some issues with sanitization itself, and except for <script> these are a bunch of reasonable things that someone might expect to work or not have issues with. Sandboxing rendering isn't an unreasonable approach if you're not writing the parser and renderer yourself.

  • philo23 1 hour ago
    It'd be nice if there was a sandbox attribute you could add to inline <svg> tags, like the <iframe sandbox> attribute that'd let you opt out of all the potentially "dynamic" stuff inside of an SVG like scripts and event handlers, or even just literally sandbox the entire thing from accessing the "parent" HTML page's context/cookies/etc just like an iframe.

    I'm sure it'd just open up a whole other can of worms though... not to mention having to wait for browsers to actually support it.

    The real solution here is definitely CSP + basic sanitisation though.

    • simonw 43 minutes ago
      Thankfully if you have CSP you don't need even basic sanitization, which is useful because most of the problems in this article are demonstrations of how simple sanitization isn't simple at all.
    • chocmake 1 hour ago
      Most of the aspects the author was critiquing are actually just regular CSS features, they simply don't want any external requests. Effectively they want inlined SVGs to be treated like how the browsers treat IMG-embedded SVGs (no scripting or external requests loaded).

      Sanitization-wise it's already possible to strip scripting from SVGs and anything else you want, it's just that a library like DOMPurify to avoid ballooning in size doesn't include say a preset to handle the extra parsing necessary to make them behave like browsers treat IMG embeds, so it's up to devs to add their own.

      But yeah, a world where a simple attribute to achieve the same effect as an IMG embed but for inlined SVGs would be nice.

  • evilpie 1 hour ago
    The HTML Sanitizer API has a subset of SVG that is allowed by the default configuration. It won't help you with sanitizing CSS at all however, style is simply not allowed by default.

    https://developer.mozilla.org/en-US/docs/Web/API/HTML_Saniti...

    https://developer.mozilla.org/en-US/docs/Web/API/HTML_Saniti...

  • spankalee 2 hours ago
    This is, by the way, why Google Slides doesn't have SVG support even though there's a nearly 15 year old ticket requesting the feature.
  • Theodores 2 minutes ago
    Maybe we need a dumbed down version 3 of SVG where the browser knows it is not to do anything that requires fetching a URL, to make the image as harmless as a JPG.

    This version 3 could have the version number changed to 2 in order to do cool SVG things, so full-fat SVG as version 2 is now. But you could just flip to 2 to a 3 on upload, so any embedded URLs are harmless.

    This could be useful for the creator too, as it is helpful to have layers of source images in bitmap format to work with, and you can easily export such things accidentally.

  • ikkun 1 hour ago
    I do wish tinyVG or similar would take off, but I don't see that ever actually happening. the only thing I think it's missing is animation support, which is pretty niche but not as niche as <script> tags.

    https://tinyvg.tech/

  • jancsika 1 hour ago
    For the "<script>" stuff: regardless of how the thing is spelled or otherwise obscured, the HTML5 parser eventually knows when it's gotten hold of a script tag. Oops, we got one in a NOSCRIPTTAG context. Let's poop out.

    Tag names, attributes, attribute values, event callback default-cancelers... so many ways to declare that this node and its children shouldn't parse/evaluate scripts.

    As Jay-Z said: "I've got 99 solutions, fixing a problem ain't one"

  • Springtime 2 hours ago
    It seems the reason they're inlined in the page at all is to measure things briefly like bounding boxes (not sure the full extent as it didn't cover that), before subsequent removal. I'm not familiar with Scratch and its use of user-submitted SVGs but I'd be curious to read more about what they're doing that required it be inlined specifically.

    (This isn't a comment on the challenges in proper sanitization fwiw, as I've needed to do various of the same things myself)

  • Devasta 12 minutes ago
    > In 2019, a few months after the initial release of Scratch 3, Scratch discovered that SVGs can contain <script> tags that Scratch would cause to be executed when the SVG loads. This is known as an XSS.

    > Example from Scratch's test suite:

      <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
        "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
      <svg version="1.1" xmlns="http://www.w3.org/2000/svg">
        <circle cx="250" cy="250" r="50" fill="red" />
        <script type="text/javascript"><![CDATA[
            alert('from the svg!')
        ]]></script>
      </svg>
    
    
    
    Is this really an issue? This is the method that the chrome teams polyfill to replace XSLT suggests you do. https://github.com/mfreed7/xslt_polyfill/tree/main#usage
  • kevinmgranger 1 hour ago
    > This was fixed by using a regular expression to remove script tags.

    The infamous you can't parse (X)HTML with regex¹ meme is from 2009, yet this fix was done in 2019. I guess the SO answer never mentioned SVG.

    1: https://stackoverflow.com/revisions/1732454/1

  • shaguoer 1 hour ago
    This is exactly the kind of content I come to HN for.
  • NooneAtAll3 33 minutes ago
    wait... scratch is just a browser?
  • etchalon 1 hour ago
    I don't understand why it wasn't immediately understood that SVG is as dangerous as HTML.

    It is not, and never was, an image format. It's a markup language.

  • esafak 2 hours ago
    Is there a browser-friendly vector alternative?
    • simonw 42 minutes ago
      SVG in an <img> tag can't execute scripts.
  • nengil 2 hours ago
    [dead]
  • SpyCoder77 1 hour ago
    I did not expect to see GarboMuffin.