Skip to content

Conversation

@papandreou
Copy link
Contributor

There are some "precious" space characters matched by /\s/ that should never be collapsed or trimmed away.

See assetgraph/assetgraph#778

There are some "precious" space characters matched by /\s/ that should
never be collapsed or trimmed away.

See assetgraph/assetgraph#778
@papandreou
Copy link
Contributor Author

According to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#special-white-space \s is equivalent to [ \f\n\r\t\v\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].

Maybe the [ \n\r\t] character class I suggest here should be tweaked further to include more characters.

The White Space Processing Rules says:

Control characters (Unicode category Cc) other than tab (U+0009), line feed (U+000A), form feed (U+000C), and carriage return (U+000D) must be rendered as a visible glyph and otherwise treated as any other character of the Other Symbols (So) general category and Common script. The UA may use a glyph provided by a font specifically for the control character, substitute the glyphs provided for the corresponding symbol in the Control Pictures block, generate a visual representation of its codepoint value, or use some other method to provide an appropriate visible glyph. As required by [UNICODE], unsupported Default_ignorable characters must be ignored for rendering.

... so at least form feed (\f) should also be added.

@alexlamsl
Copy link
Collaborator

alexlamsl commented Sep 11, 2017

I prefer only changing the /\s/ that is necessary to pass your new unit tests rather than a blanket replacement.

For instance, https://github.com/kangax/html-minifier/pull/849/files#diff-bd35077e6c438d3f6866b57bb2481260L33 isn't necessary.

(I have tested on a few web browsers and agree that the issue illustrated by your test cases need to be fixed.)

@papandreou
Copy link
Contributor Author

I prefer only changing the /\s/ that is necessary to pass your new unit tests rather than a blanket replacement.
For instance, https://github.com/kangax/html-minifier/pull/849/files#diff-bd35077e6c438d3f6866b57bb2481260L33 isn't necessary.

Just to be sure that I understand your POV correctly -- isn't that actually an argument for adding test cases that validate that and the other replacement?

@alexlamsl
Copy link
Collaborator

@papandreou I see - if you can have the new test cases cover all the replacements of \s to [ \r\n\t\f], I'll be very happy and convinced 😉

@papandreou papandreou force-pushed the feature/preserveOddballWhitespace branch 2 times, most recently from dab9ad2 to 1d600cc Compare September 11, 2017 20:55
@papandreou
Copy link
Contributor Author

Okay, I added some more tests for the cases I could find, then rolled back to \s in the remaining cases.

However, I'm fairly sure that some of those tests can be written, so I probably wouldn't include 1d600cc -- those other kinds of spaces should basically be treated as regular glyphs, so I think /[ \n\r\t\f]/ is a safer default.

@alexlamsl
Copy link
Collaborator

I prefer changing them only if/when we hit an issue, and would therefore has test coverage for it.

The new tests & rollbacks LGTM - if there isn't anything else I shall go ahead & merge.


// Preserve hair space in attributes:
input = '<p class="foo\u200abar"></p>';
assert.equal(minify(input, { collapseWhitespace: true, preserveLineBreaks: true }), input);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preserveLineBreaks doesn't seem to be required here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, fixed + rebased.

…ttributes

Unfortunately this means we'll have to abandon String#trim as it
seems to be implemented as str.replace(/^\s+|\s+$/g, '')

Also, test preservation of oddball whitespace in class names when deduplicating and reordering
@papandreou
Copy link
Contributor Author

if there isn't anything else I shall go ahead & merge

Nothing further for now. I might pick it up again when I get some time :)

@alexlamsl
Copy link
Collaborator

@papandreou I shall merge after your rebase is pushed out then (GitHub having a slow day?)

Thanks a lot for the patch!

@papandreou papandreou force-pushed the feature/preserveOddballWhitespace branch from 1d600cc to c3d239d Compare September 11, 2017 21:30
@papandreou
Copy link
Contributor Author

@alexlamsl Hah, I don't know where I tried to push to before, but here it is :)

@alexlamsl alexlamsl merged commit 7683fbc into kangax:gh-pages Sep 11, 2017
@papandreou
Copy link
Contributor Author

@alexlamsl Thanks! Can I persuade you to do a new release with this fix?

@alexlamsl
Copy link
Collaborator

That's the plan - I went straight to bed after merging your PR 😅

Give me a few hours while I sort out my $day_job, then I'll make 3.5.5 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants