proposal: spec: improvements to raw strings #32590

Open
@deanveloper

Background

This proposal was branched off of #32190, which proposed a HEREDOC syntax for Go. It was concluded that HEREDOC was not the correct syntax for Go to use; however, the proposal did point out a large problem that Go currently has:

Problem

There is only one way to write raw strings, which is the backtick. Because of how raw strings work, they cannot themselves contain backticks, so the current workaround for including a backtick in a raw string is:

var str = `My backtick is `+"`"+` hard to use`
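
To give a sense of how the workaround reads with real content, here is a small runnable sketch applied to a SQL query. The query text and the `quoteIdent` helper are assumptions made up for illustration; they are not part of the proposal.

```go
package main

import "fmt"

// quoteIdent is a hypothetical helper that wraps an identifier in backticks.
// It is only needed because a raw string literal cannot contain a backtick.
func quoteIdent(name string) string {
	return "`" + name + "`"
}

func main() {
	// Workaround 1: leave the raw string every time a backtick is needed.
	a := `SELECT ` + "`foo`" + ` FROM ` + "`bar`"

	// Workaround 2: build the quoted identifiers with a helper.
	b := "SELECT " + quoteIdent("foo") + " FROM " + quoteIdent("bar")

	fmt.Println(a)      // SELECT `foo` FROM `bar`
	fmt.Println(a == b) // true
}
```

Both forms work, but neither lets the query be pasted into the source unchanged.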

Raw strings are often used for storing large strings, such as strings containing other languages, or Go code itself. In many languages, the backtick has significant meaning. For instance:

  1. SQL (in dialects such as MySQL) uses backticks to signify a string that represents an identifier, such as a database name or table name. While they are not required for all identifiers, they are required if the identifier contains invalid characters (spaces, commas, etc.) or if the identifier matches a keyword. It also seems to be good practice in general to surround database and table names in backticks.
  • Example: SELECT * FROM `database`.`table`
  2. Kotlin uses backticks in a similar fashion.
  • Example: fun `a method with spaces`() { ... }
  3. JavaScript uses backticks to indicate format strings, which allow people to embed expressions inside of their strings.
  • Example: let str = `HELLO ${name.toLocaleUpperCase()}`

Of course there are far more examples of languages where the backtick is a significant character in the language. This makes embedding these languages in Go very hard.

Proposed Solution

With any fixed set of raw string delimiters, there will always be some text that cannot be embedded without transforming it; in particular, Go code that uses those same delimiters could not be placed inside Go code as-is. This means that the delimiter for raw strings needs to be variable.

This proposal highlights one brought up here. It essentially improves on the current way to declare raw strings, allowing the following syntax:

var stmt = SQL`
SELECT `foo` FROM `bar` WHERE `baz` = "qux"
`SQL

var old = `
this, of course, still works
`

var new = 고`this
also works
    you can also use 고 AND `backticks` (separately) in the string!
`고

Essentially, raw strings can be prefixed with a delimiter, and the string is then terminated with a backtick followed by the same delimiter.

Strings which are densely populated with words and backticks may make it hard to pick a word to use as the delimiter for the raw string, as the word may appear inside the string directly after a backtick, which would end the string early and cause a syntax error. Allowing any identifier to be used as a delimiter would allow non-ASCII characters to be used as well, meaning that in special cases, when it's really needed, one can use a non-ASCII character as their delimiter.

Concerns

@jimmyfrasche #32190 (comment)

Implementation-wise, the problem with user-specified delimiters is that they have to be handled during lexing, which adds complexity to a simple, though still somewhat involved, stage and would need a lot of explanation in the language spec.

I don't like the idea of complicating the language. I do not work with the internals of the language, so I am unsure of the magnitude of complication to the lexer that this change would bring. If it is too much, I don't think that it would at all be worth it, and maybe one of the alternatives below would be a better fit.
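
To make the lexing cost a little more concrete, here is a minimal sketch of how such a literal could be scanned. It is not taken from the Go scanner; `scanDelimitedRaw` is a hypothetical function, and it assumes the literal ends at the first backtick that is immediately followed by the delimiter.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// scanDelimitedRaw scans one literal of the proposed form DELIM`body`DELIM
// from the start of src and returns the body and the number of bytes consumed.
// An empty DELIM behaves like today's raw strings. Hypothetical sketch only.
func scanDelimitedRaw(src string) (body string, size int, err error) {
	// Read the optional identifier in front of the opening backtick.
	i := 0
	for i < len(src) {
		r, w := utf8.DecodeRuneInString(src[i:])
		if r != '_' && !unicode.IsLetter(r) && !(i > 0 && unicode.IsDigit(r)) {
			break
		}
		i += w
	}
	delim := src[:i]
	if i >= len(src) || src[i] != '`' {
		return "", 0, errors.New("expected opening backtick")
	}
	rest := src[i+1:]

	// Assumption: the first backtick followed by the delimiter ends the literal.
	closer := "`" + delim
	end := strings.Index(rest, closer)
	if end < 0 {
		return "", 0, errors.New("unterminated raw string")
	}
	return rest[:end], i + 1 + end + len(closer), nil
}

func main() {
	src := "SQL`SELECT `foo` FROM `bar``SQL"
	body, size, err := scanDelimitedRaw(src)
	fmt.Println(body, size == len(src), err)
}
```

The extra work compared to today's raw strings is remembering the delimiter and searching for a multi-byte terminator instead of a single backtick.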

@ianlancetaylor #32190 (comment)

My only concern with [syntax] is that it doesn't lead with the fact that it is a string. C++ (R"delim( string )delim")) and Rust (r#" string "#) and Swift (#" string "#) are more clear as to when a string is starting.

I share this sentiment. My response to this here was that establishing a convention to use short, noticeable identifiers (i.e. RAW, JS, SQL, etc.) helps with noticing where the string starts and ends. This could (possibly) be enforced by golint, but I'm not sure if that is a good idea or not.

Other Alternatives brought up

In #32190, there were several other alternatives that tried to achieve the same goal:

Variable numbers of backticks

Essentially, you could start the raw string with a certain number of backticks, and it would have to end with the same number of backticks.

    `````
    A raw string which can contain up to 4 ```` backticks in a row inside of it
    `````

This solution still had problems though. Strings cannot start with an even number of backticks, because any even number of backticks could also be interpreted as an empty string, introducing ambiguities. It also causes developers a bit of fuss when trying to get it to work inside of markdown, as markdown uses multiple backticks in order to signify a block of code.

Also, the strings could not start or end with backticks, which would be an unfortunate consequence.
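
The limitations above can be seen in a small sketch of a scanner for this alternative. `scanBacktickRun` is a hypothetical name, and rejecting even-length runs is simply how this sketch sidesteps the empty-string ambiguity.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// scanBacktickRun scans a literal opened by N backticks and closed by the
// first run of N backticks. Hypothetical sketch of the alternative above.
func scanBacktickRun(src string) (string, error) {
	// Count the opening run greedily, so the body can never start with a backtick.
	n := 0
	for n < len(src) && src[n] == '`' {
		n++
	}
	if n == 0 {
		return "", errors.New("expected opening backtick")
	}
	if n%2 == 0 {
		// Under today's rules an even run already reads as empty raw strings.
		return "", errors.New("ambiguous: even number of opening backticks")
	}
	rest := src[n:]
	end := strings.Index(rest, strings.Repeat("`", n))
	if end < 0 {
		return "", errors.New("unterminated raw string")
	}
	return rest[:end], nil
}

func main() {
	fmt.Println(scanBacktickRun("```a `` b```")) // a `` b <nil>
	fmt.Println(scanBacktickRun("``x``"))
}
```

Because the closing run is matched at its first occurrence, a body that ends in a backtick cannot be expressed either: in "```abc````" the body is "abc" and one backtick is left over.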

Variable number of backticks + no empty raw strings

This one is a breaking change, however I think it is my favorite solution out of all of the alternatives. It's the exact same as the previous one, but Go also introduces a breaking change to disallow empty raw strings. There is no need for raw strings to be used to represent an empty string, since the normal "" can do that, and is much more preferable. The only code this would break is people who have used a raw string to define an empty string by doing something like x := `` or funcCall(``, ...). It may be good to do some research on if empty raw strings are ever used in real code.
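
As a starting point for that research, a small tool along the following lines could count empty raw string literals in a source file. This is only a sketch: `countEmptyRawStrings` and the sample source are made up here, and a real survey would need to walk a large corpus of packages.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// countEmptyRawStrings parses Go source and counts string literals that are
// written as two backticks. Hypothetical helper for illustration.
func countEmptyRawStrings(src string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return 0, err
	}
	count := 0
	ast.Inspect(file, func(n ast.Node) bool {
		// BasicLit.Value holds the literal as written, including its quotes.
		lit, ok := n.(*ast.BasicLit)
		if ok && lit.Kind == token.STRING && lit.Value == "``" {
			count++
		}
		return true
	})
	return count, nil
}

func main() {
	src := "package p\n\nvar a = ``\nvar b = \"\"\nvar c = `x`\n\nfunc f() { println(``, a, b, c) }\n"
	n, err := countEmptyRawStrings(src)
	fmt.Println(n, err) // 2 <nil>
}
```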

This solution still has the issue of being annoying to use with Markdown's code fences. It was argued that we shouldn't make language decisions based on other languages, but I personally do not like this argument. Sharing code is part of what a programmer does, and Markdown is a very widely used markup language that uses multiple backticks in a row to define a code fence. This feature may make it a bit difficult to share Go code over anything that uses Markdown (Slack, GitHub, Discord, and other services).

Despite making it difficult to share code via Markdown-enabled chats, it is still easy to share code via something like gist.github.com or play.golang.org. If my original proposal proves to not work very well (doesn't feel Go-like, too difficult to implement, etc.) I would love for this solution to be accepted in its place.

Variable number of backticks + a quote

This proposal is actually pretty nice. It's similar to the previous proposal. Essentially, the starting delimiter is N backticks (N >= 2) followed by a quotation mark, and the ending delimiter is a quotation mark followed by the same number of backticks. Example:

s := ``"this is a `raw` "string" literal"``
fmt.Print(s)

// prints:
// this is a `raw` "string" literal

This syntax is actually very nice in my opinion. It fixes the "odd-number-only" ambiguity from the previous example, as well as fixing the Markdown issue (as code fences must occur on their own line). It also fixes the "strings starting/ending with backticks" issue.

The only issue with this syntax is that it doesn't seem to work well with existing raw strings. I don't personally have data about how often this occurs, but I'd imagine that there are several times where raw strings are used to describe strings with quotes in them, making code like x := `"this is a string"` common. Newcomers to Go may see this and think that the `" is the delimiter to the raw string, when in reality the ` is the delimiter and the " is part of the string.

However that critique may be a bit nitpicky. I do like this syntax a lot.
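
For comparison with the other alternatives, here is a minimal sketch of how this form could be scanned. `scanQuotedRaw` is a hypothetical name, and the sketch assumes the literal ends at the first quotation mark that is followed by N backticks.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// scanQuotedRaw scans a literal opened by N >= 2 backticks plus a quotation
// mark and closed by a quotation mark plus N backticks. Hypothetical sketch.
func scanQuotedRaw(src string) (string, error) {
	n := 0
	for n < len(src) && src[n] == '`' {
		n++
	}
	if n < 2 || n >= len(src) || src[n] != '"' {
		return "", errors.New("expected two or more backticks followed by a quotation mark")
	}
	rest := src[n+1:]

	// Assumption: the first quotation mark followed by N backticks ends the literal.
	closer := `"` + strings.Repeat("`", n)
	end := strings.Index(rest, closer)
	if end < 0 {
		return "", errors.New("unterminated raw string")
	}
	return rest[:end], nil
}

func main() {
	src := "``\"this is a `raw` \"string\" literal\"``"
	fmt.Println(scanQuotedRaw(src)) // this is a `raw` "string" literal <nil>
}
```

Since the quotation mark sits between the backticks and the body, a body such as `x` that starts and ends with a backtick scans without trouble.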

Choosing a symbol pair that nobody uses

This alternative stated that Go should add another symbol to use to declare raw strings in Go. For instance, to start the string and to end the string. Go code is defined to be UTF-8 so file formatting issues should not happen. Another proposed idea was ≡ (U+2261 IDENTICAL TO).

This solution also has problems. What if our string has both backticks AND strange symbols (for instance if you were defining a list of mathematical symbols)? Or, what if you were trying to embed Go syntax inside of your strings? Also, the symbol is hard to type and not easy to find, so it may not be a good fit as a string delimiter.

Variable number of a special character

In #32590 (comment), another solution that I quite like was brought up, using a variable number of special characters. They propose using ^, and then the delimiters for the string become ^` and `^, where the number of ^ symbols is variable. They also created an implementation of it here.

For example:

s := ^^`
func main() {
	sql := ^`SELECT `foo` FROM `bar` WHERE `baz` = "qux"`^
	fmt.Println(sql)
}
`^^
fmt.Print(s)

// prints:
// 
// func main() {
//	sql := ^`SELECT `foo` FROM `bar` WHERE `baz` = "qux"`^
//	fmt.Println(sql)
// }
//

Other languages

  1. C++ R"delim(string)delim"
    • I personally hate the asymmetry of prefix-strings; they look sloppy to me and seem too much like an attempt to hack in features, so I don't really like this solution.
  2. Rust r#"string"#
    • Same issue that I had with C++: the asymmetry and "hackiness" of prefix-strings ruins it for me. Also, a fixed number of ways to define a string means that if one wants to put a Go raw string inside of a string (i.e. pattern matching for code generation), they will run into issues.
  3. Swift #"string"#
    • Again, a fixed number of ways to define a string means it's hard to pattern-match Go raw strings for code generation.

It's important that we have some kind of variable delimiter, as that way if the string we are embedding somehow contains it, it is easy to change the string's delimiter in order to avoid the issue.

The delimiter doesn't have to be an identifier like it is in this main proposal, it could also be varying the number of backticks like the one a few paragraphs up.

Conclusion

Raw strings in Go are often used to be able to copy-paste text to be used as strings, or to embed code from other languages (such as JS, SQL, or even Go) into Go. However, if that text contains backticks, we need some way to make sure that those backticks do not terminate the string early.

I believe that the way to do this is allowing an identifier to precede the string, and to make sure that the terminating backtick must be followed by the same identifier in order to terminate the string.

var markdown = MD`
### Thank you for reading :)
`MD
