7
\$\begingroup\$

BBCode is a markup language commonly used in web forum software in the 2000s and 2010s. Your task is to write a program or function that translates BBCode to HTML according to the following spec. (BBCode has wildly varying implementations in the real world, but for this challenge we're defining it like this.)

Tags

  • [b]foo[/b] → <strong>foo</strong>
  • [i]foo[/i] → <em>foo</em>
  • [u]foo[/u] → <u>foo</u>
  • [s]foo[/s] → <s>foo</s>
  • [code]foo[/code] → <code>foo</code>. Tags inside [code] are not parsed.
  • [url]https://www.example.com[/url] → <a href="https://www.example.com">https://www.example.com</a>
  • [url=https://www.example.com]foo[/url] → <a href="https://www.example.com">foo</a>
  • [img]https://www.example.com[/img] → <img src="https://www.example.com"> (no HTML closing tag)
  • [color=value]foo[/color] → <span style="color:value">foo</span>
  • [size=value]foo[/size] → <span style="font-size:value">foo</span>
  • [quote]foo[/quote] → <blockquote>foo</blockquote>
  • [quote=author]foo[/quote] → <blockquote><cite>author</cite>foo</blockquote>
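
For reference, the table above can be read as a single rendering function applied to one already-matched tag. The following is only a minimal, ungolfed Python sketch; the name render_tag and its parameters are made up for illustration and are not part of the challenge. It does not do any parsing or nesting, it only maps (tag name, attribute, body) to HTML.

```python
def render_tag(name, attr, body):
    """Return the HTML for one matched BBCode tag, or None if the tag is unknown.

    name: tag name in any case, attr: text after '=' (or None), body: inner text.
    All three parameter names are hypothetical.
    """
    name = name.lower()
    simple = {"b": "strong", "i": "em", "u": "u", "s": "s", "code": "code"}
    if name in simple:
        return f"<{simple[name]}>{body}</{simple[name]}>"
    if name == "url":
        # Without an attribute, the body doubles as the link target.
        return f'<a href="{attr or body}">{body}</a>'
    if name == "img":
        # No closing HTML tag for images.
        return f'<img src="{body}">'
    if name in ("color", "size"):
        prop = "color" if name == "color" else "font-size"
        return f'<span style="{prop}:{attr}">{body}</span>'
    if name == "quote":
        cite = f"<cite>{attr}</cite>" if attr else ""
        return f"<blockquote>{cite}{body}</blockquote>"
    return None
```

For example, render_tag("quote", "John", "hi") gives <blockquote><cite>John</cite>hi</blockquote>, and an unknown name such as "bar" gives None so the caller can leave the text untouched.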

Details

  • BBCode tags are case insensitive ([b][/b], [B][/b], and [B][/B] are all valid), but HTML output tags must be lowercase.
  • Nested tags are valid (see Nesting section for further details): [b][i]text[/i][/b] → <strong><em>text</em></strong>
  • Unmatched, malformed, or unknown tags are left as literal text: [b]foo → [b]foo, [b ]foo[/b] → [b ]foo[/b], [/b] → [/b], [bar]foo[/bar] → [bar]foo[/bar]
  • All attributes are valid; don't worry about URI validation, color/size name validation, anti-XSS, etc.
  • Empty tags are valid: [b][/b] → <strong></strong>

Input attributes and input text inside (valid) tags (except for [code]) will not contain [, ], =, or ".

Nesting

Tags close in LIFO order (like a stack).

Inputs will always have properly nesting tags.

Test cases

Input: [b]bold[/b]
Output: <strong>bold</strong>

Input: [B]BOLD[/B]
Output: <strong>BOLD</strong>

Input: [b][i]bold italic[/i][/b]
Output: <strong><em>bold italic</em></strong>

Input: [url]http://example.com[/url]
Output: <a href="http://example.com">http://example.com</a>

Input: [url=http://example.com]click here[/url]
Output: <a href="http://example.com">click here</a>

Input: [img]http://example.com/image.png[/img]
Output: <img src="http://example.com/image.png">

Input: [color=red]red text[/color]
Output: <span style="color:red">red text</span>

Input: [size=20px]big text[/size]
Output: <span style="font-size:20px">big text</span>

Input: [quote]someone said this[/quote]
Output: <blockquote>someone said this</blockquote>

Input: [quote=John]someone said this[/quote]
Output: <blockquote><cite>John</cite>someone said this</blockquote>

Input: [b]unclosed tag
Output: [b]unclosed tag

Input: unopened tag[/b]
Output: unopened tag[/b]

Input: [unknown]text[/unknown]
Output: [unknown]text[/unknown]

Input: [b]nested [i]tags[/i] work[/b]
Output: <strong>nested <em>tags</em> work</strong>

Input: [url=http://test.com][b]bold link[/b][/url]
Output: <a href="http://test.com"><strong>bold link</strong></a>

Input: plain text with no tags
Output: plain text with no tags

Input: [code]<script>alert('hi')</script>[/code]
Output: <code><script>alert('hi')</script></code>

Input: [CoLoR=blue]case test[/color]
Output: <span style="color:blue">case test</span>

Input: [code][b]not bold[/b][/code]
Output: <code>[b]not bold[/b]</code>

Input: [code][url=http://test.com]link[/url][/code]
Output: <code>[url=http://test.com]link[/url]</code>

Input: [b][code]tags[/code] outside[/b]
Output: <strong><code>tags</code> outside</strong>

Input: [b][i][u]triple nested[/u][/i][/b]
Output: <strong><em><u>triple nested</u></em></strong>

Input: [color=red][b]colored bold[/b][/color]
Output: <span style="color:red"><strong>colored bold</strong></span>

Input: [quote=Alice][b]bold quote[/b][/quote]
Output: <blockquote><cite>Alice</cite><strong>bold quote</strong></blockquote>

Input: [url=http://test.com][color=blue]styled link[/color][/url]
Output: <a href="http://test.com"><span style="color:blue">styled link</span></a>

Input: [code][code]nested code[/code][/code]
Output: <code>[code]nested code[/code]</code>

Input: [u]foo[/u]
Output: <u>foo</u>

Input: plaintext
Output: plaintext

Input: [code]left[/code][code]right[/code]
Output: <code>left</code><code>right</code>

This is code-golf. Standard loopholes are forbidden.

\$\endgroup\$
19
  • 3
    \$\begingroup\$ HTML has <b> and <i> tags and they are semantically different from <strong> and <em>. Not requesting to change rules, but why do you define the mapping that way? \$\endgroup\$ Commented Jan 29 at 4:05
  • 2
    \$\begingroup\$ @Explorer09 www.bbcode.org does translate [b] into <strong> and [i] into <em>. And so does Markdown on this very site with **this** and *this*. \$\endgroup\$ Commented Jan 29 at 4:51
  • 2
    \$\begingroup\$ "text will never contain [, ]" seems to contradict "malformed, or unknown tags are left as literal text". \$\endgroup\$ Commented Jan 29 at 4:56
  • 2
    \$\begingroup\$ @qarz <strong> and <b> have different semantics in HTML5 and are not interchangeable. <b> is a general marker for drawing attention or for keywords. <strong> is for marking importance in text/speech. The **this** in Markdown translates better to <b> and not <strong>. Likewise for <i> (general marker for loanwords, scientific terms and titles) and <em> (stress or emphasis in speech). See also: FAQ from WHATWG \$\endgroup\$ Commented 2 days ago
  • 5
    \$\begingroup\$ @Explorer09 i don't think it's very relevant to be pedantic about semantic html when we're talking about software that is nowadays largely considered obsolete, used for informal conversation by non-technical users. \$\endgroup\$ Commented 2 days ago

5 Answers 5

4
\$\begingroup\$

Perl 5, 485 bytes

undef$/;$_=<>;s/\[code]((?:(?R)|.)*?)\[\/code]/push@a,$1;"\0"/geis;1while s{\[(b|i|u|img|url|color|size|quote)(?:=([^]]+))?](.*?)\[/\1]}{($n,$x,$y)=(lc$1,$2,$3);$n=~/^u/?$n=~/l/?"<a href=\"".($x||$y)."\">$y</a>":"<u>$y</u>":$n=~/^i/?$n=~/m/?"<img src=\"$y\">":"<em>$y</em>":$n=~/^b/?"<strong>$y</strong>":$n=~/^q/?"<blockquote>".($x?"<cite>$x</cite>":"")."$y</blockquote>":"<span style=\"".($n=~/c/?"color":"font-size").":$x\">$y</span>"}gie;s/\0/"<code>".shift(@a)."<\/code>"/ge;print

Try it online!

Detailed explanation

undef$/;$_=<>;

Undefines the line separator so $_=<>; reads the entire input (newlines and all) into the default variable $_ at once.

s/\[code]((?:(?R)|.)*?)\[\/code]/push@a,$1;"\0"/geis;

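Matches each [code] block and replaces the whole block (tags included) with a null byte \0, saving its contents in the array @a for later. The (?R) recursion lets a [code] block contain nested [code]...[/code] pairs.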

1 while s{\[(b|i|u|img|url|color|size|quote)(?:=([^]]+))?](.*?)\[/\1]}{
  ($n,$x,$y)=(lc$1,$2,$3);
  $n=~/^u/?$n=~/l/?"<a href=\"".($x||$y)."\">$y</a>":"<u>$y</u>":
  $n=~/^i/?$n=~/m/?"<img src=\"$y\">":"<em>$y</em>":
  $n=~/^b/?"<strong>$y</strong>":
  $n=~/^q/?"<blockquote>".($x?"<cite>$x</cite>":"")."$y</blockquote>":
  "<span style=\"".($n=~/c/?"color":"font-size").":$x\">$y</span>"
}gie;

s{...}{...}gie replaces every matching tag pair in the string. The surrounding 1 while loop repeats the substitution until nothing matches any more, which is what handles nested tags.

  • $n is the tag name (lowercased)
  • $x is the tag attribute, if present
  • $y is the text between the opening and closing tags

Inside the loop, a chain of nested ternaries (effectively if/else) picks the HTML for the matched tag:

  • Starts with u? -> url if the name also contains l, otherwise u
  • Starts with i? -> img if the name also contains m, otherwise i
  • Starts with b? -> b
  • Starts with q? -> quote (adds <cite>...</cite> only if an author was given)
  • Else: it must be color or size. These are merged because both output <span style="...">; checking whether the name contains c decides between color: and font-size:.

s/\0/"<code>".shift(@a)."<\/code>"/ge;
print

s/\0/.../ge finds every null-byte placeholder created in the first step and replaces it with the next saved block, taken from the front of @a with shift (FIFO) and wrapped in <code>...</code>.
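
For readers who don't speak Perl, the same three stages (stash the [code] blocks, substitute repeatedly, restore) might look roughly like this in ungolfed Python. This is a sketch under assumptions, not a port of the answer: the names convert, HTML and PAIR are made up, only b/i/u/s are covered, and nested [code] blocks are not handled because Python's re module has no (?R) recursion.

```python
import re

# Hypothetical names; only b/i/u/s are covered to keep the sketch short.
HTML = {"b": "strong", "i": "em", "u": "u", "s": "s"}
# A pair whose body contains no brackets, i.e. an innermost pair.
PAIR = re.compile(r"\[([bius])\]([^\[\]]*)\[/\1\]", re.I)

def convert(text):
    stash = []

    def hide(match):
        # Remember the raw [code] contents and leave a NUL placeholder behind.
        stash.append(match.group(1))
        return "\0"

    text = re.sub(r"\[code\](.*?)\[/code\]", hide, text, flags=re.I | re.S)

    def render(match):
        tag = HTML[match.group(1).lower()]
        return f"<{tag}>{match.group(2)}</{tag}>"

    # Repeat until nothing changes, so inner pairs are converted before outer ones.
    while True:
        converted = PAIR.sub(render, text)
        if converted == text:
            break
        text = converted

    # Put the saved blocks back in the order they were taken out (FIFO).
    saved = iter(stash)
    return re.sub(r"\x00", lambda m: f"<code>{next(saved)}</code>", text)
```

Unlike the Perl version, this sketch only matches pairs whose body has no brackets left, so it relies on the loop to work from the inside out; a stray unknown tag inside a pair would therefore block the outer conversion.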

\$\endgroup\$
4
\$\begingroup\$

Retina, 675 bytes

i`\[code](.*)\[/code]
==$1==
i`(?<!==.*)\[b](.*)\[/b](?!.*==)
<strong>$1</strong>
i`(?<!==.*)\[i](.*)\[/i](?!.*==)
<em>$1</em>
i`(?<!==.*)\[(.)](.*)\[/\1](?!.*==)
<$1>$2</$1>
i`(?<!==.*)\[color=(.*)](.*)\[/color](?!.*==)
<span style="color:$1">$2</span>
i`(?<!==.*)\[size=(.*)](.*)\[/size](?!.*==)
<span style="font-size:$1">$2</span>
i`(?<!==.*)\[url](.*)\[/url](?!.*==)
[url=$1]$1[/url]
i`(?<!==.*)\[url=(.*)](.*)\[/url](?!.*==)
<a href="$1">$2</a>
i`(?<!==.*)\[img](.*)\[/img](?!.*==)
<img src="$1">
i`(?<!==.*)\[quote=(.*)](.*)\[/quote](?!.*==)
[quote]<cite>$1</cite>$2[/quote]
i`(?<!==.*)\[quote](.*)\[/quote](?!.*==)
<blockquote>$1</blockquote>
==(.*)==
<code>$1</code>

Try it online!

-20 bytes: I discovered by accident that ] matches that character and it doesn't need to be escaped when it doesn't close a character class

+0 byte: fixing a typo, saved another ]

It's a big mess, but after the first 5 minutes I already wanted to be done writing regex.

Explanation

  • All expressions are case-insensitive.
  • First, I wrap the contents of every [code] tag in ==, which is guaranteed not to appear elsewhere.
  • In all subsequent expressions I use negative look(ahead|behind) to assert that the things I'm matching aren't enclosed in ==, which properly escapes whatever is in code tags (this is shorter than checking for the full [code] tag every time).
  • I replace b and i tags with their respective HTML.
  • I replace the remaining 1-letter tags with the same tag as HTML.
  • I replace color and size tags (an optimization here could allow first replacing size with font-size, then matching and replacing both, but I couldn't find a way where that's shorter).
  • I replace urls without an = with ones that have one, using the URL as the attribute.
  • I replace all urls properly.
  • I replace all images.
  • I replace quotes with an = with ones without, whose body starts with the literal <cite> tag.
  • I replace quote tags properly.
  • Finally, I replace my custom == markers with their contents in a <code> tag.
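
The "rewrite the short form into the long form first" trick used for url (and, in the other direction, for quote) is not specific to Retina. As a rough illustration only, here is the url half in ungolfed Python; the function name convert_urls is made up, and the sketch ignores the == protection for code blocks described above.

```python
import re

def convert_urls(text):
    # Step 1: give attribute-less url tags an explicit attribute,
    # so [url]x[/url] becomes [url=x]x[/url].
    text = re.sub(r"\[url\](.*?)\[/url\]", r"[url=\1]\1[/url]", text, flags=re.I)
    # Step 2: a single rule now covers every url tag.
    return re.sub(r"\[url=([^\]]*)\](.*?)\[/url\]", r'<a href="\1">\2</a>', text, flags=re.I)
```

The point of the two-step design is that only one expression has to know what the final HTML looks like; the first step is a pure BBCode-to-BBCode rewrite.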
\$\endgroup\$
12
  • \$\begingroup\$ I gotta learn this lang sometime... \$\endgroup\$ Commented 2 days ago
  • \$\begingroup\$ @Seggan to be quite honest it wasn't fun, it wasn't rewarding and the documentation sucked. I do not recommend it. \$\endgroup\$ Commented 2 days ago
  • \$\begingroup\$ there might be a typo for the closing <blockquote> as it doesn't have the / character \$\endgroup\$ Commented yesterday
  • \$\begingroup\$ @mastaH correct, thanks \$\endgroup\$ Commented yesterday
  • \$\begingroup\$ For a start, here's an obvious 19 byte saving: Try it online! (input removed due to comment length limitations). Using Retina 0.8.2 here just to prove you're not using Retina 1's power. \$\endgroup\$ Commented yesterday
2
\$\begingroup\$

JavaScript (ES12), 478 bytes

s=>(a=s.split(/(\[.+?\])/)).map(S=(s,i)=>i&1?([,C,t,,p]=/.(\/)?(\w+)(=(.+))?./.exec(s),n=`url|color|size|b|i|quote|img|u|s|code`.split`|`.indexOf(t=t.toLowerCase()),C)?!([j,p]=S[t]?.pop()||[],T=`|a href="0"||21:0"||2font-1:0"|strong||em||block1|block1><cite>0</cite|1 src="0"||u||s||1`.split`|`[n*2|!!p||1]?.replace(/\d/g,n=>[p||a[j+1],t,'span style="'][n]),c-=c&&n>8)*j&&T?a[a[j]=`<${T}>`,i]=n-6?`</${/\w+/.exec(T)}>`:a[j+1]='':0:(c+=n>8,S[t]||=[]).push([i,p]):0,c=0)&&a.join``

Attempt This Online!

Method

We split the input string on pseudo-legal BBCode tags, placing the tags at odd positions and the remaining parts at even positions. For instance:

[b][i]foo[/i][/b] → ["","[b]","","[i]","foo","[/i]","","[/b]",""]

We then iterate over the resulting array, modifying it in-place whenever a valid pair of opening and closing tags is found.

There is one stack per tag type in S, storing the position of each opening tag and its BBCode parameter, if any.

The counter c is used to keep track of the nesting depth of code blocks, allowing us to disable HTML conversion inside them.

We use two lookup tables:

  • one with 10 entries to identify the BBCode tags
  • one with 20 entries for the corresponding HTML tags, with and without a BBCode parameter
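
The split-then-pair idea carries over to other languages. Python's re.split also keeps the delimiters when the pattern has a capturing group, so the tags end up at odd indices there too. The following is only an ungolfed sketch of the pairing step with one stack per tag name; pair_tags and its return shape are made up for illustration, and it skips the code-block counter and the HTML lookup tables.

```python
import re

def pair_tags(text):
    """Split text on [...] candidates and pair opening/closing tags by name.

    Returns (parts, pairs), where pairs holds (open_index, close_index)
    positions into parts. Both names are hypothetical.
    """
    parts = re.split(r"(\[.+?\])", text)   # tags land on odd indices
    stacks = {}                            # one stack of opening positions per tag name
    pairs = []
    for i in range(1, len(parts), 2):
        m = re.fullmatch(r"\[(/?)(\w+)(?:=.+)?\]", parts[i])
        if not m:
            continue                       # malformed tag such as "[b ]": leave as text
        closing, name = m.group(1), m.group(2).lower()
        if closing:
            if stacks.get(name):
                pairs.append((stacks[name].pop(), i))
        else:
            stacks.setdefault(name, []).append(i)
    return parts, pairs
```

With the example above, pair_tags("[b][i]foo[/i][/b]") yields the same nine-element list and the pairs (3, 5) and (1, 7); an unclosed [b] simply stays on its stack and produces no pair.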

Commented

s =>
// split the input string on pseudo-legal BBCode tags '[…]'
(a = s.split(/(\[.+?\])/))
// for each part s at index i
.map(S = (s, i) =>
  i & 1 ?
    // if this is a tag
    (
      // C = 'closing tag' flag, t = tag name, p = optional parameter
      [, C, t,, p] = /.(\/)?(\w+)(=(.+))?./.exec(s),
      // force t to lowercase and get n = internal BBCode tag ID
      n =
      // 0   1     2    3 4 5     6   7 8 9
        `url|color|size|b|i|quote|img|u|s|code`
        .split`|`
        .indexOf(t = t.toLowerCase()),
      C
    ) ?
      // if this is a closing tag
      !(
        // attempt to retrieve j = position of the opening tag
        // and p = parameter of the opening tag
        [j, p] = S[t]?.pop() || [],
        // T = HTML tag determined by n and the presence of a parameter
        // (NB: an 'url' tag is always forced to entry #1)
        T =
        //  1           3      5          6       8
          `|a href="0"||21:0"||2font-1:0"|strong||em||` +
        // 10     11                   12         14 16 18
          `block1|block1><cite>0</cite|1 src="0"||u||s||1`
          .split`|`
          [n * 2 | !!p || 1]
          // unpack: 0 → p or a[j + 1], 1 → t, 2 → 'span style="'
          ?.replace(/\d/g, n => [ p || a[j + 1], t, 'span style="'][n]),
        // decrement c if it's greater than 0 and the tag is 'code'
        c -= c && n > 8
      ) * j && T ?
        // if c is 0 (we are not inside a code block) and both j and T
        // are defined, replace the opening tag with T
        a[a[j] = `<${T}>`, i] =
          n - 6 ?
            // if this is not an 'img' tag, update the closing tag
            // using the name extracted from T
            `</${/\w+/.exec(T)}>`
          :
            // otherwise, clear both a[j + 1] and the closing tag
            a[j + 1] = ''
      :
        // invalid tag: do nothing
        0
    :
      // opening tag: push [i, p] onto this tag's stack
      // and increment c if the tag is 'code'
      (c += n > 8, S[t] ||= []).push([i, p])
  :
    // this is not a tag: do nothing
    0,
  // c = code block counter, initialized to 0
  c = 0
)
// end of map(), return a[] joined
&& a.join``
\$\endgroup\$
1
\$\begingroup\$

Python 3.12, ̶1̶4̶9̶1̶ ̶ 1484 bytes

J=len
import re
def A(s):
 U='color';T='img';S='url';R='span';Q='strong';L='code';Z='blockquote';G=[];C='';I=F=0;K={};D=list(re.finditer('\\[(/?)(\\w+)(?:=([^\\]]+))?\\]',s,re.I))
 for(B,E)in enumerate(D):
  if E[1]:continue
  A=E[2].lower();H=B+1;O=A==L
  while H<J(D):
   if D[H][2].lower()==A:
    if A==L:O=O+1-2*bool(D[H][1])
    if(O<1)*(A==L)or(A!=L)*D[H][1]:K[B]=H;break
   H+=1
 while F<J(D):
  E=D[F];C+=s[I:E.start()];I=E.end();A=E[2].lower();M=E[3];P=E[1]
  if G and G[-1][0]==L:
   if P and A==L and F==K[G[-1][1]]:C+='</code>';G.pop()
   else:C+=E[0]
  elif P:
   if G and G[-1][0]==A:C+=f"</{dict(b=Q,i='em',quote=Z,url='a',color=R,size=R).get(A,A)}>";G.pop()
   else:C+=E[0]
  elif A in'biuscode':
   if F in K:C+=f"<{dict(b=Q,i='em').get(A,A)}>";G+=[(A,F)]
   else:C+=E[0]
  elif'quote'==A:
   if F in K:C+=f'<{Z}><cite>{M}</cite>'if M else f'<{Z}>';G+=[(A,F)]
   else:C+=E[0]
  elif A==S:
   if M:
    if F in K:C+=f'<a href="{M}">';G+=[(A,F)]
    else:C+=E[0]
   else:
    B=F+1
    while B<J(D)and not(D[B][1]and D[B][2].lower()==S):B+=1
    if B<J(D):N=s[I:D[B].start()];C+=f'<a href="{N}">{N}</a>';I=D[B].end();F=B
    else:C+=E[0]
  elif A==T:
   B=F+1
   while B<J(D)and not(D[B][1]and D[B][2].lower()==T):B+=1
   if B<J(D):N=s[I:D[B].start()];C+=f'<img src="{N}">';I=D[B].end();F=B
   else:C+=E[0]
  elif A in'colorsize':
   H=U if A==U else'font-size'
   if F in K:C+=f'<span style="{H}:{M}">';G+=[(A,F)]
   else:C+=E[0]
  else:C+=E[0]
  F+=1
 return C+s[I:]
\$\endgroup\$
4
  • 2
    \$\begingroup\$ Few trivial bytesaves possible, for example elif A=='quote' can be changed to elif'quote'==A, if A==L and O==0 or A!=L and D[H][1] can become D[H][1]and(A==L)*(O==0)*(A!=L), etc. \$\endgroup\$ Commented 2 days ago
  • \$\begingroup\$ @CrSb0001 Thanks! \$\endgroup\$ Commented 2 days ago
  • 2
    \$\begingroup\$ You might want to check this question for more general tips, maybe see if you can incorporate some of those in the answer \$\endgroup\$ Commented 2 days ago
  • \$\begingroup\$ @CrSb0001 Appreciate it! I'm pretty new to serious golfing \$\endgroup\$ Commented 2 days ago
0
\$\begingroup\$

C, 1493 bytes

#include<stdio.h>
#include<string.h>
#define P printf
char I[99999],*T[]={"b","i","u","s","code","url","img","color","size","quote"},A[999][256],t[256],a[256],*O[]={"strong","em","u","s","code"};int S[999],R[999],s,o[99999],c[99999],L,z,e,n,Y,x,d,i,k;int q(char*u,char*v){for(;*u&&*v;u++,v++)if((*u|32)!=(*v|32))return 0;return!*u&&!*v;}int y(char*m){for(k=0;k<10;k++)if(q(m,T[k]))return k;return-1;}int p(int r){int k=r+1;z=0;*a=0;if(I[k]==47)z=1,k++;n=0;while(I[k]&&I[k]-93&&I[k]-61&&I[k]-91)t[n++]=I[k++];t[n]=0;if(I[k]==61&&!z){k++;n=0;while(I[k]&&I[k]-93&&I[k]-91)a[n++]=I[k++];a[n]=0;}return I[k]==93?e=k+1:0;}int main(){L=fread(I,1,99999,stdin);for(i=0;i<L;i++)o[i]=c[i]=-1;for(i=0;i<L;i++)if(I[i]==91&&p(i)){Y=y(t);if(x){if(Y==4&&z&&!--d){x=0;if(s&&S[s-1]==4)s--,o[R[s]]=4,c[i]=4;}else if(Y==4&&!z)d++;continue;}if(~Y){if(z){if(s&&S[s-1]==Y)s--,o[R[s]]=Y,strcpy(A[R[s]],A[s]),c[i]=Y;}else{if((Y==7|Y==8)&&!*a)continue;S[s]=Y;strcpy(A[s],a);R[s++]=i;if(Y==4)x=1,d=1;}}}for(i=0;i<L;){if(~o[i]){Y=o[i];char*v=A[i];p(i);if(Y<5)P("<%s>",O[Y]);else if(Y<7){if(Y<6&&*v)P("<a href=\"%s\">",v);else{n=0;for(;e<L&&c[e]-Y;)t[n++]=I[e++];t[n]=0;Y<6?P("<a href=\"%s\">%s</a>",t,t):P("<img src=\"%s\">",t);p(e);i=e;goto N;}}else if(Y<9)P("<span style=\"%s:%s\">",Y<8?"color":"font-size",v);else{P("<blockquote>");if(*v)P("<cite>%s</cite>",v);}i=e;}else if(~c[i]){Y=c[i];p(i);Y<4?P("</%s>",O[Y]):Y<5?P("</code>"):Y<6?P("</a>"):Y<7?0:Y<9?P("</span>"):P("</blockquote>");i=e;}else putchar(I[i++]);N:;}}
\$\endgroup\$
