kelvinluck.com: Hacking TXP into submission

Hacking TXP into submission

So, as you will see from my previous two blog entries, under the misunderstanding that there was no plug-in to allow TXP to display nicely formatted code in-line I started work on a plug-in to do just that.

I’ve since found out that there was already a plug-in (glx_code) but as part of my learning experience with TXP I decided to push ahead and complete my plug-in (which uses the excellent GeSHi Generic Syntax Highlighter).

My problem last time was that I wanted my plug-in to allow you to place your code in-line within the tag like so:

<txp:krl_geshiSyntaxHighlight language="php">
function codeThatIsHighlighted() {}
</txp:krl_geshiSyntaxHighlight>

But even when I wrapped the code in code or notextile tags, the overzealous Textile engine still replaced certain character sequences with their Textile equivalents (so "__" on either side of something would make it italic). Some parts of the Textile engine did respect my notextile tags, though: the quotes were no longer being swapped out for their pretty HTML equivalents.
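A minimal sketch of the effect, not Textile’s actual code: the hypothetical naiveItalic() function below stands in for a span pass that knows nothing about tags and handles only the "__" marker. It shows how a PHP magic constant gets mangled even though it sits inside notextile tags.

```php
<?php
// Hypothetical, simplified stand-in for a tag-unaware span pass.
// It only understands the "__" (italic) marker.
function naiveItalic($text)
{
    // Anything wrapped in double underscores becomes italic,
    // regardless of which tags surround it.
    return preg_replace('/__(\S.*?\S|\S)__/', '<i>$1</i>', $text);
}

$article = '<notextile>echo __FILE__;</notextile>';
echo naiveItalic($article), "\n";
// prints: <notextile>echo <i>FILE</i>;</notextile>
```

The magic constant is gone from the saved HTML, so a highlighter that runs later never sees the code as it was typed.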

Thus started my trawl through the TXP source to figure out what was going on and how to stop it. My first mistake was presuming that the substitution occurred whenever the page was rendered, and so was something I could control from within my plug-in. It took a fair amount of digging to find out that the Textile engine actually does a pass over an article as you save it and inserts the parsed text into the Body_html column of the textpattern table.

Once I had this figured out, it became apparent that I would need to make Textile (more) aware of notextile tags. It did seem to pay some attention to them, and when I looked through the source (of textpattern/lib/classTextpattern.php) I found that the glyphs function was aware of notextile, pre, kbd and code tags and purposefully didn’t apply its transformations to them. However, the span function just applied its transformations regardless. So I borrowed some code from glyphs and modified it slightly to work in the span function.
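The technique borrowed from glyphs can be seen in isolation in the sketch below: split the text on tags, track whether we are inside a protected tag, and only transform the pieces outside. The function name applyOutsideTags() and its $transform callback are hypothetical and not part of Textile; strtoupper simply stands in for a Textile transformation.

```php
<?php
// Sketch of the "split on tags, track state" approach, outside Textile.
// applyOutsideTags() and $transform are illustrative names only.
function applyOutsideTags($text, $offtags, $transform)
{
    $inside = false;
    $out = array();
    // PREG_SPLIT_DELIM_CAPTURE keeps the tags themselves in the result,
    // so the pieces alternate between plain text and <...> tags.
    $pieces = preg_split('/(<.*>)/U', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($pieces as $piece) {
        if (preg_match('/^<(' . $offtags . ')>$/i', $piece)) {
            $inside = true;                 // opening protected tag
        } elseif (preg_match('/^<\/(' . $offtags . ')>$/i', $piece)) {
            $inside = false;                // closing protected tag
        } elseif (!$inside && !preg_match('/^<.*>$/', $piece)) {
            // plain text outside any protected tag: safe to transform
            $piece = call_user_func($transform, $piece);
        }
        $out[] = $piece;
    }
    return join('', $out);
}

echo applyOutsideTags(
    'shout <notextile>quiet <b>still quiet</b></notextile> shout',
    'notextile|pre|kbd|code',
    'strtoupper'
), "\n";
// prints: SHOUT <notextile>quiet <b>still quiet</b></notextile> SHOUT
```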

Here is my modified version of the span function:

PHP:
    function span($text)
    {
        $qtags = array('\*','\*\*','\?\?','-','__','_','%','\+','~');
        // KL 2005-01-26 - Borrowed some code from the glyphs function to make span aware of notextile, pre, kbd and code tags
        /*  if no html, do a simple search and replace... */
        if (!preg_match("/<.*>/", $text)) {
            foreach($qtags as $f) {
                $text = preg_replace_callback("/
                    (?<=^|\s|[[:punct:]]|[{([])
                    ($f)
                    ($this->c)
                    (?::(\S+))?
                    ([\w<&].*[\w])
                    ([[:punct:];]*)
                    $f
                    (?=[])}]|[[:punct:]]+|\s|$)
                /xmU", array(&$this, "fSpan"), $text);
            }
            return $text;
        }
        else {
            // codepre = we are in a code / pre / kbd tag - don't apply the span replacements
            // but do replace other htmlspecialchars
            // codepre2 = we are in notextile tags. That means NO textile. So leave everything
            // well alone...
            $codepre = $codepre2 = false;
            $span_out = array();
            $text = preg_split("/(<.*>)/U", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
            foreach($text as $line) {
                $offtags = ('code|pre|kbd');
                $offtags2 = ('notextile');

                /*  matches are off if we're between <code>, <pre> etc. */
                if (preg_match('/<(' . $offtags . ')>/i', $line)) $codepre = true;
                if (preg_match('/<(' . $offtags2 . ')>/i', $line)) $codepre2 = true;
                if (preg_match('/<\/(' . $offtags . ')>/i', $line)) $codepre = false;
                if (preg_match('/<\/(' . $offtags2 . ')>/i', $line)) $codepre2 = false;

                if (!$codepre && !$codepre2) {
                    foreach($qtags as $f) {
                        $line = preg_replace_callback("/
                            (?<=^|\s|[[:punct:]]|[{([])
                            ($f)
                            ($this->c)
                            (?::(\S+))?
                            ([\w<&].*[\w])
                            ([[:punct:];]*)
                            $f
                            (?=[])}]|[[:punct:]]+|\s|$)
                        /xmU", array(&$this, "fSpan"), $line);
                    }
                }
                /* do htmlspecial if between <code> */
                if ($codepre && !$codepre2) {
                    $line = htmlspecialchars($line, ENT_NOQUOTES, "UTF-8");
                    // the (?:...) group makes the optional slash apply to every tag, not just code
                    $line = preg_replace('/&lt;(\/?(?:' . $offtags . '))&gt;/', "<$1>", $line);
                }

                $span_out[] = $line;
            }
            return join('', $span_out);
        }
    }

I then noticed that quotes were still being encoded inside notextile tags when they shouldn’t have been. So I added the same hack to the glyphs function, giving notextile tags their own flag ($codepre2) as already illustrated in the span function above:

PHP:
    function glyphs($text)
    {
        // fix: hackish
        $text = preg_replace('/"\z/', "\" ", $text);
        $pnc = '[[:punct:]]';

        $glyph_search = array(
            '/([^\s[{(>_*])?\'(?(1)|(?=\s|s\b|'.$pnc.'))/',      //  single closing
            '/\'/',                                              //  single opening
            '/([^\s[{(>_*])?"(?(1)|(?=\s|'.$pnc.'))/',           //  double closing
            '/"/',                                               //  double opening
            '/\b( )?\.{3}/',                                     //  ellipsis
            '/\b([A-Z][A-Z0-9]{2,})\b(?:[(]([^)]*)[)])/',        //  3+ uppercase acronym
            '/\s?--\s?/',                                        //  em dash
            '/\s-\s/',                                           //  en dash
            '/(\d+) ?x ?(\d+)/',                                 //  dimension sign
            '/\b ?[([]TM[])]/i',                                 //  trademark
            '/\b ?[([]R[])]/i',                                  //  registered
            '/\b ?[([]C[])]/i');                                 //  copyright

        $glyph_replace = array('$1&#8217;$2',   //  single closing
            '&#8216;',                          //  single opening
            '$1&#8221;',                        //  double closing
            '&#8220;',                          //  double opening
            '$1&#8230;',                        //  ellipsis
            '<acronym title="$2">$1</acronym>', //  3+ uppercase acronym
            '&#8212;',                          //  em dash
            ' &#8211; ',                        //  en dash
            '$1&#215;$2',                       //  dimension sign
            '&#8482;',                          //  trademark
            '&#174;',                           //  registered
            '&#169;');                          //  copyright

        /*  if no html, do a simple search and replace... */
        if (!preg_match("/<.*>/", $text)) {
            $text = preg_replace($glyph_search, $glyph_replace, $text);
            return $text;
        }
        else {
            // codepre = we are in a code / pre / kbd tag - don't replace the things from $glyph_search
            // with their html alternatives but do replace other htmlspecialchars
            // codepre2 = we are in notextile tags. That means NO textile. So leave everything - including
            // the things from $glyph_search well alone...
            $codepre = $codepre2 = false;
            $glyph_out = array();
            $text = preg_split("/(<.*>)/U", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
            foreach($text as $line) {
                $offtags = ('code|pre|kbd');
                $offtags2 = ('notextile');

                /*  matches are off if we're between <code>, <pre> etc. */
                if (preg_match('/<(' . $offtags . ')>/i', $line)) $codepre = true;
                if (preg_match('/<(' . $offtags2 . ')>/i', $line)) $codepre2 = true;
                if (preg_match('/<\/(' . $offtags . ')>/i', $line)) $codepre = false;
                if (preg_match('/<\/(' . $offtags2 . ')>/i', $line)) $codepre2 = false;
                if (!preg_match("/<.*>/", $line) && !$codepre && !$codepre2) {
                    $line = preg_replace($glyph_search, $glyph_replace, $line);
                }

                /* do htmlspecial if between <code> */
                if ($codepre && !$codepre2) {
                    $line = htmlspecialchars($line, ENT_NOQUOTES, "UTF-8");
                    // the (?:...) group makes the optional slash apply to every tag, not just code
                    $line = preg_replace('/&lt;(\/?(?:' . $offtags . '))&gt;/', "<$1>", $line);
                }

                $glyph_out[] = $line;
            }
            return join('', $glyph_out);
        }
    }

[note: line numbers are after the above hack has been applied – you should be able to find the right function to replace anyway]
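The escaping step at the end of both functions can be tried on its own. The sketch below is an illustration under stated assumptions rather than Textile code: escapeCodePiece() is a hypothetical helper name and the sample pieces are made up. It shows that ENT_NOQUOTES leaves quote characters untouched and that the follow-up preg_replace restores the code, pre and kbd tags that the escaping also hit.

```php
<?php
// Hypothetical helper isolating the "do htmlspecial if between <code>" step.
function escapeCodePiece($line, $offtags)
{
    // ENT_NOQUOTES escapes &, < and > but leaves " and ' alone.
    $line = htmlspecialchars($line, ENT_NOQUOTES, 'UTF-8');
    // The tags were escaped too, so turn e.g. &lt;code&gt; back into <code>.
    return preg_replace('/&lt;(\/?(?:' . $offtags . '))&gt;/', '<$1>', $line);
}

$offtags = 'code|pre|kbd';
$pieces = array('<code>', 'if ($a < $b) echo "yes";', '</code>');
$out = '';
foreach ($pieces as $piece) {
    $out .= escapeCodePiece($piece, $offtags);
}
echo $out, "\n";
// prints: <code>if ($a &lt; $b) echo "yes";</code>
```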

With these hacks in place, my plug-in works more or less how I want it to. I am interested to find out whether people think these hacks will break other functionality or incur a performance hit, or whether there was a reason that Textile ignored notextile tags when replacing “span” style tags in the first place.

If you can answer those questions or suggest a better way to solve my problem, please leave a comment and let me know. I’d rather not be hacking the TXP core, especially with version 1 hopefully out soon. But it seems nice to be able to drop little highlighted code snippets into your blog without having to upload any files.

  1. Hi

    I’m trying to install your plugin and everything is OK except for the text above that says to make some changes to textpattern/lib/classTextpattern.php, because I don’t have this file in my installation.

    Do I have a bad install, or is the explanation out of date?

    Thanks in advance

    J
    — Javier    Apr 13, 17:18    #
  2. Hi

    I just finally got around to updating the copy of textpattern I am running on this blog and found that unfortunately the changes are still required. A bit of a surprising shame since I raised the bug about a year ago!

    But… I’ve just realised that I made a mistake in the original post – the changes should be made in textpattern/lib/classTextile.php NOT textpattern/lib/classTextpattern.php!

    Also, note that you only need to make these changes if you want to use the plugin to highlight source code that you embed inside notextile tags inside the krl_geshiSyntaxHighlight tag. If you are using the alternative syntax with a file and the file="filename" attribute, then these changes are unnecessary.


    Kelvin Luck    Apr 17, 18:49    #