Custom Markup

administrators (intermediate)

PmWiki's markup translation engine is driven by a set of rules; each rule searches for a specific pattern in the markup text and replaces it with some replacement text. Internally, this is accomplished with PHP's preg_replace() function.

Introduction

Rules are added to the translation engine via PmWiki's Markup() function, which looks like

Markup($name, $when, $pattern, $replace); # if no evaluation is needed, or if PHP < 5.5
Markup($name, $when, $pattern, $replace_function); # if evaluation is needed

# DEPRECATED, will not work as of PHP 7.2
Markup_e($name, $when, $pattern, $replace); # if evaluation is needed and 5.5<=PHP<=7.1

  • $name is a unique name (a string) given to the rule
  • $when says when the rule should be applied relative to other rules or phases, can be preceded by "<" or ">"
  • $pattern is the pattern to be searched for in the markup text
  • $replace is the text or HTML that should replace the matched pattern.
  • $replace_function is the name of the function which should be called with the match, and should return the replacement.

Simple markup

See Cookbook:Markup Directive Functions, a new helper function (as of PmWiki 2.3.11) that makes it easy to add custom markup directives.

Example

For example, here's the code that creates the rule for ''emphasized text'' (in scripts/stdmarkup.php):

Markup("em", "inline", "/''(.*?)''/", "<em>$1</em>");

Basically this statement says to create a rule called "em" to be performed with the other "inline" markups, and the rule replaces any text inside two pairs of single quotes with the same text ($1) surrounded by <em> and </em>.
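To see what the "em" rule does to a piece of text, the same pattern and replacement can be tried outside PmWiki with preg_replace(). This is only a standalone sketch: the sample text and variable names are made up for illustration, and PmWiki itself applies the rule through Markup().

```php
<?php
// Standalone illustration; not part of PmWiki.
$pattern = "/''(.*?)''/";      // same pattern as the "em" rule
$replace = '<em>$1</em>';      // $1 is the text captured by (.*?)

$text = "This is ''important'' and so is ''this''.";
$html = preg_replace($pattern, $replace, $text);

echo $html, "\n";
```

Every pair of doubled single quotes in the sample text is replaced, because preg_replace() substitutes all matches, not just the first.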

Sequence in which rules are applied

The first two parameters to Markup() are used to specify the sequence in which rules should be applied. The first parameter provides a name for a rule -- "em" in the example above. We could've chosen other names such as "''", or even "twosinglequotes". In general PmWiki uses the markup itself to name the rule (i.e., PmWiki uses "''" instead of "em"), but to keep this example easier to read later on we'll use a mnemonic name for now.

The second parameter says that this rule is to be done along with the other "inline" markups. PmWiki divides the translation process into a number of phases (or placeholder rules):

  _begin      start of translation
  {$var}      Page Text Variables happen here.
  fulltext    translations to be performed on the full text            
  split       conversion of the full markup text into lines to be processed
  directives  directive processing
  inline      inline markups
  links       conversion of links, url-links, and WikiWords     
  block       block markups
  style       style handling       
  _end        end of translation

This argument is normally specified as a left-angle bracket "<" ("before") or a right-angle bracket ">" ("after") followed by the name of another rule or the name of a phase.

Thus, specifying "inline" for the second parameter says that this rule should be applied when the other "inline" rules are being performed. If we want a rule to be performed with the directives -- i.e., before inline rules are processed, we would specify "directives" or "<inline" for the second parameter.
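As a rough sketch of how the second parameter places a rule, the following config.php fragment registers two rules at different points. The rule names, the (:hello:) directive and the ^^...^^ markup are hypothetical examples, and the fragment only runs inside PmWiki.

```php
# config.php fragment (hypothetical rules, for illustration only)

# Processed together with the other directives, i.e. before inline markup:
Markup('hello', 'directives', '/\\(:hello:\\)/', 'Hello, world');

# Processed just before the "inline" phase, so before the other inline rules:
Markup('smallcaps', '<inline', '/\\^\\^(.*?)\\^\\^/',
  '<span class="smallcaps">$1</span>');
```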

{$var} and (:if ...:) conditionals

A significant rule in terms of ordering is "{$var}" which substitutes variables -- if you say "<{$var}" then your markup will be processed before variables are substituted whereas if you say ">{$var}" then your markup will be processed after variables are substituted. This happens before conditional (:if...:) expressions, which is why page text variables are processed even if they are defined inside (:if false:).

Markup regular expression definition

The third parameter is a Perl-compatible regular expression. Basically, it is a slash, a regular expression, another slash, and a set of optional modifiers.

The example uses the pattern string "/''(.*?)''/", which uses ''(.*?)'' as the regular expression and no options. (The regular expression says "find two single quotes in succession, then as few arbitrary characters as are needed to make the match find something, then two additional single quotes in succession"; the parentheses "capture" a part of the wikitext for later use.)
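The "as few characters as needed" part matters as soon as a line contains more than one emphasized phrase. The following standalone sketch (sample text assumed for illustration) compares the pattern from the example with a greedy variant that uses .* instead of .*?.

```php
<?php
// Standalone illustration; not part of PmWiki.
$text = "''a'' and ''b''";

// Lazy quantifier, as in the "em" rule: stops at the first closing ''.
$lazy = preg_replace("/''(.*?)''/", '<em>$1</em>', $text);

// Greedy quantifier: runs on to the last '' on the line.
$greedy = preg_replace("/''(.*)''/", '<em>$1</em>', $text);

echo $lazy, "\n";    // <em>a</em> and <em>b</em>
echo $greedy, "\n";  // <em>a'' and ''b</em>
```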

Replacement text

The fourth parameter is the replacement text that should be inserted instead of the marked-up wikitext. You can use $1, $2, etc. to insert the text from the first, second etc. parenthesised part of the regular expression.

In the example, we have "<em>$1</em>", which is an <em>, the text matched by the first parentheses (i.e. by the .*? section of the pattern), and </em>.

Here's a rule for @@monospaced@@ text:

Markup("@@", "inline", "/@@(.*?)@@/", "<code>$1</code>");

and for a [:comment ...:] directive that is simply removed from the output:

Markup("comment", "directives", "/\\[:comment .*?:\\]/", '');

Okay, now how about the rule for '''strong emphasis'''? We have to be a bit careful here, because although this translation should be performed along with other inline markup, we also have to make sure that the rule for ''' is handled before the rule for '', because ''' also contains ''. The second parameter to Markup() can be used to specify the new rule's relationship to any other rule:

Markup("strong", "<em", "/'''(.*?)'''/", "<strong>$1</strong>");

This creates a rule called "strong", and the second parameter "<em" says to be sure that this rule is processed before the "em" rule we defined above. If we wanted to do something after the "em" rule, we would use ">em" instead. Thus, it's possible to add rules at any point in PmWiki's markup translation process in an extensible manner. (In fact, the "inline", "block", "directives", etc., phases above are just placeholder rules used to provide an overall sequence for other rules. Thus one can use "<inline" to specify rules that should be handled before any other inline rules.)
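Why the "strong" rule has to come first can be checked with a small standalone sketch that applies the two patterns in both orders. The helper apply_rules() is a hypothetical stand-in for PmWiki's rule table, not a PmWiki function.

```php
<?php
// Standalone illustration; apply_rules() is a made-up helper.
function apply_rules(array $rules, $text) {
  // Apply each pattern => replacement pair in array order.
  foreach ($rules as $pattern => $replace) {
    $text = preg_replace($pattern, $replace, $text);
  }
  return $text;
}

$strong = array("/'''(.*?)'''/" => '<strong>$1</strong>');
$em     = array("/''(.*?)''/"   => '<em>$1</em>');
$text   = "'''bold''' and ''italic''";

$right = apply_rules($strong + $em, $text);  // strong first, then em
$wrong = apply_rules($em + $strong, $text);  // em first, then strong

echo $right, "\n";  // <strong>bold</strong> and <em>italic</em>
echo $wrong, "\n";  // <em>'bold</em>' and <em>italic</em>
```

With the "em" pattern applied first, it consumes two of the three quotes on each side, so the "strong" pattern no longer finds anything to match.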

If you want to disable available markup just call e.g.:

DisableMarkup("strong");

PmWiki's default markup rules are defined in the scripts/stdmarkup.php file. To see the entire translation table as the program is running, the scripts/diag.php module adds "?action=ruleset", which displays the set of defined markup rules in the sequence in which they will be processed. You can see it at CustomMarkup?action=ruleset. You must first enable the action by setting $EnableDiag = 1 in your configuration file.

<:vspace> and <:block>

<:vspace> is inserted into the page text very early in page text processing to preserve an empty line (i.e. two newlines in a row). Very late in processing, HTML is inserted into the page output to preserve the empty lines. Unless your markup rule needs to detect empty lines itself, <:vspace> can be ignored.

<:block>
At the start of a line, <:block> means "start a block-level element", i.e. break out of the paragraphs.

Say you have these markups:

  • (:abc:) returns 'ABC'
  • (:def:) returns '<:block>DEF'

This wiki text:

 Some text
 (:abc:)
 some other text

will produce this HTML (simplified):

 
 <p>Some text
 ABC
 some other text</p>

While this wiki text:

 Some text
 (:def:)
 some other text

will produce this HTML (simplified):

 
 <p>Some text</p>
 DEF
 <p>some other text</p>

This is intended for a markup rule to return a block level element like <div>...</div> that is not allowed inside an HTML paragraph.
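A markup function that returns a block-level element might look like the following config.php fragment. It is a minimal sketch: the (:notice:) directive, the MyNotice function and the CSS class are hypothetical, and Keep() is used as described elsewhere on this page to protect the HTML from further markup processing.

```php
# config.php fragment (hypothetical directive, for illustration only)
Markup('notice', 'directives', '/\\(:notice:\\)/', 'MyNotice');

function MyNotice($m) {
  # The leading <:block> asks PmWiki to break out of the current paragraph
  # so the <div> does not end up nested inside a <p>.
  return '<:block>' . Keep("<div class='notice'>Please read the guidelines.</div>");
}
```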

Other common examples

Define a custom markup to produce a specific HTML or Javascript sequence

Suppose an admin wants to have a simple "(:example:)" markup that will always produce a fixed HTML string in the output, such as for a webring, Google AdSense display, or Javascript. The Markup() call to do this would be:

Markup('example', 'directives',
  '/\\(:example:\\)/',
  Keep("<div class='example'><p>Here is a 
    <a target='_blank' href='https://www.example.com'>link</a> to
    <em>example.com</em></p></div>") );
  • The first argument is a unique name for the markup ("example").
  • The second argument says to perform this markup along with other directives.
  • The third argument is the pattern to look for "(:example:)".
  • The fourth argument is the HTML that "(:example:)" is to be replaced with. We use the Keep() function here to prevent the output from being further processed by PmWiki's markup rules -- in the above example, we don't want the https://www.example.com url to be again converted to a link.

Define a markup to call a custom function that returns content

The /e modifier has been deprecated and should not be used in ongoing development. See below for more details.

For older PHP versions (< 7.2) an 'e' option on the $pattern parameter causes the $replace parameter to be treated as a PHP expression to be evaluated instead of replacement text. To avoid using the deprecated /e modifier, a markup to produce a random number between 1 and 100 might now look like:

Markup('random', 'directives',
  '/\\(:random:\\)/',
  "MyRandomFunction");
function MyRandomFunction() {
  return rand(1, 100);
}

This calls the PHP built-in rand() function and substitutes the directive with the result. Any function can be called, including functions defined in a local customization file or in Cookbook recipes.

Arguments can also be passed by using regular expression capturing parentheses, thus

Markup('randomargs', 'directives',
  '/\\(:random (\\d+) (\\d+):\\)/',
  "MyRandomArgsFunction");
# A distinct name is used because PHP cannot declare MyRandomFunction twice.
function MyRandomArgsFunction($m) {
  return rand($m[1], $m[2]);
}

will cause the markup (:random 50 100:) to generate a random number between 50 and 100.
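The shape of the $m argument can be explored outside PmWiki with preg_replace_callback(), which passes the array of matches to the callback in the same way. This is a standalone sketch; DemoRandom and the sample text are assumptions made for illustration.

```php
<?php
// Standalone illustration; DemoRandom is a hypothetical stand-in for the
// replacement function that would be registered with Markup().
function DemoRandom($m) {
  // $m[0] is the whole match; $m[1] and $m[2] are the captured digit strings.
  return (string) rand((int) $m[1], (int) $m[2]);
}

$text = 'Your number: (:random 50 100:)';
$out  = preg_replace_callback('/\\(:random (\\d+) (\\d+):\\)/', 'DemoRandom', $text);

echo $out, "\n";  // the directive is replaced by a number from 50 to 100
```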

Note: the /e modifier in regular expressions has been deprecated since PHP 5.5 and was removed in PHP 7.0. The reason is that malicious authors could pass strings that cause arbitrary and undesirable PHP functions to be executed.

For a PmWiki function to help with parsing arbitrary sequences of arguments and key=value pairs, see Cookbook:ParseArgs.

Migration to PHP 5.5 and Markup_e()

Since PHP version 5.5, the /e evaluation modifier is deprecated and some hosting providers don't allow its use.

Recent versions of the PmWiki core (2.2.58 and newer) allow new ways to define markup rules without being dependent on the /e eval modifier. The historical ways to define markup rules have not been removed and still work, but may be incompatible with PHP 5.5 installations.

Note: whether your replacement pattern needs evaluation or not, you must use Markup() and not Markup_e(). The latter is deprecated and should no longer be used for new recipes and customizations, and old recipes using Markup_e should be upgraded to the new format.

The examples below all require PmWiki 2.2.58 (2013-12-25) or newer but the latest version is recommended.

THE SHORT ANSWER: If your markup regular expression (the 3rd argument) contains an "e" after the closing slash (i.e., /regex/e, /regex/se, etc.) AND your 4th argument is entirely surrounded with double-quotes then you may be able to get away with following these simple steps:

  1. Delete the "e" from after the closing slash in the 3rd argument
  2. Create a new replacement function with $m as argument.
  3. In your function, the previous occurrences of '$1', '$2', etc. are now found as $m[1], $m[2], etc. You should no longer call PSS().
  4. In your function, call extract($GLOBALS['MarkupToHTML']); in order to get the current $pagename and $markupid.
  5. Your function needs to return the result of the markup processing, either HTML or further markup.
  6. Set the name of the replacement function as 4th argument of the Markup() call.

In some cases this will not suffice - it depends on how quoting was done - but in many cases following these simple steps will result in PHP 5.5+ compatibility.

If you try those steps and are still having problems then continue to read below for a deeper understanding.

The following is acceptable for PHP 5.5+ (compatible with PmWiki 2.2.58+, will also work in PHP 5.4 and older)

  • Markup($name, $when, $pattern, $replace);
    • $pattern can no longer have an "/e" modifier
    • $replace can be a function name (callback) which will be called with the array of matches as argument
    • instead of a string, the fourth parameter can be a definition of an anonymous function (anonymous functions can be used this way since PHP 5.3.0).
  • Markup_e($name, $when, $pattern, $replace); DEPRECATED, should no longer be used

Examples:

  • For PHP 5.4 and older, this was acceptable:
    Markup('randomargs', 'directives',
      '/\\(:random (\\d+) (\\d+):\\)/e',
      "rand('$1', '$2')"
      );
  • For PHP 5.5 and newer, $replace is a callback, we call Markup():
    Markup('randomargs', 'directives',
      '/\\(:random (\\d+) (\\d+):\\)/',
      "MyRandom"
      );
    function MyRandom($m) { # $m = matches
      return rand($m[1], $m[2]); # note "return" is used, unlike before
    }
    
    This will also work in PHP 5.4 and older

Other example:

  • PHP 5.4 or older:
    Markup('Maxi:','<links',
      "/\\b([Mm]axi:)([^\\s\"\\|\\[\\]]+)(\"([^\"]*)\")?/e",
      "Keep(LinkMaxi(\$pagename,'$1','$2','$4','$1$2'),'L')"
      );
    
  • PHP 5.5 or newer, PmWiki 2.2.58+, $replace is a function name:
    Markup('Maxi:','<links',
      "/\\b([Mm]axi:)([^\\s\"\\|\\[\\]]+)(\"([^\"]*)\")?/",
      "LinkMaxi"
      );
    function LinkMaxi($m) {
      extract($GLOBALS['MarkupToHTML']); # to get $pagename
      # do stuff with $m[1], $m[2], etc.
      return Keep($out, 'L');
    }
    
    This will also work in PHP 5.4 and older
  • $replace can also be a callback function, we call Markup():
    Markup('Maxi:','<links',
      "/\\b([Mm]axi:)([^\\s\"\\|\\[\\]]+)(\"([^\"]*)\")?/",
      "CallbackMaxi"
    );
    function CallbackMaxi($m) {
      extract($GLOBALS["MarkupToHTML"]); # to get $pagename
      return Keep(LinkMaxi($pagename,$m[1],$m[2],$m[4],$m[1].$m[2]),'L');
    }
    
    This will also work in PHP 5.4 and older

The above may seem complicated, but it is actually simpler to use your own callback function:

Markup('mykey', 'directives', 
  '/\\(:mydirective (.*?) (.*?):\\)/i',
  'MyFunction'
);
function MyFunction($m) {
  extract($GLOBALS["MarkupToHTML"]);

  # ... do stuff with $m (the matches), drop PSS() ...

  return $out; # or return Keep($html);
}

If you have any questions about the new way to define custom markup, you can ask us at the talk page or on the mailing lists.

FAQ

How can I embed JavaScript into a page's output?

There are several ways to do this. The Cookbook:JavaScript recipe describes a simple means for embedding static JavaScript into web pages using custom markup. For editing JavaScript directly in wiki pages (which can pose various security risks), see the JavaScript-Editable recipe. For JavaScript that is to appear in headers or footers of pages, the skin template can be modified directly, or <script> statements can be inserted using the $HTMLHeaderFmt array.

How would I create a markup ((:nodiscussion:)) that will set a page variable ({$HideDiscussion}) which can be used by (:if enabled HideDiscussion:) in .PageActions?

Add the following section of code to your config.php

SDV($HideDiscussion, 0); 	#define var name
Markup('hideDiscussion', '<{$var}',
 '/\\(:nodiscussion:\\)/', 'setHideDiscussion'); 
function setHideDiscussion() { 
  global $HideDiscussion; 
  $HideDiscussion = true;
} 

This will enable the (:if enabled HideDiscussion:) markup to be used. If you want to print the current value of {$HideDiscussion} (for testing purposes) on the page, you'll also need to add the line:
$FmtPV['$HideDiscussion'] = '$GLOBALS["HideDiscussion"]';

It appears that (.*?) does not match newlines in these functions, making the above example inoperable if the text to be wrapped in <em> contains new lines.

If you include the "s" modifier on the regular expression then the dot (.) will match newlines. Thus your regular expression will be "/STUFF(.*?)/s". That s at the very end is what you are looking for. If you start getting into multi-line regexes you may be forced to look at the m option as well: it lets the anchors (^ and $) match not only at the beginning/end of the string but also at the beginning/end of lines (i.e., right before/after a newline). Also make sure your markup is executed during the fulltext phase.
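The effect of both modifiers can be checked with a short standalone sketch; the sample strings are assumptions made for illustration.

```php
<?php
// Standalone illustration; not part of PmWiki.
$text = "''first line\nsecond line''";

// Without "s" the dot stops at the newline, so nothing matches.
$without = preg_replace("/''(.*?)''/", '<em>$1</em>', $text);

// With "s" the dot also matches the newline.
$with = preg_replace("/''(.*?)''/s", '<em>$1</em>', $text);

// "m" lets ^ match at the start of every line, not only at the start of the string.
$plain = preg_match_all('/^\\w+/', "one\ntwo");   // 1 match
$multi = preg_match_all('/^\\w+/m', "one\ntwo");  // 2 matches

echo $with, "\n";
```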

How can the text returned by my markup function be re-processed by the markup engine?

If the result of your markup contains more markup that should be processed, you have two options. First is to select a "when" argument that is processed earlier than the markup in your result. For example, if your markup may return [[links]], your "when" argument could be "<links" and your markup will be processed before the links markup. The second option is to call the PRR() function in your markup definition or inside your markup function. In this case, after your markup is processed, PmWiki will restart all markups from the beginning.
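The second option might be sketched as the following config.php fragment. The (:homelink:) directive, the MyHomeLink function and the link target are hypothetical. The rule is deliberately placed after the links phase, so the returned [[...]] text would not be converted without PRR().

```php
# config.php fragment (hypothetical directive, for illustration only)
Markup('homelink', '>links', '/\\(:homelink:\\)/', 'MyHomeLink');

function MyHomeLink($m) {
  # PRR() asks PmWiki to restart the markup rules, so the returned
  # [[...]] text is itself translated into a link.
  return PRR('[[Main/HomePage | Home]]');
}
```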

How do I get started writing recipes and creating my own custom markup?

(alternate) Introduction to custom markup for Beginners

How do I make a rule that runs once at the end of all other rule processing?

Use this statement instead of the usual Markup() call:

$MarkupFrameBase['posteval']['myfooter'] = "\$out = onetimerule(\$out);";
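The statement above assumes that a function named onetimerule() exists and that it takes the output and returns the modified output. A minimal sketch of such a function follows; its body is a hypothetical example.

```php
# config.php fragment (hypothetical function body, for illustration only)
function onetimerule($out) {
  # $out holds the output produced by all other rules;
  # whatever is returned here replaces it.
  return $out . "<p class='footer-note'>Processed by a custom rule.</p>";
}
```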

Category: Markup


This page may have a more recent version on pmwiki.org: PmWiki:CustomMarkup, and a talk page: PmWiki:CustomMarkup-Talk.