+ 1.1 Basic RE2

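Felix provides Google's RE2 engine for regular expressions. The basic syntax and capabilities are a subset of those of Perl-compatible regular expressions (PCRE), but RE2 is built on automata rather than backtracking, so matching time is linear in the length of the input. The price is that RE2 does not support backreferences.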

+ 1.1.1 Reference

+ 1.1.2 Compiling a regexp.

A regexp can be compiled with the RE2 function.

test/regress/rt/regexp_01.flx

  var r = RE2(" *([A-Za-z_][A-Za-z0-9]*).*");

+ 1.1.3 Simple Matching

Matching is done with the Match function:

test/regress/rt/regexp_01.flx

  var line = "Hello World";
  var maybe_subgroups = Match (r, line);

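Note that Match succeeds only if the regexp matches the entire string; there is no searching or partial matching. To match just part of a string, pad the pattern with wildcards, as the leading " *" and trailing ".*" in the regexp above do.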

+ 1.1.4 Checking Match results.

The best way to check the result of a Match is with a pattern match as follows:

test/regress/rt/regexp_01.flx

  match maybe_subgroups with
  | #None => println$ "No match";
  | Some a =>
    println$ "Matched " + a.1;
  endmatch;

test/regress/rt/regexp_01.expect

Matched Hello
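
To make the complete-match rule concrete, here is a minimal sketch that uses only the RE2 and Match calls shown above. The names word_only and word_then_rest are hypothetical and do not come from the test file. The first pattern covers only part of the line, so Match is expected to return None; the second adds a trailing wildcard so the whole line is consumed.

```felix
// Sketch only: word_only and word_then_rest are illustrative names.
var word_only = RE2("[A-Za-z]+");           // no wildcards around the word
var word_then_rest = RE2("([A-Za-z]+).*");  // trailing .* absorbs the rest

// The pattern covers "Hello" but not " World", so this should not match.
match Match (word_only, "Hello World") with
| #None => println$ "No match";
| Some a => println$ "Matched " + a.0;
endmatch;

// The pattern now covers the whole line; group 1 is the leading word.
match Match (word_then_rest, "Hello World") with
| #None => println$ "No match";
| Some a => println$ "Matched " + a.1;
endmatch;
```

Assuming Match behaves as described, the first match takes the None branch and the second reports the first capture group.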

+ 1.1.5 Streamable matching

You may want to match more than one instance of a pattern in a string. For example, you may want to capture each word in a line of text. This can be done by iterating over a (regexp, string) pair, as follows:

test/regress/rt/regexp_01.flx

  var r2 = RE2("\w+"); // try to match a word
  var sentence = "Hello World";
  for x in (r2, sentence) do
      println$ x.0;
  done

test/regress/rt/regexp_01.expect

Hello
World

With Match you get at most one result for the whole string, but the for loop yields every match in turn. In the loop body, x.0 is the text of the current match.

+ 1.1.6 Supported Syntax.

+ 1.2 Regular definitions.

Regular expressions written as strings are quoting hell. Luckily, Felix provides a solution, regular definitions:

test/regress/rt/regexp_01.flx

  begin
    regdef lower = charset "abcdefghijklmnopqrstuvwxyz";
    regdef upper = charset "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    regdef digit = charset "0123456789";
    regdef alpha = upper | lower;
    regdef cid0 = alpha | "_";
    regdef cid1 = cid0 | digit;
    regdef cid = cid0 cid1 *;
    regdef space = " ";
    regdef white = space +;
    regdef integer = digit+;

These are some basic definitions. Note that regdef introduces a new syntax corresponding to the notation usually used for regular expressions.

This is called a DSSL, or Domain Specific Sub-Language. It's not a DSL, because that would be a completely new language; rather, the "sub" suggests it's an extension of normal Felix. The extension is written entirely in user space.

Now to use these definitions:

test/regress/rt/regexp_01.flx

  // match an assignment statement
    regdef sassign =
      white? "var" white?
      group (cid) white? "=" white?
      (group (cid) | group (integer))
      white? ";" white?
    ;
  
    var rstr : string = sassign.Regdef::render;
    var ra = RE2 rstr;
    var result = Match (ra, " var a = b; ");
    match result with
      | #None =>
        println$ "No match?";
  
      | Some groups =>
        if groups.2 != "" do
          println$ "Assigned " + groups.1 + " from variable " + groups.2;
        else
          println$ "Assigned " + groups.1 + " from integer " + groups.3;
        done;
    endmatch;
  end

test/regress/rt/regexp_01.expect

Assigned a from variable b

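Note that a regdef is not itself a string: it must be converted to a Perl-style regexp in string form with the render function (Regdef::render above) before it can be passed to RE2. The groups are numbered from 1 in order of appearance, so groups.2 is the cid alternative and groups.3 is the integer alternative; the code tells them apart by checking whether groups.2 is empty.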
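
The render step can be tried on its own with a smaller definition. The following is a sketch only, built from the constructs used above (regdef, charset, group, Regdef::render, RE2 and Match). The names hexdigit, hexdigits, hexnum, hex_str and hex_re are hypothetical. Printing the rendered string shows the regexp text that RE2 actually receives; its exact form depends on the renderer, so no output is listed here.

```felix
// Sketch only: all names here are illustrative, not from the test file.
begin
  regdef hexdigit = charset "0123456789abcdef";
  regdef hexdigits = hexdigit+;
  regdef hexnum = "0x" group (hexdigits);

  // Render the regdef to a regexp string, then compile that string.
  var hex_str : string = hexnum.Regdef::render;
  println$ hex_str;  // the generated regexp text
  var hex_re = RE2 hex_str;

  // Group 1 should hold the digits after the "0x" prefix.
  match Match (hex_re, "0x1f") with
  | #None => println$ "No match";
  | Some g => println$ "Digits: " + g.1;
  endmatch;
end
```

Keeping the rendered string in its own variable, as the original example does with rstr, makes it easy to inspect what a regdef expands to when a match does not behave as expected.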