Reference

Core API

The basic structure of recognition in JLpeg is a Pattern.

JLpeg.Pattern (Type)

Container for various patterns and grammars. Always has val, which may be primitive or a Vector of patterns, and code, a Vector of Instructions. Some patterns have fields unique to that type of pattern. A pattern which encloses other patterns will have an aux field containing a Dict for metadata.


These are created in a wide variety of ways, all based on three functions, P, S, and R, with the rest of the work done with a great quantity of operator overloading.

JLpeg.PegMatch (Type)
PegMatch <: AbstractMatch

A type representing a successful match of a Pattern on a string. Typically returned from the match function. PegFail is the atypical return value.

A PegMatch is equal (==) to a Vector if the captures are equal to the Vector.

For details of how to make use of a PegMatch, see the section "Working With Matched Data" in the documentation.

Properties:

  • subject::AbstractString: Stores the string matched.
  • full::Bool: Whether the match is of the entire string.
  • captures::PegCapture: Contains any captures from matching patt to subject. This can in principle contain anything, as captures may call functions, in which case the return value of that function becomes the capture. For more information, consult the JLpeg documentation, and the docstrings for C, Cg, Cc, and A.
  • offsets::Vector{Int}: Provided for compatibility with AbstractMatch. SubStrings contain their own offsets, so this is unnecessary for normal work, but we generate them when match.offsets is used. It then consists of the indices at which the outer layer of captures may be found within the subject.
  • patt::Pattern: The pattern matched against the subject.
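
A minimal sketch of reading these properties, assuming `using JLpeg` brings `P` into scope as in the examples in this reference. The variable name `m` is illustrative only.

```julia
using JLpeg

# Match the literal "func" against a longer subject.
m = match(P("func"), "funci")

m.subject       # the subject string passed to match
m.full          # whether the entire subject was matched
m.captures      # the PegCapture holding the captures
m == ["func"]   # a PegMatch compares equal to a Vector of its captures
```

The comparison on the last line is the most convenient way to check a match in tests, since it avoids unpacking the captures by hand.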

JLpeg.PegCapture (Type)
PegCapture <: AbstractVector

The captures (including group captures) of a PegMatch. Indexes and iterates in the same fashion.

JLpeg.PegFail (Type)
PegFail

Returned on a failure to match(patt::Pattern, subject::AbstractString).

Properties

  • subject::AbstractString: The string that we failed to match on.
  • errpos: The position at which the pattern ultimately failed to match.
  • label::Symbol: Info about the failure provided by T(::Symbol), defaulting to :default if the pattern fails but not at a throw point, or if no throws are provided.

Pattern Matching

Base.match (Function)
match(patt::Pattern, subject::AbstractString)::Union{PegMatch, PegFail}

Match patt to subject, returning a PegMatch implementing the expected interface for its supertype AbstractMatch, or a PegFail with useful information about the failure.
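
Because match returns either a PegMatch or a PegFail, callers usually branch on the type of the result. A minimal sketch follows; `describe` is a hypothetical helper, not part of JLpeg, and the type is written with the `JLpeg.` prefix so that nothing is assumed about which names are exported.

```julia
using JLpeg

# `describe` is a hypothetical helper used only for illustration.
function describe(patt, subject)
    result = match(patt, subject)
    if result isa JLpeg.PegFail
        # errpos and label are the PegFail properties documented above.
        return "failed at position $(result.errpos) with label :$(result.label)"
    end
    return "matched, full = $(result.full)"
end

describe(P("func"), "funci")   # takes the PegMatch branch
describe(P(false), "fail")     # takes the PegFail branch
```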

JLpeg.compile! (Function)
compile!(patt::Pattern)::Pattern

Compile a Pattern. Calling this isn't necessary, since match will compile a Pattern when needed, but it is a useful thing to do during precompilation.

This translates the Pattern to Instruction codes, appending them to the code field, and returns the compiled Pattern. Performs various optimizations in the process.
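
As a sketch of the precompilation use mentioned above, a package might compile its patterns once at load time. The name `keyword` is illustrative (in a module it would typically be declared `const`), and `compile!` is written with the `JLpeg.` prefix so the sketch does not depend on it being exported.

```julia
using JLpeg

# Compile up front, so the first call to match does not pay for compilation.
keyword = JLpeg.compile!(P("func"))

match(keyword, "funci")
```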


Constructing Patterns

Patterns are built by combining the products of these constructors into more complex Patterns.

JLpeg.P (Function)
P(p::Union{AbstractString,AbstractChar,Integer,Bool,Symbol})::Pattern

Create a Pattern.

  • If p is a String or Char, this matches that string or character.
  • If p is a positive Integer, it matches that many characters.
  • If p is true, the rule succeeds; if false, the rule fails.
  • If p is a Symbol, this represents a call to the rule with that name.
  • If p is a negative Integer, it matches if that many characters remain, and consumes no input.
  • If p is already a Pattern, it is simply returned.

Examples

julia> match(P("func"), "funci")
PegMatch(["func"])

julia> match(P(3), "three")
PegMatch(["thr"])

julia> match(P(true), "pass")
PegMatch([""])

julia> match(P(false), "fail")
PegFail("fail", 1)

julia> match(P('👍'), "👍")
PegMatch(["👍"])

JLpeg.@P_str (Macro)
P"str"

Call P(str) on the String, in close imitation of Lua's calling convention.

JLpeg.S (Method)
S(s::AbstractString)

Create a Pattern matching any character in the string.

JLpeg.R (Method)
R(s::AbstractString)

Create a Pattern matching any character in the range from the first to the second character of s. s must be two codepoints long, and the first must be lower-valued than the second. Both must be valid UTF-8.

JLpeg.R (Method)
R(a::AbstractChar, b::AbstractChar)

Match any character in the range a to b, inclusive.
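
A short sketch of the set and range constructors side by side. The variable names are illustrative, and the result types are written with the `JLpeg.` prefix so that nothing is assumed about exports.

```julia
using JLpeg

signs = S("+-")        # any one character from the string: '+' or '-'
digit = R("09")        # any character from '0' through '9'
lower = R('a', 'z')    # the same idea, given as two characters

match(signs, "-") isa JLpeg.PegMatch
match(digit, "7") isa JLpeg.PegMatch
match(lower, "q") isa JLpeg.PegMatch
match(digit, "x") isa JLpeg.PegFail    # 'x' is outside the range
```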

JLpeg.B (Function)
B(p::Union{Pattern,AbstractString,AbstractChar,Integer})

Match p behind the current subject index. p must be of fixed length. The most useful B patterns are !B(1), which succeeds at the beginning of the string, and B('\n')|!B(1) to match the start of a line.

JLpeg.U8 (Function)
U8(val::Integer)

Matches a single byte, whether or not that byte is a valid part of a UTF-8 string.


JLpeg.Combinators

JLpeg.Combinators (Module)
JLpeg.Combinators

Contains all the combinator operators which shadow definitions in Base. The module redefines these symbols, providing a fallback to Base, and is designed not to break existing code. The Julia compiler handles this kind of redirection well during inference, and several packages which specialize operators for numerically-demanding tasks use this technique without issue; we even transfer the Base docstrings.

Whether or not these overloads rise to the level of piracy is debatable. That said, we've walled them off, so that debate is not necessary.

Use

The @grammar and @rule macros are able to use these operators, so you may not need them. To import them, add this to your module:

import JLpeg.Combinators: *, -, %, |, ^, ~, !, >>, inv
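
With the operators imported, patterns compose outside the macros as well. A minimal sketch, assuming that * is sequence, | is ordered choice, and ^1 is one-or-more repetition, as they are used in the @grammar examples in this reference. The name `number` is illustrative. The import needs to run before the operators are first used in the enclosing module.

```julia
using JLpeg
import JLpeg.Combinators: *, |, ^

# One or more digits, optionally followed by a '.' and more digits.
# P(true) always succeeds, which makes the fractional part optional.
number = R("09")^1 * (P(".") * R("09")^1 | P(true))

match(number, "42")
match(number, "3.14")
```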

Captures and Actions

PEG patterns are recognizers, matching the longest prefix of the string that they can. Captures allow substrings to be captured within the match; Actions perform actions at the index of the match, or on a capture.

JLpeg.C (Function)
C(patt::Pattern)

Create a capture. Matching patt will return the matched substring. The sugared form is (patt,).

C(patt::Pattern, sym::Union{Symbol,AbstractString})

Create a named capture with key :sym or "sym". Sugared form (patt, :sym).
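
A minimal sketch of plain captures, written with the unsugared C form and the operators from JLpeg.Combinators. The name `pair` and the key=value subject are illustrative assumptions.

```julia
using JLpeg
import JLpeg.Combinators: *, ^

# Capture the letters before '=' and the digits after it.
# The '=' itself is matched but not captured.
pair = C(R("az")^1) * P("=") * C(R("09")^1)

m = match(pair, "width=42")
m == ["width", "42"]
```

Only the two captured substrings appear in the result; the text matched outside of C is consumed without being recorded.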

JLpeg.Cg (Function)
Cg(patt::Pattern [, sym::Union{Symbol,AbstractString}])

Create a group capture, which groups all captures from patt into a vector inside the PegMatch object. If sym is provided, the group will be found at that key. Sugared as [patt] or [patt, :sym].

JLpeg.Cr (Function)
Cr(patt::Pattern, sym::Union{CapSym, Nothing})

Captures a UnitRange of matches in patt, optionally keyed by sym. Convenient for substitutions and annotations.

JLpeg.Cc (Function)
Cc(args...)

Constant capture. Matches the empty string and puts the values of args as a tuple in that place within the PegMatch captures. If args is a Pair{Symbol,Any}, that symbol will appear as a key in the match.

JLpeg.Cp (Function)
Cp()

Captures the empty string in .captures, consuming no input. Useful for the side effect of storing the corresponding offset.

JLpeg.M (Function)
M(patt::Pattern, sym::Symbol)

Mark the match of a pattern for later examination with K.

See also CM.

JLpeg.CM (Function)
CM(patt::Pattern, sym::Symbol)

Mark the match of patt with sym, while also capturing it with sym.

See M and C for details.

JLpeg.K (Function)
K(patt::Pattern, sym::Symbol, [check::Union{Function,Symbol}])

ChecK the pattern against the previous Mark with the same tag. If check is not provided, it will return true if the SubStrings of the mark and check are identical, otherwise it must be either a symbol representing one of the builtins or a function with the signature (marked::SubString, checked::SubString)::Bool. The success or failure of the check is the success or failure of the pattern. If patt doesn't match, the check will not be performed. The check will always fail if the corresponding mark is not present, except for the builtin :always, which always succeeds if patt succeeds.

K may also be written in do syntax, K(patt, :tag) do s1, s2; ... end.

See also CK
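
A sketch of a mark and check pair, using the default check (identical SubStrings) to require that a closing tag repeat the opening tag's name. The tag syntax and the names `opening`, `closing`, and `element` are illustrative assumptions, and the operators come from JLpeg.Combinators.

```julia
using JLpeg
import JLpeg.Combinators: *, ^

# Mark the tag name when the element opens...
opening = P("<") * M(R("az")^1, :tag) * P(">")
# ...and check the closing name against that mark.
closing = P("</") * K(R("az")^1, :tag) * P(">")
element = opening * closing

match(element, "<b></b>") isa JLpeg.PegMatch   # names agree
match(element, "<b></i>") isa JLpeg.PegFail    # names differ, so the check fails
```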

JLpeg.CK (Function)
CK(patt::Pattern, sym::Symbol, check::Union{Function,Symbol})

ChecK the pattern against the prior Mark, capturing if the check succeeds. See K and C for details.

JLpeg.A (Function)
A(patt::Pattern, λ::Function)

Acts as a grouping capture for patt, applying λ to a successful match with the captures as arguments (not as a single Vector). If patt contains no captures, the capture is the SubString. The return value of λ becomes the capture; if nothing is returned, the capture (and its offset) are deleted. May be invoked as patt <| λ.
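
A minimal sketch of an action that converts matched digits to an Int. Here `toint` is a hypothetical helper and `number` an illustrative name. Since the pattern contains no captures, the description above implies that λ receives the matched SubString.

```julia
using JLpeg
import JLpeg.Combinators: ^

# Hypothetical helper: turn the matched text into an integer.
toint(s) = parse(Int, s)

number = A(R("09")^1, toint)

m = match(number, "123abc")
m == [123]   # the capture is the return value of toint, not the SubString
```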

JLpeg.Q (Function)
Q(patt::Pattern, λ::Function)

A Query action. λ will receive the SubString matched by patt, and is expected to return a Bool representing the success or failure of the pattern. This happens during the parse, and may be used, for example, to maintain a typedef symbol table when parsing C, with appropriate closed-over state variables.
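
A smaller sketch than the typedef case: a query that accepts a run of digits only when its value fits in a byte. `isbyte` is a hypothetical predicate and `byte` an illustrative name.

```julia
using JLpeg
import JLpeg.Combinators: ^

# Hypothetical predicate: true when the digits denote a value below 256.
isbyte(s) = parse(Int, s) < 256

byte = Q(R("09")^1, isbyte)

match(byte, "255") isa JLpeg.PegMatch   # the query returns true
match(byte, "256") isa JLpeg.PegFail    # the query returns false, so the pattern fails
```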

JLpeg.Avm! (Function)
Avm!(patt::Pattern, λ::Function)

A vm Action. If patt succeeds, λ will be called on the entire VMState, and is expected to return true or false to describe the success or failure of the entire pattern. This is an ultimate escape hatch, intended for purposes such as debugging, or imposing user limits on stack depth. The most obvious use is early parse termination, by setting vm.running to false.

While it may be abused to do all sorts of surprising hacks, if you think you need it in order to parse something, you're probably wrong about that.

JLpeg.T (Function)
T(label::Symbol)

Throw a failure labeled with :label. If a rule :label exists, it will be called to attempt recovery, and the success or failure of the recovery rule is then used. Otherwise, :label and the position of T(:label) will be attached to PegFail in the event that the whole match fails at that point in the string.
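
A sketch of a labeled failure with no recovery rule, so the label is expected to surface on the PegFail. The label :unclosed and the name `parens` are illustrative assumptions.

```julia
using JLpeg
import JLpeg.Combinators: *, |

# After an opening parenthesis, either the closing one is found
# or a failure labeled :unclosed is thrown.
parens = P("(") * (P(")") | T(:unclosed))

ok   = match(parens, "()")
fail = match(parens, "(")

fail.label   # the label attached by T
```

Labels like this make it possible to report why a parse failed rather than only where.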


Rules and Grammars

The great advantage (other than composability) PEGs have over regular expressions is the ability to match recursive patterns. These are constructed out of Rules and Grammars.

JLpeg.Rule (Type)
Rule(name::Symbol, patt::Pattern)

Sugar-free form of @rule, creates a rule from patt, assigning it the name name.

JLpeg.GrammarMacros.@rule (Macro)
@rule :name ← pattern...

Sugared form for rule definition. Assigns the rule in-scope with the given name:

# Wrong
name = @rule :name  ←  "foo" | "bar"
# Right
@rule :name ← "foo" | "bar"

In terms of scope and variable escaping, @rule functions identically to @grammar.

JLpeg.GrammarMacros.@grammar (Macro)
@grammar(name, rules)

Syntax sugar for defining a set of rules as a single grammar. Expects a block, rules, whose entries are rule-pairs, as can be created with ← or <--. "string" will be interpolated as P("string"), :symbol as P(:symbol), and an integer n as P(n).

Any variable name exported by JLpeg will refer to the same value as the export, while any other variable is escaped, and will have the meaning it has in the scope where @grammar is called.

Example use

This simple grammar captures the first string of numbers it finds:

julia> @grammar capnums begin
           :nums  ←  (:num,) | 1 * :nums
           :num   ←  R"09"^1
       end;

julia> match(capnums, "abc123abc123")
PegMatch(["123"])

This one also captures the lowercase letters, converting them to uppercase.

julia> upper = uppercase;  # A thoroughly unhygienic macro

julia> @grammar uppernums begin
           :nums  ←  (:num,) | :abc * :nums
           :num   ←  R"09"^1
           :abc   ←  R"az"^1 <| upper
       end;

julia> match(uppernums, "abc123abc123")
PegMatch(["ABC", "123"])

More extensive examples may be found in the documentation.


Dialects

A work in progress.

JLpeg.re (Constant)
re : An Interpretation of LPeg's `re` module

The first dialect, intended, among other things, as a useful bootstrap of other dialects.


Generators

A PEG is a specification of a class of algorithms which are valid on a universe of strings. While the common thing to do is use this specification to construct a recognizer for such strings, it may also be used to create generators for valid strings in that universe.

JLpeg aspires to provide a complete set of generators for our patterns. So far, we have:

JLpeg.generate (Function)
generate(set::PSet)::String

Generate a String of all characters matched by a Set.
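
A minimal sketch, assuming that S returns a PSet (this reference does not state the concrete type) and calling generate with the `JLpeg.` prefix so that it need not be exported. The name `chars` is illustrative.

```julia
using JLpeg

# Assumption: S("abc") yields a PSet, the type that generate accepts.
chars = JLpeg.generate(S("abc"))

chars isa String   # a String of the characters the set matches
```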


Benchmarking

JLpeg.matchreport (Function)
matchreport(patt::Pattern, subject::AbstractString)

Matches patt against subject using an instrumented VM. This keeps a running count of instructions, amount of backtracking, and the number of times any given byte in the string is visited.

To state the obvious, this is not intended to be fast at all; it's provided as a tool for diagnosing performance bottlenecks in a pattern.

JLpeg.PegReport (Type)
PegReport

Returned from a call to matchreport. Contains statistics gathered from the match run. Included:

  • matched::Bool: Did the pattern match?
  • heatmap::Vector: Number of instructions executed at each byte of the subject.
  • backtracks::Int: Count of times the pattern backtracked.
  • max::Int: Index of the farthest match.
  • advances::Int: The number of bytes advanced. For a linear pattern, sizeof(subject).
  • count::Int: Total number of instructions executed.
  • capcount::Int: Number of frames on the capture stack.
  • subject: The string the pattern was matched against.

Printing this in the REPL will show a digest of these values, with a logarithmic heatmap of the subject string, as a visual indicator of where the pattern spends its time. Reports may also be saved and compared (manually), or added to your test suite, to detect performance regressions when a grammar is changed. A future release will use StyledStrings, which will allow the colors to easily be customized.
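
A minimal sketch of gathering a report and reading a few of the documented fields. `matchreport` is called with the `JLpeg.` prefix so the sketch does not depend on it being exported, and the name `report` is illustrative.

```julia
using JLpeg

report = JLpeg.matchreport(P("func"), "funci")

report.matched      # whether the pattern matched
report.count        # total number of instructions executed
report.backtracks   # how many times the pattern backtracked
```

Values such as count and backtracks could be compared against a stored baseline in a test suite, in the spirit of the regression checks described above.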
