Reference
Core API
The basic structure of recognition in JLPeg is a Pattern.
JLpeg.Pattern — Type
Container for various patterns and grammars. Always has val, which may be primitive or a Vector of patterns, and code, a Vector of Instructions. Some patterns have fields unique to that type of pattern. A pattern which encloses other patterns will have an aux field containing a Dict for metadata.
These are created in a wide variety of ways, all based on three functions, P, S, and R, with the rest of the work done with a great quantity of operator overloading.
JLpeg.PegMatch — Type
PegMatch <: AbstractMatch
A type representing a successful match of a Pattern on a string. Typically returned from the match function; when the match fails, a PegFail is returned instead.
A PegMatch is equal (==) to a Vector if the captures are equal to the Vector.
For details of how to make use of a PegMatch, see the section "Working With Matched Data" in the documentation.
Properties:
- subject::AbstractString: Stores the string matched.
- full::Bool: Whether the match is of the entire string.
- captures::PegCapture: Contains any captures from matching patt to subject. This can in principle contain anything, as captures may call functions, in which case the return value of that function becomes the capture. For more information, consult the JLPeg documentation, and the docstrings for C, Cg, Cc, and A.
- offsets::Vector{Int}: Provided for compatibility with AbstractMatch. SubStrings contain their own offsets, so this is unnecessary for normal work, but we generate them when match.offsets is used. It then consists of the indices at which the outer layer of captures may be found within the subject.
- patt::Pattern: The pattern matched against the subject.
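As a minimal sketch of reading these properties, the snippet below reuses the P("func") example from this page. The type name is qualified with the module name so the sketch does not assume it is exported.

```julia
using JLpeg

m = match(P("func"), "funci")

if m isa JLpeg.PegMatch
    # `full` is expected to be false here, since the final "i" is not consumed.
    println("matched whole subject? ", m.full)
    # The captures iterate like a Vector.
    for cap in m.captures
        println("capture: ", cap)
    end
    # A PegMatch compares equal to a Vector holding the same captures.
    println(m == ["func"])
end
```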
JLpeg.PegCapture — Type
PegCapture <: AbstractVector
The captures (including group captures) of a PegMatch. Indexes and iterates in the same fashion.
JLpeg.PegFail — Type
PegFail
Returned on a failure to match(patt::Pattern, subject::AbstractString).
Properties
- subject::AbstractString: The string that we failed to match on.
- errpos: The position at which the pattern ultimately failed to match.
- label::Symbol: Info about the failure provided by T(::Symbol), defaulting to :default if the pattern fails but not at a throw point, or if no throws are provided.
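Because match returns either a PegMatch or a PegFail, calling code usually branches on the result type. The sketch below shows one way to do that; describe is a hypothetical helper written for illustration, not part of JLpeg, and the type name is qualified so no export is assumed.

```julia
using JLpeg

# `describe` is a hypothetical helper, not a JLpeg function.
function describe(result)
    if result isa JLpeg.PegFail
        # errpos and label are the PegFail properties listed above.
        return "failed at position $(result.errpos) (label: $(result.label))"
    else
        return "matched: $(collect(result.captures))"
    end
end

println(describe(match(P("func"), "funci")))
println(describe(match(P(false), "fail")))
```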
JLpeg.PegError — Type
PegError(msg) <: Exception
An error while constructing a Pattern.
Pattern Matching
Base.match — Function
match(patt::Pattern, subject::AbstractString)::Union{PegMatch, PegFail}
Match patt to subject, returning a PegMatch implementing the expected interface for its supertype AbstractMatch, or a PegFail with useful information about the failure.
JLpeg.compile! — Function
compile!(patt::Pattern)::Pattern
Compile a Pattern. It isn't necessary to call this, since match will compile a Pattern if necessary, but it is a useful thing to do during precompilation.
This translates the Pattern to Instruction codes, appends them to the code field, and returns the Pattern. Various optimizations are performed in the process.
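A short sketch of compiling ahead of time, assuming only the signature given above. The names are qualified with the module so the sketch does not depend on what JLpeg exports.

```julia
using JLpeg

# Compile once, up front; `match` would otherwise compile on first use.
digit = JLpeg.compile!(R"09")

# The compiled pattern is used exactly like an uncompiled one.
m = match(digit, "7")
println(m)
```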
Constructing Patterns
Patterns are built by combining the products of these constructors into more complex Patterns.
JLpeg.P — Function
P(p::Union{AbstractString,AbstractChar,Integer,Bool,Symbol})::Pattern
Create a Pattern.
- If p is a String or Char, this matches that string or character.
- If p is a positive Integer, it matches that many characters.
- If p is true, the rule succeeds; if false, the rule fails.
- If p is a Symbol, this represents a call to the rule with that name.
- If p is a negative Integer, matches if that many characters remain, consumes no input.
- If p is already a Pattern, it is simply returned.
Examples
julia> match(P("func"), "funci")
PegMatch(["func"])
julia> match(P(3), "three")
PegMatch(["thr"])
julia> match(P(true), "pass")
PegMatch([""])
julia> match(P(false), "fail")
PegFail("fail", 1)
julia> match(P('👍'), "👍")
PegMatch(["👍"])
JLpeg.@P_str — Macro
P"str"
Call P(str) on the String, in close imitation of Lua's calling convention.
JLpeg.S — Method
S(s::AbstractString)
Create a Pattern matching any character in the string.
JLpeg.@S_str — Macro
S"str"
Create a Set pattern of the characters in "str". See S.
JLpeg.R — Method
R(s::AbstractString)
Create a Pattern matching every character in the range from the first to the second character. s must be two codepoints long, and the first must be lower-valued than the second. Both must be valid UTF-8.
JLpeg.R — Method
R(a::AbstractChar, b::AbstractChar)
Match any character in the range a to b, inclusive.
JLpeg.@R_str — Macro
R"str"
Create a range pattern from "str". See R.
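The set and range constructors can be tried on their own, without any operators. This sketch uses only the forms documented above; the comments describe the expected results rather than verified REPL output.

```julia
using JLpeg

vowel = S"aeiou"      # string-macro form of S("aeiou")
lower = R('a', 'z')   # same range as R("az") or R"az"

println(match(vowel, "apple"))   # "a" is in the set
println(match(lower, "Apple"))   # 'A' is outside a-z, so this should be a PegFail
```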
JLpeg.B — Function
B(p::Union{Pattern,AbstractString,AbstractChar,Integer})
Match p behind the current subject index. p must be of fixed length. The most useful B patterns are !B(1), which succeeds at the beginning of the string, and B('\n')|!B(1) to match the start of a line.
JLpeg.U8 — Function
U8(val::Integer)
Matches a single byte, whether or not that byte is a valid part of a UTF-8 string.
JLpeg.Combinators
JLpeg.Combinators — Module
JLpeg.Combinators
Contains all the combinator operators which shadow definitions in Base. The module redefines these symbols, providing a fallback to Base, and the redefinitions are designed not to break existing code. The Julia compiler handles this kind of redirection well during inference; the technique is used without issue by several packages which specialize operators for numerically-demanding tasks. We even transfer the Base docstrings.
Whether or not these overloads rise to the level of piracy is debatable. That said, we've walled them off, so that debate is not necessary.
Use
The @grammar and @rule macros are able to use these operators, so you may not need them. To import them, add this to your module:
import JLpeg.Combinators: *, -, %, |, ^, ~, !, >>, inv
Captures and Actions
PEG patterns are recognizers, matching the longest prefix of the string that they can. Captures allow substrings to be captured within the match; Actions perform actions at the index of the match, or on a capture.
JLpeg.C — Function
C(patt::Pattern)
Create a capture. Matching patt will return the matched substring. The sugared form is (patt,).
C(patt::Pattern, sym::Union{Symbol,AbstractString})
Create a named capture with key :sym or "sym". Sugared form (patt, :sym).
JLpeg.Cg — Function
Cg(patt::Pattern [, sym::Union{Symbol,AbstractString}])
Create a group capture, which groups all captures from patt into a vector inside the PegMatch object. If sym is provided, the group will be found at that key. Sugared as [patt] or [patt, :sym].
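A minimal sketch of plain and grouped captures, written with the function forms rather than the sugared ones. It imports two operators from JLpeg.Combinators and uses ^1 for "one or more", as in the grammar examples elsewhere on this page.

```julia
using JLpeg
import JLpeg.Combinators: *, ^

word   = R"az"^1   # one or more lowercase letters
digits = R"09"^1   # one or more digits

# Two plain captures, side by side.
pair = C(word) * P("=") * C(digits)
flat = match(pair, "width=80")
println(flat)

# The same two captures, collected into a single group.
grouped = match(Cg(pair), "width=80")
println(grouped)
```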
JLpeg.Cr — Function
Cr(patt::Pattern, sym::Union{CapSym, Nothing})
Captures a UnitRange of matches in patt, optionally keyed by sym. Convenient for substitutions and annotations.
JLpeg.Cc — Function
Cc(args...)
Constant capture. Matches the empty string and puts the values of args as a tuple in that place within the PegMatch captures. If args is a Pair{Symbol,Any}, that symbol will appear as a key in the match.
JLpeg.Cp — Function
Cp()
Captures the empty string in .captures, consuming no input. Useful for the side effect of storing the corresponding offset.
JLpeg.M — Function
JLpeg.CM — Function
CM(patt::Pattern, sym::Symbol)
Mark the match of patt with sym, while also capturing it with sym.
JLpeg.K — Function
K(patt::Pattern, sym::Symbol, [check::Union{Function,Symbol}])
ChecK the pattern against the previous Mark with the same tag. If check is not provided, the check succeeds when the SubStrings of the mark and the check are identical. Otherwise, check must be either a symbol representing one of the builtins or a function with the signature (marked::SubString, checked::SubString)::Bool. The success or failure of the check is the success or failure of the pattern. If patt doesn't match, the check will not be performed. The check will always fail if the corresponding mark is not present, except for the builtin :always, which always succeeds if patt succeeds.
K may also be written in do syntax, K(patt, :tag) do s1, s2; ... end.
See also CK
JLpeg.CK — Function
CK(patt::Pattern, sym::Symbol, check::Union{Function,Symbol})
ChecK the pattern against the prior Mark, capturing if the check succeeds. See K and C for details.
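One plausible use of marks and checks is requiring a closing tag to repeat its opening tag. The sketch below assumes the default check (identical SubStrings) described under K; the element pattern and the sample strings are illustrative only.

```julia
using JLpeg
import JLpeg.Combinators: *, ^

name = R"az"^1

# <name>body</name>: the closing name must repeat the marked opening name.
element = P("<") * CM(name, :tag) * P(">") *
          R"az"^0 *
          P("</") * K(name, :tag) * P(">")

ok  = match(element, "<b>bold</b>")   # closing tag agrees with the mark
bad = match(element, "<b>bold</i>")   # closing tag differs, so the check should fail
println(ok)
println(bad)
```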
JLpeg.A — Function
A(patt::Pattern, λ::Function)
Acts as a grouping capture for patt, applying λ to a successful match with the captures as arguments (not as a single Vector). If patt contains no captures, the capture is the SubString. The return value of λ becomes the capture; if nothing is returned, the capture (and its offset) are deleted. May be invoked as patt <| λ.
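As a small sketch of an Action, the pattern below has no inner captures, so per the description above λ receives the matched SubString, and its return value becomes the capture.

```julia
using JLpeg
import JLpeg.Combinators: ^

digits = R"09"^1

# Convert the matched digits to an Int; the Int replaces the SubString capture.
number = A(digits, s -> parse(Int, s))

m = match(number, "123abc")
println(m)
```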
JLpeg.Q — Function
Q(patt::Pattern, λ::Function)
A Query action. λ will receive the SubString matched by patt, and is expected to return a Bool representing the success or failure of the pattern. This happens during the parse, and may be used, for example, to maintain a typedef symbol table when parsing C, with appropriate closed-over state variables.
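A minimal sketch of a Query that consults closed-over state. The keywords set is a stand-in for something like the symbol table mentioned above; it is an illustrative assumption, not part of JLpeg.

```julia
using JLpeg
import JLpeg.Combinators: ^

# Closed-over state consulted during the parse (illustrative only).
keywords = Set(["if", "else", "while"])

# The word matches only if the query returns true for the matched SubString.
keyword = Q(R"az"^1, s -> s in keywords)

println(match(keyword, "else"))        # "else" is in the set
println(match(keyword, "elsewhere"))   # whole word is matched, then rejected by the query
```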
JLpeg.Avm! — Function
Avm!(patt::Pattern, λ::Function)
A vm Action. If patt succeeds, λ will be called on the entire VMState, and is expected to return true or false to describe the success or failure of the entire pattern. This is an ultimate escape hatch, intended for purposes such as debugging, or imposing user limits on stack depth. The most obvious use is early parse termination, by setting vm.running to false.
While it may be abused to do all sorts of surprising hacks, if you think you need it in order to parse something, you're probably wrong about that.
JLpeg.T — Function
T(label::Symbol)
Throw a failure labeled with :label. If a rule :label exists, it will be called to attempt recovery, and the success or failure of the recovery rule is then used. Otherwise, :label and the position of T(:label) will be attached to PegFail in the event that the whole match fails at that point in the string.
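A sketch of a labeled failure with no recovery rule, so the label is expected to surface on the PegFail. The :unclosed label and the parenthesized-number pattern are made up for illustration.

```julia
using JLpeg
import JLpeg.Combinators: *, |, ^

# A parenthesized number; a missing ")" throws the :unclosed label.
number = P("(") * R"09"^1 * (P(")") | T(:unclosed))

ok  = match(number, "(42)")
bad = match(number, "(42")

println(ok)
if bad isa JLpeg.PegFail
    println("failed with label ", bad.label, " at position ", bad.errpos)
end
```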
Rules and Grammars
The great advantage (other than composability) PEGs have over regular expressions is the ability to match recursive patterns. These are constructed out of Rules and Grammars.
JLpeg.Grammar — Type
Grammar(rule...)
Create a grammar from the provided Rules.
JLpeg.Rule — Type
Rule(name::Symbol, patt::Pattern)
Sugar-free form of @rule, creates a rule from patt, assigning it the name name.
JLpeg.GrammarMacros.@rule — Macro
@rule :name ← pattern...
Sugared form for rule definition. Assigns the rule in-scope with the given name:
# Wrong
name = @rule :name ← "foo" | "bar"
# Right
@rule :name ← "foo" | "bar"
In terms of scope and variable escaping, @rule functions identically to @grammar.
JLpeg.GrammarMacros.@grammar — Macro
@grammar(name, rules)
Syntax sugar for defining a set of rules as a single grammar. Expects a block of rules, each of which is a rule-pair as can be created with ← or <--. "string" will be interpolated as P("string"), :symbol as P(:symbol), and an integer n as P(n).
Any variable name exported by JLpeg will refer to the same value as the export, while any other variable is escaped, and will have the meaning it has in the scope where @grammar is called.
Example use
This simple grammar captures the first string of numbers it finds:
julia> @grammar capnums begin
:nums ← (:num,) | 1 * :nums
:num ← R"09"^1
end;
julia> match(capnums, "abc123abc123")
PegMatch(["123"])
This one also captures the lowercase letters, converting them to uppercase.
julia> upper = uppercase; # A thoroughly unhygienic macro
julia> @grammar uppernums begin
:nums ← (:num,) | :abc * :nums
:num ← R"09"^1
:abc ← R"az"^1 <| upper
end;
julia> match(uppernums, "abc123abc123")
PegMatch(["ABC", "123"])
More extensive examples may be found in the documentation.
JLpeg.GrammarMacros.@construle — Macro
@construle :name ← pattern...
Identical to @rule, but assigns the rule to a constant variable. Only legal in the top scope.
JLpeg.GrammarMacros.@constgrammar — Macro
@constgrammar name, rules
Identical to @grammar, but assigns the result to a constant. This is only valid in global scope.
Dialects
A work in progress.
JLpeg.re — Constant
re : An Interpretation of LPeg's `re` module
The first dialect, intended, among other things, as a useful bootstrap of other dialects.
Generators
A PEG is a specification of a class of algorithms which are valid on a universe of strings. While the common thing to do is use this specification to construct a recognizer for such strings, it may also be used to create generators for valid strings in that universe.
JLpeg aspires to provide a complete set of generators for our patterns. So far, we have:
JLpeg.generate — Function
generate(set::PSet)::String
Generate a String of all characters matched by a Set.
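A brief sketch of the generator, assuming that S"abc" produces the PSet this method expects. The call is qualified with the module name, and no particular character order is assumed.

```julia
using JLpeg

# Assumption: a multi-character S pattern is a PSet.
chars = JLpeg.generate(S"abc")

# Every generated character should itself be matched by the set.
for c in chars
    println(c, " => ", match(S"abc", string(c)))
end
```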
Benchmarking
JLpeg.matchreport — Function
matchreport(patt::Pattern, subject::AbstractString)
Matches patt against subject using an instrumented VM. This keeps a running count of instructions, amount of backtracking, and the number of times any given byte in the string is visited.
To state the obvious, this is not intended to be fast at all; it's provided as a tool for diagnosing performance bottlenecks in a pattern.
JLpeg.PegReport — Type
PegReport
Returned from a call to matchreport. Contains statistics gathered from the match run. Included:
- matched::Bool: Did the pattern match?
- heatmap::Vector: Number of instructions executed at each byte of the subject.
- backtracks::Int: Count of times the pattern backtracked.
- max::Int: Index of the farthest match.
- advances::Int: The number of bytes advanced. For a linear pattern, sizeof(subject).
- count::Int: Total number of instructions executed.
- capcount::Int: Number of frames on the capture stack.
- subject: The string the pattern was matched against.
Printing this in the REPL will show a digest of these values, with a logarithmic heatmap of the subject string, as a visual indicator of where the pattern spends its time. Reports may also be saved and compared (manually), or added to your test suite, to detect performance regressions when a grammar is changed. A future release will use StyledStrings, which will allow the colors to easily be customized.
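A minimal sketch of collecting a report and reading a few of the fields listed above. The pattern and subject are illustrative, and the call is qualified so the sketch does not assume matchreport is exported.

```julia
using JLpeg
import JLpeg.Combinators: *, ^

patt = R"az"^1 * P("=") * R"09"^1

report = JLpeg.matchreport(patt, "width=80")

# Individual statistics can be read for use in a regression test.
println("matched:      ", report.matched)
println("backtracks:   ", report.backtracks)
println("instructions: ", report.count)
```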