Reference
Core API
The basic structure of recognition in JLpeg is a Pattern.
JLpeg.Pattern — Type
Container for various patterns and grammars. Always has val, which may be primitive or a Vector of patterns, and code, a Vector of Instructions. Some patterns have fields unique to that type of pattern. A pattern which encloses other patterns will have an aux field containing a Dict for metadata.
These are created in a wide variety of ways, all based on three functions, P, S, and R, with the rest of the work done with a great quantity of operator overloading.
JLpeg.PegMatch — Type
PegMatch <: AbstractMatch
A type representing a successful match of a Pattern on a string. Typically returned from the match function. PegFail is the atypical return value.
A PegMatch is equal (==) to a Vector if the captures are equal to the Vector.
For details of how to make use of a PegMatch, see the section "Working With Matched Data" in the documentation.
Properties:
- subject::AbstractString: Stores the string matched.
- full::Bool: Whether the match is of the entire string.
- captures::PegCapture: Contains any captures from matching patt to subject. This can in principle contain anything, as captures may call functions, in which case the return value of that function becomes the capture. For more information, consult the JLpeg documentation, and the docstrings for C, Cg, Cc, and A.
- offsets::Vector{Int}: Provided for compatibility with AbstractMatch. SubStrings contain their own offsets, so this is unnecessary for normal work, but we generate them when match.offsets is used. It then consists of the indices at which the outer layer of captures may be found within the subject.
- patt::Pattern: The pattern matched against the subject.
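The properties above can be inspected directly on a match. The following is a minimal sketch that relies only on the behavior described in this docstring and on the P example shown later in this reference; the comments describe what the docstring leads one to expect rather than verified output.

```julia
using JLpeg

m = match(P("func"), "funci")

# A PegMatch compares equal to a Vector holding the same captures.
m == ["func"]

m.subject            # the subject string the pattern was matched against
m.full               # expected to be false: "funci" has one unmatched character
m.patt               # the Pattern that produced this match
collect(m.captures)  # PegCapture is an AbstractVector, so it can be collected
```

Comparing against a Vector is convenient in tests, where the expected captures can be written as a literal.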
JLpeg.PegCapture — Type
PegCapture <: AbstractVector
The captures (including group captures) of a PegMatch. Indexes and iterates in the same fashion.
JLpeg.PegFail — Type
PegFail
Returned on a failure to match(patt::Pattern, subject::AbstractString).
Properties:
- subject::AbstractString: The string that we failed to match on.
- errpos: The position at which the pattern ultimately failed to match.
- label::Symbol: Info about the failure provided by T(::Symbol), defaulting to :default if the pattern fails but not at a throw point, or if no throws are provided.
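Because match returns either a PegMatch or a PegFail, calling code usually branches on the type of the result. The sketch below shows one way to do that; try_captures is a hypothetical helper written for illustration, not part of JLpeg, and the type is qualified as JLpeg.PegFail in case it is not exported.

```julia
using JLpeg

# Hypothetical helper: return the captures on success, `nothing` on failure.
function try_captures(patt, subject)
    result = match(patt, subject)
    if result isa JLpeg.PegFail
        # errpos and label are the documented PegFail properties.
        @warn "match failed" position = result.errpos label = result.label
        return nothing
    end
    return result.captures
end

try_captures(P("func"), "funci")   # returns the captures of the match
try_captures(P(false), "fail")     # logs a warning and returns nothing
```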
JLpeg.PegError — Type
PegError(msg) <: Exception
An error while constructing a Pattern.
Pattern Matching
Base.match — Function
match(patt::Pattern, subject::AbstractString)::Union{PegMatch, PegFail}
Match patt to subject, returning a PegMatch implementing the expected interface for its supertype AbstractMatch, or a PegFail with useful information about the failure.
JLpeg.compile! — Function
compile!(patt::Pattern)::Pattern
Compile a Pattern. It isn't necessary to call this, since match will compile a Pattern if necessary, but it is a useful thing to do during precompilation.
This translates the Pattern to Instruction codes, appending them to the code field and returning the Pattern. Performs various optimizations in the process.
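One way to take advantage of this during precompilation is to compile a pattern into a top-level constant of a package module. This is a sketch under assumptions: the module name Keywords and the function is_func are hypothetical, and compile! and PegMatch are qualified with JLpeg in case they are not exported.

```julia
module Keywords   # hypothetical module, for illustration only

using JLpeg

# Compiling at top level means the work happens once, when the module is loaded
# or precompiled, rather than on the first call to match.
const FUNC = JLpeg.compile!(P("func"))

# True when `s` begins with the keyword.
is_func(s::AbstractString) = match(FUNC, s) isa JLpeg.PegMatch

end # module

Keywords.is_func("funci")
Keywords.is_func("fail")
```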
Constructing Patterns
Patterns are built by combining the products of these constructors into more complex Patterns.
JLpeg.P — Function
P(p::Union{AbstractString,AbstractChar,Integer,Bool,Symbol})::Pattern
Create a Pattern.
- If p is a String or Char, this matches that string or character.
- If p is a positive Integer, it matches that many characters.
- If p is true, the rule succeeds; if false, the rule fails.
- If p is a Symbol, this represents a call to the rule with that name.
- If p is a negative Integer, matches if that many characters remain, consumes no input.
- If p is already a Pattern, it is simply returned.
Examples
julia> match(P("func"), "funci")
PegMatch(["func"])
julia> match(P(3), "three")
PegMatch(["thr"])
julia> match(P(true), "pass")
PegMatch([""])
julia> match(P(false), "fail")
PegFail("fail", 1)
julia> match(P('👍'), "👍")
PegMatch(["👍"])
JLpeg.@P_str — Macro
P"str"
Call P(str) on the String, in close imitation of Lua's calling convention.
JLpeg.S — Method
S(s::AbstractString)
Create a Pattern matching any character in the string.
JLpeg.@S_str — Macro
S"str"
Create a Set pattern of the characters in "str". See S.
JLpeg.R — Method
R(s::AbstractString)
Create a Pattern matching every character in the range from the first to the second character. s must be two codepoints long, and the first must be lower-valued than the second. Both must be valid UTF-8.
JLpeg.R — Method
R(a::AbstractChar, b::AbstractChar)
Match any character in the range a to b, inclusive.
JLpeg.@R_str — Macro
R"str"
Create a range pattern from "str". See R.
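As a brief illustration of the set and range constructors, the sketch below builds one pattern of each kind and tries it on a few subjects. The variable names are illustrative, and the comments describe the outcome the docstrings imply rather than printed REPL output.

```julia
using JLpeg

vowel = S("aeiou")      # the same set can be written S"aeiou"
lower = R("az")         # the same range can be written R('a', 'z') or R"az"

match(vowel, "apple")   # succeeds: the subject starts with 'a'
match(vowel, "pear")    # fails: the pattern is tried at the start of the subject
match(lower, "zebra")   # succeeds: 'z' is the inclusive upper bound
match(lower, "Zebra")   # fails: 'Z' is outside the range a to z
```

Note that a pattern matches a prefix of the subject; finding a match further along requires a recursive rule, as in the @grammar examples later in this reference.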
JLpeg.B — Function
B(p::Union{Pattern,AbstractString,AbstractChar,Integer})
Match p behind the current subject index. p must be of fixed length. The most useful B patterns are !B(1), which succeeds at the beginning of the string, and B('\n')|!B(1) to match the start of a line.
JLpeg.U8 — Function
U8(val::Integer)
Matches a single byte, whether or not that byte is a valid part of a UTF-8 string.
JLpeg.Combinators
JLpeg.Combinators — Module
JLpeg.Combinators
Contains all the combinator operators which shadow definitions in Base. The module redefines these symbols, providing a fallback to Base, and is designed not to break existing code. The Julia compiler handles this kind of redirection well during inference; the technique is used without issue by several packages which specialize operators for numerically demanding tasks. We even transfer the Base docstrings.
Whether or not these overloads rise to the level of piracy is debatable. That said, we've walled them off, so that debate is not necessary.
Use
The @grammar and @rule macros are able to use these operators, so you may not need them. To import them, add this to your module:
import JLpeg.Combinators: *, -, %, |, ^, ~, !, >>, inv
Captures and Actions
PEG patterns are recognizers, matching the longest prefix of the string that they can. Captures allow substrings to be captured within the match; Actions perform actions at the index of the match, or on a capture.
JLpeg.C — Function
C(patt::Pattern)
Create a capture. Matching patt will return the matched substring. The sugared form is (patt,).
C(patt::Pattern, sym::Union{Symbol,AbstractString})
Create a named capture with key :sym or "sym". Sugared form (patt, :sym).
JLpeg.Cg — Function
Cg(patt::Pattern [, sym::Union{Symbol,AbstractString}])
Create a group capture, which groups all captures from patt into a vector inside the PegMatch object. If sym is provided, the group will be found at that key. Sugared as [patt] or [patt, :sym].
JLpeg.Cr — Function
Cr(patt::Pattern, sym::Union{CapSym, Nothing})
Captures a UnitRange of matches in patt, optionally keyed by sym. Convenient for substitutions and annotations.
JLpeg.Cc — Function
Cc(args...)
Constant capture. Matches the empty string and puts the values of args as a tuple in that place within the PegMatch captures. If args is a Pair{Symbol,Any}, that symbol will appear as a key in the match.
JLpeg.Cp — Function
Cp()
Captures the empty string in .captures, consuming no input. Useful for the side effect of storing the corresponding offset.
JLpeg.M — Function
JLpeg.CM — Function
CM(patt::Pattern, sym::Symbol)
Mark the match of patt with sym, while also capturing it with sym.
JLpeg.K — Function
K(patt::Pattern, sym::Symbol, [check::Union{Function,Symbol}])
ChecK the pattern against the previous Mark with the same tag. If check is not provided, it will return true if the SubStrings of the mark and check are identical; otherwise it must be either a symbol representing one of the builtins or a function with the signature (marked::SubString, checked::SubString)::Bool. The success or failure of the check is the success or failure of the pattern. If patt doesn't match, the check will not be performed. The check will always fail if the corresponding mark is not present, except for the builtin :always, which always succeeds if patt succeeds.
K may also be written in do syntax, K(patt, :tag) do s1, s2; ... end.
See also CK.
JLpeg.CK — Function
CK(patt::Pattern, sym::Symbol, check::Union{Function,Symbol})
ChecK the pattern against the prior Mark, capturing if the check succeeds. See K and C for details.
JLpeg.A — Function
A(patt::Pattern, λ::Function)
Acts as a grouping capture for patt, applying λ to a successful match with the captures as arguments (not as a single Vector). If patt contains no captures, the capture is the SubString. The return value of λ becomes the capture; if nothing is returned, the capture (and its offset) are deleted. May be invoked as patt <| λ.
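The simplest case described above, a pattern with no captures of its own, can be sketched as follows. The function is called as JLpeg.A in case A is not exported, and the variable name shout is illustrative.

```julia
using JLpeg

# P("abc") contains no captures, so λ receives the matched SubString.
shout = JLpeg.A(P("abc"), uppercase)

m = match(shout, "abcdef")
m == ["ABC"]   # the return value of uppercase becomes the capture
```

Because the return value of λ replaces the capture, an action can also convert text into another type, for example by passing a parsing function in place of uppercase.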
JLpeg.Q — Function
Q(patt::Pattern, λ::Function)
A Query action. λ will receive the SubString matched by patt, and is expected to return a Bool representing the success or failure of the pattern. This happens during the parse, and may be used, for example, to maintain a typedef symbol table when parsing C, with appropriate closed-over state variables.
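A small sketch of a query action follows. The check used here (all three matched characters are digits) is a hypothetical example chosen for brevity, and Q is called as JLpeg.Q in case it is not exported.

```julia
using JLpeg

# P(3) matches any three characters; the query then accepts or rejects them.
three_digits = JLpeg.Q(P(3), s -> all(isdigit, s))

ok  = match(three_digits, "123abc")   # the query returns true, so the pattern succeeds
bad = match(three_digits, "12a456")   # the query returns false, so the pattern fails
```

Since λ runs during the parse, it can also close over mutable state, which is how the symbol-table use mentioned above would work.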
JLpeg.Avm! — Function
Avm!(patt::Pattern, λ::Function)
A vm Action. If patt succeeds, λ will be called on the entire VMState, and is expected to return true or false to describe the success or failure of the entire pattern. This is an ultimate escape hatch, intended for purposes such as debugging, or imposing user limits on stack depth. The most obvious use is early parse termination, by setting vm.running to false.
While it may be abused to do all sorts of surprising hacks, if you think you need it in order to parse something, you're probably wrong about that.
JLpeg.T — Function
T(label::Symbol)
Throw a failure labeled with :label. If a rule :label exists, this will be called to attempt recovery; the success or failure of the recovery rule is then used. Otherwise, :label and the position of T(:label) will be attached to PegFail in the event that the whole match fails at that point in the string.
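To illustrate the case where no recovery rule exists, here is a sketch of a small grammar that throws a label when a value is missing. The grammar and its rule names are hypothetical, and the sketch assumes that T is exported and therefore usable inside @grammar, as the @grammar docstring describes for exported names.

```julia
using JLpeg

# Hypothetical grammar: a lowercase name, "=", then digits.
@grammar assignment begin
    :assign ← :name * "=" * (:value | T(:novalue))
    :name   ← R"az"^1
    :value  ← R"09"^1
end;

result = match(assignment, "x=?")

# No rule named :novalue exists, so the label is attached to the failure.
result isa JLpeg.PegFail && result.label
```

Labels like this make it possible to report what was expected at the point of failure, instead of the generic :default.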
Rules and Grammars
The great advantage (other than composability) PEGs have over regular expressions is the ability to match recursive patterns. These are constructed out of Rules and Grammars.
JLpeg.Grammar — Type
Grammar(rule...)
Create a grammar from the provided Rules.
JLpeg.Rule — Type
Rule(name::Symbol, patt::Pattern)
Sugar-free form of @rule, creates a rule from patt, assigning it the name name.
JLpeg.GrammarMacros.@rule — Macro
@rule :name ← pattern...
Sugared form for rule definition. Assigns the rule in-scope with the given name:
# Wrong
name = @rule :name ← "foo" | "bar"
# Right
@rule :name ← "foo" | "bar"
In terms of scope and variable escaping, @rule functions identically to @grammar.
JLpeg.GrammarMacros.@grammar — Macro
@grammar(name, rules)
Syntax sugar for defining a set of rules as a single grammar. Expects a block of rules, each of which is a rule-pair as can be created with ←, or <--. "string" will be interpolated as P("string"), :symbol as P(:symbol), and an integer n as P(n).
Any variable name exported by JLpeg will refer to the same value as the export, while any other variable is escaped, and will have the meaning it has in the scope where @grammar is called.
Example use
This simple grammar captures the first string of numbers it finds:
julia> @grammar capnums begin
:nums ← (:num,) | 1 * :nums
:num ← R"09"^1
end;
julia> match(capnums, "abc123abc123")
PegMatch(["123"])
This one also captures the lowercase letters, converting them to uppercase.
julia> upper = uppercase; # A thoroughly unhygienic macro
julia> @grammar uppernums begin
:nums ← (:num,) | :abc * :nums
:num ← R"09"^1
:abc ← R"az"^1 <| upper
end;
julia> match(uppernums, "abc123abc123")
PegMatch(["ABC", "123"])
More extensive examples may be found in the documentation.
JLpeg.GrammarMacros.@construle — Macro
@construle :name ← pattern...
Identical to @rule, but assigns the rule to a constant variable. Only legal in the top scope.
JLpeg.GrammarMacros.@constgrammar — Macro
@constgrammar name, rules
Identical to @grammar, but assigns the result to a constant. This is only valid in global scope.
Dialects
A work in progress.
JLpeg.re — Constant
re : An Interpretation of LPeg's `re` module
The first dialect, intended, among other things, as a useful bootstrap of other dialects.
Generators
A PEG is a specification of a class of algorithms which are valid on a universe of strings. While the common thing to do is use this specification to construct a recognizer for such strings, it may also be used to create generators for valid strings in that universe.
JLpeg aspires to provide a complete set of generators for our patterns. So far, we have:
JLpeg.generate — Function
generate(set::PSet)::String
Generate a String of all characters matched by a Set.
Benchmarking
JLpeg.matchreport — Function
matchreport(patt::Pattern, subject::AbstractString)
Matches patt against subject using an instrumented VM. This keeps a running count of instructions, amount of backtracking, and the number of times any given byte in the string is visited.
To state the obvious, this is not intended to be fast at all; it's provided as a tool for diagnosing performance bottlenecks in a pattern.
JLpeg.PegReport — Type
PegReport
Returned from a call to matchreport. Contains statistics gathered from the match run. Included:
- matched::Bool: Did the pattern match?
- heatmap::Vector: Number of instructions executed at each byte of the subject.
- backtracks::Int: Count of times the pattern backtracked.
- max::Int: Index of the farthest match.
- advances::Int: The number of bytes advanced. For a linear pattern, sizeof(subject).
- count::Int: Total number of instructions executed.
- capcount::Int: Number of frames on the capture stack.
- subject: The string the pattern was matched against.
Printing this in the REPL will show a digest of these values, with a logarithmic heatmap of the subject string, as a visual indicator of where the pattern spends its time. Reports may also be saved and compared (manually), or added to your test suite, to detect performance regressions when a grammar is changed. A future release will use StyledStrings, which will allow the colors to easily be customized.
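A report can also be inspected programmatically through the fields listed above. The following is a minimal sketch using only those documented fields; no particular counts are claimed, since they depend on how the pattern compiles.

```julia
using JLpeg

report = JLpeg.matchreport(P("func"), "funci")

report.matched      # did the pattern match?
report.count        # total number of instructions executed
report.backtracks   # how many times the pattern backtracked
report.heatmap      # instructions executed at each byte of the subject
```

In a test suite, a check such as comparing report.count against a stored budget is one possible way to catch the performance regressions mentioned above.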