Working With Matched Data
PegMatch
implements the interface of AbstractMatch, and as such, it is intentionally structured to be similar to RegexMatch
from the standard library. PEGs are a far richer and more sophisticated tool than regexen, however: a named capture might appear many times, captures can be grouped, those groups may have groups, with captures, having names, and so on.
Our intention is that simple matching will behave in a familiar way, with additional methods provided for more complex scenarios. Let's consider a simple rule with some captures.
julia> @rule :baddate ← (R"09"^[4], :year) * "-" * (R"09"^[2:2],) * "-" * (R"09"^[2], :day);
julia> date = match(baddate, "2024-01-10")
PegMatch([:year => "2024", "01", :day => "10"])
The rule name is because this is certainly not how you should parse a date! Note the two equivalent ways of specifying a definite number of repetitions, [2]
is preferred.
Let's illustrate how to work with this.
julia> date == [:year => "2024", "01", :day => "10"]
true
julia> [:year => "2024", "01", :day => "10"] == date
true
A PegMatch
is ==
to a Vector
with the same contents. However, note that a PegMatch
uses default hash equality:
julia> hash(date) == hash([:year => "2024", "01", :day => "10"])
false
This is somewhat at variance with doctrine, but we feel it's the correct choice here.
Next, let's look at iteration and indexing.
julia> date[:day]
"10"
julia> date[3]
"10"
julia> keys(date)
3-element Vector{Any}:
:year
2
:day
julia> collect(date)
3-element Vector{Any}:
"2024"
"01"
"10"
julia> collect(eachindex(date))
3-element Vector{Int64}:
1
2
3
julia> collect(pairs(date))
3-element Vector{Pair{A, SubString{String}} where A}:
:year => "2024"
2 => "01"
:day => "10"
julia> collect(enumerate(date))
3-element Vector{Tuple{Int64, Any}}:
(1, "2024")
(2, "01")
(3, "10")
Default iteration will get you the matches, pairs
uses the name of the capture when there is one, if a capture has a name, that can be used to index it, or the position in the Vector.
So far so good, what happens if a named capture matches more than once?
julia> @rule :abcs ← ((R"az"^1, :abc) | "123")^1;
julia> letters = match(abcs, "abc123def123ghi123")
PegMatch([:abc => "abc", :abc => "def", :abc => "ghi"])
julia> letters[:abc]
"abc"
julia> keys(letters)
3-element Vector{Any}:
:abc
2
3
julia> collect(pairs(letters))
3-element Vector{Pair{Symbol, SubString{String}}}:
:abc => "abc"
:abc => "def"
:abc => "ghi"
As you can see, the first match with that name is the indexable one, and therefore, is the only time :abc
appears in keys
, while all matches have their name in pairs
, or, if anonymous, their index.