name: title
layout: true
class: center, middle, inverse

.footnote.right[[Chicago Perl Mongers](http://chicago.pm.org)]

---
name: code
layout: true
class: middle

---
template: title

# Parsing: Regex and Grammars
## Practical Language Parsing

---
name: default
layout: true
class: center, middle

---

# You have a structured document

YAML, JSON, XML

---

# You have a *lot of* structured documents

Config files, database dumps

---

# You have a lot of *wildly different* structured documents

Web services, RPC

---

# Build a script every time?

Leave dozens of single-purpose scripts around

---

# Build a way to report on structured documents

Overengineering!

---
template: title

# `App::YAML::Filter`
# `yq`

---
template: title

# Basic Filters

---

# `{ foo: 1, bar: 2 }`
# `.foo` -> `1`

---

# `{ foo: 1, bar: 2 }`
# `.bar` -> `2`

---
template: title

# Check for Equality
## Binary operator

---

# `{ foo: 1, bar: 2 }`
# `\.foo == 1` -> `true`

---

# `{ foo: 1, bar: 2 }`
# `\.foo == 2` -> `false`

---

# `{ foo: 1, bar: 2 }`
# `\.bar == 1` -> `false`

---

# `{ foo: 1, bar: 2 }`
# `\.bar == 2` -> `true`

---
template: title

# Functions

---

# `{ foo: 1, bar: 2 }`
# `grep( .foo == 1 )` -> `{ foo: 1, bar: 2 }`
# `grep( .foo == 2 )` -> `empty`

---
template: title

# Regular Expressions

---

# Filters: `.foo`

```perl
$filter_re = qr{[.]\w+};
```

---

# Integers: `1`

```perl
$int_re = qr{\d+};
```

---

# Equality: `FILTER == INT`

```perl
$equal_re = qr{$filter_re\s+==\s+$int_re};
```

---

# Function: `grep( EQUALITY )`

```perl
$func_re = qr{(\w+)[(]\s+($equal_re)\s+[)]};
```

---
template: code

.center[
# Our Expressions
]

```perl
$filter_re = qr{[.]\w+};
$int_re = qr{\d+};
$equal_re = qr{$filter_re\s+==\s+$int_re};
$func_re = qr{(\w+)[(]\s+($equal_re)\s+[)]};
```

---
template: code

.center[
# Run Our Program
]

```perl
sub filter {
    my ( $class, $program, $document ) = @_;
    # ...
}
```

---
template: code

.center[
# Filters
]

```perl
if ( $program =~ m{^$filter_re$} ) {
    my ( undef, $key ) = split /[.]/, $program;
    return $document->{ $key };
}
```

---
template: code

.center[
# Integers
]

```perl
elsif ( $program =~ m{^$int_re$} ) {
    return $program;
}
```

---
template: code

.center[
# Equality
]

```perl
elsif ( $program =~ m{^$equal_re$} ) {
    my ( $lhs, $op, $rhs ) = split /\s+(==)\s+/, $program;
    my $lhs_value = $class->filter( $lhs, $document );
    my $rhs_value = $class->filter( $rhs, $document );
    if ( $lhs_value == $rhs_value ) {
        return 1;
    }
    else {
        return 0;
    }
}
```

---
template: code

.center[
# Function
]

```perl
elsif ( $program =~ m{^$func_re$} ) {
    my ( $name, $argument ) = ( $1, $2 );
    my $arg_val = $class->filter( $argument, $document );
    if ( $name eq 'grep' ) {
        if ( $arg_val ) {
            return $document;
        }
        else {
            return;
        }
    }
}
```

---
template: title

# Limitations Of Our Language

---
template: code

.center[
# Filter == Int
]

```perl
$filter_re = qr{[.]\w+};
$int_re = qr{\d+};
$equal_re = qr{$filter_re\s+==\s+$int_re};
```

---
template: code

.center[
# Filter|Int == Filter|Int
]

```perl
$filter_re = qr{[.]\w+};
$int_re = qr{\d+};
$equal_re = qr{(?:$filter_re|$int_re)\s+==\s+(?:$filter_re|$int_re)};
```

---
template: code

.center[
# Filter|Int == Filter|Int
]

```perl
$filter_re = qr{[.]\w+};
$int_re = qr{\d+};
$term_re = qr{$filter_re|$int_re};
$equal_re = qr{$term_re\s+==\s+$term_re};
```

---
template: code

.center[
# grep( EQUAL )
]

```perl
$func_re = qr{(\w+)[(]\s+($equal_re)\s+[)]};
```

---
template: code

.center[
# grep( FILTER|EQUAL )
]

```perl
$expr_re = qr{$equal_re|$filter_re};
$func_re = qr{(\w+)[(]\s+($expr_re)\s+[)]};
```

---
template: code

.center[
# Strings
]

```perl
use Regexp::Common;
my $string_re = $RE{delimited}{-delim=>q{'"}};
```

---
template: code

.center[
# Number Formats
## (Float, Hex, Oct, Bin)
]

```perl
use Regexp::Common;
my $evalnum_re = qr{(?:
    0b $RE{num}{bin} |
    0  $RE{num}{oct} |
    0x $RE{num}{hex}
)}x;
```

---

# More Operators
## !=, >, <, eq, ne

---

# More Functions
## uniq, sort, length

---

# Array Indexing
# `.[0]`

---

# Nested Data Structures
# `.foo.bar`

---
template: code

.center[
# Recursive Parsing
`grep( length(.foo) == 1 )`
]

```perl
$expr_re = qr{$equal_re|$filter_re};
$func_re = qr{(\w+)[(]\s+($expr_re)\s+[)]};
```

```perl
$expr_re = qr{$equal_re|$filter_re|(\w+)[(]\s+(?0)\s+[)]};
$func_re = qr{(\w+)[(]\s+($expr_re)\s+[)]};
```

---
template: title

# Developing a Grammar

---

# Lexing

---

# Changing `Text` into `Tokens`

---

# Parsing

---

# Using `Tokens` to `do something`

---
template: title

# Parse::RecDescent
## Damian Conway

---
template: code

.center[
# Starting from Regex
]

```perl
$filter_re = qr{[.]\w+};
$int_re = qr{\d+};
$term_re = qr{$filter_re|$int_re};
$equal_re = qr{$term_re\s+==\s+$term_re};
$expr_re = qr{$equal_re|$filter_re};
$func_re = qr{(\w+)[(]\s+($expr_re)\s+[)]};
```

---
template: code

.center[
# P::RD Grammar
]

```
<autotree>
program: func | expr | int
filter: '.' <skip:"">
    /\w+/
int: /\d+/
term: filter | int
equal: lhs_term '==' rhs_term
expr: equal | filter
func: /\w+/ '(' expr ')'
lhs_term: term
rhs_term: term
```

---

# `
`

---

# Using `Tokens` to `do something`

---

# Using `Tokens` to `build a parse tree`

---

# `program: func | expr | int`
## An entry point

---

# `filter: '.' <skip:""> /\w+/`
## .foo
## No whitespace

---

# `equal: lhs_term '==' rhs_term`
## `<autotree>` wants unique names

---
template: code

.center[
# The Parse Tree
]

```
$ perl parse-recdescent.pl '.foo' < data.yml
```

```perl
bless( {
  __RULE__ => "program",
  expr => bless( {
    __RULE__ => "expr",
    filter => bless( {
      __DIRECTIVE1__ => "\\s*",
      __PATTERN1__ => "foo",
      __RULE__ => "filter",
      __STRING1__ => "."
    }, 'filter' )
  }, 'expr' )
}, 'program' )
```

---
template: code

.center[
# The Parse Tree
]

```
$ perl parse-recdescent.pl '.foo == 1' < data.yml
```

```perl
bless( {
  __RULE__ => "program",
  expr => bless( {
    __RULE__ => "expr",
    equal => bless( {
      __RULE__ => "equal",
      __STRING1__ => "==",
      lhs_term => bless( {
        __RULE__ => "lhs_term",
        term => bless( {
          __RULE__ => "term",
          filter => bless( {
            ...
```

---
template: code

.center[
# Running the Program
]

```perl
my $parser = Parse::RecDescent->new( $grammar );

sub filter {
    my ( $class, $program, $document ) = @_;
    my $tree = $parser->program( $program );
    return run_tree( $tree, $document );
}
```

---
template: code

.center[
# Running down the Tree
]

```perl
sub run_tree {
    my ( $tree, $document ) = @_;
    ...
}
```

---
template: code

.center[
# Filters
]

```perl
if ( $tree->{filter} ) {
    my ( $key ) = $tree->{filter}{__PATTERN1__};
    return $document->{ $key };
}
```

---
template: code

.center[
# Ints
]

```perl
elsif ( $tree->{int} ) {
    return $tree->{int}{__VALUE__};
}
```

---
template: code

.center[
# Equality
]

```perl
elsif ( $tree->{equal} ) {
    my $lhs = $tree->{equal}{lhs_term};
    my $rhs = $tree->{equal}{rhs_term};
    my $lhs_value = run_tree( $lhs, $document );
    my $rhs_value = run_tree( $rhs, $document );
    if ( $lhs_value == $rhs_value ) {
        return 1;
    }
    else {
        return 0;
    }
}
```

---
template: code

.center[
# Functions
]

```perl
elsif ( $tree->{func} ) {
    my $arg_val = run_tree( $tree->{func}{expr}, $document );
    if ( $tree->{func}{__PATTERN1__} eq 'grep' ) {
        if ( $arg_val ) {
            return $document;
        }
        else {
            return;
        }
    }
}
```

---
template: code

.center[
# Intermediate Steps
]

```perl
elsif ( $tree->{expr} ) {
    return run_tree( $tree->{expr}, $document );
}
elsif ( $tree->{term} ) {
    return run_tree( $tree->{term}, $document );
}
```

---
template: title

# Guidelines
## Learn from my pain

---
template: code

.center[
# Build In Layers
]

* Primitives (int, string)
* Variables (filter)
* Operators (==)
* Functions

---

# Test in Layers!!

---

# Small, Tight Rules

---

# Parse Trees!

---
template: code

.center[
# Parser actions as normalization
]

* Evaled numbers (oct, bin, hex)

---
template: title

# Marpa

---

`this page unintentionally left blank`

---
template: title

# Thank You
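
---
template: code

.center[
# Backup: The Regex Filter, Assembled
]

The regex slides show the filter one branch at a time. This is a minimal sketch of those branches stitched into one runnable script, not the `App::YAML::Filter` source. Assumptions: `filter` is a plain function rather than a class method, the sample document is the `{ foo: 1, bar: 2 }` hash from the earlier slides, and the final `die` on unrecognized input is added here for illustration.

```perl
use strict;
use warnings;

# Patterns from the regex slides, including the term/expr extensions
my $filter_re = qr{[.]\w+};
my $int_re    = qr{\d+};
my $term_re   = qr{$filter_re|$int_re};
my $equal_re  = qr{$term_re\s+==\s+$term_re};
my $expr_re   = qr{$equal_re|$filter_re};
my $func_re   = qr{(\w+)[(]\s+($expr_re)\s+[)]};

# Assumption: a plain function instead of the class method on the slides
sub filter {
    my ( $program, $document ) = @_;

    if ( $program =~ m{^$filter_re$} ) {
        # ".foo" splits into ( "", "foo" ); keep only the key
        my ( undef, $key ) = split /[.]/, $program;
        return $document->{ $key };
    }
    elsif ( $program =~ m{^$int_re$} ) {
        return $program;
    }
    elsif ( $program =~ m{^$equal_re$} ) {
        # Evaluate both sides recursively, then compare numerically
        my ( $lhs, $op, $rhs ) = split /\s+(==)\s+/, $program;
        return filter( $lhs, $document ) == filter( $rhs, $document ) ? 1 : 0;
    }
    elsif ( $program =~ m{^$func_re$} ) {
        my ( $name, $argument ) = ( $1, $2 );
        if ( $name eq 'grep' ) {
            # grep keeps the document only when its argument is true
            return filter( $argument, $document ) ? $document : undef;
        }
    }

    # Assumption: die on anything unrecognized (the slides return nothing)
    die "Cannot parse program: $program\n";
}

my $document = { foo => 1, bar => 2 };
for my $program ( '.foo', '.foo == 1', '.bar == 1', 'grep( .foo == 1 )' ) {
    my $result = filter( $program, $document );
    printf "%-20s => %s\n", $program,
        ref( $result ) ? 'whole document' : $result;
}
```

Every new feature from the Limitations slides (strings, more operators, nested calls) means another pattern and another `elsif` here, which is the pressure that leads to a grammar.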