I am very interested in how natural languages work and evolve (see my review of the book The Language Instinct), and ever since I began playing with MGrammar I have wanted to see if it was possible to define English sentence structure using it.

Let's begin with a simple sentence:

The boy likes the girl.

This sentence is composed of a noun phrase (NP), "the boy", and a verb phrase (VP), "likes the girl". So let's begin with this MGrammar syntax:

syntax Main = S*;        
syntax S = NP VP ".";

The noun phrase "the boy" is composed of a determiner followed by a noun. Likewise, the verb phrase "likes the girl" is composed of a verb followed by a noun phrase. The MGrammar should then be:

syntax NP = Det? N;
syntax VP = V NP;

Then we just need to add some determiners, verbs and nouns:

syntax Det = "a" | "the" | "one";
syntax N = "boy" | "girl" | "dog" | "school" | "hair";
syntax V = "likes" | "bites" | "eats" | "discuss";
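To make the rules so far concrete, here is a minimal recursive-descent sketch in Python (not MGrammar, and not how MGrammar works internally). The function names and the tuple-based tree are my own illustration. The tokenizer lowercases the input and drops everything except letters and periods, which stands in for whitespace skipping; the lowercasing is an assumption of this sketch so that "The" matches "the".

```python
import re

# Word lists copied from the Det, N and V rules above.
DET = {"a", "the", "one"}
N = {"boy", "girl", "dog", "school", "hair"}
V = {"likes", "bites", "eats", "discuss"}


def tokenize(text):
    """Return lowercase words and periods; whitespace and other characters are skipped."""
    return re.findall(r"[a-z]+|\.", text.lower())


def parse_np(tokens, i):
    """NP = Det? N. Returns (tree, next_index) or None."""
    children = []
    if i < len(tokens) and tokens[i] in DET:
        children.append(("Det", tokens[i]))
        i += 1
    if i < len(tokens) and tokens[i] in N:
        children.append(("N", tokens[i]))
        return ("NP", children), i + 1
    return None


def parse_vp(tokens, i):
    """VP = V NP. Returns (tree, next_index) or None."""
    if i < len(tokens) and tokens[i] in V:
        np = parse_np(tokens, i + 1)
        if np:
            return ("VP", [("V", tokens[i]), np[0]]), np[1]
    return None


def parse_s(tokens, i=0):
    """S = NP VP ".". Returns (tree, next_index) or None."""
    np = parse_np(tokens, i)
    if not np:
        return None
    vp = parse_vp(tokens, np[1])
    if not vp:
        return None
    j = vp[1]
    if j < len(tokens) and tokens[j] == ".":
        return ("S", [np[0], vp[0]]), j + 1
    return None


result = parse_s(tokenize("The boy likes the girl."))
print(result[0] if result else "no parse")
```

Each function mirrors one syntax rule, which is why the grammar reads almost the same in both notations.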

If you add an interleave rule to skip whitespace, the sentence should parse correctly. That was a really simple sentence, so let's add an adjective.

The nerdy boy likes the girl.

We need to modify the noun phrase rule. Any number of adjectives (A*) can be placed before the noun. This is a simple change: just add A* to the noun phrase rule and add some adjectives.

syntax NP = Det? A* N;
syntax A = "happy" | "lucky" | "tall" | "red" | "nerdy";

That was simple. Let's add something more to the sentence, for example:

The nerdy boy likes the girl from school with red hair.

I added a nested prepositional phrase (PP). A prepositional phrase is, according to Wikipedia, composed of a preposition (P) and a noun phrase.

syntax NP = Det? A* N PP?;
syntax PP = P NP;
syntax P = "on" | "in" | "from" | "with";

The recursive nature of the PP rule makes it possible to nest any number of prepositional phrases inside each other. Here is an illustration of the syntax tree for "girl from school with red hair":

[image: syntax tree for "girl from school with red hair"]
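The nesting can also be traced in code. The following is a small Python sketch of just the NP and PP rules; the helper names (`parse_np`, `parse_pp`, `show`) are made up for this example, and the input is assumed to be lowercase and whitespace-separated.

```python
# Word lists copied from the Det, A, N and P rules above.
DET = {"a", "the", "one"}
ADJ = {"happy", "lucky", "tall", "red", "nerdy"}
N = {"boy", "girl", "dog", "school", "hair"}
P = {"on", "in", "from", "with"}


def parse_np(tokens, i):
    """NP = Det? A* N PP?. Returns (tree, next_index) or None."""
    children = []
    if i < len(tokens) and tokens[i] in DET:
        children.append(("Det", tokens[i]))
        i += 1
    while i < len(tokens) and tokens[i] in ADJ:
        children.append(("A", tokens[i]))
        i += 1
    if i >= len(tokens) or tokens[i] not in N:
        return None
    children.append(("N", tokens[i]))
    i += 1
    pp = parse_pp(tokens, i)  # the optional trailing PP
    if pp:
        children.append(pp[0])
        i = pp[1]
    return ("NP", children), i


def parse_pp(tokens, i):
    """PP = P NP. Calling parse_np here is what makes the rules recursive."""
    if i < len(tokens) and tokens[i] in P:
        np = parse_np(tokens, i + 1)
        if np:
            return ("PP", [("P", tokens[i]), np[0]]), np[1]
    return None


def show(tree, depth=0):
    """Render a tree as a list of indented lines."""
    label, body = tree
    if isinstance(body, str):
        return ["  " * depth + f"{label}: {body}"]
    lines = ["  " * depth + label]
    for child in body:
        lines.extend(show(child, depth + 1))
    return lines


tree, end = parse_np("girl from school with red hair".split(), 0)
print("\n".join(show(tree)))
```

In this sketch "with red hair" ends up inside the noun phrase for "school", because each NP allows at most one PP and the parser takes it as soon as it can.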

I think I will stop here because this post is turning into an English grammar lesson and I don't want to lose all my subscribers :) Defining English sentence structure in MGrammar is pretty pointless unless you are building a grammar checker, and even then you are still out of luck: it will probably be impossible to define a grammar for how words are built, and you will run into trouble with ambiguity (which most natural languages have). But it was a fun try, and it is a good example for showing how recursive rules are parsed.
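To give a feel for the ambiguity problem, here is a rough Python sketch that counts parse trees under a hypothetical variation of the rules: NP = N PP* instead of PP?, so that a noun can carry several prepositional phrases. That variation, the function name, and the omission of determiners and adjectives are all assumptions made to keep the example short; they are not part of the grammar in this post.

```python
from functools import lru_cache

# Word lists copied from the N and P rules above.
N = {"boy", "girl", "dog", "school", "hair"}
P = {"on", "in", "from", "with"}


def count_np_parses(words):
    """Count parse trees for words under the hypothetical rules
    NP = N PP*  and  PP = P NP."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def np(i, j):
        # words[i:j] as an NP: a noun followed by zero or more PPs.
        if i >= j or words[i] not in N:
            return 0
        return pps(i + 1, j)

    @lru_cache(maxsize=None)
    def pps(i, j):
        # words[i:j] as a sequence of zero or more PPs.
        if i == j:
            return 1
        if words[i] not in P:
            return 0
        # The first PP covers words[i:k]; the rest must again be PPs.
        return sum(np(i + 1, k) * pps(k, j) for k in range(i + 2, j + 1))

    return np(0, len(words))


print(count_np_parses("girl from school with hair".split()))
```

With two prepositional phrases there are two trees: "with hair" can attach to "school" or to "girl". Every additional phrase multiplies the possibilities, and a parser has no grammatical basis for choosing between them.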

If you missed Martin Fowler's post on Oslo, it is a good read. I like how he defines it as a Language Workbench.

PS. I have started twittering. I know I am late to the game; I just didn't get the point of Twitter. I have been using it for two days now and I am beginning to see the light. Oh, and please skip pointing out the irony of the inevitable grammatical errors in this post :)

4 comments:

leblancmeneses said...

I've found some interesting reads here. thanks!


looks like your final Sentence production rule isn't defined as your final example requires it to be.

using: (my own Language Workbench)
http://www.robusthaven.com/products/Parsing+Expression+Grammar


here are my peg rules:
Space: [ ]+;

(?<Determiners>): 'a'\i / 'the'\i / 'one'\i ;
(?<Noun>): 'boy'\i / 'girl'\i / 'dog'\i / 'school'\i / 'hair'\i ;
(?<Verb>): 'likes'\i / 'bites'\i / 'eats'\i / 'discuss'\i ;
(?<Preposition>): 'on'\i / 'in'\i / 'from'\i / 'with'\i ;
(?<Adjectives>): 'happy'\i / 'lucky'\i / 'tall'\i / 'red'\i / 'nerdy'\i ;

(?<PrepositionalPhrase>): Preposition Space NounPhrase ;
(?<NounPhrase>): (Determiners Space)? (Adjectives Space)* Noun (Space PrepositionalPhrase)?;
(?<VerbPhrase>): Verb Space NounPhrase ;
(?<Sentence>): NounPhrase (Space VerbPhrase)? '.' ;

(?<Main>): Sentence* !.;
my unit tests are:
1) The boy likes the girl.
2) The nerdy boy likes the girl.
3) The nerdy boy likes the girl from school with red hair.
4) The girl from school with red hair.


currently 1-3 pass my tests.
4 fails.

by changing
VerbPhrase to be optional

4 provides an ast also.

is there an update missing to S?
maybe it should be...
syntax S = NP VP? ".";


by the way can i post this as a project .zip file with user contributed examples downloadable from my product page?

thanks,

lm

Torkel Ödegaard said...

Ok, thanks.

You can do whatever you like :)

Brian said...

If you had continued, you would likely also have found that strict rule-based systems are essentially useless in the modern day. Many (dare I say most) English speakers no longer stick to the 'rules' of the language, and this throws a monkey wrench in the works. Of course you can continue to add rule after rule to encompass all the exceptions, but this is error-prone and can cause the system to become unwieldy. A better approach would utilize example-based or 'fuzzy' systems that are more tolerant of (or at least better represent) such variations between formal language specification and actual usage.

Torkel Ödegaard said...

@Brian

Interesting, I wonder how Word implements its grammar checker. I would also suspect that they use a more fuzzy logic approach.