Natural Language Generation from First-Order Expressions 1




               Natural Language Generation from First-Order Expressions
                                                   Thomas Mathew

                                              tam52@georgetown.edu
                                              Department of Linguistics
                                               Georgetown University,
                                             Washington DC 20057-1232

                                                     May 09 2008

         Abstract: In this paper I discuss an approach for generating natural language from a first-order
         logic representation. The approach shows that a grammar definition for the natural language and a
         lambda calculus based semantic rule collection can be applied for bi-directional translation using
         an overgenerate-and-prune mechanism.

         Keywords:     First-Order Logic, Natural Language Generation, Reversible Grammar/Semantic
         Rules


1. Introduction
First-order logic has been applied in natural language processing to question-answering systems, which exploit a
mathematical representation of a domain against which independent expressions can be validated or discredited
using mathematical operations. A second functional area, of greater interest from the perspective of the problem I
am trying to solve, is the use of a first-order representation as an intermediate abstract representation in a language
‘transducer’ which converts one natural language representation into another. Application areas include Machine
Translation, Text Summarization and Text Simplification. Such a transducer would depend on bi-directional
translation between natural language and first-order logic. The next section glosses over the translation of language
to first-order logic with the intent of introducing the reader to the grammar and semantic rules that can be utilized
for such a translation. The rest of the paper discusses how the same grammar and rules can be applied to solve the
reverse problem of translating first-order logic into natural language.

The concept of using bi-directional grammars is not a new one - Strzalkowski et al 1994 discuss in depth
grammars that can perform bi-directional translation between natural language and a semantic abstraction. One
aspect discussed is maintainability: the effort of maintaining one grammar and ruleset versus that of maintaining
two independent grammars and rulesets. Strzalkowski classifies three approaches to using a bi-directional
grammar – (a) a parser and generator using different evaluation strategies but sharing a grammar (b) a parser and
generator sharing an evaluation strategy but developed as independent programs (c) a single program that can
perform as both a parser and a generator. This paper suggests an approach that falls into category (a).

With regard to similar work – (a) Whitney 1988 discusses a method using an independent grammar for translating
higher-order semantic representations into language using first-order predicate calculus (b) Dymetman et al 1988
discuss using a bi-directional grammar for machine translation via an intermediate higher-order semantic
representation.

2. First-Order Logic

First-order logic is a representation system utilizing an unambiguous formal language based on mathematical
structures. The language used allows constructions covered by propositional logic extended with predicates and
quantification. Such a representation can be used to symbolize the semantic content in a natural language
construction. Consider:

         (1)      John loves Mary




Sentence (1) can be represented in a first-order form as LOVE(JOHN, MARY) where LOVE is a predicate which
defines a relation between its arguments JOHN and MARY. Methods for deriving a first-order form from English
have been explored in depth. Blackburn & Bos 2003 describe the use of lambda calculus as one such method, which
involves annotating the grammatical rules that model the language with appropriate lambda expressions. This is
shown below for sentence (1):

          Grammar Rule                                λ-Expression                         Predicate           Argument
                                                                                           Generated           Generated
 S→ NP VP                               NP@VP                                          -                    -
 NP → Nproper                           Nproper                                        -                    -
 Nproper → John                         λp.p@JOHN                                      -                    JOHN
 Nproper → Mary                         λp.p@MARY                                      -                    MARY
 VP → Vtransitive NP                    Vtransitive@NP                                 -                    -
 Vtransitive → love(s)                  λpλq.(p@λr.LOVE(q, r))                         LOVE                 -

                         Table 1: Sample rules to generate first-order representation from English

By parsing the sentence using the grammar rules and appropriately evaluating and β-reducing the corresponding
lambda expressions (from Blackburn & Bos 2003), the equivalent first-order logic form can be derived. The β-
reduction steps are shown in Figure 1.




                                   Figure 1: Deriving a first-order logic representation
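The β-reduction of Figure 1 can be imitated directly with function application. The sketch below is illustrative Python (the paper's own implementation is in C#); it encodes the λ-expressions from Table 1 as Python lambdas, so that `@` becomes ordinary function application and β-reduction is performed by the interpreter:

```python
# λ-annotations from Table 1, encoded as Python lambdas; '@' becomes
# ordinary function application and β-reduction happens automatically.
john = lambda p: p("JOHN")                                   # Nproper -> John
mary = lambda p: p("MARY")                                   # Nproper -> Mary
loves = lambda p: lambda q: p(lambda r: f"LOVE({q}, {r})")   # Vtransitive -> love(s)

vp = loves(mary)   # VP -> Vtransitive NP  :  Vtransitive@NP
s = john(vp)       # S  -> NP VP           :  NP@VP
print(s)           # LOVE(JOHN, MARY)
```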

The discussion above is an over-simplification of the process of generating first-order logic – there are many
issues in natural language relating to ambiguity and actual language specification that make this a hard task. For
the task at hand, however, I am more concerned with the reverse translation, and for this reason I assume that
grammatical rules defining a natural language, annotated with lambda expressions, suffice to generate a sufficiently
descriptive first-order translation.

Predicate Types: A predicate in a first-order logic expression can be any of three types – relational, descriptive or
functional. A relational predicate describes the relationship between two or more entities. An example of a relational
predicate is LOVE(JOHN, MARY). A relational predicate would generally be derived from a transitive/di-transitive
verb, a dependent clause or a prepositional phrase.



A descriptive predicate describes a property of its only argument and would generally be derived from either a noun,
adjective or an intransitive verb. This is shown in the examples below:

        (2)      BOXER(MARY) ≡ Mary is a boxer
        (3)      BOXER(MARY) ∧ WOMAN(MARY) ≡ Mary is a boxer and a woman

A functional predicate is used to describe an underspecified entity using a fully-described entity e.g.
MOTHER(JOHN) which can be derived from genitives such as John’s mother or NP constructions such as
Mother of John. A functional predicate can be used as an argument to a relational or descriptive predicate e.g.
LOVE(JOHN, MOTHER(JOHN)). Unlike the other predicate types, a functional predicate does not establish a
truth condition.
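The three predicate types can be given a toy model-theoretic reading: relational and descriptive predicates carry truth conditions, while a functional predicate merely names an entity. A minimal Python sketch (the constant SUE and the helper names are assumptions for illustration, not from the paper):

```python
# Toy model: descriptive predicates as sets of entities, relational
# predicates as sets of tuples, functional predicates as entity mappings.
model = {
    "BOXER": {"MARY"},
    "WOMAN": {"MARY"},
    "LOVE": {("JOHN", "MARY")},
    "MOTHER": {"JOHN": "SUE"},   # MOTHER(JOHN) denotes SUE (assumed constant)
}

def denote(term):
    """Resolve a constant, or a functional predicate, to an entity."""
    if isinstance(term, tuple):            # e.g. ("MOTHER", "JOHN")
        func, arg = term
        return model[func][denote(arg)]
    return term

def holds(pred, *args):
    """Truth condition for a relational or descriptive predicate."""
    vals = tuple(denote(a) for a in args)
    return (vals if len(vals) > 1 else vals[0]) in model[pred]

print(holds("BOXER", "MARY"))                      # True
print(holds("LOVE", "JOHN", ("MOTHER", "JOHN")))   # False
```

Note that MOTHER itself is never tested for truth; it only resolves to an entity, matching the observation that functional predicates do not establish a truth condition.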




                                             Figure 2: Predicate Types

Argument Types: Arguments to a predicate can be constants, variables or a functional predicate. A constant
argument specifies a named entity such as JOHN or MARY. If a variable is used as an argument to a predicate, it
would be quantified in some form to represent either an existential or universal condition as shown in (4).

        (4)      ∃x (BOXER(x) ∧ WOMAN(x))

3. Natural Language Generation

Given that the first-order representation is unambiguous, if a natural language sentence S can be transformed into a
logical expression E without loss of meaning, then it should be possible to transform E back to a set of sentences of
which S is a member. For machine translation problems, it may suffice to select the first valid language sentence
that can be mapped to the expression. For other problems such as text simplification, the language generator may
need to generate multiple sentences and apply a ranking method to determine the simplest construction.

Each component of a first-order representation creates different complexities to address in the language generation
process. The following is a limited discussion of some issues that make the reverse translation a hard problem. The
discussion is based on the notion that a component of an expression can be considered to be associated to a token in
the language representation which has a specific part-of-speech.

Predicates: A specific predicate can be associated with tokens of different part-of-speech tags e.g. the descriptive
predicate WOMAN() can be realized from either a noun (a woman) or an adjective (lady, as in a lady boxer). In
language generation, the selection of one specific token with a particular part-of-speech influences the selection of
tokens associated with other descriptive predicates that control the same argument e.g. two nouns describing the
same entity in a transitive construction are unlikely.



An expression may have multiple instances of the same predicate. In a corresponding language representation, it is
possible that all instances of the same underlying predicate are associated with a single language token, as shown
below. This would generally involve conjunction in the language.

         (5)      LAUGH(JOHN) ∧ LAUGH(MARY)                          ≡ John and Mary laugh
                                                                     ≡ John laughs and Mary laughs
Functional predicates can be associated to either genitives or prepositions and may need to be treated independently
of non-functional predicates.

Arguments: An argument is likely to be referenced by more than one predicate in a sentence. It is also likely that
all such instances of the same argument would be associated back to a single token. This is not always the case,
though, as conjunction and pronoun instances could result in multiple tokens referencing the same entity.

         (6)      LOVE(JOHN, MARY) ∧ LOVE(JOHN, DOG(JOHN)) ≡ John loves Mary and John loves
                                                             his dog

For the purpose of this paper, resolution of arguments to pronouns is considered out of scope and it is considered
sufficient to render the translation of the above expression as John loves Mary and John loves John’s dog. This
is because the anaphor resolution of the pronoun his cannot be handled by the approach described in the previous
section. As an alternative, anaphor resolution can be considered as a pre-processing step in first-order expression
generation and the generation of pronouns as a post-processing step in language generation.

Operators: The first-order language supports the set of operators {∧, ∨, →, =, ¬}. An operator can be directly
related back to a token in the language representation or it could also be implied by the ordering of tokens. Specific
operators are described below:

The conjunction operator ∧ can be directly associated to the conjunction token and. It could also represent the
relationship between a verb and a noun managed by a determiner such as a or some (7) or an adjective+noun (8).

         (7)      ∃x (BARK(x) ∧ DOG(x)) ≡ a dog barks
         (8)      ∃x (BARK(x) ∧ GREEN(x) ∧ DOG(x)) ≡ a green dog barks

The disjunction operator ∨ can be directly associated to the token or.

The implication operator → can be directly associated to tokens such as if, whenever. It could also represent the
relationship between a verb and a noun managed by a determiner such as every (9) or an adjective+noun (10) or a
copular construction.

         (9)      ∀x (DOG(x) → BARK(x)) ≡ every dog barks
         (10)     ∀x (EGG(x) ∧ GREEN(x) → LIKE(SAM, x)) ≡ Sam likes green eggs

The identity operator can also be represented as a predicate using the form EQUAL(x, y). Identity can arise from
an adjective.

The negation operator ¬ can be directly associated with the adverb token not. It could also represent the
relationship between a verb and a noun mediated by a numeral such as two:

         (11)     ∃x (DOG(x) ∧ BARK(x) ∧ ∃y (DOG(y) ∧ BARK(y) ∧ ¬(x = y))) ≡ two dogs bark

The negation operator can influence the association between other operators to tokens. In the example below the
operator ∧ is associated to the token or because of the negation operation.

         (12)     (¬LIKE(SAM, EGG) ∧ ¬LIKE(SAM, HAM)) ≡ Sam does not like eggs or ham




Quantifiers: These include ∃ and ∀. Quantifiers can be associated to both determiners and adjectives. These are
shown in the above examples.

Parentheses: These affect the scoping of a predicate or operator.

The complexity of resolving the generation issues listed above is considerable. Given the structure of the
first-order representation, the approach described in the previous section of using lambda calculus to translate
English to first-order logic is not feasible for language generation. It would still be ideal, however, to utilize the
rules that are used to derive first-order logic to guide the reverse translation.

One possible solution is to apply an overgenerate-and-prune strategy. This approach has been successfully applied
to natural language problems including language generation. It involves weakening the restrictions to allow for
overgeneration of language. Selectional restrictions can then be imposed to weed out bad translations. This
approach is described in the following sections.

4. Overgeneration approach
Consider the simple grammar below annotated with lambda expressions:

Non-terminal rules:
         Grammar Rule                                 λ-Expression                        Predicate            Argument
                                                                                          Generated            Generated
 S → NP VP                               NP@VP                                        -                    -
 VP → Vtransitive NP                     Vtransitive@NP                               -                    -
 VP → Vintransitive                      Vintransitive                                -                    -
 NP → Nproper                            Nproper                                      -                    -
 NP → D N                                D@N                                          -                    -
 NP → D N C                              (D@N)@C                                      -                    -
 NP → D AP                               D@AP                                         -                    -
 AP → A N                                λp.(A@p ∧ N@p)                               -                    -
 C → RP Vintransitive                    Vintransitive                                -                    -

Terminal rules:
          Grammar Rule                                λ-Expression                      Predicate              Argument
                                                                                        Generated              Generated
 Vtransitive → love(s)                   λpλq.(p@λr.LOVE(q, r))                       LOVE                 -
 Vintransitive → boxes                   λp.BOXER(p)                                  BOXER                -
 Nproper → John                          λp.p@JOHN                                    -                    JOHN
 A → lady                                λp.WOMAN(p)                                  WOMAN                -
 N → woman                               λp.WOMAN(p)                                  WOMAN                -
 A → boxing                              λp.BOXER(p)                                  BOXER                -
 N → boxer                               λp.BOXER(p)                                  BOXER                -
 RP → who                                −                                            -                    -
 D → a                                   λpλq.∃x (p@r ∧ q@r)                          -                    -
 D → some                                λpλq.∃x (p@r ∧ q@r)                          -                    -

Consider the sentences below, all of which can be derived from the rules above:

         (13)     John loves a lady boxer
         (14)     ?John loves a boxing woman
         (15)     John loves some woman who boxes



All the sentences can be associated with a first-order representation of the form:

         (16)     ∃x (LOVE(JOHN, x) ∧ BOXER(x) ∧ WOMAN(x))

Our goal is to determine a set of rules which would allow a direct derivation of (13), (14) and (15) from (16) using
the annotated grammar as a reference. It is straightforward to parse the expression and determine all predicates,
quantifiers, constant arguments and operators. Each such component must be directly derived from a terminal rule
(note that a single terminal rule can produce more than one first-order component). A set of candidate terminal rules
can be determined by looking up rules whose lambda expressions contain components found in the first order
expression. Starting from such a collection of candidate rules, all possible paths using the set of available non-
terminal grammar rules and terminal rules without an associated lambda expression can be navigated bottom-up till
a rule is reached whose LHS cannot be produced by any other rule. If any of the selected candidate rules have not
been used, the specific navigation path shall be marked as a failure. The mapping process between the first-order
components and the terminal rules is what determines the domain of sentences that get generated. This mapping is
not necessarily one-to-one – many first-order components may be associated with a single terminal rule - consider
example (5) above. This approach overgenerates since there is no selectional restriction enforced between realization
of predicates and their arguments.

The approach is explained in more detail below. A bold symbol X indicates a multiset of element instances, and the
corresponding plain symbol indicates the set of distinct elements. #X indicates the number of elements within the
set X. R is the set of rules in the grammar. It is assumed that the lambda expression for a rule can contain at most
one predicate or one argument.


          Extract the set of all predicate instances P and distinct predicate functions P from E.

            E.g. for the expression E = Z(x)∧Z(y), P = {Z, Z} and P = {Z}

          For every p∈P, if there are n instances of p in P, determine the set of all possible rulesets RSp
          where for each possible ruleset Rp∈RSp if r∈Rp then:
           - r ∈ {x∈R | λ-expression(x) contains p}
           - #Rp ≤ n

            E.g. if P = {Z, Z} and P = {Z} and if there are 2 rules r1 and r2 whose lambda-expression
            contains Z then RSp = {{r1}, {r2}, {r1, r1}, {r1, r2}, {r2, r2}}
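The enumeration of RSp in the step above can be sketched as follows (illustrative Python; the paper's implementation is in C#). Every multiset of size 1 to n drawn from the rules matching predicate p is a candidate ruleset:

```python
from itertools import combinations_with_replacement

def candidate_rulesets(matching_rules, n):
    """RSp: all multisets (size 1..n) drawn from the rules whose
    λ-expression contains the predicate p, with n instances of p in E."""
    out = []
    for k in range(1, n + 1):
        out.extend(combinations_with_replacement(matching_rules, k))
    return out

# Example from the text: P = {Z, Z} and rules r1, r2 both contain Z
print(candidate_rulesets(["r1", "r2"], 2))
# [('r1',), ('r2',), ('r1', 'r1'), ('r1', 'r2'), ('r2', 'r2')]
```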

          Determine the set of all possible rulesets RSpred where for each possible ruleset Rpred∈RSpred,
          Rpred = ∪p∈P Rp where Rp∈RSp.

            E.g. if P = {Y, Z} and RSp=Y = {{r1}, {r1, r1}} and RSp=Z = {{r2}, {r3}} then RSpred = {{r1,
            r2}, {r1, r3}, {r1, r1, r2}, {r1, r1, r3}}
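Combining the per-predicate rulesets into RSpred is a cross-product followed by a union. A sketch under the same illustrative encoding (tuples stand in for multisets of rules):

```python
from itertools import product

def combine(rulesets_per_predicate):
    """RSpred: choose one candidate ruleset per distinct predicate and
    union the choices (tuple concatenation preserves multiplicity)."""
    return [sum(choice, ()) for choice in product(*rulesets_per_predicate)]

# Example from the text: RSp=Y = {{r1}, {r1, r1}}, RSp=Z = {{r2}, {r3}}
rs_y = [("r1",), ("r1", "r1")]
rs_z = [("r2",), ("r3",)]
print(combine([rs_y, rs_z]))
# [('r1', 'r2'), ('r1', 'r3'), ('r1', 'r1', 'r2'), ('r1', 'r1', 'r3')]
```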

          Extract the set of all constant argument instances A and distinct constant arguments A from E.

            E.g. for the expression Z(x)∧Y(x), A = {x, x} and A = {x}

          For every a∈A, if there are n instances of a in A, determine the set of all possible rulesets RSa
          where for each possible ruleset Ra∈RSa if r∈Ra then:
           - r ∈ {x∈R | λ-expression(x) contains a}
           - #Ra ≤ n

          Determine the set of all possible rulesets RSarg where for each possible ruleset Rarg∈RSarg,
          Rarg = ∪a∈A Ra where Ra∈RSa
Extract the set of all quantifier instances Q from E

  E.g. for the expression ∃x (Z(x)∧Y(x)), Q = {∃}

For every q∈Q, determine the set of all possible rulesets RSquant where for each possible ruleset
Rquant∈RSquant if r∈Rquant then:
 - r ∈ {x∈R | λ-expression(x) contains q ∧ ¬λ-expression(x) contains a predicate}
 - #Rquant ≤ #Q

  E.g. if Q = {∃, ∀, ∀} and if lambda-expression of r1 contains ∃ and that of r2 and r3
  contains ∀ then RSquant = {{r1, r2, r2}, {r1, r2, r3}, {r1, r3, r3}}

Extract the set of all operator instances O from E

  E.g. for the expression ∃x (Z(x)∧Y(x)), O = {∧}

For every o∈O, determine the set of all possible rulesets RSoper where for each possible ruleset
Roper∈RSoper if r∈Roper then:
 - r ∈ {x∈R | λ-expression(x) contains o ∧ ¬λ-expression(x) contains a predicate/
     quantifier}
 - #Roper ≤ #O

  E.g. if O = {∧, ∧} and if lambda-expression of r1 and r2 contain ∧ then RSoper = {{r1, r1},
  {r1, r2}, {r2, r2}}

Determine the set of all possible rulesets RS where Rcandidate = Rpred∪Rarg∪Rquant∪Roper, with
Rcandidate∈RS, Rpred∈RSpred, Rarg∈RSarg, Rquant∈RSquant, Roper∈{∅}∪RSoper

  E.g. if E = ∃x (LOVE(JOHN, x)∧WOMAN(x)) and
  r1 = [Vtransitive → love(s)],
  r2 = [Nproper → John],
  r3 = [D → a],
  r4 = [D → some],
  r5 = [N → woman],
  r6 = [CONJ → and],
  then RSpred = {{r1, r5}}, RSarg = {{r2}}, RSquant = {{r3}, {r4}}, RSoper = {{r6}}
  and RS = {{r1, r5, r2, r3}, {r1, r5, r2, r4}, {r1, r5, r2, r3, r6}, {r1, r5, r2, r4, r6}}

For each Rcandidate∈RS, find all paths to a non-terminal rule with an un-referenced LHS (i.e. S)
such that
 - The path has to use every element of Rcandidate one time
 - The path can use any non-terminal rule any number of times
 - The path can use any terminal rule without a λ−expression any number of times

  E.g. using the previous example one possible Rcandidate = {r1, r5, r2, r3} would generate
  John love(s) a woman and a woman love(s) John
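The path search over a candidate ruleset can be sketched end-to-end on a pared-down version of the grammar of section 4. The sketch below is illustrative Python (the paper's implementation is in C#); it enumerates derivations top-down for brevity, which is equivalent to the bottom-up navigation described above, and keeps only derivations that consume every candidate terminal rule exactly once:

```python
from collections import Counter

NONTERM = {                      # non-terminal rules (no λ-expressions needed)
    "S": [["NP", "VP"]],
    "VP": [["Vt", "NP"]],
    "NP": [["Nproper"], ["D", "N"]],
}
TERM = {                         # candidate terminal rules in Rcandidate
    "r1": ("Vt", "love(s)"),
    "r2": ("Nproper", "John"),
    "r3": ("D", "a"),
    "r5": ("N", "woman"),
}

def expand(symbols, remaining):
    """Yield (tokens, leftover) for every derivation of the symbol list,
    consuming candidate terminal rules from the 'remaining' multiset."""
    if not symbols:
        yield [], remaining
        return
    head, rest = symbols[0], symbols[1:]
    if head in NONTERM:
        for rhs in NONTERM[head]:
            yield from expand(rhs + rest, remaining)
    else:
        for r, (lhs, word) in TERM.items():
            if lhs == head and remaining[r] > 0:
                for toks, left in expand(rest, remaining - Counter([r])):
                    yield [word] + toks, left

candidates = Counter({"r1": 1, "r2": 1, "r3": 1, "r5": 1})
sentences = {" ".join(t) for t, left in expand(["S"], candidates) if not left}
print(sorted(sentences))
# ['John love(s) a woman', 'a woman love(s) John']
```

As in the worked example, the candidate ruleset {r1, r5, r2, r3} yields both John love(s) a woman and a woman love(s) John, demonstrating the overgeneration that the pruning process must later resolve.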




Figure 3: Overgeneration process



Each path identified represents a candidate natural language sentence. This approach will produce the sentences
(13), (14) and (15), amongst others, when applied to the expression (16). It will satisfy the requirement that if S can
be translated to E using a set of rules, then E can be translated back to S.

The results for the first-order expression in (16) are:

         Input First-order Expression: ∃x (LOVE(JOHN, x) ∧ BOXER(x) ∧ WOMAN(x))

         Overgenerated Translations                         Valid
         John love(s) some lady boxer                         •
         some lady boxer love(s) John
         John love(s) a lady boxer                            •
         a lady boxer love(s) John
         John love(s) some boxing woman                       •
         some boxing woman love(s) John
         John love(s) a boxing woman                          •
         a boxing woman love(s) John
         some woman who boxes love(s) John
         John love(s) some woman who boxes                    •
         a woman who boxes love(s) John
         John love(s) a woman who boxes                       •

As mentioned earlier there is no context applied while processing the grammar. As a result the output will include
unwanted results such as:

         Incorrect NP assignment: both John loves a lady boxer and A lady boxer loves John are outputs

         Incorrect quantifier assignment: The expression ∃x (MAN(x) ∧ ∀y (WOMAN(y) → LOVE(x, y))) will
         generate both a man loves every woman and every man loves a woman.

The annotated grammar rules will need to include all orphan tokens. An orphan token is one that has no
association with an element in the first-order representation. This might require some duplication of terminal rules.
An example is a copular construction. Consider the example below:

         (17)      BOXER(JOHN) ≡ John is a boxer

For this approach to handle bi-directional translation of such a case, there will need to be two logical
representations for the rule D → a. This may result in ambiguity in generating first-order representations.

5. Pruning process
The overgeneration step will create well-formed language sentences – however some of the output will not be a
good translation for the first-order expression. The pruning process applies a filter to drop out such bad translations.

Option 1
One solution to weed out bad results is to regenerate the first-order representation from each candidate sentence
and compare it with the original expression E to verify that the translation is valid. This does not require the
development of any new translation rules, as the lambda calculus methodology described in section 2 suffices.

The complexity in this approach arises from determining if two first-order representations are equivalent. This is due
to the nature of mathematical representation as shown below.

         (18)      ∃x (BOXER(x) ∧ WOMAN(x)) ≡ ∃x (WOMAN(x) ∧ BOXER(x))

This means that a literal equivalence test cannot be applied – instead, mathematical equivalence must be
established. One method of doing this is to use the automated theorem prover Otter.
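Short of a full theorem prover such as Otter, a weak equivalence test that handles the commutativity in (18) can be obtained by canonicalizing expressions before a literal comparison. A minimal sketch, assuming expressions are encoded as nested tuples (this encoding is an assumption for illustration, not the paper's):

```python
def canonical(expr):
    """Sort the operands of commutative connectives so that ∧/∨
    reorderings compare equal; this is NOT full logical equivalence."""
    if not isinstance(expr, tuple):
        return expr
    head, *rest = expr
    rest = [canonical(e) for e in rest]
    if head in ("and", "or"):
        rest = sorted(rest, key=repr)
    return (head, *rest)

# (18): ∃x (BOXER(x) ∧ WOMAN(x)) ≡ ∃x (WOMAN(x) ∧ BOXER(x))
e1 = ("exists", "x", ("and", ("BOXER", "x"), ("WOMAN", "x")))
e2 = ("exists", "x", ("and", ("WOMAN", "x"), ("BOXER", "x")))
print(canonical(e1) == canonical(e2))   # True
```

This catches reordering of conjuncts but not deeper equivalences (e.g. De Morgan rewritings), for which a theorem prover remains necessary.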




This method becomes computationally intensive when there are many candidate sentences to evaluate. On the
flip-side, it is guaranteed to produce perfect pruning results, provided a reliable system for equating logical
expressions can be developed.

Option 2
So far the discussion has stayed away from unraveling a first-order representation. A solution that does so is
described below. This approach focuses on multi-argument predicates to validate the generated language. If a
sentence does not contain at least one multi-argument predicate (as in copular constructions), the algorithm does
not prune.

In English, a multi-argument predicate can generally be mapped to a verb. The first argument of the predicate is
generally realized to the left of the element that realizes the predicate. The exception to this is a passive construction.
The ordering of the tokens that realize the arguments and predicate can be determined by the order of the lambda
variables with respect to the predicate arguments. This is shown in the expressions below:



         Active form        ⇒ λpλq.(p@λr.LOVE(q, r))


         Passive form       ⇒ λpλq.(p@λr.LOVE(r, q))

In a di-transitive case, the active construction check is the same as that for the transitive case. In the passive form, it
needs to be verified that the first argument to the predicate is realized to the right of the element that realizes the
predicate. This is shown in the expressions below:



         Active form        ⇒ λpλqλr.(p@q@λsλt.GIVE(r, s, t))


         Passive form       ⇒ λpλqλr.(p@q@λsλt.GIVE(s, r, t))
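The active/passive distinction above can be read mechanically off the λ-expression: the form is active exactly when the predicate's first argument variable is the earliest-bound of its argument variables. A heuristic sketch (the string format of the λ-expressions and the regular expressions are assumptions for illustration):

```python
import re

def is_active(lambda_expr, pred):
    """True iff the first argument of pred is the earliest-bound of its
    argument variables, i.e. the rule encodes the active form."""
    binders = re.findall(r"λ(\w)", lambda_expr)                  # binding order
    args_src = re.search(pred + r"\(([^)]*)\)", lambda_expr).group(1)
    args = [a.strip() for a in args_src.split(",")]
    return args[0] == min(args, key=binders.index)

print(is_active("λpλq.(p@λr.LOVE(q, r))", "LOVE"))            # True  (active)
print(is_active("λpλq.(p@λr.LOVE(r, q))", "LOVE"))            # False (passive)
print(is_active("λpλqλr.(p@q@λsλt.GIVE(s, r, t))", "GIVE"))   # False (passive)
```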


A simple rule can be based on tracking the orientation of all argument realizations for a predicate with respect to
the realization of the predicate. Note that this rule is very specific to an SVO language – but it can conceivably be
adapted to other structures. It remains to be determined whether the lambda expression can be processed so as to
yield a more generic rule.

To enforce such an ordering rule, there has to be traceability between (a) the arguments and their realization (b) the
predicates and their realization. This is generally straightforward to do (the sentence generation rules can be
modified to keep a trace) for sentences where there is a 1:1 association between predicate+argument components
and rules. If there is no direct alignment, one possible scenario is multiple predicate instances all mapped to one rule
(see (5)) – in which case a trace can also be maintained with effort. For any greater combination, such as three
predicate instances mapped to two rule instances, creating a trace is more difficult as multiple combinations may
need to be evaluated – such combinations may be unlikely.

Two special cases for argument realization are: (a) Variable Arguments: The sentence generation rules must be
substantially modified to allow for tracing of variables to their realizations. If a variable is quantified, the
realization of the quantifier can be used to assess the order. The case of unquantified variables has not been
evaluated. (b) Functional Predicates: In this case, the argument realization can be mapped to the realization of the
functional predicate.

The generation rules will now include the following modified and new steps in order to maintain a trace:




Determine a map between each instance of p and an element r∈Rpred

  E.g. if P = {Z, Z, Y} and P = {Z, Y} and if there are 2 rules r1, r2 whose λ-expression
  contains Z and 1 rule r3 whose λ-expression contains Y then

      RSpred = {{r1, r3}, {r2, r3}, {r1, r1, r3}, {r1, r2, r3}, {r2, r2, r3}}

      map = {{{1,2}, {3}}, {{1,2}, {3}}, {{1}, {2}, {3}}, {{1}, {2}, {3}}, {{1}, {2}, {3}}}

  where 1, 2, 3 refer to the indices of the elements in P = {Z, Z, Y}

For each predicate rule p∈P, identify the arguments to the predicate:
 - if the argument is a constant argument, link p to a rule rarg∈Rarg∈RSarg where rarg contains an
     argument for p
 - if the argument is a variable argument, link p to a rule rquant∈Rquant∈RSquant where rquant contains
     the quantifier scope for the argument of p

  E.g. if E = ∃x (LOVE(JOHN, x)∧WOMAN(x))
  r1 = [Vtransitive → love(s)],
  r2 = [Nproper → John],
  r3 = [D → a],
  r4 = [D → some],
  r5 = [N → woman],
  then RSpred = {{r1, r5}}, RSarg = {{r2}}, RSquant = {{r3}, {r4}}
  and RS = {{r1, r5, r2, r3}, {r1, r5, r2, r4}}



For each Rcandidate∈RS, find all paths to a non-terminal rule with an un-referenced LHS (i.e. S) such
that
 - The path has to use every element of Rcandidate one time
 - The path can use any non-terminal rule any number of times
 - The path can use any terminal rule without a λ−expression any number of times
 - For every p∈P determine the ordering of arguments with respect to the predicate in the path using
     the map to resolve how the predicate was realized in the generated sentence and using the links
     to determine how the arguments were realized

  E.g. using the previous example for one possible Rcandidate = {r1, r5, r2, r3} then
  P = {LOVE, WOMAN}, RSpred = {{r1, r5}}, map = {{{1}, {2}}}

  The trace for generated sentences is shown below:
  John love(s) a woman
    r2      r1     r3       r5

            a    woman      love(s)    John
            r3     r5         r1         r2


The rules to handle pruning are as follows:


          Extract the set of all predicate instances with more than one argument Pmultiargs from E

            E.g. for the expression E = ∃x (LOVE(JOHN, x)∧WOMAN(x)), Pmultiargs = {LOVE}

          For each p∈Pmultiargs determine whether the first argument should be realized to the left or right of the
          token realizing the predicate, based on the order of the lambda variables corresponding to the arguments
          in the lambda expression for the rule associated with p

            E.g. for the example above, if r1 = [Vtransitive → love(s)] : λpλq.(p@λr.LOVE(q, r)),
            then q, the first argument, is bound by λq, which precedes λr, the binder of the
            second argument.

            For the predicate LOVE, the first argument is therefore expected to the left of the
            token that realizes the predicate. The expected sentence order is rarg1 rpred rarg2.

          For each path, discard the path if the argument ordering of any predicate is incorrect

            E.g. for E = ∃x (LOVE(JOHN, x)∧WOMAN(x)), the predicate LOVE, and the candidate
            rule set {r1, r5, r2, r3}, the sentence order must be r2 r1 r3.

            This marks the sentence 'a woman love(s) John' as invalid
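The ordering check itself reduces to comparing token positions in the realized sentence. A minimal sketch, assuming each rule is realized exactly once (names are mine, not from the paper's code):

```python
def argument_order_ok(realized, pred_rule, first_arg_rule, second_arg_rule):
    """For a two-argument predicate expecting rarg1 rpred rarg2, require the
    first-argument rule before the predicate rule before the second-argument rule."""
    i = realized.index(first_arg_rule)   # position of the first argument
    j = realized.index(pred_rule)        # position of the predicate token
    k = realized.index(second_arg_rule)  # position of the second argument
    return i < j < k

# 'John love(s) a woman' realizes the rules in the order r2 r1 r3 r5: valid
assert argument_order_ok(["r2", "r1", "r3", "r5"], "r1", "r2", "r3")
# 'a woman love(s) John' realizes r3 r5 r1 r2: invalid, pruned
assert not argument_order_ok(["r3", "r5", "r1", "r2"], "r1", "r2", "r3")
```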


A second issue is that the rules above do not cover agreement restrictions: the output will include John
love a lady boxer. This limitation cannot be addressed with the rules in their current state. To solve it, the
rules would need to be annotated with agreement restrictions that can be evaluated as the paths are navigated.

6. Implementation

I implemented this system in C# on the .NET framework.
The classes used by this model are:

Rule: Represents a grammar rule annotated with a lambda expression. The class provides methods for
parsing a rule into its LHS and RHS components, and inherits from the generic Expression class.

FOExpression: Represents a first-order expression; inherits from the generic Expression class. The
method ProcessExpression() accepts a grammar in the form of a RuleCollection and determines the
candidate terminal rules that can be associated with the expression. For each set of terminal rules,
the method calls RuleCollection.GetRules() to determine whether the set can generate a language
sentence. The method uses recursion internally to determine all possible sets of terminal rules.

Expression: Provides a representation for a mathematical expression with quantifiers, constant
arguments, predicates and operators.

RuleCollection: Represents a collection of rules, i.e. the full grammar of the natural language. The
method GetRules() determines all valid paths through the grammar given a list of terminal rules. The
method uses recursion internally to navigate the grammar rules.


7. Cost and Optimization

A valid concern with this approach is cost. The cost implications come from two sources: the complexity and
scale of the grammar, and the complexity of the expression being evaluated.

The first issue is of lesser concern, since in a near-complete grammar of English the large vocabulary means
that terminal rules will far outnumber non-terminal rules. The algorithm discussed selects terminal rules
early on and hence avoids the performance hit of navigating a huge grammar while generating a sentence.

The second issue has a larger impact, because the algorithm searches for every valid candidate: consider,
for example, that there are n! permutations when fitting n noun phrases into a sentence with n noun-phrase
place-holders. That said, not every application requires every valid language translation of the processed
expression; in that case it may suffice to run the pruning check immediately on each generated sentence,
and a successful pass through the pruning rules for one sentence could stop the entire algorithm from
processing any further.
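That early-exit strategy amounts to short-circuiting the search as soon as one sentence survives pruning. An illustrative sketch (the function names and stub generator below are hypothetical):

```python
def first_valid_sentence(candidates, generate_sentences, passes_pruning):
    """Return the first generated sentence that survives pruning and stop,
    instead of enumerating every valid translation."""
    for cand in candidates:
        for sent in generate_sentences(cand):
            if passes_pruning(sent):
                return sent
    return None

# Toy stand-ins: record which candidates actually get expanded.
calls = []
def gen(cand):
    calls.append(cand)
    return ["sentence-from-" + cand]

result = first_valid_sentence(["c1", "c2"], gen, lambda s: s.endswith("c1"))
# the first candidate yields a valid sentence, so c2 is never expanded
```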

8. Conclusion

I have defined an approach in which a grammar and ruleset for the compositional construction of a first-order
representation from English can also be used successfully for the reverse translation problem. The approach is
robust in that very little additional information is required for the system to function. However, this
robustness comes at a computational cost which is reasonable for simple first-order expressions but which can
become considerable for lengthy discourse. This overhead may not be too detrimental for Text Summarization
and Text Simplification applications, where an ideal system can be expected to process only limited
expressions. Possible future enhancements that could make this approach more efficient are listed below:

 - Integrate pruning steps into the generator process wherever possible.

 - Apply a mix of top-down and bottom-up evaluation strategies when determining whether the selected
   ruleset is a good candidate for a well-formed sentence. This can avoid unnecessary re-evaluation of a
   frequently used rule such as S → NP VP on every pass.

 - Cache navigation paths, using the LHSs of all the candidate rules involved as the cache key. For each
   new candidate ruleset, the cache is searched first; only if no entry is found does the system navigate
   the rules. Each time the grammar changes, though, the cache must be dropped.

The current pruning system, which translates the ordering in a lambda expression into a natural-language
ordering for multi-argument predicates, has a dependency on non-terminal grammar rules that has been ignored
here. A more robust approach would process the non-terminal rules to infer the ordering. Language generation
issues relating to adverbs, adjuncts and tense have not been covered by this paper and are another area for
future review.

As for future steps, I would like to develop a text simplification system which performs the following:

    1.   Translate a complex sentence into first-order logic
    2.   Develop a processing layer that can mathematically simplify a first-order expression
    3.   Develop a processing layer that can break a first-order expression into independent expressions
    4.   Apply the approach described above to generate a natural language translation from the simplified
         expression

I would like to thank Prof. Graham Katz for providing the opportunity to write this paper and for his feedback.


Natural Language Generation from First-Order Expressions

Thomas Mathew
tam52@georgetown.edu
Department of Linguistics
Georgetown University, Washington DC 20057-1232

May 09 2008

Abstract: In this paper I discuss an approach for generating natural language from a first-order logic representation. The approach shows that a grammar definition for the natural language and a lambda-calculus-based semantic rule collection can be applied for bi-directional translation using an overgenerate-and-prune mechanism.

Keywords: First-Order Logic, Natural Language Generation, Reversible Grammar/Semantic Rules

1. Introduction

First-order logic has been applied in natural language for question-answering systems, which exploit a mathematical representation of a domain against which independent expressions can be validated or discredited using mathematical operations. A second functional area, of greater interest from the perspective of the problem I am trying to solve, is to utilize a first-order representation as an intermediate abstract representation in a language ‘transducer’ which converts one natural language representation to another. Application areas include Machine Translation, Text Summarization and Text Simplification. Such a transducer would depend on bi-directional translations between natural language and first-order logic. The next section of this paper glosses over the translation of language to first-order logic with the intent of introducing the reader to the grammar and semantic rules that can be utilized for such a translation. The rest of the paper discusses how the same grammar and rules can be applied to solve the reverse problem of translating first-order logic to natural language.
The concept of using bi-directional grammars is not a new one: Strzalkowski et al. (1994) discuss in depth grammars that can perform bi-directional translation between natural language and a semantic abstraction. One aspect discussed is maintainability, comparing the effort of maintaining one grammar and ruleset with that of maintaining two independent grammars and rulesets. Strzalkowski classifies three approaches to how a bi-directional grammar can be utilized: (a) a parser and generator using different evaluation strategies but sharing a grammar; (b) a parser and generator sharing an evaluation strategy but developed as independent programs; (c) a single program that can perform as both a parser and a generator. This paper suggests an approach that falls into category (a). With regard to similar work, (a) Whitney (1988) discusses a method using an independent grammar for translating higher-order semantic representations into language using first-order predicate calculus, and (b) Dymetman et al. (1988) discuss using a bi-directional grammar for machine translation via an intermediate higher-order semantic representation.

2. First-Order Logic

First-order logic is a representation system utilizing an unambiguous formal language based on mathematical structures. The language allows the constructions covered by propositional logic, extended with predicates and quantification. Such a representation can be used to symbolize the semantic content of a natural language construction. Consider:

(1) John loves Mary
Sentence (1) can be represented in first-order form as LOVE(JOHN, MARY), where LOVE is a predicate which defines a relation between its arguments JOHN and MARY. Methods for deriving a first-order form from English have been explored in depth. Blackburn & Bos (2003) describe the use of lambda calculus as one such method, which involves annotating the grammatical rules that model the language with an appropriate lambda expression. This is shown below for sentence (1):

Grammar Rule             λ-Expression               Predicate Generated   Argument Generated
S → NP VP                NP@VP                      -                     -
NP → Nproper             Nproper                    -                     -
Nproper → John           λp.p@JOHN                  -                     JOHN
Nproper → Mary           λp.p@MARY                  -                     MARY
VP → Vtransitive NP      Vtransitive@NP             -                     -
Vtransitive → love(s)    λpλq.(p@λr.LOVE(q, r))     LOVE                  -

Table 1: Sample rules to generate a first-order representation from English

By parsing the sentence using the grammar rules and appropriately evaluating and β-reducing the corresponding lambda expressions (from Blackburn & Bos 2003), the equivalent first-order logic form can be derived. The β-reduction steps are shown in Figure 1.

Figure 1: Deriving a first-order logic representation

The discussion above is an over-simplification of the process of generating first-order logic: there are many specific issues in natural language relating to ambiguity and actual language specification that make this a hard task. For the task at hand, however, I am more concerned with the reverse translation ability, and for this reason I assume that grammatical rules defining a natural language, annotated with lambda expressions, would suffice to generate a sufficiently descriptive first-order translation.

Predicate Types: A predicate in a first-order logic expression can be any of three types: relational, descriptive or functional. A relational predicate describes the relationship between two or more entities. An example of a relational predicate is LOVE(JOHN, MARY).
A relational predicate would generally be derived from either a transitive/ditransitive verb, a dependent clause or a prepositional phrase.
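The compositional process sketched in Table 1 and Figure 1 can be illustrated in code. In the minimal Python sketch below (an illustration only, not the paper's implementation), each λ-annotation becomes a function and the application operator @ becomes ordinary function application:

```python
# Minimal sketch: the lambda annotations from Table 1 as Python functions.
# '@' (application) is modeled as an ordinary function call.

# Nproper -> John : λp.p@JOHN
john = lambda p: p("JOHN")
# Nproper -> Mary : λp.p@MARY
mary = lambda p: p("MARY")
# Vtransitive -> love(s) : λpλq.(p@λr.LOVE(q, r))
loves = lambda p: lambda q: p(lambda r: f"LOVE({q}, {r})")

# VP -> Vtransitive NP : Vtransitive@NP
vp = loves(mary)
# S -> NP VP : NP@VP
s = john(vp)
print(s)  # LOVE(JOHN, MARY)
```

β-reduction here is simply Python's own evaluation: applying the subject to the verb phrase reduces step by step to the predicate-argument form of sentence (1).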
A descriptive predicate describes a property of its only argument and would generally be derived from a noun, an adjective or an intransitive verb. This is shown in the examples below:

(2) BOXER(MARY) ≡ Mary is a boxer
(3) BOXER(MARY) ∧ WOMAN(MARY) ≡ Mary is a boxer and a woman

A functional predicate is used to describe an underspecified entity using a fully-described entity, e.g. MOTHER(JOHN), which can be derived from genitives such as John’s mother or NP constructions such as mother of John. A functional predicate can be used as an argument to a relational or descriptive predicate, e.g. LOVE(JOHN, MOTHER(JOHN)). Unlike the other predicate types, a functional predicate does not establish a truth condition.

Figure 2: Predicate Types

Argument Types: Arguments to a predicate can be constants, variables or functional predicates. A constant argument specifies a named entity such as JOHN or MARY. If a variable is used as an argument to a predicate, it would be quantified in some form to represent either an existential or a universal condition, as shown in (4).

(4) ∃x (BOXER(x) ∧ WOMAN(x))

3. Natural Language Generation

Given that the first-order representation is unambiguous, if a natural language sentence S can be transformed into a logical expression E without loss of meaning, then it should be possible to transform E back into a set of sentences of which S is a member. For machine translation problems, it may suffice to select the first valid language sentence that can be mapped to the expression. For other problems such as text simplification, the language generator may need to generate multiple sentences and apply a ranking method to determine the simplest construction. Each component of a first-order representation creates different complexities to address in the language generation process. The following is a limited discussion of some issues that make the reverse translation a hard problem.
The discussion is based on the notion that a component of an expression can be considered to be associated with a token in the language representation that has a specific part of speech.

Predicates: A specific predicate can be associated with tokens carrying different part-of-speech tags, e.g. the descriptive predicate WOMAN() can be realized in translation from either a noun (a woman) or an adjective (a lady ..). In language generation, the selection of one specific token associated with a particular part of speech would influence the selection of tokens associated with other descriptive predicates that control the same argument; e.g. two nouns describing the same entity in a transitive construction is unlikely.
An expression may have multiple instances of the same predicate. In a corresponding language representation, it is possible that all such instances of the same underlying predicate are associated with a single language token, as shown below. This would generally involve conjunction in the language.

(5) LAUGH(JOHN) ∧ LAUGH(MARY) ≡ John and Mary laugh ≡ John laughs and Mary laughs

Functional predicates can be associated with either genitives or prepositions and may need to be treated independently of non-functional predicates.

Arguments: An argument is likely to be referenced by more than one predicate in a sentence. It is also likely that all such instances of the same argument would be associated back to a single token. This is not always the case, though, as conjunction and pronoun instances could result in multiple tokens referencing the same entity.

(6) LOVE(JOHN, MARY) ∧ LOVE(JOHN, DOG(JOHN)) ≡ John loves Mary and John loves his dog

For the purposes of this paper, resolution of arguments to pronouns is considered out of scope, and it is considered sufficient to render the translation of the above expression as John loves Mary and John loves John’s dog. This is because the anaphoric resolution of the pronoun his cannot be handled by the approach described in the previous section. As an alternative, anaphora resolution can be treated as a pre-processing step in first-order expression generation, and the generation of pronouns as a post-processing step in language generation.

Operators: The first-order language supports the set of operators {∧, ∨, →, =, ¬}. An operator can be directly related back to a token in the language representation, or it can be implied by the ordering of tokens. Specific operators are described below:

The conjunction operator ∧ can be directly associated with the conjunction token and.
It can also represent the relationship between a verb and a noun mediated by a determiner such as a or some (7), or by an adjective+noun combination (8).

(7) ∃x (BARK(x) ∧ DOG(x)) ≡ a dog barks
(8) ∃x (BARK(x) ∧ GREEN(x) ∧ DOG(x)) ≡ a green dog barks

The disjunction operator ∨ can be directly associated with the token or.

The implication operator → can be directly associated with tokens such as if or whenever. It can also represent the relationship between a verb and a noun mediated by a determiner such as every (9), by an adjective+noun combination (10), or by a copular construction.

(9) ∀x (DOG(x) → BARK(x)) ≡ every dog barks
(10) ∀x (EGG(x) ∧ GREEN(x) → LIKE(SAM, x)) ≡ Sam likes green eggs

The identity operator can also be represented as a predicate using the form EQUAL(x, y). Identity can result from an adjective.

The negation operator ¬ can be directly associated with the adverb token not. It can also represent the relationship between a verb and a noun mediated by an adjective such as two:

(11) ∃x (DOG(x) ∧ BARK(x) ∧ ∃y (DOG(y) ∧ BARK(y) ∧ ¬(x = y))) ≡ two dogs bark

The negation operator can influence the association between other operators and tokens. In the example below the operator ∧ is associated with the token or because of the negation operation.

(12) ¬LIKE(SAM, EGG) ∧ ¬LIKE(SAM, HAM) ≡ Sam does not like eggs or ham
Quantifiers: These include ∃ and ∀. Quantifiers can be associated with both determiners and adjectives. These are shown in the examples above.

Parentheses: These can impact the scoping of a predicate or operator.

The complexity in deciphering the generation issues listed above is quite considerable. Given the structure of the first-order representation, the approach described in the previous section of using lambda calculus to translate English to first-order logic is not feasible for language generation. It would still be ideal to utilize the rules that are used to derive first-order logic to guide the reverse translation. One possible solution is to apply an overgenerate-and-prune strategy. This approach has been successfully applied to natural language problems including language generation. It involves weakening the restrictions to allow for overgeneration of language. Selectional restrictions can then be imposed to weed out bad translations. This approach is described in the following sections.

4. Overgeneration approach

Consider the simple grammar below annotated with lambda expressions:

Non-terminal rules:

Grammar Rule             λ-Expression             Predicate Generated   Argument Generated
S → NP VP                NP@VP                    -                     -
VP → Vtransitive NP      Vtransitive@NP           -                     -
VP → Vintransitive       Vintransitive            -                     -
NP → Nproper             Nproper                  -                     -
NP → D N                 D@N                      -                     -
NP → D N C               (D@N)@C                  -                     -
NP → D AP                D@AP                     -                     -
AP → A N                 λp.(A@p ∧ N@p)           -                     -
C → RP Vintransitive     Vintransitive            -                     -

Terminal rules:

Grammar Rule             λ-Expression               Predicate Generated   Argument Generated
Vtransitive → love(s)    λpλq.(p@λr.LOVE(q, r))     LOVE                  -
Vintransitive → boxes    λp.BOXER(p)                BOXER                 -
Nproper → John           λp.p@JOHN                  -                     JOHN
A → lady                 λp.WOMAN(p)                WOMAN                 -
N → woman                λp.WOMAN(p)                WOMAN                 -
A → boxing               λp.BOXER(p)                BOXER                 -
N → boxer                λp.BOXER(p)                BOXER                 -
RP → who                 −                          -                     -
D → a                    λpλq.∃x (p@x ∧ q@x)        -                     -
D → some                 λpλq.∃x (p@x ∧ q@x)        -                     -

Consider the sentences below, all of which can be derived from the rules above:

(13) John loves a lady boxer
(14) ?John loves a boxing woman
(15) John loves some woman who boxes
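A grammar annotated in this way lends itself to a flat record representation. The sketch below is illustrative only (the field names are assumptions, not the paper's actual C# object model, which is described in section 6); it indexes terminal rules by the predicate their λ-expression generates:

```python
from collections import namedtuple

# Illustrative record for one annotated rule: LHS symbol, RHS symbols, the
# lambda annotation as text, and the predicate/argument the rule generates.
Rule = namedtuple("Rule", "lhs rhs lam predicate argument")

grammar = [
    Rule("Vtransitive", ("love(s)",), "λpλq.(p@λr.LOVE(q, r))", "LOVE", None),
    Rule("Vintransitive", ("boxes",), "λp.BOXER(p)", "BOXER", None),
    Rule("Nproper", ("John",), "λp.p@JOHN", None, "JOHN"),
    Rule("N", ("woman",), "λp.WOMAN(p)", "WOMAN", None),
    Rule("A", ("lady",), "λp.WOMAN(p)", "WOMAN", None),
]

def rules_for_predicate(pred):
    # Candidate terminal rules whose annotation generates the predicate.
    return [r for r in grammar if r.predicate == pred]

print([r.rhs[0] for r in rules_for_predicate("WOMAN")])  # ['woman', 'lady']
```

Looking up candidate terminal rules for a predicate found in the expression is then a simple scan, which is the starting point of the generation procedure described next.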
All the sentences can be associated with a first-order representation of the form:

(16) ∃x (LOVE(JOHN, x) ∧ BOXER(x) ∧ WOMAN(x))

Our goal is to determine a set of rules which would allow a direct derivation of (13), (14) and (15) from (16) using the annotated grammar as a reference. It is straightforward to parse the expression and determine all predicates, quantifiers, constant arguments and operators. Each such component must be directly derived from a terminal rule (note that a single terminal rule can produce more than one first-order component). A set of candidate terminal rules can be determined by looking up rules whose lambda expressions contain components found in the first-order expression. Starting from such a collection of candidate rules, all possible paths using the set of available non-terminal grammar rules and terminal rules without an associated lambda expression can be navigated bottom-up until a rule is reached whose LHS cannot be produced by any other rule. If any of the selected candidate rules have not been used, the specific navigation path is marked as a failure.

The mapping process between the first-order components and the terminal rules is what determines the domain of sentences that get generated. This mapping is not necessarily one-to-one; many first-order components may be associated with a single terminal rule (consider example (5) above). This approach overgenerates, since there is no selectional restriction enforced between the realization of predicates and their arguments. The approach is explained in more detail below.

A bold symbol X indicates a set of elements. #X indicates the number of elements within the set X. R is the set of rules in the grammar. It is assumed that a lambda expression for a rule can contain at most one predicate or one argument.

Extract the set of all predicate instances P and distinct predicate functions P from E.
E.g. for the expression E = Z(x) ∧ Z(y), P = {Z, Z} and P = {Z}

For every p ∈ P, if there are n instances of p in P, determine the set of all possible rulesets RSp where for each possible ruleset Rp ∈ RSp, if r ∈ Rp then:
- r ∈ {x ∈ R | λ-expression(x) contains p}
- #Rp ≤ n
E.g. if P = {Z, Z} and P = {Z} and there are 2 rules r1 and r2 whose lambda expressions contain Z, then RSp = {{r1}, {r2}, {r1, r1}, {r1, r2}, {r2, r2}}

Determine the set of all possible rulesets RSpred where for each possible ruleset Rpred ∈ RSpred, Rpred = ∪p∈P Rp where Rp ∈ RSp.
E.g. if P = {Y, Z} and RSp=Y = {{r1}, {r1, r1}} and RSp=Z = {{r2}, {r3}} then RSpred = {{r1, r2}, {r1, r3}, {r1, r1, r2}, {r1, r1, r3}}

Extract the set of all constant argument instances A and distinct constant arguments A from E.
E.g. for the expression Z(x) ∧ Y(x), A = {x, x} and A = {x}

For every a ∈ A, if there are n instances of a in A, determine the set of all possible rulesets RSa where for each possible ruleset Ra ∈ RSa, if r ∈ Ra then:
- r ∈ {x ∈ R | λ-expression(x) contains a}
- #Ra ≤ n

Determine the set of all possible rulesets RSarg where for each possible ruleset Rarg ∈ RSarg, Rarg = ∪a∈A Ra where Ra ∈ RSa.

Extract the set of all quantifier instances Q from E.
E.g. for the expression ∃x (Z(x) ∧ Y(x)), Q = {∃}

For every q ∈ Q, determine the set of all possible rulesets RSquant where for each possible ruleset Rquant ∈ RSquant, if r ∈ Rquant then:
- r ∈ {x ∈ R | λ-expression(x) contains q and λ-expression(x) does not contain a predicate}
- #Rquant ≤ #Q
E.g. if Q = {∃, ∀, ∀} and the lambda expression of r1 contains ∃ and those of r2 and r3 contain ∀, then RSquant = {{r1, r2, r2}, {r1, r2, r3}, {r1, r3, r3}}

Extract the set of all operator instances O from E.
E.g. for the expression ∃x (Z(x) ∧ Y(x)), O = {∧}

For every o ∈ O, determine the set of all possible rulesets RSoper where for each possible ruleset Roper ∈ RSoper, if r ∈ Roper then:
- r ∈ {x ∈ R | λ-expression(x) contains o and λ-expression(x) does not contain a predicate or quantifier}
- #Roper ≤ #O
E.g. if O = {∧, ∧} and the lambda expressions of r1 and r2 contain ∧, then RSoper = {{r1, r1}, {r1, r2}, {r2, r2}}

Determine the set of all possible rulesets RS where Rcandidate = Rpred ∪ Rarg ∪ Rquant ∪ Roper, with Rcandidate ∈ RS, Rpred ∈ RSpred, Rarg ∈ RSarg, Rquant ∈ RSquant, Roper ∈ ∅ ∪ RSoper.
E.g. if E = ∃x (LOVE(JOHN, x) ∧ WOMAN(x)) and r1 = [Vtransitive → love(s)], r2 = [Nproper → John], r3 = [D → a], r4 = [D → some], r5 = [N → woman], r6 = [CONJ → and], then RSpred = {{r1, r5}}, RSarg = {{r2}}, RSquant = {{r3}, {r4}}, RSoper = {{r6}} and RS = {{r1, r5, r2, r3}, {r1, r5, r2, r4}, {r1, r5, r2, r3, r6}, {r1, r5, r2, r4, r6}}

For each Rcandidate ∈ RS, find all paths to a non-terminal rule with an un-referenced LHS (i.e. S) such that:
- the path uses every element of Rcandidate exactly once
- the path can use any non-terminal rule any number of times
- the path can use any terminal rule without a λ-expression any number of times
E.g. using the previous example, one possible Rcandidate = {r1, r5, r2, r3} would generate John love(s) a woman and a woman love(s) John
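The enumeration of candidate rulesets described above can be sketched with standard combinatorics. The minimal Python illustration below (function names are hypothetical, not from the paper's implementation) reproduces the RSp and RSpred examples:

```python
from itertools import combinations_with_replacement, product

def candidate_rulesets(matching_rules, n_instances):
    """RSp: every multiset of candidate rules of size 1..n_instances,
    where n_instances is the number of instances of the predicate in E."""
    rulesets = []
    for size in range(1, n_instances + 1):
        rulesets.extend(combinations_with_replacement(matching_rules, size))
    return rulesets

def combine(per_predicate_rulesets):
    """RSpred: one ruleset choice per predicate, unioned together
    (here: tuples concatenated)."""
    return [sum(choice, ()) for choice in product(*per_predicate_rulesets)]

# Predicate Z appears twice; rules r1 and r2 both carry Z.
print(candidate_rulesets(["r1", "r2"], 2))
# -> [('r1',), ('r2',), ('r1', 'r1'), ('r1', 'r2'), ('r2', 'r2')]

# P = {Y, Z} with RSp=Y = {{r1}, {r1, r1}} and RSp=Z = {{r2}, {r3}}
print(combine([[("r1",), ("r1", "r1")], [("r2",), ("r3",)]]))
# -> [('r1', 'r2'), ('r1', 'r3'), ('r1', 'r1', 'r2'), ('r1', 'r1', 'r3')]
```

The same construction applies to the argument, quantifier and operator rulesets; the combinatorial growth visible here is also the source of the cost concerns discussed in section 7.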
Figure 3: Overgeneration process
Each path identified represents a candidate natural language sentence. This approach will produce the sentences (13), (14) and (15), amongst others, when applied to the expression (16). It will satisfy the requirement that if S can be translated to E using a set of rules, then E can be translated back to S. The results for the first-order expression in (16), ∃x (LOVE(JOHN, x) ∧ BOXER(x) ∧ WOMAN(x)), are:

Overgenerated Translations             Valid
John love(s) some lady boxer           •
some lady boxer love(s) John
John love(s) a lady boxer              •
a lady boxer love(s) John
John love(s) some boxing woman         •
some boxing woman love(s) John
John love(s) a boxing woman            •
a boxing woman love(s) John
John love(s) some woman who boxes      •
some woman who boxes love(s) John
John love(s) a woman who boxes         •
a woman who boxes love(s) John

As mentioned earlier, no context is applied while processing the grammar. As a result the output will include unwanted results such as:

Incorrect NP assignment: both John loves a lady boxer and a lady boxer loves John are outputs.

Incorrect quantifier assignment: the expression ∃x (MAN(x) ∧ ∀y (WOMAN(y) → LOVE(x, y))) will generate both a man loves every woman and every man loves a woman.

The annotated grammar rules used will need to include all orphan tokens. An orphan token is one that has no association with an element in the first-order representation. This might require some duplication of terminal rules. An example is a copular construction. Consider the example below:

(17) BOXER(JOHN) ≡ John is a boxer

In order for this approach to handle bi-directional translations covering such a case, there will need to be two logical representations for the rule D → a. This may result in ambiguity in generating first-order representations.

5. Pruning process

The overgeneration step will create well-formed language sentences; however, some of the output will not be a good translation of the first-order expression. The pruning process applies a filter to drop such bad translations.

Option 1

One solution to weed out bad results is to regenerate the first-order representation and compare it with the original expression E to verify that the translation was valid. This does not require the development of any new translation rules, as the methodology described in section 2 using lambda calculus suffices. The complexity in this approach arises from determining whether two first-order representations are equivalent. This is due to the nature of the mathematical representation, as shown below.

(18) ∃x (BOXER(x) ∧ WOMAN(x)) ≡ ∃x (WOMAN(x) ∧ BOXER(x))

This means that a literal equivalence test cannot be applied; instead, mathematical equivalence needs to be established. One method of doing this is to use the theorem-proving system Otter.
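For the restricted case where two expressions differ only in the order of their conjuncts, as in (18), a normalization pass followed by a literal comparison can serve as a cheap first filter before resorting to a full theorem prover. A toy sketch follows; the nested-tuple representation is an assumption for illustration, not the paper's, and it handles only commutativity of ∧/∨, not full logical equivalence:

```python
def normalize(expr):
    """Sort the operands of 'and'/'or' recursively so that expressions
    differing only in conjunct/disjunct order compare literally equal.

    Expressions are nested tuples, e.g.
    ('exists', 'x', ('and', 'BOXER(x)', 'WOMAN(x)')).
    """
    if isinstance(expr, tuple):
        head, *args = expr
        args = [normalize(a) for a in args]
        if head in ("and", "or"):      # commutative operators only
            args = sorted(args, key=repr)
        return (head, *args)
    return expr

e1 = ("exists", "x", ("and", "BOXER(x)", "WOMAN(x)"))
e2 = ("exists", "x", ("and", "WOMAN(x)", "BOXER(x)"))
print(normalize(e1) == normalize(e2))  # True
```

Anything beyond reordering (e.g. equivalences involving quantifier scope or De Morgan's laws) still requires genuine logical inference of the kind Otter provides.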
The Otter-based method becomes computationally intensive if there are many candidate sentences to evaluate. On the flip side, it is guaranteed to produce perfect pruning results, provided a fool-proof system for equating logical expressions can be developed.

Option 2

So far the discussion has stayed away from unraveling a first-order representation. A solution that does this is described below. This approach focuses on multi-argument predicates to validate the generated language. If a sentence does not have at least one multi-argument predicate (such as a copular construction), the algorithm does not prune.

In English, a multi-argument predicate can generally be mapped to a verb. The first argument of the predicate is generally realized to the left of the element that realizes the predicate. The exception to this is a passive construction. The ordering of the tokens that realize the arguments and predicate can be determined by the order of the lambda variables with respect to the predicate arguments. This is shown in the expressions below:

Active form ⇒ λpλq.(p@λr.LOVE(q, r))
Passive form ⇒ λpλq.(p@λr.LOVE(r, q))

In the di-transitive case, the active construction check is the same as for the transitive case. In the passive form, it needs to be verified that the first argument to the predicate is realized to the right of the element that realizes the predicate. This is shown in the expressions below:

Active form ⇒ λpλqλr.(p@q@λsλt.GIVE(r, s, t))
Passive form ⇒ λpλqλr.(p@q@λsλt.GIVE(s, r, t))

A simple rule can be based on tracking the orientation of all argument realizations for a predicate with respect to the realization of the predicate. Note that this rule is very specific to an SVO language, but it can conceivably be adapted to other structures. It remains to be determined whether the lambda expression can be processed so as to yield a more generic rule.
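The SVO ordering rule described above reduces to a simple position check once the realizations are traced back to token positions. The following illustrative Python function (an assumption-laden sketch, not the paper's C# implementation; the token-index representation is invented for illustration) checks whether a predicate's argument realizations are consistent with an active or passive form:

```python
def ordering_ok(pred_index, arg_indices, passive=False):
    """SVO pruning check for one multi-argument predicate.

    pred_index:  token position of the predicate's realization
    arg_indices: token positions of the argument realizations, in the
                 order the arguments appear in the lambda expression
    Active form: the first argument must be realized to the left of the
    predicate and the remaining arguments to its right; a passive form
    reverses the check for the first argument.
    """
    first, rest = arg_indices[0], arg_indices[1:]
    first_ok = first > pred_index if passive else first < pred_index
    return first_ok and all(i > pred_index for i in rest)

# "John love(s) a woman": love(s) at 1, JOHN (arg 1) at 0, woman (arg 2) at 3
print(ordering_ok(1, [0, 3]))  # True  -> kept
# "a woman love(s) John": love(s) at 2, JOHN (arg 1) at 3, woman (arg 2) at 1
print(ordering_ok(2, [3, 1]))  # False -> pruned
```

This is exactly the filter that marks a woman love(s) John as an invalid translation of (16) while keeping John love(s) a woman.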
To enforce such an ordering rule, there has to be traceability between (a) the arguments and their realizations and (b) the predicates and their realizations. This is generally straightforward (the sentence generation rules can be modified to keep a trace) for sentences where there is a 1:1 association between predicate and argument components and rules. If there is no direct alignment, one possible scenario is multiple predicate instances all mapped to one rule (see (5)), in which case a trace can also be maintained with some effort. For any more complex combination, such as three predicate instances mapped to two rule instances, creating a trace is more difficult as multiple combinations may need to be evaluated; such combinations may, however, be unlikely.

Two special cases for argument realization are:

(a) Variable Arguments: The sentence generation rules must be substantially modified to allow tracing of variables to their realizations. If the variable is quantified, the realization of the quantifier can be used to assess the order. The case of unquantified variables has not been evaluated.

(b) Functional Predicates: In this case, the argument realization can be mapped to the realization of the functional predicate.

The generation rules will now include the following modified and new steps in order to maintain a trace:
Determine a map between each instance of p and an element r ∈ Rpred.
E.g. if P = {Z, Z, Y} and P = {Z, Y}, and there are 2 rules r1, r2 whose λ-expressions contain Z and 1 rule r3 whose λ-expression contains Y, then
RSpred = {{r1, r3}, {r2, r3}, {r1, r1, r3}, {r1, r2, r3}, {r2, r2, r3}}
map = {{{1,2}, {3}}, {{1,2}, {3}}, {{1}, {2}, {3}}, {{1}, {2}, {3}}, {{1}, {2}, {3}}}
where 1, 2, 3 refer to the indices of the elements in P = {Z, Z, Y}.

For each predicate rule p ∈ P, identify the arguments to the predicate:
- if the argument is a constant argument, link p to a rule rarg ∈ Rarg ∈ RSarg where rarg contains an argument for p
- if the argument is a variable argument, link p to a rule rquant ∈ Rquant ∈ RSquant where rquant contains the quantifier scope for the argument of p
E.g. if E = ∃x (LOVE(JOHN, x) ∧ WOMAN(x)) and r1 = [Vtransitive → love(s)], r2 = [Nproper → John], r3 = [D → a], r4 = [D → some], r5 = [N → woman], then RSpred = {{r1, r5}}, RSarg = {{r2}}, RSquant = {{r3}, {r4}} and RS = {{r1, r5, r2, r3}, {r1, r5, r2, r4}}

For each Rcandidate ∈ RS, find all paths to a non-terminal rule with an un-referenced LHS (i.e. S) such that:
- the path uses every element of Rcandidate exactly once
- the path can use any non-terminal rule any number of times
- the path can use any terminal rule without a λ-expression any number of times
- for every p ∈ P, determine the ordering of arguments with respect to the predicate in the path, using the map to resolve how the predicate was realized in the generated sentence and using the links to determine how the arguments were realized
E.g. using the previous example, for one possible Rcandidate = {r1, r5, r2, r3}, P = {LOVE, WOMAN}, RSpred = {{r1, r5}} and map = {{{1}, {2}}}. The trace for the generated sentences is shown below:

John   love(s)   a    woman
r2     r1        r3   r5
a    woman   love(s)   John
r3   r5      r1        r2

The rules to handle pruning are as follows:

Extract the set of all predicate instances with more than one argument, Pmultiargs, from E.
E.g. for the expression E = ∃x (LOVE(JOHN, x) ∧ WOMAN(x)), Pmultiargs = {LOVE}

For each p ∈ Pmultiargs, determine whether the first argument should be realized to the left or the right of the token realizing the predicate, based on the order of the lambda variables that describe the order of the arguments in the lambda expression for the rule associated with p.
E.g. for the example above, if r1 = [Vtransitive → love(s)] : λpλq.(p@λr.LOVE(q, r)), then q, the first argument, is defined by λq, which precedes λr, which defines the second argument. For the predicate LOVE, the first argument is expected to be to the left of the token that realizes the predicate. The sentence order is expected to be rarg1 rpred rarg2.

For each path, discard the path if the argument ordering of any predicate is incorrect.
E.g. for the example E = ∃x (LOVE(JOHN, x) ∧ WOMAN(x)), for the predicate LOVE and the candidate ruleset {r1, r5, r2, r3}, the sentence order should be r2 r1 r3. This marks the sentence a woman love(s) John as invalid.

A second issue is that the rule does not cover restrictions based on agreement. The output will include John love a lady boxer. This limitation cannot be addressed with the current state of the rules. To solve this issue, the rules will need to be annotated with agreement restrictions, which can be evaluated as the paths are navigated.

6. Implementation

I implemented this system in C# using the .NET framework. The object model used is below:
The classes used by this model are:

Rule: Represents a grammar rule annotated with a lambda expression. The class provides methods for parsing a rule into RHS and LHS components. This class inherits from the generic Expression class.

FOExpression: Represents a first-order expression. This class inherits from the generic Expression class. The method ProcessExpression() accepts a grammar in the form of a RuleCollection and determines candidate terminal rules that can be associated with the expression. For each set of terminal rules, the method calls RuleCollection.GetRules() to determine whether the set can generate a language sentence. This method internally uses recursion to determine all possible sets of terminal rules.

Expression: Provides a representation for a mathematical expression with quantifiers, constant arguments, predicates and operators.

RuleCollection: Represents a collection of rules, i.e. the full grammar of the natural language. The method GetRules() provides a mechanism for determining all valid paths through the grammar given a list of terminal rules. This method internally uses recursion to navigate the grammar rules.

7. Cost and Optimization

A valid concern with this approach is its cost. The cost implications are driven from two angles: one is the complexity and scale of the grammar, and the second is the complexity of the expression being evaluated. The first issue is of lesser concern, since it can be assumed that in an almost full-fledged grammar of English a larger percentage of the rules will be terminal rules as compared to non-terminal rules, because of the large vocabulary. The algorithm discussed selects terminal rules early on and hence avoids the performance hit of having to traverse a huge grammar while generating a sentence. The second issue has a larger impact.
This is because the algorithm searches for every valid candidate: consider, for example, that there are n! permutations when fitting n noun phrases into a sentence with n noun-phrase placeholders. That said, not every application requires every valid language translation of the processed expression; in such cases it may suffice to run the pruning check immediately on each sentence as it is generated, and a successful pass through the pruning rules for one sentence can then stop the algorithm from processing any further.

8. Conclusion

I have so far defined an approach in which a grammar and a ruleset for the compositional construction of a first-order representation from English can also be used successfully for the reverse translation problem. The approach is robust in that very little additional information is required for the system to function. This robustness, however, comes at a computational cost that is reasonable for simple first-order expressions but can become considerable for lengthy discourse. This overhead may not be too detrimental for applications in the space of Text Summarization and Text Simplification, where it can be anticipated that an ideal system would only be
required to process limited expressions. Possible future enhancements that could make this approach more efficient are listed below:

- Integrate pruning steps into the generation process wherever possible.
- Apply a mix of top-down and bottom-up evaluation strategies when determining whether a selected ruleset is a good candidate for a well-formed sentence. This can avoid the overhead of unnecessarily re-evaluating a frequently used rule such as S → NP VP on every pass.
- As an alternative to navigating the hierarchy each time, cache navigation paths, keyed by the LHS symbols of the candidate rules involved. For every new candidate ruleset the cache is searched first; only if no entry is found does the system navigate the rules. The cache must, however, be dropped each time the grammar changes.
- The current pruning system, which translates the ordering in a lambda expression into a natural-language ordering for multi-argument predicates, has a dependency on non-terminal grammar rules that has so far been ignored. A more robust approach would process the non-terminal rules to infer the ordering.
- Language generation issues relating to adverbs, adjuncts and tense have not been covered by this paper and are another area for future review.

As for future steps, I would like to develop a text simplification system which performs the following:

1. Translate a complex sentence into first-order logic.
2. Develop a processing layer that can mathematically simplify a first-order expression.
3. Develop a processing layer that can break a first-order expression into independent expressions.
4. Apply the approach described above to generate a natural language translation from the simplified expression.

I would like to thank Prof. Graham Katz for providing the opportunity to write this paper and for his feedback.