In a previous post I talked about a paper that encoded questions as programs, each of which defined a procedure that executed in the environment of a battleship game. In other words, each question maps to a program that defines the question's semantics.
Of course, that paper did not consider the translation of the natural-language text into the program. Today's paper by Wang et al. considers building a semantic parser: translating a natural-language text into a precise internal program that can be executed against an environment (a database) to produce an answer.
This paper encodes the semantics in a language called lambda DCS, which is based on lambda calculus. Its compositional form is quite interesting. Every logical form is either a set of entities (a unary) or a set of entity pairs (a binary). A binary $b$ and a unary $u$ can be composed into the join $b.u$: take the pairs in $b$ restricted to those whose second entities are present in $u$, and keep their first entities. The other operators are the set operations (such as intersection) on the unary terms.
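To make the composition concrete, here is a minimal sketch in Python (my own illustration, not code from the paper) of how these denotations could be evaluated: unaries as sets of entities, binaries as sets of pairs, and join and reverse as small set comprehensions. The relation and entity names below are made up.

```python
# Minimal sketch of lambda DCS denotations (illustrative, not from the paper).
# A unary is a set of entities; a binary is a set of (first, second) entity pairs.

def join(binary, unary):
    """b.u: first entities of pairs in b whose second entity lies in u."""
    return {e1 for (e1, e2) in binary if e2 in unary}

def reverse(binary):
    """R(b): swap the two entities in every pair of b."""
    return {(e2, e1) for (e1, e2) in binary}

# A made-up binary relation and some made-up unaries.
capital_of = {("paris", "france"), ("rome", "italy"), ("berlin", "germany")}
europe = {"france", "italy", "germany"}
starts_with_p = {"paris", "prague"}

# Join: capital_of.europe denotes the capitals of European countries.
print(join(capital_of, europe))                  # {'paris', 'rome', 'berlin'} (in some order)

# The other operators are plain set operations on unaries.
print(join(capital_of, europe) & starts_with_p)  # {'paris'}
```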
They have also defined a canonical translation of programs into natural language. For example, the canonical phrase
"start date of student alice whose university is brown university"
corresponds to the logical form
R(date).(student.Alice ∩ university.Brown)
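Following the sketch above, this logical form can be executed against a tiny made-up database in which each education record is linked to a student, a university, and a start date. The records, direction of the relations, and dates below are all my own invention, just to show the mechanics.

```python
# Executing R(date).(student.Alice ∩ university.Brown) against a toy database
# (illustrative only; the records, relation directions, and dates are made up).

def join(binary, unary):                     # same helpers as in the previous sketch
    return {e1 for (e1, e2) in binary if e2 in unary}

def reverse(binary):
    return {(e2, e1) for (e1, e2) in binary}

# Each binary maps a hypothetical education record to one of its attributes.
student    = {("edu1", "alice"), ("edu2", "bob")}
university = {("edu1", "brown_university"), ("edu2", "mit")}
date       = {("edu1", "2004-09-01"), ("edu2", "2005-09-01")}

# student.Alice ∩ university.Brown — records of alice at brown university.
records = join(student, {"alice"}) & join(university, {"brown_university"})

# R(date).(...) — the start dates of those records, i.e. the answer.
answer = join(reverse(date), records)
print(answer)  # {'2004-09-01'}
```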
Since there are only so many ways to compose these programs, the canonical phrases can be generated easily. What they then do is use Amazon's Mechanical Turk to convert these canonical phrases into something more natural. The above example is converted to "When did alice start attending brown university?". These pairs are then used to train a paraphrasing model of the form
$p_\theta(z, c \mid x, w) \propto \exp\big(\phi(x, c, z)^\top \theta\big)$
where $\phi(x, c, z)$ is a feature vector and $\theta$ is a parameter vector; $z$ is the logical form (i.e. a program); $c$ is the canonical phrase; $x$ is the human phrase for the canonical phrase; and $w$ is the database (represented as a set of triples) that can be queried by the logical forms. I'll come back to the details of the implementation as I find similar papers.
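Here is a rough sketch of how such a log-linear model scores candidates (my own illustration; the paper's actual features and training procedure differ): each candidate pair of canonical phrase and logical form gets a feature vector against the input utterance, and the exponentiated dot products with $\theta$ are normalized over the candidate set. The word-overlap features, candidates, and weights below are invented for the example.

```python
import math

# Illustrative log-linear scorer over candidate (canonical phrase c, logical form z)
# pairs given a human utterance x. The features are a simple word-overlap stand-in,
# not the paraphrase features used in the paper.

def features(x, c, z):
    x_words, c_words = set(x.lower().split()), set(c.lower().split())
    return {
        "overlap": len(x_words & c_words),           # words shared by x and c
        "len_diff": abs(len(x_words) - len(c_words)),
        "form_size": z.count("."),                   # crude proxy for the size of z
    }

def score(x, c, z, theta):
    return sum(theta.get(k, 0.0) * v for k, v in features(x, c, z).items())

def p_theta(x, candidates, theta):
    """Softmax over candidate (c, z) pairs: p(z, c | x) proportional to exp(phi . theta)."""
    scores = [score(x, c, z, theta) for (c, z) in candidates]
    norm = sum(math.exp(s) for s in scores)
    return [math.exp(s) / norm for s in scores]

# Made-up candidates and weights, just to show the shape of the computation.
x = "When did alice start attending brown university?"
candidates = [
    ("start date of student alice whose university is brown university",
     "R(date).(student.Alice ∩ university.Brown)"),
    ("university of student alice",
     "R(university).(student.Alice)"),
]
theta = {"overlap": 1.0, "len_diff": -0.2, "form_size": 0.0}
print(p_theta(x, candidates, theta))
```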