In the last post, the authors made use of paraphrases. It turns out that there is in fact a paraphrase database, and how it was created is quite interesting. It starts with a basic observation: given texts translated from English into some foreign language, if two English phrases $e_1$ and $e_2$ translate to the same foreign phrase $f$, then we may assume that $e_1$ and $e_2$ have similar meaning; i.e., that they paraphrase each other.
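To make the pivoting idea concrete, here is a minimal sketch in Python. It assumes the translations are available as a flat list of (English phrase, foreign phrase) pairs; the function name `pivot_paraphrases` and the toy data are made up for illustration and are not taken from the paper or the database.

```python
from collections import defaultdict
from itertools import combinations


def pivot_paraphrases(phrase_pairs):
    """Pair up English phrases that translate to the same foreign phrase."""
    # Group the English phrases by the foreign phrase they translate to.
    by_foreign = defaultdict(set)
    for english, foreign in phrase_pairs:
        by_foreign[foreign].add(english)

    # Every two distinct English phrases in a group are paraphrase candidates.
    paraphrases = set()
    for english_phrases in by_foreign.values():
        for e1, e2 in combinations(sorted(english_phrases), 2):
            paraphrases.add((e1, e2))
    return paraphrases


# Toy phrase table, invented for this example.
toy_pairs = [
    ("thrown into jail", "festgenommen"),
    ("imprisoned", "festgenommen"),
    ("arrested", "festgenommen"),
    ("the house", "das Haus"),
]

for e1, e2 in sorted(pivot_paraphrases(toy_pairs)):
    print(f"{e1!r} <-> {e2!r}")
```

A foreign phrase with only one English translation, like "das Haus" here, produces no candidates, since there is nothing to pivot to.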
The paper goes a little further than this. The goal is to extract a paraphrase rule of the form
$$r_p := C \rightarrow \langle e_1, e_2, \sim_p, \vec{\varphi}_p \rangle$$
where $C$ is a non-terminal, $e_1$ and $e_2$ are mixes of terminal and non-terminal symbols that share the same set of non-terminal symbols (given by the correspondence $\sim_p$), and $\vec{\varphi}_p$ is a feature vector.
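Such a rule is constructed from a syntactic machine translation system: two translation rules that have the same syntactic construct $C$ and the same foreign phrase $f$ are matched,
$$r_1 := C \rightarrow \langle f, e_1, \sim_1, \vec{\varphi}_1 \rangle, \qquad r_2 := C \rightarrow \langle f, e_2, \sim_2, \vec{\varphi}_2 \rangle$$
where once again $e_1$ and $e_2$ share the same non-terminals, so that the above rule pairing $e_1$ and $e_2$ can be constructed. The paper also defines a way to combine the feature vectors, but I will skip that here.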
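The rule-matching step can be sketched roughly as follows. This is only an illustration under my own assumptions: the `TranslationRule` and `ParaphraseRule` classes, the bracketed non-terminal notation such as `[NP,1]`, and the toy rules are all hypothetical, and the feature vectors are left out entirely.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TranslationRule:
    lhs: str        # the non-terminal C
    foreign: tuple  # foreign side f, as a tuple of tokens
    english: tuple  # English side e, as a tuple of tokens


@dataclass(frozen=True)
class ParaphraseRule:
    lhs: str
    e1: tuple
    e2: tuple


def nonterminals(tokens):
    """Non-terminals are written like "[NP,1]" in this sketch (an assumption)."""
    return sorted(t for t in tokens if t.startswith("["))


def pivot_rules(rules):
    """Match rules with the same left-hand side and foreign side, then pair their English sides."""
    groups = defaultdict(list)
    for rule in rules:
        groups[(rule.lhs, rule.foreign)].append(rule)

    paraphrase_rules = []
    for (lhs, _foreign), group in groups.items():
        for r1, r2 in combinations(group, 2):
            # Sanity check: both English sides must use the same non-terminals.
            same_nts = nonterminals(r1.english) == nonterminals(r2.english)
            if r1.english != r2.english and same_nts:
                paraphrase_rules.append(ParaphraseRule(lhs, r1.english, r2.english))
    return paraphrase_rules


# Toy rules, invented for this example.
toy_rules = [
    TranslationRule("NP", ("[NN,2]", "de", "[NP,1]"), ("the", "[NP,1]", "'s", "[NN,2]")),
    TranslationRule("NP", ("[NN,2]", "de", "[NP,1]"), ("the", "[NN,2]", "of", "the", "[NP,1]")),
    TranslationRule("VP", ("[NN,2]", "de", "[NP,1]"), ("unrelated", "[NP,1]", "[NN,2]")),
]

for rule in pivot_rules(toy_rules):
    print(rule.lhs, "->", " ".join(rule.e1), "|", " ".join(rule.e2))
```

The third toy rule shares the foreign side but has a different left-hand side, so it is not matched with the other two.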