In the last post, the authors made use of paraphrases. It turns out that there is in fact a paraphrase database, and it's quite interesting how it was created. It starts with a basic observation about texts translated from English into some foreign language: if two English phrases $e_1$ and $e_2$ translate to the same foreign phrase $f$, then we may assume that $e_1$ and $e_2$ have similar meaning; i.e., that they paraphrase each other.
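To make the pivoting idea concrete, here is a minimal sketch in Python. It assumes we already have a list of (English phrase, foreign phrase) pairs; the function name `pivot_paraphrases` and the toy phrase table are made up for illustration and are not taken from the paper or from any real data.

```python
from collections import defaultdict
from itertools import combinations


def pivot_paraphrases(phrase_pairs):
    """Pair up English phrases that translate to the same foreign phrase.

    phrase_pairs: iterable of (english, foreign) tuples (hypothetical input format).
    Returns a set of (e1, e2) tuples with e1 < e2 alphabetically.
    """
    # Group English phrases by the foreign phrase they translate to (the pivot).
    by_foreign = defaultdict(set)
    for english, foreign in phrase_pairs:
        by_foreign[foreign].add(english)

    # Every pair of distinct English phrases under one pivot is a candidate paraphrase.
    paraphrases = set()
    for english_phrases in by_foreign.values():
        for e1, e2 in combinations(sorted(english_phrases), 2):
            paraphrases.add((e1, e2))
    return paraphrases


# Toy, made-up phrase table (English, German), for illustration only.
toy_pairs = [
    ("thrown into jail", "festgenommen"),
    ("imprisoned", "festgenommen"),
    ("arrested", "festgenommen"),
    ("the house", "das Haus"),
]

paraphrase_pairs = pivot_paraphrases(toy_pairs)
for pair in sorted(paraphrase_pairs):
    print(pair)
```

Note that "the house" produces no pair, since no other English phrase shares its pivot. A real system would also score each pair rather than treat every shared pivot as equally reliable.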
The paper goes a little further than this. The goal is to extract a paraphrase rule of the form

$$r_p := C \rightarrow \langle e_1, e_2, \sim_p, \vec{\varphi}_p \rangle$$

where $C$ is a non-terminal, $e_1$ and $e_2$ are mixes of terminal and non-terminal symbols that share the same set of non-terminal symbols (as given by the correspondence $\sim_p$), and $\vec{\varphi}_p$ is a feature vector.
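Such a rule is constructed from a syntactic machine translation system: two translation rules that have the same syntactic construct $C$ and the same foreign phrase $f$ are matched,

$$r_1 := C \rightarrow \langle f, e_1, \sim_1, \vec{\varphi}_1 \rangle, \qquad r_2 := C \rightarrow \langle f, e_2, \sim_2, \vec{\varphi}_2 \rangle$$

where once again $e_1$ and $e_2$ share the same non-terminals, so that the paraphrase rule pairing $e_1$ and $e_2$ can be constructed. The paper also defines a way to combine the feature vectors, but I will skip that here.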
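The rule-matching step can be sketched roughly as follows. This is only an illustration under my own assumptions: the `TranslationRule` class, the `[NP,1]` notation for indexed non-terminals, and the toy rules are all hypothetical, and the feature vectors are left out entirely since I am skipping how they are combined.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TranslationRule:
    lhs: str      # the non-terminal on the left-hand side
    foreign: str  # foreign side of the rule (the pivot)
    english: str  # English side of the rule


# Hypothetical notation: an indexed non-terminal looks like "[NP,1]".
NONTERMINAL = re.compile(r"\[[A-Z]+,\d+\]")


def nonterminals(side):
    """Return the set of indexed non-terminals appearing in one side of a rule."""
    return frozenset(NONTERMINAL.findall(side))


def extract_paraphrase_rules(rules):
    """Match translation rules sharing the same left-hand side and foreign side."""
    by_pivot = defaultdict(list)
    for rule in rules:
        by_pivot[(rule.lhs, rule.foreign)].append(rule)

    extracted = []
    for (lhs, _foreign), group in by_pivot.items():
        for r1, r2 in combinations(group, 2):
            # Sanity check: both English sides must use the same non-terminals.
            same_nts = nonterminals(r1.english) == nonterminals(r2.english)
            if r1.english != r2.english and same_nts:
                extracted.append((lhs, r1.english, r2.english))
    return extracted


# Toy, made-up rules for illustration only.
toy_rules = [
    TranslationRule("NP", "[NP,1] de [NN,2]", "the [NN,2] of [NP,1]"),
    TranslationRule("NP", "[NP,1] de [NN,2]", "[NP,1] 's [NN,2]"),
    TranslationRule("NP", "la maison", "the house"),
]

paraphrase_rules = extract_paraphrase_rules(toy_rules)
for rule in paraphrase_rules:
    print(rule)
```

Because the two matched rules share the same foreign side, their English sides should already contain the same non-terminals; the explicit check is just a guard against malformed input.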