Paper: PPDB: The Paraphrase Database (66/365)

In the last post, the authors made use of paraphrases. It turns out that there is in fact a paraphrase database and it’s quite interesting how it was created. It starts with the basic observation that given translated texts from english to some foreign language, if two english phrases e_1 and e_2 translate to the same foreign phrase f then we may assume that e_1 and e_2 have similar meaning; i.e., that they paraphrase eachother.

The paper goes a little further than this. The goal is to extract a paraphrase rule as follows

\displaystyle  \mathbf{r}_p : = C \rightarrow \langle e_1, e_2, \sim_p, \phi_p \rangle

where C is a non-terminal, e_1 and e_2 are mix of terminal and non-terminal symbols where both share the same set of non-terminal symbols (given by the correspondence ~_p, and a feature vector \phi_p.

Such a rule is construction from a syntactic machine translator, where two applied translation rules having the same syntactic construct and foreign phrase are matched

\displaystyle  \mathbf{r}_ 1 : = C \rightarrow \langle f, e_1, \sim_1, \phi_ 1 \rangle \\ \mathbf{r}_ 2 : = C \rightarrow \langle f, e_2, \sim_2, \phi_ 2 \rangle

where once again f and e share the same non-terminals so that the above rule for pairing e_1 and e_2 can be constructed. The paper also defines a way to combine the feature vectors but I will skip that here.

This entry was posted in Uncategorized and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s