Learning Paraphrasing for Multiword Expressions

This is the data set for our ACL 2016 MWE Workshop paper, "Learning Paraphrasing for Multiword Expressions".

This data set is distributed under the CC-BY license.


README: this README file

train.tsv: the training data for the ranking task

test.tsv: the test data for the ranking task


Each line of train.tsv and test.tsv is one data point with 7 tab-separated columns:

column 1: the target number in a sentence. All lines that pair a given target with its candidate paraphrases share the same number.

column 2: a sentence. The sentence might be repeated for different possible targets.

column 3: the target paraphrase in this sentence.

column 4: the candidate paraphrase for the target in this sentence.

column 5: the position (offset) of the target paraphrase.

column 6: the PPDB 2.0 score for the target and candidate paraphrase pair.

column 7: the label (the number of workers who agreed on this candidate paraphrase).
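A minimal Python sketch of reading the files according to the column layout above is shown here. It is not part of the released data set, and it rests on assumptions not stated in this README: the files have no header row, every line has exactly 7 tab-separated fields, the label (column 7) is an integer, and the offset (column 5) is kept as text because its exact format is not specified. The names Row, read_rows and rank_candidates are hypothetical. Candidates are grouped by (target number, sentence) in case the target number is only unique within a sentence.

```python
import csv
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Row:
    """One data point (line) of train.tsv / test.tsv, following the 7 columns."""
    target_id: str      # column 1: shared by a target and all of its candidates
    sentence: str       # column 2: the sentence (may repeat for other targets)
    target: str         # column 3: the target paraphrase in the sentence
    candidate: str      # column 4: the candidate paraphrase
    offset: str         # column 5: kept as text (assumption: format unspecified)
    ppdb_score: float   # column 6: PPDB 2.0 score
    label: int          # column 7: number of workers who agreed (assumed integer)


def read_rows(lines: Iterable[str]) -> List[Row]:
    """Parse tab-separated lines; assumes no header row and exactly 7 columns."""
    # QUOTE_NONE keeps quotation marks inside sentences verbatim.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows: List[Row] = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if len(fields) != 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}: {fields!r}")
        tid, sent, tgt, cand, off, score, label = fields
        rows.append(Row(tid, sent, tgt, cand, off, float(score), int(label)))
    return rows


def rank_candidates(rows: Iterable[Row]) -> Dict[Tuple[str, str], List[Row]]:
    """Group rows per target and sort each group by label, highest first."""
    groups: Dict[Tuple[str, str], List[Row]] = defaultdict(list)
    for row in rows:
        # (target number, sentence) identifies one target in one sentence.
        groups[(row.target_id, row.sentence)].append(row)
    return {key: sorted(group, key=lambda r: r.label, reverse=True)
            for key, group in groups.items()}
```

For example, `rank_candidates(read_rows(open("train.tsv", encoding="utf-8")))` would give, for each target, its candidates ordered by worker agreement (the UTF-8 encoding is also an assumption). Sorting by `ppdb_score` instead gives a simple PPDB-based ranking to compare against the gold labels.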

To cite the data, please use the following paper:


  author    = {Yimam, Seid Muhie  and  Mart\'{i}nez Alonso, H\'{e}ctor  and  Riedl, Martin  and  Biemann, Chris},
  title     = {Learning Paraphrasing for Multiword Expressions},
  booktitle = {Proceedings of the 12th Workshop on Multiword Expressions},
  month     = {August},
  year      = {2016},
  address   = {Berlin, Germany},
  publisher = {Association for Computational Linguistics},
  pages     = {1--10},
  url       = {http://anthology.aclweb.org/W16-1801}

