References
[1] M. Artetxe, G. Labaka, and E. Agirre. A robust self-learning method for fully unsupervised
cross-lingual mappings of word embeddings. In ACL, 2018.
[2] B. Athiwaratkun, A. G. Wilson, and A. Anandkumar. Probabilistic FastText for multi-sense word embeddings. In ACL, 2018.
[3] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. Enriching word vectors with subword
information. TACL, 5:135–146, 2017.
[4] S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Józefowicz, and S. Bengio. Generating
sentences from a continuous space. In CoNLL, 2016.
[5] L. Chen, F. Yuan, J. M. Jose, and W. Zhang. Improving negative sampling for word representation using self-embedded features. In WSDM, 2018.
[6] A. Conneau and D. Kiela. SentEval: An evaluation toolkit for universal sentence representations. In LREC, 2018.
[7] A. Conneau, G. Kruszewski, G. Lample, L. Barrault, and M. Baroni. What you can cram into
a single vector: Probing sentence embeddings for linguistic properties. In ACL, 2018.
[8] A. Conneau, G. Lample, M. Ranzato, L. Denoyer, and H. Jégou. Word translation without
parallel data. In ICLR, 2018.
[9] T. Mihaylov and A. Frank. Knowledgeable reader: Enhancing cloze-style reading comprehension with external commonsense knowledge. In ACL, 2018.
[10] C. Clark and M. Gardner. Simple and effective multi-paragraph reading comprehension. In ACL, 2018.
[11] M. Ghazvininejad, C. Brockett, M.-W. Chang, W. B. Dolan, J. Gao, W.-t. Yih, and M. Galley. A knowledge-grounded neural conversation model. In AAAI, 2018.
[12] G. Glavaš and I. Vulić. Explicit retrofitting of distributional word vectors. In ACL, 2018.
[13] K. Guu, T. B. Hashimoto, Y. Oren, and P. Liang. Generating sentences by editing prototypes. arXiv preprint arXiv:1709.08878, 2017.
[14] X. He, X. Xin, F. Yuan, and J. M. Jose. Batch is not heavy: Learning word representations
from all samples. In ACL, 2018.
[15] S. Jameel, Z. Bouraoui, and S. Schockaert. Unsupervised learning of distributional relation
vectors. In ACL, 2018.
[16] R. Kadlec, M. Schmid, O. Bajgar, and J. Kleindienst. Text understanding with the attention
sum reader network. arXiv preprint arXiv:1603.01547, 2016.
[17] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[18] X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz. End-to-end task-completion neural
dialogue systems. arXiv preprint arXiv:1703.01008, 2017.
[19] Y. Li, F. Tao, A. Fuxman, and B. Zhao. Guess me if you can: Acronym disambiguation for
enterprises. In ACL, 2018.
[20] X. Liu, Y. Shen, K. Duh, and J. Gao. Stochastic answer networks for machine reading comprehension. In ACL, 2018.
[21] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in
vector space. arXiv preprint arXiv:1301.3781, 2013.
[22] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of
words and phrases and their compositionality. In NIPS, 2013.
[23] N. Mrkšić, D. O. Séaghdha, T.-H. Wen, B. Thomson, and S. Young. Neural belief tracker: Data-driven dialogue state tracking. arXiv preprint arXiv:1606.03777, 2016.
[24] B. Peng, X. Li, J. Gao, J. Liu, and K.-F. Wong. Deep Dyna-Q: Integrating planning for task-completion dialogue policy learning. In ACL, 2018.
[25] B. Peng, X. Li, L. Li, J. Gao, A. Celikyilmaz, S. Lee, and K.-F. Wong. Composite task-completion dialogue policy learning via hierarchical deep reinforcement learning. arXiv preprint arXiv:1704.03084, 2017.
[26] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In EMNLP, 2014.
[27] A. Salle, M. Idiart, and A. Villavicencio. Enhancing the LexVec distributed word representation model using positional contexts and external memory. arXiv preprint arXiv:1606.01283, 2016.