Accepted Papers: Main Conference
- "Judge me by my size(noun), do you?" YodaLib: A Demographic-Aware Humor Generation Framework
Aparna Garimella, Carmen Banea, Nabil Hossain and Rada Mihalcea - 100,000 Podcasts: A Large-Scale Spoken Document Corpus
Ann Clifton, Sravana Reddy, Yongze Yu, Aasish Pappu, Rezvaneh Rezapour, Hamed Bonab, Jussi Karlgren, Ben Carterette and Rosie Jones - A BERT-based Dual Embedding Model for Chinese Idiom Prediction
Minghuan Tan and Jing Jiang - A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English
Marius Mosbach, Stefania Degaetano-Ortlieb, Marie-Pauline Krielke, Badr Abdullah and Dietrich Klakow - A Co-Attentive Cross-Lingual Neural Model for Dialogue Breakdown Detection
Qian Lin, Souvik Kundu and Hwee Tou Ng - A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI
Angus Addlesee, Yanchao Yu and Arash Eshghi - A Contextual Alignment Enhanced Cross Graph Attention Network for Cross-lingual Entity Alignment
Zhiwen Xie, Runjie Zhu, Kunsong Zhao, Jin Liu, Guangyou Zhou and Jimmy Xiangji Huang - A Corpus for Argumentative Writing Support in German
Thiemo Wambsganss, Christina Niklaus, Matthias Söllner, Siegfried Handschuh and Jan Marco Leimeister - A Dataset and Evaluation Framework for Complex Geographical Description Parsing
Egoitz Laparra and Steven Bethard - A Deep Generative Approach to Native Language Identification
Ehsan Lotfi, Ilia Markov and Walter Daelemans - A Deep Generative Distance-Based Classifier for Out-of-Domain Detection with Mahalanobis Space
Hong Xu, Keqing He, Yuanmeng Yan, Sihong Liu, Zijun Liu and Weiran XU - A Deep Metric Learning Method for Biomedical Passage Retrieval
Andrés Rosso-Mateus, Fabio A. González and Manuel Montes-y-Gómez - A Document-Level Neural Machine Translation Model with Dynamic Caching Guided by Theme-Rheme Information
Yiqi Tong, Jiangbin Zheng, Hongkang Zhu, Yidong Chen and xiaodong shi - A Geometry-Inspired Attack for Generating Natural Language Adversarial Examples
Zhao Meng and Roger Wattenhofer - A Graph Representation of Semi-structured Data for Web Question Answering
xingyao zhang, Linjun Shou, Jian Pei, Ming Gong, Lijie Wen and Daxin Jiang - A hierarchical approach to automatic vision to language generation: from simple sentences to complex natural language
Simion-Vlad Bogolin, Ioana Croitoru and Marius Leordeanu - A High Precision Pipeline for Financial Knowledge Graph Construction
Sarah Elhammadi, Laks V.S. Lakshmanan, Raymond Ng, Michael Simpson, Baoxing Huai, Zhefeng Wang and Lanjun Wang - A Human Evaluation of AMR-to-English Generation Systems
Emma Manning, Shira Wein and Nathan Schneider - A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents
Tuan Lai, Trung Bui, Doo Soon Kim and Quan Hung Tran - A Large-Scale Corpus of E-mail Conversations with Standard and Two-Level Dialogue Act Annotations
Motoki Taniguchi, Yoshihiro Ueda, Tomoki Taniguchi and Tomoko Ohkuma - A Learning-Exploring Method to Generate Diverse Paraphrases with Multi-Objective Deep Reinforcement Learning
Mingtong Liu, Erguang Yang, Deyi Xiong, YUJIE ZHANG, Yao Meng, Changjian Hu, Jinan Xu and Yufeng Chen - A Linguistic Perspective on Reference: Choosing a Feature Set for Generating Referring Expressions in Context
Fahime Same and Kees van Deemter - A Locally Linear Procedure for Word Translation
Soham Dan, Hagai Taitelbaum and Jacob Goldberger - A Mixture-of-Experts Model for Learning Multi-Facet Entity Embeddings
Rana Alshaikh, Zied Bouraoui, Shelan Jeawak and Steven Schockaert - A Multitask Active Learning Framework for Natural LanguageUnderstanding
Hua Zhu, Sihan Luo, Wu Ye and Xidong Zhang - A Neural Local Coherence Analysis Model for Clarity Text Scoring
Panitan Muangkammuen, Sheng Xu, Fumiyo Fukumoto, Kanda Runapongsa Saikaew and Jiyi Li - A Neural Model for Aggregating Coreference Annotation in Crowdsourcing
Maolin Li, Hiroya Takamura and Sophia Ananiadou - A Quantitative Analysis on the Role of Training Data for Text Classification
Aleksandra Edwards, Jose Camacho-Collados, Hélène De Ribaupierre and Alun Preece - A Representation Learning Approach to Animal Biodiversity Conservation
Meet Mukadam, Mandhara Jayaram and Yongfeng Zhang - A Retrofitting Model for Incorporating Semantic Relations into Word Embeddings
Sapan Shah, Sreedhar Reddy and Pushpak Bhattacharyya - A Review of dataset and labeling methods for causality extraction
Jinghang Xu, Wanli Zuo, Shining Liang and Xianglin Zuo - A Semantically Consistent and Syntactically Variational Encoder-Decoder Framework for Paraphrase Generation
Wenqing Chen, Jidong Tian, Liqiang Xiao, Hao He and Yaohui Jin - A Sentence Cloze Dataset for Chinese Machine Reading Comprehension
Yiming Cui, Ting Liu, Ziqing Yang, Zhipeng Chen, Wentao Ma, Wanxiang Che, Shijin Wang and Guoping Hu - A Simple and Effective Approach to Robust Unsupervised Bilingual Dictionary Induction
Yanyang Li, Yingfeng Luo, Ye Lin, Quan Du, Huizhen Wang, Tong Xiao and Jingbo Zhu - A Straightforward Approach to Narratologically Grounded Character Identification
Labiba Jahan, Rahul Mittal, W. Victor Yarlott and Mark Finlayson - A Study on Efficiency, Accuracy and Document Structure for Answer Sentence Selection
Daniele Bonadiman and Alessandro Moschitti - A Survey of Automatic Personality Detection from Texts
Sanja Stajner and Seren Yenikent - A Survey of Unsupervised Dependency Parsing
Wenjuan Han, Yong Jiang, Hwee Tou Ng and Kewei Tu - A Symmetric Local Search Network for Emotion-Cause Pair Extraction
Zifeng Cheng, Zhiwei Jiang, Yafeng Yin, Hua Yu and Qing Gu - A Systematic Study of Data Augmentation for Multiclass Utterance Classification Tasks
Binxia Xu, Siyuan Qiu, Jie Zhang, Yafang Wang, Xiaoyu Shen and Gerard de Melo - A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing
Sanxing Chen, Aidan San, Xiaodong Liu and Yangfeng Ji - A Taxonomy of Empathetic Response Intents in Human Social Conversations
Anuradha Welivita and Pearl Pu - A Thorough Analysis of Dataset Overlap on Winograd-Style Tasks
Ali Emami, Kaheer Suleman, Adam Trischler and Jackie Chi Kit Cheung - A Two-Level Interpretation of Modality in Human-Robot Dialogue
Lucia Donatelli, Kenneth Lai and James Pustejovsky - A Two-phase Prototypical Network Model for Incremental Few-shot Relation Classification
Haopeng Ren, Yi Cai, Xiaofeng Chen, Guohua Wang and Qing Li - A Unified Sequence Labeling Model for Emotion Cause Pair Extraction
Xinhong Chen, Qing Li and Jianping Wang - A Unifying Theory of Transition-based and Sequence Labeling Parsing
Carlos Gómez-Rodríguez, Michalina Strzyz and David Vilares - A Vietnamese Dataset for Evaluating Machine Reading Comprehension
Kiet Nguyen, Vu Nguyen, Anh Nguyen and Ngan Nguyen - A Word-Level Uncertainty Estimation Approach for Black-Box Text Classifiers using RNNs
Jakob Smedegaard Andersen, Tom Schöner and Walid Maalej - AbuseAnalyzer: Abuse Detection, Severity and Target Prediction for Gab Posts
  Mohit Chandra, Ashwin Pathak, Eesha Dutta, Paryul Jain, Manish Gupta, Manish Shrivastava and Ponnurangam Kumaraguru
- Ad Lingua: Text Classification Improves Symbolism Prediction in Image Advertisements
  Andrey Savchenko, Anton Alekseev, Sejeong Kwon, Elena Tutubalina, Evgeny Myasnikov and Sergey Nikolenko
- Adversarial Learning on the Latent Space for Diverse Dialog Generation
  Kashif Khan, Gaurav Sahu, Vikash Balasubramanian, Lili Mou and Olga Vechtomova
- Affective and Contextual Embedding for Sarcasm Detection
  Nastaran Babanejad, Heidar Davoudi, Aijun An and Manos Papagelis
- Affective Text Generation
  Tushar Goswamy, Ishika Singh, Ahsan Barkati and Ashutosh Modi
- Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution
  Nikolay Arefyev, Boris Sheludko, Alexander Podolskiy and Alexander Panchenko
- An analysis of language models for metaphor recognition
  Arthur Neidlein, Philip Wiesenbach and Katja Markert
- An Analysis of Simple Data Augmentation for Named Entity Recognition
  Xiang Dai and Heike Adel
- An Anchor-Based Automatic Evaluation Metric for Document Summarization
  Kexiang Wang, Tianyu Liu, Baobao Chang and Zhifang Sui
- An empirical analysis of existing systems and datasets toward general simple question answering
  Namgi Han, Goran Topic, Hiroshi Noji, Hiroya Takamura and Yusuke Miyao
- An Empirical Investigation of Low-Resource Cross-Lingual Emotion Lexicon Induction using Representation Alignment
  Arun Ramachandran and Gerard de Melo
- An Empirical Study of Contextual Data Augmentation for Japanese Zero Anaphora Resolution
  Ryuto Konno, Yuichiroh Matsubayashi, Shun Kiyono, Hiroki Ouchi, Ryo Takahashi and Kentaro Inui
- An Empirical Study of Rich Morphological Word Segmentation on Inuktitut in Low-resource NMT
  Tan Ngoc Le and Fatiha Sadat
- An Empirical Study of the Downstream Reliability of Pre-Trained Word Embeddings
  Anthony Rios and Brandon Lwowski
- An Enhanced Knowledge Injection Model for Commonsense Generation
  Zhihao Fan, Yeyun Gong, Zhongyu Wei, Siyuan Wang, Yameng Huang, Jian Jiao, Xuanjing Huang, Nan Duan and Ruofei Zhang
- An Iterative Emotion Interaction Network for Emotion Recognition in Conversations
  Xin Lu, Yanyan Zhao, Yang Wu, Yijian Tian, Huipeng Chen and Bing Qin
- An Unsupervised Method for Learning Representations of Multi-word Expressions
  Robert Vacareanu, Rebecca Sharp, Marco A. Valenzuela-Escárcega and Mihai Surdeanu
- Analogy Models for Neural Word Inflection
  Ling Liu and Mans Hulden
- Analysing cross-lingual transfer in lemmatisation for Indian languages
  Kumar Saurav, Kumar Saunack and Pushpak Bhattacharyya
- Answer-driven Deep Question Generation based on Reinforcement Learning
  Liuyin Wang, Zihan Xu, Zibo Lin, Haitao Zheng and Ying Shen
- Answering Legal Questions by Learning Neural Attentive Text Representation
  Phi Manh Kien, Nguyen Ha Thanh, Ngo Xuan Bach, Vu Tran, Nguyen Le Minh and Tu Minh Phuong
- Appraisal Theories for Emotion Classification in Text
  Jan Hofmann, Enrica Troiano, Kai Sassenberg and Roman Klinger
- AprilE: Attention with Pseudo Residual Connection for Knowledge Graph Embedding
  Yuzhang Liu, Peng Wang, Yingtai Li, Yizhan Shao and Zhongkai Xu
- AraBench: Benchmarking Dialectal Arabic-English Machine Translation
  Hassan Sajjad, Ahmed Abdelali, Nadir Durrani and Fahim Dalvi
- Arabizi Language Models for Sentiment Analysis
  Gaétan Baert, Souhir Gahbiche, Guillaume Gadek and Alexandre Pauchet
- Are We Ready for this Disaster? Towards Location Mention Recognition from Crisis Tweets
  Reem Suwaileh, Muhammad Imran, Tamer Elsayed and Hassan Sajjad
- Argumentation Mining on Essays at Multi Scales
  Hao Wang, Zhen Huang, Yong Dou and Yu Hong
- Ask to Learn: A Study on Curiosity-driven Question Generation
  Thomas Scialom and Jacopo Staiano
- Aspect-based Document Similarity for Research Papers
  Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp and Georg Rehm
- Aspect-Category based Sentiment Analysis with Hierarchical Graph Convolutional Network
  Hongjie Cai, Xiangsheng Zhou, Yaofeng Tu, Jianfei Yu and Rui Xia
- Aspectuality Across Genre: A Distributional Semantics Approach
  Thomas Kober, Malihe Alikhani, Matthew Stone and Mark Steedman
- At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization
  Qingyu Zhou, Furu Wei and Ming Zhou
- Attention is All You Sign: Sign Language Translation with Transformers
  Kayo Yin and Jesse Read
- Attention Transfer Network for Aspect-level Sentiment Classification
  Fei Zhao, Zhen Wu and Xinyu Dai
- Attention Word Embedding
  Shashank Sonkar, Andrew Waters and Richard Baraniuk
- Attentively Embracing Noise for Robust Latent Representation in BERT
  Gwenaelle Cunha Sergio, Dennis Singh Moirangthem and Minho Lee
- Augmenting NLP models using Latent Feature Interpolations
  Amit Jindal, Arijit Ghosh Chowdhury, Aniket Didolkar, Di Jin, Ramit Sawhney and Rajiv Ratn Shah
- Author’s Sentiment Prediction
  Mohaddeseh Bastan, Mahnaz Koupaee, Youngseo Son, Richard Sicoli and Niranjan Balasubramanian
- Auto-Encoding Variational Bayes for Inferring Topics and Visualization
  Dang Pham and Tuan Le
- Autoencoding Improves Pre-trained Word Embeddings
  Masahiro Kaneko and Danushka Bollegala
- Automated Graph Generation at Sentence Level for Reading Comprehension Based on Conceptual Graphs
  Wan-Hsuan Lin and Chun-Shien Lu
- Automated Prediction of Examinee Proficiency from Short-Answer Questions
  Le An Ha, Victoria Yaneva, Polina Harik, Ravi Pandian, Amy Morales and Brian Clauser
- Automatic Assistance for Academic Word Usage
  Dariush Saberi, John Lee and Jonathan James Webster
- Automatic Crime Identification from Facts: A Few Sentence-Level Crime Annotations is All You Need
  Shounak Paul, Pawan Goyal and Saptarshi Ghosh
- Automatic Detection of Machine Generated Text: A Critical Survey
  Ganesh Jawahar, Muhammad Abdul-Mageed and Laks Lakshmanan, V.S.
- Automatic Discovery of Heterogeneous Machine Learning Pipelines: An Application to Natural Language Processing
  Suilan Estevez-Velarde, Yoan Gutiérrez, Andres Montoyo and Yudivián Almeida Cruz
- Automatic Distractor Generation for Multiple Choice Questions in Standard Tests
  Zhaopeng Qiu, Xian Wu and Wei Fan
- Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations
  Xingyuan Zhao, Satoru Ozaki, Antonios Anastasopoulos, Graham Neubig and Lori Levin
- Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification
  Timo Schick, Helmut Schmid and Hinrich Schütze
- AutoMeTS: The Autocomplete for Medical Text Simplification
  Hoang Van, David Kauchak and Gondy Leroy
- Autoregressive Affective Language Forecasting: A Self-Supervised Task
  Matthew Matero and H. Andrew Schwartz
- Autoregressive Reasoning over Chains of Facts with Transformers
  Ruben Cartuyvels, Graham Spinks and Marie-Francine Moens
- Balanced Joint Adversarial Training for Robust Intent Detection and Slot Filling
  Xu Cao, Deyi Xiong, Chongyang Shi, Chao Wang, Yao Meng and Changjian Hu
- Bayes-enhanced Lifelong Attention Networks for Sentiment Classification
  Hao Wang, Shuai Wang, Sahisnu Mazumder, Bing Liu, Yan Yang and Tianrui Li
- BERT-based Cohesion Analysis of Japanese Texts
  Nobuhiro Ueda, Daisuke Kawahara and Sadao Kurohashi
- Bi-directional Cognitive Thinking Network for Machine Reading Comprehension
  Wei Peng, Yue Hu, Luxi Xing, Yuqiang Xie, Jing Yu, Yajing Sun and Xiangpeng Wei
- Biased TextRank: Unsupervised Graph-Based Content Extraction
  Ashkan Kazemi, Verónica Pérez-Rosas and Rada Mihalcea
- Bilingual Subword Segmentation for Neural Machine Translation
  Hiroyuki Deguchi, Masao Utiyama, Akihiro Tamura, Takashi Ninomiya and Eiichiro Sumita
- BioMedBERT: A Pre-trained Biomedical Language Model for QA and IR
  SOURADIP CHAKRABORTY, Ekaba Bisong, Shweta Bhatt, Thomas Wagner, Riley Elliott and Francesco Mosconi
- Biomedical Concept Relatedness – A large EHR-based benchmark
  Claudia Schulz, Josh Levy-Kramer, Camille Van Assel, Miklos Kepes and Nils Hammerla
- Bracketing Encodings for 2-Planar Dependency Parsing
  Michalina Strzyz, David Vilares and Carlos Gómez-Rodríguez
- Break the Gap: High-level Semantic Planning for Image Captioning
  Chenxi Yuan, Yang Bai and Chun Yuan
- Breeding Gender-aware Direct Speech Translation Systems
  Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri and Marco Turchi
- Bridging Anaphora Resolution: A Survey of the State of the Art
  Hideo Kobayashi and Vincent Ng
- Bridging Text and Knowledge with Multi-Prototype Embedding for Few-Shot Relational Triple Extraction
  Haiyang Yu, Ningyu Zhang, Shumin Deng, Hongbin Ye, Wei Zhang and Huajun Chen
- Bridging the Gap in Multilingual Semantic Role Labeling: a Language-Agnostic Approach
  Simone Conia and Roberto Navigli
- Building Hierarchically Disentangled Language Models for Text Generation with Named Entities
  Yash Agarwal, Devansh Batra and Ganesh Bagler
- Building Large-Scale English and Korean Datasets for Aspect-Level Sentiment Analysis in Automotive Domain
  Dongmin Hyun, Junsu Cho and Hwanjo Yu
- Building The First English-Brazilian Portuguese Corpus for Automatic Post-Editing
  Felipe de Almeida Costa, Thiago Castro Ferreira, Adriana Pagano and Wagner Meira
- Catching Attention with Automatic Pull Quote Selection
  Tanner Bohn and Charles Ling
- CEREC: A Corpus for Entity Resolution in Email Conversations
  Parag Pravin Dakle and Dan Moldovan
- CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters
  Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Hiroshi Noji, Pierre Zweigenbaum and Jun'ichi Tsujii
- CharBERT: Character-aware Pre-trained Language Model
  Wentao Ma, Yiming Cui, Chenglei Si, Ting Liu, Shijin Wang and Guoping Hu
- CHIME: Cross-passage Hierarchical Memory Network for Generative Review Question Answering
  Junru Lu, Gabriele Pergola, Lin Gui, Binyang Li and Yulan He
- Chinese Paragraph-level Discourse Parsing with Global Backward and Local Reverse Reading
  Feng Jiang, Xiaomin Chu, Peifeng Li, Fang Kong and Qiaoming Zhu
- CLUE: A Chinese Language Understanding Evaluation Benchmark
  Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson and Zhenzhong Lan
- CoLAKE: Contextualized Language and Knowledge Embedding
  Tianxiang Sun, Yunfan Shao, Xipeng Qiu, Qipeng Guo, Yaru Hu, Xuanjing Huang and Zheng Zhang
- Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation
  Fahimeh Saleh, Wray Buntine and Gholamreza Haffari
- Combining Cognitive Modeling and Reinforcement Learning for Clarification in Dialogue
  Baber Khalid, Malihe Alikhani and Matthew Stone
- Combining Event Semantics and Degree Semantics for Natural Language Inference
  Izumi Haruta, Koji Mineshima and Daisuke Bekki
- Combining Word Embeddings with Bilingual Orthography Embeddings for Bilingual Dictionary Induction
  Silvia Severini, Viktor Hangya, Alexander Fraser and Hinrich Schütze
- Common Mistakes in Financial Sentiment Analysis Practices
  Frank Xing, Lorenzo Malandri, Yue Zhang and Erik Cambria
- Commonsense Question Answering Boosted by Graph-Based Iterative Retrieval over Multiple Knowledge Sources
  Qianglong Chen, Feng Ji, Haiqing Chen and Yin Zhang
- comp-syn: Perceptually Grounded Word Embeddings with Color
  Bhargav Srinivasa Desikan, Tasker Hull, Ethan Nadler, Douglas Guilbeault, Aabir Abubakar Kar, Mark Chu and Donald Ruggiero Lo Sardo
- Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness
  António Branco, João António Rodrigues, Malgorzata Salawa, Ruben Branco and Chakaveh Saedi
- Complaint Identification in Social Media with Transformer Networks
  Mali Jin and Nikolaos Aletras
- Computational Modeling of Affixoid Behavior in Chinese Morphology
  Yu-Hsiang Tseng, Shu-Kai HSIEH, Pei-Yi Chen and Sara Court
- CoNAN: A Complementary Neighboring-based Attention Network for Referring Expression Generation
  Jungjun Kim, Hanbin Ko and Jialin Wu
- Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations
  Simone Conia and Roberto Navigli
- Connecting the Dots Between Fact Verification and Fake News Detection
  Qifei LI and Wangchunshu Zhou
- Constituency Lattice Encoding for Aspect Term Extraction
  Yunyi Yang, Kun Li, Xiaojun Quan, Weizhou Shen and Qinliang Su
- Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps
  Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara and Akiko Aizawa
- Context Dependent Semantic Parsing: A Survey
  Zhuang Li, Lizhen Qu and Gholamreza Haffari
- Context in Informational Bias Detection
  Esther van den Berg and Katja Markert
- Context-aware Lexical Coherence Modeling
  Sungho Jeon and Michael Strube
- Context-Aware Text Normalisation for Historical Dialects
  Maria Sukhareva
- Contextual Argument Component Classification for Class Discussions
  Luca Lugini and Diane Litman
- Contextualized Embeddings for Enriching Linguistic Analyses on Politeness
  Ahmad Aljanaideh, Eric Fosler-Lussier and Marie-Catherine de Marneffe
- Continual Lifelong Learning in Natural Language Processing: A Survey
  Magdalena Biesialska, Katarzyna Biesialska and Marta R. Costa-jussà
- ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation
  Dario Stojanovski, Benno Krojer, Denis Peskov and Alexander Fraser
- Contrastive Zero-Shot Learning for Cross-Domain Slot Filling with Adversarial Attack
  Keqing He, Jinchao Zhang, Yuanmeng Yan, Weiran XU, Cheng Niu and Jie Zhou
- Controllable Abstractive Sentence Summarization with Guiding Entities
  Changmeng Zheng, Yi Cai, Guanjie Zhang and Qing Li
- Conversational Machine Comprehension: a Literature Review
  Somil Gupta, Bhanu Pratap Singh Rawat and hong yu
- Coordination Boundary Identification without Labeled Data for Compound Terms Disambiguation
  Yuya Sawada, Takashi Wada, Takayoshi Shibahara, Hiroki Teranishi, Shuhei Kondo, Hiroyuki Shindo, Taro Watanabe and Yuji Matsumoto
- Coreference information guides human expectations during natural reading
  Evan Jaffe, Cory Shain and William Schuler
- Corpus-Based Identification of Verbs participating in Verb Alternations using Classification and Manual Annotation
  Esther Seyffarth and Laura Kallmeyer
- CosMo: Conditional Seq2Seq-based Mixture Model for Zero-Shot Commonsense Question Answering
  Farhad Moghimifar, Lizhen Qu, Yue Zhuo, Mahsa Baktashmotlagh and Gholamreza Haffari
- Creation of Corpus and analysis in Code-Mixed Kannada-English Twitter data for Emotion Prediction
  Abhinav Reddy Appidi, Vamshi Krishna Srirangam, Darsi Suhas and Manish Shrivastava
- Cross-lingual Annotation Projection in Legal Texts
  Andrea Galassi, Kasper Drazewski, Marco Lippi and Paolo Torroni
- Cross-Lingual Document Retrieval with Smooth Learning
  Jiapeng Liu, Xiao Zhang, Dan Goldwasser and Xiao Wang
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation
  Junhao Liu, Linjun Shou, Jian Pei, Ming Gong, Min Yang and Daxin Jiang
- Cross-lingual Transfer Learning for Grammatical Error Correction
  Ikumi Yamashita, Satoru Katsumata, Masahiro Kaneko, Aizhan Imankulova and Mamoru Komachi
- Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale
  Ozan Caglayan, Pranava Madhyastha and Lucia Specia
- CxGBERT: BERT meets Construction Grammar
  Harish Tayyar Madabushi, Laurence Romain, Dagmar Divjak and Petar Milin
- Cycle-Consistent Adversarial Autoencoders for Unsupervised Text Style Transfer
  Yufang Huang, Wentao Zhu, Deyi Xiong, Yiye Zhang, Changjian Hu and Feiyu Xu
- DaN+: Danish Nested Named Entities and Lexical Normalization
  Barbara Plank, Kristian Nørgaard Jensen and Rob van der Goot
- Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages
  Mathieu Dehouck and Carlos Gómez-Rodríguez
- Data Selection for Bilingual Lexicon Induction from Specialized Comparable Corpora
  Martin Laville, Amir Hazem, Emmanuel Morin and Phillippe Langlais
- Debunking Rumors on Twitter with Tree Transformer
  Jing Ma and Wei Gao
- Decolonising Speech and Language Technology
  Steven Bird
- Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems
  Vitou Phy, Yang Zhao and Akiko Aizawa
- Deep Inside-outside Recursive Autoencoder with All-span Objective
  Ruyue Hong, Jiong Cai and Kewei Tu
- Deep Learning Framework for Measuring the Digital Strategy of Companies from Earnings Calls
  Ahmed Ghanim Al-Ali, Robert Phaal and Donald Sull
- Definition Frames: Using Definitions for Hybrid Concept Representations
  Evangelia Spiliopoulou, Artidoro Pagnoni and Eduard Hovy
- Detect All Abuse! Toward Universal Abusive Language Detection Models
  Kunze Wang, Dong Lu, Caren Han, SIQU LONG and Josiah Poon
- Detecting de minimis Code-Switching in Historical German Books
  Shijia Liu and David Smith
- Detecting Non-literal Translations by Fine-tuning Cross-lingual Pre-trained Language Models
  Yuming Zhai, Gabriel ILLOUZ and Anne Vilnat
- Detecting Urgency Status of Crisis Tweets: A Transfer Learning Approach for Low Resource Languages
  Efsun Sarioglu Kayi, Linyong Nan, Bohan Qu, Mona Diab and Kathleen McKeown
- DisenE: Disentangling Knowledge Graph Embeddings
  Xiaoyu Kou, Yankai Lin, Yuntao Li, Jiahao Xu, Peng Li, Jie Zhou and Yan Zhang
- Distill and Replay for Continual Language Learning
  Jingyuan Sun, Shaonan Wang, Jiajun Zhang and Chengqing Zong
- Distinguishing Between Foreground and Background Events in News
  Mohammed Aldawsari, Adrian Perez, Deya Banisakher and Mark Finlayson
- Diverse and Non-redundant Answer Set Extraction on Community QA based on DPPs
  Shogo Fujita, Tomohide Shibata and Manabu Okumura
- Diverse dialogue generation with context dependent dynamic loss function
  Ayaka Ueyama and Yoshinobu Kano
- Diverse Keyphrase Generation with Neural Unlikelihood Training
  Hareesh Bahuleyan and Layla El Asri
- Do Neural Language Models Overcome Reporting Bias?
  Vered Shwartz and Yejin Choi
- Do Word Embeddings Capture Spelling Variation?
  Dong Nguyen and Jack Grieve
- DocBank: A Benchmark Dataset for Document Layout Analysis
  Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li and Ming Zhou
- Document-level Relation Extraction with Dual-tier Heterogeneous Graph
  Zhenyu Zhang, Bowen Yu, Xiaobo Shu, Tingwen Liu, Hengzhu Tang, Wang Yubin and Li Guo
- Does Chinese BERT Encode Word Structure?
  Yile Wang, Leyang Cui and Yue Zhang
- Does Gender Matter? Towards Fairness in Dialogue Systems
  Haochen Liu, Jamell Dacon, Wenqi Fan, Hui Liu, Zitao Liu and Jiliang Tang
- Domain Transfer based Data Augmentation for Neural Query Translation
  Liang Yao, Baosong Yang, zhang haibo, Boxing Chen and Weihua Luo
- Don't Invite BERT to Drink a Bottle: Modeling the Interpretation of Metonymies Using BERT and Distributional Representations
  Paolo Pedinotti and Alessandro Lenci
- Don’t Patronize Me! An Annotated Dataset with Patronizing and Condescending Language towards Vulnerable Communities
  Carla Perez Almendros, Luis Espinosa Anke and Steven Schockaert
- Don’t take “nswvtnvakgxpm” for an answer –The surprising vulnerability of automatic content scoring systems to adversarial input
  Yuning Ding, Brian Riordan, Andrea Horbach, Aoife Cahill and Torsten Zesch
- DT-QDC: A Dataset for Question Comprehension in Online Test
  Sijin Wu, Yujiu Yang, Nicholas Yung, Zhengchen Shen and Zeyang Lei
- Dual Attention Model for Citation Recommendation
  Yang Zhang and Qiang Ma
- Dual Attention Network for Cross-lingual Entity Alignment
  Jian Sun, Yu Zhou and Chengqing Zong
- Dual Dynamic Memory Network for End-to-End Multi-turn Task-oriented Dialog Systems
  Jian Wang, Junhao Liu, Wei Bi, Xiaojiang Liu, Kejing He, Ruifeng Xu and Min Yang
- Dual Supervision Framework for Relation Extraction with Distant Supervision and Human Annotation
  Woohwan Jung and Kyuseok Shim
- Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation
  Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab and Laurent Besacier
- Dynamic Curriculum Learning for Low-Resource Neural Machine Translation
  Chen Xu, Bojie Hu, Yufan Jiang, Kai Feng, Zeyang Wang, shen huang, Qi Ju, Tong Xiao and Jingbo Zhu
- Dynamic Topic Tracker for KB-to-Text Generation
  Zihao Fu, Lidong Bing, Wai Lam and Shoaib Jameel
- Early Detection of Fake News by Utilizing the Credibility of News, Publishers, and Users based on Weakly Supervised Learning
  Chunyuan Yuan, Qianwen Ma, Wei Zhou, Jizhong Han and Songlin Hu
- Effective Few-Shot Classification with Transfer Learning
  Aakriti Gupta, Kapil Thadani and Neil O'Hare
- Effective Use of Target-side Context for Neural Machine Translation
  Hideya Mino, Hitoshi Ito, Isao Goto, Ichiro Yamada and Takenobu Tokunaga
- Embedding Dynamic Attributed Networks by Modeling the Evolution Processes
  Zenan Xu, Zijing Ou, Qinliang Su, Jianxing Yu, Xiaojun Quan and ZhenKun Lin
- Embedding Meta-Textual Information for Improved Learning to Rank
  Toshitaka Kuwa, Shigehiko Schamoni and Stefan Riezler
- Embedding Semantic Taxonomies
  Alyssa Lees, Chris Welty, Shubin Zhao and Jacek Korycki
- Emergent Communication Pretraining for Few-Shot Machine Translation
  Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić and Anna Korhonen
- Emotion Classification by Jointly Learning to Lexiconize and Classify
  Deyu Zhou, Shuangzhi Wu, Qing Wang, Jun Xie, Zhaopeng Tu and Mu Li
- EmpDG: Multi-resolution Interactive Empathetic Dialogue Generation
  Qintong Li, Hongshen Chen, Zhaochun Ren, Pengjie Ren, Zhaopeng Tu and Zhumin CHEN
- Enabling Interactive Transcription in an Indigenous Community
  Eric Le Ferrand, Steven Bird and Laurent Besacier
- Encoding Lexico-Semantic Knowledge using Ensembles of Feature Maps from Deep Convolutional Neural Networks
  Steven Derby, Paul Miller and Barry Devereux
- End to End Chinese Lexical Fusion Recognition with Sememe Knowledge
  Yijiang Liu, Meishan Zhang and Donghong Ji
- End-to-End Emotion-Cause Pair Extraction with Graph Convolutional Network
  Ying Chen, Wenjun Hou and Xiaoqiang Zhang
- Enhancing Clinical BERT Embedding using a Biomedical Knowledge Base
  Boran Hao, Henghui Zhu and Ioannis Paschalidis
- Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks
  Peng Cui, Le Hu and Yuanchao Liu
- Enhancing Neural Models with Vulnerability via Adversarial Attack
  Rong Zhang, Qifei Zhou, Bo Wu, Bo An, Weiping Li and Tong Mo
- Evaluating Pretrained Transformer-based Models on the Task of Fine-Grained Named Entity Recognition
  Cedric Lothritz, Kevin Allix, Lisa Veiber, Tegawendé F. Bissyandé and Jacques Klein
- Evaluating Unsupervised Representation Learning for Detecting Stances of Fake News
  Maike Guderlei and Matthias Aßenmacher
- Event coreference resolution based on event-specific paraphrases and argument-aware semantic embeddings
  Yutao Zeng, Xiaolong Jin, Saiping Guan, Jiafeng Guo and Xueqi Cheng
- Event-Guided Denoising for Multilingual Relation Learning
  Amith Ananthram, Emily Allaway and Kathleen McKeown
- Expert Concept-Modeling Ground Truth Construction for Word Embeddings Evaluation in Concept-Focused Domains
  Arianna Betti, Martin Reynaert, Thijs Ossenkoppele, Yvette Oortwijn, Andrew Salway and Jelke Bloem
- Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering
  Quan Hung Tran, Nhan Dam, Tuan Lai, Franck Dernoncourt, Trung Le, Nham Le and Dinh Phung
- Explainable and Sparse Representations of Academic Articles for Knowledge Exploration
  Keng-Te Liao, Zhihong Shen, Chiyuan Huang, Chieh-Han Wu, PoChun Chen, Kuansan Wang and Shou-de Lin
- Explainable Automated Fact-Checking: A Survey
  Neema Kotonya and Francesca Toni
- Exploiting a lexical resource for discourse connective disambiguation in German
  Peter Bourgonje and Manfred Stede
- Exploiting Microblog Conversation Structures to Detect Rumors
  Jiawen Li, Yudianto Sujana and Hung-Yu Kao
- Exploiting Narrative Context and a Priori Knowledge of Categories in Textual Emotion Classification
  Hikari Tanabe, Tetsuji Ogawa, Tetsunori Kobayashi and Yoshihiko Hayashi
- Exploiting Node Content for Multiview Graph Convolutional Network and Adversarial Regularization
  Qiuhao Lu, Nisansa de Silva, Dejing Dou, Thien Huu Nguyen, Prithviraj Sen, Berthold Reinwald and Yunyao Li
- Exploring Amharic Sentiment Analysis from Social Media Texts: Building Annotation Tools and Classification Models
  Seid Muhie Yimam, Hizkiel Mitiku Alemayehu, Abinew Ayele and Chris Biemann
- Exploring Controllable Text Generation Techniques
  Shrimai Prabhumoye, Alan W Black and Ruslan Salakhutdinov
- Exploring Cross-sentence Contexts for Named Entity Recognition with BERT
  Jouni Luoma and Sampo Pyysalo
- Exploring End-to-End Differentiable Natural Logic Modeling
  Yufei Feng, Zi'ou Zheng, Quan Liu, Michael Greenspan and Xiaodan Zhu
- Exploring Question-Specific Rewards for Generating Deep Questions
  Yuxi Xie, Liangming Pan, Dongzhe Wang, Min-Yen Kan and Yansong Feng
- Exploring the Language of Data
  Gábor Bella, Linda Gremes and Fausto Giunchiglia
- Exploring the Value of Personalized Word Embeddings
  Charles Welch, Jonathan K. Kummerfeld, Verónica Pérez-Rosas and Rada Mihalcea
- Exploring the zero-shot limit of FewRel
  alberto cetoli
- Extracting Adherence Information from Electronic Health Records
  Jordan Sanders, Meghana Gudala, Kathleen Hamilton, Nishtha Prasad, Jordan Stovall, Eduardo Blanco, Jane E Hamilton and Kirk Roberts
- Fact vs. Opinion: the Role of Argumentative Features in News Classification
  Tariq Alhindi, Smaranda Muresan and Daniel Preotiuc-Pietro
- Fact-level Extractive Summarization with Hierarchical Graph Mask on BERT
  Ruifeng Yuan, zili Wang and Wenjie Li
- Facts2Story: Controlling Text Generation by Key Facts
  Eyal Orbach and Yoav Goldberg
- Fair Evaluation in Concept Normalization: a Large-scale Comparative Analysis for BERT-based Models
  Elena Tutubalina, Zulfat Miftahutdinov and Artur Kadurin
- FASTMATCH: Accelerating the Inference of BERT-based Text Matching
  Shuai Pang, Jianqiang Ma, ZEYU YAN, Yang Zhang and Jianping Shen
- Federated Learning for Spoken Language Understanding
  Zhiqi Huang, Fenglin Liu and Yuexian Zou
- Few-shot Pseudo-Labeling for Intent Detection
  Thomas Dopierre, Christophe Gravier, Julien Subercaze and Wilfried Logerais
- Few-Shot Text Classification with Edge-Labeling Graph Neural Network-Based Prototypical Network
  Chen Lyu, Weijie Liu, Meng Ma and Ping Wang
- Filtering Back-Translated Data in Unsupervised Neural Machine Translation
  Jyotsana Khatri and Pushpak Bhattacharyya
- Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering
  Wei Han, Hantao Huang and Tao Han
- Fine-grained Information Status Classification Using Discourse Context-Aware BERT
  Yufang Hou
- Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning
  Daniel Grießhaber, Johannes Maucher and Ngoc Thang Vu
- Flight of the PEGASUS? Comparing Transformers on Few-shot and Zero-shot Multi-document Abstractive Summarization
  Travis Goodwin, Max Savery and Dina Demner-Fushman
- ForceReader: a BERT-based Interactive Machine Reading Comprehension Model with Attention Separation
  zheng chen and kangjian wu
- Formality Style Transfer with Shared Latent Space
  Yunli Wang, Yu Wu, Lili Mou, Zhoujun Li and WenHan Chao
- Free the Plural: Unrestricted Split-Antecedent Anaphora Resolution
  Juntao Yu, Nafise Sadat Moosavi, Silviu Paun and Massimo Poesio
- French Biomedical Text Simplification: When Small and Precise Helps
  Rémi Cardon and Natalia Grabar
- From Sentiment Annotations to Sentiment Prediction through Discourse Augmentation
  Patrick Huber and Giuseppe Carenini
- Generalized Shortest-Paths Encoders for AMR-to-Text Generation
  Lisa Jin and Daniel Gildea
- Generating Diverse Corrections with Local Beam Search for Grammatical Error Correction
  Kengo Hotate, Masahiro Kaneko and Mamoru Komachi
- Generating Equation by Utilizing Operators : GEO model
  Kyung Seo Ki, Donggeon Lee, Bugeun Kim and Gahgene Gweon
- Generating Instructions at Different Levels of Abstraction
  Arne Köhn, Julia Wichlacz, Álvaro Torralba, Daniel Höller, Jörg Hoffmann and Alexander Koller
- Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification
  Linyi Yang, Eoin Kenny, Tin Lok James Ng, Yi Yang, Barry Smyth and Ruihai Dong
- GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation
  Zhijing Jin, Qipeng Guo, Xipeng Qiu and Zheng Zhang
- Geo-Aware Image Caption Generation
  Sofia Nikiforova, Tejaswini Deoskar, Denis Paperno and Yoad Winter
- German's Next Language Model
  Branden Chan, Stefan Schweter and Timo Möller
- Global Context-enhanced Graph Convolutional Networks for Document-level Relation Extraction
  Huiwei Zhou, Yibin Xu, Zhe Liu, Weihong Yao and Chengkun Lang
- GPolS: A Contextual Graph-Based Language Model for Analyzing Parliamentary Debates and Political Cohesion
  Ramit Sawhney, Arnav Wadhwa, Shivam Agarwal and Rajiv Ratn Shah
- GPT-based Few-shot Table-to-Text Generation with Table Structure Reconstruction and Content Matching
  Heng Gong, Yawei Sun, Xiaocheng Feng, Bing Qin, Wei Bi, Xiaojiang Liu and Ting Liu
- Grammatical error detection in transcriptions of spoken English
  Andrew Caines, Christian Bentz, Kate Knill, Marek Rei and Paula Buttery
- Graph Convolution over Multiple Dependency Sub-graphs for Relation Extraction
  Angrosh Mandya, Danushka Bollegala and Frans Coenen
- Graph Enhanced Dual Attention Network for Document-Level Relation Extraction
  Bo Li, Wei Ye, Zhonghao Sheng, Rui Xie, Xiangyu Xi and Shikun Zhang
- Graph-Based Co-reference and Relation Knowledge Integration for Question Answering over Dialogue
  Jian Liu, Dianbo Sui, Kang Liu and Jun Zhao
- Handling Anomalies of Synthetic Questions in Unsupervised Question Answering
  Giwon Hong, Junmo Kang, Doyeon Lim and Sung-Hyon Myaeng
- Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages
  Diptesh Kanojia, Raj Dabre, Shubham Dewangan, Pushpak Bhattacharyya, Gholamreza Haffari and Malhar Kulkarni
- HateGAN: Adversarial Generative-Based Data Augmentation for Hate Speech Detection
  RUI CAO and Roy Ka-Wei Lee
- Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity
  Hamza Harkous, Isabel Groves and Amir Saffari
- Heterogeneous Graph Neural Networks to Predict What Happen Next
  Jianming Zheng, Fei Cai, Yanxiang Ling and Honghui Chen
- Heterogeneous Recycle Generation for Chinese Grammatical Error Correction
  Charles Hinson, Hen-Hsen Huang and Hsin-Hsi Chen
- Hidden Message Extraction: A Task Challenge and a Corpus
  Gerardo Ocampo Diaz and Vincent Ng
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation
  Zhongfen Deng, Hao Peng, Congying Xia, Jianxin Li, Lifang He and Philip Yu
- Hierarchical Chinese Legal event extraction via Pedal Attention Mechanism
  Shirong Shen, Guilin Qi, Zhen Li, Sheng Bi and Lusheng Wang
- Hierarchical Text Segmentation for Medieval Manuscripts
  Amir Hazem, Beatrice Daille, Dominique Stutzmann and Christopher Kermorvan
- Hierarchical Trivia Fact Extraction from Wikipedia Articles
  Jingun Kwon, Hidetaka Kamigaito, Young-In Song and Manabu Okumura
- HiTrans: A Transformer-Based Context- and Speaker-Sensitive Model for Emotion Detection in Conversations
  Jingye Li, Donghong Ji, Fei Li, Meishan Zhang and Yijiang Liu
- HOLMS: Alternative Summary Evaluation with Large Language Models
  Yassine Mrabet and Dina Demner-Fushman
- Homonym normalisation by word sense clustering: a case in Japanese
  Kevin Heffernan and Yo Sato
- How coherent are neural models of coherence?
  Leila Pishdad, Federico Fancellu, Ran Zhang and Afsaneh Fazly
- How Domain Terminology Affects Meeting Summarization Performance
  Jia Jin Koay, Alexander Roustai, Xiaojin Dai, Dillon Burns, Alec Kerrigan and Fei Liu
- How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention
  Yue Guan, Jingwen Leng, Chao Li, Quan Chen and Minyi Guo
- How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text
  Chihiro Shibata, Kei Uchiumi and Daichi Mochihashi
- How Positive Are You: Text Style Transfer using Adaptive Style Embedding
  Heejin Kim and Kyung-Ah Sohn
- How Relevant Are Selectional Preferences for Transformer-based Language Models?
  Eleni Metheniti, Tim Van de Cruys and Nabil Hathout
- Human or Neural Translation?
  Shivendra Bhardwaj, David Alfonso Hermelo, Phillippe Langlais, Gabriel Bernier-Colborne, Cyril Goutte and Michel Simard
- Humans Meet Models on Object Naming: A New Dataset and Analysis
  Carina Silberer, Sina Zarrieß, Matthijs Westera and Gemma Boleda
- Hy-NLI: a Hybrid system for Natural Language Inference
  Aikaterini-Lida Kalouli, Richard Crouch and Valeria de Paiva
- I Know What You Asked: Graph Path Learning using AMR for Commonsense Reasoning
  Jungwoo Lim, Dongsuk Oh, Yoonna Jang, Kisu Yang and Heuiseok Lim
- Identifying Annotator Bias: A new IRT-based method for bias identification
  Jacopo Amidei, Paul Piwek and Alistair Willis
- Identifying Depressive Symptoms from Tweets: Figurative Language Enabled Multitask Learning Framework
  Shweta Yadav, Jainish Chauhan, Joy Prakash Sain, Krishnaprasad Thirunarayan, Amit Sheth and Jeremiah Schumm
- Identifying Motion Entities in Natural Language and A Case Study for Named Entity Recognition
  Ngoc Phuoc An Vo, Irene Manotas, Vadim Sheinin and Octavian Popescu
- Image Caption Generation for News Articles
  Zhishen Yang and Naoaki Okazaki
- Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games
  Alessandro Suglia, Antonio Vergari, Ioannis Konstas, Yonatan Bisk, Emanuele Bastianelli, Andrea Vanzo and Oliver Lemon
- Improving Abstractive Dialogue Summarization with Graph Structures and Topic Words
  Lulu Zhao, Weiran Xu and Jun Guo
- Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning
  Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Kyunghyun Cho, Eneko Agirre and Gorka Azkune
- Improving Document-Level Sentiment Analysis with User and Product Context
  Chenyang Lyu, Jennifer Foster and Yvette Graham
- Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation
  Zhaohong Wan and Xiaojun Wan
- Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution
  David Q. Sun, Hadas Kotek, Christopher Klein, Mayank Gupta, William Li and Jason D. Williams
- Improving Long-Tail Relation Extraction with Collaborating Relation-Augmented Attention
  Yang Li, Tao Shen, Guodong Long, Jing Jiang, Tianyi Zhou and Chengqi Zhang
- Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation
  Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama and Eiichiro Sumita
- Improving Relation Extraction with Relational Paraphrase Sentences
  Junjie Yu, Tong Zhu, Wenliang Chen, Wei Zhang and Min Zhang
- Improving Sentiment Analysis over non-English Tweets using Multilingual Transformers and Automatic Translation for Data-Augmentation
  Valentin Barriere and Alexandra Balahur
- Improving Spoken Language Understanding by Wisdom of Crowds
  Koichiro Yoshino, Kana Ikeuchi, Katsuhito Sudoh and Satoshi Nakamura
- Improving Variational Autoencoder for Text Modelling with Timestep-Wise Regularisation
  Ruizhe Li, Xiao Li, Guanyi Chen and Chenghua Lin
- Improving Word Embeddings through Iterative Refinement of Word- and Character-level Models
  Phong Ha, Shanshan Zhang, Nemanja Djuric and Slobodan Vucetic
- Inconsistencies in Crowdsourced Slot-Filling Annotations: A Typology and Identification Methods
  Stefan Larson, Adrian Cheung, Anish Mahendran, Kevin Leach and Jonathan K. Kummerfeld
- Incorporating Inner-word and Out-word Features for Mongolian Morphological Segmentation
  Na Liu, Xiangdong Su, Haoran Zhang, Guanglai Gao and Feilong Bao
- Incorporating Noisy Length Constraints into Transformer with Length-aware Positional Encodings
  Yui Oka, Katsuki Chousa, Katsuhito Sudoh and Satoshi Nakamura
- Incorporating Syntax and Frame Semantics in Neural Network for Machine Reading Comprehension
  Shaoru Guo, Yong Guan, Ru Li, Xiaoli Li and Hongye Tan
- Increasing Learning Efficiency of Self-Attention Networks through Direct Position Interactions, Learnable Temperature, and Convoluted Attention
  Philipp Dufter, Martin Schmitt and Hinrich Schütze
- IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP
  Fajri Koto, Afshin Rahimi, Jey Han Lau and Timothy Baldwin
- Inducing Domain-Specific Sentiment Lexicons From Labeled Documents
  SM Mazharul Islam, Xin Dong and Gerard de Melo
- Inflating Topic Relevance with Ideology: A Case Study of Political Ideology Bias in Social Topic Detection Models
  Meiqi Guo, Rebecca Hwa, Yu-Ru Lin and Wen-Ting Chung
- Informative Manual Evaluation of Machine Translation Output
  Maja Popović
- Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism
  Pan Xie, Zhi Cui, Xiuying Chen, XiaoHui Hu, Jianwei Cui and Bin Wang
- Integrating Domain Terminology into Neural Machine Translation
  Elise Michon, Josep Crego and Jean Senellart
- Integrating External Event Knowledge for Script Learning
  Shangwen Lv, Fuqing Zhu and Songlin Hu
- Integrating User History into Heterogeneous Graph for Dialogue Act Recognition
  Dong Wang, Ziran Li, Ying Shen and Haitao Zheng
- Intent Mining from past conversations for Conversational Agent
  Ajay Chatterjee and Shubhashis Sengupta
- Interactive Key-Value Memory-augmented Attention for Image Paragraph Captioning
  Chunpu Xu, Yu Li, Chengming Li, Xiang Ao, Min Yang and Jinwen Tian
- Interactively-Propagative Attention Learning for Implicit Discourse Relation Recognition
  Huibin Ruan, Yu Hong, Yang Xu, Zhen Huang, Guodong Zhou and Min Zhang
- Intermediate Self-supervised Learning for Machine Translation Quality Estimation
  Raphael Rubino and Eiichiro Sumita
- Interpretable Multi-headed Attention for Abstractive Summarization at Controllable Lengths
  Ritesh Sarkhel, Moniba Keymanesh, Arnab Nandi and Srinivasan Parthasarathy
- IntKB: A Verifiable Interactive Framework for Knowledge Base Completion
  Bernhard Kratzwald, Guo Kunpeng, Stefan Feuerriegel and Dennis Diefenbach
- Intra-/Inter-Interaction Network with Latent Interaction Modeling for Multi-turn Response Selection
  Yang Deng, Wenxuan Zhang and Wai Lam
- Intra-Correlation Encoding for Chinese Sentence Intention Matching
  Xu Zhang, Yifeng Li, Wenpeng Lu, Ping Jian and Guoqiang Zhang
- Intrinsic Quality Assessment of Arguments
  Henning Wachsmuth and Till Werner
- Invertible Tree Embeddings using a Cryptographic Role Embedding Scheme
  Coleman Haley and Paul Smolensky
- Investigating Catastrophic Forgetting During Continual Training for Neural Machine Translation
  Shuhao Gu and Yang Feng
- Is Killed More Significant than Fled? A Contextual Model for Salient Event Detection
  Disha Jindal, Daniel Deutsch and Dan Roth
- Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
  Bryan Eikema and Wilker Aziz
- Joint Aspect Extraction and Sentiment Analysis with Directional Graph Convolutional Networks
  Guimin Chen, Yuanhe Tian and Yan Song
- Joint Chinese Word Segmentation and Part-of-speech Tagging via Multi-channel Attention of Character N-grams
  Yuanhe Tian, Yan Song and Fei Xia
- Joint Entity and Relation Extraction for Legal Documents with Legal Feature Enhancement
  Yanguang Chen, Yuanyuan Sun, Zhihao Yang and Hongfei LIN
- Joint Event Extraction with Hierarchical Policy Network
  Peixin Huang, Xiang Zhao, Ryuichi Takanobu, Zhen Tan and Weidong Xiao
- Joint Persian Word Segmentation Correction and Zero-Width Non-Joiner Recognition Using BERT
  Ehsan Doostmohammadi, Minoo Nassajian and Adel Rahimi
- Joint Transformer/RNN Architecture for Gesture Typing in Indic Languages
  Emil Biju, Anirudh Sriram, Mitesh M. Khapra and Pratyush Kumar
- Jointly Learning Aspect-Focused and Inter-Aspect Relations with Graph Convolutional Networks for Aspect Sentiment Analysis
  Bin Liang, Rongdi Yin, Lin Gui, Jiachen Du and Ruifeng Xu
- Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication
  Ruize Wang, Zhongyu Wei, Piji Li, Ying Cheng, Haijun Shan, Ji Zhang, Qi Zhang and Xuanjing Huang
- KeyGames: A Game Theoretic Approach to Automatic Keyphrase Extraction
  Arnav Saxena, Mudit Mangal and Goonjan Jain
- KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text Classification for Kinyarwanda and Kirundi
  Rubungo Andre Niyongabo, Qu Hong, Julia Kreutzer and Li Huang
- KnowDis: Knowledge Enhanced Data Augmentation for Event Causality Detection via Distant Supervision
  Xinyu Zuo, Yubo Chen, Kang Liu and Jun Zhao
- Knowledge Aware Emotion Recognition in Textual Conversations via Multi-Task Incremental Transformer
  Duzhen Zhang, Xiuyi Chen, Shuang Xu and Bo Xu
- Knowledge Base Embedding By Cooperative Knowledge Distillation
  Raphaël Sourty, Jose G. Moreno, Lynda Tamine-Lechani and François-Paul Servant
- Knowledge Graph Embedding with Atrous Convolution and Residual Learning
  Feiliang Ren
- Knowledge Graph Embeddings in Geometric Algebras
  Chengjin Xu, Mojtaba Nayyeri, Yung-Yu Chen and Jens Lehmann
- Knowledge Graph Enhanced Neural Machine Translation via Multi-task Learning on Sub-entity Granularity
  Yang Zhao, Lu Xiang, Junnan Zhu, Jiajun Zhang, Yu Zhou and Chengqing Zong
- Knowledge-Enhanced Natural Language Inference Based on Knowledge Graphs
  Zikang Wang, Linjing Li and Daniel Zeng
- Knowledge-enriched, Type-constrained and Grammar-guided Question Generation over Knowledge Bases
  Sheng Bi, Xiya Cheng, Yuan-Fang Li, Yongzhen Wang and Guilin Qi
- Label Correction Model for Aspect-based Sentiment Analysis
  Qianlong Wang and Jiangtao Ren
- LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression
  Yujing Wang
- Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
  Isaac Caswell, Theresa Breiner, Daan van Esch and Ankur Bapna
- Language Model Transformers as Evaluators for Open-domain Dialogues
  Rostislav Nedelchev, Ricardo Usbeck and Jens Lehmann
- Language-Driven Region Pointer Advancement for Controllable Image Captioning
  Annika Lindh, Robert Ross and John Kelleher
- LAVA: Latent Action Spaces via Variational Auto-encoding for Dialogue Policy Optimization
  Nurul Lubis, Christian Geishauser, Michael Heck, Hsien-chin Lin, Marco Moresi, Carel van Niekerk and Milica Gasic
- Layer-wise Multi-view Learning for Neural Machine Translation
  Qiang Wang, Yue Zhang, Tong Xiao and Jingbo Zhu
- Learn to Combine Linguistic and Symbolic Information for Table-based Fact Verification
  Qi Shi, Yu Zhang, Qingyu Yin and Ting Liu
- Learn with Noisy Data via Unsupervised Loss Correction for Weakly Supervised Reading Comprehension
  Xuemiao Zhang, Kun Zhou, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang and Junfei Liu
- Learning distributed sentence vectors with bi-directional 3D convolutions
  Bin Liu, Liang Wang and Guosheng Yin
- Learning Efficient Task-Specific Meta-Embeddings with Word Prisms
  Jingyi He, KC Tsiolis, Kian Kenyon-Dean and Jackie Chi Kit Cheung
- Learning from Non-Binary Constituency Trees via Tensor Decomposition
  Daniele Castellana and Davide Bacciu
- Learning Health-Bots from Training Data that was Automatically Created using Paraphrase Detection and Expert Knowledge
  Anna Liednikova, Philippe Jolivet, Alexandre Durand-Salmon and Claire Gardent
- Learning Semantic Correspondences from Noisy Data-text Pairs by Local-to-Global Alignments
  Feng Nie, Jinpeng Wang and Chin-Yew Lin
- Learning to Decouple Relations: Few-Shot Relation Classification with Entity-Guided Attention and Confusion-Aware Training
  Yingyao Wang, Junwei Bao, Guangyi Liu, Youzheng Wu, Xiaodong He, Bowen Zhou and Tiejun Zhao
- Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks
  Trapit Bansal, Rishikesh Jha and Andrew McCallum
- Learning to Prune Dependency Trees with Rethinking for Neural Relation Extraction
  Bowen Yu, Xue Mengge, Zhenyu Zhang, Tingwen Liu, Wang Yubin and Bin Wang
- Learning with Contrastive Examples for Data-to-Text Generation
  Yui Uehara, Tatsuya Ishigaki, Kasumi Aoki, Hiroshi Noji, Keiichi Goshima, Ichiro Kobayashi, Hiroya Takamura and Yusuke Miyao
- Leveraging Discourse Rewards for Document-Level Neural Machine Translation
  Inigo Jauregi Unanue, Nazanin Esmaili, Gholamreza Haffari and Massimo Piccardi
- Leveraging HTML in Free Text Web Named Entity Recognition
  Colin Ashby and David Weir
- Leveraging WordNet Paths for Neural Hypernym Prediction
  Yejin Cho, Juan Diego Rodriguez, Yifan Gao and Katrin Erk
- Lexical Relation Mining in Neural Word Embeddings
  Aishwarya Jadhav, Yifat Amir and Zachary Pardos
- Lexical Semantic Analysis of Meaning Representation
  Daniel Hershcovich, Nathan Schneider, Dotan Dvir, Jakob Prange, Miryam de Lhoneux and Omri Abend
- Lin: Unsupervised Extraction of Tasks from Textual Communication
  Parth Diwanji, Hui Guo, Munindar Singh and Anup Kalia
- Linguistic Profiling of a Neural Language Model
  Alessio Miaschi, Dominique Brunato, Felice Dell'Orletta and Giulia Venturi
- Linguistic Regularities in Sentence Embeddings
  Xunjie Zhu and Gerard de Melo
- Living Machines: A study of atypical animacy
  Mariona Coll Ardanuy, Federico Nanni, Kaspar Beelen, Kasra Hosseini, Ruth Ahnert, Jon Lawrence, Katherine McDonough, Giorgia Tolfo, Daniel CS Wilson and Barbara McGillivray
- Localness Matters: The Evolved Cross-Attention for Non-Autoregressive Translation
  Liang Ding, Di Wu, Longyue Wang, Dacheng Tao and Zhaopeng Tu
- Logic-guided Semantic Representation Learning for Zero-Shot Relation Classification
  Juan Li, Ruoxu Wang, Ningyu Zhang, Wen Zhang, Fan Yang and Huajun Chen
- Lost in Back-Translation: Emotion Preservation in Neural Machine Translation
  Enrica Troiano, Roman Klinger and Sebastian Padó
- Making the Best Use of Review Summary for Sentiment Analysis
  Sen Yang, Leyang Cui, Jun Xie and Yue Zhang
- Mama/Papa, Is this Text for Me?
  Rashedur Rahman, Gwénolé Lecorvé, Aline Étienne, Delphine Battistelli, Nicolas Béchet and Jonathan Chevelu
- Manifold Learning-based Word Representation Refinement Incorporating Global and Local Information
  Wenyu Zhao, Dong Zhou, LIN LI and Jinjun Chen
- Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis
  Olga Majewska, Ivan Vulić, Diana McCarthy and Anna Korhonen
- ManyEnt: A Dataset for Few-shot Entity Classification
  Markus Eberts, Kevin Pech and Adrian Ulges
- Mark-Evaluate: Assessing Language Generation using Population Estimation Methods
  Gonçalo Mordido and Christoph Meinel
- Measuring Correlation-to-Causation Exaggeration in Press Releases
  Bei Yu, Jun Wang, Lu Guo and Yingya Li
- Medical Knowledge-enriched Textual Entailment Framework
  Shweta Yadav, Vishal Pallagani and Amit Sheth
- MedWriter: Knowledge-Aware Medical Text Generation
  Youcheng Pan, Qingcai Chen, Weihua Peng, Xiaolong Wang, Baotian Hu, Xin Liu, Junying Chen and Wenxiu Zhou
- Meet Changes with Constancy: Learning Invariance in Multi-Source Translation
  Jianfeng Liu, Ling Luo, Xiang Ao, Yan Song, Haoran Xu and Jian Ye
- MEISD: A Multimodal Multi-Label Emotion, Intensity and Sentiment Dialogue Dataset for Emotion Recognition and Sentiment Analysis in Conversations
  Mauajama Firdaus, Hardik Chauhan, Asif Ekbal and Pushpak Bhattacharyya
- Meta-Information Guided Meta-Learning for Few-Shot Relation Classification
  Bowen Dong, Yuan Yao, Ruobing Xie, Tianyu Gao, Xu Han, Zhiyuan Liu, Fen Lin, Leyu Lin and Maosong Sun
- METNet: A Mutual Enhanced Transformation Network for Aspect-based Sentiment Analysis
  Bin Jiang, Jing Hou, Wanyue Zhou, Chao Yang, Shihan Wang and Liang Pang
- Metrics also Disagree in the Low Scoring Range: Revisiting Summarization Evaluation Metrics
  Manik Bhandari, Pranav Narayan Gour, Atabak Ashfaq and Pengfei Liu
- Mining Crowdsourcing Problems from Discussion Forums of Workers
  Zahra Nouri, Henning Wachsmuth and Gregor Engels
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks
  Lichao Sun, Congying Xia, Wenpeng Yin, Tingting Liang, Philip Yu and Lifang He
- Modality Enriched Neural Network for Metaphor Detection
  Mingyu WAN and Baixi Xing
- Modeling Event Salience in Narratives via Barthes’ Cardinal Functions
  Takaki Otake, Sho Yokoi, Naoya Inoue, Ryo Takahashi, Tatsuki Kuribayashi and Kentaro Inui
- Modeling Evolution of Message Interaction for Rumor Resolution
  Zhongyu Wei, Qi ZHANG and Xuanjing Huang
- Modeling language evolution and feature dynamics in a realistic geographic environment
  Rhea Kapur and Phillip Rogers
- Modeling Local Contexts for Joint Dialogue Act Recognition and Sentiment Classification with Bi-channel Dynamic Convolutions
  Jingye Li, Hao Fei and Donghong Ji
- Modelling Long-distance Node Relations for KBQA with Global Dynamic Graph
  Xu Wang, Shuai Zhao, Jiale Han, Bo Cheng, Hao Yang, Jianchang Ao and Zhenzi Li
- Molweni: A Challenge Multiparty Dialogues-based Machine Reading Comprehension Dataset with Discourse Structure
  Jiaqi Li, Ming Liu, Min-Yen Kan, Zihao Zheng, Zekun Wang, Wenqiang Lei, Ting Liu and Bing Qin
- Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations
  Sheng Liang, Philipp Dufter and Hinrich Schütze
- Morph Completion for Morphologically Rich Languages
  William Lane and Steven Bird
- Morphological disambiguation from stemming data
  Antoine Nzeyimana
- Morphologically Aware Word-Level Translation
  Paula Czarnowska, Sebastian Ruder, Ryan Cotterell and Ann Copestake
- Multi-choice Relational Reasoning for Machine Reading Comprehension
  Wuya Chen, Xiaojun Quan, Chunyu Kit, Zhengcheng Min and Jiahai Wang
- Multi-grained Chinese Word Segmentation with Weakly Labeled Data
  Chen Gong, Zhenghua Li, Bowei Zou and Min Zhang
- Multi-label Fine-grained Sexism Classification using Semi-supervised Multi-task Learning
  Harika Abburi, Pulkit Parikh, Niyati Chhaya and Vasudeva Varma
- Multi-level Alignment Pretraining for Multi-lingual Semantic Parsing
  Bo Shao, Yeyun Gong, Weizhen Qi, Nan Duan and Xiaola Lin
- Multi-Task Learning for Knowledge Graph Completion with Pre-trained Language Models
  Bosung Kim, Taesuk Hong, Youngjoong Ko and Jungyun Seo
- Multi-Word Lexical Simplification
  Piotr Przybyła and Matthew Shardlow
- Multilingual Epidemiological Text Classification: A Comparative Study
  Stephen Mutuvi, Emanuela Boros, Antoine Doucet, Adam Jatowt, Gaël Lejeune and Moses Odeo
- Multilingual Irony Detection with Dependency Syntax and Neural Models
  Alessandra Teresa Cignarella, Valerio Basile, Manuela Sanguinetti, Cristina Bosco, Farah Benamara and Paolo Rosso
- Multilingual Neural RST Discourse Parsing
  Zhengyuan Liu, Ke Shi and Nancy Chen
- Multimodal Review Generation with Privacy and Fairness Awareness
  Xuan-Son Vu, Thanh-Son Nguyen, Duc-Trong Le and Lili Jiang
- Multimodal Sentence Summarization via Multimodal Selective Encoding
  Haoran Li, Junnan Zhu, Jiajun Zhang, Xiaodong He and Chengqing Zong
- Multimodal Topic-Enriched Auxiliary Learning for Depression Detection
  Minghui An, Jingjing Wang, Shoushan Li and Guodong Zhou
- Multitask Easy-First Dependency Parsing: Exploiting Complementarities of Different Dependency Representations
  Yash Kankanampati, Joseph Le Roux, Nadi Tomeh, Dima Taji and Nizar Habash
- Multitask Learning-Based Neural Bridging Reference Resolution
  Juntao Yu and Massimo Poesio
- MZET: Memory Augmented Zero-Shot Fine-grained Named Entity Typing
  Tao Zhang, Congying Xia, Chun-Ta Lu and Philip Yu
- Named Entity Recognition for Chinese biomedical patents
  Yuting Hu and Suzan Verberne
- Native-like Expression Identification by Contrasting Native and Proficient Second Language Speakers
  Oleksandr Harust, Yugo Murawaki and Sadao Kurohashi
- Neural Approaches for Natural Language Interfaces to Databases: A Survey
  Radu Cristian Alexandru Iacob, Florin Brad, Elena-Simona APOSTOL, Ciprian-Octavian Truică, Ionel Alexandru Hosu and Traian Rebedea
- Neural Automated Essay Scoring Incorporating Handcrafted Features
  Masaki Uto, Yikuan Xie and Maomi Ueno
- Neural Language Modeling for Named Entity Recognition
  Zhihong Lei, Weiyue Wang, Christian Dugast and Hermann Ney
- Neural Machine Translation Models with Back-Translation for the Extremely Low-Resource Indigenous Language Bribri
  Isaac Feldman and Rolando Coto-Solano
- Neural Networks approaches focused on French Spoken Language Understanding: application to the MEDIA Evaluation Task
  Sahar Ghannay, Christophe Servan and Sophie Rosset
- Neural text normalization leveraging similarities of strings and sounds
  Riku Kawamura, Tatsuya Aoki, Hidetaka Kamigaito, Hiroya Takamura and Manabu Okumura
- Neural Transduction for Multilingual Lexical Translation
  Dylan Lewis, Winston Wu, Arya D. McCarthy and David Yarowsky
- Neural Unsupervised Domain Adaptation in NLP — A Survey
  Alan Ramponi and Barbara Plank
- New Benchmark Corpus and Models for Fine-grained Event Classification: To BERT or not to BERT?
  Jakub Piskorski, Jacek Haneczok and Guillaume Jacquet
- News Editorials: Towards Summarizing Long Argumentative Texts
  Shahbaz Syed, Roxanne El Baff, Johannes Kiesel, Khalid Al Khatib, Benno Stein and Martin Potthast
- Noise Isn't Always Negative: Countering Exposure Bias in Sequence-to-Sequence Inflection Models
  Garrett Nicolai and Miikka Silfverberg
- Normalizing Compositional Structures Across Graphbanks
Lucia Donatelli, Jonas Groschwitz, Matthias Lindemann, Alexander Koller and Pia Weißenhorn - NUT-RC: Noisy User-generated Text-oriented Reading Comprehension
Rongtao Huang, Bowei Zou, Yu Hong, Wei Zhang, AiTi Aw and Guodong Zhou - NYTWIT: A Dataset of Novel Words in the New York Times
Yuval Pinter, Cassandra L. Jacobs and Max Bittker - Offensive Language Detection on Video Live Streaming Chat
Zhiwei Gao, Shuntaro Yada, Shoko Wakamiya and Eiji Aramaki - On the Consistency for E-commerce Product Summarization
Peng Yuan, Haoran Li, Song Xu, Youzheng Wu, Xiaodong He and Bowen Zhou - On the Helpfulness of Document Context to Sentence Simplification
Renliang Sun, Zhe Lin and Xiaojun Wan - On the Practical Ability of Recurrent Neural Networks to Recognize Context-Free Languages
Satwik Bhattamishra, Kabir Ahuja and Navin Goyal - One Comment from One Perspective: An Effective Strategy for Enhancing Automatic Music Comment
Tengfei Huo, Zhiqiang Liu, Jinchao Zhang, Cheng Niu and Jie Zhou - Online Versus Offline NMT Quality: An In-depth Analysis on English-German and German-English
Maha Elbayad, Michael Ustaszewski, Emmanuelle Esperança-Rodier, Francis Brunet-Manquat, Jakob Verbeek and Laurent Besacier - Optimized Transformer for Low-resource Neural Machine Translation
Ali Araabi and Christof Monz - Out-of-Task Training for Dialog State Tracking Models
Michael Heck, Christian Geishauser, Hsien-chin Lin, Nurul Lubis, Marco Moresi, Carel van Niekerk and Milica Gasic - Parsers Know Best: German PP Attachment Revisited
Bich-Ngoc Do and Ines Rehbein - PEDNet: A Persona Enhanced Dual Alternating Learning Network for Conversational Response Generation
Bin Jiang, Wanyue Zhou, Jingxu Yang, Chao Yang, Shihan Wang and Liang Pang - Personalized Multimodal Feedback Generation in Education
Haochen Liu, Zitao Liu, Zhongqin Wu and Jiliang Tang - PG-GSQL: Pointer-Generator Network with Guide Decoding for Cross-Domain Context-Dependent Text-to-SQL Generation
Huajie Wang, Mei Li and Lei Chen - PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents
Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki and Kentaro Inui - Pick a Fight or Bite your Tongue: Investigation of Gender Differences in Figurative Language Usage
Ella Rabinovich, Hila Gonen and Suzanne Stevenson - Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis
Michael Lepori and R. Thomas McCoy - PoD: Positional Dependency-Based Word Embedding for Aspect Term Extraction
Yichun Yin, Chenguang Wang and Ming Zhang - Pointing to Select: A Fast Pointer-LSTM for Long Text Classification
Jinhua Du, Yan Huang and Karo Moilanen - Pointing to Subwords for Generating Function Names in Source Code
Shogo Fujita, Hidetaka Kamigaito, Hiroya Takamura and Manabu Okumura - Porous Lattice Transformer Encoder for Chinese NER
Xue Mengge, Bowen Yu, Tingwen Liu, Yue Zhang, Erli Meng and Bin Wang - Pre-trained Language Model Based Active Learning for Sentence Matching
Guirong Bai, Shizhu He, Kang Liu, Jun Zhao and Zaiqing Nie - Predicting Clickbait Strength in Online Social Media
Vijayasaradhi Indurthi, Bakhtiyar Syed, Manish Gupta and Vasudeva Varma - Predicting Personal Opinion on Future Events with Fingerprints
Fan Yang, Eduard Dragut and Arjun Mukherjee - Predicting Stance Change Using Modular Architectures
Aldo Porco and Dan Goldwasser - Priorless Recurrent Networks Learn Curiously
Jeff Mitchell and Jeffrey Bowers - Probabilistic Interpretation with Bag of Latent Features for Text Classification
Phong Le and Willem Zuidema - Probing classifiers may just learn from linear context features
Jenny Kunz and Marco Kuhlmann - Probing Multilingual BERT for Genetic and Typological Signals
Taraka Rama, Lisa Beinborn and Steffen Eger - Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case
Adam Dahlgren Lindström, Johanna Björklund, Suna Bensch and Frank Drewes - QANom: Question-Answer driven SRL for Nominalizations
Ayal Klein, Jonathan Mamou, Valentina Pyatkin, Daniela Stepanov, Hangfeng He, Dan Roth, Luke Zettlemoyer and Ido Dagan - QE-Anon: Translation Quality Estimation with Cross-lingual Transformers
Tharindu Ranasinghe, Constantin Orasan and Ruslan Mitkov - R-VGAE: Relational-variational Graph Autoencoder for Unsupervised Prerequisite Chain Learning
Irene Li, Alexander Fabbri, Swapnil Hingmire and Dragomir Radev - RANCC: Rationalizing Neural Networks via Concept Clustering
Housam Khalifa Bashier, Mi-Young Kim and Randy Goebel - RatE: Relation-Adaptive Translating Embedding for Knowledge Graph Completion
Hao Huang, Guodong Long, Tao Shen, Jing Jiang and Chengqi Zhang - Re-framing Incremental Deep Language Models for Dialogue Processing with Multi-task Learning
Morteza Rohanian and Julian Hough - Read and Reason with MuSeRC and RuCoS: Datasets for Machine Reading Comprehension for Russian
Alena Fenogenova, Vladislav Mikhailov and Denis Shevelev - Real-Valued Logics for Typological Universals: Framework and Application
Tillmann Dönicke, Xiang Yu and Jonas Kuhn - Reasoning Requirements for Indirect Speech Act Interpretation
Vasanth Sarathy, Alexander Tsuetaki, Antonio Roque and Matthias Scheutz - Reasoning Step-by-Step: Temporal Sentence Localization in Videos via Deep Rectification-Modulation Network
Daizong Liu, Xiaoye Qu, Jianfeng Dong and Pan Zhou - Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey
Samuel Louvan and Bernardo Magnini - Recognizing Paragraph-level Chinese Discourse Relation via Discourse Argument Graph
Zhenhua Sun, Peifeng Li and Qiaoming Zhu - Reference and Document Aware Semantic Evaluation Methods for Korean Language Summarization
Dongyub Lee, Myeong Cheol Shin, Taesun Whang, Seungwoo Cho, Byeongil Ko, Daniel Lee, EungGyun Kim and Jaechoon Jo - Referring to what you know and do not know: Making Referring Expression Generation Models Generalize To Unseen Entities
Rossana Cunha, Thiago Castro Ferreira, Adriana Pagano and Fabio Alves - Regrexit or not Regrexit: Aspect-based Sentiment Analysis in Polarized Contexts
Vorakit Vorakitphan, Marco Guerini, Elena Cabrio and Serena Villata - Regularized Attentive Capsule Network for Overlapped Relation Extraction
Tianyi Liu, Xiangyu Lin, Weijia Jia, Mingliang Zhou and Wei Zhao - Reinforced Multi-task Approach for Multi-hop Question Generation
Deepak Gupta, Hardik Chauhan, Ravi Tej Akella, Asif Ekbal and Pushpak Bhattacharyya - Resource Constrained Dialog Policy Via Differentiable Inductive Logic
Zhenpeng Zhou, Ahmad Beirami, Paul Crook, Pararth Shah, Rajen Subba and Alborz Geramifard - Rethinking Residual Connection with Layer Normalization
Fenglin Liu, Xuancheng Ren, Zhiyuan Zhang and Xu SUN - Rethinking the Value of Transformer Components
Wenxuan Wang and Zhaopeng Tu - Retrieving Inductive Bias of Attribute as Reference for Review Generation
Jihyeok Kim, Seungtaek Choi, Reinald Kim Amplayo and Seung-won Hwang - Retrieving Skills from Job Descriptions: A Language Model Based Extreme Multi-label Classification Framework
Akshay Bhola, Kishaloy Halder, Animesh Prasad and Min-Yen Kan - Rhetoric, Logic, and Dialectic: Advancing Theory-based Argument Quality Assessment in Natural Language Processing
Anne Lauscher, Lily Ng, Courtney Napoles and Joel Tetreault - RIVA: A Pre-trained Tweet Multimodal Model Based on Text-image Relation for Multimodal NER
Lin Sun, Jiquan Wang, Yindu Su, Fangsheng Weng, Yuxuan Sun, Zengwei Zheng and Yuanyi Chen - RoBERT – A Romanian BERT Model
Mihai Masala, Stefan Ruseti and Mihai Dascalu - Robust Machine Reading Comprehension by Learning Soft labels
Zhenyu Zhao, Shuangzhi Wu, Muyun Yang, Kehai Chen and Tiejun Zhao - Robust Unsupervised Neural Machine Translation with Adversarial Denoising Training
Haipeng Sun, Rui Wang, Kehai Chen, Xugang Lu, Masao Utiyama, Eiichiro Sumita and Tiejun Zhao - RuSemShift: a dataset of historical lexical semantic change in Russian
Julia Rodina and Andrey Kutuzov - SaSAKE: Syntax and Semantics Aware Keyphrase Extraction from Research Papers
T.Y.S.S Santosh, Debarshi Kumar Sanyal, Plaban Kumar Bhowmick and Partha Pratim Das - Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model
Sungrae Park, Geewook Kim, JUNYEOP LEE, Junbum Cha, Ji-Hoon Kim and Hwalsuk Lee - Schema Aware Semantic Reasoning for Interpreting Natural Language Queries in Enterprise Settings
Jaydeep Sen, Tanaya Babtiwale, Kanishk Saxena, Yash Butala, Sumit Bhatia and Karthik Sankaranarayanan - Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning
Seoyeon Park and Cornelia Caragea - Second-Order Unsupervised Neural Dependency Parsing
Songlin Yang, Yong Jiang, Wenjuan Han and Kewei Tu - Seeing Both the Forest and the Trees: Multi-head Attention for Joint Classification on Different Compositional Levels
Miruna Pislar and Marek Rei - Semantic Role Labeling with Heterogeneous Syntactic Knowledge
Qingrong Xia, Rui Wang, Zhenghua Li, Yue Zhang and Min Zhang - Semi-supervised Autoencoding Projective Dependency Parsing
Xiao Zhang and Dan Goldwasser - Semi-Supervised Dependency Parsing with Arc-Factored Variational Autoencoding
Ge Wang and Kewei Tu - Semi-supervised Domain Adaptation for Dependency Parsing via Improved Contextualized Word Representations
Ying Li, Zhenghua Li and Min Zhang - Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities
Hao Zhang, Jae Ro and Richard Sproat - Sentence Matching with Syntax- and Semantics-Aware BERT
Tao Liu, Xin Wang, Chengguo Lv, Ranran Zhen and Guohong Fu - Sentiment Analysis for Emotional Speech Synthesis in a News Dialogue System
Hiroaki Takatsu, Ryota Ando, Yoichi Matsuyama and Tetsunori Kobayashi - Sentiment Forecasting in Dialog
Zhongqing Wang, Xiujun Zhu, Yue Zhang, Shoushan Li and Guodong Zhou - SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis
Jie Zhou, Junfeng Tian, Rui Wang, Yuanbin Wu, Wenming Xiao and liang he - Similarity or deeper understanding? Analyzing the TED-Q dataset of evoked questions
Matthijs Westera, Jacopo Amidei and Laia Mayol - Situated and Interactive Multimodal Conversations
Seungwhan Moon, Satwik Kottur, Paul Crook, Ankita De, Shivani Poddar, Theodore Levin, David Whitney, Daniel Difranco, Ahmad Beirami, Eunjoon Cho, Rajen Subba and Alborz Geramifard - SLICE: Supersense-based Lightweight Interpretable Contextual Embeddings
Cindy ALOUI, Alexis Nasr, Lucie Barque and Carlos Ramisch - Solving Math Word Problems with Multi-Encoders and Multi-Decoders
Yibin Shen and Cheqing Jin - SOME: Reference-less Sub-Metrics Optimized for Manual Evaluations of Grammatical Error Correction
Ryoma Yoshimura, Masahiro Kaneko, Tomoyuki Kajiwara and Mamoru Komachi - Span-based Joint Entity and Relation Extraction with Attention-based Span-specific and Contextual Semantic Representations
bin ji, Shasha Li, Jie Yu, Jun Ma, Qingbo Wu and Yusong Tan - SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP
Katsuki Chousa, Masaaki Nagata and Masaaki Nishino - Speaker-change Aware CRF for Dialogue Act Classification
Guokan Shang, Antoine Tixier, Michalis Vazirgiannis and Jean-Pierre Lorré - Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity
Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen and Goran Glavaš - Specializing Word Vectors by Spectral Decomposition on Heterogeneously Twisted Graphs
Yuanhang Ren and Ye Du - Spotting Text-to-Text Patterns for Multiple-Choice Question Answering
Jheng-Hong Yang, Sheng-Chieh Lin, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang and Jimmy Lin - SQL Generation via Machine Reading Comprehension
ZEYU YAN, Jianqiang Ma, Yang Zhang and Jianping Shen - Statistical Parsing of Tree Wrapping Grammars
Tatiana Bladier, Jakub Waszczuk and Laura Kallmeyer - Story Generation with Rich Details
Fangzhou Zhai, Vera Demberg and Alexander Koller - Studying Taxonomy Enrichment on Diachronic WordNet Versions
Irina Nikishina, Varvara Logacheva, Alexander Panchenko and Natalia Loukachevitch - Style versus Content: A distinction without a (learnable) difference?
Somayeh Jafaritazehjani, Gwénolé Lecorvé, Damien Lolive and John Kelleher - Summarize before Aggregate: A Global-to-local Heterogeneous Graph Inference Network for Conversational Emotion Recognition
Dongming Sheng, Dong Wang, Ying Shen, Haitao Zheng and Haozhuang Liu - Summarizing Medical Conversations via Identifying Important Utterances
Yan Song, Yuanhe Tian, Nan Wang and Fei Xia - SumTitles: a Summarization Dataset with Low Extractiveness
Valentin Malykh, Konstantin Chernis, Ekaterina Artemova and Irina Piontkovskaya - Supervised Visual Attention for Multimodal Neural Machine Translation
Tetsuroh Nishihara, Akihiro Tamura, Takashi Ninomiya, Yutaro Omote and Hideki Nakayama - SWAFN: Sentimental Words Aware Fusion Network for Multimodal Sentiment Analysis
Minping Chen and Xia Li - Syllable-based Neural Thai Word Segmentation
Pattarawat Chormai, Ponrawee Prasertsom, Jin Cheevaprawatdomrong and Attapol Rutherford - Synonym Knowledge Enhanced Reader for Chinese Idiom Reading Comprehension
Siyu Long, Ran Wang, Kun Tao, Jiali Zeng and Xinyu Dai - Syntactic Graph Convolutional Network for Spoken Language Understanding
Keqing He, Shuyu Lei, Yushu Yang, Huixing Jiang and Zhongyuan Wang - Syntactically Aware Cross-Domain Aspect and Opinion Terms Extraction
Oren Pereg, Daniel Korat and Moshe Wasserblat - Syntax-Aware Graph Attention Network for Aspect-Level Sentiment Classification
Lianzhe Huang, Xin Sun, Sujian Li, Linhao Zhang and Houfeng Wang - Taking the Correction Difficulty into Account in Grammatical Error Correction Evaluation
Takumi Gotou, Ryo Nagata, Masato Mita and Kazuaki Hanawa - Target Word Masking for Location Metonymy Resolution
Haonan Li, Maria Vasardani, Martin Tomko and Timothy Baldwin - Task-Aware Representation of Sentences for Generic Text Classification
Kishaloy Halder, Alan Akbik, Josip Krapac and Roland Vollgraf - Temporal Relations Annotation and Extrapolation Based on Semi-intervals and Bounding Relations
Alejandro Pimentel, Gemma Bel Enguix, Gerardo Sierra Martínez and Azucena Montes - TeRo: A Time-aware Knowledge Graph Embedding via Temporal Rotation
Chengjin Xu, Mojtaba Nayyeri, Fouad Alkhoury, Hamed Shariat Yazdi and Jens Lehmann - Text Classification by Contrastive Learning and Cross-lingual Data Augmentation for Alzheimer’s Disease Detection
Zhiqiang Guo, Zhaoci Liu, Zhenhua Ling, Shijin Wang, Lingjing Jin and Yunxia Li - The ApposCorpus: a new multilingual, multi-domain dataset for factual appositive generation
Yova Kementchedjhieva, Di Lu and Joel Tetreault - The Devil is in the Details: Evaluating Limitations of Transformer-based Methods for Granular Tasks
Brihi Joshi, Leonardo Neves, Neil Shah and Francesco Barbieri - The Indigenous Languages Technology project: An empowerment-oriented approach to developing language software
Roland Kuhn, Fineen Davis, Alain Désilets, Eric Joanis, Anna Kazantseva, Rebecca Knowles, Patrick Littell, Delaney Lothian, Aidan Pine, Caroline Running Wolf, Eddie Santos, Darlene Stewart, Gilles Boulianne, Vishwa Gupta, Brian Maracle Owennatékha, Akwiratékha’ Martin, Christopher Cox, Marie-Odile Junker, Olivia Sammons, Delasie Torkornoo, Nathan Thanyehténhas Brinklow, Sara Child, Benoît Farley, David Huggins-Daines, Daisy Rosenblum and Heather Souter - The SADID Evaluation Datasets for Low-Resource Spoken Language Machine Translation of Arabic Dialects
Wael Abid - The Transference Architecture for Automatic Post-Editing
Santanu Pal, Hongfei Xu, Nico Herbig, Sudip Kumar Naskar, Antonio Krüger and Josef van Genabith - The Two Shades of Dubbing in Neural Machine Translation
Alina Karakanta, Supratik Bhattacharya, Shravan Nayak, Timo Baumann, Matteo Negri and Marco Turchi - TIMBERT: Toponym Identifier For The Medical Domain Based on BERT
MohammadReza Davari, Leila Kosseim and Tien Bui - Tiny Word Embeddings Using Globally Informed Reconstruction
Sora Ohashi, Mao Isogawa, Tomoyuki Kajiwara and Yuki Arase - To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding?
Quynh Do, Judith Gaspers, Tobias Roeding and Melanie Bradford - ToHRE: A Top-Down Classification Strategy with Hierarchical Bag Representation for Distantly Supervised Relation Extraction
Erxin Yu, Wenjuan Han, Yuan Tian and Yi Chang - Token Drop mechanism for Neural Machine Translation
Huaao Zhang, Shigui Qiu, Xiangyu Duan and Min Zhang - Topic-driven Ensemble for Online Advertising Generation
Egor Nevezhin, Nikolay Butakov, Maria Khodorchenko, Maxim Petrov and Denis Nasonov - Topic-relevant Response Generation using Optimal Transport for an Open-domain Dialog System
Shuying Zhang, Tianyu Zhao and Tatsuya Kawahara - Towards A Friendly Online Community: An Unsupervised Style Transfer Framework for Profanity Redaction
Minh Tran, Yipeng Zhang and Mohammad Soleymani - Towards Accurate and Consistent Evaluation: A Dataset for Distantly-Supervised Relation Extraction
Tong Zhu, Haitao Wang, Junjie Yu, Wenliang Chen, Wei Zhang and Min Zhang - Towards automatically generating Questions under Discussion to link information and discourse structure
Kordula De Kuthy, Madeeswaran Kannan, Haemanth Santhi Ponnusamy and Detmar Meurers - Towards Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
Weipeng Huang, Xingyi Cheng, Kunlong Chen, Taifeng Wang and Wei Chu - Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers
Robert Litschko, Ivan Vulić, Željko Agić and Goran Glavaš - Towards Knowledge-Augmented Visual Question Answering
Maryam Ziaeefard and Freddy Lecue - Towards Privacy by Design in Learner Corpora Research: A Case of On-the-fly Pseudonymization of Swedish Learner Essays
Elena Volodina, Yousuf Ali Mohammed, Sandra Derbring, Arild Matsson and Beata Megyesi - Towards the First Machine Translation System for Sumerian Transliterations
Ravneet Punia and Niko Schenk - Towards Topic-Guided Conversational Recommender System
Kun Zhou, Yuanhang Zhou, Wayne Xin Zhao, Xiaoke Wang and Ji-Rong Wen - TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking
Yucheng Wang, Bowen Yu, Yueyang Zhang, Tingwen Liu, Hongsong Zhu and Limin Sun - Train Once, and Decode As You Like
Chao Tian, Yifei Wang, Hao Cheng, Yijiang Lian and Zhihua Zhang - Transformation of Dense and Sparse Text Representations
Wenpeng Hu, Mengyu Wang, Bing Liu, Feng Ji, Jinwen Ma and Dongyan Zhao - Translation vs. Dialogue: A Comparative Analysis of Sequence-to-Sequence Modeling
Wenpeng Hu, Ran Le, Bing Liu, Jinwen Ma, Dongyan Zhao and Rui Yan - Tree Representations in Transition System for RST Parsing
Jinfen Li and Lu Xiao - Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet
Bairu Hou, Fanchao Qi, Yuan Zang, Xurui Zhang, Zhiyuan Liu and Maosong Sun - TWEETSUM: Event oriented Social Summarization Dataset
Ruifang He, Liangliang Zhao and Huanyu Liu - Two-level classification for dialogue act recognition in task-oriented dialogues
Philippe Blache, Massina Abderrahmane, Stéphane Rauzy, Magalie Ochs and Houda Oufaida - Understanding Pre-trained BERT for Aspect-based Sentiment Analysis
Hu Xu, Lei Shu, Philip Yu and Bing Liu - Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English
Gongbo Tang, Rico Sennrich and Joakim Nivre - Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation
Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz and Felipe Sánchez-Martínez - Understanding Translationese in Multi-view Embedding Spaces
Koel Dutta Chowdhury, Cristina España-Bonet and Josef van Genabith - Understanding Unnatural Questions Improves Reasoning over Text
Xiaoyu Guo, Yuan-Fang Li and Gholamreza Haffari - Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis
Michael Lepori - Unifying Input and Output Smoothing in Neural Machine Translation
Yingbo Gao, Baohao Liao and Hermann Ney - Unleashing the Power of Neural Discourse Parsers - A Context and Structure Aware Approach Using Large Scale Pretraining
Grigorii Guz, Patrick Huber and Giuseppe Carenini - Unsupervised Deep Language and Dialect Identification for Short Texts
Koustava Goswami, Rajdeep Sarkar, Bharathi Raja Chakravarthi, Theodorus Fransen and John P. McCrae - Unsupervised Fact Checking by Counter-Weighted Positive and Negative Evidential Paths in A Knowledge Graph
Jiseong Kim and KEY-SUN CHOI - Unsupervised Fine-tuning for Text Clustering
Shaohan Huang, Furu Wei, Lei Cui, Xingxing Zhang and Ming Zhou - User Memory Reasoning for Conversational Recommendation
Hu Xu, Seungwhan Moon, Honglei Liu, Bing Liu, Pararth Shah, Bing Liu and Philip Yu - Using a Penalty-based Loss Re-estimation Method to Improve Implicit Discourse Relation Classification
xiao li, Yu Hong, Huibin Ruan and Zhen Huang - Using Bilingual Patents for Translation Training
John Lee, Benjamin Tsou and Tianyuan Cai - Using Eye-tracking Data to Predict the Readability of Brazilian Portuguese Sentences in Single-task, Multi-task and Sequential Transfer Learning Approaches
Sidney Evaldo Leal, Erica dos Santos Rodrigues and Sandra Aluísio - Utilizing Subword Entities in Character-Level Sequence-to-Sequence Lemmatization Models
Nasser Zalmout and Nizar Habash - Variation in Coreference Strategies across Genres and Production Media
Berfin Aktaş and Manfred Stede - Variational Autoencoder with Embedded Student-t Mixture Model for Authorship Attribution
Benedikt Boenninghoff, Steffen Zeiler, Robert Nickel and Dorothea Kolossa - Vec2Sent: Probing Sentence Embeddings with Natural Language Generation
Steffen Eger and Martin Kerscher - Verbal Multiword Expression Identification: Do We Need a Sledgehammer to Crack a Nut?
Caroline Pasquer, Agata Savary, Carlos Ramisch and Jean-Yves Antoine - VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks
Caren Han, SIQU LONG, Siwen Luo, Kunze Wang and Josiah Poon - Visual-Textual Alignment for Graph Inference in Visual Dialog
Tianling Jiang, Yi Ji, Chunping Liu and Hailin Shao - Weighed Domain-Invariant Representation Learning for Cross-domain Sentiment Analysis
Minlong Peng and Qi Zhang - What Can We Learn from Noun Substitutions in Revision Histories?
Talita Anthonio and Michael Roth - What Does This Acronym Mean? Introducing a New Dataset for Acronym Identification and Disambiguation
Amir Pouran Ben Veyseh, Franck Dernoncourt, Quan Hung Tran and Thien Huu Nguyen - What Meaning-Form Correlation Has to Compose With
Timothee Mickus, Timothée Bernard and Denis Paperno - When and Who? Conversation Transition Based on Bot-Agent Symbiosis Learning Network
Yipeng Yu, Ran Guan, Jie Ma, Zhuoxuan Jiang and Jingchang Huang - When Beards Start Shaving Men: A Subject-object Resolution Test Suite for Morpho-syntactic and Semantic Model Introspection
Patricia Fischer, Daniël de Kok and Erhard Hinrichs - WikiUMLS: Aligning UMLS to Wikipedia via Cross-lingual Neural Ranking
Afshin Rahimi, Timothy Baldwin and Karin Verspoor - Wiktionary Normalization of Translations and Morphological Information
Winston Wu and David Yarowsky - Word Embedding Binarization with Semantic Information Preservation
Samarth Navali, Praneet Sherki, Ramesh Inturi and Vanraj Vala - Words are the Window to the Soul: Language-based User Representations for Fake News Detection
Marco Del Tredici and Raquel Fernández - Would you describe a leopard as yellow? Evaluating crowd-annotations with justified and informative disagreement
Pia Sommerauer, Antske Fokkens and Piek Vossen - WSL-DS: Weakly Supervised Learning with Distant Supervision for Query Focused Multi-Document Abstractive Summarization
Md Tahmid Rahman Laskar, Enamul Hoque and Jimmy Xiangji Huang - XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection
Emily Öhman, Kaisla Kajava, Marc Pàmies and Jörg Tiedemann - XHate-999: Analyzing and Detecting Abusive Language Across Domains and Languages
Goran Glavaš, Mladen Karan and Ivan Vulić - “Suggest me a movie for tonight”: Leveraging Knowledge Graphs for Conversational Recommendation
Rajdeep Sarkar, Koustava Goswami, Mihael Arcan and John P. McCrae - “What is on your mind?” Automated Scoring of Mindreading in Childhood and Early Adolescence
Venelin Kovatchev, Philip Smith, Mark Lee, Imogen Grumley Traynor, Irene Luque Aguilera and Rory Devine
"Judge me by my size(noun), do you?" YodaLib: A Demographic-Aware Humor Generation Framework
Aparna Garimella, Carmen Banea, Nabil Hossain and Rada Mihalcea
The subjective nature of humor makes computerized humor generation a challenging task. We propose an automatic humor generation framework for filling the blanks in Mad Libs® stories, while accounting for the demographic backgrounds of the desired audience. We collect a dataset consisting of such stories, which are filled in and judged by carefully selected workers on Amazon Mechanical Turk. We build upon the BERT platform to predict location-biased word fillings in incomplete sentences, and we fine-tune BERT to classify location-specific humor in a sentence. We leverage these components to produce YodaLib, a fully-automated Mad Libs style humor generation framework, which selects and ranks appropriate candidate words and sentences in order to generate a coherent and funny story tailored to certain demographics. Our experimental results indicate that YodaLib outperforms a previous semi-automated approach proposed for this task, while also surpassing human annotators in both qualitative and quantitative analyses.
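For readers who want a concrete picture of the masked word-filling building block this framework fine-tunes, the sketch below runs an off-the-shelf BERT fill-mask pipeline; the model choice and example sentence are ours, and this is not the authors' demographic-aware system or data.

```python
# Minimal sketch: generic BERT masked-word filling, the building block behind
# fill-in-the-blank story generation. Illustrative only; the paper fine-tunes
# BERT on demographically annotated Mad Libs data.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

story = "The tourists were amazed by the [MASK] statue in the park."
for candidate in fill_mask(story, top_k=5):
    print(f"{candidate['token_str']:>12}  {candidate['score']:.3f}")
```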
100,000 Podcasts: A Large-Scale Spoken Document Corpus
Ann Clifton, Sravana Reddy, Yongze Yu, Aasish Pappu, Rezvaneh Rezapour, Hamed Bonab, Jussi Karlgren, Ben Carterette and Rosie Jones
As an audio format, podcasts are more varied in style and production type than broadcast news, contain more genres than typically studied in video data, and are more varied in style and format than previous corpora of conversations. When transcribed with Automatic Speech Recognition (ASR) they represent a noisy but fascinating collection of text which can be studied through the lens of NLP, IR, and linguistics. Paired with the audio files, they are also a resource for speech processing and the study of paralinguistic, sociolinguistic, and acoustic aspects of the domain. We introduce a new corpus of 100,000 podcasts, and demonstrate the complexity of the domain with a case study of two tasks: (1) passage search and (2) summarization. This is orders of magnitude larger than previous speech corpora used for search and summarization. Our results show that the size and variability of this corpus opens up new avenues for research.
A BERT-based Dual Embedding Model for Chinese Idiom Prediction
Minghuan Tan and Jing Jiang
Chinese idioms are fixed phrases that have special meanings usually derived from an ancient story. The meanings of these idioms are oftentimes not directly related to their component characters. In this paper, we propose a BERT-based dual embedding model for the Chinese idiom prediction task, where given a context with a missing Chinese idiom and a set of candidate idioms, the model needs to find the correct idiom to fill in the blank. Our method is based on the observation that some part of an idiom’s meaning comes from a long-range context that contains topical information, and part of its meaning comes from a local context that encodes more of its syntactic usage. We therefore propose to use BERT to process the contextual words and to match the embedding of each candidate idiom with both the hidden representation corresponding to the blank in the context and the hidden representations of all the tokens in the context through context pooling. We further propose to use two separate idiom embeddings for the two kinds of matching. Experiments on a recently released Chinese idiom cloze test dataset show that our proposed method performs better than the existing state of the art. Ablation experiments also show that both context pooling and dual embedding contribute to the performance improvement.
A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English
Marius Mosbach, Stefania Degaetano-Ortlieb, Marie-Pauline Krielke, Badr Abdullah and Dietrich Klakow
Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on. We evaluate three models (BERT, RoBERTa, and ALBERT), testing their grammatical and semantic knowledge by sentence-level probing, diagnostic cases, and masked prediction tasks. We focus on relative clauses (in American English) as a complex phenomenon needing contextual information and antecedent identification to be resolved. Based on a naturalistic dataset, probing shows that all three models indeed capture linguistic knowledge about grammaticality, achieving high performance. Evaluation on diagnostic cases and masked prediction tasks considering fine-grained linguistic knowledge, however, shows pronounced model-specific weaknesses especially on semantic knowledge, strongly impacting models’ performance. Our results highlight the importance of model comparison in evaluation tasks and of building up claims about model performance and captured linguistic knowledge beyond purely probing-based evaluations.
A Co-Attentive Cross-Lingual Neural Model for Dialogue Breakdown Detection
Qian Lin, Souvik Kundu and Hwee Tou Ng
Ensuring smooth communication is essential in a chat-oriented dialogue system, so that a user can obtain meaningful responses through interactions with the system. Most prior work on dialogue research does not focus on preventing dialogue breakdown. One of the major challenges is that a dialogue system may generate an undesired utterance leading to a dialogue breakdown, which degrades the overall interaction quality. Hence, it is crucial for a machine to detect dialogue breakdowns in an ongoing conversation. In this paper, we propose a novel dialogue breakdown detection model that jointly incorporates a pretrained cross-lingual language model and a co-attention network. Our proposed model leverages effective word embeddings trained on one hundred different languages to generate contextualized representations. Co-attention aims to capture the interaction between the latest utterance and the conversation history, and thereby determines whether the latest utterance causes a dialogue breakdown. Experimental results show that our proposed model outperforms all previous approaches on all evaluation metrics in both the Japanese and English tracks in Dialogue Breakdown Detection Challenge 4 (DBDC4 at IWSDS2019).
A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI
Angus Addlesee, Yanchao Yu and Arash Eshghi
Automatic Speech Recognition (ASR) systems are increasingly powerful and more accurate, but also more numerous with several options existing currently as a service (e.g. Google, IBM, and Microsoft). Currently the most stringent standards for such systems are set within the context of their use in, and for, Conversational AI technology. These systems are expected to operate incrementally in real-time, be responsive, stable, and robust to the pervasive yet peculiar characteristics of conversational speech such as disfluencies and overlaps. In this paper we evaluate the most popular of such systems with metrics and experiments designed with these standards in mind. We also evaluate the speaker diarization (SD) capabilities of the same systems which will be particularly important for dialogue systems designed to handle multi-party interaction. We found that Microsoft has the leading incremental ASR system which preserves disfluent materials and IBM has the leading incremental SD system in addition to the ASR that is most robust to speech overlaps. Google strikes a balance between the two but none of these systems are yet suitable to reliably handle natural spontaneous conversations in real-time.
A Contextual Alignment Enhanced Cross Graph Attention Network for Cross-lingual Entity Alignment
Zhiwen Xie, Runjie Zhu, Kunsong Zhao, Jin Liu, Guangyou Zhou and Jimmy Xiangji Huang
Cross-lingual entity alignment, which aims to match equivalent entities in KGs with different languages, has attracted considerable focus in recent years. Recently, many graph neural network (GNN) based methods are proposed for entity alignment and obtain promising results. However, existing GNN-based methods consider the two KGs independently and learn embeddings for different KGs separately, which ignore the useful pre-aligned links between two KGs. In this paper, we propose a novel Contextual Alignment Enhanced Cross Graph Attention Network (CAECGAT) for the task of cross-lingual entity alignment, which is able to jointly learn the embeddings in different KGs by propagating cross-KG information through pre-aligned seed alignments. We conduct extensive experiments on three benchmark cross-lingual entity alignment datasets. The experimental results demonstrate that our proposed method obtains remarkable performance gains compared to state-of-the-art methods.
A Corpus for Argumentative Writing Support in German
Thiemo Wambsganss, Christina Niklaus, Matthias Söllner, Siegfried Handschuh and Jan Marco Leimeister
In this paper, we present a novel annotation approach to capture claims and premises of arguments and their relations in student-written persuasive peer reviews on business models, written in German. We propose an annotation scheme based on annotation guidelines that allows claims and premises as well as support and attack relations to be modelled, capturing the structure of argumentative discourse in student-written peer reviews. We conduct an annotation study with three annotators on 50 persuasive essays to evaluate our annotation scheme. The obtained inter-rater agreement of α = 0.57 for argument components and α = 0.49 for argumentative relations indicates that the proposed annotation scheme successfully guides annotators to moderate agreement. Finally, we present our freely available corpus of 1,000 persuasive student-written peer reviews on business models and our annotation guidelines to encourage future research on the design and development of argumentative writing support systems for students.
A Dataset and Evaluation Framework for Complex Geographical Description Parsing
Egoitz Laparra and Steven Bethard
Much previous work on geoparsing has focused on identifying and resolving individual toponyms in text like Adrano, S.Maria di Licodia or Catania. However, geographical locations occur not only as individual toponyms, but also as compositions of reference geolocations joined and modified by connectives, e.g., "...between the towns of Adrano and S.Maria di Licodia, 32 kilometres northwest of Catania". Ideally, a geoparser should be able to take such text, and the geographical shapes of the toponyms referenced within it, and parse these into a geographical shape, formed by a set of coordinates, that represents the location described. But creating a dataset for this complex geoparsing task is difficult and, if done manually, would require a huge amount of effort to annotate the geographical shapes of not only the geolocation described but also the reference toponyms. We present an approach that automates most of the process by combining Wikipedia and OpenStreetMap. As a result, we have gathered a collection of 329,264 uncurated complex geolocation descriptions, from which we have manually curated 1,000 examples intended to be used as a test set. To accompany the data, we define a new geoparsing evaluation framework along with a scoring methodology and a set of baselines.
A Deep Generative Approach to Native Language Identification
Ehsan Lotfi, Ilia Markov and Walter Daelemans
Native language identification (NLI) – identifying the native language (L1) of a person based on his/her writing in the second language (L2) – is useful for a variety of purposes, including marketing, security, and educational applications. From a traditional machine learning perspective, NLI is usually framed as a multi-class classification task, where numerous designed features are combined in order to achieve the state-of-the-art results. We introduce a deep generative language modelling (LM) approach to NLI, which consists in fine-tuning a GPT-2 model separately on texts written by the authors with the same L1, and assigning a label to an unseen text based on the minimum LM loss with respect to one of these fine-tuned GPT-2 models. Our method outperforms traditional machine learning approaches and currently achieves the best results on the benchmark NLI datasets.
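To make the classification-by-minimum-LM-loss idea concrete, here is a minimal sketch of the scoring step only, assuming one GPT-2 model has already been fine-tuned per L1 group; the model paths and helper names are illustrative, not taken from the paper.

```python
# Sketch: assign the L1 label whose fine-tuned GPT-2 gives the lowest
# language-modelling loss on the unseen text. Model paths are placeholders.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def lm_loss(model, text):
    """Average token-level cross-entropy of `text` under a causal LM."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return model(**enc, labels=enc["input_ids"]).loss.item()

def predict_l1(text, l1_models):
    """Pick the L1 label whose LM assigns `text` the lowest loss."""
    return min(l1_models, key=lambda l1: lm_loss(l1_models[l1], text))

# l1_models would map labels to GPT-2 models fine-tuned on texts by authors
# sharing that L1, e.g. {"German": GPT2LMHeadModel.from_pretrained("models/gpt2-l1-german"), ...}
```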
A Deep Generative Distance-Based Classifier for Out-of-Domain Detection with Mahalanobis Space
Hong Xu, Keqing He, Yuanmeng Yan, Sihong Liu, Zijun Liu and Weiran XU
Detecting out-of-domain (OOD) input intents is critical in the task-oriented dialog system. Different from most existing methods that rely heavily on manually labeled OOD samples, we focus on the unsupervised OOD detection scenario where there are no labeled OOD samples except for labeled in-domain data. In this paper, we propose a simple but strong generative distance-based classifier to detect OOD samples. We estimate the class-conditional distribution on feature spaces of DNNs via Gaussian discriminant analysis (GDA) to avoid over-confidence problems. And we use two distance functions, Euclidean and Mahalanobis distances, to measure the confidence score of whether a test sample belongs to OOD. Experiments on four benchmark datasets show that our method can consistently outperform the baselines.
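For readers unfamiliar with the distance function described above, the sketch below shows the generic Gaussian-discriminant/Mahalanobis confidence score on pre-computed in-domain feature vectors; the function names and thresholding convention are ours, not the authors' code.

```python
# Sketch of a Mahalanobis-distance OOD score: fit class means and a tied
# covariance on in-domain features, then score a test feature by its distance
# to the closest class mean. Purely illustrative of the generic technique.
import numpy as np

def fit_gda(features, labels):
    """Class-conditional Gaussians with a shared (tied) covariance matrix."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([features[labels == c] - means[c] for c in classes])
    precision = np.linalg.pinv(centered.T @ centered / len(features))
    return means, precision

def confidence(x, means, precision):
    """Negative minimum squared Mahalanobis distance to any in-domain class."""
    return -min(float((x - m) @ precision @ (x - m)) for m in means.values())

# A test input would be flagged as OOD when its confidence falls below a
# threshold chosen on held-out data.
```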
A Deep Metric Learning Method for Biomedical Passage Retrieval
Andrés Rosso-Mateus, Fabio A. González and Manuel Montes-y-Gómez
Passage retrieval is the task of identifying text snippets that are valid answers for a natural language posed question. One way to address this problem is to look at it as a metric learning problem, where we want to induce a metric between questions and passages that assigns smaller distances to more relevant passages. In this work, we present a novel method for passage retrieval that learns a metric for questions and passages based on their internal semantic interactions. The method uses a similar approach to that of triplet networks, where the training samples are composed of one anchor (the question) and two positive and negative samples (passages). However, and in contrast with triplet networks, the proposed method uses a novel deep architecture that better exploits the particularities of text and takes into consideration complementary relatedness measures. Besides, the paper presents a sampling strategy that selects both easy and hard negative samples which improve the accuracy of the trained model. The method is particularly well suited for domain-specific passage retrieval where it is very important to take into account different sources of information. The proposed approach was evaluated in a biomedical passage retrieval task, the BioASQ challenge, outperforming standard triplet loss substantially by 10%, and state-of-the-art performance by 26%.
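As background for the comparison drawn above, this is the standard triplet margin loss over (question, relevant passage, irrelevant passage) encodings; it is the baseline objective the abstract compares against, not the proposed architecture, and the names are illustrative.

```python
# Standard triplet margin loss on encoded (anchor=question, positive, negative)
# batches: push relevant passages at least `margin` closer than irrelevant ones.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = F.pairwise_distance(anchor, positive)  # question vs. relevant passage
    d_neg = F.pairwise_distance(anchor, negative)  # question vs. irrelevant passage
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
```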
A Document-Level Neural Machine Translation Model with Dynamic Caching Guided by Theme-Rheme Information
Yiqi Tong, Jiangbin Zheng, Hongkang Zhu, Yidong Chen and xiaodong shi
Research on document-level Neural Machine Translation (NMT) models has attracted increasing attention in recent years. Although these works have shown that inter-sentence information helps improve the performance of NMT models, what information should be regarded as context remains ambiguous. To solve this problem, we propose a novel cache-based document-level NMT model which conducts dynamic caching guided by theme-rheme information. The experiments on NIST evaluation sets demonstrate that our proposed model achieves substantial improvements over the state-of-the-art baseline NMT models. As far as we know, we are the first to introduce theme-rheme theory into the field of machine translation.
A Geometry-Inspired Attack for Generating Natural Language Adversarial Examples
Zhao Meng and Roger Wattenhofer
Generating adversarial examples for natural language is hard, as natural language consists of discrete symbols, and examples are often of variable lengths. In this paper, we propose a geometry-inspired attack for generating natural language adversarial examples. Our attack generates adversarial examples by iteratively approximating the decision boundary of Deep Neural Networks (DNNs). Experiments on two datasets with two different models show that our attack fools natural language models with high success rates, while only replacing a few words. Human evaluation shows that adversarial examples generated by our attack are hard for humans to recognize. Further experiments show that adversarial training can improve model robustness against our attack.
A Graph Representation of Semi-structured Data for Web Question Answering
xingyao zhang, Linjun Shou, Jian Pei, Ming Gong, Lijie Wen and Daxin Jiang
The abundant semi-structured data on the Web, such as HTML-based tables and lists, provide commercial search engines a rich information source for question answering (QA). Different from plain text passages in Web documents, Web tables and lists have inherent structures, which carry semantic correlations among various elements in tables and lists. Many existing studies treat tables and lists as flat documents with pieces of text and do not make good use of semantic information hidden in structures. In this paper, we propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations. We also develop pre-training and reasoning techniques on the graph model for the QA task. Extensive experiments on several real datasets collected from a commercial engine verify the effectiveness of our approach. Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
A hierarchical approach to automatic vision to language generation: from simple sentences to complex natural language
Simion-Vlad Bogolin, Ioana Croitoru and Marius Leordeanu
Automatically describing videos in natural language is an ambitious problem, which could bridge our understanding of vision and language. We propose a hierarchical approach, by first generating video descriptions as sequences of simple sentences, followed at the next level by a more complex and fluent description in natural language. While the simple sentences describe simple actions in the form of (subject, verb, object), the second-level paragraph descriptions, indirectly using information from the first-level description, present the visual content in a more compact, coherent and semantically rich manner. To this end, we introduce the first video dataset in the literature that is annotated with captions at two levels of linguistic complexity. We perform extensive tests that demonstrate that our hierarchical linguistic representation, from simple to complex language, allows us to train a two-stage network that is able to generate significantly more complex paragraphs than current one-stage approaches.
A High Precision Pipeline for Financial Knowledge Graph Construction
Sarah Elhammadi, Laks V.S. Lakshmanan, Raymond Ng, Michael Simpson, Baoxing Huai, Zhefeng Wang and Lanjun Wang
Motivated by applications such as question answering, fact checking, and data integration, there is significant interest in constructing knowledge graphs by extracting information from unstructured information sources, particularly text documents. Knowledge graphs have emerged as a standard for structured knowledge representation, whereby entities and their inter-relations are represented and conveniently stored as (subject, predicate, object) triples in a graph that can be used to power various downstream applications. The proliferation of financial news sources reporting on companies, markets, currencies, and stocks presents an opportunity for extracting valuable knowledge about this crucial domain. In this paper, we focus on constructing a knowledge graph automatically by information extraction from a large corpus of financial news articles. For that purpose, we develop a high precision knowledge extraction pipeline tailored for the financial domain. This pipeline combines multiple information extraction techniques with a financial dictionary that we built, all working together to produce over 342,000 compact extractions from over 288,000 financial news articles, with a precision of 78% at the top-100 extractions. The extracted triples are stored in a knowledge graph making them readily available for use in downstream applications.
A Human Evaluation of AMR-to-English Generation Systems
Emma Manning, Shira Wein and Nathan Schneider
Most current state-of-the-art systems for generating English text from Abstract Meaning Representation (AMR) have been evaluated only using automated metrics, such as BLEU, which are known to be problematic for natural language generation. In this work, we present the results of a new human evaluation which collects fluency and adequacy scores, as well as categorization of error types, for several recent AMR generation systems. We discuss the relative quality of these systems and how our results compare to those of automatic metrics, finding that while the metrics are mostly successful in ranking systems overall, collecting human judgments allows for more nuanced comparisons. We also analyze common errors made by these systems.
A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents
Tuan Lai, Trung Bui, Doo Soon Kim and Quan Hung Tran
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Existing benchmark datasets for the task typically have limited numbers of annotated documents, making it challenging to train increasingly complex neural networks. In contrast, digital libraries store millions of scientific articles online, covering a wide range of topics. While a significant portion of these articles contain keyphrases provided by their authors, most other articles lack such kind of annotations. Therefore, to effectively utilize these large amounts of unlabeled articles, we propose a simple and efficient joint learning approach based on the idea of self-distillation. Experimental results show that our approach consistently improves the performance of baseline models for keyphrase extraction. Furthermore, our best models outperform previous methods for the task, achieving new state-of-the-art results on two public benchmarks: Inspec and SemEval-2017.
A Large-Scale Corpus of E-mail Conversations with Standard and Two-Level Dialogue Act Annotations
Motoki Taniguchi, Yoshihiro Ueda, Tomoki Taniguchi and Tomoko Ohkuma
We present a large-scale corpus of e-mail conversations with domain-agnostic and two-level dialogue act (DA) annotations towards the goal of a better understanding of asynchronous conversations. We annotate over 6,000 messages and 35,000 sentences from more than 2,000 threads. For domain-independent and application-independent DA annotations, we choose ISO standard 24617-2 as the annotation scheme. To assess the difficulty of DA recognition on our corpus, we evaluate several models, including a pre-trained contextual representation model, as our baselines. The experimental results show that BERT outperforms other neural network models, including previous state-of-the-art models, but falls short of human performance. We also demonstrate that DA tags of two-level granularity enable a DA recognition model to learn efficiently by using multi-task learning. An evaluation of a model trained on our corpus against other domains of asynchronous conversation reveals the domain independence of our DA annotations.
A Learning-Exploring Method to Generate Diverse Paraphrases with Multi-Objective Deep Reinforcement Learning
Mingtong Liu, Erguang Yang, Deyi Xiong, YUJIE ZHANG, Yao Meng, Changjian Hu, Jinan Xu and Yufeng Chen
Paraphrase generation (PG) is of great importance to many downstream tasks in natural language processing. Diversity is essential in PG for enhancing the generalization capability and robustness of downstream applications. Recently, neural sequence-to-sequence (Seq2Seq) models have shown promising results in PG. However, traditional model training for PG focuses on optimizing model predictions against a single reference with cross-entropy loss, an objective that is unable to encourage the model to generate diverse paraphrases. In this work, we present a novel multi-objective learning approach to PG. We propose a learning-exploring method to generate sentences as learning objectives from the learned data distribution, and employ reinforcement learning to combine these new learning objectives for model training. We first design a sample-based algorithm to explore diverse sentences. Then we introduce several reward functions to evaluate the sampled sentences as learning signals in terms of expressive diversity and semantic fidelity, aiming to generate diverse and high-quality paraphrases. To effectively optimize model performance across different evaluation aspects, we use a GradNorm-based algorithm that automatically balances these training objectives. Experiments and analyses on the Quora and Twitter datasets demonstrate that our proposed method not only gains a significant increase in diversity but also improves generation quality over several state-of-the-art baselines.
A Linguistic Perspective on Reference: Choosing a Feature Set for Generating Referring Expressions in Context
Fahime Same and Kees van Deemter
This paper reports on a structured evaluation of feature-based machine learning algorithms for selecting the form of a referring expression in discourse context. Based on this evaluation and a number of follow-up studies (e.g. using ablation), we propose a “consensus” feature set which we compare with insights in the linguistic literature.
A Locally Linear Procedure for Word Translation
Soham Dan, Hagai Taitelbaum and Jacob Goldberger
Learning a mapping between word embeddings of two languages given a dictionary is an important problem with several applications. A common mapping approach is using an orthogonal matrix. The Orthogonal Procrustes Analysis (PA) algorithm can be applied to find the optimal orthogonal matrix. This solution restricts the expressiveness of the translation model which may result in sub-optimal translations. We propose a natural extension of the PA algorithm that uses multiple orthogonal translation matrices to model the mapping and derive an algorithm to learn these multiple matrices. We achieve better performance in a bilingual word translation task and a cross lingual word similarity task compared to the single matrix baseline. We also show how multiple matrices can model multiple senses of a word.
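For context, the single-matrix Procrustes baseline mentioned above has a well-known closed-form SVD solution; the sketch below shows that standard solution (not the authors' multi-matrix extension), with illustrative variable names.

```python
# Closed-form orthogonal Procrustes solution: the orthogonal W minimizing
# ||XW - Y||_F, where rows of X and Y are embeddings of dictionary word pairs.
import numpy as np

def orthogonal_procrustes(X, Y):
    u, _, vt = np.linalg.svd(X.T @ Y)
    return u @ vt

# Usage: W = orthogonal_procrustes(src_vecs, tgt_vecs); src_vecs @ W then maps
# source-language embeddings into the target-language space.
```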
A Mixture-of-Experts Model for Learning Multi-Facet Entity Embeddings
Rana Alshaikh, Zied Bouraoui, Shelan Jeawak and Steven Schockaert
Various methods have already been proposed for learning entity embeddings from text descriptions. Such embeddings are commonly used for inferring properties of entities, for recommendation and entity-oriented search, and for injecting background knowledge into neural architectures, among others. Entity embeddings essentially serve as a compact encoding of a similarity relation, but similarity is an inherently multi-faceted notion. By representing entities as single vectors, existing methods leave it to downstream applications to identify these different facets, and to select the most relevant ones. In this paper, we propose a model that instead learns several vectors for each entity, each of which intuitively captures a different aspect of the considered domain. We use a mixture-of-experts formulation to jointly learn these facet-specific embeddings. The individual entity embeddings are learned using a variant of the GloVe model, which has the advantage that we can easily identify which properties are modelled well in which of the learned embeddings. This is exploited by an associated gating network, which uses pre-trained word vectors to encourage the properties that are modelled by a given embedding to be semantically coherent, i.e. to encourage each of the individual embeddings to capture a meaningful facet.
A Multitask Active Learning Framework for Natural Language Understanding
Hua Zhu, Sihan Luo, Wu Ye and Xidong Zhang
Natural language understanding (NLU) aims at identifying user intent and extracting semantic slots. This requires sufficient annotated data to achieve considerable performance in real-world situations. Active learning has been well studied as a way to decrease the amount of annotated data needed, and has been successfully applied to NLU. However, no research has been done on investigating how the relation information between intents and slots can improve the efficiency of active learning algorithms. In this paper, we propose a multitask active learning framework for NLU. Our framework enables pool-based active learning algorithms to make use of the relation information between sub-tasks provided by a joint model, and we propose an efficient computation for the entropy of a joint model. Simulated experiments show that our framework can use the same annotation budget to perform better than frameworks that do not consider the relevance between intents and slots. We also show that these active learning algorithms remain effective in our framework when incorporated with Bidirectional Encoder Representations from Transformers (BERT).
A Neural Local Coherence Analysis Model for Clarity Text Scoring
Panitan Muangkammuen, Sheng Xu, Fumiyo Fukumoto, Kanda Runapongsa Saikaew and Jiyi Li
Local coherence relations between two phrases/sentences, such as cause-effect and contrast, strongly influence whether a text is well-structured. Following this assumption, this paper presents a method for scoring text clarity by utilizing local coherence between adjacent sentences. We hypothesize that contextual features of coherence relations learned from data different from the target training data can also discriminate whether the target text is well-structured, and thus help to score text clarity. We propose a text clarity scoring method that utilizes local coherence analysis in an out-of-domain setting, i.e., the training data for the source and target tasks differ from each other. The method, built on the pre-trained language model BERT, first trains the local coherence model as an auxiliary task and then re-trains it together with the text clarity scoring model. Experimental results on the PeerRead benchmark dataset show an improvement over a single text clarity scoring model. Our source code is available online.
A Neural Model for Aggregating Coreference Annotation in Crowdsourcing
Maolin Li, Hiroya Takamura and Sophia Ananiadou
Coreference resolution is the task of identifying all mentions in a text that refer to the same real-world entity. Collecting sufficient labelled data from expert annotators to train a high-performance coreference resolution system is time-consuming and expensive. Crowdsourcing makes it possible to obtain the required amounts of data rapidly and cost-effectively. However, crowd-sourced labels can be noisy. To ensure high-quality data, it is crucial to infer the correct labels by aggregating the noisy labels. In this paper, we split the aggregation into two subtasks, i.e., mention classification and coreference chain inference. Firstly, we predict the general class of each mention using an autoencoder, which incorporates contextual information about each mention, while at the same time taking into account the mention’s annotation complexity and annotators’ reliability at different levels. Secondly, to determine the coreference chain of each mention, we use weighted voting which takes into account the reliability learned in the first subtask. Experimental results demonstrate the effectiveness of our method in predicting the correct labels. We also illustrate our model’s interpretability through a comprehensive analysis of experimental results.
A Quantitative Analysis on the Role of Training Data for Text Classification
Aleksandra Edwards, Jose Camacho-Collados, Hélène De Ribaupierre and Alun Preece
Pre-trained language models provide the foundations for state-of-the-art performance across a wide range of natural language processing tasks, including text classification. However, most classification datasets assume a large amount of labeled data, which is commonly not the case in practical settings. In this paper, we compare the performance of a lightweight linear classifier based on word embeddings, i.e., fastText (Joulin et al., 2017), versus a pre-trained language model, i.e., BERT (Devlin et al., 2019), across a wide range of datasets and classification tasks. Results show that, while BERT outperforms all baselines in standard datasets with large training sets, in settings with small training datasets a simple method like fastText coupled with corpus-trained embeddings performs equally well or better than BERT.
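Since the lightweight baseline in this comparison is fastText, a minimal sketch of training such a linear classifier with the fasttext Python package is given below; the file name, hyperparameters and example texts are illustrative assumptions, not the paper's exact setup.

    import fasttext

    # train.txt holds one example per line in fastText's supervised format, e.g.
    #   __label__positive great battery life and a sharp screen
    model = fasttext.train_supervised(input="train.txt", epoch=10, lr=0.5, wordNgrams=2)

    # Predict the label (and its probability) for a new document.
    labels, probs = model.predict("the screen cracked after a week")
    print(labels[0], probs[0])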
A Representation Learning Approach to Animal Biodiversity Conservation
Meet Mukadam, Mandhara Jayaram and Yongfeng Zhang
Generating knowledge from natural language data has aided in solving many artificial intelligence problems. Vector representations of words have been the driving force behind the majority of natural language processing tasks. This paper develops a novel approach for predicting the conservation status of animal species using custom-generated scientific name embeddings. We use two different vector embeddings generated using representation learning on Wikipedia text and animal taxonomy data. We generate name embeddings for all species in the animal kingdom using unsupervised learning and build a model on the IUCN Red List dataset to classify species as endangered or least-concern. To our knowledge, this is the first work that makes use of learnt features instead of handcrafted features for this task, and we achieve competitive results. Based on the high-confidence results of our model, we also predict the conservation status of data-deficient species whose conservation status is still unknown, thus steering more focus towards them for protection. These embeddings have also been made publicly available here. We believe this will greatly help in solving various downstream tasks and further advance research in the cross-domain area involving Natural Language Processing and Conservation Biology.
A Retrofitting Model for Incorporating Semantic Relations into Word Embeddings
Sapan Shah, Sreedhar Reddy and Pushpak Bhattacharyya
We present a novel retrofitting model that can leverage relational knowledge available in a knowledge resource to improve word embeddings. The knowledge is captured in terms of relation inequality constraints that compare the similarity of related and unrelated entities in the context of an anchor entity. These constraints are used as training data to learn a non-linear transformation function that maps original word vectors to a vector space respecting these constraints. The transformation function is learned in a similarity metric learning setting using a Triplet network architecture. We applied our model to synonymy, antonymy and hypernymy relations in WordNet and observed large gains in performance over original distributional models, as well as other retrofitting approaches, on the word similarity task and significant overall improvement on the lexical entailment detection task.
A Review of dataset and labeling methods for causality extraction
Jinghang Xu, Wanli Zuo, Shining Liang and Xianglin Zuo
Causality represents the most important kind of correlation between events. Extracting causality from text has become a promising hot topic in NLP. However, there are no mature research systems and datasets for public evaluation, and there is a lack of unified causal sequence labeling methods, which constitute the key factors that hinder the progress of causality extraction research. We comprehensively survey the limitations and shortcomings of the existing causality research field from the aspects of basic concepts, extraction methods, experimental data, and labeling methods, so as to provide a reference for future research on causality extraction. We summarize the existing causality datasets, explore their practicability and extensibility from multiple perspectives, and create a new causal dataset, ESC. Aiming at the problem of causal sequence labeling, we analyse the existing methods, summarize their rules, and propose a new causal labeling method based on core words. Multiple candidate causal label sequences are put forward according to label controversy to explore the optimal labeling method through experiments, and suggestions are provided for selecting a labeling method.
A Semantically Consistent and Syntactically Variational Encoder-Decoder Framework for Paraphrase Generation
Wenqing Chen, Jidong Tian, Liqiang Xiao, Hao He and Yaohui Jin
Paraphrase generation aims to generate semantically consistent sentences with different syntactic realizations. Most recent studies rely on the typical encoder-decoder framework, where the generation process is deterministic. However, in practice, the ability to generate multiple syntactically different paraphrases is important. Recent work proposed to apply variational inference over a target-related latent variable to introduce diversity. But the latent variable may be contaminated by the semantic information of other unrelated sentences, and in turn change the conveyed meaning of generated paraphrases. In this paper, we propose a semantically consistent and syntactically variational encoder-decoder framework, which uses adversarial learning to ensure that the syntactic latent variable is semantic-free. Moreover, we adopt another discriminator to improve word-level and sentence-level semantic consistency. The proposed framework can thus generate multiple semantically consistent and syntactically different paraphrases. Experiments show that our model outperforms the baseline models on metrics based on both n-gram matching and semantic similarity, and that our model can generate multiple different paraphrases by assembling different syntactic variables.
A Sentence Cloze Dataset for Chinese Machine Reading Comprehension
Yiming Cui, Ting Liu, Ziqing Yang, Zhipeng Chen, Wentao Ma, Wanxiang Che, Shijin Wang and Guoping Hu
Owing to the continuous efforts of the Chinese NLP community, more and more Chinese machine reading comprehension datasets have become available. To add diversity to this area, we propose a new task called Sentence Cloze-style Machine Reading Comprehension (SC-MRC). The proposed task aims to fill the right candidate sentences into a passage that has several blanks. Moreover, to increase the difficulty, we also created fake candidates that are similar to the correct ones, which requires the machine to judge their correctness in context. The proposed dataset contains over 100K blanks (questions) within over 10K passages, which originate from Chinese narrative stories. To evaluate the dataset, we implement several baseline systems based on pre-trained models, and the results show that the state-of-the-art model still underperforms human performance by a large margin. We release the dataset and baseline systems to further facilitate our community.
A Simple and Effective Approach to Robust Unsupervised Bilingual Dictionary Induction
Yanyang Li, Yingfeng Luo, Ye Lin, Quan Du, Huizhen Wang, Tong Xiao and Jingbo Zhu
Unsupervised Bilingual Dictionary Induction methods based on initialization and self-learning have achieved great success for similar language pairs, e.g., English-Spanish. But they still fail, with an accuracy of 0%, for many distant language pairs, e.g., English-Japanese. In this work, we show that this failure results from the gap between the actual initialization performance and the minimum initialization performance required for the self-learning to succeed. We propose Iterative Dimension Reduction to bridge this gap. Our experiments show that this simple method does not hamper the performance on similar language pairs and achieves an accuracy of 13.64~55.53% between English and four distant languages, i.e., Chinese, Japanese, Vietnamese and Thai.
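For context, the retrieval step shared by dictionary-induction pipelines of this kind can be sketched as cosine nearest-neighbour search over mapped embeddings; this is a generic illustration with made-up toy vectors, not the authors' Iterative Dimension Reduction method.

    import numpy as np

    def translate(src_vec, tgt_matrix, tgt_words):
        """Return the target word whose embedding is the cosine nearest
        neighbour of an (already mapped) source word vector."""
        src = src_vec / np.linalg.norm(src_vec)
        tgt = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
        return tgt_words[int(np.argmax(tgt @ src))]

    # Toy example with made-up 2-dimensional embeddings.
    tgt_words = ["perro", "gato", "casa"]
    tgt_matrix = np.array([[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]])
    print(translate(np.array([0.8, 0.2]), tgt_matrix, tgt_words))   # perro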
A Straightforward Approach to Narratologically Grounded Character Identification
Labiba Jahan, Rahul Mittal, W. Victor Yarlott and Mark Finlayson
One of the most fundamental elements of narrative is character: if we are to understand a narrative, we must be able to identify the characters of that narrative. Therefore, character identification is a critical task in narrative natural language understanding. Most prior work has lacked a narratologically grounded definition of character, instead relying on simplified or implicit definitions that do not capture essential distinctions between characters and other referents in narratives. In prior work we proposed a preliminary definition of character that was based in clear narratological principles: a character is an animate entity that is important to the plot. Here we flesh out this concept, demonstrate that it can be reliably annotated (0.78 Cohen's kappa), and provide annotations of 170 narrative texts, drawn from 3 different corpora, containing 1,347 character co-reference chains and 21,999 non-character chains that include 3,937 animate chains. Furthermore, we show that a supervised classifier using a simple set of easily computable features can effectively identify these characters (overall F1 of 0.94). A detailed error analysis shows that character identification is first and foremost affected by co-reference quality, and further, that the shorter a chain is the harder it is to effectively identify as a character. We release our code and data for the benefit of other researchers.
A Study on Efficiency, Accuracy and Document Structure for Answer Sentence Selection
Daniele Bonadiman and Alessandro Moschitti
An essential task of most Question Answering (QA) systems is to re-rank the set of answer candidates, i.e., Answer Sentence Selection (A2S). These candidates are typically sentences either extracted from one or more documents preserving their natural order or retrieved by a search engine. Most state-of-the-art approaches to the task use huge neural models, such as BERT, or complex attentive architectures. In this paper, we argue that by exploiting the intrinsic structure of the original rank together with an effective word-relatedness encoder, we can achieve competitive results with respect to the state of the art while retaining high efficiency. Our model takes 9.5 seconds to train on the WikiQA dataset, i.e., very fast in comparison with the 18 minutes required by a standard BERT-base fine-tuning.
A Survey of Automatic Personality Detection from Texts
Sanja Stajner and Seren Yenikent
Personality profiling has long been used in psychology to predict life outcomes. Recently, automatic detection of personality traits from written messages has gained significant attention in the computational linguistics and natural language processing communities, due to its applicability in various fields. In this survey, we show the trajectory of research towards automatic personality detection from purely psychological approaches, through psycholinguistics, to the recent purely natural language processing approaches on large datasets automatically extracted from social media. We point out what has been gained and what has been lost along that trajectory, and show what expectations for the field are realistic.
A Survey of Unsupervised Dependency Parsing
Wenjuan Han, Yong Jiang, Hwee Tou Ng and Kewei Tu
Syntactic dependency parsing is an important task in natural language processing. Unsupervised dependency parsing aims to learn a dependency parser from sentences that have no annotation of their correct parse trees. Despite its difficulty, unsupervised parsing is an interesting research direction because of its capability of utilizing almost unlimited unannotated text data. It also serves as the basis for other research in low-resource parsing. In this paper, we survey existing approaches to unsupervised dependency parsing, identify two major classes of approaches, and discuss recent trends. We hope that our survey can provide insights for researchers and facilitate future research on this topic.
A Symmetric Local Search Network for Emotion-Cause Pair Extraction
Zifeng Cheng, Zhiwei Jiang, Yafeng Yin, Hua Yu and Qing Gu
Emotion-cause pair extraction (ECPE) is a new task which aims at extracting the potential clause pairs of emotions and corresponding causes in a document. To tackle this task, a previous study proposed a two-step method which first extracted emotion clauses and cause clauses individually, then paired the emotion and cause clauses, and finally filtered out the pairs without causality. Different from this method, which separated the detection and the matching of emotion and cause into two steps, we propose a Symmetric Local Search Network (SLSN) model to perform the detection and matching simultaneously by local search. SLSN consists of two symmetric subnetworks, namely the emotion subnetwork and the cause subnetwork. Each subnetwork is composed of a clause representation learner and a local pair searcher. The local pair searcher is a specially-designed cross-subnetwork component which can extract the local emotion-cause pairs. Experimental results on the ECPE corpus demonstrate the effectiveness of SLSN, which achieves new state-of-the-art performance.
A Systematic Study of Data Augmentation for Multiclass Utterance Classification Tasks
Binxia Xu, Siyuan Qiu, Jie Zhang, Yafang Wang, Xiaoyu Shen and Gerard de Melo
Utterance classification is a key component in many conversational systems. However, classifying real-world user utterances is challenging, as people may express their ideas and thoughts in manifold ways, and the amount of training data for some categories may be fairly limited, resulting in imbalanced data distributions. To alleviate these issues, we conduct a comprehensive survey regarding data augmentation approaches for text classification, including simple random resampling, word-level transformations, and neural text generation to cope with imbalanced data. Our experiments focus on multi-class datasets with a large number of data samples, which has not been systematically studied in previous work. The results show that the effectiveness of different data augmentation schemes depends on the nature of the dataset under consideration.
A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing
Sanxing Chen, Aidan San, Xiaodong Liu and Yangfeng Ji
In Text-to-SQL semantic parsing, selecting the correct entities (tables and columns) to output is both crucial and challenging; the parser is required to connect the natural language (NL) question and the current SQL prediction with the structured world, i.e., the database. We formulate two linking processes to address this challenge: schema linking which links explicit NL mentions to the database and structural linking which links the entities in the output SQL with their structural relationships in the database schema. Intuitively, the effects of these two linking processes change based on the entity being generated, thus we propose to dynamically choose between them using a gating mechanism. Integrating the proposed method with two graph neural network based semantic parsers together with BERT representations demonstrates substantial gains in parsing accuracy on the challenging Spider dataset. Analyses show that our method helps to enhance the structure of the model output when generating complicated SQL queries and offers explainable predictions.
A Taxonomy of Empathetic Response Intents in Human Social Conversations
Anuradha Welivita and Pearl Pu
Open-domain conversational agents or chatbots are becoming increasingly popular in the natural language processing community. One of the challenges is enabling them to converse in an empathetic manner. Current neural response generation methods rely solely on end-to-end learning from large scale conversation data to generate dialogues. This approach can produce socially unacceptable responses due to the lack of large-scale quality data used to train the neural models. However, recent work has shown the promise of combining dialogue act/intent modelling and neural response generation. This hybrid method improves the response quality of chatbots and makes them more controllable and interpretable. A key element in dialog intent modelling is the development of a taxonomy. Inspired by this idea, we have manually labeled 500 response intents using a subset of a sizeable empathetic dialogue dataset (25K dialogues). Our goal is to produce a large-scale taxonomy for empathetic response intents. Furthermore, using lexical and machine learning methods, we automatically analyzed both speaker and listener utterances of the entire dataset with identified response intents and 32 emotion categories. Finally, we use information visualization methods to summarize emotional dialogue exchange patterns and their temporal progression. These results reveal novel and important empathy patterns in human-human open-domain conversations and can serve as rules for hybrid approaches.
A Thorough Analysis of Dataset Overlap on Winograd-Style Tasks
Ali Emami, Kaheer Suleman, Adam Trischler and Jackie Chi Kit Cheung
The Winograd Schema Challenge (WSC) and variants inspired by it have become important benchmarks for common-sense reasoning (CSR). Model performance on the WSC has quickly progressed from chance-level to near-human using neural language models trained on massive corpora. In this paper, we analyze the effects of varying degrees of overlap that occur between these corpora and the test instances in WSC-style tasks. We find that a large number of test instances overlap considerably with the pretraining corpora on which state-of-the-art models are trained, and that a significant drop in classification accuracy occurs when models are evaluated on instances with minimal overlap. Based on these results, we provide the WSC-Web dataset, consisting of over 60k pronoun disambiguation problems scraped from web data, which is both the largest corpus of its kind to date and has a significantly lower proportion of overlap with current pretraining corpora.
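To make the notion of overlap concrete, a hedged sketch follows that scores a test instance by the fraction of its word n-grams also found in a pretraining corpus; the trigram order and whitespace tokenisation are illustrative assumptions and not necessarily the exact measure used in the paper.

    def ngrams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def overlap_ratio(instance, corpus, n=3):
        """Fraction of the instance's n-grams that also occur in the corpus."""
        inst = ngrams(instance.lower().split(), n)
        corp = ngrams(corpus.lower().split(), n)
        return len(inst & corp) / len(inst) if inst else 0.0

    corpus = "the trophy would not fit in the brown suitcase because it was too big"
    instance = "the trophy would not fit in the suitcase"
    print(round(overlap_ratio(instance, corpus), 2))   # 0.83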
A Two-Level Interpretation of Modality in Human-Robot Dialogue
Lucia Donatelli, Kenneth Lai and James Pustejovsky
We analyze the use and interpretation of modal expressions in a corpus of situated human-robot dialogue and ask how to effectively represent these expressions for automatic learning. We present a two-level annotation scheme for modality that captures both content and intent, integrating a logic-based, semantic representation and a task-oriented, pragmatic representation that maps to our robot’s capabilities. Data from our annotation task reveals that the interpretation of modal expressions in human-robot dialogue is quite diverse, yet highly constrained by the physical environment and asymmetrical speaker/addressee relationship. We sketch a formal model of human-robot common ground in which modality can be grounded and dynamically interpreted.
A Two-phase Prototypical Network Model for Incremental Few-shot Relation Classification
Haopeng Ren, Yi Cai, Xiaofeng Chen, Guohua Wang and Qing Li
Relation Classification (RC) plays an important role in natural language processing (NLP). Current conventional supervised and distantly supervised RC models always make a closed-world assumption which ignores the emergence of novel relations in open environments. To incrementally recognize novel relations, two current solutions (i.e., re-training and lifelong learning) have been designed, but they suffer from the lack of large-scale labeled data for novel relations. Meanwhile, prototypical networks perform well in both deep supervised learning and few-shot learning. However, they still suffer from the incompatible feature embedding problem when novel relations arrive. Motivated by this, we propose a two-phase prototypical network with prototype attention alignment and triplet loss to dynamically recognize novel relations with a few support instances while avoiding catastrophic forgetting. Extensive experiments are conducted to evaluate the effectiveness of our proposed model.
A Unified Sequence Labeling Model for Emotion Cause Pair Extraction
Xinhong Chen, Qing Li and Jianping Wang
Emotion-cause pair extraction (ECPE) aims at extracting emotions and causes as pairs from documents, where each pair contains an emotion clause and a set of cause clauses. Existing approaches address the task by first extracting emotion and cause clauses via two binary classifiers separately, and then training another binary classifier to pair them up. However, the extracted emotion-cause pairs of different emotion types cannot be distinguished from each other through simple binary classifiers, which limits the applicability of existing approaches. Moreover, such two-step approaches may suffer from cascading errors. In this paper, to address the first problem, we assign emotion type labels to emotion and cause clauses so that emotion-cause pairs of different emotion types can be easily distinguished. As for the second problem, we reformulate the ECPE task as a unified sequence labeling task, which can extract multiple emotion-cause pairs in an end-to-end fashion. We propose an approach composed of a convolutional neural network for encoding neighboring information and two Bidirectional Long Short-Term Memory networks for two auxiliary tasks. Experimental results demonstrate the feasibility and effectiveness of our approaches.
A Unifying Theory of Transition-based and Sequence Labeling Parsing
Carlos Gómez-Rodríguez, Michalina Strzyz and David Vilares
We define a mapping from transition-based parsing algorithms that read sentences from left to right to sequence labeling encodings of syntactic trees. This not only establishes a theoretical relation between transition-based parsing and sequence-labeling parsing, but also provides a method to obtain new encodings for fast and simple sequence labeling parsing from the many existing transition-based parsers for different formalisms. Applying it to dependency parsing, we implement sequence labeling versions of four algorithms, showing that they are learnable and obtain comparable performance to existing encodings.
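As a concrete picture of what a sequence labeling view of parsing looks like, here is a sketch of one simple, well-known encoding (labeling each token with the signed offset to its head plus its dependency relation); it is meant only as background and is not one of the new encodings derived in the paper.

    def encode(heads, rels):
        """Encode a dependency tree as one label per token.

        heads: 1-based head index for each token (0 means the root);
        rels: dependency relation for each token.
        Each label is the signed offset to the head joined with the relation.
        """
        labels = []
        for i, (h, r) in enumerate(zip(heads, rels), start=1):
            offset = 0 if h == 0 else h - i
            labels.append(f"{offset:+d}@{r}")
        return labels

    # "She reads books": "reads" is the root and heads both "She" and "books".
    print(encode([2, 0, 2], ["nsubj", "root", "obj"]))
    # ['+1@nsubj', '+0@root', '-1@obj']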
A Vietnamese Dataset for Evaluating Machine Reading Comprehension
Kiet Nguyen, Vu Nguyen, Anh Nguyen and Ngan Nguyen
Vietnamese is spoken natively by over 97 million people worldwide. However, there are few research studies on machine reading comprehension (MRC) in Vietnamese, the task of understanding a document or text and answering questions related to it. Due to the lack of benchmark datasets for Vietnamese, we present the Vietnamese Question Answering Dataset (ViQuAD), a new dataset for this low-resource language to evaluate MRC models. The dataset comprises over 23,000 human-generated question-answer pairs based on 5,109 passages of 174 Vietnamese articles from Wikipedia. In particular, we propose a new dataset creation process for Vietnamese MRC. Our in-depth analyses illustrate that our dataset requires abilities beyond simple reasoning such as word matching and demands complicated reasoning such as single-sentence and multiple-sentence inference. Besides, we conduct experiments with state-of-the-art MRC methods for English and Chinese as the first experimental models on ViQuAD, to which further models can be compared. We also estimate human performance on the dataset and compare it to the experimental results of several powerful machine models. The substantial gap between human and best-model performance on the dataset indicates that improvements can be explored on ViQuAD through future research. Our dataset is freely available to encourage the research community to overcome challenges in Vietnamese MRC.
A Word-Level Uncertainty Estimation Approach for Black-Box Text Classifiers using RNNs
Jakob Smedegaard Andersen, Tom Schöner and Walid Maalej
Estimating the uncertainty of Neural Network predictions paves the way towards more reliable and trustworthy text classifications. However, common uncertainty estimation approaches remain black boxes that do not explain which features have led to the uncertainty of a prediction. This hinders users from understanding the cause of unreliable model behaviour. We introduce an approach to decompose and visualize the uncertainty of text classifiers at the level of words. We aim to provide detailed explanations of uncertainties and thus enable a deeper inspection and reasoning about unreliable model behaviours. Our approach builds on top of Recurrent Neural Networks and Bayesian modelling. We conduct a preliminary experiment to check the impact and correctness of our approach. By explaining and investigating the predictive uncertainties of a sentiment analysis task, we argue that our approach is able to provide a more profound understanding of artificial decision making.
AbuseAnalyzer: Abuse Detection, Severity and Target Prediction for Gab Posts
Mohit Chandra, Ashwin Pathak, Eesha Dutta, Paryul Jain, Manish Gupta, Manish Shrivastava and Ponnurangam Kumaraguru
While the extensive popularity of online social media platforms has made information dissemination faster, it has also resulted in widespread online abuse of different types, such as hate speech, offensive language, and sexist and racist opinions. Detection and curtailment of such abusive content is critical for avoiding its psychological impact on victim communities, and thereby preventing hate crimes. Previous works have focused on classifying user posts into various forms of abusive behavior, but there has hardly been any focus on estimating the severity of abuse or its target. In this paper, we present a first-of-its-kind dataset with 7,601 posts from Gab which looks at online abuse from the perspectives of presence of abuse, severity and target of abusive behavior. We also propose a system to address these tasks, obtaining an accuracy of ∼80% for abuse presence, ∼82% for abuse target prediction, and ∼65% for abuse severity prediction.
Ad Lingua: Text Classification Improves Symbolism Prediction in Image Advertisements
Andrey Savchenko, Anton Alekseev, Sejeong Kwon, Elena Tutubalina, Evgeny Myasnikov and Sergey Nikolenko
Understanding image advertisements is a challenging task, often requiring non-literal interpretation. We argue that standard image-based predictions are not enough for symbolism prediction. Following the intuition that texts and images are complementary in advertising, we introduce a multimodal ensemble of a state-of-the-art image-based classifier, an object detection architecture-based classifier, and a fine-tuned language model applied to texts extracted from ads by OCR. The resulting system establishes a new state of the art in symbolism prediction.
Adversarial Learning on the Latent Space for Diverse Dialog Generation
Kashif Khan, Gaurav Sahu, Vikash Balasubramanian, Lili Mou and Olga Vechtomova
Generating relevant responses in a dialog is challenging, and requires not only proper modelling of the context in the conversation, but also the ability to generate fluent sentences during inference. In this paper, we propose a two-step framework based on generative adversarial nets for generating conditioned responses. Our model first learns a meaningful representation of sentences, and then uses a generator to match the query with the response distribution. Latent codes from the latter are then used to generate responses. Both quantitative and qualitative evaluations show that our model generates more fluent, relevant and diverse responses than existing state-of-the-art methods.
Affective and Contextual Embedding for Sarcasm Detection
Nastaran Babanejad, Heidar Davoudi, Aijun An and Manos Papagelis
Automatic sarcasm detection from text is an important classification task that can help identify the actual sentiment in user-generated data, such as reviews or tweets. Despite its usefulness, sarcasm detection remains a challenging task, due to a lack of any vocal intonation or facial gestures in textual data. To date, most of the approaches to addressing the problem have relied on hand-crafted affect features, or pre-trained models of non-contextual word embeddings, such as Word2vec. However, these models inherit limitations that render them inadequate for the task of sarcasm detection. In this paper, we propose two novel deep neural network models for sarcasm detection, namely ACE 1 and ACE 2. Given as input a text passage, the models predict whether it is sarcastic (or not). Our models extend the architecture of BERT by incorporating both affective and contextual features. To the best of our knowledge, this is the first attempt to directly extend BERT's architecture to build a sarcasm classifier. Extensive experiments on different datasets demonstrate that the proposed models outperform state-of-the-art models for sarcasm detection with significant margins.
Affective Text Generation
Tushar Goswamy, Ishika Singh, Ahsan Barkati and Ashutosh Modi
Humans use language not just to convey information but also to express their inner feelings and mental states. In this work, we adapt state-of-the-art language generation models to generate affective (emotional) text. We posit a model capable of generating affect-driven and topic-focused sentences without losing grammatical correctness as the affect intensity increases. We propose to incorporate emotion as a prior for probabilistic state-of-the-art sentence generation models such as GPT-2. The model gives the user the flexibility to control the category and intensity of emotion as well as the subject of the generated text. Previous attempts at modelling fine-grained emotions fall short on grammatical correctness at extreme intensities, but our model is resilient to this and delivers robust results at all intensities. We conduct automated evaluations and human studies to test the performance of our model, and provide a detailed comparison of the results with other models. In all evaluations, our model outperforms existing affective text generation models.
Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution
Nikolay Arefyev, Boris Sheludko, Alexander Podolskiy and Alexander Panchenko
Lexical substitution in context is an extremely powerful technology that can be used as a backbone of various NLP applications, such as word sense induction, lexical relation extraction, data augmentation, etc. In this paper, we present a large-scale comparative study of popular neural language and masked language models (LMs and MLMs), such as context2vec, ELMo, BERT and XLNet, applied to the task of lexical substitution. We show that the already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly, and we compare several target injection methods. Besides, we are the first to analyze the semantics of the produced substitutes via an analysis of the types of semantic relations between the target and the substitutes generated by different models, providing insights into what kinds of words are generated or given by annotators as substitutes.
An analysis of language models for metaphor recognition
Arthur Neidlein, Philip Wiesenbach and Katja Markert
We conduct a linguistic analysis of recent metaphor recognition systems, all of which are based on language models. We show that their overall promising performance has considerable gaps from a linguistic perspective. First, they perform substantially worse on unconventional metaphors than on conventional ones. Second, they struggle with handling rarer word types. These two findings together suggest that a large part of the systems' success is due to optimising the disambiguation of conventionalised, metaphoric word senses for specific words instead of modelling general properties of metaphors. As a positive result, the systems show increasing capabilities to recognise metaphoric readings of unseen words if synonyms or morphological variations of these words have been seen before, leading to enhanced generalisation beyond word sense disambiguation.
An Analysis of Simple Data Augmentation for Named Entity Recognition
Xiang Dai and Heike Adel
Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (MaSciP and i2b2-2010), we show that simple augmentation can boost performance for both recurrent and transformer based models, especially for small training sets.
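To make the token-level setting concrete, a hedged sketch of one simple label-preserving augmentation in the spirit of the paper follows (replacing a labelled token with another token that carries the same BIO label elsewhere in the training data); the data, label names and replacement probability are illustrative assumptions rather than the authors' exact scheme.

    import random
    from collections import defaultdict

    def build_lexicon(sentences):
        """Collect, for every BIO label, the tokens observed with that label."""
        lexicon = defaultdict(set)
        for tokens, labels in sentences:
            for tok, lab in zip(tokens, labels):
                lexicon[lab].add(tok)
        return {lab: sorted(toks) for lab, toks in lexicon.items()}

    def augment(tokens, labels, lexicon, p=1.0, seed=0):
        """Randomly swap labelled tokens for other tokens with the same label."""
        rng = random.Random(seed)
        new_tokens = []
        for tok, lab in zip(tokens, labels):
            if lab != "O" and rng.random() < p:
                new_tokens.append(rng.choice(lexicon[lab]))
            else:
                new_tokens.append(tok)
        return new_tokens, labels

    train = [(["aspirin", "relieves", "pain"], ["B-DRUG", "O", "O"]),
             (["ibuprofen", "reduces", "fever"], ["B-DRUG", "O", "O"])]
    lex = build_lexicon(train)
    # With p=1.0 the drug mention is always replaced by a same-label token,
    # e.g. (['ibuprofen', 'relieves', 'pain'], ['B-DRUG', 'O', 'O']).
    print(augment(["aspirin", "relieves", "pain"], ["B-DRUG", "O", "O"], lex))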
An Anchor-Based Automatic Evaluation Metric for Document Summarization
Kexiang Wang, Tianyu Liu, Baobao Chang and Zhifang Sui
The widespread adoption of reference-based automatic evaluation metrics such as ROUGE has promoted the development of document summarization. In this paper we consider a new protocol for designing reference-based metrics that requires the endorsement of the source document(s). Following this protocol, we propose an anchored ROUGE metric that fixes each summary particle on the source document, which bases the computation on more solid ground. Empirical results on benchmark datasets validate that the source document helps to induce a higher correlation with human judgments for the ROUGE metric. Being self-explanatory and easy to implement, the protocol can naturally foster various effective designs of reference-based metrics besides the anchored ROUGE introduced here.
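As background for the metric being anchored, here is a minimal sketch of plain ROUGE-1 (unigram-overlap F1) between a candidate summary and a reference; the anchoring to the source document proposed in the paper is not reproduced here, and the example sentences are made up.

    from collections import Counter

    def rouge1_f1(candidate, reference):
        """ROUGE-1 F1: unigram overlap between a candidate and a reference summary."""
        cand = Counter(candidate.lower().split())
        ref = Counter(reference.lower().split())
        overlap = sum((cand & ref).values())
        if overlap == 0:
            return 0.0
        precision = overlap / sum(cand.values())
        recall = overlap / sum(ref.values())
        return 2 * precision * recall / (precision + recall)

    print(round(rouge1_f1("the cat sat on the mat", "the cat lay on the mat"), 2))   # 0.83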
An empirical analysis of existing systems and datasets toward general simple question answering
Namgi Han, Goran Topic, Hiroshi Noji, Hiroya Takamura and Yusuke Miyao
In this paper, we evaluate the progress of our field toward solving simple factoid questions over a knowledge base, a practically important problem in natural language interfaces to databases. As in other natural language understanding tasks, a common practice for this task is to train and evaluate a model on a single dataset, and recent studies suggest that SimpleQuestions, the most popular and largest dataset, is nearly solved under this setting. However, this common setting does not evaluate the robustness of the systems outside the distribution of the training data used. We rigorously evaluate such robustness of existing systems using different datasets. Our analysis, including shifting train and test datasets and training on a union of the datasets, suggests that our progress in solving the SimpleQuestions dataset does not indicate the success of more general simple question answering. We discuss a possible future direction toward this goal.
An Empirical Investigation of Low-Resource Cross-Lingual Emotion Lexicon Induction using Representation Alignment
Arun Ramachandran and Gerard de Melo
Emotion lexicons provide information about associations between words and emotions. They have proven useful in analyses of reviews, literary texts, and posts on social media, among other things. We evaluate the feasibility of deriving emotion lexicons cross-lingually, especially for low-resource languages, from existing emotion lexicons in resource-rich languages. For this, we start out from very small corpora to induce cross-lingually aligned vector spaces. Our study empirically analyses the effectiveness of the induced emotion lexicons by measuring translation precision and correlations with existing emotion lexicons, along with measurements on a downstream task of sentence emotion prediction.
An Empirical Study of Contextual Data Augmentation for Japanese Zero Anaphora Resolution
Ryuto Konno, Yuichiroh Matsubayashi, Shun Kiyono, Hiroki Ouchi, Ryo Takahashi and Kentaro Inui
One thorny issue of zero anaphora resolution (ZAR) is the scarcity of labeled data. This study explores how effectively the problem can be alleviated by data augmentation. We adopt a state-of-the-art data augmentation method, contextual data augmentation (CDA), which generates labeled training instances using a pretrained language model and has been reported to work well for several other NLP tasks including text classification and machine translation. In this paper, we address two underexplored issues on CDA, (i) how to reduce the computational cost of data augmentation and (ii) how to ensure the quality of generated data, and propose two methods to adapt CDA to ZAR, (i) [MASK]-based augmentation and (ii) linguistically-controlled masking. The results of our experiments on Japanese ZAR show that our methods contribute to both the accuracy gain and the reduction of computation cost. Our closer analysis reveals that the proposed method can also improve the quality of the augmented training data over the conventional CDA.
An Empirical Study of Rich Morphological Word Segmentation on Inuktitut in Low-resource NMT
Tan Ngoc Le and Fatiha Sadat
Indigenous languages pose significant challenges for Natural Language Processing approaches because of multiple features such as polysynthesis, morphological complexity, dialectal variation with rich morpho-phonemics, noisy spelling, and low-resource scenarios. This paper focuses on Inuktitut, one of the Indigenous polysynthetic languages spoken in Northern Canada. First, rich word segmentation for Inuktitut is studied using a set of rich features and by leveraging (bi-)character-based and word-based pretrained embeddings from large-scale raw corpora. Second, we incorporate this pre-processing step into our first Neural Machine Translation system. Our evaluations showed promising results and performance improvements in the context of low-resource Inuktitut-English neural machine translation.
An Empirical Study of the Downstream Reliability of Pre-Trained Word Embeddings
Anthony Rios and Brandon Lwowski
While pre-trained word embeddings have been shown to improve the performance of downstream tasks, many questions remain regarding their reliability: Do the same pre-trained word embeddings result in the best performance with slight changes to the training data? Do the same pre-trained embeddings perform well with multiple neural network architectures? What is the relation between the downstream fairness of different architectures and pre-trained embeddings? In this paper, we introduce two new metrics to understand the downstream reliability of word embeddings. We find that the downstream reliability of word embeddings depends on multiple factors, including the handling of out-of-vocabulary words and whether the embeddings are fine-tuned.
An Enhanced Knowledge Injection Model for Commonsense Generation
Zhihao Fan, Yeyun Gong, Zhongyu Wei, Siyuan Wang, Yameng Huang, Jian Jiao, Xuanjing Huang, Nan Duan and Ruofei Zhang
Commonsense generation aims at generating plausible everyday scenario descriptions based on a set of provided concepts. Inferring the relationships among concepts from scratch is non-trivial; therefore, we retrieve prototypes from external knowledge to assist the understanding of the scenario for better description generation. We integrate two additional modules into the pretrained encoder-decoder model for prototype modeling to enhance the knowledge injection procedure. We conduct experiments on the CommonGen benchmark, and the results show that our method significantly improves the performance on all metrics.
An Iterative Emotion Interaction Network for Emotion Recognition in Conversations
Xin Lu, Yanyan Zhao, Yang Wu, Yijian Tian, Huipeng Chen and Bing Qin
Emotion recognition in conversations (ERC) has received much attention recently in the natural language processing community. Considering that the emotions of the utterances in conversations are interactive, previous works usually implicitly model the emotion interaction between utterances by modeling dialogue context, but the misleading emotion information from context often interferes with the emotion interaction. We noticed that the gold emotion labels of the context utterances can provide explicit and accurate emotion interaction, but it is impossible to input gold labels at inference time. To address this problem, we propose an iterative emotion interaction network, which uses iteratively predicted emotion labels instead of gold emotion labels to explicitly model the emotion interaction. This approach solves the above problem, and can effectively retain the performance advantages of explicit modeling. We conduct experiments on two datasets, and our approach achieves state-of-the-art performance.
An Unsupervised Method for Learning Representations of Multi-word Expressions
Robert Vacareanu, Rebecca Sharp, Marco A. Valenzuela-Escárcega and Mihai Surdeanu
This paper explores an unsupervised approach to learning a compositional representation function for multi-word expressions (MWEs). This composition function is based on recurrent neural networks, and is trained using the Skip-Gram objective to predict the words in the context of MWEs. Thus our approach can naturally leverage large unlabeled text sources. Further, our method can make use of provided MWEs when available, but can also function as a completely unsupervised algorithm, using MWE boundaries predicted by a single, domain-agnostic part-of-speech pattern. With pre-defined MWE boundaries, our method achieves state-of-the-art performance on the coarse-grained evaluation of the Tratz dataset, with an accuracy of 50.4%. The unsupervised version of our method approaches the performance of the supervised one, and even outperforms it in some configurations.
Analogy Models for Neural Word Inflection
Ling Liu and Mans Hulden
Analogy is assumed to be the cognitive mechanism speakers resort to in order to inflect an unknown form of a lexeme based on knowledge of other words in a language. In this process, an analogy is formed between word forms within an inflectional paradigm but also across paradigms. As neural network models for inflection are typically trained only on lemma-target form pairs, we propose three new ways to provide neural models with additional source forms to strengthen analogy-formation, and compare our methods to other approaches in the literature. We show that the proposed methods of providing a Transformer sequence-to-sequence model with additional analogy sources in the input are consistently effective, and improve upon recent state-of-the-art results on 46 languages, particularly in low-resource settings. We also propose a method to combine the analogy-motivated approach with data hallucination or augmentation. We find that the two approaches are complementary to each other and combining the two approaches is especially helpful when the training data is extremely limited.
Analysing cross-lingual transfer in lemmatisation for Indian languages
Kumar Saurav, Kumar Saunack and Pushpak Bhattacharyya
Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form. However, most of the prior work on this topic has focused on high resource languages. In this paper, we evaluate cross-lingual approaches for low resource languages, especially in the context of morphologically rich Indian languages. We test our model on six languages from two different families and develop linguistic insights into each model's performance.
Answer-driven Deep Question Generation based on Reinforcement Learning
Liuyin Wang, Zihan Xu, Zibo Lin, Haitao Zheng and Ying Shen
Deep question generation (DQG) aims to generate complex questions through reasoning over multiple documents. The task is challenging and underexplored. Existing methods mainly focus on enhancing document representations, with little attention paid to the answer information, which may result in the generated question not matching the answer type and being answer-irrelevant. In this paper, we propose an Answer-driven Deep Question Generation (ADDQG) model based on the encoder-decoder framework. The model makes better use of the target answer as guidance to facilitate question generation. First, we propose an answer-aware initialization module with a gated connection layer which introduces both document and answer information to the decoder, thus helping to guide the choice of answer-focused question words. Then a semantic-rich fusion attention mechanism is designed to support the decoding process, which integrates the answer with the document representations to promote the proper handling of answer information during generation. Moreover, reinforcement learning is applied to integrate both syntactic and semantic metrics as the reward to enhance the training of ADDQG. Extensive experiments on the HotpotQA dataset show that ADDQG outperforms state-of-the-art models in both automatic and human evaluations.
Answering Legal Questions by Learning Neural Attentive Text Representation
Phi Manh Kien, Nguyen Ha Thanh, Ngo Xuan Bach, Vu Tran, Nguyen Le Minh and Tu Minh Phuong
Text representation plays a vital role in retrieval-based question answering, especially in the legal domain where documents are usually long and complicated. The better the question and the legal documents are represented, the more accurately they are matched. In this paper, we focus on the task of answering legal questions at the article level. Given a legal question, the goal is to retrieve all the correct and valid legal articles that can be used as the basis to answer the question. We present a retrieval-based model for the task by learning neural attentive text representations. Our text representation method first leverages convolutional neural networks to extract important information in a question and legal articles. Attention mechanisms are then used to represent the question and articles and select appropriate information to align them in a matching process. Experimental results on an annotated corpus consisting of 5,922 Vietnamese legal questions show that our model outperforms state-of-the-art retrieval-based methods for question answering by large margins in terms of both recall and NDCG.
Appraisal Theories for Emotion Classification in Text
Jan Hofmann, Enrica Troiano, Kai Sassenberg and Roman Klinger
Automatic emotion categorization has been predominantly formulated as text classification in which textual units are assigned to an emotion from a predefined inventory, for instance following the fundamental emotion classes proposed by Paul Ekman (fear, joy, anger, disgust, sadness, surprise) or Robert Plutchik (adding trust, anticipation). This approach ignores existing psychological theories to some degree, which provide explanations regarding the perception of events. For instance, the description that somebody discovers a snake is associated with fear, based on the appraisal as being an unpleasant and non-controllable situation. This emotion reconstruction is even possible without having access to explicit reports of a subjective feeling (for instance expressing this with the words “I am afraid.”). Automatic classification approaches therefore need to learn properties of events as latent variables (for instance that the uncertainty and the mental or physical effort associated with the encounter of a snake leads to fear). With this paper, we propose to make such interpretations of events explicit, following theories of cognitive appraisal of events, and show their potential for emotion classification when being encoded in classification models. Our results show that high quality appraisal dimension assignments in event descriptions lead to an improvement in the classification of discrete emotion categories. We make our corpus of appraisal-annotated emotion-associated event descriptions publicly available.
AprilE: Attention with Pseudo Residual Connection for Knowledge Graph Embedding
Yuzhang Liu, Peng Wang, Yingtai Li, Yizhan Shao and Zhongkai Xu
Knowledge graph embedding maps entities and relations into low-dimensional vector space. However, it is still challenging for many existing methods to model diverse relational patterns, especially symmetric and antisymmetric relations. To address this issue, we propose a novel model, AprilE, which employs triple-level self-attention and pseudo residual connection to model relational patterns. The triple-level self-attention treats head entity, relation, and tail entity as a sequence and captures the dependency within a triple. At the same time the pseudo residual connection retains primitive semantic features. Furthermore, to deal with symmetric and antisymmetric relations, two schemas of score function are designed via a position-adaptive mechanism. Experimental results on public datasets demonstrate that our model can produce expressive knowledge embedding and significantly outperforms most of the state-of-the-art works.
AraBench: Benchmarking Dialectal Arabic-English Machine Translation
Hassan Sajjad, Ahmed Abdelali, Nadir Durrani and Fahim Dalvi
Low-resource machine translation suffers from the scarcity of training data and the unavailability of standard evaluation sets. While a number of research efforts target the former, the unavailability of evaluation benchmarks remains a major hindrance to tracking progress in low-resource machine translation. In this paper, we introduce AraBench, an evaluation suite for dialectal Arabic to English machine translation. Compared to modern standard Arabic, Arabic dialects are challenging due to their spoken nature, non-standard orthography, and large variation in dialectness. To this end, we pool together already available Dialect-English resources and additionally build novel test sets. AraBench offers 4 coarse-grained, 17 fine-grained and 25 city-level dialect categories, belonging to diverse genres such as media, chat, religion and travel, with varying levels of dialectness. We report strong baselines using several training settings: fine-tuning, back-translation and data augmentation. The evaluation suite opens a wide range of research frontiers to push efforts in low-resource machine translation, particularly Arabic dialect translation.
Arabizi Language Models for Sentiment Analysis
Gaétan Baert, Souhir Gahbiche, Guillaume Gadek and Alexandre Pauchet
Arabizi is a written form of spoken Arabic that relies on Latin characters and digits. It is informal and does not follow any conventional rules, raising many NLP challenges. In particular, Arabizi has recently emerged as the Arabic language in online social networks, becoming of great interest for opinion mining and sentiment analysis. Unfortunately, only a few Arabizi resources exist, and state-of-the-art language models such as BERT do not consider Arabizi.
Are We Ready for this Disaster? Towards Location Mention Recognition from Crisis Tweets
Reem Suwaileh, Muhammad Imran, Tamer Elsayed and Hassan Sajjad
The widespread usage of Twitter during emergencies has provided a new opportunity and timely resource to crisis responders for various disaster management tasks. Geolocation information of pertinent tweets is crucial for gaining situational awareness and delivering aid. However, the majority of tweets do not come with geoinformation. In this work, we focus on the task of location mention recognition from crisis-related tweets. Specifically, we investigate the influence of different types of labeled data on the performance of a BERT-based classification model. We explore several training settings, such as combining in- and out-of-domain NER data from news articles, and general-purpose and crisis-related tweets. Furthermore, we investigate the effect of geospatial proximity while training on events near to or far from the target event. Our extensive experimentation provides answers to several critical research questions that are useful for the research community to foster research in this important direction.
Argumentation Mining on Essays at Multi Scales
Hao Wang, Zhen Huang, Yong Dou and Yu Hong
Argumentation mining on essays is a new and challenging task in natural language processing, which aims to identify the types and locations of argumentation components. Recent research mainly models the task as a sequence tagging problem and deals with all argumentation components at the word level. However, this task is not scale-independent: some types of argumentation components, which serve as core opinions of essays or paragraphs, are at the essay level or paragraph level. Sequence tagging methods reason over local context words and fail to effectively mine these components. To this end, we propose a multi-scale argumentation mining model, where we mine different types of argumentation components at their corresponding levels. Besides, an effective coarse-to-fine argumentation fusion mechanism is proposed to further improve performance. We conduct a series of experiments on the Persuasive Essay dataset (PE2.0). Experimental results indicate that our model outperforms existing models on mining all types of argumentation components.
Ask to Learn: A Study on Curiosity-driven Question Generation
Thomas Scialom and Jacopo Staiano
We propose a novel text generation task, namely Curiosity-driven Question Generation. We start from the observation that the Question Generation task has traditionally been considered as the dual problem of Question Answering, hence tackling the problem of generating a question given the text that contains its answer. Such questions can be used to evaluate machine reading comprehension. However, in real life, and especially in conversational settings, humans tend to ask questions with the goal of enriching their knowledge and/or clarifying aspects of previously gathered information.
Aspect-based Document Similarity for Research Papers
Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp and Georg Rehm
Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in what aspects two documents are similar. This limits the granularity of applications like recommender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classification task. We evaluate our aspect-based document similarity for research papers. Paper citations indicate the aspect-based similarity, i.e., the section title in which a citation occurs acts as a label for the pair of citing and cited paper. We apply a series of Transformer models such as RoBERTa, ELECTRA, XLNet, and BERT variations and compare them to an LSTM baseline. We perform our experiments on two newly constructed datasets of 172,073 research paper pairs from the ACL Anthology and CORD-19 corpus. Our results show SciBERT as the best performing system. A qualitative examination validates our quantitative results. Our findings motivate future research of aspect-based document similarity and the development of a recommender system based on the evaluated techniques. We make our datasets, code, and trained models publicly available.
Aspect-Category based Sentiment Analysis with Hierarchical Graph Convolutional Network
Hongjie Cai, Xiangsheng Zhou, Yaofeng Tu, Jianfei Yu and Rui Xia
Most aspect-based sentiment analysis research aims at identifying the sentiment polarities toward explicit aspect terms while ignoring implicit aspects in text. To capture both explicit and implicit aspects, we focus on aspect-category based sentiment analysis, which involves joint aspect category detection and category-oriented sentiment classification. However, currently only a few simple studies have focused on this problem. The shortcomings in the way they defined the task make it difficult for their approaches to effectively learn the inner-relations between categories and the inter-relations between categories and sentiments. In this work, we re-formalize the task as a category-sentiment hierarchy prediction problem, which contains a hierarchical output structure to first identify multiple aspect categories in a piece of text, and then predict the sentiment for each of the identified categories. Specifically, we propose a Hierarchical Graph Convolutional Network (Hier-GCN), where a lower-level GCN models the inner-relations among multiple categories, and a higher-level GCN captures the inter-relations between aspect categories and sentiments. Extensive evaluations demonstrate that our hierarchical output structure is superior to existing ones, and that the Hier-GCN model consistently achieves the best results on four benchmarks.
Aspectuality Across Genre: A Distributional Semantics Approach
Thomas Kober, Malihe Alikhani, Matthew Stone and Mark Steedman
The interpretation of the lexical aspect of verbs in English plays a crucial role in tasks such as recognizing textual entailment and learning discourse-level inferences. We show that two elementary dimensions of aspectual class, states vs. events, and telic vs. atelic events, can be modelled effectively with distributional semantics. We find that a verb’s local context is most indicative of its aspectual class, and we demonstrate that closed class words tend to be stronger discriminating contexts than content words. Our approach outperforms previous work on three datasets. Further, we present a new dataset of human-human conversations annotated with lexical aspects and present experiments that show the correlation of telicity with genre and discourse goals.
At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization
Qingyu Zhou, Furu Wei and Ming Zhou
Extractive methods have been proven effective in automatic document summarization. Previous works perform this task by identifying informative content at the sentence level. However, it is unclear whether performing extraction at the sentence level is the best solution. In this work, we show that extracting full sentences introduces unnecessary and redundant content, and that extracting sub-sentential units is a promising alternative. Specifically, we propose extracting sub-sentential units based on the constituency parse tree. We present a neural extractive model that leverages sub-sentential information and extracts such units. Extensive experiments and analyses show that extracting sub-sentential units performs competitively compared to full-sentence extraction under both automatic and human evaluation. We hope our work provides inspiration for future research on the basic extraction units in extractive summarization.
Attention is All You Sign: Sign Language Translation with Transformers
Kayo Yin and Jesse Read
Sign Language Translation (SLT) first uses a Sign Language Recognition (SLR) system to extract sign language glosses from videos. Then, a translation system generates spoken language translations from the sign language glosses. Though SLT has attracted interest recently, little study has been performed on the translation system. This paper improves the translation system by utilizing Transformers. We report a wide range of experimental results for various Transformer setups and introduce a novel end-to-end SLT system combining Spatial-Temporal Multi-Cue (STMC) and Transformer networks.
Attention Transfer Network for Aspect-level Sentiment Classification
Fei Zhao, Zhen Wu and Xinyu Dai
Aspect-level sentiment classification (ASC) aims to detect the sentiment polarity of a given opinion target in a sentence. In neural network-based methods for ASC, most works employ the attention mechanism to capture the corresponding sentiment words of the opinion target, then aggregate them as evidence to infer the sentiment of the target. However, aspect-level datasets are all relatively small-scale due to the complexity of annotation. Data scarcity causes the attention mechanism sometimes to fail to focus on the corresponding sentiment words of the target, which finally weakens the performance of neural models. To address the issue, we propose a novel Attention Transfer Network (ATN) in this paper, which can successfully exploit attention knowledge from resource-rich document-level sentiment classification datasets to improve the attention capability of the aspect-level sentiment classification task. In the ATN model, we design two different methods to transfer attention knowledge and conduct experiments on two ASC benchmark datasets. Extensive experimental results show that our methods consistently outperform state-of-the-art works. Further analysis also validates the effectiveness of ATN.
Attention Word Embedding
Shashank Sonkar, Andrew Waters and Richard Baraniuk
Word embedding models learn semantically rich vector representations of words and are widely used to initialize natural language processing (NLP) models. The popular continuous bag-of-words (CBOW) model of word2vec learns a vector embedding by masking a given word in a sentence and then using the other words as a context to predict it. A limitation of CBOW is that it equally weights the context words when making a prediction, which is inefficient, since some words have higher predictive value than others. We tackle this inefficiency by introducing the Attention Word Embedding (AWE) model, which integrates the attention mechanism into the CBOW model. We also propose AWE-S, which incorporates subword information. We demonstrate that AWE and AWE-S outperform the state-of-the-art word embedding models both on a variety of word similarity datasets and when used for initialization of NLP models.
Attentively Embracing Noise for Robust Latent Representation in BERT
Gwenaelle Cunha Sergio, Dennis Singh Moirangthem and Minho Lee
Modern digital personal assistants interact with users through voice. Therefore, they heavily rely on automatic speech recognition (ASR) in order to convert speech to text and perform further tasks. We introduce EBERT, which stands for EmbraceBERT, with the goal of extracting more robust latent representations for the task of noisy ASR text classification. Conventionally, BERT is fine-tuned for downstream classification tasks using only the [CLS] starter token, with the remaining tokens being discarded. We propose using all encoded transformer tokens and further encoding them with a novel attentive embracement layer and a multi-head attention layer. This approach uses the otherwise discarded tokens as a source of additional information and the multi-head attention in conjunction with the attentive embracement layer to select important features from clean data during training. This allows for the extraction of a robust latent vector, resulting in improved classification performance during testing when presented with noisy inputs. We show the impact of our model on the Chatbot corpus for intent classification with ASR errors. Results, in terms of F1-score averaged over 10 runs, show that our model significantly outperforms the baseline model.
Augmenting NLP models using Latent Feature Interpolations
Amit Jindal, Arijit Ghosh Chowdhury, Aniket Didolkar, Di Jin, Ramit Sawhney and Rajiv Ratn Shah
Models with a large number of parameters are prone to over-fitting and often fail to capture the underlying input distribution. We introduce Emix, a data augmentation method that uses interpolations of word embeddings and hidden layer representations to construct virtual examples. We show that Emix shows significant improvements over previously used interpolation-based regularizers and data augmentation techniques. We also demonstrate how our proposed method is more robust to sparsification. We highlight the merits of our proposed methodology by performing thorough quantitative and qualitative assessments.
Author’s Sentiment Prediction
Mohaddeseh Bastan, Mahnaz Koupaee, Youngseo Son, Richard Sicoli and Niranjan Balasubramanian
Even though sentiment analysis has been well-studied on a wide range of domains, there hasn’t been much work on inferring author sentiment in news articles. To address this gap, we introduce PerSenT, a crowd-sourced dataset that captures the sentiment of an author towards the main entity in a news article. Our benchmarks of multiple strong baselines show that this is a difficult classification task. BERT performs the best amongst the baselines. However, it only achieves a modest performance overall, suggesting that fine-tuning document-level representations alone isn’t adequate for this task. Making paragraph-level decisions and aggregating over the entire document is also ineffective. We present empirical and qualitative analyses that illustrate the specific challenges posed by this dataset. We release this dataset with 5.3k documents and 38k paragraphs with 3.2k unique entities as a challenge in entity sentiment analysis.
Auto-Encoding Variational Bayes for Inferring Topics and Visualization
Dang Pham and Tuan Le
Visualization and topic modeling are widely used approaches for text analysis. Traditional visualization methods find low-dimensional representations of documents in the visualization space (typically 2D or 3D) that can be displayed using a scatterplot. In contrast, topic modeling aims to discover topics from text, but for visualization, one needs to perform a post-hoc embedding using dimensionality reduction methods. Recent approaches propose using a generative model to jointly find topics and visualization, allowing the semantics to be infused in the visualization space for a meaningful interpretation. A major challenge that prevents these methods from being used practically is the scalability of their inference algorithms. We present, to the best of our knowledge, the first fast Auto-Encoding Variational Bayes based inference method for jointly inferring topics and visualization. Since our method is black box, it can handle model changes efficiently with little mathematical rederivation effort. We demonstrate the efficiency and effectiveness of our method on real-world large datasets and compare it with existing baselines.
Autoencoding Improves Pre-trained Word Embeddings
Masahiro Kaneko and Danushka Bollegala
Prior works investigating the geometry of pre-trained word embeddings have shown that word embeddings are distributed in a narrow cone and that, by centering and projecting using principal component vectors, one can increase the accuracy of a given set of pre-trained word embeddings. We show that, theoretically, this post-processing step is equivalent to applying a linear autoencoder to minimize the squared L2 reconstruction error. This result contradicts prior work (Mu and Viswanath, 2018) that proposed to remove the top principal components from pre-trained embeddings. We experimentally verify our theoretical claims and show that retaining the top principal components is indeed useful for improving pre-trained word embeddings, without requiring access to additional linguistic resources or labeled data.
Automated Graph Generation at Sentence Level for Reading Comprehension Based on Conceptual Graphs
Wan-Hsuan Lin and Chun-Shien Lu
This paper proposes a novel miscellaneous-context-based method to convert a sentence into a knowledge embedding in the form of a directed graph. We adopt the idea of conceptual graphs to frame miscellaneous textual information into a conceptually compact form. We first empirically observe that this graph representation method can (1) accommodate the slot-filling challenges in typical question answering and (2) access the sentence-level graph structure to explicitly capture the neighbouring connections of reference concept nodes. Secondly, we propose a task-agnostic semantics-measured module, which cooperates with the graph representation method, in order to (3) project an edge of a sentence-level graph to the space of semantic relevance with respect to the corresponding concept nodes. In question-answering experiments, the combination of the graph representation and the semantics-measured module achieves high accuracy in answer prediction and offers human-comprehensible graphical interpretation for every well-formed sample. To our knowledge, our approach is the first step towards an interpretable process of learning vocabulary representations with experimental evidence.
Automated Prediction of Examinee Proficiency from Short-Answer Questions
Le An Ha, Victoria Yaneva, Polina Harik, Ravi Pandian, Amy Morales and Brian Clauser
This paper brings together approaches from the fields of NLP and psychometric measurement to address the problem of predicting examinee proficiency from responses to short-answer questions (SAQs). While previous approaches train on manually labeled data to predict the human-ratings assigned to SAQ responses, the approach presented here models examinee proficiency directly and does not require manually labeled data to train on. We use data from a large medical exam where experimental SAQ items are embedded alongside 106 scored multiple-choice questions (MCQs). First, the latent trait of examinee proficiency is measured using the scored MCQs and then a model is trained on the experimental SAQ responses as input, aiming to predict proficiency as its target variable. The predicted value is then used as a “score” for the SAQ response and evaluated in terms of its contribution to the precision of proficiency estimation.
Automatic Assistance for Academic Word Usage
Dariush Saberi, John Lee and Jonathan James Webster
This paper describes a writing assistance system that helps students improve their academic writing. Given an input text, the system suggests lexical substitutions that aim to incorporate more academic vocabulary. The substitution candidates are drawn from an academic word list and ranked by a masked language model. Experimental results show that lexical formality analysis can improve the quality of the suggestions, in comparison to a baseline that relies on the masked language model only.
Automatic Crime Identification from Facts: A Few Sentence-Level Crime Annotations is All You Need
Shounak Paul, Pawan Goyal and Saptarshi Ghosh
Automatic Crime Identification (ACI) is the task of identifying the relevant crimes given the facts of a situation and the statutory laws that define these crimes, and is a crucial aspect of the judicial process. Existing works focus on learning crime-side representations by modeling relationships between the crimes, but not much effort has been made in improving fact-side representations. We observe that only a small fraction of sentences in the facts actually indicates the crimes. We show that by using a very small subset (< 3%) of fact descriptions annotated with sentence-level crimes, we can achieve an improvement across a range of different ACI models, as compared to modeling just the main document-level task on a much larger dataset. Additionally, we propose a novel model that utilizes sentence-level crime labels as an auxiliary task, coupled with the main task of document-level crime identification in a multi-task learning framework. The proposed model comprehensively outperforms a large number of recent baselines for ACI. The improvement in performance is particularly noticeable for the rare crimes which are known to be especially challenging to identify.
Automatic Detection of Machine Generated Text: A Critical Survey
Ganesh Jawahar, Muhammad Abdul-Mageed and Laks Lakshmanan, V.S.
Text generative models (TGMs) excel in producing text that matches the style of human language reasonably well. Such TGMs can be misused by adversaries, e.g., by automatically generating fake product reviews and fake news that can look authentic and fool humans. Detectors that can distinguish text generated by TGM from human written text play a vital role in mitigating such misuse of TGMs. Recently, there has been a flurry of works from both natural language processing (NLP) and machine learning (ML) communities to build accurate detectors. Despite the importance of this problem, there is currently no work that surveys this fast-growing literature and introduces newcomers to important research challenges. In this work, we fill this void by providing a critical survey and review of this literature to facilitate a comprehensive understanding of this problem. We conduct an in-depth error analysis of the state-of-the-art detector, and discuss research directions to guide future work in this exciting area.
Automatic Discovery of Heterogeneous Machine Learning Pipelines: An Application to Natural Language Processing
Suilan Estevez-Velarde, Yoan Gutiérrez, Andres Montoyo and Yudivián Almeida Cruz
This paper presents AutoGOAL, a system for automatic machine learning (AutoML) that uses heterogeneous techniques. In contrast with existing AutoML approaches, our contribution can automatically build machine learning pipelines that combine techniques and algorithms from different frameworks, including shallow classifiers, natural language processing tools, and neural networks. We define the heterogeneous AutoML optimization problem as the search for the best sequence of algorithms that transforms specific input data into the desired output. This provides a novel theoretical and practical approach to AutoML. Our proposal is experimentally evaluated in diverse machine learning problems and compared with alternative approaches, showing that it is competitive with other AutoML alternatives in standard benchmarks. Furthermore, it can be applied to novel scenarios, such as several NLP tasks, where existing alternatives cannot be directly deployed. The system is freely available and includes in-built compatibility with a large number of popular machine learning frameworks, which makes our approach useful for solving practical problems with relative ease.
Automatic Distractor Generation for Multiple Choice Questions in Standard Tests
Zhaopeng Qiu, Xian Wu and Wei Fan
To assess the knowledge proficiency of a learner, multiple-choice questions are an efficient and widespread format in standard tests. However, the composition of a multiple-choice question, especially the construction of distractors, is quite challenging. The distractors are required to be both incorrect and plausible enough to confuse learners who did not master the knowledge. Currently, the distractors are generated by domain experts, which is both expensive and time-consuming. This urges the emergence of automatic distractor generation, which can benefit various standard tests in a wide range of domains. In this paper, we propose a question and answer guided distractor generation (EDGE) framework to automate distractor generation. EDGE consists of three major modules: (1) the Reforming Question Module and the Reforming Passage Module apply gate layers to guarantee the inherent incorrectness of the generated distractors; (2) the Distractor Generator Module applies an attention mechanism to control the level of plausibility. Experimental results on a large-scale public dataset demonstrate that our model significantly outperforms existing models and achieves a new state-of-the-art.
Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations
Xingyuan Zhao, Satoru Ozaki, Antonios Anastasopoulos, Graham Neubig and Lori Levin
Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers. Manual production of IGT takes time and requires linguistic expertise. We tackle the issue by creating automatic glossing models, using modern multi-source neural models that additionally leverage easy-to-collect translations. We further explore cross-lingual transfer and a simple output length control mechanism, further refining our models. Evaluated against three challenging low-resource scenarios, our approach significantly outperforms a recent, state-of-the-art baseline, particularly improving on overall accuracy as well as lemma and tag recall.
Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification
Timo Schick, Helmut Schmid and Hinrich Schütze
A recent approach for few-shot text classification is to convert textual inputs to cloze questions that contain some form of task description, process them with a pretrained language model and map the predicted words to labels. Manually defining this mapping between words and labels requires both domain expertise and an understanding of the language model's abilities. To mitigate this issue, we devise an approach that automatically finds such a mapping given small amounts of training data. For a number of tasks, the mapping found by our approach performs almost as well as hand-crafted label-to-word mappings.
AutoMeTS: The Autocomplete for Medical Text Simplification
Hoang Van, David Kauchak and Gondy Leroy
The goal of text simplification (TS) is to transform difficult text into a version that is easier to understand and more broadly accessible to a wide variety of readers. In some domains, such as healthcare, fully automated approaches cannot be used since information must be accurately preserved. Instead, semi-automated approaches can be used that assist a human writer in simplifying text faster and at a higher quality. In this paper, we examine the application of autocomplete to text simplification in the medical domain. We introduce a new parallel medical data set consisting of aligned English Wikipedia with Simple English Wikipedia sentences and examine the application of pretrained neural language models (PNLMs) on this dataset. We compare four PNLMs (BERT, RoBERTa, XLNet, and GPT-2), and show how the additional context of the sentence to be simplified can be incorporated to achieve better results (6.17% absolute improvement over the best individual model). We also introduce an ensemble model that combines the four PNLMs and outperforms the best individual model by 2.1%, resulting in an overall word prediction accuracy of 64.52%.
Autoregressive Affective Language Forecasting: A Self-Supervised Task
Matthew Matero and H. Andrew Schwartz
Human natural language is mentioned at a specific point in time while human emotions change over time. While much work has established a strong link between language use and emotional states, few have attempted to model emotional language in time. Here, we introduce the task of affective language forecasting -- predicting future change in language based on past changes of language, a task with real-world applications such as treating mental health or forecasting trends in consumer confidence. We establish some of the fundamental autoregressive characteristics of the task (necessary history size, static versus dynamic length, varying time-step resolutions) and then build on popular sequence models for words to instead model sequences of language-based emotion in time. Over a novel Twitter dataset of 1,900 users and weekly and daily scores for 6 emotions and 2 additional linguistic attributes, we find that a novel dual-sequence GRU model with decayed hidden states achieves the best results (r = .66), significantly out-predicting, e.g., a moving average based on the past time-steps (r = .49). We make our anonymized dataset as well as task setup and evaluation code available for others to build on.
Autoregressive Reasoning over Chains of Facts with Transformers
Ruben Cartuyvels, Graham Spinks and Marie-Francine Moens
This paper proposes an iterative inference algorithm for multi-hop explanation regeneration, which retrieves relevant factual evidence in the form of text snippets, given a natural language question. Combining multiple sources of evidence or facts for multi-hop reasoning becomes increasingly hard when the number of sources needed to make an inference grows. Our algorithm copes with this by decomposing the selection of facts from a corpus autoregressively, conditioning the next iteration on previously selected facts. This allows us to use a pairwise learning-to-rank loss from the information retrieval literature. We validate our method on datasets of the TextGraphs 2019 and 2020 Shared Tasks for explanation regeneration. Existing work on this task either evaluates facts in isolation or artificially limits the possible chains of facts, thus limiting multi-hop inference. We demonstrate that our algorithm, when used with a pretrained transformer model, outperforms the previous state-of-the-art in terms of precision, training time and inference efficiency.
Balanced Joint Adversarial Training for Robust Intent Detection and Slot Filling
Xu Cao, Deyi Xiong, Chongyang Shi, Chao Wang, Yao Meng and Changjian Hu
Joint intent detection and slot filling has recently achieved tremendous success in advancing the performance of utterance understanding. However, many joint models still suffer from the robustness problem, especially on noisy inputs or rare/unseen events. To address this issue, we propose a Joint Adversarial Training (JAT) model to improve the robustness of joint intent detection and slot filling, which consists of two parts: (1) automatically generating joint adversarial examples to attack the joint model, and (2) training the model to defend against the joint adversarial examples so as to robustify the model on small perturbations. As the generated joint adversarial examples have different impacts on the intent detection and slot filling loss, we further propose a Balanced Joint Adversarial Training (BJAT) model that applies a balance factor as a regularization term to the final loss function, which yields a stable training procedure. Extensive experiments and analyses on the lightweight models show that our proposed methods achieve significantly higher scores and substantially improve the robustness of both intent detection and slot filling. In addition, the combination of our BJAT with BERT-large achieves state-of-the-art results on two datasets.
Bayes-enhanced Lifelong Attention Networks for Sentiment Classification
Hao Wang, Shuai Wang, Sahisnu Mazumder, Bing Liu, Yan Yang and Tianrui Li
The classic deep learning paradigm learns a model from the training data of a single task and the learned model is also tested on the same task. This paper studies the problem of learning a sequence of tasks (sentiment classification tasks in our case). After each sentiment classification task is learned, its knowledge is retained to help future task learning. Following this setting, we explore attention neural networks and propose a Bayes-enhanced Lifelong Attention Network (BLAN). The key idea is to exploit the generative parameters of naive Bayes to learn attention knowledge. The learned knowledge from each task is stored in a knowledge base and later used to build lifelong attentions. The built lifelong attentions are used to enhance the attention of the networks to help new task learning. Experimental results on product reviews from Amazon.com show the effectiveness of the proposed model.
BERT-based Cohesion Analysis of Japanese Texts
Nobuhiro Ueda, Daisuke Kawahara and Sadao Kurohashi
The meaning of natural language text is supported by cohesion among various kinds of entities, including coreference relations, predicate-argument structures, and bridging anaphora relations. However, predicate-argument structures for nominal predicates and bridging anaphora relations have not been studied well, and their analyses remain very difficult. Recent advances in neural networks, in particular self-training-based language models including BERT (Devlin et al., 2019), have significantly improved many natural language processing (NLP) tasks, making it possible to dive into the study of cohesion analysis across whole texts. In this study, we tackle an integrated analysis of cohesion in Japanese texts. Our results significantly outperformed existing studies in each task, with especially large improvements of about 10 to 20 points for both zero anaphora resolution and coreference resolution. Furthermore, we also show that coreference resolution is different in nature from the other tasks and should be treated specially.
Bi-directional Cognitive Thinking Network for Machine Reading Comprehension
Wei Peng, Yue Hu, Luxi Xing, Yuqiang Xie, Jing Yu, Yajing Sun and Xiangpeng Wei
We propose a novel Bi-directional Cognitive Knowledge Framework (BCKF) for reading comprehension from the perspective of complementary learning systems theory. It aims to simulate two ways of thinking in the brain to answer questions, including reverse thinking and inertial thinking. To validate the effectiveness of our framework, we design a corresponding Bi-directional Cognitive Thinking Network (BCTN) to encode the passage and generate a question (answer) given an answer (question) and decouple the bi-directional knowledge. The model has the ability to reverse reasoning questions which can assist inertial thinking to generate more accurate answers. Competitive improvement is observed in DuReader dataset, confirming our hypothesis that bi-directional knowledge helps the QA task. The novel framework shows an interesting perspective on machine reading comprehension and cognitive science.
Biased TextRank: Unsupervised Graph-Based Content Extraction
Ashkan Kazemi, Verónica Pérez-Rosas and Rada Mihalcea
We introduce Biased TextRank, a content extraction method inspired by the popular TextRank algorithm that ranks text spans according to their importance for language processing tasks and according to their relevance to an input "focus." Biased TextRank enables focused content extraction for text by modifying the random restarts in the execution of TextRank. The random restart probabilities are assigned based on the relevance of the graph nodes to the focus of the task. We present two applications of Biased TextRank: focused summarization and explanation extraction, and show that our algorithm leads to significantly improved performance on two different datasets by margins as large as 11.9 ROUGE-2 F1 scores. Much like its predecessor, Biased TextRank is unsupervised, easy to implement and orders of magnitude faster and lighter than current state-of-the-art Natural Language Processing methods for similar tasks.
Bilingual Subword Segmentation for Neural Machine Translation
Hiroyuki Deguchi, Masao Utiyama, Akihiro Tamura, Takashi Ninomiya and Eiichiro Sumita
This paper proposes a new subword segmentation method for neural machine translation, "Bilingual Subword Segmentation", which tokenizes sentences so as to minimize the difference between the number of subword units of a sentence and that of its translation. While existing subword segmentation methods tokenize a sentence without considering its translation, the proposed method tokenizes a sentence by using subword units induced from bilingual sentences, which could be more favorable to machine translation. Evaluations on the WAT ASPEC English-to-Japanese and Japanese-to-English translation tasks and the WMT14 English-to-German and German-to-English translation tasks show that our bilingual subword segmentation improves the performance of Transformer NMT (up to +0.81 BLEU).
BioMedBERT: A Pre-trained Biomedical Language Model for QA and IR
Souradip Chakraborty, Ekaba Bisong, Shweta Bhatt, Thomas Wagner, Riley Elliott and Francesco Mosconi
The SARS-CoV-2 (COVID-19) pandemic spotlighted the importance of moving quickly with biomedical research. However, as the number of biomedical research papers continues to increase, the task of finding relevant articles to answer pressing questions has become significant. In this work, we propose a textual data mining tool that supports literature search to accelerate the work of researchers in the biomedical domain. We achieve this by building a neural-based deep contextual understanding model for Question-Answering (QA) and Information Retrieval (IR) tasks. We also leverage the new BREATHE dataset, which is one of the largest available datasets of biomedical research literature, containing abstracts and full-text articles from ten different biomedical literature sources, on which we pre-train our BioMedBERT model. Our work achieves state-of-the-art results on the QA fine-tuning task on the BioASQ 5b, 6b and 7b datasets. In addition, we observe superior relevant results when BioMedBERT embeddings are used with Elasticsearch for the Information Retrieval task on the intelligently formulated BioASQ dataset. We believe our diverse dataset and our unique model architecture are what led us to achieve the state-of-the-art results for QA and IR tasks.
Biomedical Concept Relatedness – A large EHR-based benchmark
Claudia Schulz, Josh Levy-Kramer, Camille Van Assel, Miklos Kepes and Nils Hammerla
A promising application of AI to healthcare is the retrieval of information from electronic health records (EHRs), e.g. to aid clinicians in finding relevant information for a consultation or to recruit suitable patients for a study. This requires search capabilities far beyond simple string matching, including the retrieval of concepts (diagnoses, symptoms, medications, etc.) related to the one in question. The suitability of AI methods for such applications is tested by predicting the relatedness of concepts with known relatedness scores. However, all existing biomedical concept relatedness datasets are notoriously small and consist of hand-picked concept pairs. We open-source a novel concept relatedness benchmark overcoming these issues: it is six times larger than existing datasets and concept pairs are chosen based on co-occurrence in EHRs, ensuring their relevance for the application of interest. We present an in-depth analysis of our new dataset and compare it to existing ones, highlighting that it is not only larger but also complements existing datasets in terms of the types of concepts included. Initial experiments with state-of-the-art embedding methods show that our dataset is a challenging new benchmark for testing concept relatedness models.
Bracketing Encodings for 2-Planar Dependency Parsing
Michalina Strzyz, David Vilares and Carlos Gómez-Rodríguez
We present a bracketing-based encoding that can be used to represent any 2-planar dependency tree over a sentence of length n as a sequence of n labels, hence providing almost total coverage of crossing arcs in sequence labeling parsing. First, we show that existing bracketing encodings for parsing as labeling can only handle a very mild extension of projective trees. Second, we overcome this limitation by taking into account the well-known property of 2-planarity, which is present in the vast majority of dependency syntactic structures in treebanks, i.e., the arcs of a dependency tree can be split into two planes such that arcs in a given plane do not cross. We take advantage of this property to design a method that balances the brackets and that encodes the arcs belonging to each of those planes, allowing for almost unrestricted non-projectivity (∼99.9% coverage) in sequence labeling parsing. The experiments show that our linearizations improve over the accuracy of the original bracketing encoding in highly non-projective treebanks (on average by 0.59 LAS), while achieving a similar speed. Also, it is especially suitable when PoS tags are not used as input parameters to the models.
Break the Gap: High-level Semantic Planning for Image Captioning
Chenxi Yuan, Yang Bai and Chun Yuan
Recent image captioning models have made much progress for exploring the multi-modal interaction, such as attention mechanisms. Though these mechanisms can boost the interaction, there are still two gaps between the visual and language domains: (1) the gap between the visual features and textual semantics, (2) the gap between the disordering of visual features and the ordering of texts. To break the gaps we propose a high-level semantic planning (HSP) mechanism that incorporates both a semantic reconstruction and an explicit order planning. We integrate the planning mechanism to the attention based caption model and propose the High-level Semantic PLanning based Attention Network (HS-PLAN). First an attention based reconstruction module is designed to reconstruct the visual features with high-level semantic information. Then we apply a pointer network to serialize the features and obtain the explicit order plan to guide the generation. Experiments conducted on MS COCO show that our model outperforms previous methods and achieves the state-of-the-art performance of 133.4% CIDEr-D score.
Breeding Gender-aware Direct Speech Translation Systems
Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri and Marco Turchi
In automatic speech translation (ST), traditional cascade approaches involving separate transcription and translation steps are giving ground to increasingly competitive and potentially more robust direct solutions. In particular, by translating speech audio data without intermediate transcription, direct ST models are able to leverage and preserve essential information present in the input (e.g. speaker's vocal traits) that is otherwise lost in the cascade framework. Although such ability proved to be useful for gender translation, direct ST is nonetheless affected by gender bias just like its cascade counterpart, as well as machine translation and numerous other natural language processing applications. In this paper, we compare different approaches to inform direct ST models about the speaker's gender and test their ability to handle gender translation from English into Italian and French. To this aim, we annotate large datasets with speakers' gender information and use them for experiments reflecting different possible real-world scenarios. Our results show that gender-aware direct ST solutions can significantly outperform strong - but gender-unaware - direct ST models. In particular, the translation of gender-marked words can increase up to 30 points in accuracy while preserving overall translation quality.
Bridging Anaphora Resolution: A Survey of the State of the Art
Hideo Kobayashi and Vincent Ng
Bridging relation identification is a task that is arguably more challenging and less studied than other relation extraction tasks. Given that significant progress has been made on relation extraction in recent years, we believe that bridging relation identification will receive increasing attention in the NLP community. Nevertheless, progress on bridging relation identification is currently hampered in part by the lack of large corpora for model training as well as the lack of standardized evaluation protocols. This paper presents a survey of the current state of research on bridging relation identification and discusses future research directions.
Bridging Text and Knowledge with Multi-Prototype Embedding for Few-Shot Relational Triple Extraction
Haiyang Yu, Ningyu Zhang, Shumin Deng, Hongbin Ye, Wei Zhang and Huajun Chen
Current supervised relational triple extraction approaches require huge amounts of labeled data and thus suffer from poor performance in few-shot settings. However, people can grasp new knowledge by learning only a few instances. To this end, we take the first step towards studying few-shot relational triple extraction, which has not been well understood. Unlike previous single-task few-shot problems, relational triple extraction is more challenging as the entities and relations have implicit correlations. In this paper, we propose a novel multi-prototype embedding network model to jointly extract the composition of relational triples, namely, entity pairs and corresponding relations. To be specific, we design a hybrid prototypical learning mechanism that bridges text and knowledge concerning both entities and relations. Thus, implicit correlations between entities and relations are injected. Additionally, we propose a prototype-aware regularization to learn more representative prototypes. Experimental results demonstrate that the proposed method can improve the performance of few-shot triple extraction. The code and dataset are available anonymously for reproducibility.
Bridging the Gap in Multilingual Semantic Role Labeling: a Language-Agnostic Approach
Simone Conia and Roberto Navigli
Recent research indicates that taking advantage of complex syntactic features leads to favorable results in Semantic Role Labeling. Nonetheless, an analysis of the latest state-of-the-art multilingual systems reveals the difficulty of bridging the wide gap in performance between high-resource (e.g., English) and low-resource (e.g., German) settings. To overcome this issue, we propose a fully language-agnostic model that does away with morphological and syntactic features to achieve robustness across languages. Our approach outperforms the state of the art in all the languages of the CoNLL-2009 benchmark dataset, especially whenever a scarce amount of training data is available. Our purpose is not to dismiss approaches that rely on syntax, rather to set a strong and consistent baseline for future syntactic novelties in Semantic Role Labeling. We release our model code and checkpoints at http://anonymized.
Building Hierarchically Disentangled Language Models for Text Generation with Named Entities
Yash Agarwal, Devansh Batra and Ganesh Bagler
Named entities pose a unique challenge to traditional methods of language modeling. While several domains are characterised by a high proportion of named entities, the occurrence of specific entities varies widely. Cooking recipes, for example, contain many named entities — viz. ingredients, cooking techniques (also called processes), and utensils. However, some ingredients occur frequently within the instructions while most occur rarely. In this paper, we build upon previous work on language models developed for text with named entities by introducing a Hierarchically Disentangled Model. Training is divided into multiple branches, with each branch producing a model with overlapping subsets of vocabulary. We found the existing datasets insufficient to accurately judge the performance of the model. Hence, we have curated 158,473 cooking recipes from several publicly available online sources. To reliably derive the entities within this corpus, we employ a combination of Named Entity Recognition (NER) and an unsupervised method of interpretation using dependency parsing and POS tagging, followed by further cleaning of the dataset. This unsupervised interpretation models instructions as action graphs and is specific to the corpus of cooking recipes, unlike NER, which is a general method applicable to all corpora. To demonstrate the utility of our language model, we apply it to tasks such as graph-to-text generation and ingredients-to-recipe generation, comparing it to previous state-of-the-art baselines. We make our dataset (including annotations and processed action graphs) available for use, considering its potential use cases for language modeling and text generation research.
Building Large-Scale English and Korean Datasets for Aspect-Level Sentiment Analysis in Automotive Domain
Dongmin Hyun, Junsu Cho and Hwanjo Yu
We release large-scale datasets of users’ comments in two languages, English and Korean, for aspect-level sentiment analysis in the automotive domain. The datasets consist of 58,000+ comment-aspect pairs, the largest compared to existing datasets. In addition, this work covers a new language (i.e., Korean) along with English for aspect-level sentiment analysis. We build the datasets from the automotive domain to enable users (e.g., marketers in automotive companies) to analyze the voice of customers on automobiles. We also provide baseline performances for future work by evaluating recent models on the released datasets.
Building The First English-Brazilian Portuguese Corpus for Automatic Post-Editing
Felipe de Almeida Costa, Thiago Castro Ferreira, Adriana Pagano and Wagner Meira
This paper introduces the first corpus of English and the low-resource Brazilian Portuguese language for Automatic Post-Editing. The source English texts were extracted from the WebNLG corpus and automatically translated into Portuguese using a state-of-the-art industrial neural machine translator. Post-edits were then obtained in an experiment with native speakers of Brazilian Portuguese. To assess the quality of the corpus, we performed an error analysis and computed complexity indicators to measure how difficult the APE task would be. Finally, we introduce preliminary results by evaluating a Transformer encoder-decoder to automatically post-edit the machine translations of the new corpus. Data and code are available in the submission.
Catching Attention with Automatic Pull Quote Selection
Tanner Bohn and Charles Ling
To advance understanding on how to engage readers, we advocate the novel task of automatic pull quote selection. Pull quotes are a component of articles specifically designed to catch the attention of readers with spans of text selected from the article and given more salient presentation. This task differs from related tasks such as summarization and clickbait identification by several aspects. We establish a spectrum of baseline approaches to the task, ranging from handcrafted features to a neural mixture-of-experts to cross-task models. By examining the contributions of individual features and embedding dimensions from these models, we uncover unexpected properties of pull quotes to help answer the important question of what engages readers. Human evaluation also supports the uniqueness of this task and the suitability of our selection models. The benefits of exploring this problem further are clear: pull quotes increase enjoyment and readability, shape reader perceptions, and facilitate learning. Code to reproduce this work is available at https://github.com/tannerbohn/AutomaticPullQuoteSelection.
CEREC: A Corpus for Entity Resolution in Email Conversations
Parag Pravin Dakle and Dan Moldovan
We present the first large scale corpus for entity resolution in email conversations (CEREC). The corpus consists of 6001 email threads from the Enron Email Corpus containing 36,448 email messages and 38,996 entity coreference chains. The annotation is carried out as a two-step process with minimal manual effort. Experiments are carried out for evaluating different features and performance of four baselines on the created corpus. For the task of mention identification and coreference resolution, a best performance of 54.1 F1 is reported, highlighting the room for improvement. An in-depth qualitative and quantitative error analysis is presented to understand the limitations of the baselines considered.
CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters
Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Hiroshi Noji, Pierre Zweigenbaum and Jun'ichi Tsujii
Due to the compelling improvements brought by BERT, many recent representation models adopted the Transformer architecture as their main building block, consequently inheriting the wordpiece tokenization system. While this system is thought to achieve a good balance between the flexibility of characters and the efficiency of full words, using predefined wordpiece vocabularies from the general domain is not always suitable, especially when building models for specialized domains (e.g., the medical domain). Moreover, adopting a wordpiece tokenization shifts the focus from the word level to the subword level, making the models conceptually more complex and arguably less convenient in practice. For these reasons, we propose CharacterBERT, a new variant of BERT that drops the wordpiece system altogether and uses a Character-CNN module instead to represent entire words by consulting their characters. We show that this new model improves the performance of BERT on a variety of medical domain tasks while at the same time producing robust, word-level and open-vocabulary representations.
CharBERT: Character-aware Pre-trained Language Model
Wentao Ma, Yiming Cui, Chenglei Si, Ting Liu, Shijin Wang and Guoping Hu
Pre-trained language models (PLMs) have achieved great progress in various language understanding benchmarks. Most of the previous works construct the representations in the subword level by Byte-Pair Encoding (BPE) or its variations, which make the word representation incomplete and fragile. In this paper, we propose a character-aware pre-trained language model named CharBERT, improving on the previous methods (such as BERT, RoBERTa) to tackle the problem. We first construct the contextual word embedding for each token from the sequential character representations, and fuse the representations from character and subword iteratively by a heterogeneous interaction module. Then we propose a new pre-training task for unsupervised character learning. We evaluate the method on question answering, sequence labeling, and text classification tasks, both on the original dataset and adversarial misspelling test set. The experimental results show that our method can significantly improve performance and robustness.
CHIME: Cross-passage Hierarchical Memory Network for Generative Review Question Answering
Junru Lu, Gabriele Pergola, Lin Gui, Binyang Li and Yulan He
We introduce CHIME, a cross-passage hierarchical memory network for question answering (QA) via text generation. It extends XLNet by introducing an auxiliary memory module consisting of two components: a context memory that collects cross-passage evidence, and an answer memory that works as a buffer continually refining the generated answers. Empirically, we show the efficacy of the proposed architecture on multi-passage generative QA, outperforming the state-of-the-art baselines with better syntactically well-formed answers and increased precision in addressing the questions of the AmazonQA review dataset. An additional qualitative analysis reveals the rationale of the underlying generative process.
Chinese Paragraph-level Discourse Parsing with Global Backward and Local Reverse Reading
Feng Jiang, Xiaomin Chu, Peifeng Li, Fang Kong and Qiaoming Zhu
Discourse structure tree construction is the fundamental task of discourse parsing, and most previous work has focused on English. Due to cultural and linguistic differences, existing successful methods for English discourse parsing cannot be transferred to Chinese directly, especially at the paragraph level, which suffers from longer discourse units and fewer explicit connectives. To alleviate these issues, we propose two reading modes, i.e., global backward reading and local reverse reading, to construct Chinese paragraph-level discourse trees. The former processes discourse units from the end to the beginning of a document to utilize the left-branching bias of discourse structure in Chinese, while the latter reverses the position of paragraphs in a discourse unit to enhance the differentiation of coherence between adjacent discourse units. The experimental results on Chinese MCDTB demonstrate that our model outperforms all strong baselines.
CLUE: A Chinese Language Understanding Evaluation Benchmark
Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson and Zhenzhong Lan
With the advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE, where new single models can be evaluated across a diverse set of NLU tasks, research in natural language processing has prospered and become more widely accessible to researchers in neighboring areas of machine learning and industry. The problem, however, is that most such benchmarks are limited to English, which has made it difficult to replicate many of the successes in English NLU for other languages. To help remedy this issue, we introduce the first large-scale Chinese Language Understanding Evaluation (CLUE) benchmark. CLUE, an open-ended, community-driven project, brings together 9 tasks spanning several well-established single-sentence/sentence-pair classification tasks, as well as machine reading comprehension, all on original Chinese text. To establish results on these tasks, we report scores using an exhaustive set of current state-of-the-art pre-trained Chinese models (9 in total). We also introduce a number of supplementary datasets and additional tools to help facilitate further progress on Chinese NLU.
CoLAKE: Contextualized Language and Knowledge Embedding
Tianxiang Sun, Yunfan Shao, Xipeng Qiu, Qipeng Guo, Yaru Hu, Xuanjing Huang and Zheng Zhang
With the emerging branch of incorporating factual knowledge into pre-trained language models such as BERT, most existing models consider shallow, static, and separately pre-trained entity embeddings, which limits the performance gains of these models. Few works explore the potential of deep contextualized knowledge representation when injecting knowledge. In this paper, we propose the Contextualized Language and Knowledge Embedding (CoLAKE), which jointly learns contextualized representation for both language and knowledge with the extended MLM objective. Instead of injecting only entity embeddings, CoLAKE extracts the knowledge context of an entity from large-scale knowledge bases. To handle the heterogeneity of knowledge context and language context, we integrate them in a unified data structure, word-knowledge graph (WK graph). CoLAKE is pre-trained on large-scale WK graphs with the modified Transformer encoder. We conduct experiments on knowledge-driven tasks, knowledge probing tasks, and language understanding tasks. Experimental results show that CoLAKE outperforms previous counterparts on most of the tasks. Besides, CoLAKE achieves surprisingly high performance on our synthetic task called word-knowledge graph completion, which shows the superiority of simultaneously contextualizing language and knowledge representation.
Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation
Fahimeh Saleh, Wray Buntine and Gholamreza Haffari
Scarcity of parallel sentence-pairs poses a significant hurdle for training high-quality Neural Machine Translation (NMT) models in bilingually low-resource scenarios. A standard approach is transfer learning, which involves taking a model trained on a high-resource language-pair and fine-tuning it on the data of the low-resource MT condition of interest. However, it is not clear generally which high-resource language-pair offers the best transfer learning for the target MT setting. Furthermore, different transferred models may have complementary semantic and/or syntactic strengths, hence using only one model may be sub-optimal. In this paper, we tackle this problem using knowledge distillation, where we propose to distill the knowledge of ensemble of teacher models to a single student model. As the quality of these teacher models varies, we propose an effective adaptive knowledge distillation approach to dynamically adjust the contribution of the teacher models during the distillation process. Experiments on transferring from a collection of six language pairs from IWSLT to five low-resource language-pairs from TED Talks demonstrate the effectiveness of our approach, achieving up to +0.9 BLEU score improvements compared to strong baselines.
Combining Cognitive Modeling and Reinforcement Learning for Clarification in Dialogue
Baber Khalid, Malihe Alikhani and Matthew Stone
In many domains, dialogue systems need to work collaboratively with users to successfully reconstruct the meaning the user had in mind. In this paper, we show how cognitive models of users’ communicative strategies can be leveraged in a reinforcement learning approach to dialogue planning to enable interactive systems to give targeted, effective feedback about the system’s understanding. We describe a prototype system that collaborates on reference tasks that distinguish arbitrarily varying color patches from similar distractors, and use experiments with crowd workers and analyses of our learned policies to document that our approach leads to context-sensitive clarification strategies that focus on key missing information, elicit correct answers that the system understands, and contribute to increasing dialogue success.
Combining Event Semantics and Degree Semantics for Natural Language Inference
Izumi Haruta, Koji Mineshima and Daisuke Bekki
In formal semantics, there are two well-developed semantic frameworks, event semantics, which treats verbs and adverbial modifiers using the notion of event, and degree semantics, which analyzes adjectives and comparatives using the notion of degree. However, it is not obvious whether these frameworks can be combined to handle cases where the phenomena in question interact. We study this issue by focusing on natural language inference (NLI). We implement a logic-based NLI system that combines event semantics and degree semantics as well as their interaction with lexical knowledge. We evaluate the system on various NLI datasets that contain linguistically challenging problems. The results show that it achieves high accuracies on these datasets in comparison to previous logic-based systems and deep-learning-based systems. This suggests that the two semantic frameworks can be combined consistently to handle various combinations of linguistic phenomena without compromising the advantage of each framework.
Combining Word Embeddings with Bilingual Orthography Embeddings for Bilingual Dictionary Induction
Silvia Severini, Viktor Hangya, Alexander Fraser and Hinrich Schütze
Bilingual dictionary induction (BDI) is the task of accurately translating words to the target language. It is of great importance in many low-resource scenarios where cross-lingual training data is not available. To perform BDI, bilingual word embeddings (BWEs) are often used due to their low bilingual training signal requirement. They achieve high performance but problematic cases still remain, such as the translation of rare words or named entities, which often need to be transliterated. In this paper, we enrich BWE-based BDI with transliteration information by using Bilingual Orthography Embeddings (BOEs). BOEs represent source and target language transliteration word pairs with similar vectors. A key problem in our BDI setup is to decide which information source – BWEs or semantics vs. BOEs or orthography – is more reliable for a particular word pair. We propose a novel classification-based BDI system that uses BWEs, BOEs and a number of other features to make this decision. We test our system on English-Russian BDI and show improved performance. In addition, we show the effectiveness of our BOEs by successfully using them for transliteration mining based on cosine similarity.
Common Mistakes in Financial Sentiment Analysis Practices
Frank Xing, Lorenzo Malandri, Yue Zhang and Erik Cambria
The recent dominance of machine learning-based natural language processing methods fosters a culture of overemphasizing model accuracy rather than examining the reasons behind model errors. However, interpretability is a critical requirement for many downstream applications, e.g., in healthcare and finance. This paper investigates the error patterns of some of the most popular sentiment analysis methods in the finance domain. We discover that (1) methods belonging to the same cluster are prone to similar error patterns and (2) six types of linguistic features in the finance domain cause the poor performance of financial sentiment analysis. These findings provide important clues for improving sentiment analysis models that use social media data for finance.
Commonsense Question Answering Boosted by Graph-Based Iterative Retrieval over Multiple Knowledge Sources
Qianglong Chen, Feng Ji, Haiqing Chen and Yin Zhang
To better understand natural language text and speech, it is critical to make use of background or commonsense knowledge. However, how to efficiently leverage external knowledge in question-answering systems is still a hot research topic in both the academic and industrial communities. In this paper, we propose a novel question-answering method that integrates multiple knowledge sources. More specifically, we first introduce a novel graph-based iterative knowledge acquisition module with potential relations to retrieve both concepts and entities related to the given question. After obtaining the relevant knowledge, we utilize a pre-trained language model to encode the question with its evidence and present a question-aware attention mechanism to fuse all representations produced by the previous modules. Finally, a task-specific linear classifier is used to predict the answer probability. We conduct experiments on the CommonsenseQA dataset, and the results show that our proposed method outperforms other competitive methods and achieves a new state of the art. Furthermore, we also conduct ablation studies to demonstrate the effectiveness of our proposed graph-based iterative knowledge acquisition module and question-aware attention module, and identify the key properties that are helpful to the method.
comp-syn: Perceptually Grounded Word Embeddings with Color
Bhargav Srinivasa Desikan, Tasker Hull, Ethan Nadler, Douglas Guilbeault, Aabir Abubakar Kar, Mark Chu and Donald Ruggiero Lo Sardo
Popular approaches to natural language processing create word embeddings based on textual co-occurrence patterns, but often ignore embodied, sensory aspects of language. Here, we introduce the Python package comp-syn, which provides grounded word embeddings based on the perceptually uniform color distributions of Google Image search results. We demonstrate that comp-syn significantly enriches models of distributional semantics. In particular, we show that (1) comp-syn predicts human judgments of word concreteness with greater accuracy and in a more interpretable fashion than word2vec using low-dimensional word–color embeddings, and (2) comp-syn performs comparably to word2vec on a metaphorical vs. literal word-pair classification task. comp-syn is open-source on PyPi and is compatible with mainstream machine-learning Python packages. Our package release includes word–color embeddings for over 40,000 English words, each associated with crowd-sourced word concreteness judgments.
Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness
António Branco, João António Rodrigues, Malgorzata Salawa, Ruben Branco and Chakaveh Saedi
Lexical semantics theories differ in advocating that the meaning of words is represented as an inference graph, a feature mapping or a co-occurrence vector, thus raising the question: is it the case that one of these approaches is superior to the others in representing lexical semantics appropriately? Or, in its non-antagonistic counterpart: could there be a unified account of lexical semantics where these approaches seamlessly emerge as (partial) renderings of (different) aspects of a core semantic knowledge base?
Complaint Identification in Social Media with Transformer Networks
Mali Jin and Nikolaos Aletras
Complaining is a speech act extensively used by humans to communicate a negative inconsistency between reality and expectations. Previous work on automatically identifying complaints in social media has focused on using feature-based and task-specific neural network models. Adapting state-of-the-art pre-trained neural language models and their combinations with other linguistic information from topics or sentiment for complaint prediction has yet to be explored. In this paper, we evaluate a battery of neural models underpinned by transformer networks, which we subsequently combine with linguistic information. Experiments on a publicly available data set of complaints demonstrate that our models outperform previous state-of-the-art methods by a large margin, achieving a macro F1 of up to 87.
Computational Modeling of Affixoid Behavior in Chinese Morphology
Yu-Hsiang Tseng, Shu-Kai HSIEH, Pei-Yi Chen and Sara Court
The morphological status of affixes in Chinese has long been a matter of debate. How one might apply the conventional criteria of free/bound and content/function features to distinguish word-forming affixes from bound roots in Chinese is still far from clear. Issues involving polysemy and diachronic change further blur the boundaries. In this paper, we propose three quantitative features in a computational modeling of affixoid behavior in Mandarin Chinese. The results show that, except for a very few cases, there are no clear criteria that can be used to identify an affix’s status in an isolating language like Chinese. A diachronic check using contextual embeddings with the WordNet Sense Inventory also demonstrates the possible role of the polysemy of lexical roots across diachronic settings.
CoNAN: A Complementary Neighboring-based Attention Network for Referring Expression Generation
Jungjun Kim, Hanbin Ko and Jialin Wu
Daily scenes in the real world are complex due to occlusion, undesired lighting conditions, etc. Although humans handle such complicated environments relatively well, they pose challenges for machine learning systems in identifying and describing a target without ambiguity. Previous studies focus on the context of the target object by comparing objects within the same category and utilizing the cycle-consistency between listener and speaker modules. However, it is still very challenging to mine the discriminative features of the target object when forming unambiguous expressions. In this work, we propose a novel Complementary Neighboring-based Attention Network (CoNAN) that explicitly utilizes the visual differences between the target object and its highly-related neighbors. These highly-related neighbors, determined by an attentional ranking module, serve as complementary features highlighting the discriminating aspects of the target object. The speaker module then takes the visual difference features as an additional input to generate the expression. Our qualitative and quantitative results on the RefCOCO, RefCOCO+, and RefCOCOg datasets demonstrate that our generated expressions outperform those of other state-of-the-art models by a clear margin.
Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations
Simone Conia and Roberto Navigli
To date, the most successful word, word sense, and concept modelling techniques use large corpora and knowledge resources to produce dense vector representations that capture semantic similarities in a relatively low-dimensional space. Most current approaches, however, suffer from a monolingual bias with their strength depending on the amount of data available across languages. In this paper, we address this issue and propose Conception, a novel technique for building language-independent, vector representations of concepts which places multilinguality at its core while retaining explicit relationships between concepts. We show that our high-coverage representations outperform current work on multilingual and cross-lingual word similarity and Word Sense Disambiguation.
Connecting the Dots Between Fact Verification and Fake News Detection
Qifei LI and Wangchunshu Zhou
Fact verification models have enjoyed rapid advancement in the last two years with the development of pre-trained language models like BERT and the release of large-scale datasets such as FEVER. However, the challenging problem of fake news detection, which is closely related to fact verification, has not benefited from these improvements. In this paper, we propose a simple yet effective approach to connect the dots between fact verification and fake news detection. Our approach first employs a text summarization model pre-trained on news corpora to summarize a long news article into a short claim. Then we use a fact verification model pre-trained on the FEVER dataset to detect whether the input news article is real or fake. Our approach makes use of the recent success of fact verification models and enables zero-shot fake news detection, alleviating the need for large-scale training data to train fake news detection models. Experimental results on FakenewsNet, a benchmark dataset for fake news detection, demonstrate the effectiveness of our proposed approach.
Constituency Lattice Encoding for Aspect Term Extraction
Yunyi Yang, Kun Li, Xiaojun Quan, Weizhou Shen and Qinliang Su
One of the remaining challenges for aspect term extraction in sentiment analysis resides in the extraction of phrase-level aspect terms, as it is non-trivial to determine the boundaries of such terms. In this paper, we aim to address this issue by incorporating the span annotations of constituents of a sentence to leverage syntactic information in neural network models. To this end, we first construct a constituency lattice structure based on the constituents of a constituency tree. Then, we present two approaches to encoding the constituency lattice using BiLSTM-CRF and BERT as the base models, respectively, whereas other models can be applied as well. We experimented on two benchmark datasets to evaluate the two models, and the results confirm their effectiveness, with gains of 3.17 and 1.35 points in F1-Measure, respectively, over the current state of the art. The improvements justify the effect of the constituency lattice for aspect term extraction.
Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps
Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara and Akiko Aizawa
A multi-hop dataset aims to test reasoning and inference skills by requiring a model to read multiple paragraphs to answer a given question. However, current datasets do not provide a complete explanation of the reasoning process from the question to the answer. Further, previous studies revealed that many examples in existing multi-hop datasets do not require multi-hop reasoning to answer a question. In this study, we present a new multi-hop dataset, called 2WikiMultiHopQA, built from Wikipedia and Wikidata. In our dataset, we introduce evidence information containing a reasoning path for multi-hop questions. The evidence information has two benefits: (i) providing a comprehensive explanation for predictions and (ii) evaluating the reasoning skills of a model. We carefully designed a pipeline and a set of templates for generating question-answer pairs that guarantee the multi-hop steps and the quality of the questions. We also exploit the structured format of Wikidata and use logical rules to create questions that are natural but still require multi-hop reasoning. Through experiments, we demonstrate that our dataset is challenging for multi-hop models and that it ensures multi-hop reasoning is required.
Context Dependent Semantic Parsing: A Survey
Zhuang Li, Lizhen Qu and Gholamreza Haffari
Semantic parsing is the task of translating natural language utterances into machine-readable meaning representations. Currently, most semantic parsing methods are not able to utilize contextual information (e.g. dialogue and comment history), which has great potential to boost semantic parsing systems. To overcome this issue, context-dependent semantic parsing has recently drawn a lot of attention. In this survey, we investigate progress on methods for context-dependent semantic parsing, together with the current datasets and tasks. We then point out open problems and challenges for future research in this area.
Context in Informational Bias Detection
Esther van den Berg and Katja Markert
Informational bias is bias through sentences or clauses that convey tangential, speculative, or background information that can sway readers’ opinions towards entities. By nature, informational bias is context-dependent, but previous work on informational bias detection has not explored the role of context beyond the sentence. In this paper we explore four kinds of context, namely direct textual context, article context, coverage context and domain context, and find that article context can help improve performance. We also perform the first error analysis of classification models on this task, and find that models are sensitive to differences in newspaper source, do well on informational bias in quotes and struggle with informational bias with positive polarity. Finally, we observe improvement by the model with article context on articles that do not prominently feature well-known entities.
Context-aware Lexical Coherence Modeling
Sungho Jeon and Michael Strube
Previous models of lexical coherence capture coherence patterns on the graph, but they disregard the context in which words occur. We propose a lexical coherence model, which takes contextual information into account. Our model first captures the central point of a text, called a semantic centroid vector, computed as the mean of sentence vector representations. Then, the model encodes the patterns of semantic changes between the semantic centroid vector and sentence representations.
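A rough numpy sketch of the centroid idea described above, assuming sentence embeddings are already available (the actual model encodes richer change patterns than plain cosine similarities):

```python
import numpy as np

def centroid_change_pattern(sentence_vecs):
    """sentence_vecs: (n_sentences, d) array of sentence embeddings.
    Returns the semantic centroid of the text and each sentence's cosine
    similarity to it, a simple stand-in for the pattern of semantic changes."""
    centroid = sentence_vecs.mean(axis=0)
    norms = np.linalg.norm(sentence_vecs, axis=1) * np.linalg.norm(centroid)
    sims = sentence_vecs @ centroid / np.clip(norms, 1e-9, None)
    return centroid, sims
```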
Context-Aware Text Normalisation for Historical Dialects
Maria Sukhareva
Context-aware historical text normalisation is a severely under-researched area. To fill the gap we propose a context-aware normalisation approach that relies on the state-of-the-art methods in neural machine translation and transfer learning. We propose a multidialect normaliser with a context-aware reranking of the candidates. The reranker relies on a word-level n-gram language model that is applied to the five best normalisation candidates. The results are evaluated on the historical multidialect datasets of German, Spanish, Portuguese and Slovene. We show that incorporating dialectal information into the training leads to an accuracy improvement on all the datasets. The context-aware reranking gives further improvement over the baseline. For three out of six datasets, we reach a significantly higher accuracy than reported in the previous studies. The other three results are comparable with the current state of the art. The code for the reranker will be published as open source.
Contextual Argument Component Classification for Class Discussions
Luca Lugini and Diane Litman
Argument mining systems often consider contextual information, i.e. information outside of an argumentative discourse unit, when trained to accomplish tasks such as argument component identification, classification, and relation extraction. However, prior work has not carefully analyzed the utility of different contextual properties in context-aware models. In this work, we show how two different types of contextual information, local discourse context and speaker context, can be incorporated into a computational model for classifying argument components in multi-party classroom discussions. We find that both context types can improve performance, although the improvements are dependent on context size and position.
Contextualized Embeddings for Enriching Linguistic Analyses on Politeness
Ahmad Aljanaideh, Eric Fosler-Lussier and Marie-Catherine de Marneffe
Linguistic analyses have often been performed around the static notion of words, where the context (surrounding words) is not considered. For example, previous analyses on politeness have focused on comparing the use of static words such as personal pronouns across (im)polite requests without taking the context of those words into account. Current word embeddings in NLP do capture context and thus can be leveraged to enrich linguistic analyses. In this work, we introduce a model which leverages the pre-trained BERT model to cluster contextualized representations of a word based on (1) the context in which the word appears and (2) the labels of items the word occurs in. Using politeness as a case study, this model is able to automatically discover interpretable, fine-grained context patterns of words, some of which align with existing theories on politeness. Our model further discovers novel finer-grained patterns associated with (im)polite language. For example, the word “please” can occur in impolite contexts that are predictable from BERT clustering. The approach proposed here is validated by showing that features based on fine-grained patterns inferred from the clustering improve over politeness-word baselines.
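A small sketch of the clustering step, assuming contextualized vectors for all occurrences of one word (e.g. “please”) have already been extracted; the label-aware component of the model described above is not reproduced:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_word_contexts(contextual_vecs, n_clusters=5):
    """Cluster the contextualized embeddings of a single word to surface its
    fine-grained usage patterns (e.g. polite vs. impolite contexts)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return km.fit_predict(np.asarray(contextual_vecs))
```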
Continual Lifelong Learning in Natural Language Processing: A Survey
Magdalena Biesialska, Katarzyna Biesialska and Marta R. Costa-jussà
Continual learning (CL) aims to enable information systems to learn from a continuous data stream across time. However, it is difficult for existing deep learning architectures to learn a new task without largely forgetting previously acquired knowledge. Furthermore, CL is particularly challenging for language learning, as natural language is ambiguous: it is discrete, compositional, and its meaning is context-dependent. In this work, we look at the problem of CL through the lens of various NLP tasks. Our survey discusses major challenges in CL and current methods applied in neural network models. We also provide a critical review of the existing CL evaluation methods and datasets in NLP. Finally, we present our outlook on the future research directions.
ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation
Dario Stojanovski, Benno Krojer, Denis Peskov and Alexander Fraser
Recent high scores on pronoun translation using context-aware neural machine translation have suggested that current approaches work well. ContraPro is a notable example of a contrastive challenge set for English→German pronoun translation. The high scores achieved by current approaches may suggest that they are able to effectively model the complicated set of inferences required to carry out pronoun translations. This entails the ability to determine which entities could be referred to, identify which entity a source-language pronoun refers to (if any), and access the target-language grammatical gender for that entity. We first show through a series of targeted adversarial attacks that in fact current approaches are not able to model all of this information well. Inserting small amounts of distracting information is enough to strongly reduce scores, which should not be the case. We then create a new template test set ContraCAT, designed to individually assess the ability to handle the specific steps necessary for successful pronoun translation. Our analyses show that current approaches to context-aware NMT rely on a set of surface heuristics, which break down when translations require real reasoning. We also try an approach for augmenting the training data, with some improvements.
Contrastive Zero-Shot Learning for Cross-Domain Slot Filling with Adversarial Attack
Keqing He, Jinchao Zhang, Yuanmeng Yan, Weiran XU, Cheng Niu and Jie Zhou
Zero-shot slot filling has widely arisen to cope with data scarcity in target domains. However, previous approaches often ignore constraints between slot value representations and related slot description representations in the latent space and lack sufficient model robustness. In this paper, we propose a Contrastive Zero-Shot Learning with Adversarial Attack (CZSL-Adv) method for cross-domain slot filling. The contrastive loss aims to map slot value contextual representations to the corresponding slot description representations. We also introduce an adversarial attack training strategy to improve model robustness. Experimental results show that our model significantly outperforms state-of-the-art baselines under both zero-shot and few-shot settings.
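A hedged sketch of a contrastive objective that pulls slot-value token representations towards their slot description embeddings; an InfoNCE-style formulation with a temperature is assumed here and may differ from the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def slot_contrastive_loss(value_reprs, desc_reprs, slot_ids, temperature=0.1):
    """value_reprs: (N, d) contextual representations of slot-value tokens
    desc_reprs:  (S, d) slot description embeddings
    slot_ids:    (N,)   index of the gold slot for each token."""
    v = F.normalize(value_reprs, dim=-1)
    d = F.normalize(desc_reprs, dim=-1)
    logits = v @ d.t() / temperature      # similarity to every slot description
    return F.cross_entropy(logits, slot_ids)
```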
Controllable Abstractive Sentence Summarization with Guiding Entities
Changmeng Zheng, Yi Cai, Guanjie Zhang and Qing Li
Entities constitute a major proportion of text summaries and build up their topics. Although existing text summarization models can produce promising results on automatic metrics, for example ROUGE, it is difficult to guarantee that a given entity is contained in the generated summaries. In this paper, we propose a controllable abstractive sentence summarization model which generates summaries with guiding entities. Instead of generating summaries from left to right, we start with a selected entity, generate the left part first, and then the right part of a complete summary. Compared to previous entity-based text summarization models, our method can ensure that entities appear in the final output summaries rather than generating the complete sentence with implicit entity and article representations. Our model can also generate more novel entities by incorporating them into outputs directly. To evaluate the informativeness of the proposed model, we develop fine-grained informativeness metrics covering the relevance, extraness and omission perspectives. We conduct experiments on two widely-used sentence summarization datasets, and experimental results show that our model outperforms the state-of-the-art methods in both automatic evaluation scores and informativeness metrics.
Conversational Machine Comprehension: a Literature Review
Somil Gupta, Bhanu Pratap Singh Rawat and hong yu
Conversational Machine Comprehension (CMC) is a research track in conversational AI which expects the machine to understand an open-domain text and thereafter engage in a multi-turn conversation to answer questions related to the text. While most of the research in Machine Reading Comprehension (MRC) revolves around single-turn question answering (QA), multi-turn CMC has recently gained prominence, thanks to the advancement in natural language understanding via neural language models such as BERT and the introduction of large-scale conversational datasets such as CoQA and QuAC. The rise in interest has, however, led to a flurry of concurrent publications, each with a different yet structurally similar modeling approach and an inconsistent view of the surrounding literature. With the volume of model submissions to conversational datasets increasing every year, there exists a need to consolidate the scattered knowledge in this domain to streamline future research. This literature review attempts at providing a holistic overview of CMC with an emphasis on the common trends across recently published models, specifically in their approach to tackling conversational history. The review synthesizes a generic framework for CMC models while highlighting the differences in recent approaches and intends to serve as a compendium of CMC for future researchers.
Coordination Boundary Identification without Labeled Data for Compound Terms Disambiguation
Yuya Sawada, Takashi Wada, Takayoshi Shibahara, Hiroki Teranishi, Shuhei Kondo, Hiroyuki Shindo, Taro Watanabe and Yuji Matsumoto
We propose a simple method for nominal coordination boundary identification. The main strength of our method is that it can identify coordination boundaries without training on labeled data, and is applicable under conditions where annotations of coordination structure are not readily available. Our system employs pre-trained word embeddings to measure the similarities of words, and detects the span of coordination, assuming that conjuncts share syntactic and semantic similarities. We demonstrate that our method yields good results in identifying coordinated noun phrases on the GENIA corpus, and is comparable to a recent supervised method for the case in which the coordinator conjoins simple noun phrases.
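A toy illustration of the similarity assumption above: score a candidate pair of conjunct spans around a coordinator by the cosine similarity of their averaged word vectors (the span indexing and scoring function are assumptions for illustration, not the paper's system):

```python
import numpy as np

def score_conjunct_pair(word_vecs, left_span, right_span):
    """word_vecs: (n_tokens, d) pre-trained embeddings for one sentence;
    left_span / right_span: (start, end) token index pairs around the
    coordinator. Higher scores suggest the spans are true conjuncts."""
    def span_vec(span):
        s, e = span
        return word_vecs[s:e].mean(axis=0)
    a, b = span_vec(left_span), span_vec(right_span)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
```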
Coreference information guides human expectations during natural reading
Evan Jaffe, Cory Shain and William Schuler
Models of human sentence processing effort tend to focus on costs associated with retrieving structures and discourse referents from memory (memory-based) and/or on costs associated with anticipating upcoming words and structures based on contextual cues (expectation-based) (Levy, 2008). Although evidence suggests that expectation and memory may play separable roles in language comprehension (Levy et al., 2013), theories of coreference processing have largely focused on memory: how comprehenders identify likely referents of linguistic expressions. In this study, we hypothesize that coreference tracking also informs human expectations about upcoming words, and we test this hypothesis by evaluating the degree to which incremental surprisal measures generated by a novel coreference-aware semantic parser explain human response times in a naturalistic self-paced reading experiment. Results indicate (1) that coreference information indeed guides human expectations and (2) that coreference effects on memory retrieval exist independently of coreference effects on expectations. Together, these findings suggest that the language processing system exploits coreference information both to retrieve referents from memory and to anticipate upcoming material.
Corpus-Based Identification of Verbs participating in Verb Alternations using Classification and Manual Annotation
Esther Seyffarth and Laura Kallmeyer
English verb alternations allow participating verbs to appear in a set of syntactically different constructions whose associated semantic frames are systematically related. We use ENCOW and VerbNet data to train classifiers to predict the instrument subject alternation and the causative-inchoative alternation, relying on count-based and vector-based features as well as perplexity-based language model features, which are intended to reflect each alternation's felicity by simulating it. Beyond the prediction task, we use the classifier results as a source for a manual annotation step in order to identify new, unseen instances of each alternation. This is possible because existing alternation datasets contain positive, but no negative instances and are not comprehensive. Over several sequences of classification-annotation steps, we iteratively extend our sets of alternating verbs. Our hybrid approach to the identification of new alternating verbs reduces the required annotation effort by only presenting annotators with the highest-scoring candidates from the previous classification. Due to the success of semi-supervised and unsupervised features, our approach can easily be transferred to further alternations.
CosMo: Conditional Seq2Seq-based Mixture Model for Zero-Shot Commonsense Question Answering
Farhad Moghimifar, Lizhen Qu, Yue Zhuo, Mahsa Baktashmotlagh and Gholamreza Haffari
Commonsense reasoning refers to the ability to evaluate a social situation and act accordingly. Identification of the implicit causes and effects of a social context is the driving capability that can enable machines to perform commonsense reasoning. The dynamic world of social interactions requires context-dependent on-demand systems to infer such underlying information. However, current approaches in this realm lack the ability to perform commonsense reasoning upon facing an unseen situation, mostly due to an incapability of identifying a diverse range of implicit social relations. Hence they fail to estimate the correct reasoning path. In this paper, we present the Conditional Seq2Seq-based Mixture model (CosMo), which provides us with the capabilities of dynamic and diverse content generation. We use CosMo to generate context-dependent clauses, which form a dynamic Knowledge Graph (KG) on-the-fly for commonsense reasoning. To show the adaptability of our model to context-dependent knowledge generation, we address the task of zero-shot commonsense question answering. The empirical results indicate an improvement of up to +5.2% over the state-of-the-art models.
Creation of Corpus and analysis in Code-Mixed Kannada-English Twitter data for Emotion Prediction
Abhinav Reddy Appidi, Vamshi Krishna Srirangam, Darsi Suhas and Manish Shrivastava
Emotion prediction is a critical task in the field of Natural Language Processing (NLP). There has been a significant amount of work done on emotion prediction for resource-rich languages, and some work on code-mixed social media corpora, but none on emotion prediction for Kannada-English code-mixed Twitter data. In this paper, we analyze the problem of emotion prediction on a corpus of code-mixed Kannada-English tweets extracted from Twitter, annotated with the respective ‘Emotion’ for each tweet. We experimented with machine learning prediction models using features like Character N-Grams, Word N-Grams, Repetitive characters, and others with SVM and LSTM models on our corpus, which resulted in accuracies of 30% and 32%, respectively.
Cross-lingual Annotation Projection in Legal Texts
Andrea Galassi, Kasper Drazewski, Marco Lippi and Paolo Torroni
We study annotation projection in text classification problems where source documents are published in multiple languages and may not be an exact translation of one another. In particular, we focus on the detection of unfair clauses in privacy policies and terms of service. We present the first English-German parallel asymmetric corpus for the task at hand. We study and compare several language-agnostic sentence-level projection methods. Our results indicate that a combination of word embeddings and dynamic time warping performs best.
Cross-Lingual Document Retrieval with Smooth Learning
Jiapeng Liu, Xiao Zhang, Dan Goldwasser and Xiao Wang
Cross-lingual document search is an information retrieval task in which the queries' language and the documents' language are different. In this paper, we study the instability of neural document search models and propose a novel end-to-end robust framework that achieves improved performance in cross-lingual search with different documents' languages. This framework includes a novel measure of the relevance, smooth cosine similarity, between queries and documents, and a novel loss function, Smooth Ordinal Search Loss, as the objective function. We further provide theoretical guarantee on the generalization error bound for the proposed framework. We conduct experiments to compare our approach with other document search models, and observe significant gains under commonly used ranking metrics on the cross-lingual document retrieval task in a variety of languages.
Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation
Junhao Liu, Linjun Shou, Jian Pei, Ming Gong, Min Yang and Daxin Jiang
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale annotated datasets in low-source languages, such as Arabic, Hindi, and Vietnamese. Many previous approaches use translation data by translating from a rich-source language, such as English, to low-source languages as auxiliary supervision. However, how to effectively leverage translation data and reduce the impact of noise introduced by translation remains onerous. In this paper, we tackle this challenge and enhance the cross-lingual transferring performance by a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC). A language branch is a group of passages in one single language paired with questions in all target languages. We train multiple machine reading comprehension (MRC) models proficient in individual language based on translation data. Then, we devise a knowledge distillation approach to transfer knowledge from multiple language branch models to a single model for all target languages, to save the cost of training, inference, and maintenance for multiple models. Our extensive experiments on two CLMRC benchmarks clearly show the effectiveness of our proposed method.
Cross-lingual Transfer Learning for Grammatical Error Correction
Ikumi Yamashita, Satoru Katsumata, Masahiro Kaneko, Aizhan Imankulova and Mamoru Komachi
In this study, we explore cross-lingual transfer learning in grammatical error correction (GEC) tasks. Many languages suffer from lack of resources to train GEC models. Cross-lingual transfer learning from high-resource languages (the source models) is effective for training models of low-resource languages (the target models) in various tasks. However, in GEC tasks, the possibility of transferring grammatical knowledge (e.g., grammatical functions) across languages is not evident. Therefore, we investigate cross-lingual transfer learning methods for GEC. Our results show that transfer learning from other languages improves the accuracy of GEC. We also demonstrate that proximity to source languages has a significant impact on the accuracy of correcting certain error types.
Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale
Ozan Caglayan, Pranava Madhyastha and Lucia Specia
Automatic evaluation of language generation systems is a well-studied problem in NLP. While new metrics are proposed every year, a few popular metrics are preferred to evaluate tasks such as image captioning and machine translation, despite their known limitations. This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them. In this paper, we urge the community for more careful consideration of how they automatically evaluate their models, by demonstrating important failure cases on multiple datasets, language pairs and tasks. Our experiments show that metrics (i) usually prefer system outputs to human-authored texts, (ii) can be insensitive to correct translations of rare words, (iii) can yield surprisingly high scores when given a single sentence as system output for the entire test set.
CxGBERT: BERT meets Construction Grammar
Harish Tayyar Madabushi, Laurence Romain, Dagmar Divjak and Petar Milin
While lexico-semantic elements no doubt capture a large amount of linguistic information, it has been argued that they do not capture all information contained in text. This assumption is central to constructionist approaches to language which argue that language consists of constructions, learned pairings of a form and a function or meaning that cannot be predicted from its component parts. BERT’s training objectives give it access to a tremendous amount of lexico-semantic information, and while BERTology has shown that BERT captures certain important linguistic dimensions, there have been no studies exploring the extent to which BERT might have access to constructional information. In this work we design several probes and conduct extensive experiments to answer this question. Our results allow us to conclude that BERT does indeed have access to a significant amount of information, much of which linguists typically call constructional information. The impact of this observation is potentially far-reaching as it provides insights into what deep learning methods learn from text, while also showing that information contained in constructions is redundantly encoded in lexico-semantics.
Cycle-Consistent Adversarial Autoencoders for Unsupervised Text Style Transfer
Yufang Huang, Wentao Zhu, Deyi Xiong, Yiye Zhang, Changjian Hu and Feiyu Xu
Unsupervised text style transfer is full of challenges due to the lack of parallel data and difficulties in content preservation. In this paper, we propose a novel neural approach to unsupervised text style transfer which we refer to as Cycle-consistent Adversarial autoEncoders (CAE) trained from non-parallel data. CAE consists of three essential components: (1) LSTM autoencoders that encode a text in one style into its latent representation and decode an encoded representation into its original text or a transferred representation into a style-transferred text, (2) adversarial style transfer networks that use an adversarially trained generator to transform a latent representation in one style into a representation in another style, and (3) a cycle-consistent constraint that enhances the capacity of the adversarial style transfer networks in content preservation. The entire CAE with these three components can be trained end-to-end. Extensive experiments and in-depth analyses on two widely-used public datasets consistently validate the effectiveness of proposed CAE in both style transfer and content preservation against several strong baselines in terms of four automatic evaluation metrics and human evaluation.
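A minimal sketch of the cycle-consistency term among the three components listed above; treating the constraint as a mean-squared error on latent representations is an assumption made for illustration, not the paper's exact formulation:

```python
import torch.nn.functional as F

def cycle_consistency_loss(z_x, G_xy, G_yx):
    """z_x: latent representation of a text in style X;
    G_xy, G_yx: the two adversarially trained style-transfer generators.
    Transferring to style Y and back should recover the original latent."""
    z_roundtrip = G_yx(G_xy(z_x))
    return F.mse_loss(z_roundtrip, z_x)
```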
DaN+: Danish Nested Named Entities and Lexical Normalization
Barbara Plank, Kristian Nørgaard Jensen and Rob van der Goot
This paper introduces DaN+, a multi-domain resource for nested named entities (NEs) and lexical normalization for Danish, a less-resourced language. We empirically assess three strategies to model the two-layer NE annotations, cross-lingual cross-domain transfer from German versus in-language annotation, language-specific versus multilingual BERT, and the effect of lexical normalization on Danish NE. Our results show that the most robust strategy is multi-task learning, which is rivaled by multi-label decoding; that transfer is successful also in the zero-shot setting; and that in-language BERT and lexical normalization work best on the least canonical data. However, our results also show that out-of-domain performance remains challenging, while performance on news plateaus quickly. This highlights the importance of cross-domain evaluation of cross-lingual transfer.
Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages
Mathieu Dehouck and Carlos Gómez-Rodríguez
The lack of annotated data is a major issue for building reliable NLP systems for most of the world’s languages, but this problem can be alleviated by automatic data generation. In this paper, we present a new data augmentation method for artificially creating new dependency-annotated sentences. The main idea is to swap subtrees between annotated sentences while enforcing strong constraints on those trees to ensure maximal grammaticality of the new sentences. We also propose a method to perform low-resource experiments with resource-rich languages by sampling sentences under a low-resource distribution, thereby mimicking low-resource conditions. In a series of experiments, we show that our newly proposed data augmentation method outperforms previous proposals using the same basic inputs.
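A simplified sketch of the swapping idea, assuming CoNLL-style token dictionaries with 0-based head indices and projective trees; the paper's stronger grammaticality constraints and the head re-indexing a real system would need are omitted:

```python
import random

def swap_subtree(sent_a, sent_b, label):
    """Exchange a subtree of `sent_a` with one from `sent_b` whose root bears
    the same dependency label. Tokens are dicts with 'form', 'head', 'deprel',
    where 'head' is a 0-based index of the parent token in the same sentence."""
    def subtree(sent, root):
        keep, changed = {root}, True
        while changed:
            changed = False
            for i, tok in enumerate(sent):
                if tok['head'] in keep and i not in keep:
                    keep.add(i)
                    changed = True
        return sorted(keep)

    roots_a = [i for i, t in enumerate(sent_a) if t['deprel'] == label]
    roots_b = [i for i, t in enumerate(sent_b) if t['deprel'] == label]
    if not roots_a or not roots_b:
        return None
    sub_a = subtree(sent_a, random.choice(roots_a))
    sub_b = subtree(sent_b, random.choice(roots_b))
    augmented = [t for i, t in enumerate(sent_a) if i not in sub_a]
    augmented[min(sub_a):min(sub_a)] = [sent_b[i] for i in sub_b]
    return augmented
```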
Data Selection for Bilingual Lexicon Induction from Specialized Comparable Corpora
Martin Laville, Amir Hazem, Emmanuel Morin and Phillippe Langlais
Narrow specialized comparable corpora are often small in size. This particularity makes it difficult to build efficient models to acquire translation equivalents, especially for less frequent and rare words. One way to overcome this issue is to enrich the specialized corpora with out-of-domain resources. Although some recent studies have shown improvements using data augmentation, the enrichment was done roughly by adding out-of-domain data, with no particular attention paid to which words to enrich and how to do it optimally. In this paper, we contrast several data selection techniques to improve bilingual lexicon induction from specialized comparable corpora. We first apply two well-established techniques often used for data selection in machine translation, namely Tf-Idf and Cross-Entropy, and then propose to exploit BERT for data selection. Overall, all the proposed techniques improve bilingual lexicon extraction by a large margin. The best performing model was the Cross-Entropy one, obtaining a gain of about 4 points in MAP while decreasing computation time by a factor of 10.
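For the Cross-Entropy technique mentioned above, a Moore-Lewis-style selection sketch is shown below; `H_in` and `H_gen` are hypothetical callables returning a sentence's cross-entropy under an in-domain and a general-domain language model, respectively:

```python
def select_by_cross_entropy(candidates, H_in, H_gen, k):
    """Rank out-of-domain sentences by H_in(s) - H_gen(s); the lowest-scoring
    sentences look most in-domain and are kept to enrich the specialized corpus."""
    scored = sorted(candidates, key=lambda s: H_in(s) - H_gen(s))
    return scored[:k]
```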
Debunking Rumors on Twitter with Tree Transformer
Jing Ma and Wei Gao
Rumors are manufactured with no respect for accuracy, but can circulate quickly and widely by "word-of-post" through social media conversations. A conversation tree encodes important information indicative of the credibility of a rumor. Existing conversation-based techniques for rumor detection either strictly follow tree edges or treat all the posts as fully connected during feature learning. In this paper, we propose a novel detection model based on a tree transformer to better utilize user interactions in the dialogue, in which post-level self-attention plays the key role in aggregating the intra-/inter-subtree stances. Experimental results on the TWITTER and PHEME datasets show that the proposed approach consistently improves rumor detection performance.
Decolonising Speech and Language Technology
Steven Bird
After generations of exploitation, Indigenous people often respond negatively to the idea that their languages are mere data ready for the taking. Too often, speech and language technologies work by treating Indigenous knowledge as a commodity, disenfranchising local knowledge authorities and reenacting the causes of language endangerment. Linguists and technologists have heard the calls for decolonisation, and we need to understand what this means in our practice concerning Indigenous languages. In this paper I call attention to colonising discourse in speech and language technology, and suggest different ways of working with Indigenous communities that build on local strengths, opening a discussion of a postcolonial approach to computational methods for supporting language vitality.
Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems
Vitou Phy, Yang Zhao and Akiko Aizawa
Many automatic evaluation metrics have been proposed to score the overall quality of a response in open-domain dialogue. Generally, overall quality comprises various aspects, such as relevancy, specificity, and empathy, and the importance of each aspect differs according to the task. For instance, a specific response is more important than an empathetic response in a food-ordering dialogue task. However, existing metrics are not designed to cope with such flexibility. For example, BLEU fundamentally relies only on word overlap, whereas BERTScore relies on semantic similarity between the reference and candidate responses. Thus, they are not guaranteed to capture the required aspects, such as specificity. To design a metric that is flexible to a task, we first propose making these qualities more manageable by grouping them into three groups: understandability, sensibleness, and likability, where likability is a combination of qualities that are essential for a task. We also propose a simple method to composite each group’s metric into a single metric called USL-H, which stands for Understandability, Sensibleness, and Likability in Hierarchy. We demonstrate that the USL-H score achieves good correlations with human judgement and maintains its flexibility towards different aspects and metrics.
Deep Inside-outside Recursive Autoencoder with All-span Objective
Ruyue Hong, Jiong Cai and Kewei Tu
Deep inside-outside recursive autoencoder (DIORA) is a neural-based model designed for unsupervised constituency parsing. During its forward computation, it provides phrase and contextual representations for all spans in the input sentence. By utilizing the contextual representation of each leaf-level span, the span of length 1, to reconstruct the word inside the span, the model is trained without labeled data. In this work, we extend the training objective of DIORA by making use of all spans instead of only leaf-level spans. We test our new training objective on datasets of two languages: English and Japanese, and empirically show that our method achieves improvement in parsing accuracy over the original DIORA.
Deep Learning Framework for Measuring the Digital Strategy of Companies from Earnings Calls
Ahmed Ghanim Al-Ali, Robert Phaal and Donald Sull
Companies today are racing to leverage the latest digital technologies, such as artificial intelligence, blockchain, and cloud computing. However, many companies report that their strategies did not achieve the anticipated business results. This study is the first to apply state-of-the-art NLP models to unstructured data to understand the different clusters of digital strategy that companies are adopting. We achieve this by analyzing earnings calls from Fortune 500 companies between 2015 and 2019. We use a Transformer-based architecture for text classification, which shows a better understanding of the conversation context. We then investigate digital strategy patterns by applying clustering analysis. Our findings suggest that Fortune 500 companies use four distinct strategies: product-led, experience-led, service-led, and efficiency-led. This work provides an empirical baseline for companies and researchers to enhance our understanding of the field.
Definition Frames: Using Definitions for Hybrid Concept Representations
Evangelia Spiliopoulou, Artidoro Pagnoni and Eduard Hovy
Advances in word representations have shown tremendous improvements in downstream NLP tasks, but lack semantic interpretability. In this paper, we introduce Definition Frames (DF), a matrix distributed representation extracted from definitions, where each dimension is semantically interpretable. DF dimensions correspond to the Qualia structure relations: a set of relations that uniquely define a term. Our results show that DFs have competitive performance with other distributional semantic approaches on word similarity tasks.
Detect All Abuse! Toward Universal Abusive Language Detection Models
Kunze Wang, Dong Lu, Caren Han, SIQU LONG and Josiah Poon
Online abusive language detection (ALD) has become a societal issue of increasing importance in recent years. Several previous works in online ALD focused on solving a single abusive language problem in a single domain, like Twitter, and have not been successfully transferable to the general ALD task or domain. In this paper, we introduce a new generic ALD framework, MACAS, which is capable of addressing several types of ALD tasks across different domains. Our generic framework covers multi-aspect abusive language embeddings that represent the target and content aspects of abusive language and applies a textual graph embedding that analyses the user's linguistic behaviour. Then, we propose and use the cross-attention gate flow mechanism to embrace multiple aspects of abusive language. Quantitative and qualitative evaluation results show that our ALD algorithm rivals or exceeds the six state-of-the-art ALD algorithms across seven ALD datasets covering multiple aspects of abusive language and different online community domains.
Detecting de minimis Code-Switching in Historical German Books
Shijia Liu and David Smith
Code-switching has long interested linguists, with computational work in particular focusing on speech and social media data (Sitaram et al., 2019). This paper contrasts these informal instances of code-switching to its appearance in more formal registers, by examining the mixture of languages in the Deutsches Textarchiv (DTA), a corpus of 1406 primarily German books from the 17th to 19th centuries. We manually annotate spans of six embedded languages (Latin, French, English, Italian, Spanish, and Greek) in the corpus. We quantitatively analyze the differences between code-switching patterns in these books and those in more typically studied speech and social media corpora. Furthermore, we address the practical task of predicting code-switching from features of the matrix language alone in the DTA corpus. Such classifiers can help reduce errors when optical character recognition or speech transcription is applied to a large corpus with rare embedded languages.
Detecting Non-literal Translations by Fine-tuning Cross-lingual Pre-trained Language Models
Yuming Zhai, Gabriel ILLOUZ and Anne Vilnat
Human-generated non-literal translations reflect the richness of human languages and are sometimes indispensable to ensure adequacy and fluency. Non-literal translations are difficult to produce even for human translators, especially for foreign language learners, and machine translation is still far from simulating human translation in this respect. In order to foster the study of appropriate and creative non-literal translations, automatically detecting them in parallel corpora is an important step, which can benefit downstream NLP tasks or help to construct materials to train human translators. This article demonstrates that generic sentence representations produced by a pre-trained cross-lingual language model can be fine-tuned to solve this task. We show that there exists a moderate positive correlation between the predicted probability of being a human translation and the proportion of non-literal translations in a sentence. The fine-tuning experiments show an accuracy of 80.16% when predicting the presence of non-literal translations in a sentence and an accuracy of 85.20% when distinguishing literal and non-literal translations at the phrase level. We further conduct a linguistic error analysis and propose directions for future work. The dataset and code will be made available.
Detecting Urgency Status of Crisis Tweets: A Transfer Learning Approach for Low Resource Languages
Efsun Sarioglu Kayi, Linyong Nan, Bohan Qu, Mona Diab and Kathleen McKeown
We release an urgency dataset that consists of English tweets relating to natural crises, along with annotations of their corresponding urgency status. Additionally, we release evaluation datasets for two low-resource languages, i.e. Sinhala and Odia, and demonstrate an effective zero-shot transfer from English to these two languages by training cross-lingual classifiers. We adopt cross-lingual embeddings constructed using different methods to extract features of the tweets, including a few state-of-the-art contextual embeddings such as BERT, RoBERTa and XLM-R. We train classifiers of different architectures on the extracted features. We also explore semi-supervised approaches by utilizing unlabeled tweets and experiment with ensembling different classifiers. With very limited amounts of labeled data in English and zero data in the low resource languages, we show a successful framework of training monolingual and cross-lingual classifiers using deep learning methods which are known to be data hungry. Specifically, we show that the recent deep contextual embeddings are also helpful when dealing with very small-scale datasets. Classifiers that incorporate RoBERTa yield the best performance for English urgency detection task, with F1 scores that are more than 25 points over our baseline classifier. For the zero-shot transfer to low resource languages, classifiers that use LASER features perform the best for Sinhala transfer while XLM-R features benefit the Odia transfer the most.
DisenE: Disentangling Knowledge Graph Embeddings
Xiaoyu Kou, Yankai Lin, Yuntao Li, Jiahao Xu, Peng Li, Jie Zhou and Yan Zhang
Knowledge graph embedding (KGE), which aims to embed entities and relations into low-dimensional vectors, has attracted wide attention recently. However, existing research is mainly based on black-box neural models, which makes it difficult to interpret the learned representations. In this paper, we introduce DisenE, an end-to-end framework for learning disentangled knowledge graph embeddings. Specifically, we introduce an attention-based mechanism that enables the model to explicitly focus on relevant components of entity embeddings according to a given relation. Furthermore, we introduce two novel regularizers to encourage each component of the entity representation to independently reflect an isolated semantic aspect. Experimental results demonstrate that our proposed DisenE offers a new perspective on addressing the interpretability of KGE and proves to be an effective way to improve the performance of link prediction tasks.
Distill and Replay for Continual Language Learning
Jingyuan Sun, Shaonan Wang, Jiajun Zhang and Chengqing Zong
Accumulating knowledge to tackle new tasks without necessarily forgetting the old ones is a hallmark of human-like intelligence. But the current dominant paradigm of machine learning is still to train a model that works well on static datasets. When learning tasks in a stream where the data distribution may fluctuate, fitting on new tasks often leads to forgetting on the previous ones. We propose a simple yet effective framework that continually learns natural language understanding tasks with one model. Our framework distills knowledge and replays experience from previous tasks when fitting on a new task, and is thus named DnR (distill and replay). The framework is based on language models and can be smoothly built with different language model architectures. Experimental results demonstrate that DnR outperforms previous state-of-the-art models in continually learning tasks of the same type but from different domains, as well as tasks of different types. With the distillation method, we further show that it is possible for DnR to incrementally compress the model size while still outperforming most of the baselines. We hope that DnR could promote the empirical application of continual language learning, and contribute to building human-level language intelligence minimally bothered by catastrophic forgetting.
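A minimal sketch of a distill-and-replay training step, assuming a frozen copy of the model from before the new task and a small replay buffer; the loss combination and hyperparameters are illustrative, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def dnr_step_loss(model, old_model, new_batch, replay_batch, tau=2.0, lam=1.0):
    """`model(x)` and `old_model(x)` are assumed to return classification logits."""
    x_new, y_new = new_batch
    task_loss = F.cross_entropy(model(x_new), y_new)     # fit the new task

    x_old, _ = replay_batch                              # replayed old examples
    with torch.no_grad():
        teacher = F.softmax(old_model(x_old) / tau, dim=-1)
    student = F.log_softmax(model(x_old) / tau, dim=-1)
    distill = F.kl_div(student, teacher, reduction="batchmean") * tau ** 2
    return task_loss + lam * distill                     # distill + replay
```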
Distinguishing Between Foreground and Background Events in News
Mohammed Aldawsari, Adrian Perez, Deya Banisakher and Mark Finlayson
Determining whether an event in a news article is a foreground or background event would be useful in many natural language processing tasks, for example, temporal relation extraction, summarization, or storyline generation. We introduce the task of distinguishing between foreground and background events in news articles as well as identifying the general temporal position of background events relative to the foreground period (past, present, future, and their combinations). We achieve good performance (0.72 F1 for background vs. foreground and temporal position, and 0.78 F1 for background vs. foreground only) on a dataset of news articles by leveraging discourse information in a featurized model. We release our implementation and annotated data for other researchers.
Diverse and Non-redundant Answer Set Extraction on Community QA based on DPPs
Shogo Fujita, Tomohide Shibata and Manabu Okumura
In community-based question answering (CQA) platforms, it often takes time to extract useful information from the many answers to a question. Existing ranking-based solutions tend to show similar answers at the top, and they determine the importance of an answer only by its similarity to the question or to the best answer. Therefore, we propose a new task of selecting a diverse and non-redundant answer set, rather than ranking the answers. We build a dataset for the task and propose a solution using Determinantal Point Processes (DPPs), probabilistic models that assign higher probability mass to subsets that are both high-quality and diverse, together with BERT, which has recently achieved high accuracy on many tasks. The proposed methods outperform several baseline methods on the task.
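As a rough illustration of the DPP-based selection the abstract above describes, the following is a minimal sketch of greedy MAP inference under a standard quality/diversity kernel decomposition. The kernel construction, the BERT-based scores, and all numbers here are toy assumptions, not the paper's actual model.

```python
import numpy as np

def greedy_dpp_select(quality, similarity, k):
    """Greedily pick a high-probability subset under a DPP whose kernel
    L[i, j] = quality[i] * similarity[i, j] * quality[j] trades answer
    quality against redundancy (standard quality/diversity decomposition)."""
    n = len(quality)
    L = np.outer(quality, quality) * similarity
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            # log-determinant of the kernel restricted to the candidate subset
            gain = np.linalg.slogdet(L[np.ix_(idx, idx)])[1]
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected

# toy usage: 4 answers, answers 0 and 1 are near-duplicates
quality = np.array([0.9, 0.85, 0.6, 0.5])
similarity = np.array([[1.0, 0.95, 0.2, 0.1],
                       [0.95, 1.0, 0.2, 0.1],
                       [0.2, 0.2, 1.0, 0.3],
                       [0.1, 0.1, 0.3, 1.0]])
print(greedy_dpp_select(quality, similarity, k=2))  # picks a diverse pair, [0, 2]
```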
Diverse dialogue generation with context dependent dynamic loss function
Ayaka Ueyama and Yoshinobu Kano
Dialogue systems using deep learning have achieved generation of fluent response sentences to user utterances. Nevertheless, they tend to produce responses that are not diverse and that are less context-dependent. To address these shortcomings, we propose a new loss function, the Inverse N-gram loss (INF), which incorporates contextual fluency and diversity at the same time through a simple formula. Our INF loss adjusts itself dynamically with a weight based on the inverse frequency of each token's n-gram, applied to the Softmax Cross Entropy loss, so that rare tokens become more likely to appear while the fluency of the generated sentences is retained. We trained Transformer models on English and Japanese Twitter replies, treated as single-turn dialogues, with different loss functions. Our INF loss model outperformed the baseline SCE loss and ITF loss models in automatic evaluations such as DIST-N and ROUGE, and also achieved higher scores in our human evaluations of coherence and richness.
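To make the weighting idea in the abstract above concrete, here is a minimal sketch of a cross-entropy loss reweighted by inverse n-gram frequency. The exact weighting formula, normalisation, and the way counts are gathered are assumptions for illustration only; the paper's INF loss may differ in detail.

```python
import torch
import torch.nn.functional as F

def inverse_ngram_weighted_ce(logits, targets, ngram_counts, eps=1.0):
    """Cross entropy where each target token is weighted by the inverse
    corpus frequency of the n-gram it completes, so rare continuations
    contribute more to the loss. `ngram_counts` is hypothetical
    preprocessing output holding one count per target position."""
    # logits: (T, V), targets: (T,), ngram_counts: (T,)
    token_nll = F.cross_entropy(logits, targets, reduction="none")  # (T,)
    weights = 1.0 / (ngram_counts.float() + eps)
    weights = weights / weights.sum() * len(weights)  # normalise to mean 1
    return (weights * token_nll).mean()

# toy usage: 5 target tokens, vocabulary of 100
logits = torch.randn(5, 100)
targets = torch.tensor([3, 17, 42, 42, 7])
counts = torch.tensor([1200, 30, 5, 5, 400])  # rare n-grams get larger weights
print(inverse_ngram_weighted_ce(logits, targets, counts))
```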
Diverse Keyphrase Generation with Neural Unlikelihood Training
Hareesh Bahuleyan and Layla El Asri
In this paper, we study sequence-to-sequence (S2S) keyphrase generation models from the perspective of diversity. Recent advances in neural natural language generation have made possible remarkable progress on the task of keyphrase generation, demonstrated through improvements on quality metrics such as F1-score. However, the importance of diversity in keyphrase generation has been largely ignored. We first analyze the extent of information redundancy present in the outputs generated by a baseline model trained using maximum likelihood estimation (MLE). Our findings show that repetition of keyphrases is a major issue with MLE training. To alleviate this issue, we adopt the neural unlikelihood (UL) objective for training the S2S model. Our version of UL training operates at (1) the target token level, to discourage the generation of repeating tokens, and (2) the copy token level, to avoid copying repetitive tokens from the source text. Further, to encourage better model planning during the decoding process, we incorporate a K-step-ahead token prediction objective that computes both MLE and UL losses on future tokens as well. Through extensive experiments on datasets from three different domains, we demonstrate that the proposed approach attains considerable diversity gains while maintaining competitive output quality.
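For readers unfamiliar with unlikelihood training, the following is a minimal sketch of the token-level unlikelihood term (in the spirit of Welleck et al., 2020), which penalises probability assigned to tokens already generated. It is an illustration of the general technique only; the paper's full objective, including the copy-level term and K-step lookahead, is not reproduced here.

```python
import torch

def token_unlikelihood_loss(logits, prev_tokens):
    """Token-level unlikelihood: penalise assigning probability mass to
    tokens that already occurred earlier in the output (the repetition
    case the abstract targets)."""
    # logits: (T, V); prev_tokens[t] is the set of tokens generated before step t
    probs = torch.softmax(logits, dim=-1)
    loss = 0.0
    for t, negatives in enumerate(prev_tokens):
        for c in negatives:
            loss = loss - torch.log(1.0 - probs[t, c] + 1e-8)
    return loss / max(len(prev_tokens), 1)

# toy usage: 3 decoding steps, vocab of 10; later steps should avoid repeats
logits = torch.randn(3, 10)
prev = [set(), {4}, {4, 7}]
print(token_unlikelihood_loss(logits, prev))
```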
Do Neural Language Models Overcome Reporting Bias?
Vered Shwartz and Yejin Choi
Mining commonsense knowledge from corpora suffers from reporting bias, over-representing the rare at the expense of the trivial (Gordon and Van Durme, 2013). We study to what extent pre-trained language models overcome this issue. We find that while they better estimate the probability of frequent actions, outcomes, and properties, they also tend to overestimate that of the very rare, amplifying the bias that already exists in their training corpus.
Do Word Embeddings Capture Spelling Variation?
Dong Nguyen and Jack Grieve
Analyses of word embeddings have primarily focused on semantic and syntactic properties. However, word embeddings have the potential to encode other properties as well. In this paper, we propose a new perspective on the analysis of word embeddings by focusing on spelling variation. In social media, spelling variation is abundant and often socially meaningful. Here, we analyze word embeddings trained on Twitter and Reddit data. We present three analyses using pairs of word forms covering seven types of spelling variation in English. Taken together, our results show that word embeddings encode spelling variation patterns of various types to some extent, even embeddings trained using the skipgram model which does not take spelling into account. Our results also suggest a link between the intentionality of the variation and the distance of the non-conventional spellings to their conventional spellings.
DocBank: A Benchmark Dataset for Document Layout Analysis
Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li and Ming Zhou
Document layout analysis usually relies on computer vision models to understand documents while ignoring textual information that is vital to capture. Meanwhile, high-quality labeled datasets with both visual and textual information are still insufficient. In this paper, we present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis. DocBank is constructed in a simple yet effective way with weak supervision from the LaTeX documents available on arXiv.com. With DocBank, models from different modalities can be compared fairly, and multi-modal approaches can be further investigated to boost the performance of document layout analysis. We build several strong baselines and manually split train/dev/test sets for evaluation. Experimental results show that models trained on DocBank accurately recognize the layout information for a variety of documents. The DocBank dataset will be publicly available at our GitHub repository soon.
Document-level Relation Extraction with Dual-tier Heterogeneous Graph
Zhenyu Zhang, Bowen Yu, Xiaobo Shu, Tingwen Liu, Hengzhu Tang, Wang Yubin and Li Guo
Document-level relation extraction (RE) poses new challenges over traditional sentence-level RE, since it requires an adequate comprehension of the whole document and multi-hop reasoning across multiple sentences to reach the final result. In this paper, we propose a novel graph-based model with a Dual-tier Heterogeneous Graph (DHG) for document-level RE. In particular, DHG is composed of a structure modeling layer followed by a relation reasoning layer, and its major advantage is that it is capable of not only capturing both the sequential and structural information of documents but also mixing them together to benefit multi-hop reasoning and the final decision-making. Furthermore, we employ a Graph Neural Network (GNN)-based message propagation strategy to accumulate information on the DHG. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on two widely used datasets, and further analyses suggest that all the modules in our model are indispensable for document-level RE.
Does Chinese BERT Encode Word Structure?
Yile Wang, Leyang Cui and Yue Zhang
Contextualized representations give significantly improved results for a wide range of NLP tasks. Much work has been dedicated to analyzing the features captured by representative models such as BERT. Existing work finds that syntactic, semantic and word sense knowledge are encoded in BERT. However, little work has investigated word features for character languages such as Chinese. We investigate Chinese BERT using both attention weight distribution statistics and probing tasks, finding that (1) word information is captured by BERT; (2) word-level features are mostly in the middle representation layers; (3) downstream tasks make different use of word features in BERT, with POS tagging and chunking relying the most on word features, and natural language inference relying the least on such features.
Does Gender Matter? Towards Fairness in Dialogue Systems
Haochen Liu, Jamell Dacon, Wenqi Fan, Hui Liu, Zitao Liu and Jiliang Tang
Recently, there have been increasing concerns about the fairness of Artificial Intelligence (AI) in real-world applications such as computer vision and recommendation. For example, recognition algorithms in computer vision have been shown to be unfair to Black people, for instance by poorly detecting their faces and inappropriately labeling them as "gorillas". As one crucial application of AI, dialogue systems have been extensively applied in our society. They are usually built with real human conversational data; thus they could inherit some fairness issues that hold in the real world. However, the fairness of dialogue systems has not been well investigated. In this paper, we perform a pioneering study of the fairness issues in dialogue systems. In particular, we construct a benchmark dataset and propose quantitative measures to understand fairness in dialogue models. Our studies demonstrate that popular dialogue models show significant prejudice towards different genders and races. Besides, to mitigate the bias exhibited in dialogue systems, we propose two effective debiasing methods. Experiments show that our methods can reduce the biases in dialogue systems significantly. We will release the dataset and the code to foster fairness research in dialogue systems upon the acceptance of the paper.
Domain Transfer based Data Augmentation for Neural Query Translation
Liang Yao, Baosong Yang, zhang haibo, Boxing Chen and Weihua Luo
Query translation (QT) is a critical factor in successful cross-lingual information retrieval (CLIR). Due to the lack of parallel query samples, neural QT models are usually optimized with synthetic data derived from large-scale monolingual queries. Nevertheless, such pseudo corpora are mostly produced by a general-domain translation model, making them insufficient to guide the learning of a QT model. In this paper, we extend data augmentation with a domain transfer procedure that revises synthetic candidates into search-aware examples. Specifically, the domain transfer model is built upon an advanced Transformer, in which layer coordination and mixed attention are exploited to speed up the refining process and leverage parameters from a pre-trained cross-lingual language model. In order to examine the effectiveness of the proposed method, we collected French-to-English and Spanish-to-English QT test sets, each of which consists of 10,000 parallel query pairs with careful manual checking. Qualitative and quantitative analyses reveal that our model significantly outperforms strong baselines and related domain transfer methods on both translation quality and retrieval accuracy.
Don't Invite BERT to Drink a Bottle: Modeling the Interpretation of Metonymies Using BERT and Distributional Representations
Paolo Pedinotti and Alessandro Lenci
In this work, we carry out two experiments in order to assess the ability of BERT to capture the meaning shift associated with metonymic expressions. We test the model on a new dataset that is representative of the most common types of metonymy. We compare BERT with the Structured Distributional Model (SDM), a model for the representation of words in context which is based on the notion of Generalized Event Knowledge. The results reveal that, while BERT's ability to deal with metonymy is quite limited, SDM is good at predicting the meaning of metonymic expressions, providing support for an account of metonymy based on event knowledge.
Don’t Patronize Me! An Annotated Dataset with Patronizing and Condescending Language towards Vulnerable Communities
Carla Perez Almendros, Luis Espinosa Anke and Steven Schockaert
In this paper, we introduce a new annotated dataset which is aimed at supporting the development of NLP models to identify and categorize language that is patronizing or condescending towards vulnerable communities (e.g. refugees, homeless people, poor families). While the prevalence of such language in the general media has long been shown to have harmful effects, it differs from other types of harmful language, in that it is generally used unconsciously and with good intentions. We furthermore believe that the often subtle nature of patronizing and condescending language (PCL) presents an interesting technical challenge for the NLP community. Our analysis of the proposed dataset shows that identifying PCL is indeed hard for standard NLP models, with language models such as BERT achieving the best results.
Don’t take “nswvtnvakgxpm” for an answer – The surprising vulnerability of automatic content scoring systems to adversarial input
Yuning Ding, Brian Riordan, Andrea Horbach, Aoife Cahill and Torsten Zesch
Automatic content scoring systems are widely used on short answer tasks to save human effort. However, the use of these systems can invite cheating strategies, such as students writing irrelevant answers in the hopes of gaining at least partial credit. We generate adversarial answers for benchmark content scoring datasets based on different methods of increasing sophistication and show that even simple methods lead to a surprising decrease in content scoring performance. As an extreme example, up to 60% of adversarial answers generated from random shuffling of words in real answers are accepted by a state-of-the-art scoring system. In addition to analyzing the vulnerabilities of content scoring systems, we examine countermeasures such as adversarial training and show that these measures improve system robustness against adversarial answers considerably but do not suffice to completely solve the problem.
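The simplest adversarial strategy described in the abstract above, shuffling the words of a real answer, can be illustrated in a few lines. The example answer and seed below are invented for demonstration; only the shuffling idea itself comes from the abstract.

```python
import random

def shuffle_adversarial(answer, seed=None):
    """Randomly permute the words of a real student answer, destroying its
    meaning while keeping its vocabulary (and hence many lexical features)."""
    rng = random.Random(seed)
    tokens = answer.split()
    rng.shuffle(tokens)
    return " ".join(tokens)

real = "the cell membrane controls what enters and leaves the cell"
print(shuffle_adversarial(real, seed=0))
# a robust scorer should give this permuted answer no credit
```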
DT-QDC: A Dataset for Question Comprehension in Online Test
Sijin Wu, Yujiu Yang, Nicholas Yung, Zhengchen Shen and Zeyang Lei
With the transformation of education from the traditional classroom environment to online education and assessment, it is more important than ever to accurately assess the difficulty of questions. As teachers may not be able to follow a student's performance and learning behavior closely, a well-defined method to measure the difficulty of questions to guide learning is necessary. In this paper, we explore the concept of question difficulty and provide our new Chinese DT-QDC dataset. This is currently the largest such Chinese question dataset, and the only one of its kind; it also has enriched attributes and difficulty labels. Additional attributes such as keywords, chapter, and question type would allow models to understand questions more precisely. We propose MTMS-BERT and ORMS-BERT, which can improve the judgment of difficulty from different views. The proposed methods outperform different baselines by 7.79% on F1-score, 15.92% on MAE, and 28.26% on MSE on the new DT-QDC dataset, laying the foundation for the question difficulty comprehension task.
Dual Attention Model for Citation Recommendation
Yang Zhang and Qiang Ma
With the exponentially increasing number of academic articles, discovering and citing comprehensive and appropriate resources has become a non-trivial task. Conventional citation recommendation methods suffer from severe information loss. For example, they do not consider the section on which a user is working, the relatedness between words, or the importance of words. These shortcomings make such methods insufficient for recommending adequate citations when working on manuscripts. In this study, we propose a novel approach called the dual attention model for citation recommendation (DACR) to recommend citations during manuscript preparation. Our method considers three dimensions of information: contextual words, contextual articles, and the section on which a user is working. The core of the proposed model is composed of self-attention and additive attention, where the former aims to capture the relatedness between input information, and the latter aims to learn the importance of inputs. Experiments on real-world datasets demonstrate the effectiveness of the proposed approach.
Dual Attention Network for Cross-lingual Entity Alignment
Jian Sun, Yu Zhou and Chengqing Zong
Cross-lingual entity alignment is an essential part of building a knowledge graph (KG) and helps integrate knowledge across KGs in different languages. In real-world KGs, there exists an imbalance in the information at the same hierarchy of corresponding entities, which results in heterogeneous neighborhood structure, making this task challenging. To tackle this problem, we propose a dual attention network for cross-lingual entity alignment (DAEA). Specifically, our dual attention consists of relation-aware graph attention and hierarchical attention. The relation-aware graph attention aims at selectively aggregating multi-hierarchy neighborhood information to alleviate the difference in heterogeneity among counterpart entities. The hierarchical attention adaptively aggregates the low-hierarchy and high-hierarchy information, which is beneficial to balance the neighborhood information of counterpart entities and distinguish non-counterpart entities with similar structures. Finally, we treat cross-lingual entity alignment as a link prediction process. Experimental results on three real-world cross-lingual entity alignment datasets show the effectiveness of DAEA.
Dual Dynamic Memory Network for End-to-End Multi-turn Task-oriented Dialog Systems
Jian Wang, Junhao Liu, Wei Bi, Xiaojiang Liu, Kejing He, Ruifeng Xu and Min Yang
Existing end-to-end task-oriented dialog systems struggle to dynamically model long dialog context for interactions and effectively incorporate knowledge base (KB) information into dialog generation. To conquer these limitations, we propose a Dual Dynamic Memory Network (DDMN) for multi-turn dialog generation, which maintains two core components: dialog memory manager and KB memory manager. The dialog memory manager dynamically expands the dialog memory turn by turn and keeps track of dialog history with an updating mechanism, which encourages the model to filter irrelevant dialog history and memorize important newly coming information. The KB memory manager shares the structural KB triples throughout the whole conversation, and dynamically extracts KB information with a memory pointer at each turn. Experimental results on three benchmark datasets demonstrate that DDMN significantly outperforms the strong baselines in terms of both automatic evaluation and human evaluation. Our code is available at https://github.com/siat-nlp/DDMN.
Dual Supervision Framework for Relation Extraction with Distant Supervision and Human Annotation
Woohwan Jung and Kyuseok Shim
Relation extraction (RE) has been extensively studied due to its importance in real-world applications such as knowledge base construction and question answering. Most existing works train models on either distantly supervised data or human-annotated data. To take advantage of the high accuracy of human annotation and the low cost of distant supervision, we propose a dual supervision framework which effectively utilizes both types of data. However, simply combining the two types of data to train an RE model may decrease prediction accuracy, since distant supervision has labeling bias. We employ two separate prediction networks, HA-Net and DS-Net, to predict the labels from human annotation and distant supervision, respectively, which prevents the degradation of accuracy caused by incorrect labels from distant supervision. Furthermore, we propose an additional loss term called the disagreement penalty to enable HA-Net to learn from distantly supervised labels. In addition, we exploit additional networks to adaptively assess the labeling bias by considering contextual information. Our performance study on sentence-level and document-level RE confirms the effectiveness of the dual supervision framework.
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation
Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab and Laurent Besacier
We introduce the dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST). Our models are based on the original Transformer architecture (Vaswani et al., 2017) but consist of two decoders, each responsible for one task (ASR or ST). Our major contribution lies in how these decoders interact with each other: one decoder can attend to different information sources from the other via a dual-attention mechanism. We propose two variants of these architectures corresponding to two different levels of dependency between the decoders, called the parallel and cross dual-decoder Transformers, respectively. We conduct extensive experiments on the MuST-C dataset. Results show that our models outperform the previously-reported results on joint ASR/ST decoding and also achieve the best reported performance on the same dataset in the multilingual setting. Our code will be made publicly available upon publication.
Dynamic Curriculum Learning for Low-Resource Neural Machine Translation
Chen Xu, Bojie Hu, Yufan Jiang, Kai Feng, Zeyang Wang, shen huang, Qi Ju, Tong Xiao and Jingbo Zhu
Large amounts of data have made neural machine translation (NMT) a big success in recent years. But it is still a challenge to train these models on small-scale corpora. In this case, the way data is used appears to be more important. Here, we investigate the effective use of training data for low-resource NMT. In particular, we propose a dynamic curriculum learning (DCL) method to reorder training samples during training. Unlike previous work, we do not use a static scoring function for reordering. Instead, the order of training samples is determined dynamically in two ways: loss decline and model competence. This eases training by highlighting easy samples that the current model has enough competence to learn. We test our DCL method in a Transformer-based system. Experimental results show that DCL outperforms several strong baselines on three low-resource machine translation benchmarks and on differently sized subsets of the WMT'16 En-De data.
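The dynamic reordering idea in the abstract above can be sketched with a simple scoring step. The function below is a hypothetical illustration: it ranks samples by how fast their loss is declining and exposes only the fraction allowed by a competence value in (0, 1]; the paper's actual scoring and scheduling differ in detail.

```python
def reorder_batch_pool(samples, prev_loss, curr_loss, competence):
    """Score each sample by its loss decline (larger drop = easier for the
    current model), then keep only the easiest fraction permitted by the
    current model competence, which grows during training."""
    scored = sorted(
        samples,
        key=lambda s: prev_loss[s] - curr_loss[s],
        reverse=True,
    )
    cutoff = max(1, int(len(scored) * competence))
    return scored[:cutoff]

# toy usage with three sentence-pair ids
samples = ["s1", "s2", "s3"]
prev = {"s1": 4.0, "s2": 3.0, "s3": 5.0}
curr = {"s1": 2.5, "s2": 2.9, "s3": 3.0}
print(reorder_batch_pool(samples, prev, curr, competence=0.67))  # ['s3', 's1']
```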
Dynamic Topic Tracker for KB-to-Text Generation
Zihao Fu, Lidong Bing, Wai Lam and Shoaib Jameel
Recently, many KB-to-text generation tasks have been proposed to bridge the gap between knowledge bases and natural language by directly converting a group of knowledge base triples into human-readable sentences. However, most of the existing models suffer from the off-topic problem, namely, the models are prone to generate some unrelated clauses that are somehow involved with certain input terms regardless of the given input data. This problem seriously degrades the quality of the generation results. In this paper, we propose a novel dynamic topic tracker for solving this problem. Different from existing models, our proposed model learns a global hidden representation for topics and recognizes the corresponding topic during each generation step. The recognized topic is used as additional information to guide the generation process and thus alleviates the off-topic problem. The experimental results show that our proposed model can enhance the performance of sentence generation and the off-topic problem is significantly mitigated.
Early Detection of Fake News by Utilizing the Credibility of News, Publishers, and Users based on Weakly Supervised Learning
Chunyuan Yuan, Qianwen Ma, Wei Zhou, Jizhong Han and Songlin Hu
The dissemination of fake news significantly affects personal reputation and public trust. Recently, fake news detection has attracted tremendous attention, and previous studies have mainly focused on finding clues in news content or diffusion paths. However, the features required by previous models are often unavailable or insufficient in early detection scenarios, resulting in poor performance. Thus, early fake news detection remains a tough challenge. Intuitively, news from trusted and authoritative sources, or shared by many users with a good reputation, is more reliable than other news. Using the credibility of publishers and users as prior weakly supervised information, we can quickly locate fake news among massive amounts of news and detect it in the early stages of dissemination.
Effective Few-Shot Classification with Transfer Learning
Aakriti Gupta, Kapil Thadani and Neil O'Hare
Few-shot learning addresses the problem of learning from a small amount of training data. Although it is better studied in the domain of computer vision, recent work has adapted the Amazon Review Sentiment Classification (ARSC) text dataset for use in the few-shot setting. In this work, we use the ARSC dataset to study a simple application of transfer learning approaches to few-shot classification. We train a single binary classifier to learn all few-shot classes jointly by prefixing class identifiers to the input text. Given the text and class, the model then makes a binary prediction for that text/class pair. Our results show that this simple approach can outperform most published results on this dataset. Surprisingly, we also show that including domain information as part of the task definition is not necessary, and removing it does not harm model accuracy. This last result suggests that the classes in the ARSC few-shot task, which are defined by the intersection of domain and rating, are actually very similar to each other, and that a more suitable dataset should be proposed for the study of few-shot text classification.
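The class-prefixing trick described in the abstract above is easy to picture as data construction. The identifier format, separator, and negative-sampling choice below are illustrative assumptions; only the idea of turning each example into binary text/class pairs comes from the abstract.

```python
def make_binary_examples(text, true_class, all_classes):
    """Turn one few-shot example into binary training pairs: the model sees
    '<class> [SEP] <text>' and predicts whether the pair matches."""
    examples = []
    for c in all_classes:
        examples.append((f"{c} [SEP] {text}", int(c == true_class)))
    return examples

pairs = make_binary_examples(
    "battery died after two days, very disappointed",
    true_class="electronics_negative",
    all_classes=["electronics_negative", "electronics_positive"],
)
for x, y in pairs:
    print(y, x)
```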
Effective Use of Target-side Context for Neural Machine Translation
Hideya Mino, Hitoshi Ito, Isao Goto, Ichiro Yamada and Takenobu Tokunaga
In this paper, we deal with two problems in Japanese-English machine translation of news articles. The first problem is the quality of parallel corpora. Neural machine translation (NMT) systems suffer degraded performance when trained with noisy data. Because there is no clean Japanese-English parallel data for news articles, we build a novel parallel news corpus consisting of Japanese news articles translated into English in a content-equivalent manner. This is the first content-equivalent Japanese-English news corpus translated specifically for training NMT systems. The second problem involves the domain-adaptation technique. NMT systems suffer degraded performance when trained with mixed data having different features, such as noisy data and clean data. Though the existing methods try to overcome this problem by using tags for distinguishing the differences between corpora, it is not sufficient. We thus extend a domain-adaptation method using multi-tags to train an NMT model effectively with the clean corpus and existing parallel news corpora with some types of noise. Experimental results show that our corpus increases the translation quality, and that our domain-adaptation method is more effective for learning with the multiple types of corpora than existing domain-adaptation methods are.
Embedding Dynamic Attributed Networks by Modeling the Evolution Processes
Zenan Xu, Zijing Ou, Qinliang Su, Jianxing Yu, Xiaojun Quan and ZhenKun Lin
Network embedding has recently emerged as a promising technique to embed nodes of a network into low-dimensional vectors. While fairly successful, most existing works focus on the embedding techniques for static networks. But in practice, there are many networks that are evolving over time and hence are dynamic, e.g., the social networks. To address this issue, a high-order spatio-temporal embedding model is developed to track the evolutions of dynamic networks. Specifically, an activeness-aware neighborhood embedding method is first proposed to extract the high-order neighborhood information at each given timestamp. Then, an embedding prediction framework is further developed to capture the temporal correlations, in which the attention mechanism is employed instead of recurrent neural networks (RNNs) for its efficiency in computing and flexibility in modeling. Extensive experiments are conducted on four real-world datasets from three different areas. It is shown that the proposed method outperforms all the baselines by a substantial margin for the tasks of dynamic link prediction and node classification, which demonstrates the effectiveness of the proposed methods on tracking the evolutions of dynamic networks.
Embedding Meta-Textual Information for Improved Learning to Rank
Toshitaka Kuwa, Shigehiko Schamoni and Stefan Riezler
Neural approaches to learning term embeddings have led to improved computation of similarity and ranking in information retrieval (IR). So far neural representation learning has not been extended to meta-textual information that is readily available for many IR tasks, for example, patent classes in prior-art retrieval, topical information in Wikipedia articles, or product categories in e-commerce data. We present a framework that learns embeddings for meta-textual categories, and optimizes a pairwise ranking objective for improved matching based on combined embeddings of textual and meta-textual information. We show considerable gains in an experimental evaluation on cross-lingual retrieval in the Wikipedia domain for three language pairs, and in the Patent domain for one language pair. Our results emphasize that the mode of combining different types of information is crucial for model improvement.
Embedding Semantic Taxonomies
Alyssa Lees, Chris Welty, Shubin Zhao and Jacek Korycki
A common step in developing an understanding of a vertical domain, e.g. shopping, dining, movies, medicine, etc., is curating a taxonomy of categories specific to the domain. These human created artifacts have been the subject of research in embeddings that attempt to encode aspects of the partial ordering property of taxonomies. We compare Box Embeddings, a natural containment representation of category taxonomies, to partial-order embeddings and a baseline Bayes Net, in the context of representing the Medical Subject Headings (MeSH) taxonomy given a set of 300K PubMed articles with subject labels from MeSH. We deeply explore the experimental properties of training box embeddings, including preparation of the training data, sampling ratios and class balance, initialization strategies, and propose a fix to the original box objective. We then present first results in using these techniques for representing a bipartite learning problem (i.e. collaborative filtering) in the presence of taxonomic relations within each partition, inferring disease (anatomical) locations from their use as subject labels in journal articles. Our box model substantially outperforms all baselines for taxonomic reconstruction and bipartite relationship experiments. This performance improvement is observed both in overall accuracy and the weighted spread by true taxonomic depth.
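As background for the containment intuition behind box embeddings mentioned in the abstract above, here is a minimal sketch of a box containment score: the fraction of a child box's volume that lies inside a parent box. The corners below are invented toy values, not trained MeSH embeddings, and the paper's training objective is not shown.

```python
import numpy as np

def box_volume(lo, hi):
    # volume of an axis-aligned box, zero if it is degenerate
    return np.prod(np.clip(hi - lo, 0.0, None))

def containment_score(child, parent):
    """Fraction of the child box contained in the parent box; 1.0 means the
    child concept is fully inside the parent category."""
    c_lo, c_hi = child
    p_lo, p_hi = parent
    inter_lo = np.maximum(c_lo, p_lo)
    inter_hi = np.minimum(c_hi, p_hi)
    return box_volume(inter_lo, inter_hi) / box_volume(c_lo, c_hi)

disease = (np.array([0.1, 0.1]), np.array([0.8, 0.9]))    # broad category box
influenza = (np.array([0.2, 0.3]), np.array([0.4, 0.5]))  # narrower concept box
print(containment_score(influenza, disease))  # 1.0: fully contained
```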
Emergent Communication Pretraining for Few-Shot Machine Translation
Yaoyiran Li, Edoardo Maria Ponti, Ivan Vulić and Anna Korhonen
While state-of-the-art models that rely upon massively multilingual pre-trained encoders achieve sample efficiency in downstream applications, they still require abundant amounts of unlabelled text. Yet, most of the world's languages lack such resources. Hence, we investigate a more radical form of unsupervised knowledge transfer in the absence of linguistic data. In particular, we pre-train neural networks via emergent communication from referential games and show that this benefits machine translation in few-shot settings. Contrary to the arbitrary nature of the lexicon, communication grounded in images, as a crude approximation of real-world environments, inductively biases the model towards learning natural languages. In addition, in order to enhance knowledge transfer, we introduce a customised Adapter layer and annealing strategies for the regulariser of maximum-a-posteriori inference during fine-tuning. Compared to a recurrent baseline, our method yields gains of 59.0% to 147.6% in BLEU score with only 500 NMT training instances, and 65.1% to 196.7% with 1,000 NMT training instances, across 4 language pairs. Not only does this demonstrate for the first time the usefulness of emergent communication for natural language applications, it also provides an extrinsic evaluation for the properties of emergent languages, measuring the influence of communication success and sequence length on downstream performance.
Emotion Classification by Jointly Learning to Lexiconize and Classify
Deyu Zhou, Shuangzhi Wu, Qing Wang, Jun Xie, Zhaopeng Tu and Mu Li
Emotion lexicons have been shown effective for emotion classification (Baziotis et al., 2018). Previous studies handle emotion lexicon construction and emotion classification separately. In this paper, we propose an emotional network (EmNet) to jointly learn sentence emotions and construct emotion lexicons which are dynamically adapted to a given context. The dynamic emotion lexicons are useful for handling words with multiple emotions based on different context, which can effectively improve the classification accuracy. We validate the approach on two representative architectures – LSTM and BERT, demonstrating its superiority on identifying emotions in Tweets. Our model outperforms several approaches proposed in previous studies and achieves new state-of-the-art on the benchmark Twitter dataset.
EmpDG: Multi-resolution Interactive Empathetic Dialogue Generation
Qintong Li, Hongshen Chen, Zhaochun Ren, Pengjie Ren, Zhaopeng Tu and Zhumin CHEN
A humanized dialogue system is expected to generate empathetic replies, which should be sensitive to the users' expressed emotion. The task of empathetic dialogue generation is proposed to address this problem. The essential challenges lie in accurately capturing the nuances of human emotion and considering the potential of user feedback, which are overlooked by the majority of existing work. In response to this problem, we propose a multi-resolution adversarial model -- EmpDG, to generate more empathetic responses. EmpDG exploits both the coarse-grained dialogue-level and fine-grained token-level emotions, the latter of which helps to better capture the nuances of user emotion. In addition, we introduce an interactive adversarial learning framework which exploits the user feedback, to identify whether the generated responses evoke emotion perceptivity in dialogues. Experimental results show that the proposed approach significantly outperforms the state-of-the-art baselines in both content quality and emotion perceptivity.
Enabling Interactive Transcription in an Indigenous Community
Eric Le Ferrand, Steven Bird and Laurent Besacier
We present a new transcription workflow which combines spoken term detection and native speaker expertise. This work is grounded in an almost zero-resource scenario in which only a few terms have so far been identified, involving endangered languages of Africa and Australia.
Encoding Lexico-Semantic Knowledge using Ensembles of Feature Maps from Deep Convolutional Neural Networks
Steven Derby, Paul Miller and Barry Devereux
Semantic models derived from visual information have helped to overcome some of the limitations of solely text-based distributional semantic models. Researchers have demonstrated that text and image-based representations encode complementary semantic information, which when combined provide a more complete representation of word meaning, in particular when compared with data on human conceptual knowledge. In this work, we reveal that these vision-based representations, whilst quite effective, do not make use of all the semantic information available in the neural network that could be used to inform vector-based models of semantic representation. Instead, we build image-based meta-embeddings from computer vision models, which can incorporate information from all layers of the network, and show that they encode a richer set of semantic attributes and yield a more complete representation of human conceptual knowledge.
End to End Chinese Lexical Fusion Recognition with Sememe Knowledge
Yijiang Liu, Meishan Zhang and Donghong Ji
In this paper, we present Chinese lexical fusion recognition, a new task which could be regarded as one kind of coreference recognition. First, we introduce the task in detail, showing the relationship with coreference recognition and differences from the existing tasks. Second, we propose an end-to-end model for the task, handling mentions as well as coreference relationship jointly. The model exploits the state-of-the-art contextualized BERT representations as an encoder, and is further enhanced with the sememe knowledge from HowNet by graph attention networks. We manually annotate a benchmark dataset for the task and then conduct experiments on it. Results demonstrate that our final model is effective and competitive for the task. Detailed analysis is offered for comprehensively understanding the new task and our proposed model.
End-to-End Emotion-Cause Pair Extraction with Graph Convolutional Network
Ying Chen, Wenjun Hou and Xiaoqiang Zhang
Emotion-cause pair extraction (ECPE), which aims at simultaneously extracting emotion-cause pairs that express emotions and their corresponding causes in a document, plays a vital role in understanding natural language. Considering that most emotions usually have few causes mentioned in their contexts, we present a novel end-to-end Pair Graph Convolutional Network (PairGCN) to model pair-level contexts so as to capture the dependency information among local neighborhood candidate pairs. Moreover, in the graphical network, contexts are grouped into three types, and each type of context is propagated in its own way. Experiments on a benchmark Chinese emotion-cause pair extraction corpus demonstrate the effectiveness of the proposed model.
Enhancing Clinical BERT Embedding using a Biomedical Knowledge Base
Boran Hao, Henghui Zhu and Ioannis Paschalidis
Domain knowledge is important for building Natural Language Processing (NLP) systems for low-resource settings, such as in the clinical domain. In this paper, a novel joint training method is introduced for adding knowledge available in the Unified Medical Language System (UMLS) into the language model pre-training procedure used for a variety of NLP tasks applied to clinical reports. We tested the models on two different downstream clinical NLP tasks using the pre-trained language models and achieved performance surpassing the current state-of-the-art. Specifically, in a natural language inference task applied to clinical texts, our knowledge base pre-training approach improves accuracy by up to 1.4%, whereas in clinical named entity recognition tasks, the F1-score improves by up to 1.4%.
Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks
Peng Cui, Le Hu and Yuanchao Liu
Text summarization aims to compress a textual document into a short summary while keeping salient information. Extractive approaches are widely used in text summarization because of their fluency and efficiency. However, most existing extractive models hardly capture inter-sentence relationships, particularly in long documents. They also often ignore the effect of topical information on capturing important content. To address these issues, this paper proposes a graph neural network (GNN)-based extractive summarization model, enabling it to capture inter-sentence relationships efficiently via a graph-structured document representation. Moreover, our model integrates a joint neural topic model (NTM) to discover latent topics, which can provide document-level features for sentence selection. The experimental results demonstrate that our model not only achieves state-of-the-art results on the CNN/DM and NYT datasets but also considerably outperforms existing approaches on scientific paper datasets consisting of much longer documents, indicating its better robustness across document genres and lengths. Further discussion shows that topical information can help the model preselect salient content from an entire document, which explains its effectiveness in long document summarization.
Enhancing Neural Models with Vulnerability via Adversarial Attack
Rong Zhang, Qifei Zhou, Bo Wu, Bo An, Weiping Li and Tong Mo
Natural language sentence matching (NLSM) serves as the core of many natural language processing tasks. 1) Most previous work develops a single specific neural model for NLSM tasks. 2) No previous work has considered adversarial attack as a means to improve the performance of NLSM tasks. 3) Adversarial attack is usually used to generate adversarial samples that can fool neural models. In this paper, we first observe a phenomenon that different categories of samples have different vulnerabilities. Vulnerability is the degree of difficulty in changing the label of a sample. Considering this phenomenon, we propose a general two-stage training framework to enhance neural models with Vulnerability via Adversarial Attack (VAA). We design a criterion to measure the vulnerability, which is obtained by adversarial attack. The VAA framework can be adapted to various neural models by incorporating the vulnerability. In addition, we prove a theorem and four corollaries to explain the factors influencing vulnerability effectiveness. Experimental results show that VAA significantly improves the performance of neural models on NLSM datasets. The results are also consistent with the theorem and corollaries.
Evaluating Pretrained Transformer-based Models on the Task of Fine-Grained Named Entity Recognition
Cedric Lothritz, Kevin Allix, Lisa Veiber, Tegawendé F. Bissyandé and Jacques Klein
Named Entity Recognition (NER) is a fundamental Natural Language Processing (NLP) task and has remained an active research field. In recent years, transformer models and more specifically the BERT model developed at Google revolutionised the field of NLP. While the performance of transformer-based approaches such as BERT has been studied for NER, there has not yet been a study for the fine-grained Named Entity Recognition (FG-NER) task. In this paper, we compare three transformer-based models (BERT, RoBERTa, and XLNet) to two non-transformer-based models (CRF and BiLSTM-CNN-CRF). Furthermore, we apply each model to a multitude of distinct domains. We find that transformer-based models incrementally outperform the studied non-transformer-based models in most domains with respect to the F1 score. Furthermore, we find that the choice of domains significantly influenced the performance regardless of the respective data size or the model chosen.
Evaluating Unsupervised Representation Learning for Detecting Stances of Fake News
Maike Guderlei and Matthias Aßenmacher
Our goal is to evaluate the usefulness of unsupervised representation learning techniques for detecting stances of Fake News. We therefore examine several pre-trained language models with respect to their performance on two Fake News related data sets, both consisting of instances with a headline, an associated news article and the stance of the article towards the respective headline. Specifically, the aim is to understand how much hyperparameter tuning is necessary when fine-tuning the pre-trained architectures, how well transfer learning works in this specific case of stance detection, and how sensitive the models are to changes in hyperparameters like batch size, learning rate (schedule), sequence length, as well as the freezing technique. The results indicate that the computationally more expensive autoregression approach of XLNet (Yang et al., 2019) is outperformed by BERT-based models, notably by RoBERTa (Liu et al., 2019). While the learning rate seems to be the most important hyperparameter, experiments with different freezing techniques indicate that all evaluated architectures had already learned powerful language representations that pose a good starting point for fine-tuning them.
Event coreference resolution based on event-specific paraphrases and argument-aware semantic embeddings
Yutao Zeng, Xiaolong Jin, Saiping Guan, Jiafeng Guo and Xueqi Cheng
Event coreference resolution aims to determine whether several event mentions refer to the same event, which is necessary for information aggregation and many downstream applications. To resolve event coreference, existing methods usually calculate the similarities between event mentions and between their arguments. However, they fail to capture deep event paraphrase features and may suffer from error propagation. Therefore, we propose an Event-specific Paraphrases and Argument-aware Semantic Embeddings enhanced model (EPASE) for event coreference resolution. EPASE recognizes deep paraphrase features in an event-specific context and thus improves its generalization ability. Additionally, the embeddings of argument roles are encoded into the event embedding without relying on a fixed number or type of arguments, which results in better scalability of EPASE. Experiments on both within- and cross-document event coreference demonstrate its consistent and significant superiority compared to existing methods.
Event-Guided Denoising for Multilingual Relation Learning
Amith Ananthram, Emily Allaway and Kathleen McKeown
General purpose relation extraction has recently seen considerable gains in part due to a massively data-intensive distant supervision technique from Soares et al. (2019) that produces state-of-the-art results across several benchmarks. In this work, we present a denoising methodology for collecting relation training data from unlabeled text which achieves a near-recreation of their zero-shot and few-shot results at a fraction of the training cost. Our approach exploits the predictable distributional structure of news corpora to extract fewer, higher quality examples. We train a smaller multilingual language model encoder on these examples and find that it performs comparably to theirs (when both receive little to no fine-tuning) on few-shot and standard relation benchmarks in English and Spanish despite using many fewer examples (50k vs. 300mil+).
Expert Concept-Modeling Ground Truth Construction for Word Embeddings Evaluation in Concept-Focused Domains
Arianna Betti, Martin Reynaert, Thijs Ossenkoppele, Yvette Oortwijn, Andrew Salway and Jelke Bloem
We present a novel, domain expert-controlled, replicable procedure for the construction of concept-modeling ground truths with the aim of evaluating the application of word embeddings in concept-focused textual domains. We illustrate the procedure by describing the construction of a threefold expert ground truth built to answer research questions concerning the concept of science in the Quine corpus, a 2-million-token, single-author, 20th-century English philosophy corpus of outstanding quality, cleaned up and enriched for the purpose. To the best of our knowledge, expert concept-modeling ground truths are extremely rare in the current literature, nor has the theoretical methodology behind their construction ever been explicitly conceptualised and properly systematised. Expert-controlled concept-modeling ground truths are, however, essential to allow proper evaluation of word embedding techniques, and to increase their trustworthiness in specialised domains in which the detection of concepts through their expression in texts is important. We highlight challenges, requirements, and prospects for future work.
Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering
Quan Hung Tran, Nhan Dam, Tuan Lai, Franck Dernoncourt, Trung Le, Nham Le and Dinh Phung
Interpretability and explainability of deep neural models are always challenging due to their size and complexity. Many previous works have focused on visualizing internal components of neural networks to represent them through human-friendly concepts. On the other hand, in real life, when making a decision, humans tend to rely on similar situations in the past. Thus, we argue that one potential approach to making a model interpretable and explainable is to design it in such a way that the model explicitly connects the current sample with seen samples, and bases its decision on these samples. In this work, we design one such model: an explainable, evidence-based memory network architecture, which learns to summarize the dataset and extract supporting evidence to make its decision. The model achieves state-of-the-art performance on two popular question answering datasets, the TrecQA dataset and the WikiQA dataset. Via further analysis, we show that this model can reliably trace the errors it has made in the validation step back to the training instances that might have caused them. We believe that this error-tracing capability can be beneficial for improving dataset quality in many applications.
Explainable and Sparse Representations of Academic Articles for Knowledge Exploration
Keng-Te Liao, Zhihong Shen, Chiyuan Huang, Chieh-Han Wu, PoChun Chen, Kuansan Wang and Shou-de Lin
We focus on a recently deployed system which is built for summarizing academic articles by concept tagging. The system leverages knowledge acquired from millions of publications to present high accuracy and coverage of concept identification. Provided with the interpretability and knowledge encoded in a pre-trained neural model, we investigate whether the tagged concepts can be applied to a broader class of applications. We propose transforming the tagged concepts into sparse vectors as representations of academic documents. The effectiveness of the representations is then analyzed theoretically by a proposed framework. Also, we empirically show that the representations can show advantages on academic topic discovery and paper recommendation. On these applications, we reveal that the knowledge encoded in the tagging system can be effectively utilized and can help infer additional features from data with limited information.
Explainable Automated Fact-Checking: A Survey
Neema Kotonya and Francesca Toni
A number of advances have been made in automated fact-checking recently: thanks to larger datasets and more powerful systems, we have seen improvements in the complexity of claims which can be accurately fact-checked. However, despite these advances, there are still desirable functionalities missing from the fact-checking pipeline. In this survey, we focus on the explanation functionality, that is, fact-checking systems providing reasoning for their predictions. We summarize existing methods for explaining the predictions of fact-checking systems and explore trends in this topic area. Further, we consider what makes for good explanations in this specific domain through a comparative analysis of existing fact-checking explanations against some desirable properties. Finally, we propose further research directions for generating fact-checking explanations, and describe how these research directions may lead to improvements in the area.
Exploiting a lexical resource for discourse connective disambiguation in German
Peter Bourgonje and Manfred Stede
In this paper we focus on connective identification and sense classification for explicit discourse relations in German, as two individual sub-tasks of the overarching Shallow Discourse Parsing task. We successively augment a purely empirical approach based on contextualised embeddings with linguistic knowledge encoded in a connective lexicon. In this way, we improve over published results for connective identification, achieving a final F1-score of 87.93; and we introduce, to the best of our knowledge, the first results for German sense classification, achieving an F1-score of 87.13. Our approach demonstrates that a connective lexicon can be a valuable resource for languages that do not have a large PDTB-style annotated corpus available.
Exploiting Microblog Conversation Structures to Detect Rumors
Jiawen Li, Yudianto Sujana and Hung-Yu Kao
As one of the most popular social media platforms, Twitter has become a primary source of information for many people. Unfortunately, both valid information and rumors are propagated on Twitter due to the lack of an automatic information verification system. Twitter users communicate by replying to other users' messages, forming a conversation structure. Using this structure, users can decide whether the information in a source tweet is a rumor by reading the tweet's replies, which voice other users' stances on the tweet. The majority of rumor detection researchers process such tweets based on time, ignoring the conversation structure. To reap the benefits of the Twitter conversation structure, we developed a model to detect rumors by modeling the conversation structure as a graph. Thus, our model's improved representation of the conversation structure enhances its rumor detection accuracy. The experimental results on two rumor datasets show that our model outperforms several baseline models, including a state-of-the-art model.
Exploiting Narrative Context and a Priori Knowledge of Categories in Textual Emotion Classification
Hikari Tanabe, Tetsuji Ogawa, Tetsunori Kobayashi and Yoshihiko Hayashi
Recognition of the mental state of a human character in text is a major challenge in natural language processing. In this study, we investigate the efficacy of the narrative context in recognizing the emotional states of human characters in text and discuss an approach to make use of a priori knowledge regarding the employed emotion category system. Specifically, we experimentally show that the accuracy of emotion classification is substantially increased by encoding the preceding context of the target sentence using a BERT-based text encoder. We also compare ways to incorporate a priori knowledge of emotion categories by altering the loss function used in training, in which our proposal of multi-task learning that jointly learns to classify positive/negative polarity of emotions is included. The experimental results suggest that, when using Plutchik's Wheel of Emotions, it is better to jointly classify the basic emotion categories with positive/negative polarity rather than directly exploiting its characteristic structure in which eight basic categories are arranged in a wheel.
Exploiting Node Content for Multiview Graph Convolutional Network and Adversarial Regularization
Qiuhao Lu, Nisansa de Silva, Dejing Dou, Thien Huu Nguyen, Prithviraj Sen, Berthold Reinwald and Yunyao Li
Network representation learning (NRL) is crucial in the area of graph learning. Recently, graph autoencoders and its variants have gained much attention and popularity among various types of node embedding approaches. Most existing graph autoencoder-based methods aim to minimize the reconstruction errors of the input network while not explicitly considering the semantic relatedness between nodes. In this paper, we propose a novel network embedding method which models the consistency across different views of networks. More specifically, we create a second view from the input network which captures the relation between nodes based on node content and enforce the latent representations from the two views to be consistent by incorporating a multiview adversarial regularization module. The experimental studies on benchmark datasets prove the effectiveness of this method, and demonstrate that our method compares favorably with the state-of-the-art algorithms on challenging tasks such as link prediction and node clustering. We also evaluate our method on a real-world application, i.e., 30-day unplanned ICU readmission prediction, and achieve promising results compared with several baseline methods.
Exploring Amharic Sentiment Analysis from Social Media Texts: Building Annotation Tools and Classification Models
Seid Muhie Yimam, Hizkiel Mitiku Alemayehu, Abinew Ayele and Chris Biemann
In this paper, we present the first study of sentiment analysis for Amharic social media texts. As the number of social media users is ever-increasing, social media platforms would like to understand the latent meaning and sentiments of contents to enhance the decision-making procedure. However, low-resource languages such as Amharic have received less attention due to several reasons such as the lack of well-annotated datasets, the unavailability of computing resources, and few or no expert researchers in the area. This research addresses three main research questions. We first explore the suitability of existing tools for the sentiment analysis task. There are no tools to support large-scale annotation tasks in Amharic, and the existing crowdsourcing platforms do not support Amharic text annotation. Hence, we build a social-network-friendly annotation tool called 'ASAB' using the Telegram bot. We have collected 9.4k tweets, where each tweet is annotated by three Telegram users. Secondly, we explore which machine learning approach works better for Amharic sentiment analysis. The FLAIR deep learning text classifier, based on network embeddings that are computed from a distributional thesaurus, outperforms other supervised classifiers. Lastly, we investigate the challenges in building a sentiment analysis system for Amharic and find that the widespread usage of sarcasm and figurative speech are the main issues in dealing with the problem. To advance sentiment analysis research in Amharic and other related low-resource languages, we release the dataset, the annotation tool, source codes, and models publicly under a permissive license.
Exploring Controllable Text Generation Techniques
Shrimai Prabhumoye, Alan W Black and Ruslan Salakhutdinov
Neural controllable text generation is an important area gaining attention due to its plethora of applications. In this work, we provide a new schema of the pipeline of the generation process by classifying it into five modules. We present an overview of the various techniques used to modulate each of these modules to provide control over attributes in the generation process. We also provide an analysis of the advantages and disadvantages of these techniques and pave the way to develop new architectures based on the combination of the modules described in this paper.
Exploring Cross-sentence Contexts for Named Entity Recognition with BERT
Jouni Luoma and Sampo Pyysalo
Named entity recognition (NER) is frequently addressed as a sequence classification task with each input consisting of one sentence of text. It is nevertheless clear that useful information for NER is often found elsewhere in the text as well. Recent self-attention models like BERT can both capture long-distance relationships in input and represent inputs consisting of several sentences. This creates opportunities for adding cross-sentence information in natural language processing tasks. This paper presents a systematic study exploring the use of cross-sentence information for NER using BERT models in five languages. We find that adding context as additional sentences to BERT input systematically increases NER performance. Having multiple sentences in input samples also allows us to study the predictions of the sentences in different contexts. We propose a straightforward method, Contextual Majority Voting (CMV), to combine these different predictions and demonstrate that this further increases NER performance. Evaluation on established datasets, including the CoNLL'02 and CoNLL'03 NER benchmarks, demonstrates that our proposed approach can improve on the state-of-the-art NER results on English, Dutch, and Finnish, achieves the best reported BERT-based results on German, and is on par with other BERT-based approaches in Spanish. We release all methods implemented in this work under open licenses.
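The following is a rough sketch of the intuition behind Contextual Majority Voting: the same sentence is tagged inside several different context windows and the per-token labels are combined by majority vote. The function and data are illustrative; the paper's exact procedure may differ.

```python
# Illustrative sketch of combining per-token predictions from different contexts.
from collections import Counter

def contextual_majority_vote(predictions_per_context):
    """predictions_per_context: list of label sequences for one sentence,
    each produced with a different surrounding context."""
    n_tokens = len(predictions_per_context[0])
    voted = []
    for i in range(n_tokens):
        votes = Counter(pred[i] for pred in predictions_per_context)
        voted.append(votes.most_common(1)[0][0])
    return voted

print(contextual_majority_vote([
    ["B-PER", "O", "B-LOC"],
    ["B-PER", "O", "O"],
    ["B-PER", "B-ORG", "B-LOC"],
]))  # ['B-PER', 'O', 'B-LOC']
```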
Exploring End-to-End Differentiable Natural Logic Modeling
Yufei Feng, Zi'ou Zheng, Quan Liu, Michael Greenspan and Xiaodan Zhu
This paper presents a differentiable natural logic framework that integrates natural logic with neural networks. The framework performs inference based on natural logic and leverages the powerful modeling capabilities of neural networks and subsymbolic representation to enhance the learning process. Experiments show that the proposed framework can effectively model monotonicity-based reasoning, compared to existing models without built-in inductive bias for monotonicity reasoning. We provide detailed analysis on intermediate aggregation modeling.
Exploring Question-Specific Rewards for Generating Deep Questions
Yuxi Xie, Liangming Pan, Dongzhe Wang, Min-Yen Kan and Yansong Feng
Recent question generation (QG) approaches often utilize the sequence-to-sequence framework (Seq2Seq) to optimize the log likelihood of ground-truth questions using teacher forcing. However, this training objective is inconsistent with actual question quality, which is often reflected by certain global properties such as whether the question can be answered by the document. We propose to optimize directly for QG-specific objectives via reinforcement learning (RL) to improve question quality. We design three different rewards that aim to improve the fluency, relevance, and answerability of generated questions. We conduct both automatic and human evaluations, in addition to a thorough analysis, to explore the effect of each QG-specific reward. We find that optimizing on question-specific rewards generally leads to better performance in automatic evaluation metrics. However, only the rewards that correlate well with human judgement (e.g., relevance) lead to real improvement in question quality. Optimizing for the others, especially answerability, introduces incorrect bias to the model, resulting in poorer question quality.
Exploring the Language of Data
Gábor Bella, Linda Gremes and Fausto Giunchiglia
We set out to uncover the unique grammatical properties of an important yet so far under-researched type of natural language text: that of short labels typically found within structured datasets. We show that such labels obey a specific type of abbreviated grammar that we call the Language of Data, with properties significantly different from the kinds of text typically addressed in computational linguistics and NLP, such as 'standard' written language or social media messages. We analyse orthography, parts of speech, and syntax over a large, bilingual, hand-annotated corpus of data labels collected from a variety of domains. We perform experiments on tokenisation, part-of-speech tagging, and named entity recognition over real-world structured data, demonstrating that models adapted to the Language of Data outperform those trained on standard text. These observations point in a new direction to be explored as future research, in order to develop new NLP tools and models dedicated to the Language of Data.
Exploring the Value of Personalized Word Embeddings
Charles Welch, Jonathan K. Kummerfeld, Verónica Pérez-Rosas and Rada Mihalcea
In this paper, we introduce personalized word embeddings, and examine their value for language modeling. We compare the performance of our proposed prediction model when using personalized versus generic word representations, and study how these representations can be leveraged for improved performance. We provide insight into what types of words can be more accurately predicted when building personalized models. Our results show that a subset of words belonging to specific psycholinguistic categories tend to vary more in their representations across users and that combining generic and personalized word embeddings yields the best performance, with a 4.7% relative reduction in perplexity. Additionally, we show that a language model using personalized word embeddings can be effectively used for authorship attribution.
Exploring the zero-shot limit of FewRel
Alberto Cetoli
This paper proposes a general purpose relation extractor that uses Wikidata descriptions to represent the relation's surface form. The results are tested on the FewRel 1.0 dataset, which provides an excellent framework for training and evaluating the proposed zero-shot learning system in English. This relation extractor architecture exploits the implicit knowledge of a language model through a question-answering approach.
Extracting Adherence Information from Electronic Health Records
Jordan Sanders, Meghana Gudala, Kathleen Hamilton, Nishtha Prasad, Jordan Stovall, Eduardo Blanco, Jane E Hamilton and Kirk Roberts
Patient adherence is a critical factor in health outcomes. We present a framework to extract adherence information from electronic health records, including both sentence-level information indicating general adherence information (full, partial, none, etc.) and span-level information providing additional information such as adherence type (medication or nonmedication), reasons and outcomes. We annotate and make publicly available a new corpus of 3,000 de-identified sentences, and discuss the language physicians use to document adherence information. We also explore models based on state-of-the-art transformers to automate both tasks.
Fact vs. Opinion: the Role of Argumentative Features in News Classification
Tariq Alhindi, Smaranda Muresan and Daniel Preotiuc-Pietro
Editorial news articles aim to persuade readers of an opinion and, in an era of widespread digital misinformation, it is paramount for readers to be able to easily distinguish these from news stories reporting factual events. Argumentative discourse is a key distinctive feature of editorials and is likely more resilient across different topics or publishers. We study classifying articles into news stories and opinion using models that aim to supplement the article content representation with argumentative features. We show that argumentative features outperform linguistic features used previously and improve on fine-tuned transformer-based models when tested on data from publishers unseen in training. Automatically flagging argumentative news aids applications such as fact-checking by identifying fact-based articles and separating them from opinion-based ones that carry the writer's personal point of view.
Fact-level Extractive Summarization with Hierarchical Graph Mask on BERT
Ruifeng Yuan, Zili Wang and Wenjie Li
Most current extractive summarization models generate summaries by selecting salient sentences. However, one of the problems with sentence-level extractive summarization is that there exists a gap between the human-written gold summary and the oracle sentence labels. In this paper, we propose to extract fact-level semantic units for better extractive summarization. We also introduce a hierarchical structure, which incorporates multiple levels of granularity of the textual information into the model. In addition, we combine our model with BERT using a hierarchical graph mask. This allows us to exploit BERT's ability in natural language understanding together with the structural information without increasing the scale of the model. Experiments on the CNN/DailyMail dataset show that our model achieves state-of-the-art results.
Facts2Story: Controlling Text Generation by Key Facts
Eyal Orbach and Yoav Goldberg
Recent advancements in self-attention neural network architectures have raised the bar for open-ended text generation. Yet, while current methods are capable of producing a coherent text which is several hundred words long, attaining control over the content that is being generated, as well as evaluating it, are still open questions. We propose a controlled generation task which is based on expanding a sequence of facts, expressed in natural language, into a longer narrative. We introduce human-based evaluation metrics for this task, as well as a method for deriving a large training dataset.
Fair Evaluation in Concept Normalization: a Large-scale Comparative Analysis for BERT-based Models
Elena Tutubalina, Zulfat Miftahutdinov and Artur Kadurin
Linking of biomedical entity mentions to various terminologies of drugs, diseases, and targets is a challenging task, often requiring non-syntactic interpretation. A large number of biomedical corpora and state-of-the-art models have been introduced in the past five years. However, there are no general guidelines regarding the evaluation of models on these corpora in single- and cross-terminology settings. In this work, we present a fine-grained evaluation intended to perform a comparative evaluation of the various benchmarks and understand the efficiency of state-of-the-art neural architectures based on Bidirectional Encoder Representations from Transformers (BERT) for linking of three entity types across three domains, namely scientific abstracts, drug labels and user-generated texts on drug therapy in English.
FASTMATCH: Accelerating the Inference of BERT-based Text Matching
Shuai Pang, Jianqiang Ma, Zeyu Yan, Yang Zhang and Jianping Shen
Recently, pre-trained language models such as BERT have shown state-of-the-art accuracies in text matching. When applied to IR (or QA), BERT-based matching models need to calculate the representations and interactions for all query-candidate pairs online. The high inference cost has prohibited the deployment of BERT-based matching models in many practical applications. To address this issue, we propose a novel BERT-based text matching model in which the representations and the interactions are decoupled. The representations of the candidates can then be calculated and stored offline, and directly retrieved during the online matching phase. To conduct the interactions and generate final matching scores, a lightweight attention network is designed. Experiments based on several large-scale text matching datasets show that the proposed model, called FASTMATCH, can achieve up to a 100X speed-up over BERT and RoBERTa at the online matching phase, while retaining up to 98.7% of the performance.
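A schematic sketch of the decoupling idea, under the assumption that candidate representations are pre-computed and cached offline while only a lightweight interaction runs online; the placeholder encoder and the scaled dot-product scoring stand in for the paper's trained components.

```python
# Rough sketch: offline candidate caching + lightweight online scoring.
import hashlib
import numpy as np

def encode(text, dim=64):
    # Placeholder encoder; a BERT-style model would be used in practice.
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    return np.random.default_rng(seed).standard_normal(dim)

# Offline phase: encode and store all candidates once.
candidates = ["doc about payments", "doc about refunds", "doc about shipping"]
cache = {c: encode(c) for c in candidates}

# Online phase: encode the query once, then score against cached vectors.
q = encode("how do I get a refund")
scores = {c: float(q @ v) / np.sqrt(len(q)) for c, v in cache.items()}
print(max(scores, key=scores.get))
```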
Federated Learning for Spoken Language Understanding
Zhiqi Huang, Fenglin Liu and Yuexian Zou
Recently, spoken language understanding (SLU) has attracted extensive interest from both academia and industry, and various SLU datasets have been proposed to promote its development. However, most existing methods focus on a single individual dataset; the effort to improve the robustness of models and obtain better performance by combining the merits of various datasets is not well studied. In this paper, we argue that if these SLU datasets are considered together, different knowledge from the different datasets could be learned jointly, and there is a good chance of improving the performance on each dataset. At the same time, we further attempt to prevent data leakage when unifying multiple datasets, which, arguably, is more useful in an industry setting. To this end, we propose a federated learning framework which can unify various types of datasets as well as tasks to learn and fuse various types of knowledge, i.e., text representations, from different datasets and tasks, without sharing downstream task data. The fused text representations merge useful features from different SLU datasets and tasks and are thus much more powerful than the original text representations alone in individual tasks. Finally, in order to provide multi-granularity text representations for our framework, we propose a novel Multi-view Encoder (MV-Encoder) as the backbone of our federated learning framework. Extensive experiments on two SLU benchmark datasets covering two tasks (i.e., intent detection and slot filling), and three federated learning settings including horizontal federated learning, vertical federated learning, and federated transfer learning, demonstrate the effectiveness and universality of our approach. Specifically, we obtain a 1.53% improvement on the intent detection metric accuracy, and boost the performance of a strong baseline by up to 5.29% on the slot filling metric F1. Furthermore, by leveraging BERT as an additional encoder, we establish new state-of-the-art results on the SNIPS and ATIS datasets, reaching 99.33% and 98.28% accuracy on the intent detection task, and 97.20% and 96.41% F1 on the slot filling task, respectively.
Few-shot Pseudo-Labeling for Intent Detection
Thomas Dopierre, Christophe Gravier, Julien Subercaze and Wilfried Logerais
In this paper, we introduce a state-of-the-art pseudo-labeling technique for few-shot intent detection. We devise a folding/unfolding hierarchical clustering algorithm which assigns weighted pseudo-labels to unlabeled user utterances. We show that our two-step method yields significant improvement over existing solutions. This performance is achieved on multiple intent detection datasets, even in more challenging situations where the number of classes is large or when the dataset is highly imbalanced. Moreover, we confirm these results on the more general text classification task. We also demonstrate that our approach nicely complements existing solutions, thereby providing an even stronger state-of-the-art ensemble method.
Few-Shot Text Classification with Edge-Labeling Graph Neural Network-Based Prototypical Network
Chen Lyu, Weijie Liu, Meng Ma and Ping Wang
In this paper, we propose a new few-shot text classification method. Compared with supervised learning methods which require a large corpus of labeled documents, our method aims to make it possible to classify unlabeled text with few labeled data. To achieve this goal, we take advantage of an advanced pre-trained language model to extract the semantic features of each document. Furthermore, we utilize an edge-labeling graph neural network to implicitly model the intra-cluster similarity and the inter-cluster dissimilarity of the documents. Finally, we take the results of the graph neural network as the input of a prototypical network to classify the unlabeled texts. We verify the effectiveness of our method on a sentiment analysis dataset and a relation classification dataset and achieve state-of-the-art performance on both tasks.
Filtering Back-Translated Data in Unsupervised Neural Machine Translation
Jyotsana Khatri and Pushpak Bhattacharyya
Back-translation is an important part of unsupervised neural machine translation (NMT), where only monolingual data is utilized for training. The quality of back-translated data plays an important role in the performance of NMT systems. In back-translation, not all generated pseudo-parallel sentence pairs are of the same quality. Taking inspiration from domain adaptation, where in-domain sentences are given more weight in training, in this paper we propose an approach to filter back-translated data as part of the training process of unsupervised NMT. Our approach gives more weight to good pseudo-parallel sentence pairs in the back-translation phase. We calculate the weight of each pseudo-parallel sentence pair using its sentence-wise round-trip BLEU score, which is normalized batch-wise. We compare the results of our approach with current state-of-the-art approaches for unsupervised NMT.
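The weighting scheme can be illustrated with a short sketch (assuming sacrebleu ≥ 2.0 for sentence-level BLEU; the normalisation choice is illustrative): each pseudo-parallel pair is scored by round-trip BLEU and the scores are normalised within the batch to obtain per-sentence weights.

```python
# Minimal sketch: round-trip BLEU scores normalised batch-wise into weights.
from sacrebleu.metrics import BLEU

bleu = BLEU(effective_order=True)

def batch_weights(originals, round_trips):
    # Score each original sentence against its round-trip reconstruction.
    scores = [bleu.sentence_score(rt, [orig]).score
              for orig, rt in zip(originals, round_trips)]
    total = sum(scores) or 1.0
    return [s / total for s in scores]  # batch-wise normalisation

print(batch_weights(
    ["the cat sat on the mat", "he went home"],
    ["the cat sat on a mat", "she walked to her house"],
))
```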
Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering
Wei Han, Hantao Huang and Tao Han
Image text carries essential information to understand the scene and perform reasoning. The text-based visual question answering (text VQA) task focuses on visual questions that require reading text in images. Existing text VQA systems generate an answer by selecting from optical character recognition (OCR) texts or a fixed vocabulary. Positional information of text is underused, and there is a lack of evidence for the generated answer. As such, this paper proposes a localization-aware answer prediction network (LaAP-Net) to address this challenge. Our LaAP-Net not only generates the answer to the question but also predicts a bounding box as evidence of the generated answer. Moreover, a context-enriched OCR representation (COR) for multimodal fusion is proposed to facilitate the localization task. Our proposed LaAP-Net outperforms existing approaches on three benchmark datasets for the text VQA task by a noticeable margin.
Fine-grained Information Status Classification Using Discourse Context-Aware BERT
Yufang Hou
Previous work on bridging anaphora recognition (Hou et al., 2013) casts the problem as a subtask of learning fine-grained information status (IS). However, these systems heavily depend on many hand-crafted linguistic features. In this paper, we propose a simple discourse context-aware BERT model for fine-grained IS classification. On the ISNotes corpus (Markert et al., 2012), our model achieves new state-of-the-art performances on fine-grained IS classification, obtaining a 4.8% absolute overall accuracy improvement compared to Hou et al. (2013). More importantly, we also show an improvement of 10.5% F1 for bridging anaphora recognition without using any complex hand-crafted semantic features designed for capturing the bridging phenomenon. We further analyze the trained model and find that the most attended signals for each IS category correspond well to linguistic notions of information status.
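As a minimal illustration of feeding discourse context to BERT, the snippet below (assuming the Hugging Face transformers library and bert-base-uncased; not the paper's full model or feature set) packs the preceding sentences and the sentence containing the target mention as the two segments of a single input.

```python
# Minimal sketch of discourse-context-aware input construction for BERT.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

preceding = "The council met on Monday. The budget was the main topic."
target = "The mayor criticised the proposal."

enc = tokenizer(preceding, target, truncation=True, max_length=128,
                return_tensors="pt")
print(enc["input_ids"].shape)  # [CLS] context [SEP] target sentence [SEP]
```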
Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning
Daniel Grießhaber, Johannes Maucher and Ngoc Thang Vu
Recently, leveraging pre-trained Transformer-based language models in downstream, task-specific models has advanced state-of-the-art results in natural language understanding tasks. However, only little research has explored the suitability of this approach in low-resource settings with fewer than 1,000 training data points. In this work, we explore fine-tuning methods of BERT, a pre-trained Transformer-based language model, by utilizing pool-based active learning to speed up training while keeping the cost of labeling new data constant. Our experimental results on the GLUE data set show an advantage in model performance by maximizing the approximate knowledge gain of the model when querying from the pool of unlabeled data. Finally, we demonstrate and analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters, making it more suitable for low-resource settings.
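A small sketch of the layer-freezing step, assuming the Hugging Face transformers library and a standard BERT classifier (the number of frozen layers is an illustrative choice): the embedding layer and the lower encoder layers are frozen so only the upper layers and the task head remain trainable.

```python
# Sketch: freeze embeddings and lower encoder layers before fine-tuning.
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

for p in model.bert.embeddings.parameters():
    p.requires_grad = False
for layer in model.bert.encoder.layer[:8]:      # freeze the first 8 of 12 layers
    for p in layer.parameters():
        p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```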
Flight of the PEGASUS? Comparing Transformers on Few-shot and Zero-shot Multi-document Abstractive Summarization
Travis Goodwin, Max Savery and Dina Demner-Fushman
Recent work has shown that pre-trained Transformers obtain remarkable performance on many natural language processing tasks including automatic summarization. However, most work has focused on (relatively) data-rich single-document summarization settings. In this paper, we explore highly-abstractive multi-document summarization where the summary is explicitly conditioned on a user-given topic statement or question. We compare the summarization quality produced by three state-of-the-art transformer-based models: BART, T5, and PEGASUS. We report the performance on four challenging summarization datasets: three from the general domain and one from consumer health in both zero-shot and few-shot learning settings. While prior work has shown significant differences in performance for these models on standard summarization tasks, our results indicate that with as few as 10 labeled examples there is no statistically significant difference in summary quality, suggesting the need for more abstractive benchmark collections when determining state-of-the-art.
ForceReader: a BERT-based Interactive Machine Reading Comprehension Model with Attention Separation
Zheng Chen and Kangjian Wu
The release of BERT revolutionized the development of NLP. Various BERT-based reading comprehension models have been proposed, updating the performance rankings of reading comprehension tasks. However, these BERT-based models inherently employ BERT's combined input method, representing the input question and paragraph as a single packed sequence, without further modification for reading comprehension. This paper makes an in-depth analysis of this input method and identifies a problem with this approach, which we call attention deconcentration. Accordingly, this paper proposes ForceReader, a BERT-based interactive machine reading comprehension model. First, ForceReader introduces a novel solution called Attention Separation Representation to address attention deconcentration. Moreover, starting from the logical nature of reading comprehension tasks, ForceReader adopts Multi-mode Reading and an Interactive Reasoning strategy. For the calculation of attention, ForceReader employs Conditional Background Attention to compensate for the loss of overall context semantics after the separation of attention. As an integral model, ForceReader shows a significant improvement on reading comprehension tasks compared to BERT. Moreover, this paper provides detailed visual analyses of the attention and proposes strategies accordingly, which may offer another argument for the explanation of attention.
Formality Style Transfer with Shared Latent Space
Yunli Wang, Yu Wu, Lili Mou, Zhoujun Li and WenHan Chao
Conventional approaches for formality style transfer borrow models from neural machine translation, which typically requires massive parallel data for training. However, the dataset for formality style transfer is considerably smaller than translation corpora. On the other hand, we observe that informal and formal sentences closely resemble each other, which is different from the translation domain where two languages have different vocabularies and grammars. Therefore, we propose a new approach, Sequence-to-Sequence with Shared Latent Space (S2S-SLS), for formality style transfer, where we propose two auxiliary losses and adopt joint training of bidirectional transfer as well as auto-encoding training. Experimental results show that S2S-SLS is significantly better than the baselines in the data-limited scenario and consistently outperforms baselines in various data settings. It can also be adapted to different neural architectures.
Free the Plural: Unrestricted Split-Antecedent Anaphora Resolution
Juntao Yu, Nafise Sadat Moosavi, Silviu Paun and Massimo Poesio
Now that the performance of coreference resolvers on the simpler forms of anaphoric reference has greatly improved, more attention is devoted to more complex aspects of anaphora. One limitation of virtually all coreference resolution models is the focus on single-antecedent anaphors. Plural anaphors with multiple antecedents–so-called split-antecedent anaphors (as in John met Mary. They went to the movies)–have not been widely studied in NLP, because they are not annotated in ONTONOTES and are relatively infrequent in other corpora. We introduce the first model for unrestricted resolution of split-antecedent anaphors. We start with a strong baseline enhanced by BERT embeddings, and show that its performance can be substantially improved by addressing the sparsity issue. We experiment with auxiliary corpora where split-antecedent anaphors were annotated by the crowd, and with transfer learning models using as auxiliary tasks element-of bridging references and single-antecedent coreference. Evaluation on the gold annotated ARRAU corpus shows that the best results are obtained using a combination of three auxiliary corpora; scores of 70% and 43.6% were obtained when evaluated in a lenient and strict setting respectively. This is an 11% and 21% gain when compared with our strong baseline.
French Biomedical Text Simplification: When Small and Precise Helps
Rémi Cardon and Natalia Grabar
We present experiments on biomedical text simplification in French. We use two kinds of corpora (parallel sentences extracted from existing health comparable corpora in French, and the WikiLarge corpus translated from English to French) and a lexicon that associates medical terms with their paraphrases. We then train neural models on these parallel corpora using different ratios of general and specialized sentences, and evaluate the results with BLEU, SARI and FKGL scores. The results show that even a small amount of specialized data significantly helps the simplification.
From Sentiment Annotations to Sentiment Prediction through Discourse Augmentation
Patrick Huber and Giuseppe Carenini
Sentiment analysis, especially for long documents, plausibly requires methods capturing complex linguistic structures. To accommodate this, we propose a novel framework to exploit domain-related discourse for the task of sentiment analysis. More specifically, we combine the large-scale, sentiment-dependent MEGA-DT treebank with a novel neural architecture for sentiment prediction, based on a hybrid TreeLSTM hierarchical attention model. Experiments show that our framework, using sentiment-related discourse augmentations for sentiment prediction, enhances the overall performance for long documents, even beyond previous approaches using well-established discourse parsers based on human annotations. We show that a simple ensemble approach can further enhance performance by selectively using discourse depending on the document length.
Generalized Shortest-Paths Encoders for AMR-to-Text Generation
Lisa Jin and Daniel Gildea
For text generation from semantic graphs, past neural models encoded input structure via gated convolutions along graph edges. Although these operations provide local context, the distance messages can travel is bounded by the number of encoder propagation steps. We adopt recent efforts of applying Transformer self-attention to graphs to allow global feature propagation. Instead of feeding shortest paths to the vertex self-attention module, we train a model to learn them using generalized shortest-paths algorithms. This approach widens the receptive field of a graph encoder by exposing it to all possible graph paths. We explore how this path diversity affects performance across levels of AMR connectivity, demonstrating gains on AMRs of higher reentrancy counts and diameters. Analysis of generated sentences also supports high semantic coherence of our models for reentrant AMRs. Our best model achieves a 1.4 BLEU and 1.8 chrF++ margin over a baseline that encodes only pairwise-unique shortest paths.
Generating Diverse Corrections with Local Beam Search for Grammatical Error Correction
Kengo Hotate, Masahiro Kaneko and Mamoru Komachi
We propose a beam search method to obtain diverse outputs in a local sequence transduction task where most of the tokens in the source and target sentences overlap, such as Grammatical Error Correction (GEC). In GEC, it is desirable to rewrite only the local sequences that need to be rewritten, while leaving the other correct sequence parts unchanged. On the other hand, existing methods of acquiring various outputs focus on revising all tokens of the sentence. Therefore, existing methods either may generate ungrammatical sentences because they force the entire sentence to be changed, or may produce non-diversified sentences by weakening the constraints to avoid generating ungrammatical sentences. In light of this, we do not rewrite all of the tokens in the text, but only those parts that need to be diversely corrected. Our beam search method adjusts the search tokens in the beam according to the probability that the prediction is copied from the source sentence. Experimental results show that our proposed method generates more diverse corrections than existing methods without losing accuracy in the GEC task.
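The copy-probability intuition can be sketched as follows; the threshold and the selection rule are illustrative assumptions, not the paper's scoring function.

```python
# Toy sketch: diversify beam expansion only at positions unlikely to be copies.
def positions_to_diversify(source_tokens, predicted_tokens, copy_probs, threshold=0.9):
    """Return indices where corrections should be explored more diversely."""
    return [i for i, (src, pred, p_copy)
            in enumerate(zip(source_tokens, predicted_tokens, copy_probs))
            if p_copy < threshold or src != pred]

print(positions_to_diversify(
    ["He", "go", "to", "school"],
    ["He", "goes", "to", "school"],
    [0.99, 0.40, 0.98, 0.97],
))  # [1] -> only the erroneous position is diversified
```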
Generating Equation by Utilizing Operators : GEO model
Kyung Seo Ki, Donggeon Lee, Bugeun Kim and Gahgene Gweon
Math word problem solving is an emerging research topic in Natural Language Processing. Recently, neural models have been studied to solve word problems by applying the encoder-decoder architecture, which is mainly used in machine translation tasks. Nevertheless, neural models based on these generation methods have not yet reached a satisfactory level of performance to build equations from given word problems without adding hand-crafted features. The GEO (generation of equations by utilizing operators) model addresses two issues present in current existing neural models: 1. missing domain-specific knowledge features and 2. losing encoder-level knowledge. To address the missing domain-specific feature issue, we designed two auxiliary tasks: group difference prediction and attribute prediction. To address the losing encoder-level knowledge issue, we added an Operation Feature Feed Forward (OP3F) layer. Experimental results showed that the GEO model outperformed existing state-of-the-art models on two datasets, 85.1% on MAWPS and 60% on DRAW-1K, and reached a comparable performance of 82.2% on ALG514.
Generating Instructions at Different Levels of Abstraction
Arne Köhn, Julia Wichlacz, Álvaro Torralba, Daniel Höller, Jörg Hoffmann and Alexander Koller
When generating technical instructions, it is often convenient to describe complex objects in the world at different levels of abstraction. A novice user might need an object explained piece by piece, while for an expert, talking about the complex object (e.g. a wall or railing) directly may be more succinct and efficient. We show how to generate building instructions at different levels of abstraction in Minecraft. We introduce the use of hierarchical planning to this end, a method from AI planning which can capture the structure of complex objects neatly. A crowdsourcing evaluation shows that the choice of abstraction level matters to users, and that an abstraction strategy which balances low-level and high-level object descriptions compares favorably to ones which don't.
Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification
Linyi Yang, Eoin Kenny, Tin Lok James Ng, Yi Yang, Barry Smyth and Ruihai Dong
Corporate mergers and acquisitions (M&A) account for billions of dollars of investment globally every year and offer an interesting and challenging domain for artificial intelligence. However, in these highly sensitive domains, it is crucial to not only have a highly robust and accurate model, but also to be able to generate useful explanations to garner a user's trust in the automated system. Regrettably, eXplainable AI (XAI) for financial text classification has received little to no attention in recent research, and many current methods for generating textual explanations result in highly implausible explanations, which damage a user's trust in the system. To address these issues, this paper proposes a novel methodology for producing plausible counterfactual explanations, whilst exploring the regularization benefits of adversarial training on language models in the domain of FinTech. Exhaustive quantitative experiments demonstrate that not only does this approach improve the model accuracy when compared to the current state-of-the-art and human performance, but it also generates counterfactual explanations which are significantly more plausible based on human trials.
GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation
Zhijing Jin, Qipeng Guo, Xipeng Qiu and Zheng Zhang
Data collection for the knowledge graph-to-text generation is expensive. As a result, research on unsupervised models has emerged as an active field recently. However, most unsupervised models have to use non-parallel versions of existing small supervised datasets, which largely constrain their potential. In this paper, we propose a large-scale, general-domain dataset, GenWiki. Our unsupervised dataset has 1.3M text and graph examples, respectively. With a human-annotated test set, we provide this new benchmark dataset for future research on unsupervised text generation from knowledge graphs.
Geo-Aware Image Caption Generation
Sofia Nikiforova, Tejaswini Deoskar, Denis Paperno and Yoad Winter
Standard image caption generation systems produce generic descriptions of images and do not utilize any contextual information or world knowledge. In particular, they are unable to generate captions that contain references to the geographic context of an image, for example, the location where a photograph is taken or relevant geographic objects around an image location. In this paper, we develop a geo-aware image caption generation system, which incorporates geographic contextual information into a standard image captioning pipeline. We propose a way to build an image-specific representation of the geographic context and adapt the caption generation network to produce appropriate geographic names in the image descriptions. We evaluate our system on a novel captioning dataset that contains contextualized captions and geographic metadata and achieve substantial improvements in BLEU, ROUGE, METEOR and CIDEr scores. We also introduce a new metric to assess generated geographic references directly and empirically demonstrate our system's ability to produce captions with relevant and factually accurate geographic referencing.
German's Next Language Model
Branden Chan, Stefan Schweter and Timo Möller
In this work we present the experiments which led to the creation of our BERT- and ELECTRA-based German language models, GBERT and GELECTRA. By varying the input training data, model size, and the presence of Whole Word Masking (WWM), we were able to attain SoTA performance across a set of document classification and named entity recognition (NER) tasks for both models of base and large size. We adopt an evaluation-driven approach in training these models and our results indicate that both adding more data and utilizing WWM improve model performance. By benchmarking against existing German models, we show that these models are the best German models to date. All trained models will be made publicly available to the research community.
Global Context-enhanced Graph Convolutional Networks for Document-level Relation Extraction
Huiwei Zhou, Yibin Xu, Zhe Liu, Weihong Yao and Chengkun Lang
Document-level Relation Extraction (RE) is particularly challenging due to complex semantic interactions among multiple entities in the document. Among existing approaches, Graph Convolutional Networks (GCN) are one of the most effective approaches for document-level RE. However, traditional GCN only takes word nodes and an adjacency matrix to represent graphs. In this paper, we propose Global Context-enhanced Graph Convolutional Networks (GCGCN), a novel model composed of entities as nodes and the context of entity pairs as edges between nodes, to capture rich global context information. Two hierarchical identical blocks, Context-aware Attention Guided Graph Convolution (CAGGC) for partially-connected graphs and Multi-head Attention Guided Graph Convolution (MAGGC) for fully-connected graphs, take progressively more global context into account. Meanwhile, we leverage large-scale distantly supervised data to pre-train GCGCN models with curriculum learning, which are then fine-tuned on the human-annotated data to further improve document-level RE performance. The experimental results on DocRED show that our model can capture complex semantic interactions across all entities in the document, leading to a new state-of-the-art result.
GPolS: A Contextual Graph-Based Language Model for Analyzing Parliamentary Debates and Political Cohesion
Ramit Sawhney, Arnav Wadhwa, Shivam Agarwal and Rajiv Ratn Shah
Parliamentary debates present a valuable language resource for analyzing comprehensive options in electing representatives under a functional, free society. However, the esoteric nature of political speech coupled with non-linguistic aspects such as political cohesion between party members presents a complex and underexplored task of contextual parliamentary debate analysis. We introduce GPolS, a neural model for political speech sentiment analysis jointly exploiting both semantic language representations and relations between debate transcripts, motions, and political party members. Through experiments on real-world English data and by visualizing attention, we provide a use case of GPolS as a tool for political speech analysis and polarity prediction.
GPT-based Few-shot Table-to-Text Generation with Table Structure Reconstruction and Content Matching
Heng Gong, Yawei Sun, Xiaocheng Feng, Bing Qin, Wei Bi, Xiaojiang Liu and Ting Liu
Although neural table-to-text models have achieved remarkable progress with the help of large-scale datasets, they suffer from an insufficient learning problem with limited training data. Recently, pre-trained language models have shown potential in few-shot learning thanks to linguistic knowledge learnt from pretraining on large-scale corpora. However, applying a powerful pretrained language model to table-to-text generation in the few-shot setting faces three challenges: (1) the gap between the task's structured input and the natural language input used for pretraining the language model; (2) the lack of modeling for table structure; and (3) improving text fidelity by reducing incorrect expressions that contradict the table. To address the aforementioned problems, we propose TableGPT for table-to-text generation. First, we utilize a table transformation module with templates to rewrite the structured table in natural language as input for GPT-2. In addition, we exploit multi-task learning with two auxiliary tasks that preserve the table's structural information by reconstructing the structure from GPT-2's representation and improving the text's fidelity with a content matching task aligning the table and the information in the generated text. Experiments on Humans, Songs and Books, three few-shot table-to-text datasets in different domains, show that our model outperforms existing systems on most few-shot settings.
Grammatical error detection in transcriptions of spoken English
Andrew Caines, Christian Bentz, Kate Knill, Marek Rei and Paula Buttery
We describe the collection of transcription corrections and grammatical error annotations for the CrowdED Corpus of spoken English monologues on business topics. The corpus recordings were crowdsourced from native speakers of English and learners of English with German as their first language. The new transcriptions and annotations are obtained from different crowdworkers: we analyse the 1108 new crowdworker submissions and propose that they can be used for automatic transcription post-editing and grammatical error correction for speech. To further explore the data we train grammatical error detection models with various configurations including pre-trained and contextual word representations as input, additional features and auxiliary objectives, and extra training data from written error-annotated corpora. We find that a model concatenating pre-trained and contextual word representations as input performs best, and that additional information does not lead to further performance gains.
Graph Convolution over Multiple Dependency Sub-graphs for Relation Extraction
Angrosh Mandya, Danushka Bollegala and Frans Coenen
We propose a contextualised graph convolution network over multiple dependency-based sub-graphs for relation extraction. A novel method is proposed to construct multiple sub-graphs using words in the shortest dependency path and words linked to entities in the dependency parse. A graph convolution operation is performed over the resulting multiple sub-graphs to obtain more informative features useful for relation extraction. Our experimental results show that the proposed method achieves superior performance over existing GCN-based models, achieving state-of-the-art performance on a cross-sentence n-ary relation extraction dataset and the SemEval 2010 Task 8 sentence-level relation extraction dataset. Our model also achieves a performance comparable to the SoTA on the TACRED dataset.
Graph Enhanced Dual Attention Network for Document-Level Relation Extraction
Bo Li, Wei Ye, Zhonghao Sheng, Rui Xie, Xiangyu Xi and Shikun Zhang
Document-level relation extraction requires inter-sentence reasoning capabilities to capture local and global contextual information for multiple relational facts. To improve inter-sentence reasoning, we propose to characterize the complex interaction between sentences and potential relation instances via a Graph Enhanced Dual Attention network (GEDA). In GEDA, the sentence representation generated by the sentence-to-relation (S2R) attention is refined and synthesized by a Heterogeneous Graph Convolutional Network before being fed into the relation-to-sentence (R2S) attention. We further design a simple yet effective regularizer based on the natural duality of the S2R and R2S attention, whose weights are also supervised by the supporting evidence of relation instances during training. An extensive set of experiments on an existing large-scale dataset shows that our model achieves competitive performance, especially for inter-sentence relation extraction, while the neural predictions are also interpretable and easily observed.
Graph-Based Co-reference and Relation Knowledge Integration for Question Answering over Dialogue
Jian Liu, Dianbo Sui, Kang Liu and Jun Zhao
Question answering over dialogue, a specialized machine reading comprehension task, aims to comprehend a dialogue and to answer specific questions. Despite many advances, existing approaches for this task did not consider dialogue structure and background knowledge (e.g., relationships between speakers). In this paper, we introduce a new approach for the task, featured by its novelty in structuring dialogue and integrating background knowledge for reasoning. Specifically, different from previous
Handling Anomalies of Synthetic Questions in Unsupervised Question Answering
Giwon Hong, Junmo Kang, Doyeon Lim and Sung-Hyon Myaeng
Advances in Question Answering (QA) research require additional datasets for new domains, languages, and types of questions, as well as for performance increases. Human creation of a QA dataset like SQuAD, however, is expensive. As an alternative, unsupervised QA approaches have been proposed so that QA training data can be generated automatically. However, their QA performance is much lower than that of supervised QA models. We identify two anomalies in the generated questions and propose methods for mitigating them. We show that our approach significantly improves the performance of unsupervised QA across a number of QA tasks.
Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages
Diptesh Kanojia, Raj Dabre, Shubham Dewangan, Pushpak Bhattacharyya, Gholamreza Haffari and Malhar Kulkarni
Cognates are variants of the same lexical form across different languages; for example "fonema" in Spanish and "phoneme" in English are cognates, both of which mean "a unit of sound". The task of automatic detection of cognates among any two languages can help downstream NLP tasks such as Cross-lingual Information Retrieval, Computational Phylogenetics, and Machine Translation. In this paper, we demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian languages. Our approach introduces the use of context from a knowledge graph to generate improved feature representations for cognate detection. We then evaluate the impact of our cognate detection mechanism on neural machine translation (NMT) as a downstream task. We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages, namely Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. Additionally, we create evaluation datasets for two more Indian languages, Konkani and Nepali. We observe an improvement of up to 18 percentage points, in terms of F-score, for cognate detection. Furthermore, we observe that cognates extracted using our method help improve NMT quality by up to 2.76 BLEU. We also release our code, the newly constructed datasets, and cross-lingual models publicly.
HateGAN: Adversarial Generative-Based Data Augmentation for Hate Speech Detection
Rui Cao and Roy Ka-Wei Lee
Academia and industry have developed machine learning and natural language processing models to detect online hate speech automatically. However, most of these existing methods adopt a supervised approach that heavily depends on labeled datasets for training. This results in the methods' poor detection performance on the hate speech class, as the training datasets are highly imbalanced. In this paper, we propose HateGAN, a deep generative reinforcement learning model, which addresses the challenge of class imbalance by augmenting the dataset with hateful tweets. We conduct extensive experiments to augment two commonly used hate speech detection datasets with the HateGAN-generated tweets. Our experiment results show that HateGAN improves the detection performance of the hate speech class regardless of the classifiers and datasets used in the detection task. Specifically, we observe an average 5% improvement in the hate class F1 scores across all state-of-the-art hate speech classifiers. We also conduct case studies to empirically examine the HateGAN-generated hate speech and show that the generated tweets are diverse, coherent, and relevant to hate speech detection.
Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity
Hamza Harkous, Isabel Groves and Amir Saffari
End-to-end neural data-to-text (D2T) generation has recently emerged as an alternative to pipeline-based architectures. However, it has faced challenges generalizing to new domains and generating semantically consistent text. In this work, we present DataTuner, a neural, end-to-end data-to-text generation system that makes minimal assumptions about the data representation and target domain. We take a two-stage generation-reranking approach, combining a fine-tuned language model with a semantic fidelity classifier. Each component is learnt end-to-end without needing dataset-specific heuristics, entity delexicalization, or post-processing. We show that DataTuner achieves state-of-the-art results on automated metrics across four major D2T datasets (LDC2017T10, WebNLG, ViGGO, and Cleaned E2E), with fluency assessed by human annotators as nearing or exceeding the human-written reference texts. Our generated text has better semantic fidelity than the state of the art on these datasets. We further demonstrate that our model-based semantic fidelity scorer is a better assessment tool compared to traditional heuristic-based measures of semantic accuracy.
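The generate-then-rerank stage can be sketched as follows, with `generate_candidates` and `fidelity_score` as hypothetical placeholders for the fine-tuned language model and the semantic fidelity classifier.

```python
# Schematic sketch of two-stage generation and reranking by fidelity score.
def rerank(data_record, generate_candidates, fidelity_score, n=10):
    candidates = generate_candidates(data_record, num_return_sequences=n)
    return max(candidates, key=lambda text: fidelity_score(data_record, text))

# Toy usage with stand-in functions:
fake_generate = lambda rec, num_return_sequences: [f"{rec['name']} is a restaurant."] * num_return_sequences
fake_score = lambda rec, text: float(rec["name"] in text)
print(rerank({"name": "The Golden Curry"}, fake_generate, fake_score))
```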
Heterogeneous Graph Neural Networks to Predict What Happen Next
Jianming Zheng, Fei Cai, Yanxiang Ling and Honghui Chen
Given an incomplete event chain, script learning aims to predict the missing event, which can support a series of NLP applications. Existing work cannot well represent the heterogeneous relations or capture the discontinuous event segments that are common in the event chain. To address these issues, we introduce a heterogeneous-event (HeterEvent) graph network. In particular, we employ each unique word and individual event as nodes in the graph, and explore three kinds of edges based on realistic relations (e.g., the relations of word-and-word, word-and-event, and event-and-event). We also design a message passing process to realize information interactions among homogeneous or heterogeneous nodes. The discontinuous event segments can be explicitly modeled by finding the specific path between corresponding nodes in the graph. The experimental results on one-step and multi-step inference tasks demonstrate that our ensemble model HeterEvent[W+E] can outperform existing baselines.
Heterogeneous Recycle Generation for Chinese Grammatical Error Correction
Charles Hinson, Hen-Hsen Huang and Hsin-Hsi Chen
Most recent works in the field of grammatical error correction (GEC) rely on neural machine translation-based models. Although these models boast impressive performance, they require a massive amount of data to properly train. Furthermore, NMT-based systems treat GEC purely as a translation task and overlook the editing aspect of it. In this work we propose a heterogeneous approach to Chinese GEC, composed of an NMT-based model, a sequence editing model, and a spell checker. Our methodology not only achieves a new state-of-the-art performance for Chinese GEC, but also does so without relying on data augmentation or GEC-specific architecture changes. We further experiment with all possible configurations of our system with respect to model composition order and number of rounds of correction. A detailed analysis of each model and its contribution to the correction process is performed by adapting the ERRANT scorer to be able to score Chinese sentences.
Hidden Message Extraction: A Task Challenge and a Corpus
Gerardo Ocampo Diaz and Vincent Ng
In this challenge paper, we propose a new task, Hidden Message Extraction. This task requires the ability to read between the lines. As a first step in demonstrating the feasibility of this challenge, we design guidelines for annotating hidden messages, provide a corpus of 400 article annotations taken from the recent SemEval 2019 Hyperpartisan News Detection task, and discuss possible directions to start work on this task.
Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation
Zhongfen Deng, Hao Peng, Congying Xia, Jianxin Li, Lifang He and Philip Yu
Review rating prediction of text reviews is a rapidly growing technology with a wide range of applications in natural language processing. However, most existing methods either use hand-crafted features or learn features using deep learning with simple text corpus as input for review rating prediction, ignoring the hierarchies among data. In this paper, we propose a hierarchical bi-directional self-attention network framework (HabNet) for paper review rating prediction and recommendation, which can serve as an effective decision-making tool for the academic paper review process. Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: sentence encoder (level one), intra-review encoder (level two) and inter-review encoder (level three). Each encoder first derives contextual representation of each level, then generates a higher-level representation, and after the learning process, we are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers. Furthermore, we introduce two new metrics to evaluate models in data imbalance situations. Extensive experiments on a publicly available dataset (PeerRead) and our own collected dataset (OpenReview) demonstrate the superiority of the proposed approach compared with state-of-the-art methods.
Hierarchical Chinese Legal event extraction via Pedal Attention Mechanism
Shirong Shen, Guilin Qi, Zhen Li, Sheng Bi and Lusheng Wang
Event extraction plays an important role in legal applications including case push and auxiliary judgment. However, the traditional event structure cannot express the connections between arguments, which are extremely important in legal events. Therefore, this paper defines a dynamic event structure for Chinese legal events. To distinguish between similar events, we design hierarchical event features for event detection. Moreover, to address the problem of long-distance semantic dependence and anaphora resolution in argument classification, we propose a novel pedal attention mechanism to extract the semantic relation between two words through their dependent adjacent words. We label a Chinese legal event dataset and evaluate our model on it. Experimental results demonstrate that our model can surpass other state-of-the-art models.
Hierarchical Text Segmentation for Medieval Manuscripts
Amir Hazem, Beatrice Daille, Dominique Stutzmann and Christopher Kermorvan
Text segmentation is an important prerequisite for document navigation and structure understanding. Previous works have been mainly concerned with narrative, expository or user dialogue texts, under the strong hypothesis of topical shifts to perform text segmentation. In this paper, we address books of hours, Latin devotional manuscripts of the late Middle Ages that exhibit challenging issues: a complex hierarchical entangled structure, variable content, noisy transcriptions with no sentence markers, and strong correlations between sections for which topical information is no longer sufficient to draw segmentation boundaries. We show that the main state-of-the-art segmentation methods are either inefficient or inapplicable for books of hours and propose a bottom-up greedy segmentation approach that achieves significant results.
Hierarchical Trivia Fact Extraction from Wikipedia Articles
Jingun Kwon, Hidetaka Kamigaito, Young-In Song and Manabu Okumura
Recently, automatic trivia fact extraction has attracted much research interest. Modern search engines have begun to provide trivia facts as information for entities because they can motivate more user engagement. In this paper, we propose a new unsupervised algorithm that automatically mines trivia facts for a given entity. Unlike previous studies, the proposed algorithm targets a single Wikipedia article and leverages its hierarchical structure via top-down processing. Thus, the proposed algorithm offers two distinctive advantages: it does not incur high computation time, and it provides a domain-independent approach for extracting trivia facts. Experimental results demonstrate that the proposed algorithm is over 100 times faster than the existing method, which considers Wikipedia categories. Human evaluation demonstrates that the proposed algorithm can mine better trivia facts regardless of the target entity domain and outperforms the existing methods.
HiTrans: A Transformer-Based Context- and Speaker-Sensitive Model for Emotion Detection in Conversations
Jingye Li, Donghong Ji, Fei Li, Meishan Zhang and Yijiang Liu
Emotion detection in conversations (EDC) aims to detect the emotion of each utterance in conversations that involve multiple speakers. Different from traditional non-conversational emotion detection, a model for EDC should be context-sensitive (e.g., understanding the whole conversation rather than one utterance) and speaker-sensitive (e.g., understanding which utterance belongs to which speaker). In this paper, we propose a transformer-based context- and speaker-sensitive model for EDC, namely HiTrans, which consists of two hierarchical transformers. We utilize BERT as the low-level transformer to generate local utterance representations, and feed them into another high-level transformer so that utterance representations can be sensitive to the global context of the conversation. Moreover, we exploit an auxiliary task, pairwise utterance speaker verification (PUSV), which aims to classify whether two utterances belong to the same speaker, to make our model speaker-sensitive. We evaluate our model on three benchmark datasets, namely EmoryNLP, MELD and IEMOCAP. Results show that our model outperforms previous state-of-the-art models.
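A compressed sketch of the hierarchical idea described above, assuming the HuggingFace transformers library: each utterance is encoded by BERT into one vector, a second Transformer contextualises utterance vectors across the conversation, and plain linear heads stand in for the emotion classifier and a PUSV-style same-speaker classifier. The layer sizes and head designs here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class HierarchicalEmotionModel(nn.Module):
    def __init__(self, n_emotions=7, d_model=768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.conv_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.emotion_head = nn.Linear(d_model, n_emotions)
        self.same_speaker_head = nn.Linear(2 * d_model, 2)   # auxiliary PUSV-style task

    def forward(self, utterances, tokenizer):
        enc = tokenizer(utterances, padding=True, return_tensors="pt")
        cls = self.bert(**enc).last_hidden_state[:, 0]        # one vector per utterance
        ctx = self.conv_encoder(cls.unsqueeze(0)).squeeze(0)  # conversation-level context
        emotions = self.emotion_head(ctx)                     # emotion logits per utterance
        pair = torch.cat([ctx[0], ctx[1]], dim=-1)            # e.g. compare utterances 0 and 1
        same_speaker = self.same_speaker_head(pair)
        return emotions, same_speaker

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = HierarchicalEmotionModel()
emo, spk = model(["How are you?", "I'm fine, thanks.", "Great!"], tok)
print(emo.shape, spk.shape)
```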
HOLMS: Alternative Summary Evaluation with Large Language Models
Yassine Mrabet and Dina Demner-Fushman
Efficient document summarization requires evaluation measures that can not only rank a set of systems based on an average score, but also highlight which individual summary is better than another.
Homonym normalisation by word sense clustering: a case in Japanese
Kevin Heffernan and Yo Sato
This work presents a method of word sense clustering that differentiates homonyms and merges homophones, taking as an example Japanese, where orthographical variation causes problems for language processing. It uses contextualised embeddings (BERT) to cluster tokens into distinct sense groups, and we use these groups to normalise synonymous instances to a single representative form. We see the benefit of this normalisation in language modelling, as well as in transliteration.
How coherent are neural models of coherence?
Leila Pishdad, Federico Fancellu, Ran Zhang and Afsaneh Fazly
Despite the recent advances in coherence modelling, most such models, including state-of-the-art neural ones, are evaluated on either contrived proxy tasks such as the standard order discrimination benchmark, or tasks that require special expert annotation. Moreover, most evaluations are conducted on small newswire corpora. To address these shortcomings, in this paper we propose four generic evaluation tasks that draw on different aspects of coherence at both the lexical and document levels, and can be applied to any corpora. In designing these tasks, we aim at capturing coherence-specific properties, such as the correct use of discourse connectives and lexical cohesion, as well as the overall temporal and causal consistency among events and participants in a story. Importantly, our proposed tasks rely either on automatically-generated data or on data annotated for other purposes, hence alleviating the need for specialized annotation. We perform experiments with several existing state-of-the-art neural models of coherence on these tasks, across large corpora from different domains, including newswire, dialogue, as well as narrative and instructional text. Our findings point to a strong need for revisiting the common practices in the development and evaluation of coherence models.
How Domain Terminology Affects Meeting Summarization Performance
Jia Jin Koay, Alexander Roustai, Xiaojin Dai, Dillon Burns, Alec Kerrigan and Fei Liu
Meetings are essential to modern organizations. A vast number of meetings are held and recorded daily, more than can ever be comprehended. A meeting summarization system that identifies salient utterances from the transcripts to automatically generate meeting minutes can help. It empowers users to rapidly search and sift through large meeting archives. To date, the impact of domain terminology on the performance of meeting summarization remains understudied, even though meetings are rich in domain knowledge. In this paper, we create gold-standard annotations for domain terminology, known as jargon terms, on a sizable meeting corpus. We then analyze the performance of a meeting summarization system with and without jargon terms. Our findings reveal that domain terminology can have a substantial impact on summarization performance. We will publicly release all domain terminology to advance research in meeting summarization.
How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention
Yue Guan, Jingwen Leng, Chao Li, Quan Chen and Minyi Guo
Recent research on the multi-head attention mechanism, especially in pre-trained models such as BERT, has provided heuristics and clues for analyzing various aspects of the mechanism. As most of this research focuses on probing tasks or hidden states, previous work has found some primitive patterns of attention head behavior through heuristic analytical methods, but a more systematic analysis of the attention patterns themselves is still lacking. In this work, we cluster the attention heatmaps into significantly different patterns through unsupervised clustering on top of a set of proposed features, which corroborates previous observations. We further study their corresponding functions through an analytical study. In addition, our proposed features can be used to explain and calibrate different attention heads in Transformer models.
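A minimal sketch of this style of analysis, assuming the HuggingFace transformers library and scikit-learn: each attention head is summarised by two simple features (mean attended distance and attention entropy) and the heads are then grouped with KMeans. The feature set and the number of clusters are illustrative guesses, not the paper's exact choices.

```python
import torch
import numpy as np
from sklearn.cluster import KMeans
from transformers import BertTokenizer, BertModel

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "The quick brown fox jumps over the lazy dog ."
inputs = tok(sentence, return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs).attentions        # tuple: layers x (1, heads, seq, seq)

features = []
for layer_att in attentions:
    att = layer_att[0]                             # (heads, seq, seq)
    seq = att.shape[-1]
    dist = torch.abs(torch.arange(seq)[None, :] - torch.arange(seq)[:, None]).float()
    for head in att:                               # each row is a probability distribution
        mean_dist = (head * dist).sum(dim=-1).mean().item()
        entropy = -(head * torch.log(head + 1e-9)).sum(dim=-1).mean().item()
        features.append([mean_dist, entropy])

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(np.array(features))
print(labels.reshape(len(attentions), -1))         # one cluster id per (layer, head)
```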
How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text
Chihiro Shibata, Kei Uchiumi and Daichi Mochihashi
Long Short-Term Memory recurrent neural networks (LSTMs) are widely used and known to capture informative long-term syntactic dependencies. However, how such information is reflected in their internal vectors for natural text has not yet been sufficiently investigated. We analyze them by learning a language model where syntactic structures are implicitly given. We empirically show that the context update vectors, i.e. the outputs of internal gates, are approximately quantized to binary or ternary values to help the language model count the depth of nesting accurately, as Suzgun et al. (2019) recently showed for synthetic Dyck languages. For some dimensions of the context vector, we show that their activations are highly correlated with the depth of phrase structures, such as VP and NP. Moreover, with an L1 regularization, we find that whether a word is inside a phrase structure or not can be predicted accurately from a small number of components of the context vector. Even when learning from raw text, context vectors are shown to still correlate well with the phrase structures. Finally, we show that natural clusters of the functional words and the parts of speech that trigger phrases are represented in a small but principal subspace of the context-update vector of the LSTM.
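As an illustration of this probing style (mechanics only, not the paper's experiment): run an LSTM over a bracketed string and measure how strongly each hidden dimension correlates with the gold nesting depth. The LSTM below is untrained and the input synthetic, so the numbers are meaningless; the point is the shape of the analysis.

```python
import torch
import torch.nn as nn
from scipy.stats import pearsonr

text = "( a ( b c ) ( d ( e ) ) f )".split()
vocab = {w: i for i, w in enumerate(sorted(set(text)))}
ids = torch.tensor([[vocab[w] for w in text]])

emb = nn.Embedding(len(vocab), 16)
lstm = nn.LSTM(16, 32, batch_first=True)
with torch.no_grad():
    outputs, _ = lstm(emb(ids))                    # (1, time, 32) hidden states

depth, depths = 0, []
for w in text:                                     # gold nesting depth per token
    depth += (w == "(") - (w == ")")
    depths.append(depth)

h = outputs[0].numpy()
corrs = [abs(pearsonr(h[:, d], depths)[0]) for d in range(h.shape[1])]
print("best-correlated dimension:", int(max(range(len(corrs)), key=corrs.__getitem__)))
```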
How Positive Are You: Text Style Transfer using Adaptive Style Embedding
Heejin Kim and Kyung-Ah Sohn
The prevalent approach for unsupervised text style transfer is disentanglement between content and style. However, it is difficult to completely separate style information from the content. Other approaches allow the latent text representation to contain style and the target style to affect the generated output more than the latent representation does. In both approaches, however, it is impossible to adjust the strength of the style in the generated output. Moreover, those previous approaches typically perform both the sentence reconstruction and style control tasks in a single model, which complicates the overall architecture. In this paper, we address these issues by separating the model into a sentence reconstruction module and a style module. We use the Transformer-based autoencoder model for sentence reconstruction and the adaptive style embedding is learned directly in the style module. Because of this separation, each module can better focus on its own task. Moreover, we can vary the style strength of the generated sentence by changing the style of the embedding expression. Therefore, our approach not only controls the strength of the style, but also simplifies the model architecture. Experimental results show that our approach achieves better style transfer performance and content preservation than previous approaches.
How Relevant Are Selectional Preferences for Transformer-based Language Models?
Eleni Metheniti, Tim Van de Cruys and Nabil Hathout
Selectional preference is defined as the tendency of a predicate to favor particular arguments within a certain linguistic context, and likewise to reject others that result in conflicting or implausible meanings. The stellar success of contextual word embedding models such as BERT in NLP tasks has led many to question whether these models have learned linguistic information, but up till now most research has focused on syntactic information. We investigate whether BERT contains information on the selectional preferences of words by examining the probability it assigns to the dependent word given the presence of a head word in a sentence. We use head-dependent word pairs in five different syntactic relations from the SP-10K corpus of selectional preference (Zhang et al., 2019), in sentences from the ukWaC corpus, and we calculate the correlation between the plausibility score (from SP-10K) and the model probabilities. Our results show that, overall, there is no strong positive or negative correlation in any syntactic relation, but we do find that certain head words have a strong correlation.
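A hedged sketch of the underlying measurement, assuming the HuggingFace transformers library: mask the dependent word in a sentence containing its head, read off the masked-token probability, and correlate model scores with human plausibility ratings. The sentences, ratings and choice of Spearman correlation below are toy stand-ins rather than the SP-10K setup.

```python
import torch
from scipy.stats import spearmanr
from transformers import BertTokenizer, BertForMaskedLM

tok = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def masked_prob(sentence_with_mask: str, target_word: str) -> float:
    """Probability BERT assigns to target_word at the [MASK] position."""
    enc = tok(sentence_with_mask, return_tensors="pt")
    mask_pos = (enc.input_ids[0] == tok.mask_token_id).nonzero(as_tuple=True)[0].item()
    with torch.no_grad():
        logits = mlm(**enc).logits[0, mask_pos]
    probs = torch.softmax(logits, dim=-1)
    return probs[tok.convert_tokens_to_ids(target_word)].item()

pairs = [  # (sentence with dependent masked, dependent word, toy plausibility rating)
    ("The chef cooked the [MASK] .", "meal", 9.0),
    ("The chef cooked the [MASK] .", "idea", 1.5),
    ("She drank a glass of [MASK] .", "water", 9.5),
]
model_scores = [masked_prob(s, w) for s, w, _ in pairs]
human_scores = [p for _, _, p in pairs]
print(spearmanr(model_scores, human_scores))
```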
Human or Neural Translation?
Shivendra Bhardwaj, David Alfonso Hermelo, Phillippe Langlais, Gabriel Bernier-Colborne, Cyril Goutte and Michel Simard
Deep neural models have tremendously improved machine translation. In this context, we investigate whether distinguishing machine from human translations is still feasible. We trained and applied 18 classifiers under two settings: a monolingual task, in which the classifier only looks at the translation; and a bilingual task, in which the source text is also taken into consideration. We report on extensive experiments involving 4 neural MT systems (Google Translate, DeepL, as well as two systems we trained) and varying the domain of texts. We show that the bilingual task is the easiest one and that transfer-based deep-learning classifiers perform best, with mean accuracies around 85% in-domain and 75% out-of-domain.
Humans Meet Models on Object Naming: A New Dataset and Analysis
Carina Silberer, Sina Zarrieß, Matthijs Westera and Gemma Boleda
We release a verified version of an object naming dataset, ManyNames v2 (MN v2), that contains dozens of valid names per object for 25K images. We analyze issues in the data collection method originally employed, which is standard in Language & Vision (L&V), and find that the main source of noise in the data comes from simulating a naming context solely from an image with a target object marked with a bounding box, which causes subjects to sometimes disagree regarding which object is the target. We also find that both the degree of this uncertainty in the original data and the amount of true object naming variation in MN v2 differ substantially across object domains.
Hy-NLI: a Hybrid system for Natural Language Inference
Aikaterini-Lida Kalouli, Richard Crouch and Valeria de Paiva
Despite the advances in Natural Language Inference through the training of massive deep models, recent work has revealed the generalization difficulties of such models, which fail to perform on adversarial datasets with challenging linguistic phenomena. Such phenomena, however, can be handled well by symbolic systems. Thus, we propose Hy-NLI, a hybrid system that learns to identify an NLI pair as challenging or not. Based on this, it uses its symbolic or deep learning component, respectively, to make the final inference decision. We show how linguistically less complex cases are best solved by state-of-the-art models, like BERT and XLNet, while hard linguistic phenomena are best handled by our implemented symbolic component. Our thorough evaluation shows that our hybrid system achieves state-of-the-art performance across mainstream and adversarial datasets and opens the way for further research into the hybrid direction.
I Know What You Asked: Graph Path Learning using AMR for Commonsense Reasoning
Jungwoo Lim, Dongsuk Oh, Yoonna Jang, Kisu Yang and Heuiseok Lim
CommonsenseQA is a task in which a correct answer is predicted through commonsense reasoning with pre-defined knowledge. Most previous works have aimed to improve its performance with distributed representations, without considering the process of predicting the answer from the semantic representation of the question. To shed light upon the semantic interpretation of the question, we propose the AMR-ConceptNet-Pruned (ACP) graph. The ACP graph is pruned from a fully integrated graph encompassing the Abstract Meaning Representation (AMR) graph generated from the input question and an external commonsense knowledge graph, ConceptNet (CN). The ACP graph is then exploited to interpret the reasoning path as well as to predict the correct answer on the CommonsenseQA task. To demonstrate the interpretability and effectiveness of the graph, this paper presents the manner in which the commonsense reasoning process can be interpreted with the relations and concepts provided by the ACP graph. Moreover, ACP-based models are shown to outperform the baselines.
Identifying Annotator Bias: A new IRT-based method for bias identification
Jacopo Amidei, Paul Piwek and Alistair Willis
A basic step in any annotation effort is the measurement of the Inter Annotator Agreement (IAA). An important factor that can affect the IAA is the presence of annotator bias. In this paper we introduce a new interpretation and application of the Item Response Theory (IRT) to detect annotators’ bias. Our interpretation of IRT offers an original bias identification method that can be used to compare annotators’ bias and characterise annotation disagreement. Our method can be used to spot outlier annotators, improve annotation guidelines and provide a better picture of the annotation reliability. Additionally, because scales for IAA interpretation are not generally agreed upon, our bias identification method is valuable as a complement to the IAA value which can help with understanding the annotation disagreement.
Identifying Depressive Symptoms from Tweets: Figurative Language Enabled Multitask Learning Framework
Shweta Yadav, Jainish Chauhan, Joy Prakash Sain, Krishnaprasad Thirunarayan, Amit Sheth and Jeremiah Schumm
Existing studies on using social media to derive the mental health status of users focus on the depression detection task. However, for case management and referral to psychiatrists, health-care workers require a practical and scalable depressive disorder screening and triage system. This study aims to design and evaluate a decision support system (DSS) to reliably determine the depressive triage level by capturing fine-grained depressive symptoms expressed in user tweets through the emulation of the Patient Health Questionnaire-9 (PHQ-9) that is routinely used in clinical practice. The reliable detection of depressive symptoms from tweets is challenging because the 280-character limit on tweets incentivizes the use of creative artifacts in the utterances, and figurative usage contributes to effective expression. We propose a novel BERT-based robust multi-task learning framework to accurately identify the depressive symptoms using the auxiliary task of figurative usage detection. Specifically, our proposed novel task sharing mechanism, co-task aware attention, enables automatic selection of optimal information across the BERT layers and tasks by soft-sharing of parameters. Our results show that modeling figurative usage can demonstrably improve the model’s robustness and reliability for distinguishing the depression symptoms.
Identifying Motion Entities in Natural Language and A Case Study for Named Entity Recognition
Ngoc Phuoc An Vo, Irene Manotas, Vadim Sheinin and Octavian Popescu
Motion recognition is one of the basic cognitive capabilities of many life forms; however, detecting and understanding motion in text is not a trivial task. In addition, identifying motion entities in natural language is not only challenging but also beneficial for better natural language understanding. In this paper, we present a Motion Entity Tagging model to identify entities in motion in a text, along with the Literal-Motion-in-Text (LiMiT) dataset used for training the model. We also present results showing that motion features, in particular entities in motion, benefit the Named Entity Recognition (NER) task. Finally, we present an analysis of the special co-occurrence relation between the person category in NER and animate entities in motion, which significantly improves the classification performance for the person category in NER.
Image Caption Generation for News Articles
Zhishen Yang and Naoaki Okazaki
In this paper, we address the task of news-image captioning, which generates a description of an image given the image and its article body as input. This task is more challenging than conventional image captioning, because it requires a joint understanding of image and text. We present a Transformer model that integrates text and image modalities and attends to textual features from visual features in generating a caption. Experiments based on automatic evaluation metrics and human evaluation show that the article text provides the primary information needed to reproduce news-image captions written by journalists. The results also demonstrate that the proposed model outperforms the state-of-the-art model. In addition, we confirm that visual features contribute to improving the quality of news-image captions.
Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games
Alessandro Suglia, Antonio Vergari, Ioannis Konstas, Yonatan Bisk, Emanuele Bastianelli, Andrea Vanzo and Oliver Lemon
In visual guessing games, a Guesser has to identify a target object in a scene by asking questions to an Oracle. An effective strategy for the players is to learn conceptual representations of objects that are both discriminative and expressive enough to ask questions and guess correctly. However, as shown by Suglia et al. (2020), existing models fail to learn truly multi-modal representations, relying instead on gold category labels for objects in the scene both at training and inference time. This provides an unnatural performance advantage when categories at inference time match those at training time, and it causes these models to fail in more realistic “zero-shot” scenarios where out-of-domain object categories are involved. To overcome this issue, we introduce a novel “imagination” module based on Regularized Auto-Encoders, that learns context-aware and category-aware latent embedding without relying on category labels at inference time. Our imagination module outperforms state-of-the-art competitors by 8.26% gameplay accuracy in the CompGuessWhat?! zero-shot scenario (Suglia et al., 2020), it improves by 2.08% and 12.86% the Oracle and Guesser accuracy in the GuessWhat?! benchmark, when no gold categories are available at inference time, and it boosts reasoning about object properties and attributes.
Improving Abstractive Dialogue Summarization with Graph Structures and Topic Words
Lulu Zhao, Weiran Xu and Jun Guo
Recently, more attention has been paid to the abstractive dialogue summarization task. Since information flows are exchanged between at least two interlocutors and key elements about a certain event are often spread across multiple utterances, it is necessary for researchers to explore the inherent relations and structures of dialogue contents. However, existing approaches often process the dialogue with sequence-based models, which struggle to capture long-distance inter-sentence relations. In this paper, we propose a Topic-word Guided Dialogue Graph Attention (TGDGA) network to model the dialogue as an interaction graph according to topic word information. A masked graph self-attention mechanism is used to integrate cross-sentence information flows and focus more on the related utterances, which enables a better understanding of the dialogue. Moreover, topic word features are introduced to assist the decoding process. We evaluate our model on the SAMSum Corpus and the Automobile Master Corpus. The experimental results show that our method outperforms most of the baselines.
Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning
Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Kyunghyun Cho, Eneko Agirre and Gorka Azkune
The interaction of conversational systems with users poses an exciting opportunity for improving them after deployment, but little evidence has been provided of its feasibility. In most applications, users are not able to provide the correct answer to the system, but they are able to provide binary (correct, incorrect) feedback. In this paper we propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback. We perform simulated experiments on document classification (for development) and Conversational Question Answering datasets like QuAC and DoQA, where binary user feedback is derived from gold annotations. The results show that our method is able to improve over the initial supervised system, getting close to a fully-supervised system that has access to the same labeled examples in in-domain experiments (QuAC), and even matching in out-of-domain experiments (DoQA). Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
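A minimal sketch of the general idea behind learning from binary feedback with an importance-sampling correction, assuming PyTorch; this is an illustration of the principle rather than the authors' exact objective or training loop.

```python
import torch
import torch.nn.functional as F

def feedback_weighted_loss(logits, sampled_answers, feedback, sampling_probs):
    """
    logits:          (batch, num_classes) current model scores
    sampled_answers: (batch,) answer indices that were shown to users
    feedback:        (batch,) 1.0 if the user marked the answer correct, else 0.0
    sampling_probs:  (batch,) probability the *old* model assigned to each answer
    """
    log_p = F.log_softmax(logits, dim=-1)
    log_p_answer = log_p.gather(1, sampled_answers.unsqueeze(1)).squeeze(1)
    # Importance weight: binary reward divided by the behaviour policy's probability.
    weights = feedback / sampling_probs.clamp(min=1e-6)
    return -(weights.detach() * log_p_answer).mean()

# Toy usage with random data.
logits = torch.randn(4, 10, requires_grad=True)
answers = torch.randint(0, 10, (4,))
feedback = torch.tensor([1.0, 0.0, 1.0, 1.0])
old_probs = torch.rand(4).clamp(min=0.05)
loss = feedback_weighted_loss(logits, answers, feedback, old_probs)
loss.backward()
print(loss.item())
```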
Improving Document-Level Sentiment Analysis with User and Product Context
Chenyang Lyu, Jennifer Foster and Yvette Graham
Past work that improves document-level sentiment analysis by incorporating user and product information has been limited to incorporation of information corresponding to the current review. We investigate incorporating additional review text available at the time of sentiment prediction that may prove meaningful for guiding prediction. Firstly, we incorporate all available historical review text belonging to the author of the review in question. Secondly, we investigate the inclusion of historical reviews associated with the current product (written by other users). We achieve this by explicitly storing representations of reviews written by the same user and about the same product, forcing the model to memorize all reviews for one particular user and product. Additionally, we drop the hierarchical architecture used in previous work to enable words in the text to directly attend to each other. Experiment results on the IMDB, Yelp 2013 and Yelp 2014 datasets show improvement over the state of the art of more than 2 percentage points in the best case.
Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation
Zhaohong Wan and Xiaojun Wan
The incorporation of data augmentation methods into the grammatical error correction task has attracted much attention. However, existing data augmentation methods mainly apply noise to tokens, which leads to a lack of diversity in the generated errors. In view of this, we propose a new data augmentation method that applies noise to the latent representation of a sentence. By editing the latent representations of grammatical sentences, we can generate synthetic samples with various error types. Combined with some pre-defined rules, our method can greatly improve the performance and robustness of existing grammatical error correction models. We evaluate our method on public benchmarks of the GEC task and it achieves state-of-the-art performance on the CoNLL-2014 and FCE benchmarks.
Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution
David Q. Sun, Hadas Kotek, Christopher Klein, Mayank Gupta, William Li and Jason D. Williams
This paper develops and implements a scalable methodology for (a) estimating the noisiness of labels produced by a typical crowdsourcing semantic annotation task, and (b) reducing the resulting error of the labeling process by as much as 20-30% in comparison to other common labeling strategies. Importantly, this new approach to the labeling process, which we name Dynamic Automatic Conflict Resolution (DACR), does not require a ground truth dataset and is instead based on inter-project annotation inconsistencies. This makes DACR not only more accurate but also available to a broad range of labeling tasks. In what follows we present results from a text classification task performed at scale for a commercial personal assistant, and evaluate the inherent ambiguity uncovered by this annotation strategy as compared to other common labeling strategies.
Improving Long-Tail Relation Extraction with Collaborating Relation-Augmented Attention
Yang Li, Tao Shen, Guodong Long, Jing Jiang, Tianyi Zhou and Chengqi Zhang
The wrong labeling problem and long-tail relations are two main challenges caused by distant supervision in relation extraction. Recent works alleviate wrong labeling by selective attention via multi-instance learning, but cannot handle long-tail relations well even if hierarchies of the relations are introduced to share knowledge. In this work, we propose a novel neural network, Collaborating Relation-augmented Attention (CoRA), to handle both wrong labeling and long-tail relations. Particularly, we first propose a relation-augmented attention network as the base model. It operates on a sentence bag with sentence-to-relation attention to minimize the effect of wrong labeling. Then, facilitated by the proposed base model, we introduce collaborating relation features shared among relations in the hierarchies to promote the relation-augmenting process and balance the training data for long-tail relations. Besides the main training objective to predict the relation of a sentence bag, an auxiliary objective is utilized to guide the relation-augmenting process for a more accurate bag-level representation. In experiments on the popular benchmark dataset NYT, the proposed CoRA improves the prior state-of-the-art performance by a large margin in terms of Precision@N, AUC and Hits@K. Further analyses verify its superior capability in handling long-tail relations in contrast to the competitors.
Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation
Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama and Eiichiro Sumita
In this study, linguistic knowledge at different levels is incorporated to improve neural machine translation (NMT) performance for tasks with extremely limited data. Under the NMT framework, arbitrary features, whether manually designed or automatically extracted, can be integrated. However, this study further shows that checking the relevance of the features is crucial. Specifically, we propose two methods, 1) self relevance and 2) word-based relevance, to improve the representation of features for NMT. Experiments are conducted on translation tasks from English to eight Asian languages, with no more than twenty thousand sentences for training. The proposed methods improve all tasks by up to 3.09 BLEU points. Discussions with visualization provide the explainability of the proposed methods.
Improving Relation Extraction with Relational Paraphrase Sentences
Junjie Yu, Tong Zhu, Wenliang Chen, Wei Zhang and Min Zhang
Supervised models for Relation Extraction (RE) typically require human-annotated training data. Due to its limited size, the human-annotated data is usually incapable of covering diverse relation expressions, which could limit the performance of RE. To increase the coverage of relation expressions, we may enlarge the labeled data by hiring annotators or applying Distant Supervision (DS). However, the human-annotated data is costly and non-scalable, while the distantly supervised data contains much noise. In this paper, we propose an alternative approach to improve RE systems by enriching diverse expressions with relational paraphrase sentences. Based on existing labeled data, we first automatically build a task-specific paraphrase dataset. Then, we propose a novel model to learn the information of diverse relation expressions. In our model, we try to capture this information on the paraphrases via a joint learning framework. Finally, we conduct experiments on a widely used dataset, and the experimental results show that our approach is effective in improving the performance of relation extraction, even compared with a strong baseline.
Improving Sentiment Analysis over non-English Tweets using Multilingual Transformers and Automatic Translation for Data-Augmentation
Valentin Barriere and Alexandra Balahur
Tweets are specific text data when compared to general text. Although sentiment analysis over tweets has become very popular in the last decade for English, it is still difficult to find huge annotated corpora for non-English languages. The recent rise of transformer models in Natural Language Processing makes it possible to achieve unparalleled performance in many tasks, but these models need a substantial quantity of text to adapt to the tweet domain. We propose the use of a multilingual transformer model that we pre-train over English tweets, and apply data augmentation using automatic translation to adapt the model to non-English languages. Our experiments in French, Spanish, German and Italian suggest that the proposed technique is an efficient way to improve the results of the transformers over small corpora of tweets in a non-English language.
Improving Spoken Language Understanding by Wisdom of Crowds
Koichiro Yoshino, Kana Ikeuchi, Katsuhito Sudoh and Satoshi Nakamura
Spoken language understanding (SLU), which converts user requests in natural language to machine-interpretable expressions, is becoming an essential task. The lack of training data is an important problem, especially for new system tasks, because existing SLU systems are based on statistical approaches. In this paper, we propose to use two kinds of "wisdom of crowds," crowdsourcing and a knowledge community website, to improve the SLU system. We first collected paraphrasing variations for new system tasks through crowdsourcing as seed data, and then augmented them using similar questions from a knowledge community website. We investigated the effects of the proposed data augmentation method, even with small seed data.
Improving Variational Autoencoder for Text Modelling with Timestep-Wise Regularisation
Ruizhe Li, Xiao Li, Guanyi Chen and Chenghua Lin
The Variational Autoencoder (VAE) is a popular and powerful model applied to text modelling to generate diverse sentences. However, an issue known as posterior collapse (or KL loss vanishing) occurs when the VAE is used in text modelling: the approximate posterior collapses to the prior, and the model totally ignores the latent variables, degrading to a plain language model during text generation. Such an issue is particularly prevalent when RNN-based VAE models are employed for text modelling. In this paper, we propose a simple, generic architecture called Timestep-Wise Regularisation VAE (TWR-VAE), which can effectively avoid posterior collapse and can be applied to any RNN-based VAE model. The effectiveness and versatility of our model are demonstrated on different tasks, including language modelling and dialogue response generation.
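A rough sketch of what timestep-wise regularisation for an RNN VAE encoder might look like, assuming PyTorch: a posterior is formed from the hidden state at every timestep and a KL term is accumulated for each, rather than only at the final state. Dimensions and layer choices are illustrative assumptions, not the TWR-VAE architecture as published.

```python
import torch
import torch.nn as nn

class TimestepWiseEncoder(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hid_dim=128, latent_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, latent_dim)
        self.to_logvar = nn.Linear(hid_dim, latent_dim)

    def forward(self, tokens):
        h, _ = self.rnn(self.emb(tokens))           # (batch, time, hid_dim)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterisation
        # KL(q(z_t | x) || N(0, I)) averaged over timesteps and batch.
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)  # (batch, time)
        return z, kl.mean()

enc = TimestepWiseEncoder()
tokens = torch.randint(0, 1000, (2, 12))
z, kl_loss = enc(tokens)
print(z.shape, kl_loss.item())
```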
Improving Word Embeddings through Iterative Refinement of Word- and Character-level Models
Phong Ha, Shanshan Zhang, Nemanja Djuric and Slobodan Vucetic
Embedding of rare and out-of-vocabulary (OOV) words is an important open NLP problem. A popular solution is to train a character-level neural network to reproduce the embeddings from a standard word embedding model. The trained network is then used to assign vectors to any input string, including OOV and rare words. We enhance this approach and introduce an algorithm that iteratively refines and improves both word- and character-level models. We demonstrate that our method outperforms the existing algorithms on 5 word similarity data sets, and that it can be successfully applied to job title normalization, an important problem in the e-recruitment domain that suffers from the OOV problem.
Inconsistencies in Crowdsourced Slot-Filling Annotations: A Typology and Identification Methods
Stefan Larson, Adrian Cheung, Anish Mahendran, Kevin Leach and Jonathan K. Kummerfeld
Slot-filling models in task-driven dialog systems rely on carefully annotated training data. However, annotations by crowd workers are often inconsistent or contain errors. Simple solutions like manually checking annotations or having multiple workers label each sample are expensive and waste effort on samples that are correct. If we can identify inconsistencies, we can focus effort where it is needed. Toward this end, we define six inconsistency types in slot-filling annotations. Using three new noisy crowd-annotated datasets, we show that a wide range of inconsistencies occur and can impact system performance if not addressed. We then introduce automatic methods of identifying inconsistencies. Experiments on our new datasets show that these methods effectively reveal inconsistencies in data, though there is further scope for improvement.
Incorporating Inner-word and Out-word Features for Mongolian Morphological Segmentation
Na Liu, Xiangdong Su, Haoran Zhang, Guanglai Gao and Feilong Bao
Mongolian morphological segmentation is regarded as a crucial preprocessing step in many Mongolian-related NLP applications and has received extensive attention. Recently, end-to-end segmentation approaches with long short-term memory networks (LSTM) have achieved excellent results. However, the inner-word features among characters in a word and the out-word features from context are not well utilized in the segmentation process. In this paper, we propose a neural network incorporating inner-word and out-word features for Mongolian morphological segmentation. The network consists of two encoders and one decoder. The inner-word encoder uses self-attention mechanisms to capture the inner-word features of each Mongolian word. The out-word encoder employs a two-layer BiLSTM network to extract out-word features of the word in the sentence. Specifically, the decoder adopts a multi-head doubly attention layer to allow the inner-word features and out-word features to attend to segmentation separately. The experiment explores the effectiveness of the above modules and shows that our approach achieves the best performance.
Incorporating Noisy Length Constraints into Transformer with Length-aware Positional Encodings
Yui Oka, Katsuki Chousa, Katsuhito Sudoh and Satoshi Nakamura
Neural Machine Translation often suffers from an under-translation problem due to its limited modeling of output sequence lengths. In this work, we propose a novel approach to training a Transformer model using length constraints based on length-aware positional encoding (PE). Since length constraints with exact target sentence lengths degrade translation performance, we add random noise within a certain window size to the length constraints in the PE during training. In the inference step, we predict the output lengths using input sequences and a BERT-based length prediction model. Experimental results on the ASPEC English-to-Japanese translation task show that the proposed method produced translations with lengths close to the reference ones and outperformed a vanilla Transformer (especially on short sentences) by 3.22 BLEU points. The average translation results using our length prediction model were also better than another baseline method using input lengths for the length constraints. The proposed noise injection improved robustness to length prediction errors, especially within the window size.
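One possible reading of a length-aware positional encoding with noisy constraints, shown as a small sketch: encode the remaining length (target length minus position) with standard sinusoids and perturb the length constraint with uniform noise inside a window during training. The exact formulation in the paper may differ; this only illustrates the mechanism.

```python
import math
import random
import torch

def sinusoid(positions: torch.Tensor, d_model: int) -> torch.Tensor:
    """Standard sinusoidal encoding for arbitrary (possibly negative) positions."""
    pe = torch.zeros(positions.size(0), d_model)
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(positions.unsqueeze(1) * div)
    pe[:, 1::2] = torch.cos(positions.unsqueeze(1) * div)
    return pe

def length_aware_pe(target_len: int, max_len: int, d_model: int,
                    noise_window: int = 5, training: bool = True) -> torch.Tensor:
    if training:  # add noise so the model does not overfit to exact lengths
        target_len += random.randint(-noise_window, noise_window)
    remaining = torch.tensor([target_len - i for i in range(max_len)], dtype=torch.float)
    return sinusoid(remaining, d_model)   # (max_len, d_model), added to token embeddings

pe = length_aware_pe(target_len=20, max_len=32, d_model=512)
print(pe.shape)
```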
Incorporating Syntax and Frame Semantics in Neural Network for Machine Reading Comprehension
Shaoru Guo, Yong Guan, Ru Li, Xiaoli Li and Hongye Tan
Machine reading comprehension (MRC) is one of the most critical yet challenging tasks in natural language understanding (NLU), where both syntactic and semantic information of text are essential components for text understanding. It is surprising that jointly considering syntax and semantics in neural networks has never been formally reported in the literature. This paper makes the first attempt by proposing a novel Syntax and Frame Semantics model for Machine Reading Comprehension (SS-MRC), which takes full advantage of syntax and frame semantics to obtain richer text representations. Our extensive experimental results demonstrate that SS-MRC performs better than ten state-of-the-art technologies on the machine reading comprehension task.
Increasing Learning Efficiency of Self-Attention Networks through Direct Position Interactions, Learnable Temperature, and Convoluted Attention
Philipp Dufter, Martin Schmitt and Hinrich Schütze
Self-Attention Networks (SANs) are an integral part of successful neural architectures such as the Transformer (Vaswani et al., 2017), and thus of pretrained language models such as BERT (Devlin et al., 2018) or GPT-3 (Brown et al., 2020). Training SANs on a task or pretraining them on language modeling requires huge amounts of data and compute resources. In this paper we search for extensions of SANs which enable faster learning, i.e., higher accuracies after fewer update steps. We investigate three modifications to SANs to achieve more efficient learning: direct position interactions, learnable temperature, and convoluted attention. In our experiments on Part-of-Speech tagging on the Penn Treebank we find that each of the three modifications speeds up the learning process tremendously. On Universal Dependencies, learnable temperature and convoluted attention increase the overall performance by up to 2 percentage points.
IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP
Fajri Koto, Afshin Rahimi, Jey Han Lau and Timothy Baldwin
Although the Indonesian language is spoken by almost 200 million people and is the 10th most-spoken language in the world, it is under-represented in NLP research. Previous work on Indonesian has been hampered by a lack of annotated datasets, a sparsity of language resources, and a lack of resource standardisation. In this work, we release the IndoLEM dataset comprising seven tasks for the Indonesian language, spanning morpho-syntax, semantics, and discourse. We additionally release IndoBERT, a new pre-trained language model for Indonesian, and evaluate it over IndoLEM, in addition to benchmarking it against existing resources. Our experiments show that IndoBERT achieves state-of-the-art performance over most of the tasks in IndoLEM.
Inducing Domain-Specific Sentiment Lexicons From Labeled Documents
SM Mazharul Islam, Xin Dong and Gerard de Melo
Sentiment analysis is an area of substantial relevance both in industry and in academia, including for instance in social studies. Although supervised learning algorithms have advanced considerably in recent years, in many settings it remains more practical to apply an unsupervised technique. The latter are oftentimes based on sentiment lexicons. However, existing sentiment lexicons reflect an abstract notion of polarity and do not do justice to the substantial differences of word polarities between different domains. In this work, we draw on a collection of domain-specific data to induce a set of 24 domain-specific sentiment lexicons. We rely on initial linear models to induce initial word intensity scores, and then train new deep models based on word vector representations to overcome the scarcity of the original seed data. Our analysis shows substantial differences between domains, which make domain-specific sentiment lexicons a promising form of lexical resource in downstream tasks, and the predicted lexicons indeed perform effectively on tasks such as review classification and cross-lingual word sentiment prediction.
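A toy sketch of the two-stage recipe this abstract describes, assuming scikit-learn and NumPy: (1) a linear bag-of-words classifier over labeled in-domain documents yields initial word polarity scores from its coefficients; (2) a regressor over word vectors extends those scores to unseen words. The documents, labels and random vectors below are made up; the authors' actual models and data differ.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, Ridge

docs = ["great battery and screen", "terrible battery life", "screen is great",
        "awful and terrible service", "great service"]
labels = [1, 0, 1, 0, 1]

# Stage 1: seed polarity scores from a linear model's coefficients.
vec = CountVectorizer()
X = vec.fit_transform(docs)
clf = LogisticRegression().fit(X, labels)
seed_scores = dict(zip(vec.get_feature_names_out(), clf.coef_[0]))

# Stage 2: generalise to any word with a vector (random vectors stand in for
# real pre-trained embeddings here).
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=50) for w in seed_scores}
reg = Ridge().fit(np.stack([embeddings[w] for w in seed_scores]),
                  np.array(list(seed_scores.values())))
print(sorted(seed_scores.items(), key=lambda kv: kv[1])[:3])   # most negative seed words
```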
Inflating Topic Relevance with Ideology: A Case Study of Political Ideology Bias in Social Topic Detection Models
Meiqi Guo, Rebecca Hwa, Yu-Ru Lin and Wen-Ting Chung
We investigate the impact of political ideology biases in training data. Through a set of comparison studies, we study how biases are propagated in several widely-used NLP models and their effect on the overall retrieval accuracy. Our work highlights the susceptibility of large, complex models to biases in training data as well as the significance of controlling biases introduced by human-selected keywords. Finally, we propose learning a text representation which is invariant to political ideology and at the same time discriminant to topic relevance as ways to mitigate the bias.
Informative Manual Evaluation of Machine Translation Output
Maja Popović
This work proposes a new method for manual evaluation of Machine Translation (MT) output based on marking actual issues in the translated text. The novelty is that the evaluators are not assigning any scores, nor classifying errors, but marking all problematic parts (words, phrases, sentences) of the translation.
Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism
Pan Xie, Zhi Cui, Xiuying Chen, XiaoHui Hu, Jianwei Cui and Bin Wang
Non-autoregressive models generate target words in a parallel way, which achieves a faster decoding speed but at the sacrifice of translation accuracy. To remedy a flawed translation by non-autoregressive models, a promising approach is to train a conditional masked translation model (CMTM) and refine the generated results over several iterations. Unfortunately, such an approach hardly considers the sequential dependency among target words, which inevitably results in translation degradation. Hence, instead of solely training a Transformer-based CMTM, we propose a Self-Review Mechanism to infuse sequential information into it. Concretely, we insert a left-to-right mask into the same decoder of the CMTM, and then induce it to autoregressively review whether each generated word from the CMTM should be replaced or kept. The experimental results (WMT14 En ↔ De and WMT16 En ↔ Ro) demonstrate that our model uses dramatically less training computation than the typical CMTM, and also outperforms several state-of-the-art non-autoregressive models by over 1 BLEU. Through knowledge distillation, our model even surpasses a typical left-to-right Transformer model, while significantly speeding up decoding.
Integrating Domain Terminology into Neural Machine Translation
Elise Michon, Josep Crego and Jean Senellart
This paper extends existing work on terminology integration into Neural Machine Translation, a common industrial practice to dynamically adapt translation to a specific domain. Our method, based on the use of placeholders complemented with morphosyntactic annotation, efficiently taps into the ability of the neural network to deal with symbolic knowledge, surpassing the surface generalization shown by alternative techniques. We compare our approach to state-of-the-art systems and benchmark them through a well-defined evaluation framework, focusing on the actual application of terminology and not just on overall performance. Results indicate the suitability of our method in the use case where terminology is used in a system trained on generic data only.
Integrating External Event Knowledge for Script Learning
Shangwen Lv, Fuqing Zhu and Songlin Hu
Script learning aims to predict the subsequent event given an existing event chain. Recent studies focus on event co-occurrence to solve this problem. However, few studies integrate external event knowledge to solve this problem. According to our observations, external event knowledge can provide additional information, such as temporal or causal knowledge, for understanding an event chain better and predicting the right subsequent event. In this work, we integrate event knowledge from the ASER (Activities, States, Events and their Relations) knowledge base to help predict the next event. We propose a new approach consisting of a knowledge retrieval stage and a knowledge integration stage. In the knowledge retrieval stage, we select relevant external event knowledge from knowledge bases. In the knowledge integration stage, we propose three methods to integrate external knowledge into our model and infer final answers. Experiments on the widely used Multi-Choice Narrative Cloze (MCNC) task show that our approach achieves state-of-the-art performance compared to other methods.
Integrating User History into Heterogeneous Graph for Dialogue Act Recognition
Dong Wang, Ziran Li, Ying Shen and Haitao Zheng
Dialogue Act Recognition (DAR) is a challenging problem in Natural Language Understanding, which aims to attach Dialogue Act (DA) labels to each utterance in a conversation. However, previous studies cannot fully recognize the specific expressions given by users due to the informality and diversity of natural language expressions. To solve this problem, we propose a Heterogeneous User History (HUH) graph convolution network, which utilizes the user's historical answers grouped by DA labels as additional clues to recognize the DA label of utterances. To handle the noise caused by introducing the user's historical answers, we design sets of denoising mechanisms, including a History Selection process, a Similarity Re-weighting process, and an Edge Re-weighting process. We evaluate the proposed method on two benchmark datasets MSDialog and MRDA. The experimental results verify the effectiveness of integrating user's historical answers, and show that our proposed model outperforms the state-of-the-art methods.
Intent Mining from past conversations for Conversational Agent
Ajay Chatterjee and Shubhashis Sengupta
Conversational systems are of primary interest in the AI community. Organizations are increasingly using chatbots to provide round-the-clock support and to increase customer engagement. Many commercial bot building frameworks follow a standard approach that requires one to build and train an intent model to recognize user input. These frameworks require a collection of user utterances and corresponding intents to train an intent model. Collecting a substantial coverage of training data is a bottleneck in the bot building process. In cases where past conversation data is available, the cost of labeling hundreds of utterances with intent labels is time-consuming and laborious. In this paper, we present an intent discovery framework that can mine a vast amount of conversational logs and generate labeled data sets for training intent models. We introduce an extension to the DBSCAN algorithm and present a density-based clustering algorithm, ITER-DBSCAN, for unbalanced data clustering. Empirical evaluation on one conversation dataset, six different intent datasets, and one short text clustering dataset shows the effectiveness of our hypothesis.
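A schematic of the iterative idea behind density-based clustering for unbalanced intent data, assuming scikit-learn: run DBSCAN, set aside the clusters it finds, relax the density requirements, and repeat on the remaining points. This mirrors the general strategy only; ITER-DBSCAN's exact parameter schedule is defined in the paper, and the random data below is purely for demonstration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def iterative_dbscan(X, eps=0.5, min_samples=8, eps_step=0.1, min_step=2, rounds=4):
    labels = np.full(len(X), -1)
    remaining = np.arange(len(X))
    next_id = 0
    for _ in range(rounds):
        if len(remaining) == 0 or min_samples < 2:
            break
        sub = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X[remaining])
        for cid in set(sub) - {-1}:
            labels[remaining[sub == cid]] = next_id
            next_id += 1
        remaining = remaining[sub == -1]           # only unclustered points go on
        eps += eps_step                            # relax density each round
        min_samples = max(min_samples - min_step, 2)
    return labels                                  # -1 marks still-unclustered points

X = np.random.default_rng(1).normal(size=(200, 5))
print(np.bincount(iterative_dbscan(X) + 1))        # cluster sizes (index 0 = noise)
```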
Interactive Key-Value Memory-augmented Attention for Image Paragraph Captioning
Chunpu Xu, Yu Li, Chengming Li, Xiang Ao, Min Yang and Jinwen Tian
Image paragraph captioning (IPC) aims to generate a fine-grained paragraph to describe the visual content of an image. Significant progress has been made by deep neural networks, in which the attention mechanism plays an essential role. However, conventional attention mechanisms tend to ignore the past alignment information, which often results in problems of repetitive captioning and incomplete captioning. In this paper, we propose an Interactive key-value Memory-augmented Attention model for image Paragraph captioning (IMAP) to keep track of the attention history (salient objects coverage information) along with the update-chain of the decoder state and therefore avoid generating repetitive or incomplete image descriptions. In addition, we employ an adaptive attention mechanism to realize adaptive alignment from image regions to caption words, where an image region can be mapped to an arbitrary number of caption words while a caption word can also attend to an arbitrary number of image regions. Extensive experiments on a benchmark dataset (i.e., Stanford) demonstrate the effectiveness of our IMAP model.
Interactively-Propagative Attention Learning for Implicit Discourse Relation Recognition
Huibin Ruan, Yu Hong, Yang Xu, Zhen Huang, Guodong Zhou and Min Zhang
We tackle implicit discourse relation recognition. Both self-attention and interactive-attention mechanisms have been applied to attention-aware representation learning, which improves current discourse analysis models. To take advantage of the two attention mechanisms simultaneously, we develop a propagative attention learning model using a cross-coupled two-channel network. We experiment on the Penn Discourse Treebank. The test results demonstrate that our model yields substantial improvements over the baselines (BiLSTM and BERT).
Intermediate Self-supervised Learning for Machine Translation Quality Estimation
Raphael Rubino and Eiichiro Sumita
Pretraining sentence encoders is effective in many natural language processing tasks, including machine translation (MT) quality estimation (QE), due partly to the scarcity of annotated QE datasets required for supervised learning. In this paper, we investigate the use of an intermediate self-supervised learning task for the sentence encoder, aiming at improving QE performance at the sentence and word levels. Our approach is motivated by a problem inherent to the QE task: insertions and deletions of tokens in translations. We modify the translation language model (TLM) training objective of the cross-lingual language model (XLM) to orient the pretrained model towards the target task. The proposed method does not rely on annotated data and is complementary to QE methods involving pretrained sentence encoders and domain adaptation. Experiments on English–German and English–Russian show that intermediate learning improves over domain-adapted models. Additionally, our method outperforms state-of-the-art QE models by XX Pearson and XX F-1 for sentence- and word-level tasks respectively.
Interpretable Multi-headed Attention for Abstractive Summarization at Controllable Lengths
Ritesh Sarkhel, Moniba Keymanesh, Arnab Nandi and Srinivasan Parthasarathy
Abstractive summarization at controllable lengths is a challenging task in natural language processing. It is even more challenging for low-resource domains and applications in which the length is not known beforehand. At the same time, when it comes to trusting machine-generated summaries, the need to explain how a summary is constructed in human-understandable terms may be critical. In this paper, we propose MLS, a supervised method to construct abstractive summaries at controllable lengths. The key enabler of our method is an interpretable multi-headed attention model that computes attention distributions over an input sequence using an array of timestep-independent semantic kernels. Each kernel plays a role in optimizing a human-understandable semantic property, often associated with a high-quality summary. Leveraging this attention mechanism, MLS constructs a length-constrained summary by shortening or expanding a prototype text derived from the document. Exhaustive experiments on datasets from two low-resource domains show that MLS outperforms strong convolutional baselines by up to 14.70% in METEOR. Human evaluation of our summaries at arbitrary lengths suggests that they capture the main concepts of the input document.
IntKB: A Verifiable Interactive Framework for Knowledge Base Completion
Bernhard Kratzwald, Guo Kunpeng, Stefan Feuerriegel and Dennis Diefenbach
Knowledge bases (KBs) are essential for many downstream NLP tasks, yet their prime shortcoming is that they are often incomplete. State-of-the-art frameworks for KB completion often lack sufficient accuracy to work fully automated without human supervision. As a remedy, we propose IntKB: a novel interactive framework for KB completion from text based on a question answering pipeline. Our framework is tailored to the specific needs of a human-in-the-loop paradigm: (i) We generate facts that are aligned with text snippets and are thus immediately verifiable by humans. (ii) Our system is designed such that it continuously learns during the KB completion task and, therefore, significantly improves its performance upon initial zero- and few-shot relations over time. (iii) We only trigger human interactions when there is enough information for a correct prediction. Therefore, we train our system with negative examples and a fold-option if there is no answer. Our framework yields a favorable performance: it achieves a hit@1 ratio of 26.3% for initially unseen relations, upon which it gradually improves to 45.7%.
Intra-/Inter-Interaction Network with Latent Interaction Modeling for Multi-turn Response Selection
Yang Deng, Wenxuan Zhang and Wai Lam
Multi-turn response selection has been extensively studied and applied to many real-world applications in recent years. However, current methods typically model the interactions between multi-turn utterances and candidate responses with iterative approaches, which is not practical as the number of conversation turns varies. Besides, some latent features, such as user intent and conversation topic, are under-explored in existing works. In this work, we propose the Intra-/Inter-Interaction Network (I³) with latent interaction modeling to comprehensively model multi-level interactions between the utterance context and the response. Specifically, we first encode the intra- and inter-utterance interaction with the given response from both the individual utterance and the overall utterance context. Then we develop a latent multi-view subspace clustering module to model the latent interaction between the utterance and the response. Experimental results show that the proposed method substantially and consistently outperforms existing state-of-the-art methods on three multi-turn response selection benchmark datasets.
Intra-Correlation Encoding for Chinese Sentence Intention Matching
Xu Zhang, Yifeng Li, Wenpeng Lu, Ping Jian and Guoqiang Zhang
Sentence intention matching is vital for natural language understanding. Especially for the Chinese sentence intention matching task, due to the ambiguity of Chinese words, semantic missing or semantic confusion is more likely to occur in the encoding process. Although existing methods have enriched text representation through pre-trained word embeddings to address this problem, due to the particularity of Chinese text, different granularities of pre-trained word embeddings will affect the semantic description of a piece of text. In this paper, we propose an effective approach that combines character-granularity and word-granularity features to perform sentence intention matching, and we utilize soft alignment attention to enhance the local information of sentences at the corresponding levels. The proposed method can capture sentence feature information from multiple perspectives as well as correlation information between different levels of sentences. Evaluated on the BQ and LCQMC datasets, our model achieves remarkable results and demonstrates performance better than or comparable to BERT-based models.
Intrinsic Quality Assessment of Arguments
Henning Wachsmuth and Till Werner
Several quality dimensions of natural language arguments have been investigated. Some are likely to be reflected in linguistic features (e.g., an argument's arrangement), whereas others depend on context (e.g., relevance) or topic knowledge (e.g., acceptability). In this paper, we study the intrinsic computational assessment of 15 dimensions, i.e., only learning from an argument's text. In systematic experiments with eight feature types on an existing corpus, we observe moderate but significant learning success for most dimensions. Rhetorical quality seems hardest to assess, and subjectivity features turn out strong, although length bias in the corpus impedes full validity.
Invertible Tree Embeddings using a Cryptographic Role Embedding Scheme
Coleman Haley and Paul Smolensky
We present a novel method for embedding trees in a vector space based on Tensor-Product Representations (TPRs) which allows for inversion: the retrieval of the original tree structure and nodes from the vectorial embedding. Unlike previous attempts, this does not come at the cost of an intractable representation size; we utilize a method for non-exact inversion, showing that it works well for simple data when there is sufficient randomness in the representation scheme, and providing an upper bound on its error. To handle the huge number of possible tree positions without memoizing position representation vectors, we present a method (Cryptographic Role Embedding) using cryptographic hashing algorithms that allows for the representation of unboundedly many positions. Through experiments on parse tree data, we show that a 30,000-dimensional Cryptographic Role Embedding of trees provides invertibility with error < 1% on structures that previous methods would require 8.6 × 10^57 dimensions to represent.
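The position-hashing idea lends itself to a compact illustration. Below is a minimal sketch, assuming elementwise ±1 binding instead of full tensor products and an arbitrary small dimension; the helper names and toy fillers are illustrative, not the authors' implementation:

```python
import hashlib
import numpy as np

DIM = 1024  # illustrative; the paper reports far larger embeddings

def role_vector(position, dim=DIM):
    """Derive a role vector for a tree position string (e.g. "LRL" = left,
    right, left from the root) by seeding an RNG with a cryptographic hash,
    so unboundedly many positions can be handled without storing them."""
    seed = int.from_bytes(hashlib.sha256(position.encode()).digest()[:8], "little")
    return np.random.default_rng(seed).choice([-1.0, 1.0], size=dim)

def embed_tree(nodes, dim=DIM):
    """nodes: dict mapping position string -> filler (node label) vector.
    The tree embedding is a sum of filler (*) role bindings; elementwise
    binding is a simplification of the tensor product used in TPRs."""
    emb = np.zeros(dim)
    for pos, filler in nodes.items():
        emb += filler * role_vector(pos, dim)
    return emb

def unbind(embedding, position, dim=DIM):
    """Approximate inversion at one position: multiplying again by the
    +/-1 role vector cancels that binding; other bindings remain as noise."""
    return embedding * role_vector(position, dim)

# toy usage: recover which label sat at the left child
rng = np.random.default_rng(0)
fillers = {"NP": rng.normal(size=DIM), "VP": rng.normal(size=DIM)}
emb = embed_tree({"L": fillers["NP"], "R": fillers["VP"]})
rec = unbind(emb, "L")
best = max(fillers, key=lambda k: np.dot(rec, fillers[k]))
print(best)  # "NP" with high probability at this dimensionality
```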
Investigating Catastrophic Forgetting During Continual Training for Neural Machine Translation
Shuhao Gu and Yang Feng
Neural machine translation (NMT) models usually suffer from catastrophic forgetting during continual training: the models tend to gradually forget previously learned knowledge and swing to fit the newly added data, which may have a different distribution, e.g. a different domain. Although many methods have been proposed to solve this problem, the cause of this phenomenon is still not well understood. In the setting of domain adaptation, we investigate the cause of catastrophic forgetting from the perspectives of modules and parameters (neurons). The investigation on the modules of the NMT model shows that some modules are tightly tied to general-domain knowledge, while others are more essential for domain adaptation. The investigation on the parameters shows that some parameters are important for both general-domain and in-domain translation, and that large changes to them during continual training bring about the performance decline on the general domain. We conduct experiments across different language pairs and domains to ensure the validity and reliability of our findings.
Is Killed More Significant than Fled? A Contextual Model for Salient Event Detection
Disha Jindal, Daniel Deutsch and Dan Roth
Identifying the key events in a document is critical to holistically understanding its important information. Although measuring the salience of events is highly contextual, most previous work has used a limited representation of events that omits essential information. In this work, we propose a highly contextual model of event salience that uses a rich representation of events, incorporates document-level information and allows for interactions between latent event encodings. Our experimental results on an event salience dataset demonstrate that our model improves over previous work by an absolute 2-4% on standard metrics, establishing a new state-of-the-art performance for the task. We also propose a new evaluation metric that addresses flaws in previous evaluation methodologies. Finally, we discuss the importance of salient event detection for the downstream task of summarization.
Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
Bryan Eikema and Wilker Aziz
Recent studies have revealed a number of pathologies of neural machine translation (NMT) systems. Hypotheses explaining these mostly suggest there is something fundamentally wrong with NMT as a model or with its training algorithm, maximum likelihood estimation (MLE). Most of this evidence was gathered using maximum a posteriori (MAP) decoding, a decision rule aimed at identifying the highest-scoring translation, i.e. the mode. We argue that the evidence corroborates the inadequacy of MAP decoding more than it casts doubt on the model and its training algorithm. In this work, we show that translation distributions do reproduce various statistics of the data well, but that beam search strays from such statistics. We show that some of the known pathologies and biases of NMT are due to MAP decoding and not to NMT's statistical assumptions nor MLE. In particular, we show that the most likely translations under the model accumulate so little probability mass that the mode can be considered essentially arbitrary. We therefore advocate for the use of decision rules that take into account the translation distribution holistically. We show that an approximation to minimum Bayes risk decoding gives competitive results, confirming that NMT models do capture important aspects of translation well in expectation.
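As a concrete reference point for the decision rule advocated above, here is a minimal sketch of sampling-based minimum Bayes risk decoding, with a simple unigram-F1 utility standing in for a real translation metric and hand-written strings standing in for model samples:

```python
import numpy as np

def unigram_f1(hyp, ref):
    """Cheap stand-in utility for a translation metric such as BLEU or METEOR."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    if not hyp_toks or not ref_toks:
        return 0.0
    overlap = len(set(hyp_toks) & set(ref_toks))
    p, r = overlap / len(hyp_toks), overlap / len(ref_toks)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def mbr_decode(candidates, utility=unigram_f1):
    """Return the candidate with the highest expected utility against the other
    samples, which act as a Monte Carlo proxy for the translation distribution."""
    scores = [np.mean([utility(h, r) for r in candidates if r is not h])
              for h in candidates]
    return candidates[int(np.argmax(scores))]

samples = ["the cat sat on the mat",
           "the cat sat on a mat",
           "a cat is sitting on the mat",
           "completely unrelated output"]
print(mbr_decode(samples))  # a consensus-like candidate, not the odd one out
```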
Joint Aspect Extraction and Sentiment Analysis with Directional Graph Convolutional Networks
Guimin Chen, Yuanhe Tian and Yan Song
End-to-end aspect-based sentiment analysis (EASA) consists of two sub-tasks: the first extracts the aspect terms in a sentence and the second predicts the sentiment polarities for such terms. For EASA, compared to pipeline and multi-task approaches, joint aspect extraction and sentiment analysis provides a one-step solution that predicts both aspect terms and their sentiment polarities through a single decoding process, which avoids mismatches between the predicted aspect terms and sentiment polarities, as well as error propagation. Previous studies for this task, especially recent ones, focus on using powerful encoders (e.g., Bi-LSTM and BERT) to model contextual information from the input, with limited effort devoted to advanced neural architectures (such as attention and graph convolutional networks) or to leveraging extra knowledge (such as syntactic information). To extend such efforts, in this paper, we propose directional graph convolutional networks (D-GCN) to jointly perform aspect extraction and sentiment analysis while encoding syntactic information, where dependencies among words are integrated into our model to enhance its ability to represent input sentences and thereby help EASA. Experimental results on three benchmark datasets demonstrate the effectiveness of our approach, where D-GCN achieves state-of-the-art performance on all datasets.
Joint Chinese Word Segmentation and Part-of-speech Tagging via Multi-channel Attention of Character N-grams
Yuanhe Tian, Yan Song and Fei Xia
Chinese word segmentation (CWS) and part-of-speech (POS) tagging are two fundamental tasks for Chinese language processing. Previous studies have demonstrated that jointly performing them can be an effective one-step solution to both tasks and that this joint task can benefit from a good modeling of contextual features such as n-grams. However, previous work models such contextual features only by concatenating the features or their embeddings directly with the input embeddings, without distinguishing whether the contextual features are important for the joint task in the specific context. Therefore, such models could be misled by unimportant contextual information. In this paper, we propose a character-based neural model for the joint task enhanced by multi-channel attention over character n-grams. In the attention module, n-gram features are categorized into different groups according to several criteria, and n-grams in each group are weighted and distinguished according to their importance for the joint task in the specific context. To categorize n-grams, we try two criteria in this study, i.e., n-gram frequency and length, so that n-grams with different capabilities of carrying contextual information are discriminatively learned by our proposed attention module. Experimental results on five benchmark datasets for CWS and POS tagging demonstrate that our approach outperforms strong baseline models and achieves state-of-the-art performance on all five datasets.
Joint Entity and Relation Extraction for Legal Documents with Legal Feature Enhancement
Yanguang Chen, Yuanyuan Sun, Zhihao Yang and Hongfei LIN
In recent years, the plentiful information contained in Chinese legal documents has attracted a great deal of attention because of the large-scale release of judgment documents on China Judgments Online. There is a great need to enable machines to understand the semantic information stored in these documents, which are transcribed in natural language. Information extraction provides a way of mining the valuable information implied in unstructured judgment documents. We propose a Legal Triplet Extraction System for drug-related criminal judgment documents. The system extracts entities and semantic relations jointly and benefits from the proposed legal lexicon feature and multi-task learning framework. Furthermore, we manually annotate a dataset for Named Entity Recognition and Relation Extraction in the Chinese legal domain, which contributes to training supervised triplet extraction models and evaluating model performance. Our experimental results show that the legal feature introduction and the multi-task learning framework are feasible and effective for the Legal Triplet Extraction System. The F1 score of triplet extraction reaches 0.836 on the legal dataset.
Joint Event Extraction with Hierarchical Policy Network
Peixin Huang, Xiang Zhao, Ryuichi Takanobu, Zhen Tan and Weidong Xiao
Most existing work on event extraction (EE) either follows a pipelined manner or uses a joint structure that is pipelined in essence. As a result, such work fails to utilize the information interactions among event triggers, event arguments and argument roles, and also causes information redundancy. In view of this, we propose to exploit the role information of the arguments in an event and devise a Hierarchical Policy Network (HPNet) to perform joint EE. The whole EE process is fulfilled through a two-level hierarchical structure consisting of two policy networks, for event detection and argument detection respectively, so that deep information interactions among the subtasks are realized and it is more natural to deal with multiple-event cases. Extensive experiments on ACE2005 and TAC2015 demonstrate the superiority of HPNet, which achieves state-of-the-art performance and is more powerful for sentences with multiple events.
Joint Persian Word Segmentation Correction and Zero-Width Non-Joiner Recognition Using BERT
Ehsan Doostmohammadi, Minoo Nassajian and Adel Rahimi
Words are properly segmented in the Persian writing system; in practice, however, these writing rules are often neglected, resulting in single words being written disjointedly and multiple words written without any white spaces between them. This paper addresses the problems of word segmentation and zero-width non-joiner (ZWNJ) recognition in Persian, which we approach jointly as a sequence labeling problem. We achieve a macro-averaged F1-score of 92.40% on a carefully collected corpus of 500 sentences with a high level of difficulty.
Joint Transformer/RNN Architecture for Gesture Typing in Indic Languages
Emil Biju, Anirudh Sriram, Mitesh M. Khapra and Pratyush Kumar
Gesture typing is a method for typing words on a touch-based keyboard by creating a continuous trace passing through the relevant keys. This work is aimed at developing a keyboard that supports gesture typing for Indic languages. We begin by noting that when dealing with Indic languages, one needs to cater to two different sets of users: (i) users who prefer to type in the native Indic script (Devanagari, Bengali, etc.) and (ii) users who prefer to type using an English script keyboard but want the output in the native script. In both cases, we need a model that takes a trace as input and maps it to the intended word. To enable the development of these models we create and release two datasets. First, we create a dataset containing keyboard traces for 193,658 words across 7 Indic languages. Second, we curate 104,412 English-Indic transliteration pairs from Wikidata for 7 Indic languages. Using these datasets we build a model that performs path decoding, transliteration and transliteration correction. Unlike similar approaches, our proposed model does not make co-character independence assumptions during decoding. The overall accuracy of our model across the 7 languages varies from 70-95%.
Jointly Learning Aspect-Focused and Inter-Aspect Relations with Graph Convolutional Networks for Aspect Sentiment Analysis
Bin Liang, Rongdi Yin, Lin Gui, Jiachen Du and Ruifeng Xu
In this paper, we explore a novel solution of constructing a heterogeneous graph for each instance by leveraging aspect-focused and inter-aspect contextual dependencies for the specific aspect and propose an Interactive Graph Convolutional Networks (InterGCN) model for aspect sentiment analysis. Specifically, an ordinary dependency graph is first constructed for each sentence over the dependency tree. Then we refine the graph by considering the syntactical dependencies between contextual words and aspect-specific words to derive the aspect-focused graph. Subsequently, the aspect-focused graph and the corresponding embedding matrix are fed into the aspect-focused GCN to capture the key aspect and contextual words. Besides, to interactively extract the inter-aspect relations for the specific aspect, an inter-aspect GCN is adopted to model the representations learned by aspect-focused GCN based on the inter-aspect graph which is constructed by the relative dependencies between the aspect words and other aspects. Hence, the model can be aware of the significant contextual and aspect words when interactively learning the sentiment features for a specific aspect. Experimental results on four benchmark datasets illustrate that our proposed model outperforms state-of-the-art methods and substantially boosts the performance in comparison with BERT.
Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication
Ruize Wang, Zhongyu Wei, Piji Li, Ying Cheng, Haijun Shan, Ji Zhang, Qi Zhang and Xuanjing Huang
Visual storytelling aims to automatically generate a narrative paragraph from a sequence of images. Existing approaches construct a text description independently for each image and roughly concatenate them as a story, which leads to semantically incoherent content. In this paper, we propose a new approach to visual storytelling by introducing a topic description task to detect the global semantic context of an image stream. A story is then constructed with the guidance of the topic description. In order to combine the two generation tasks, we propose a multi-agent communication framework that regards the topic description generator and the story generator as two agents and learns them simultaneously via an iterative updating mechanism. We validate our approach on the VIST dataset, where quantitative results, ablations, and human evaluation demonstrate that our method generates higher-quality stories than state-of-the-art methods.
KeyGames: A Game Theoretic Approach to Automatic Keyphrase Extraction
Arnav Saxena, Mudit Mangal and Goonjan Jain
In this paper, we introduce two advancements in the automatic keyphrase extraction (AKE) space: KeyGames and pke+. KeyGames is an unsupervised AKE framework that employs evolutionary game theory and the consistent labelling problem to ensure consistent classification of candidates into keyphrases and non-keyphrases. pke+ is a Python-based pipeline built on top of the existing pke library to standardize various AKE steps, namely candidate extraction and evaluation, to ensure truly systematic and comparable performance analysis of AKE models. In the experiments, we compare the performance of KeyGames across three publicly available datasets (Inspec 2001, SemEval 2010, DUC 2001) against the results quoted by existing state-of-the-art models as well as their performance when reproduced using pke+. The results show that KeyGames outperforms most of the state-of-the-art systems while generalizing better to input documents of different domains and lengths. Further, pke+'s preprocessing also improves several other systems' quoted performance.
KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text Classification for Kinyarwanda and Kirundi
Rubungo Andre Niyongabo, Qu Hong, Julia Kreutzer and Li Huang
Recent progress in text classification has focused on high-resource languages such as English and Chinese. For low-resource languages, among them most African languages, the lack of well-annotated data and effective preprocessing is hindering progress and the transfer of successful methods. In this paper, we introduce two news datasets (KINNEWS and KIRNEWS) for multi-class classification of news articles in Kinyarwanda and Kirundi, two low-resource African languages. The two languages are mutually intelligible, but while Kinyarwanda has been studied in NLP to some extent, this work constitutes the first study on Kirundi. Along with the datasets, we provide statistics, guidelines for preprocessing, and monolingual and cross-lingual baseline models. Our experiments show that training embeddings on the relatively higher-resourced Kinyarwanda yields successful cross-lingual transfer to Kirundi. In addition, the design of the created datasets allows for wider use in Natural Language Processing (NLP) beyond text classification in future studies, such as representation learning, cross-lingual learning with more distant languages, or as a basis for new annotations for tasks such as parsing, POS tagging, and NER.
KnowDis: Knowledge Enhanced Data Augmentation for Event Causality Detection via Distant Supervision
Xinyu Zuo, Yubo Chen, Kang Liu and Jun Zhao
Modern models of event causality detection (ECD) are mainly based on supervised learning from small hand-labeled corpora. However, hand-labeled training data is expensive to produce, offers low coverage of causal expressions, and is limited in size, which makes it hard for supervised methods to detect causal relations between events. To address this data scarcity problem, we investigate a data augmentation framework for ECD, dubbed Knowledge Enhanced Distant Data Augmentation (KnowDis). Experimental results on two benchmark datasets, the EventStoryLine corpus and Causal-TimeBank, show that 1) KnowDis can augment the available training data for ECD via distant supervision, assisted by lexical and causal commonsense knowledge, and 2) our method outperforms previous methods by a large margin when assisted by the automatically labeled training data.
Knowledge Aware Emotion Recognition in Textual Conversations via Multi-Task Incremental Transformer
Duzhen Zhang, Xiuyi Chen, Shuang Xu and Bo Xu
Emotion recognition in textual conversations (ERTC) plays an important role in a wide range of applications, such as opinion mining, recommender systems, and so on. ERTC, however, is a challenging task. For one thing, speakers often rely on the context and commonsense knowledge to express emotions; for another, most utterances in conversations carry neutral emotion, so the confusion between the few non-neutral utterances and the many more neutral ones restrains emotion recognition performance. In this paper, we propose a novel Knowledge Aware Incremental Transformer with Multi-task Learning (KAITML) to address these challenges. Firstly, we devise a dual-level graph attention mechanism to leverage commonsense knowledge, which augments the semantic information of the utterance. Then we apply the Incremental Transformer to encode multi-turn contextual utterances. Moreover, we are the first to introduce multi-task learning to alleviate the aforementioned confusion and thus further improve emotion recognition performance. Extensive experimental results show that our KAITML model outperforms the state-of-the-art models across five benchmark datasets.
Knowledge Base Embedding By Cooperative Knowledge Distillation
Raphaël Sourty, Jose G. Moreno, Lynda Tamine-Lechani and François-Paul Servant
Knowledge bases are increasingly exploited as gold-standard data sources that benefit various knowledge-driven NLP tasks. In this paper, we explore a new research direction: performing knowledge base (KB) representation learning grounded in the recent theoretical framework of knowledge distillation over neural networks. Given a set of KBs, our proposed approach, KD-MKB, learns KB embeddings by mutually and jointly distilling knowledge within a dynamic teacher-student setting. Experimental results on two standard datasets show that knowledge distillation between KBs through entity and relation inference is indeed observed. We also show that cooperative learning significantly outperforms traditional and sequential distillation models.
Knowledge Graph Embedding with Atrous Convolution and Residual Learning
Feiliang Ren
Knowledge graph embedding is an important task that benefits many downstream applications. Currently, deep neural network based methods achieve state-of-the-art performance. However, most of these existing methods are very complex and need much time for training and inference. To address this issue, we propose a simple but effective atrous convolution based knowledge graph embedding method. Compared with existing state-of-the-art methods, our method has the following main characteristics. First, it effectively increases feature interactions by using atrous convolutions. Second, to address the original-information forgetting issue and the vanishing/exploding gradient issue, it uses residual learning. Third, it has a simpler structure but much higher parameter efficiency. We evaluate our method on six benchmark datasets with different evaluation metrics. Extensive experiments show that our model is very effective. On these diverse datasets, it achieves better results than the compared state-of-the-art methods on most evaluation metrics.
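To make the two named ingredients concrete, here is a minimal PyTorch sketch of a dilated (atrous) 1D convolution over concatenated head and relation embeddings with a residual connection; the dimensions, channel counts and the dot-product scoring step are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class AtrousResidualScorer(nn.Module):
    """Score (head, relation, tail) triples: treat the concatenated head and
    relation embeddings as a 1-channel sequence, apply dilated convolutions
    with a residual connection, then match the result against the tail."""
    def __init__(self, emb_dim=200, channels=32, dilation=2):
        super().__init__()
        self.conv1 = nn.Conv1d(1, channels, kernel_size=3,
                               padding=dilation, dilation=dilation)
        self.conv2 = nn.Conv1d(channels, 1, kernel_size=3,
                               padding=dilation, dilation=dilation)
        self.proj = nn.Linear(2 * emb_dim, emb_dim)

    def forward(self, head, rel, tail):
        x = torch.cat([head, rel], dim=-1).unsqueeze(1)  # (batch, 1, 2*emb_dim)
        h = torch.relu(self.conv1(x))
        h = self.conv2(h) + x                            # residual connection
        feat = self.proj(h.squeeze(1))                   # (batch, emb_dim)
        return (feat * tail).sum(dim=-1)                 # one score per triple

scorer = AtrousResidualScorer()
scores = scorer(torch.randn(4, 200), torch.randn(4, 200), torch.randn(4, 200))
print(scores.shape)  # torch.Size([4])
```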
Knowledge Graph Embeddings in Geometric Algebras
Chengjin Xu, Mojtaba Nayyeri, Yung-Yu Chen and Jens Lehmann
Knowledge graph (KG) embedding aims at embedding entities and relations in a KG into a low-dimensional latent representation space. Existing KG embedding approaches model entities and relations in a KG using real-valued, complex-valued, or hypercomplex-valued (Quaternion or Octonion) representations, all of which are subsumed into a geometric algebra. In this work, we introduce a novel geometric algebra-based KG embedding framework, GeomE, which utilizes multivector representations and the geometric product to model entities and relations. Our framework subsumes several state-of-the-art KG embedding approaches and is advantageous in its ability to model various key relation patterns, including (anti-)symmetry, inversion and composition, its rich expressiveness with a higher degree of freedom, as well as its good generalization capacity. Experimental results on multiple benchmark knowledge graphs show that the proposed approach outperforms existing state-of-the-art models for link prediction.
Knowledge Graph Enhanced Neural Machine Translation via Multi-task Learning on Sub-entity Granularity
Yang Zhao, Lu Xiang, Junnan Zhu, Jiajun Zhang, Yu Zhou and Chengqing Zong
Previous studies combining knowledge graphs (KGs) with neural machine translation (NMT) have two problems: i) Knowledge under-utilization: they only focus on the entities that appear in both the KG and the training sentence pairs, so much of the knowledge in the KG cannot be fully utilized. ii) Granularity mismatch: current KG methods use the entity as the basic granularity, while NMT uses the sub-word as its granularity, making the KG difficult to utilize in NMT. To alleviate the above problems, we propose a multi-task learning method at sub-entity granularity. Specifically, we first split the entities in the KG and the sentence pairs into sub-entity granularity by using joint BPE. Then we utilize multi-task learning to combine the machine translation task and a knowledge reasoning task. Extensive experiments on various translation tasks demonstrate that our method significantly outperforms the baseline models in both translation quality and entity handling.
Knowledge-Enhanced Natural Language Inference Based on Knowledge Graphs
Zikang Wang, Linjing Li and Daniel Zeng
Natural Language Inference (NLI) is a vital task in natural language processing. It aims to identify the logical relationship between two sentences. Most of the existing approaches make such inferences based on semantic knowledge obtained from the training corpus. The adoption of background knowledge is rarely seen, or is limited to a few specific types. In this paper, we propose a novel Knowledge Graph-enhanced NLI (KGNLI) model to leverage background knowledge stored in knowledge graphs for NLI. The KGNLI model consists of three components: a semantic-relation representation module, a knowledge-relation representation module, and a label prediction module. Different from previous methods, various kinds of background knowledge can be flexibly combined in the proposed KGNLI model. Experiments on four benchmarks, SNLI, MultiNLI, SciTail, and BNLI, validate the effectiveness of our model.
Knowledge-enriched, Type-constrained and Grammar-guided Question Generation over Knowledge Bases
Sheng Bi, Xiya Cheng, Yuan-Fang Li, Yongzhen Wang and Guilin Qi
Question generation over knowledge bases (KBQG) aims at generating natural-language questions about a subgraph, i.e., a set of (connected) triples. Two main challenges still face the current crop of encoder-decoder-based methods, especially on small subgraphs: (1) low diversity and poor fluency due to the limited information contained in the subgraphs, and (2) semantic drift due to the decoder's oblivion of the semantics of the answer entity. We propose an innovative knowledge-enriched, type-constrained and grammar-guided KBQG model, named KTG, to address the above challenges. In our model, the encoder is equipped with auxiliary information from the KB, and the decoder is constrained with word types during QG. Specifically, entity domain and description, as well as relation hierarchy information, are considered to construct question contexts, while a conditional copy mechanism is incorporated to modulate question semantics according to current word types. Besides, a novel reward function featuring grammatical similarity is designed to improve both generative richness and syntactic correctness via reinforcement learning. Extensive experiments show that our proposed model outperforms existing methods by a significant margin on two widely-used benchmark datasets, SimpleQuestion and PathQuestion.
Label Correction Model for Aspect-based Sentiment Analysis
Qianlong Wang and Jiangtao Ren
Aspect-based sentiment analysis includes opinion aspect extraction and aspect sentiment classification. Researchers have attempted to discover the relationship between these two sub-tasks and have proposed joint models for aspect-based sentiment analysis. However, these models ignore a useful phenomenon: the aspect boundary label and the sentiment label of the same word can correct each other. To exploit this phenomenon, we propose a novel deep learning model named the label correction model. Specifically, given an input sentence, our model first predicts the aspect boundary label sequence and the sentiment label sequence, then re-predicts the aspect boundary (sentiment) label sequence using the embeddings of the previously predicted sentiment (aspect boundary) labels. The goal of the re-prediction operation (which can be repeated multiple times) is to use the information of the sentiment (aspect boundary) label to correct a wrong aspect boundary (sentiment) label. Moreover, we explore two ways of using label embeddings: addition and a gate mechanism. We evaluate our model on three benchmark datasets. Experimental results verify that our model achieves state-of-the-art performance compared with several baselines.
LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression
Yujing Wang
BERT is a cutting-edge language representation model pre-trained by a large corpus, which achieves superior performances on various natural language understanding tasks. However, a major blocking issue of applying BERT to online services is that it is memory-intensive and leads to unsatisfactory latency of user requests, raising the necessity of model compression. Existing solutions leverage the knowledge distillation framework to learn a smaller model that imitates the behaviors of BERT. However, the training procedure of knowledge distillation is expensive itself as it requires sufficient training data to imitate the teacher model. In this paper, we address this issue by proposing a tailored solution named LadaBERT (Lightweight adaptation of BERT through hybrid model compression), which combines the advantages of different model compression methods, including weight pruning, matrix factorization and knowledge distillation. LadaBERT achieves state-of-the-art accuracy on various public datasets while the training overheads can be reduced by an order of magnitude.
Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
Isaac Caswell, Theresa Breiner, Daan van Esch and Ankur Bapna
Large text corpora are increasingly important for a wide variety of Natural Language Processing (NLP) tasks, and automatic language identification (LangID) is a core technology needed to collect such datasets in a multilingual context. LangID is largely treated as a solved problem in the literature, with models reported that achieve over 90% average F1 on as many as 1,366 languages. We train LangID models on up to 1,440 languages with comparable quality on held-out test sets, but find that human-judged LangID accuracy for web-crawl text corpora created using these models is only around 5% for many lower-resource languages, suggesting a need for more robust evaluation. Further analysis revealed a variety of error modes, arising from domain mismatch, class imbalance, language similarity, and insufficiently expressive models. We propose two classes of techniques to mitigate these errors: wordlist-based tunable-precision filters (for which we release curated lists in about 500 languages) and transformer-based semi-supervised LangID models. These techniques enable us to create an initial data set covering 100K clean sentences in each of 200+ languages, paving the way towards a 1,000-language web text corpus.
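The wordlist-based filters described above can be pictured with a few lines of Python; this is a hedged sketch with a placeholder wordlist and threshold, not the released filtering code:

```python
def wordlist_filter(sentence, wordlist, min_in_vocab=0.8):
    """Keep a sentence for a language only if at least `min_in_vocab` of its
    alphabetic tokens appear in that language's curated wordlist. Raising the
    threshold trades recall for precision (hence "tunable-precision")."""
    tokens = [t.lower() for t in sentence.split() if t.isalpha()]
    if not tokens:
        return False
    in_vocab = sum(t in wordlist for t in tokens)
    return in_vocab / len(tokens) >= min_in_vocab

# toy usage with a tiny placeholder Swahili wordlist
swahili_words = {"habari", "ya", "leo", "nzuri", "sana", "asante"}
print(wordlist_filter("habari ya leo", swahili_words))                    # True
print(wordlist_filter("completely unrelated crawl text", swahili_words))  # False
```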
Language Model Transformers as Evaluators for Open-domain Dialogues
Rostislav Nedelchev, Ricardo Usbeck and Jens Lehmann
Computer-based systems for communication with humans have been a cornerstone of AI research since the 1950s. So far, the most effective way to assess the quality of the dialogues produced by these systems is to use resource-intensive manual labor instead of automated means. In this work, we investigate whether language models (LMs) based on transformer neural networks can indicate the quality of a conversation. In a general sense, language models are methods that learn to predict one or more words based on an already given context. Due to their unsupervised nature, they are candidates for efficient, automatic indication of dialogue quality. We demonstrate a positive correlation between the output of the language models and the scores given by human evaluators. We also provide some insights into their behavior and inner workings in a conversational context.
Language-Driven Region Pointer Advancement for Controllable Image Captioning
Annika Lindh, Robert Ross and John Kelleher
Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more potential end-user control over results. A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer. In this paper, we propose a novel method for predicting the timing of region pointer advancement by treating the advancement step as a natural part of the language structure via a NEXT-token, motivated by a strong correlation to the sentence structure in the training data. We find that our timing agrees with the ground-truth timing in the Flickr30k Entities test data with a precision of 86.28% and a recall of 97.96%. Our model implementing this technique improves the state-of-the-art on standard captioning metrics while additionally demonstrating a considerably larger effective vocabulary size.
LAVA: Latent Action Spaces via Variational Auto-encoding for Dialogue Policy Optimization
Nurul Lubis, Christian Geishauser, Michael Heck, Hsien-chin Lin, Marco Moresi, Carel van Niekerk and Milica Gasic
Reinforcement learning (RL) can enable task-oriented dialogue systems to steer the conversation towards successful task completion. In an end-to-end setting, a response can be constructed in a word-level sequential decision making process with the entire system vocabulary as the action space. Policies trained in such a fashion are independent from pre-defined ontologies, but they have to deal with large action spaces and long trajectories, making RL impractical. Using the latent space of a variational model as the action space alleviates this problem. However, current approaches use an uninformed prior for training and optimize the latent distribution solely on the context. It is therefore unclear whether the latent representation truly encodes the characteristics of different actions. In this paper, we explore three ways of leveraging an auxiliary task to shape the latent variable distribution: via pre-training, to obtain an informed prior, and via multitask learning. We choose response auto-encoding as the auxiliary task, as this captures the generative factors of dialogue responses while requiring low computational cost and neither additional data nor labels. Our approach yields more action-characterized latent representations which support end-to-end dialogue policy optimization and achieves state-of-the-art success rates. These results warrant a more widespread use of RL in end-to-end dialogue models.
Layer-wise Multi-view Learning for Neural Machine Translation
Qiang Wang, Yue Zhang, Tong Xiao and Jingbo Zhu
Traditional neural machine translation is limited to the context representation of the topmost encoder layer and cannot directly perceive the lower encoder layers. Existing solutions usually rely on adjusting the network architecture, which either makes the calculation more complicated or introduces additional structural restrictions. In this work, we propose layer-wise multi-view learning to solve this problem, circumventing the need to change the model structure. We regard the off-the-shelf output of each encoder layer, a by-product of layer-by-layer encoding, as a redundant view of the input sentence. In this way, in addition to the topmost encoder layer (referred to as the primary view), we also incorporate an intermediate encoder layer as the auxiliary view. We feed the two views to a partially shared decoder to maintain independent predictions. Consistency regularization based on KL divergence is used to encourage the two views to learn from each other. Extensive experimental results on five translation tasks show that our approach yields stable improvements over multiple strong baselines. As another bonus, our method is agnostic to network architectures and can maintain the same inference speed as the original model.
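The consistency term between the two views can be sketched directly; below is a minimal PyTorch illustration in which a symmetric KL divergence pulls the auxiliary-view and primary-view output distributions together (the symmetric form and the 0.5 weighting are assumptions, not necessarily the paper's exact loss):

```python
import torch
import torch.nn.functional as F

def view_consistency_loss(primary_logits, auxiliary_logits):
    """Symmetric KL divergence between the prediction distributions obtained
    from the topmost encoder layer (primary view) and from an intermediate
    encoder layer (auxiliary view), so the two views learn from each other."""
    p_log = F.log_softmax(primary_logits, dim=-1)
    q_log = F.log_softmax(auxiliary_logits, dim=-1)
    kl_pq = F.kl_div(q_log, p_log.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(p_log, q_log.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

# toy usage: two target positions over a vocabulary of 8
loss = view_consistency_loss(torch.randn(2, 8), torch.randn(2, 8))
print(loss.item())
```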
Learn to Combine Linguistic and Symbolic Information for Table-based Fact Verification
Qi Shi, Yu Zhang, Qingyu Yin and Ting Liu
Table-based fact verification is expected to perform both linguistic reasoning and symbolic reasoning. Existing methods pay little attention to combining linguistic information and symbolic information. In this work, we propose HeterTFV, a graph-based reasoning approach that learns to combine linguistic and symbolic information effectively. We first construct a program graph to encode programs, a kind of LISP-like logical form, in order to learn the semantic compositionality of the programs. Then we construct a heterogeneous graph to incorporate both linguistic and symbolic information by introducing program nodes into the heterogeneous graph. Finally, we propose a graph-based reasoning approach that reasons over the multiple types of nodes to make an effective combination of linguistic and symbolic information. Experimental results on TABFACT, a large-scale benchmark dataset, demonstrate the effectiveness of our approach.
Learn with Noisy Data via Unsupervised Loss Correction for Weakly Supervised Reading Comprehension
Xuemiao Zhang, Kun Zhou, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang and Junfei Liu
The weakly supervised machine reading comprehension (MRC) task is practical and promising because its training data is massive and easily available, but such data inevitably introduces noise. Existing related methods usually incorporate extra submodels to help filter noise before the noisy data is fed into the main models. However, these multistage methods often make training difficult, and the quality of the submodels is hard to control. In this paper, we first explore and analyze the essential characteristics of the noise from the perspective of the loss distribution, and find that in the early stage of training, noisy samples usually lead to significantly larger loss values than clean ones. Based on this observation, we propose a hierarchical loss correction strategy to avoid fitting noise and to enhance clean supervision signals: we use an unsupervisedly fitted Gaussian mixture model to calculate weight factors for all losses, correcting the loss distribution, and employ a hard bootstrapping loss to modify the loss function. Experimental results on different weakly supervised MRC datasets show that the proposed methods improve models significantly.
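The GMM-based weighting step can be made concrete with a small sketch: fit a two-component Gaussian mixture to per-sample losses and use the posterior probability of the low-mean (presumed clean) component as each sample's weight. The component count and the toy loss values are placeholders:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def clean_sample_weights(losses):
    """Fit a 2-component GMM to per-sample losses and return, per sample, the
    posterior probability of the low-mean (clean) component, to be used as a
    weight factor on that sample's loss."""
    losses = np.asarray(losses, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
    clean_component = int(np.argmin(gmm.means_.ravel()))
    return gmm.predict_proba(losses)[:, clean_component]

# toy usage: mostly small (clean) losses plus a high-loss (noisy) tail
rng = np.random.default_rng(0)
losses = np.concatenate([rng.normal(0.3, 0.1, 80), rng.normal(2.5, 0.5, 20)])
weights = clean_sample_weights(losses)
print(weights[:3].round(2), weights[-3:].round(2))  # close to 1 vs. close to 0
```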
Learning distributed sentence vectors with bi-directional 3D convolutions
Bin Liu, Liang Wang and Guosheng Yin
We propose to learn distributed sentence representations using a text's visual features as input. Different from existing methods that render the words or characters of a sentence into images separately, we further fold these images into a 3-dimensional sentence tensor. Then, multiple 3-dimensional convolutions with different lengths (along the third dimension) are applied to the sentence tensor, acting jointly as bi-gram, tri-gram, quad-gram, and even five-gram detectors. Similar to a Bi-LSTM, these n-gram detectors learn both forward and backward distributional semantic knowledge from the sentence tensor. That is, the proposed model uses bi-directional convolutions to learn text embeddings according to the semantic order of words. The feature maps from the two directions are concatenated for final sentence embedding learning. Our model involves only a single layer of convolution, which makes it easy and fast to train. Finally, we evaluate the sentence embeddings on several downstream Natural Language Processing (NLP) tasks, which demonstrates a surprisingly strong performance of the proposed model.
Learning Efficient Task-Specific Meta-Embeddings with Word Prisms
Jingyi He, KC Tsiolis, Kian Kenyon-Dean and Jackie Chi Kit Cheung
Word embeddings are trained to predict word cooccurrence statistics, which leads them to possess different lexical properties (syntactic, semantic, etc.) depending on the notion of context defined at training time. These properties manifest when querying the embedding space for the most similar vectors, and when used at the input layer of deep neural networks trained to solve downstream NLP problems. Meta-embeddings combine multiple sets of differently trained word embeddings, and have been shown to successfully improve intrinsic and extrinsic performance over equivalent models which use just one set of source embeddings. We introduce word prisms: a simple and efficient meta-embedding method that learns to combine source embeddings according to the task at hand. Word prisms learn orthogonal transformations to linearly combine the input source embeddings, which allows them to be very efficient at inference time. We evaluate word prisms in comparison to other meta-embedding methods on six extrinsic evaluations and observe that word prisms offer improvements in performance on all tasks.
Learning from Non-Binary Constituency Trees via Tensor Decomposition
Daniele Castellana and Davide Bacciu
Processing sentence constituency trees in binarised form is a common and popular approach in the literature. However, constituency trees are non-binary by nature. The binarisation procedure deeply changes the structure, distancing constituents that are in fact close. In this work, we introduce a new approach to dealing with non-binary constituency trees which leverages tensor-based models. In particular, we show how a powerful composition function based on the canonical tensor decomposition can exploit such a rich structure. A key point of our approach is the weight sharing constraint imposed on the factor matrices, which allows limiting the number of model parameters. Finally, we introduce a Tree-LSTM model which takes advantage of this composition function, and we experimentally assess its performance on different NLP tasks.
Learning Health-Bots from Training Data that was Automatically Created using Paraphrase Detection and Expert Knowledge
Anna Liednikova, Philippe Jolivet, Alexandre Durand-Salmon and Claire Gardent
A key bottleneck for developing dialog models is the lack of adequate training data. Due to privacy issues, dialog data is even scarcer in the health domain. We propose a novel method for creating dialog corpora which we apply to create doctor-patient interaction data. We use this data to learn both a generation model and a hybrid classification/retrieval model and find that the generation model consistently outperforms the hybrid model. We show that our data creation method has several advantages. Not only does it allow for the semi-automatic creation of large quantities of training data; it also provides a natural way of guiding learning and a novel method for assessing the quality of human-machine interactions.
Learning Semantic Correspondences from Noisy Data-text Pairs by Local-to-Global Alignments
Feng Nie, Jinpeng Wang and Chin-Yew Lin
Learning semantic correspondences between structured input data (e.g., slot-value pairs) and associated texts is a core problem for many downstream NLP applications, e.g., data-to-text generation. Large-scale datasets recently proposed for generation contain loosely corresponding data-text pairs, where some spans in the text cannot be aligned to the incomplete paired input. To learn semantic correspondences from such datasets, we propose a two-stage local-to-global alignment (L2GA) framework. First, a local model based on multi-instance learning is applied to build alignments for text spans that can be directly grounded in the paired structured input. Then, a novel global model built upon a memory-guided conditional random field (CRF) layer infers the missing alignments for text spans that are not supported by the paired incomplete inputs, where the memory is designed to leverage alignment clues provided by the local model to strengthen the global model. In this way, the local model and the global model work jointly to learn semantic correspondences in the same framework. Experimental results show that our proposed method generalizes to both the restaurant and computer domains and improves alignment accuracy.
Learning to Decouple Relations: Few-Shot Relation Classification with Entity-Guided Attention and Confusion-Aware Training
Yingyao Wang, Junwei Bao, Guangyi Liu, Youzheng Wu, Xiaodong He, Bowen Zhou and Tiejun Zhao
This paper aims to enhance few-shot relation classification, especially for sentences that jointly describe multiple relations. Because some relations frequently co-occur in the same context, previous few-shot relation classifiers struggle to distinguish them with few annotated instances. To alleviate this relation confusion problem, we propose two novel mechanisms to learn to decouple these easily-confused relations. On the one hand, an Entity-Guided Attention (EGA) mechanism, which leverages the syntactic relations and relative positions between each word and the specified entity pair, is introduced to guide the attention to filter out information causing confusion. On the other hand, a Confusion-Aware Training (CAT) method is proposed to explicitly learn to distinguish relations by playing a pushing-away game between classifying a sentence into a true relation and into its confusing relation. Extensive experiments are conducted on the FewRel dataset, and the results show that our proposed model achieves results comparable to and even much better than strong baselines in terms of accuracy. Furthermore, the ablation test and case study verify the effectiveness of our proposed EGA and CAT, especially in addressing the relation confusion problem.
Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks
Trapit Bansal, Rishikesh Jha and Andrew McCallum
Pre-trained transformer models have shown enormous success in improving performance on several downstream tasks. However, fine-tuning on a new task still requires large amounts of task-specific labeled data to achieve good performance. We consider this problem of learning to generalize to new tasks, with a few examples, as a meta-learning problem. While meta-learning has shown tremendous progress in recent years, its application is still limited to simulated problems or problems with limited diversity across tasks. We develop a novel method, LEOPARD, which enables optimization-based meta-learning across tasks with a different number of classes, and evaluate different methods on generalization to diverse NLP classification tasks. LEOPARD is trained with the state-of-the-art transformer architecture and shows better generalization to tasks not seen at all during training, with as few as 4 examples per label. Across 17 NLP tasks, including diverse domains of entity typing, natural language inference, sentiment analysis, and several other text classification tasks, we show that LEOPARD learns better initial parameters for few-shot learning than self-supervised pre-training or multi-task training, outperforming many strong baselines, for example, yielding 14.6% average relative gain in accuracy on unseen tasks with only 4 examples per label.
Learning to Prune Dependency Trees with Rethinking for Neural Relation Extraction
Bowen Yu, Xue Mengge, Zhenyu Zhang, Tingwen Liu, Wang Yubin and Bin Wang
Dependency trees have been shown to be effective in capturing long-range relations between target entities. Nevertheless, how to selectively emphasize target-relevant information and remove irrelevant content from the tree is still an open problem. Existing approaches employing pre-defined rules to eliminate noise may not always yield optimal results due to the complexity and variability of natural language. In this paper, we present a novel architecture named Dynamically Pruned Graph Convolutional Network (DP-GCN), which learns to prune the dependency tree with rethinking in an end-to-end scheme. In each layer of DP-GCN, we employ a selection module to concentrate on nodes expressing the target relation by a set of binary gates, and then augment the pruned tree with a pruned semantic graph to ensure the connectivity. After that, we introduce a rethinking mechanism to guide and refine the pruning operation by feeding back the high-level learned features repeatedly. Extensive experimental results demonstrate that our model achieves impressive results compared to strong competitors.
Learning with Contrastive Examples for Data-to-Text Generation
Yui Uehara, Tatsuya Ishigaki, Kasumi Aoki, Hiroshi Noji, Keiichi Goshima, Ichiro Kobayashi, Hiroya Takamura and Yusuke Miyao
Existing models for data-to-text tasks generate fluent but sometimes incorrect sentences e.g., "Nikkei gains" is generated when "Nikkei drops" is expected. We investigate models trained on contrastive examples i.e., incorrect sentences or terms, in addition to correct ones to reduce such errors. We first create rules to produce contrastive examples from correct ones by replacing frequent crucial terms such as "gain" or "drop". We then use learning methods with several losses that exploit contrastive examples. Experiments on the market comment generation task show that 1) exploiting contrastive examples improves the capability of generating sentences with better lexical choice, without degrading the fluency, 2) the choice of the loss function is an important factor because the performances on different metrics depend on the types of loss functions, and 3) the use of the examples produced by some specific rules further improves performance. Human evaluation also supports the effectiveness of using contrastive examples.
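The rule-based construction of contrastive examples can be illustrated briefly; the antonym pairs below are placeholders for the crucial-term rules the abstract mentions:

```python
# Illustrative antonym rules over crucial market-comment terms; placeholders,
# not the paper's actual rule set.
CONTRAST_RULES = {"gains": "drops", "drops": "gains",
                  "rises": "falls", "falls": "rises"}

def make_contrastive(sentence):
    """Return contrastive (deliberately incorrect) variants of a sentence,
    each with one crucial term flipped to its opposite."""
    tokens = sentence.split()
    variants = []
    for i, tok in enumerate(tokens):
        if tok.lower() in CONTRAST_RULES:
            flipped = tokens.copy()
            flipped[i] = CONTRAST_RULES[tok.lower()]
            variants.append(" ".join(flipped))
    return variants

print(make_contrastive("Nikkei gains 100 points"))
# ['Nikkei drops 100 points'] -- usable as a negative example in a contrastive loss
```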
Leveraging Discourse Rewards for Document-Level Neural Machine Translation
Inigo Jauregi Unanue, Nazanin Esmaili, Gholamreza Haffari and Massimo Piccardi
Document-level machine translation focuses on the translation of entire documents from a source to a target language. It is widely regarded as a challenging task since the translation of the individual sentences in the document needs to retain aspects of the discourse at document level. However, document-level translation models are usually not trained to explicitly ensure discourse quality. Therefore, in this paper we propose a training approach that explicitly optimizes two established discourse metrics, lexical cohesion and coherence, by using a reinforcement learning objective. Experiments over four different language pairs and three translation domains have shown that our training approach has been able to achieve more cohesive and coherent document translations than other competitive approaches, yet without compromising the faithfulness to the reference translation. In the case of the Zh-En language pair, our method has achieved an improvement of 2.46 percentage points (pp) in LC and 1.17 pp in COH over the runner-up, while at the same time improving 0.63 pp in BLEU score and 0.47 pp in F-BERT.
Leveraging HTML in Free Text Web Named Entity Recognition
Colin Ashby and David Weir
HTML tags are typically discarded in free text Named Entity Recognition from Web pages. We investigate whether these discarded tags might be used to improve NER performance. We compare Text+Tags sentences with their Text-Only equivalents, over five datasets, two free text segmentation granularities and two NER models. We find an increased F1% performance for Text+Tags of between 0.9% and 13.2% over all datasets, variants and models. This performance increase, over datasets of varying entity types, HTML density and construction quality, indicates our method is flexible and adaptable. These findings imply that a similar technique might be of use in other Web-aware NLP tasks, including the enrichment of deep language models.
Leveraging WordNet Paths for Neural Hypernym Prediction
Yejin Cho, Juan Diego Rodriguez, Yifan Gao and Katrin Erk
We formulate the problem of hypernym prediction as a sequence generation task, where the sequences are taxonomy paths in WordNet. Our experiments with encoder-decoder models show that training to generate taxonomy paths can improve the performance of direct hypernym prediction. As a simple but powerful model, the hypo2path model achieves state-of-the-art performance and large gains over previous models.
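Extracting the taxonomy-path targets can be done directly from WordNet; a minimal sketch with NLTK, where the choice of the first sense and the first path is an illustrative simplification rather than the paper's preprocessing:

```python
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet") once

def taxonomy_path(word):
    """Return one root-to-synset hypernym path for the word's first noun sense,
    as a list of lemma names, suitable as a sequence-generation target."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return []
    path = synsets[0].hypernym_paths()[0]  # one path from the WordNet root
    return [s.lemmas()[0].name() for s in path]

print(" -> ".join(taxonomy_path("dog")))
# e.g. entity -> physical_entity -> object -> ... -> canine -> dog
```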
Lexical Relation Mining in Neural Word Embeddings
Aishwarya Jadhav, Yifat Amir and Zachary Pardos
Work with neural word embeddings and lexical relations has largely focused on confirmatory experiments which use human-curated examples of semantic and syntactic relations to validate against. In this paper, we explore the degree to which lexical relations, such as those found in popular validation sets, can be derived and extended from a variety of neural embeddings using classical clustering methods. We show that the Word2Vec space of word pairs (i.e., offset vectors) significantly outperforms other more contemporary methods, even in the presence of a large number of noisy offsets. Moreover, we show that via a simple nearest neighbor approach in the offset space, new examples of known relations can be discovered. Our results speak to the amenability of offset vectors from non-contextual neural embeddings to finding semantically coherent clusters. This simple approach has implications for the exploration of emergent regularities and their examples, such as emerging trends on social media and their related posts.
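The offset-space idea can be sketched in a few lines: represent each word pair by the difference of its vectors, then rank candidate pairs by how close their offsets are to a seed example. The random vectors below stand in for actual Word2Vec embeddings:

```python
import numpy as np

def offset(vectors, pair):
    """Represent a word pair (a, b) by the offset vector v(b) - v(a)."""
    a, b = pair
    return vectors[b] - vectors[a]

def nearest_pairs(vectors, seed_pair, candidate_pairs, k=3):
    """Rank candidates by cosine similarity of their offsets to the seed pair's
    offset; nearby offsets suggest the same lexical relation."""
    seed = offset(vectors, seed_pair)
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))
    ranked = sorted(candidate_pairs,
                    key=lambda p: cos(seed, offset(vectors, p)), reverse=True)
    return ranked[:k]

# toy usage with random stand-ins for Word2Vec vectors
rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=50)
           for w in ["king", "queen", "man", "woman", "paris", "france"]}
print(nearest_pairs(vectors, ("man", "woman"),
                    [("king", "queen"), ("paris", "france"), ("queen", "king")]))
```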
Lexical Semantic Analysis of Meaning Representation
Daniel Hershcovich, Nathan Schneider, Dotan Dvir, Jakob Prange, Miryam de Lhoneux and Omri Abend
Many frameworks exist for representing various aspects of linguistic meaning. Building robust natural language understanding systems will require a clear characterization of whether and how these representations complement each other. To perform a systematic comparative analysis, we evaluate the mapping between meaning representations from different frameworks using two complementary methods: a linguistically motivated and carefully designed rule-based converter, and a data-driven supervised delexicalized parser, which parses to one framework using only information from the other as features. We apply these methods to convert STREUSLE to UCCA. While STREUSLE provides comprehensive lexical semantic analysis on top of Universal Dependencies, UCCA is a sentence-level (or even document-level) meaning representation. Surprisingly, we find that both methods yield accurate target representations, close to fully supervised UCCA parser outputs in quality. A construction-level analysis of the results reveals the distinctions each method is sensitive to, as well as the similarities and divergences between the semantic frameworks. Our results show that UCCA, as a sentence-level meaning representation, cannot simply be reduced to syntax and lexical semantics, and that manually annotated training data for it is still a valuable resource for semantic parsers.
Lin: Unsupervised Extraction of Tasks from Textual Communication
Parth Diwanji, Hui Guo, Munindar Singh and Anup Kalia
Commitments and requests are a hallmark of collaborative communication, especially in team settings. Identifying the specific tasks being committed to or requested in emails and chat messages can enable important downstream tasks, such as producing todo lists, reminders, and calendar entries. State-of-the-art approaches for task identification rely on large annotated datasets, which are not always available, especially for domain-specific tasks. Accordingly, we propose Lin, an unsupervised approach to identifying tasks that leverages dependency parsing and VerbNet. Our evaluations show that Lin yields comparable or more accurate results than supervised models on domains with large training sets, and maintains its excellent performance on unseen domains.
Linguistic Profiling of a Neural Language Model
Alessio Miaschi, Dominique Brunato, Felice Dell'Orletta and Giulia Venturi
In this paper we investigate the linguistic knowledge learned by a Neural Language Model (NLM) before and after a fine-tuning process, and how this knowledge affects its predictions during several classification problems. We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that BERT is able to encode a wide range of linguistic characteristics, but tends to lose this information when trained on specific downstream tasks. We also find that BERT's capacity to encode different kinds of linguistic properties has a positive influence on its predictions: the more readable linguistic information it stores, the higher its capacity to predict the correct label.
Linguistic Regularities in Sentence Embeddings
Xunjie Zhu and Gerard de Melo
While important properties of word vector representations have been studied extensively, far less is known about the properties of sentence vector representations. Word vectors are often evaluated by assessing to what degree they exhibit regularities with regard to relationships of the sort considered in word analogies. In this paper, we investigate to what extent commonly used sentence vector representation spaces also reflect certain kinds of regularities. We propose a number of schemes to induce evaluation data, based on lexical analogy data as well as semantic relationships between sentences. Our experiments consider a wide range of sentence embedding methods, including ones based on BERT-style contextual embeddings. We find that different models differ substantially in their ability to reflect such regularities.
Living Machines: A study of atypical animacy
Mariona Coll Ardanuy, Federico Nanni, Kaspar Beelen, Kasra Hosseini, Ruth Ahnert, Jon Lawrence, Katherine McDonough, Giorgia Tolfo, Daniel CS Wilson and Barbara McGillivray
This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text. In particular, this work is focused on atypical animacy and examines the scenario in which typically inanimate objects, specifically machines, are given animate attributes. To address it, we have created the first dataset for atypical animacy detection, based on nineteenth-century sentences in English, with machines represented as either animate or inanimate. Our method builds upon recent innovations in language modeling, specifically BERT contextualized word embeddings, to better capture fine-grained contextual properties of words. We present a fully unsupervised pipeline, which can be easily adapted to different contexts, and report its performance on an established animacy dataset and our newly introduced resource. We show that our method provides a substantially more accurate characterization of atypical animacy, especially when applied to highly complex forms of language use.
Localness Matters: The Evolved Cross-Attention for Non-Autoregressive Translation
Liang Ding, Di Wu, Longyue Wang, Dacheng Tao and Zhaopeng Tu
Non-autoregressive translation (NAT) significantly accelerates the inference process by predicting the entire target sequence. However, due to the lack of autoregressive factorization, it is difficult for the decoder to adequately capture the source contexts. To address this problem, we propose a novel evolved cross-attention for the NAT decoder that models local and global attention simultaneously. Experimental results on Romanian-English, English-German and Chinese-English translation tasks demonstrate that our approach significantly and consistently improves translation quality over strong NAT baselines. Encouragingly, the proposed model outperforms its autoregressive counterpart, the Transformer.
Logic-guided Semantic Representation Learning for Zero-Shot Relation Classification
Juan Li, Ruoxu Wang, Ningyu Zhang, Wen Zhang, Fan Yang and Huajun Chen
Relation classification aims to extract semantic relations between entity pairs from sentences. However, most existing methods can only identify seen relation classes that occurred during training. To recognize unseen relations at test time, we explore the problem of zero-shot relation classification. Previous work regards the problem as reading comprehension or textual entailment, which has to rely on artificial descriptive information to improve the understandability of relation types. Thus, rich semantic knowledge of the relation labels is ignored. In this paper, we propose a novel logic-guided semantic representation learning model for zero-shot relation classification. Our approach builds connections between seen and unseen relations via implicit and explicit semantic representations with knowledge graph embeddings and logic rules. Extensive experimental results demonstrate that our method can generalize to unseen relation types and achieve promising improvements.
Lost in Back-Translation: Emotion Preservation in Neural Machine Translation
Enrica Troiano, Roman Klinger and Sebastian Padó
Machine translation provides powerful methods to convert text between languages, and is therefore a technology enabling a multilingual world. An important part of communication, however, takes place at the non-propositional level (e.g., politeness, formality, emotions), and it is far from clear whether current MT methods properly translate this information.
Making the Best Use of Review Summary for Sentiment Analysis
Sen Yang, Leyang Cui, Jun Xie and Yue Zhang
Sentiment analysis provides a useful overview of customer review contents. Many review websites allow a user to enter a summary in addition to a full review. Intuitively, summary information may give additional benefit for review sentiment analysis. In this paper, we conduct a study to exploit methods for better use of summary information. We start by finding that the sentiment signal distribution of a review and that of its corresponding summary are in fact complementary to each other. We thus explore various architectures to better guide the interactions between the two and propose a hierarchically-refined review-centric attention model. Empirical results show that our review-centric model can make better use of user-written summaries for review sentiment analysis, and is also more effective compared to existing methods when the user summary is replaced with a summary generated by an automatic summarization system.
Mama/Papa, Is this Text for Me?
Rashedur Rahman, Gwénolé Lecorvé, Aline Étienne, Delphine Battistelli, Nicolas Béchet and Jonathan Chevelu
Children have weaker linguistic skills than adults, which makes it more difficult for them to understand some texts, for instance when browsing the Internet. In this context, we present a novel method which predicts the minimal age from which a text can be understood. This method analyses each sentence of a text using a recurrent neural network, and then aggregates this information to provide the text-level prediction. Different approaches are proposed and compared to baseline models, at the sentence and text levels. Experiments are carried out on a corpus of 1,500 texts and 160K sentences. Our best model, based on LSTMs, outperforms state-of-the-art results and achieves mean absolute errors of 1.86 and 2.28 at the sentence and text levels, respectively.
Manifold Learning-based Word Representation Refinement Incorporating Global and Local Information
Wenyu Zhao, Dong Zhou, LIN LI and Jinjun Chen
Recent studies show that word embedding models often underestimate similarities between similar words and overestimate similarities between distant words. This makes the word similarity results obtained from embedding models inconsistent with human judgment. Manifold learning-based methods are widely utilized to refine word representations by re-embedding word vectors from the original embedding space to a new refined semantic space. These methods mainly focus on preserving local geometry information by performing weighted locally linear combination between words and their neighbors twice for word representation refinement. However, the reconstruction weights are easily influenced and the whole combination process is time-consuming with high computational cost. In this paper, we propose two novel word representation refinement methods leveraging isometric feature mapping and local tangent space, respectively. Unlike previous methods, our first method corrects pre-trained word embeddings by preserving the global geometry information of all words instead of the local geometry information between words and their neighbors. Our second method refines word representations by aligning the original and refined embedding spaces based on the local tangent space instead of performing weighted locally linear combination twice. Results obtained from standard semantic relatedness and semantic similarity tasks show that our methods outperform various state-of-the-art baselines for word representation refinement.
Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis
Olga Majewska, Ivan Vulić, Diana McCarthy and Anna Korhonen
We present the first evaluation of the applicability of a spatial arrangement method (SpAM) to a typologically diverse language sample, and its potential to produce semantic evaluation resources to support multilingual NLP, with a focus on verb semantics. We demonstrate SpAM’s utility in allowing for quick bottom-up creation of large-scale evaluation datasets that balance cross-lingual alignment with language specificity. Starting from a shared sample of 825 English verbs, translated into Chinese, Japanese, Finnish, Polish, and Italian, we apply a two-phase annotation process which produces (i) semantic verb classes and (ii) fine-grained similarity scores for nearly 130 thousand verb pairs. We use the two types of verb data to (a) examine cross-lingual similarities and variation, and (b) evaluate the capacity of static and contextualised representation models to accurately reflect verb semantics, contrasting the performance of large language specific pretraining models with their multilingual equivalent on semantic clustering and lexical similarity, across different domains of verb meaning. We release the data from both phases as a large-scale multilingual resource, comprising 85 verb classes and nearly 130k pairwise similarity scores, offering a wealth of possibilities for further evaluation and research on multilingual verb semantics.
ManyEnt: A Dataset for Few-shot Entity Classification
Markus Eberts, Kevin Pech and Adrian Ulges
We introduce ManyEnt, a benchmark for entity classification models in few-shot scenarios. ManyEnt offers a rich typeset, with a fine-grain variant featuring 256 entity types and a coarse-grain one with 53 entity types. Both versions have been derived from the Wikidata knowledge graph in a semi-automatic fashion. We also report results for two baselines using BERT, reaching up to 70.68% accuracy (10-way 1-shot).
Mark-Evaluate: Assessing Language Generation using Population Estimation Methods
Gonçalo Mordido and Christoph Meinel
We propose a family of metrics to assess language generation derived from population estimation methods widely used in ecology. More specifically, we use mark-recapture and maximum-likelihood methods that have been applied over the past several decades to estimate the size of closed populations in the wild. We propose three novel metrics: ME$
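For background, mark-recapture metrics of this kind build on the classical Lincoln-Petersen estimator, which infers a closed population size from two capture rounds: with $n_1$ items marked in the first capture, $n_2$ items drawn in the second, and $m$ of those already marked, the estimate is $\hat{N} = n_1 n_2 / m$. How generated and reference samples are mapped onto captures is specific to the proposed ME metrics and not reproduced here.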
Measuring Correlation-to-Causation Exaggeration in Press Releases
Bei Yu, Jun Wang, Lu Guo and Yingya Li
Press releases have an increasingly strong influence on media coverage of health research; however, they have been found to contain seriously exaggerated claims that can misinform the public and undermine public trust in science. In this study we propose an NLP approach to identify exaggerated causal claims made in health press releases that report on observational studies, which are designed to establish correlational findings, but are often exaggerated as causal. We developed a new corpus and trained models that can identify causal claims in the main statements in a press release. By comparing the claims made in a press release with the corresponding claims in the original research paper, we found that 22% of press releases made exaggerated causal claims from correlational findings in observational studies. Furthermore, universities exaggerated more often than journal publishers by a ratio of 1.5 to 1. Encouragingly, the exaggeration rate has slightly decreased over the past 10 years, despite the increase of the total number of press releases. More research is needed to understand the cause of the decreasing pattern.
Medical Knowledge-enriched Textual Entailment Framework
Shweta Yadav, Vishal Pallagani and Amit Sheth
One of the cardinal tasks in achieving robust medical question answering systems is textual entailment. The existing approaches make use of an ensemble of pre-trained language models or data augmentation, often to clock higher numbers on the validation metrics. However, two major shortcomings impede higher success in identifying entailment: (1) understanding the focus/intent of the question and (2) the ability to utilize real-world background knowledge to capture the context beyond the sentence. In this paper, we present a novel Medical Knowledge-Enriched Textual Entailment framework that allows the model to acquire a semantic and global representation of the input medical text with the help of a relevant domain-specific knowledge graph. We evaluate our framework on the benchmark MEDIQA-RQE dataset and show that the use of a knowledge-enriched dual-encoding mechanism helps in achieving an absolute improvement of 8.27% over SOTA language models.
MedWriter: Knowledge-Aware Medical Text Generation
Youcheng Pan, Qingcai Chen, Weihua Peng, Xiaolong Wang, Baotian Hu, Xin Liu, Junying Chen and Wenxiu Zhou
Exploiting domain knowledge to guarantee the correctness of generated text has been a hot topic in recent years, especially for highly professional domains such as medicine. However, most recent works only consider the information of unstructured text rather than the structured information of a knowledge graph. In this paper, we focus on the medical topic-to-text generation task and adapt a knowledge-aware text generation model to the medical domain, named MedWriter, which not only introduces specific knowledge from an external MKG but is also capable of learning graph-level representations. We conduct experiments on a medical literature dataset collected from medical journals, each instance of which has a set of topic words, an abstract of medical literature and a corresponding knowledge graph from CMeKG. Experimental results demonstrate that incorporating a knowledge graph into the generation model can improve the quality of the generated text and yields robust superiority over competitor methods.
Meet Changes with Constancy: Learning Invariance in Multi-Source Translation
Jianfeng Liu, Ling Luo, Xiang Ao, Yan Song, Haoran Xu and Jian Ye
Multi-source neural machine translation aims to translate from parallel sources of information (e.g. languages, images, etc.) to a single target language, and has shown better performance than most one-to-one systems. Despite the remarkable success of existing models, they usually neglect the fact that multiple source inputs may have inconsistencies. Such differences might bring noise to the task and limit the performance of existing multi-source NMT approaches due to their indiscriminate usage of input sources for target word predictions. In this paper, we attempt to take advantage of the potential complementary information among distinct sources and alleviate the incidental conflicts among them. To accomplish that, we propose a source invariance network to learn the invariant information of parallel sources. Such a network can be easily integrated with multi-encoder based multi-source NMT methods (e.g. multi-encoder RNN and Transformer) to enhance the translation results. Extensive experiments on two multi-source translation tasks demonstrate that the proposed approach not only achieves clear gains in translation quality but also captures implicit invariance between different sources.
MEISD: A Multimodal Multi-Label Emotion, Intensity and Sentiment Dialogue Dataset for Emotion Recognition and Sentiment Analysis in Conversations
Mauajama Firdaus, Hardik Chauhan, Asif Ekbal and Pushpak Bhattacharyya
Emotion and sentiment classification in dialogues is a challenging task that has gained popularity in recent times. Humans tend to have multiple emotions with varying intensities while expressing their thoughts and feelings. Emotions in an utterance of dialogue can either be independent of or dependent on the previous utterances, thus making the task complex and interesting. Multi-label emotion detection in conversations is a significant task that enables a system to understand the various emotions of the interacting users. Sentiment analysis in dialogue/conversation, on the other hand, helps in understanding the perspective of the user with respect to the ongoing conversation. Along with text, additional information in the form of audio and video assists in identifying the correct emotions with the appropriate intensity and sentiments in an utterance of a dialogue. Lately, quite a few datasets have been made available for dialogue emotion and sentiment classification, but these datasets are imbalanced in representing different emotions and label only a single emotion. Hence, we first present a large-scale balanced Multimodal Multi-label Emotion, Intensity, and Sentiment Dialogue dataset (MEISD), collected from different TV series, with textual, audio and visual features, and then establish a baseline setup for further research.
Meta-Information Guided Meta-Learning for Few-Shot Relation Classification
Bowen Dong, Yuan Yao, Ruobing Xie, Tianyu Gao, Xu Han, Zhiyuan Liu, Fen Lin, Leyu Lin and Maosong Sun
Few-shot classification requires classifiers to adapt to new classes with only a few training instances. State-of-the-art meta-learning approaches such as MAML learn how to initialize and fast adapt parameters from limited instances, which have shown promising results in few-shot classification. However, existing meta-learning models solely rely on implicit instance-based statistics, and thus suffer from instance unreliability and weak interpretability. To solve this problem, we propose a novel meta-information guided meta-learning (MIML) framework, where semantic concepts of classes provide strong guidance for meta-learning in both initialization and adaptation. In effect, our model can establish connections between instance-based information and semantic-based information, which enables more effective initialization and faster adaptation. Comprehensive experimental results on few-shot relation classification demonstrate the effectiveness of the proposed framework. Notably, MIML achieves comparable or superior performance to humans with only one shot on FewRel evaluation. The source code will be released to facilitate future research.
METNet: A Mutual Enhanced Transformation Network for Aspect-based Sentiment Analysis
Bin Jiang, Jing Hou, Wanyue Zhou, Chao Yang, Shihan Wang and Liang Pang
Aspect-based sentiment analysis (ABSA) aims to determine the sentiment polarity of each specific aspect in a given sentence. Existing research has recognized the importance of the aspect for the ABSA task and has derived many interactive learning methods that model context based on the specific aspect. However, current interaction mechanisms do not work well in learning complex sentences with multiple aspects, and these methods underestimate the representation learning of the aspect. In order to solve these two problems, we propose a mutual enhanced transformation network (METNet) for aspect-based sentiment analysis. First, the aspect enhancement module in METNet improves the representation learning of the aspect with the semantic features extracted from the context, giving the aspect richer information. Second, METNet designs and implements a hierarchical structure to enhance the representations of aspect and context iteratively so as to achieve better sentiment classification accuracy. Experimental results on the SemEval 2014 datasets demonstrate the effectiveness of METNet, and we further show that METNet is outstanding in multi-aspect scenarios.
Metrics also Disagree in the Low Scoring Range: Revisiting Summarization Evaluation Metrics
Manik Bhandari, Pranav Narayan Gour, Atabak Ashfaq and Pengfei Liu
In text summarization, evaluating the efficacy of automatic metrics without human judgments has recently become popular. One exemplar work (Peyrard, 2019) concludes that automatic metrics strongly disagree when ranking high-scoring summaries. In this paper, we revisit their experiments and find that their observations stem from the fact that metrics disagree when ranking summaries from any narrow scoring range. We hypothesize that this may be because summaries are similar to each other in a narrow scoring range and are thus difficult to rank. Apart from the width of the scoring range of summaries, we analyze three other properties that impact inter-metric agreement: Ease of Summarization, Abstractiveness, and Coverage.
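A minimal sketch of that kind of range-restricted analysis (synthetic scores and a hypothetical pair of metrics, not the paper's data): compare Kendall's tau between two metrics over all summaries and over a narrow band of scores.

import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
metric_a = rng.uniform(0.0, 1.0, size=1000)                # placeholder scores for metric A
metric_b = metric_a + rng.normal(0.0, 0.05, size=1000)     # placeholder metric B, correlated overall

full_tau, _ = kendalltau(metric_a, metric_b)
band = (metric_a > 0.45) & (metric_a < 0.55)               # restrict to a narrow scoring range
band_tau, _ = kendalltau(metric_a[band], metric_b[band])
print(f"Kendall tau overall: {full_tau:.2f}, within narrow band: {band_tau:.2f}")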
Mining Crowdsourcing Problems from Discussion Forums of Workers
Zahra Nouri, Henning Wachsmuth and Gregor Engels
Crowdsourcing is used in academia and industry to solve tasks that are easy for humans but hard for computers, in natural language processing mostly to annotate data. The quality of annotations is affected by problems in the task design, task operation, and task evaluation that workers face with requesters in crowdsourcing processes. To learn about the major problems, we provide a short but comprehensive survey based on two complementary studies: (1) a literature review where we collect and organize problems known from interviews with workers, and (2) an empirical data analysis where we use topic modeling to mine workers’ complaints from a new English corpus of workers’ forum discussions. While literature covers all process phases, problems in the task evaluation are prevalent, including unfair rejections, late payments, and unjustified blockings of workers. According to the data, however, poor task design in terms of malfunctioning environments, bad workload estimation, and privacy violations seems to bother the workers most. Our findings form the basis for future research on how to improve crowdsourcing processes.
Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks
Lichao Sun, Congying Xia, Wenpeng Yin, Tingting Liang, Philip Yu and Lifang He
Mixup (Zhang et al., 2017) is a recent data augmentation technique that linearly interpolates input examples and the corresponding labels. It has shown strong effectiveness in image classification by interpolating images at the pixel level. Inspired by this line of research, in this paper we explore i) how to apply mixup to natural language processing tasks, since text data can hardly be mixed in the raw format; and ii) whether mixup is still effective in transformer-based learning models, e.g., BERT. To achieve this goal, we incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks while keeping the whole end-to-end training system. We evaluate the proposed framework by running extensive experiments on the GLUE benchmark. Furthermore, we also examine the performance of mixup-transformer in low-resource scenarios by reducing the training data by a certain ratio. Our studies show that mixup is a domain-independent data augmentation technique for pre-trained language models, resulting in significant performance improvements for transformer-based models.
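A minimal sketch of mixup applied to pooled transformer representations (the exact layer at which mixup-transformer interpolates may differ; the hidden size, classifier head and label count are placeholders): two pooled vectors and their one-hot labels are linearly interpolated before the classification loss.

import torch
import torch.nn.functional as F

def mixup_loss(pooled_a, pooled_b, y_a, y_b, classifier, alpha=0.4, num_classes=2):
    """pooled_a/pooled_b: [batch, hidden] pooled encoder outputs; y_a/y_b: [batch] integer labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample()        # interpolation coefficient
    mixed = lam * pooled_a + (1.0 - lam) * pooled_b              # interpolate representations
    soft = lam * F.one_hot(y_a, num_classes).float() \
         + (1.0 - lam) * F.one_hot(y_b, num_classes).float()     # interpolate labels
    logits = classifier(mixed)
    return -(soft * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

# Placeholder usage with random pooled vectors and a linear classification head.
clf = torch.nn.Linear(768, 2)
loss = mixup_loss(torch.randn(8, 768), torch.randn(8, 768),
                  torch.randint(0, 2, (8,)), torch.randint(0, 2, (8,)), clf)
loss.backward()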
Modality Enriched Neural Network for Metaphor Detection
Mingyu WAN and Baixi Xing
Metaphors are prevalent in everyday language and play a significant role in helping people understand complex concepts. Detecting metaphors is challenging due to the subtle ontological differences between metaphorical and non-metaphorical expressions. Neural networks have been widely adopted for metaphor detection and have become the mainstream technology over the past few decades. Surprisingly, for such a concept-stimulated phenomenon, linguistic insights have been less utilized. In this work, we propose a linguistically enhanced model for metaphor detection, extending a published work (Wan et al., 2020) at the ACL FigLang 2020 workshop by incorporating modality norms into an attention-based Bi-LSTM. Results show that our method outperforms most closely related works by a clear margin (0.5%-11% F1 gain) and comes within reach (a 4% F1 discrepancy) of the top-ranked work on record (Su et al., 2020). The current experiment further attests to the effectiveness of using modality norms for metaphor detection, echoing the hypothesis that metaphors usually involve a modality shift. This work provides a new perspective on the introspection of metaphors and also improves the task of metaphor detection in a consistent way.
Modeling Event Salience in Narratives via Barthes’ Cardinal Functions
Takaki Otake, Sho Yokoi, Naoya Inoue, Ryo Takahashi, Tatsuki Kuribayashi and Kentaro Inui
Events in a narrative differ in salience: some events are more important for the story than others. Estimating event salience is useful for tasks such as story generation, and as a tool for text analysis in narratology and folkloristics. To compute event salience without needing any annotations, we adopt Barthes’ definition of event salience and propose several unsupervised methods that only require a pretrained language model. Evaluating on folktales with event salience annotation, we show that the proposed methods outperform baselines. In addition, we reveal that fine-tuning a language model in a transductive setting is a key factor to improve the proposed methods.
Modeling Evolution of Message Interaction for Rumor Resolution
Zhongyu Wei, Qi ZHANG and Xuanjing Huang
Previous work on rumor resolution concentrates on exploiting time-series characteristics or modeling topology structure separately. However, how local interaction patterns affect global information assemblage has not been explored. In this paper, we attempt to address the problem by learning the evolution of message interaction. We model confrontation and reciprocity between message pairs via discrete variational autoencoders, which effectively reflects the diversified interactivity of opinions. Moreover, we capture the variation of message interaction using a hierarchical framework to better integrate the information flow of a rumor cascade. Experiments on the PHEME dataset demonstrate that our proposed model achieves higher accuracy than existing methods.
Modeling language evolution and feature dynamics in a realistic geographic environment
Rhea Kapur and Phillip Rogers
Recent, innovative efforts to understand the uneven distribution of languages and linguistic feature values in time and space attest to both the challenge these issues pose and the value in solving them. In this paper, we introduce a model for simulating languages and their features over time in a realistic geographic environment. At its core is a model of language phylogeny and migration whose parameters are chosen to reproduce known language family sizes and geographic dispersions. This foundation in turn is used to explore the dynamics of linguistic features. Languages are assigned feature values that can change randomly or under the influence of nearby languages according to predetermined probabilities. We assess the effects of these settings on resulting geographic and genealogical patterns using homogeneity measures defined in the literature. The resulting model is both flexible and realistic, and it can be employed to answer a wide range of related questions.
Modeling Local Contexts for Joint Dialogue Act Recognition and Sentiment Classification with Bi-channel Dynamic Convolutions
Jingye Li, Hao Fei and Donghong Ji
In this paper, we target improving the joint dialogue act recognition (DAR) and sentiment classification (SC) tasks by fully modeling the local contexts of utterances. First, we employ the dynamic convolution network (DCN) as the utterance encoder to capture the dialogue contexts. Further, we propose a novel context-aware dynamic convolution network (CDCN) to better leverage the local contexts when dynamically generating kernels. We extend our frameworks into bi-channel versions (i.e., BDCN and BCDCN) under multi-task learning to achieve joint DAR and SC. The two channels can learn their own feature representations for DAR and SC, respectively, but with latent interaction. Besides, we suggest enhancing the tasks by employing the DiaBERT language model. Our frameworks obtain state-of-the-art performances against all baselines on two benchmark datasets, demonstrating the importance of modeling the local contexts.
Modelling Long-distance Node Relations for KBQA with Global Dynamic Graph
Xu Wang, Shuai Zhao, Jiale Han, Bo Cheng, Hao Yang, Jianchang Ao and Zhenzi Li
The structural information of Knowledge Bases (KBs) has proven effective for Question Answering (QA). Previous studies rely on deep graph neural networks (GNNs) to capture rich structural information, which may fail to model node relations over particularly long distances due to the over-smoothing issue. To address this challenge, we propose a novel framework, GlobalGraph, which models long-distance node relations from two views: 1) Node type similarity: GlobalGraph assigns each node a global type label and models long-distance node relations through global type label similarity; 2) Correlation between nodes and questions: we learn similarity scores between nodes and the question, and model long-distance node relations through the summed score of two nodes. We conduct extensive experiments on two widely used multi-hop KBQA datasets to prove the effectiveness of our method.
Molweni: A Challenge Multiparty Dialogues-based Machine Reading Comprehension Dataset with Discourse Structure
Jiaqi Li, Ming Liu, Min-Yen Kan, Zihao Zheng, Zekun Wang, Wenqiang Lei, Ting Liu and Bing Qin
Research into the area of multiparty dialogue has grown considerably over recent years. In this paper, we present the Molweni dataset, a machine reading comprehension (MRC) dataset with discourse structure built over multiparty dialogues. Molweni's source is sampled from the Ubuntu Chat Corpus and includes 10,000 dialogues comprising 88,303 utterances. We annotate 32,700 questions on this corpus, including both answerable and unanswerable questions. Molweni also uniquely contributes discourse dependency annotations for its multiparty dialogues, contributing large-scale data (78,246 annotated discourse relations) to bear on the task of multiparty dialogue discourse parsing. Our experiments show that Molweni is a challenging dataset for current MRC models; BERT-wwm, a current, strong SQuAD 2.0 performer, achieves only 67.7% F1 on Molweni's questions, a significant drop of more than 20% compared with its SQuAD 2.0 performance.
Monolingual and Multilingual Reduction of Gender Bias in Contextualized Representations
Sheng Liang, Philipp Dufter and Hinrich Schütze
Pretrained language models (PLMs) learn stereotypes held by humans and reflected in the text of their training corpora, including gender bias. When PLMs are used for downstream tasks such as picking candidates for a job, people's lives can be negatively affected by these learned stereotypes. Prior work usually identifies a linear gender subspace and removes gender information by eliminating this subspace. Following this line of work, we propose to use DensRay, an analytical method for obtaining interpretable dense subspaces. We show that DensRay performs on par with prior approaches, but we argue that it is more robust and show that it better preserves language model performance. By applying DensRay to attention heads and layers of BERT, we show that gender information is spread across all attention heads and most of the layers. We also show that DensRay can obtain gender bias scores at both the token and sentence level. Finally, we demonstrate that we can remove bias multilingually, e.g., from Chinese, using only English training data.
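For context, the subspace-elimination step referenced above is, in its simplest linear form, a projection removal: given a unit gender direction $g$, a debiased vector is $v' = v - (v^\top g)\,g$, and for a subspace with orthonormal basis $G$, $v' = v - G G^\top v$. How DensRay derives the direction analytically is specific to that method and not shown here.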
Morph Completion for Morphologically Rich Languages
William Lane and Steven Bird
Technologies for supporting text input in low-resource morphologically-rich languages can be used to promote literacy, facilitate content authoring, and enable language learning applications. We present novel methods for supporting text input in Kunwinjku, a polysynthetic Indigenous Australian language. We use a finite state model to generate morph-based text completions, incorporating spelling variation. We demonstrate portability by applying our method to a finite state model of Turkish morphology. We discuss the challenges to conventional word-based autocomplete posed by morphologically-rich languages, and examine the relative magnitude of possible morph-based completions versus full-word completions in Kunwinjku. We deploy these tools as web services to facilitate the development of language technology applications, and work with native speakers to solicit feedback and improve the method’s orthographic flexibility.
Morphological disambiguation from stemming data
Antoine Nzeyimana
Morphological analysis and disambiguation is an important task and a crucial preprocessing step in natural language processing of morphologically rich languages. Kinyarwanda, a morphologically rich language, currently lacks any tools for automated morphological analysis. While linguistically curated finite state tools can be easily developed for morphological analysis, the morphological richness of the language allows many ambiguous analyses to be produced, requiring effective disambiguation. In this paper, we propose learning to morphologically disambiguate Kinyarwanda verbal forms from a new stemming dataset collected through crowd-sourcing. Using feature engineering and a feedforward neural network-based classifier, we achieve about 89% non-contextualized disambiguation accuracy. Our experiments reveal that inflectional properties of stems and morpheme association rules are the most discriminative features for disambiguation.
Morphologically Aware Word-Level Translation
Paula Czarnowska, Sebastian Ruder, Ryan Cotterell and Ann Copestake
We propose a novel morphologically aware probability model for bilingual lexicon induction, which jointly models lexeme translation and inflectional morphology in a structured way. Our model exploits the basic linguistic intuition that the lexeme is the key lexical unit of meaning, while inflectional morphology provides additional syntactic information. This approach leads to substantial performance improvements: a 19% average improvement in accuracy across 6 language pairs over the state of the art in the supervised setting, and 16% in the weakly supervised setting. As another contribution, we highlight issues associated with modern BLI that stem from ignoring inflectional morphology, and propose three suggestions for improving the task.
Multi-choice Relational Reasoning for Machine Reading Comprehension
Wuya Chen, Xiaojun Quan, Chunyu Kit, Zhengcheng Min and Jiahai Wang
This paper presents our study of cloze-style reading comprehension by imitating human reading comprehension, which normally involves tactical comparing and reasoning over candidates while choosing the best answer. We propose a multi-choice relational reasoning (McR^2) model with the aim of enabling relational reasoning over candidates based on fused representations of the document, query and candidates. For the fused representations, we develop an efficient encoding architecture by integrating the schemes of bidirectional attention flow, self-attention and document-gated query reading. Then, comparing and inferring over candidates are carried out by a novel relational reasoning network. We conduct extensive experiments on four datasets derived from two public corpora, Children's Book Test and Who Did What, to verify the validity and advantages of our model. The results show that it outperforms all baseline models significantly on the four benchmark datasets. The effectiveness of its key components is also validated by an ablation study.
Multi-grained Chinese Word Segmentation with Weakly Labeled Data
Chen Gong, Zhenghua Li, Bowei Zou and Min Zhang
Previous work trains and tunes multi-grained Chinese word segmentation (MWS) models only on automatically generated pseudo MWS data, due to the lack of manually annotated MWS data. In this work, we further take advantage of the rich word boundary information in existing single-grained word segmentation (SWS) data and in naturally annotated data from dictionary example (DictEx) sentences, to advance the state-of-the-art MWS model based on the idea of weak supervision. In particular, we propose to accommodate two types of weakly labeled data for MWS, i.e., SWS data and DictEx data, by employing a simple yet competitive graph-based parser with a local loss. Besides, we manually annotate a high-quality MWS dataset according to our newly compiled annotation guideline, consisting of over 9,000 sentences from two types of texts, i.e., canonical newswire (NEWS) and non-canonical web (BAIKE) data, for better evaluation. Detailed evaluation shows that our proposed model with weakly labeled data significantly outperforms the state-of-the-art MWS model by 1.12 and 5.97 F1 points on NEWS and BAIKE data respectively, coupled with several interesting findings made possible by the availability of our manually annotated high-quality MWS evaluation data.
Multi-label Fine-grained Sexism Classification using Semi-supervised Multi-task Learning
Harika Abburi, Pulkit Parikh, Niyati Chhaya and Vasudeva Varma
Sexism, a pervasive form of oppression, causes profound suffering through various manifestations. Given the rising number of experiences of sexism reported online, categorizing these recollections automatically can aid the fight against sexism, as it can facilitate effective analyses by gender studies researchers and government officials involved in policy making. In this paper, we explore the fine-grained, multi-label classification of accounts (reports) of sexism. To the best of our knowledge, we consider substantially more categories of sexism than any published work through our 23-class problem formulation. Moreover, we propose a multi-task approach for fine-grained multi-label sexism classification that leverages several supporting tasks without incurring any manual labeling cost. Unlabeled accounts of sexism are utilized through unsupervised learning to help construct our multi-task setup. We also devise objective functions that exploit label correlations in the training data explicitly. Multiple proposed methods outperform the state-of-the-art for multi-label sexism classification on a recently released dataset across five standard metrics.
Multi-level Alignment Pretraining for Multi-lingual Semantic Parsing
Bo Shao, Yeyun Gong, Weizhen Qi, Nan Duan and Xiaola Lin
In this paper, we present a multi-level alignment pretraining method in a unified architecture for multi-lingual semantic parsing. In this architecture, we use an adversarial training method to align the spaces of different languages, and use sentence-level and word-level parallel corpora as supervision information to align the semantics of different languages. Finally, we jointly train the multi-level alignment and semantic parsing tasks. We conduct experiments on a publicly available multi-lingual semantic parsing dataset, ATIS, and a newly constructed dataset. Experimental results show that our model outperforms state-of-the-art methods on both datasets.
Multi-Task Learning for Knowledge Graph Completion with Pre-trained Language Models
Bosung Kim, Taesuk Hong, Youngjoong Ko and Jungyun Seo
As research on utilizing human knowledge in natural language processing has attracted considerable attention in recent years, knowledge graph (KG) completion has come into the spotlight. Recently, a new knowledge graph completion method using a pre-trained language model, such as KG-BERT, is presented and showed high performance. However, its scores in ranking metrics such as Hits@k are still behind state-of-the-art models. We claim that there are two main reasons: 1) failure in sufficiently learning relational information in knowledge graphs, and 2) difficulty in picking out the correct answer from lexically similar candidates. In this paper, we propose an effective multi-task learning method to overcome the limitations of previous works. By combining relation prediction and relevance ranking tasks with our target link prediction, the proposed model can learn more relational properties in KGs and properly perform even when lexical similarity occurs. Experimental results show that we not only largely improve the ranking performances compared to KG-BERT but also achieve the state-of-the-art performances in Mean Rank and Hits@10 on the WN18RR dataset.
Multi-Word Lexical Simplification
Piotr Przybyła and Matthew Shardlow
In this work we propose the task of multi-word lexical simplification, in which a sentence in natural language is made easier to understand by replacing its fragment with a simpler alternative, both of which can consist of many words. In order to explore this new direction, we contribute a corpus (MWLS1), including 1462 sentences in English from various sources with 7059 simplifications provided by human annotators. We also propose an automatic solution (Plainifier) based on a purpose-trained neural language model and evaluate its performance, comparing to human and resource-based baselines.
Multilingual Epidemiological Text Classification: A Comparative Study
Stephen Mutuvi, Emanuela Boros, Antoine Doucet, Adam Jatowt, Gaël Lejeune and Moses Odeo
In this paper, we approach the multilingual text classification task in the context of the epidemiological field. Multilingual text classification models tend to perform differently across different languages (low- or high-resourced), particularly when the dataset is highly imbalanced, which is the case for epidemiological datasets. We conduct a comparative study of different machine learning and deep learning text classification models, using a dataset comprising news articles related to epidemic outbreaks from six languages, four low-resourced and two high-resourced, in order to analyze the influence of the nature of the language, the structure of the document, and the size of the data. Our findings indicate that the performance of models based on fine-tuned language models exceeds that of the chosen baseline models, which include a specialized epidemiological news surveillance system and several machine learning models, by more than 50%. Also, low-resource languages are highly influenced not only by the typology of the languages on which the models have been pre-trained and/or fine-tuned, but also by their size. Furthermore, we discover that the beginning and the end of documents provide the most salient features for this task and, as expected, the performance of the models was proportionate to the training data size.
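As a small hedged sketch of how the "beginning and end of documents" observation is commonly exploited when feeding long documents to a length-limited encoder (the function name and split sizes are placeholders, not the paper's setup):

def head_tail_truncate(tokens, max_len=512, head_len=128):
    """Keep the first `head_len` tokens and the last `max_len - head_len` tokens of a document."""
    if len(tokens) <= max_len:
        return tokens
    return tokens[:head_len] + tokens[-(max_len - head_len):]

# Placeholder usage on an already-tokenized document.
doc = [f"tok{i}" for i in range(2000)]
truncated = head_tail_truncate(doc)
assert len(truncated) == 512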
Multilingual Irony Detection with Dependency Syntax and Neural Models
Alessandra Teresa Cignarella, Valerio Basile, Manuela Sanguinetti, Cristina Bosco, Farah Benamara and Paolo Rosso
This paper presents an in-depth investigation of the effectiveness of dependency-based syntactic features on the irony detection task in a multilingual perspective (English, Spanish, French and Italian). It focuses on the contribution that can arise from syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme. Three distinct experimental settings are provided. In the first, a variety of syntactic dependency-based features combined with classical machine learning classifiers are explored. In the second scenario two well-known types of word-embeddings are trained on parsed data and tested against gold standard datasets. In the third setting, dependency-based syntactic features are combined into the Multilingual BERT architecture. Results suggest that fine-grained dependency-based syntactic information could be helpful for the detection of irony.
Multilingual Neural RST Discourse Parsing
Zhengyuan Liu, Ke Shi and Nancy Chen
Text discourse parsing plays an important role in understanding information flow and argumentative structure in natural language. Previous research under Rhetorical Structure Theory (RST) has mostly focused on inducing and evaluating models from the English treebank. However, the parsing tasks for other languages such as German and Portuguese are still challenging due to the shortage of annotated data. In this work, we investigate two approaches to establish a neural, cross-lingual discourse parser via: (1) multilingual vector representations, and (2) segment-level translation of the source content. Experimental results show that both methods achieve state-of-the-art performance on cross-lingual, document-level discourse parsing on all sub-tasks.
Multimodal Review Generation with Privacy and Fairness Awareness
Xuan-Son Vu, Thanh-Son Nguyen, Duc-Trong Le and Lili Jiang
Users express their opinions towards entities (e.g., restaurants) via online reviews, which can come in diverse forms such as text, ratings and images. Modeling reviews is advantageous for user behavior understanding which, in turn, supports various user-oriented tasks such as recommendation, sentiment analysis, and review generation. In this paper, we propose MG-PriFair, a multimodal neural-based framework, which generates personalized reviews with privacy and fairness awareness. Motivated by the fact that reviews might contain personal information and sentiment bias, we propose a novel differentially private (dp)-embedding model for training privacy-guaranteed embeddings and an evaluation approach for sentiment fairness in the food-review domain. Experiments on our novel review dataset show that MG-PriFair is capable of generating plausibly long reviews while controlling the amount of exploited user data and using the least sentiment-biased word embeddings. To the best of our knowledge, we are the first to bring user privacy and sentiment fairness into the review generation task. The dataset and source code will be made available once the paper is published.
Multimodal Sentence Summarization via Multimodal Selective Encoding
Haoran Li, Junnan Zhu, Jiajun Zhang, Xiaodong He and Chengqing Zong
This paper studies the problem of generating a summary for a given sentence-image pair. Existing multimodal sequence-to-sequence approaches mainly focus on enhancing the decoder by visual signals, while ignoring that the image can improve the ability of the encoder to identify highlights of a news event or a document. Thus, we propose a multimodal selective gate network that considers reciprocal relationships between textual and multi-level visual features, including global image descriptor, activation grids, and object proposals, to select highlights of the event when encoding the source sentence. In addition, we introduce a modality regularization to encourage the summary to capture the highlights embedded in the image more accurately. To verify the generalization of our model, we adopt the multimodal selective gate to the text-based decoder and multimodal-based decoder. Experimental results on a public multimodal sentence summarization dataset demonstrate the advantage of our models over baselines. Further analysis suggests that our proposed multimodal selective gate network can effectively select important information in the input sentence.
Multimodal Topic-Enriched Auxiliary Learning for Depression Detection
Minghui An, Jingjing Wang, Shoushan Li and Guodong Zhou
From the perspective of health psychology, human beings exhibiting long-term and sustained negativity are highly likely to be diagnosed with depression. Inspired by this, we argue that the global topic information derived from user-generated contents (e.g., texts and images) is crucial to boost the performance of the depression detection task, though this information has been neglected by almost all previous studies on depression detection. To this end, we propose a new Multimodal Topic-enriched Auxiliary Learning (MTAL) approach, aiming at capturing the topic information inside different modalities (i.e., texts and images) for depression detection. In particular, a modality-agnostic topic model is proposed in our approach, capable of mining topical clues from either discrete textual signals or continuous visual signals. On this basis, the topic modeling tasks for the two modalities are cast as two auxiliary tasks for improving the performance of the primary task (i.e., depression detection). Finally, a detailed evaluation demonstrates the great advantage of our MTAL approach over the state-of-the-art depression detection baselines. This justifies the importance of multimodal topic information for depression detection and the effectiveness of our approach in capturing such information.
Multitask Easy-First Dependency Parsing: Exploiting Complementarities of Different Dependency Representations
Yash Kankanampati, Joseph Le Roux, Nadi Tomeh, Dima Taji and Nizar Habash
In this paper we present a parsing model for projective dependency trees which takes advantage of the availability of complementary dependency annotations, as is the case in Arabic with the CATiB and UD treebanks. Our system performs syntactic parsing according to both annotation types jointly, as a sequence of arc-creating operations, and partially created trees for one annotation are also available to the other as features for the score function.
Multitask Learning-Based Neural Bridging Reference Resolution
Juntao Yu and Massimo Poesio
We propose a multi-task learning-based neural model for resolving bridging references, tackling two key challenges. The first challenge is the lack of large corpora annotated with bridging references. To address this, we use multi-task learning to help bridging reference resolution with coreference resolution. We show that substantial improvements of up to 8 p.p. can be achieved on full bridging resolution with this architecture. The second challenge is the different definitions of bridging used in different corpora, meaning that hand-coded systems or systems using special features designed for one corpus do not work well with other corpora. Our neural model only uses a small number of corpus-independent features and can thus be applied to different corpora. Evaluations with very different bridging corpora (ARRAU, ISNOTES, BASHI and SCICORP) suggest that our architecture works equally well on all corpora, and achieves the SoTA results on full bridging resolution for all corpora, outperforming the best reported results by up to 34.9 p.p.
MZET: Memory Augmented Zero-Shot Fine-grained Named Entity Typing
Tao Zhang, Congying Xia, Chun-Ta Lu and Philip Yu
Named entity typing (NET) is a classification task of assigning an entity mention in context to given semantic types. However, with the growing size and granularity of entity types, few previous studies have addressed newly emerged entity types. In this paper, we propose MZET, a novel memory-augmented FNET (Fine-grained NET) model, to tackle unseen types in a zero-shot manner. MZET incorporates character-level, word-level, and contextual-level information to learn the entity mention representation. Besides, MZET incorporates the semantic meaning and the hierarchical structure into the entity type representation. Finally, through a memory component which models the relationship between the entity mention and the entity type, MZET transfers knowledge from seen entity types to the zero-shot ones. Extensive experiments on three public datasets show the superior performance obtained by MZET, which surpasses state-of-the-art FNET neural network models with up to an 8% gain in Micro-F1 and Macro-F1 score.
Named Entity Recognition for Chinese biomedical patents
Yuting Hu and Suzan Verberne
There is a large body of work on Biomedical Entity Recognition (Bio-NER) for English, but there have only been a few attempts at addressing NER for Chinese biomedical texts. Because of the growing number of Chinese biomedical discoveries being patented, and the lack of NER models for patent data, we train and evaluate BERT-based NER models for the analysis of Chinese biomedical patent data. By doing so, we show the value and potential of this domain-specific NER task. For the evaluation of our methods we built our own Chinese biomedical patent NER dataset, and our optimized model achieved an F1 score of 0.54±0.15. Further biomedical analysis indicates that our solution can help detect meaningful biomedical entities and novel gene-gene interactions with limited labeled data, training time and computing power.
Native-like Expression Identification by Contrasting Native and Proficient Second Language Speakers
Oleksandr Harust, Yugo Murawaki and Sadao Kurohashi
We propose a novel task of native-like expression identification by contrasting texts written by native and proficient second language speakers. This task is highly challenging mainly because 1) the combinatorial nature of expressions prevents us from choosing candidate expressions a priori and 2) the distributions of the two types of texts overlap considerably. Our solution to the first problem is to combine a powerful neural network-based classifier of sentence-level nativeness with an explainability method that measures an approximate contribution of a given expression to the classifier's prediction. To address the second problem, we introduce a special label neutral and reformulate the classification task as complementary-label learning. Our crowdsourcing-based evaluation and in-depth analysis suggest that our method successfully uncovers linguistically interesting usages distinctive of native speech.
Neural Approaches for Natural Language Interfaces to Databases: A Survey
Radu Cristian Alexandru Iacob, Florin Brad, Elena-Simona APOSTOL, Ciprian-Octavian Truică, Ionel Alexandru Hosu and Traian Rebedea
A natural language interface to databases (NLIDB) enables users without technical expertise to easily access information from relational databases. Interest in NLIDBs has resurged in the past years due to the availability of large datasets and improvements to neural sequence-to-sequence models. In this survey we focus on the key design decisions behind current state of the art neural approaches, which we group into encoder and decoder improvements. We highlight the three most important directions, namely linking question tokens to database schema elements (schema linking), better architectures for encoding the textual query taking into account the schema (schema encoding), and improved generation of structured queries using autoregressive neural models (grammar-based decoders). To foster future research, we also present an overview of the most important NLIDB datasets, together with a comparison of the top performing neural models and a short insight into recent non deep learning solutions.
Neural Automated Essay Scoring Incorporating Handcrafted Features
Masaki Uto, Yikuan Xie and Maomi Ueno
Automated essay scoring (AES) is the task of automatically assigning scores to essays as an alternative to grading by human raters. Conventional AES typically relies on handcrafted features, whereas recent studies have proposed AES models based on deep neural networks (DNNs) to obviate the need for feature engineering. Furthermore, hybrid methods that integrate handcrafted features into a DNN-AES model have recently been developed and have achieved state-of-the-art accuracy. One of the most popular hybrid methods is formulated as a DNN-AES model with an additional recurrent neural network (RNN) that processes a sequence of handcrafted sentence-level features. However, this method has the following problems: 1) It cannot incorporate effective essay-level features developed in previous AES research. 2) It greatly increases the number of model parameters and tuning parameters, increasing the difficulty of model training. 3) The additional RNN for processing sentence-level features makes extension to various DNN-AES models complex. To resolve these problems, we propose a new hybrid method that integrates handcrafted essay-level features into a DNN-AES model. Specifically, our method concatenates handcrafted essay-level features with a distributed essay representation vector, which is obtained from an intermediate layer of a DNN-AES model. Our method is a simple DNN-AES extension, but it significantly improves scoring accuracy.
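A minimal sketch of the feature-concatenation idea described above (layer sizes, the scoring head and the handcrafted feature set are placeholders, not the paper's exact architecture): essay-level handcrafted features are concatenated with the distributed essay vector before the final scoring layer.

import torch
import torch.nn as nn

class HybridScoringHead(nn.Module):
    """Scores an essay from a DNN essay vector concatenated with handcrafted essay-level features."""
    def __init__(self, essay_dim=768, handcrafted_dim=10):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(essay_dim + handcrafted_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
            nn.Sigmoid(),                      # score normalized to [0, 1]
        )

    def forward(self, essay_vec, handcrafted):
        # essay_vec: [batch, essay_dim] from an intermediate layer of a DNN-AES encoder;
        # handcrafted: [batch, handcrafted_dim] essay-level features (e.g., length, readability).
        return self.scorer(torch.cat([essay_vec, handcrafted], dim=-1)).squeeze(-1)

# Placeholder usage with random inputs for 4 essays.
head = HybridScoringHead()
scores = head(torch.randn(4, 768), torch.randn(4, 10))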
Neural Language Modeling for Named Entity Recognition
Zhihong Lei, Weiyue Wang, Christian Dugast and Hermann Ney
Named entity recognition is a key component in various natural language processing systems, and neural architectures provide significant improvements over conventional approaches. Regardless of different word embedding and hidden layer structures of the networks, a conditional random field layer is commonly used for the output. This work proposes to use a neural language model as an alternative to the conditional random field layer, which is more flexible for the size of the corpus. Experimental results show that the proposed system has a significant advantage in terms of training speed, with a marginal performance degradation.
Neural Machine Translation Models with Back-Translation for the Extremely Low-Resource Indigenous Language Bribri
Isaac Feldman and Rolando Coto-Solano
This paper presents a neural machine translation model and dataset for the Chibchan language Bribri, with a maximum performance of BLEU 19.8. This model was trained on an extremely small dataset (5923 Bribri-Spanish pairs), providing evidence for the applicability of NMT in extremely low-resource environments. We discuss the challenges entailed in managing training input from languages without standard orthographies, provide evidence of successful learning of Bribri grammar, and examine the translations of structures that are infrequent in major Indo-European languages, such as positional verbs, ergative markers, numerical classifiers and complex demonstrative systems. In addition, we experiment with augmenting the dataset through iterative back-translation (Sennrich et al., 2015a; Hoang et al., 2018), using Spanish sentences to create synthetic Bribri sentences. This improves the score by up to 2.1 BLEU, but only when the new Spanish sentences belong to the same domain as the other Spanish examples. This contributes to the small but growing body of research on Chibchan NLP.
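Iterative back-translation follows a standard recipe that can be sketched independently of the paper's setup; the train and translate functions below are placeholders for an NMT toolkit, not the authors' code.

```python
def back_translation_round(parallel_pairs, mono_spanish, train, translate):
    """One schematic round of back-translation augmentation.

    parallel_pairs: list of (bribri, spanish) sentence pairs
    mono_spanish:   additional in-domain Spanish sentences
    train(pairs, src, tgt): returns a trained translation model (placeholder)
    translate(model, sentences): returns translated sentences (placeholder)
    """
    # 1. Train a reverse model that translates Spanish into Bribri.
    reverse_model = train(parallel_pairs, src="spanish", tgt="bribri")
    # 2. Use it to turn monolingual Spanish into synthetic Bribri.
    synthetic_bribri = translate(reverse_model, mono_spanish)
    # 3. Retrain the forward model on real plus synthetic pairs.
    augmented = parallel_pairs + list(zip(synthetic_bribri, mono_spanish))
    return train(augmented, src="bribri", tgt="spanish")
```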
Neural Networks approaches focused on French Spoken Language Understanding: application to the MEDIA Evaluation Task
Sahar Ghannay, Christophe Servan and Sophie Rosset
In this paper, we present a study on a French Spoken Language Understanding (SLU) task: the MEDIA task. Many works and studies have been proposed for many tasks, but most of them focus on English. Exploring a richer language like French within the framework of an SLU task requires adapting recent approaches to handle this difficulty. Since the MEDIA task appears to be one of the most difficult according to several previous studies, we explore neural network approaches focusing on three aspects: first, the neural network inputs, and more specifically the word embeddings; second, a comparison of a French version of BERT against the best setup in different ways; finally, a comparison against state-of-the-art approaches. Results show that word embeddings trained on a small corpus need to be updated during SLU model training. Furthermore, the fine-tuned French BERT approaches outperform the classical neural network architectures and achieve state-of-the-art results. However, the contextual embeddings extracted from one of the French BERT approaches achieve results comparable to word embeddings when integrated into the proposed neural architecture.
Neural text normalization leveraging similarities of strings and sounds
Riku Kawamura, Tatsuya Aoki, Hidetaka Kamigaito, Hiroya Takamura and Manabu Okumura
We propose neural models that can normalize text by considering the similarities of word strings and sounds. We experimentally compared a model that considers the similarities of both word strings and sounds, a model that considers only the similarity of word strings or of sounds, and a model without the similarities as a baseline. Results showed that leveraging the word string similarity succeeded in dealing with misspellings and abbreviations, while taking into account the sound similarity succeeded in dealing with phonetic substitutions and emphasized characters. As a result, the proposed models achieved higher F1 scores than the baseline.
Neural Transduction for Multilingual Lexical Translation
Dylan Lewis, Winston Wu, Arya D. McCarthy and David Yarowsky
We present a method for completing multilingual translation dictionaries. Our probabilistic approach can synthesize new word forms, allowing it to operate in settings where correct translations have not been observed in text (cf. cross-lingual embeddings). In addition, we propose an approximate Maximum Mutual Information (MMI) decoding objective to further improve performance in both many-to-one and one-to-one word-level translation tasks, where we use either multiple input languages for a single target language or more typical single-language-pair translation. The model is trained in a many-to-many setting, where it can leverage information from related languages to predict any of its many target languages. We focus on 6 languages: French, Spanish, Italian, Portuguese, Romanian, and Turkish. When indirect multilingual information is available, ensembling with mixture-of-experts as well as incorporating related languages leads to a 27% relative improvement in whole-word accuracy of predictions over a single-source baseline. To seed the completion when multilingual data is unavailable, it is better to decode with an MMI objective.
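One common way to realize an MMI objective is to rescore candidates by subtracting a weighted prior log-probability from the conditional log-probability; the weight lam and the toy numbers below are illustrative assumptions, not the paper's exact formulation.

```python
import math

def mmi_rescore(candidates, lam=0.5):
    """Rescore candidates with an approximate MMI objective:
    score(y | x) = log p(y | x) - lam * log p(y).
    `candidates` maps each candidate word form to a pair
    (log_prob_given_source, log_prob_prior)."""
    return max(candidates, key=lambda y: candidates[y][0] - lam * candidates[y][1])

# Illustrative numbers only: a frequent word has a high prior, so MMI
# favours the candidate that is more informative about the source word.
candidates = {
    "casa":  (math.log(0.40), math.log(0.30)),
    "hogar": (math.log(0.35), math.log(0.05)),
}
print(mmi_rescore(candidates))  # "hogar" under lam=0.5
```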
Neural Unsupervised Domain Adaptation in NLP---A Survey
Alan Ramponi and Barbara Plank
Deep neural networks excel at learning from labeled data and achieve state-of-the-art results on a wide array of Natural Language Processing tasks. In contrast, learning from unlabeled data, especially under domain shift, remains a challenge. Motivated by the latest advances, in this survey we review neural unsupervised domain adaptation techniques that do not require labeled target-domain data. This is a more challenging yet more widely applicable setup. We outline methods, from early traditional non-neural methods to pre-trained model transfer. We also revisit the notion of domain, and we uncover a bias in the type of Natural Language Processing tasks that have received the most attention. Lastly, we outline future directions, particularly the broader need for out-of-distribution generalization in future NLP.
New Benchmark Corpus and Models for Fine-grained Event Classification: To BERT or not to BERT?
Jakub Piskorski, Jacek Haneczok and Guillaume Jacquet
We introduce a new set of benchmark datasets derived from ACLED data for fine-grained event classification and compare the performance of various state-of-the-art models on these datasets, including SVMs based on TF-IDF character n-grams and neural context-free embeddings (GLOVE and FASTTEXT), as well as the deep learning-based BERT with its contextual embeddings. The best results in terms of micro F1 (94.3-94.9%) and macro F1 (86.0-88.9%) were obtained using the BERT transformer, with the simpler TF-IDF character n-gram-based SVM being an interesting alternative. Further, we discuss the pros and cons of the considered benchmark models in terms of their robustness and the dependence of the classification performance on the size of the training data.
News Editorials: Towards Summarizing Long Argumentative Texts
Shahbaz Syed, Roxanne El Baff, Johannes Kiesel, Khalid Al Khatib, Benno Stein and Martin Potthast
The automatic summarization of argumentative texts remains relatively unexplored to date. This paper presents first steps in this direction targeting news editorials, namely opinionated articles with a well-defined argumentation structure. With EditorialSum, we present a corpus of 1330 manually acquired and evaluated summaries for 266 news editorials. We acquire and qualitatively evaluate summaries based on an annotation scheme tailored to editorial summaries, which requires a high-quality summary to be: thesis-indicative, persuasive, reasonable, concise, and self-contained. For about 90% of the editorials we provide at least three high-quality summaries. Alongside in-depth corpus analyses, we present the evaluation of two extractive summarization models. With multiple summaries per editorial, each labeled for quality, our corpus lends itself to the development and evaluation of summarization approaches for long argumentative texts.
Noise Isn't Always Negative: Countering Exposure Bias in Sequence-to-Sequence Inflection Models
Garrett Nicolai and Miikka Silfverberg
Morphological inflection, like many sequence-to-sequence tasks, sees great performance from recurrent neural architectures when data is plentiful, but performance falls off sharply in lower-data settings. We investigate one aspect of neural seq2seq models that we hypothesize contributes to overfitting: teacher forcing.
Normalizing Compositional Structures Across Graphbanks
Lucia Donatelli, Jonas Groschwitz, Matthias Lindemann, Alexander Koller and Pia Weißenhorn
The emergence of a variety of graph-based meaning representations (MRs) has sparked an important conversation about how to adequately represent semantic structure. MRs exhibit structural differences that reflect different theoretical and design considerations, presenting challenges to uniform linguistic analysis and cross-framework semantic parsing. Here, we ask the question of which design differences between MRs are meaningful and semantically-rooted, and which are superficial. We present a methodology for normalizing discrepancies between MRs at the compositional level (Lindemann et al., 2019), finding that we can normalize the majority of divergent phenomena using linguistically-grounded rules. Our work significantly increases the match in compositional structure between MRs and improves multi-task learning (MTL) in a low-resource setting, serving as a proof of concept for future broad-scale cross-MR normalization.
NUT-RC: Noisy User-generated Text-oriented Reading Comprehension
Rongtao Huang, Bowei Zou, Yu Hong, Wei Zhang, AiTi Aw and Guodong Zhou
Reading comprehension (RC) on social media such as Twitter is a critical and challenging task due to its noisy, informal, but informative nature. Most existing RC models are developed on formal datasets such as news articles and Wikipedia documents, which severely limits their performance when they are directly applied to the noisy and informal texts in social media. Moreover, these models only focus on a certain type of RC, extractive or generative, but ignore integrating the two. To address these challenges, we come up with a noisy user-generated text-oriented RC model. In particular, we first introduce a set of text normalizers to transform the noisy and informal texts into formal ones. Then, we integrate the extractive and the generative RC models by a multi-task learning mechanism and an answer selection module. Experimental results on TweetQA demonstrate that our NUT-RC model significantly outperforms the state-of-the-art social media-oriented RC models.
NYTWIT: A Dataset of Novel Words in the New York Times
Yuval Pinter, Cassandra L. Jacobs and Max Bittker
We present the New York Times Word Innovation Types dataset, or NYTWIT, a collection of over 2,500 novel English words published in the New York Times between November 2017 and March 2019, manually annotated for their class of novelty (such as lexical derivation, dialectal variation, blending, or compounding). We present baseline results for both uncontextual and contextual prediction of novelty class, showing that there is room for improvement even for state-of-the-art NLP systems. We hope this resource will prove useful for linguists and NLP practitioners by providing a real-world environment of novel word appearance.
Offensive Language Detection on Video Live Streaming Chat
Zhiwei Gao, Shuntaro Yada, Shoko Wakamiya and Eiji Aramaki
This study presented a prototype of a chat room that detects offensive expressions in real time in video live streaming chat. Focusing on Twitch, one of the most popular live streaming platforms, we created a dataset for the task of detecting offensive expressions. We collected 2,000 chat posts across four popular game titles with genre diversity (e.g., competitive, violent, peaceful). To make use of the similarity in offensive expressions among different social media, we apply state-of-the-art models trained on offensive expressions on Twitter to our Twitch data (i.e., transfer learning). We investigated two similarity measurements to predict transferability: textual similarity and game-genre similarity. Our results show that transferring features from social media to live streaming is effective. However, the two measurements show little correlation with transferability.
On the Consistency for E-commerce Product Summarization
Peng Yuan, Haoran Li, Song Xu, Youzheng Wu, Xiaodong He and Bowen Zhou
In this work, we present a model to generate e-commerce product summaries. The consistency between the generated summary and the product attributes is an essential criterion for the e-commerce product summarization task. To enhance the consistency, first, we encode the product attribute table to guide the process of summary generation. Second, we identify the attribute words from the vocabulary and constrain these attribute words so that they can appear in the summaries only through copying from the source, i.e., attribute words not in the source cannot be generated. We construct a Chinese e-commerce product summarization dataset, and the experimental results on this dataset demonstrate that our models significantly improve the consistency.
On the Helpfulness of Document Context to Sentence Simplification
Renliang Sun, Zhe Lin and Xiaojun Wan
Most research on text simplification is currently limited to the sentence level. In this paper, we are the first to investigate the helpfulness of document context for sentence simplification and to apply it to the sequence-to-sequence model. We first construct a sentence simplification dataset in which the contexts for the original sentence are provided by the Wikipedia corpus. We then propose a new model that makes full use of the context information. Our model uses neural networks to learn the different effects of the preceding sentences and the following sentences on the current sentence and applies them to the improved transformer model. Evaluated on the newly constructed dataset, our model achieves a SARI score of 36.52, which outperforms the baselines, indicating that context indeed helps improve sentence simplification. In the ablation experiment, we show that using either the preceding sentences or the following sentences as context can significantly improve simplification.
On the Practical Ability of Recurrent Neural Networks to Recognize Context-Free Languages
Satwik Bhattamishra, Kabir Ahuja and Navin Goyal
While recurrent models have been effective in NLP tasks, their performance on context-free languages (CFLs) has been found to be quite weak. Given that CFLs are believed to capture important phenomena such as hierarchical structure in natural languages, this discrepancy in performance calls for an explanation. We study the performance of recurrent models on Dyck-n languages, a particularly important and well-studied class of CFLs. We find that while recurrent models generalize nearly perfectly if the lengths of the training and test strings are from the same range, they perform poorly if the test strings are longer. At the same time, we observe that RNNs are expressive enough to recognize Dyck words of arbitrary lengths in finite precision if their depths are bounded. Hence, we evaluate our models on samples generated from Dyck languages with bounded depth and find that they are indeed able to generalize to much higher lengths. Since natural language datasets have nested dependencies of bounded depth, this may help explain why they perform well in modeling hierarchical dependencies in natural language data despite prior works indicating poor generalization performance on Dyck languages. We perform probing studies to support our results and provide comparisons with Transformers.
One Comment from One Perspective: An Effective Strategy for Enhancing Automatic Music Comment
Tengfei Huo, Zhiqiang Liu, Jinchao Zhang, Cheng Niu and Jie Zhou
The automatic generation of music comments is of great significance for increasing the popularity of music and the activity of the platform. Human music comments on the same song are highly distinct and reflect diverse perspectives. In other words, for a song, different comments stem from different musical perspectives. However, to date, this characteristic has not been considered well in research on automatic comment generation. The existing methods tend to generate common and meaningless comments. In this paper, we propose an effective multi-perspective strategy to enhance the diversity of the generated comments. The experiment results on two music comment datasets show that our proposed model can effectively generate a series of diverse music comments based on different perspectives, which outperforms state-of-the-art baselines by a substantial margin.
Online Versus Offline NMT Quality: An In-depth Analysis on English-German and German-English
Maha Elbayad, Michael Ustaszewski, Emmanuelle Esperança-Rodier, Francis Brunet-Manquat, Jakob Verbeek and Laurent Besacier
We conduct in this work an evaluation study comparing offline and online neural machine translation architectures. Two sequence-to-sequence models: convolutional Pervasive Attention (Elbayad et al. 2018) and attention-based Transformer (Vaswani et al. 2017) are considered. We investigate, for both architectures, the impact of online decoding constraints on the translation quality through a carefully designed human evaluation on English-German and German-English language pairs, the latter being particularly sensitive to latency constraints. The evaluation results allow us to identify the strengths and shortcomings of each model when we shift to the online setup.
Optimized Transformer for Low-resource Neural Machine Translation
Ali Araabi and Christof Monz
Language pairs with limited amounts of parallel data, also known as low-resource languages, remain a challenge for neural machine translation. While the Transformer model has achieved significant improvements for many language pairs, and has become the de facto mainstream architecture, its capability under low-resource conditions has not been fully investigated yet. Our experiments on different subsets of IWSLT14 training data show that the effectiveness of Transformer under low-resource conditions is highly dependent on the hyper-parameter settings. Our experiments show that using an optimized Transformer for low-resource conditions improves the translation quality up to 7.3 BLEU points compared to using the Transformer default settings.
Out-of-Task Training for Dialog State Tracking Models
Michael Heck, Christian Geishauser, Hsien-chin Lin, Nurul Lubis, Marco Moresi, Carel van Niekerk and Milica Gasic
Dialog state tracking (DST) suffers from data sparsity. While many natural language processing (NLP) tasks benefit from transfer learning and multi-task learning, in dialog these methods are limited by the amount of available data and by the specificity of dialog applications. In this work, we propose to utilize non-dialog data from unrelated tasks to train DST. Our results show that we can exploit unrelated data that is available on a much bigger scale. This opens the door to harvest the abundance of unrelated NLP corpora to mitigate the data sparsity issue inherent to DST.
Parsers Know Best: German PP Attachment Revisited
Bich-Ngoc Do and Ines Rehbein
In the paper, we revisit the PP attachment problem which has been identified as one of the major sources for parser errors and discuss shortcomings of recent work. In particular, we show that using gold information for the extraction of attachment candidates as well as a missing comparison of the system's output to the output of a full syntactic parser leads to an overly optimistic assessment of the results. We address these issues by presenting a realistic evaluation of the potential of different PP attachment systems, using fully predicted information as system input. We compare our results against the output of a strong neural parser and show that the full parsing approach is superior to modeling PP attachment disambiguation as a separate task.
PEDNet: A Persona Enhanced Dual Alternating Learning Network for Conversational Response Generation
Bin Jiang, Wanyue Zhou, Jingxu Yang, Chao Yang, Shihan Wang and Liang Pang
Endowing a chatbot with a personality is essential to deliver more realistic conversations. Various persona-based dialogue models have been proposed to generate personalized and diverse responses by utilizing predefined persona information. However, generating personalized responses is still a challenging task since the leverage of predefined persona information is often insufficient. To alleviate this problem, we propose a novel Persona Enhanced Dual Alternating Learning Network (PEDNet) aiming at producing more personalized responses in various open-domain conversation scenarios. PEDNet consists of a Context-Dominate Network (CDNet) and a Persona-Dominate Network (PDNet), which are built upon a common encoder-decoder backbone. CDNet learns to select a proper persona as well as ensure the contextual relevance of the predicted response, while PDNet learns to enhance the utilization of persona information when generating the response by weakening the disturbance of specific content in the conversation context. CDNet and PDNet are trained alternately using a multi-task training approach to equip PEDNet with both of the capabilities they have learned. Both automatic and human evaluations on a newly released dialogue dataset Persona-chat demonstrate that our method could deliver more personalized responses than baseline methods.
Personalized Multimodal Feedback Generation in Education
Haochen Liu, Zitao Liu, Zhongqin Wu and Jiliang Tang
The automatic evaluation for school assignments is an important application of AI in the education field. In this work, we focus on the task of personalized multimodal feedback generation, which aims to generate personalized feedback for various teachers to evaluate students' assignments involving multimodal inputs such as images, audios, and texts. This task involves the representation and fusion of multimodal information and natural language generation, which presents the challenges from three aspects: 1) how to encode and integrate multimodal inputs; 2) how to generate feedback specific to each modality; and 3) how to fulfill personalized feedback generation. In this paper, we propose a novel Personalized Multimodal Feedback Generation Network (PMFGN) armed with a modality gate mechanism and a personalized bias mechanism to address these challenges. The extensive experiments on real-world K-12 education data show that our model significantly outperforms several baselines by generating more accurate and diverse feedback. In addition, detailed ablation experiments are conducted to deepen our understanding of the proposed framework.
PG-GSQL: Pointer-Generator Network with Guide Decoding for Cross-Domain Context-Dependent Text-to-SQL Generation
Huajie Wang, Mei Li and Lei Chen
Text-to-SQL is the task of translating utterances into SQL queries, and most existing neural approaches to text-to-SQL focus on the cross-domain context-independent generation task. We pay close attention to the cross-domain context-dependent text-to-SQL generation task, which requires a model to depend on the interaction history and the current utterance to generate a SQL query. In this paper, we present an encoder-decoder model called PG-GSQL based on an interaction-level encoder, with two effective innovations in the decoder to solve the cross-domain context-dependent text-to-SQL task. 1) To effectively capture historical information of the SQL query and reuse previous SQL query tokens, we use a hybrid pointer-generator network as the decoder: tokens are copied from the previous SQL query via the pointer, while the generator part is used to generate new tokens. 2) We propose a guide component to limit the prediction space of the vocabulary to avoid table-column dependency and foreign-key dependency errors during the decoding phase. In addition, we design a column-table linking mechanism to improve the prediction accuracy of tables. On the challenging cross-domain context-dependent text-to-SQL benchmark SParC, PG-GSQL achieves 34.0% question matching accuracy and 19.0% interaction matching accuracy on the dev set. With BERT augmentation, PG-GSQL obtains 53.1% question matching accuracy and 34.7% interaction matching accuracy on the dev set, outperforming the previous state-of-the-art model by 5.9% question matching accuracy and 5.2% interaction matching accuracy. Our code is publicly available.
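The pointer-generator part of such a decoder typically interpolates a vocabulary distribution with attention-based copy probabilities; a minimal sketch of that standard mixture follows (tensor names, shapes, and numbers are illustrative, not the PG-GSQL implementation).

```python
import torch

def pointer_generator_mixture(p_gen, vocab_dist, copy_attn, copy_token_ids):
    """Standard pointer-generator mixture:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention over positions holding w.

    p_gen:          (batch, 1) generation probability
    vocab_dist:     (batch, vocab_size) softmax over the output vocabulary
    copy_attn:      (batch, src_len) attention over tokens of the previous query
    copy_token_ids: (batch, src_len) vocabulary ids of those source tokens
    """
    final = p_gen * vocab_dist
    # Scatter copy probabilities onto the vocabulary ids of the source tokens.
    final = final.scatter_add(1, copy_token_ids, (1.0 - p_gen) * copy_attn)
    return final

# Tiny illustrative example (vocabulary of size 4, two copyable tokens)
p_gen = torch.tensor([[0.7]])
vocab_dist = torch.tensor([[0.1, 0.2, 0.3, 0.4]])
copy_attn = torch.tensor([[0.6, 0.4]])
copy_token_ids = torch.tensor([[2, 0]])
print(pointer_generator_mixture(p_gen, vocab_dist, copy_attn, copy_token_ids))
```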
PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents
Ryo Fujii, Masato Mita, Kaori Abe, Kazuaki Hanawa, Makoto Morishita, Jun Suzuki and Kentaro Inui
Neural Machine Translation (NMT) has shown drastic improvement in its quality when translating clean input such as text from the news domain. However, existing studies suggest that NMT still struggles with certain kinds of input containing plentiful noise, such as User-Generated Contents (UGC) on the Internet. To make better use of NMT for cross-cultural communication, one of the most promising directions is to develop a model that correctly handles such input. Though its importance is recognized, it is still not clear what creates the great gap in performance between the translation of clean input and the translation of UGC. To answer this question, we present a new dataset, PheMT, for evaluating the robustness of machine translation systems against specific linguistic phenomena. Our experiments with the created dataset revealed that not only our in-house models but even the strongest off-the-shelf systems are greatly disturbed by the presence of certain phenomena.
Pick a Fight or Bite your Tongue: Investigation of Gender Differences in Figurative Language Usage
Ella Rabinovich, Hila Gonen and Suzanne Stevenson
A large body of research on gender-linked language has established foundations regarding cross-gender differences in lexical, emotional, and topical preferences, along with their sociological underpinnings. We compile a novel, large and diverse corpus of spontaneous linguistic productions annotated with speakers' gender, and perform a first large-scale empirical study of distinctions in the usage of figurative language between male and female authors. Our analyses suggest that (1) idiomatic choices reflect gender-specific lexical and semantic preferences in general language, (2) men's and women's idiomatic usages express higher emotion than their literal language, with detectable, albeit more subtle, differences between male and female authors along the dimension of dominance compared to similar distinctions in their literal utterances, and (3) contextual analysis of idiomatic expressions reveals considerable differences, reflecting subtle divergences in usage environments, shaped by cross-gender communication styles and semantic biases.
Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis
Michael Lepori and R. Thomas McCoy
As the name implies, contextualized representations of language are typically motivated by their ability to encode context. Which aspects of context are captured by such representations? We introduce an approach to address this question using Representational Similarity Analysis (RSA). As case studies, we investigate the degree to which a verb embedding encodes the verb’s subject, a pronoun embedding encodes the pronoun’s antecedent, and a full-sentence representation encodes the sentence’s head. In all cases, we show that BERT’s contextualized embeddings reflect the linguistic dependency being studied, and that BERT encodes these dependencies to a greater degree than it encodes less linguistically-salient controls. These results demonstrate the ability of our approach to adjudicate between hypotheses about which aspects of context are encoded in representations of language.
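Representational Similarity Analysis compares two representational spaces by correlating their pairwise-similarity structures over the same stimuli; a minimal sketch of that comparison follows (the random matrices stand in for actual contextual embeddings and hypothesis representations, not the paper's data).

```python
import numpy as np
from scipy.stats import spearmanr

def rsa_score(reps_a, reps_b):
    """Correlate the pairwise-similarity structure of two representational
    spaces over the same n stimuli (each reps_* is an (n, d) matrix)."""
    def sim_vector(reps):
        normed = reps / np.linalg.norm(reps, axis=1, keepdims=True)
        sims = normed @ normed.T                      # cosine similarity matrix
        return sims[np.triu_indices(len(reps), k=1)]  # upper triangle, no diagonal
    rho, _ = spearmanr(sim_vector(reps_a), sim_vector(reps_b))
    return rho

# Illustrative only: random matrices standing in for two sets of embeddings
rng = np.random.default_rng(0)
print(rsa_score(rng.normal(size=(10, 32)), rng.normal(size=(10, 32))))
```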
PoD: Positional Dependency-Based Word Embedding for Aspect Term Extraction
Yichun Yin, Chenguang Wang and Ming Zhang
Dependency context-based word embedding jointly learns the representations of word and dependency context, and has been proved effective in aspect term extraction. In this paper, we design the positional dependency-based word embedding (PoD), which considers both dependency context and positional context for aspect term extraction. Specifically, the positional context is modeled via relative position encoding. Besides, we enhance the dependency context by integrating more lexical information (e.g., POS tags) along dependency paths. Experiments on SemEval 2014/2015/2016 datasets show that our approach outperforms other embedding methods in aspect term extraction. The source code will be publicly available upon publication.
Pointing to Select: A Fast Pointer-LSTM for Long Text Classification
Jinhua Du, Yan Huang and Karo Moilanen
Recurrent neural networks (RNNs) suffer from well-known limitations and complications which include slow inference and vanishing gradients when processing long sequences in text classification. Recent studies have attempted to accelerate RNNs via various ad hoc mechanisms to skip irrelevant words in the input. However, word skipping approaches proposed to date effectively stop at each or a given time step to decide whether or not a given input word should be skipped, breaking the coherence of input processing in RNNs. Furthermore, current methods cannot change skip rates during inference and are consequently unable to support different skip rates in demanding real-world conditions. To overcome these limitations, we propose Pointer-LSTM, a novel LSTM framework which relies on a pointer network to select important words for target prediction. The model maintains a coherent input process for the LSTM modules and makes it possible to change the skip rate during inference. Our evaluation on four public data sets demonstrates that Pointer-LSTM (a) is 1.1x∼3.5x faster than the standard LSTM architecture; (b) is more accurate than Leap-LSTM, the state-of-the-art LSTM skipping model, at high skip rates; and (c) reaches robust accuracy levels even when the skip rate is changed during inference.
Pointing to Subwords for Generating Function Names in Source Code
Shogo Fujita, Hidetaka Kamigaito, Hiroya Takamura and Manabu Okumura
We tackle the task of automatically generating a function name from source code. Existing generators face difficulties in generating low-frequency or out-of-vocabulary subwords. In this paper, we propose two strategies for copying low-frequency or out-of-vocabulary subwords in inputs. Our best performing model showed an improvement over the conventional method in terms of our modified F1 and accuracy on the Java-small and Java-large datasets.
Porous Lattice Transformer Encoder for Chinese NER
Xue Mengge, Bowen Yu, Tingwen Liu, Yue Zhang, Erli Meng and Bin Wang
Incorporating lexicons into character-level Chinese NER by lattices is proven effective to exploit rich word boundary information. Previous work has extended RNNs to consume lattice inputs and achieved great success. However, due to the DAG structure and the inherently unidirectional sequential nature, this method precludes batched computation and sufficient semantic interaction. In this paper, we propose PLTE, an extension of transformer encoder that is tailored for Chinese NER, which models all the characters and matched lexical words in parallel with batch processing. PLTE augments self-attention with positional relation representations to incorporate lattice structure. It also introduces a porous mechanism to augment localness modeling and maintain the strength of capturing the rich long-term dependencies. Experimental results show that PLTE performs up to 11.4 times faster than state-of-the-art methods while realizing better performance. We also demonstrate that using BERT representations further substantially boosts the performance and brings out the best in PLTE.
Pre-trained Language Model Based Active Learning for Sentence Matching
Guirong Bai, Shizhu He, Kang Liu, Jun Zhao and Zaiqing Nie
Active learning is able to significantly reduce the annotation cost for data-driven techniques. However, previous active learning approaches for natural language processing mainly depend on the entropy-based uncertainty criterion, and ignore the characteristics of natural language. In this paper, we propose a pre-trained language model based active learning approach for sentence matching. Differing from previous active learning approaches, it can provide linguistic criteria to measure instances and help select more efficient instances for annotation. Experiments demonstrate that our approach can achieve greater accuracy with fewer labeled training instances.
Predicting Clickbait Strength in Online Social Media
Vijayasaradhi Indurthi, Bakhtiyar Syed, Manish Gupta and Vasudeva Varma
Hoping for a large number of clicks and potentially high social shares, journalists of various news media outlets publish sensationalist headlines on social media. These headlines lure readers to click on them and satisfy the curiosity gap in their minds. The low-quality material pointed to by clickbaits leads to time wastage and annoyance for users. Even for enterprises publishing clickbaits, it hurts more than it helps, as it erodes user trust, attracts the wrong visitors, and produces negative signals for ranking algorithms. Hence, identifying and flagging clickbait titles is essential. Previous work on clickbaits has mainly focused on binary classification of clickbait titles. However, not all clickbaits are equally clickbaity. It is essential not only to identify a clickbait, but also to identify its intensity based on the strength of the clickbait. In this work, we model clickbait strength prediction as a regression problem. While previous methods have relied on traditional machine learning or vanilla recurrent neural networks, we rigorously investigate the use of transformers for clickbait strength prediction. On a benchmark dataset with ∼39K posts, our methods outperform all the existing methods in the Clickbait Challenge.
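Framing clickbait strength as regression usually amounts to a single-output head trained with a squared-error loss on top of a sentence encoding; a minimal sketch of that setup follows (the pooled vectors below are random placeholders rather than actual transformer encodings of posts).

```python
import torch
import torch.nn as nn

# Minimal regression head over a pooled sentence representation.
head = nn.Linear(768, 1)            # 768 matches a typical BERT-base pooled output
loss_fn = nn.MSELoss()
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)

pooled = torch.randn(8, 768)        # placeholder for transformer-pooled post encodings
strength = torch.rand(8)            # gold clickbait strength scores in [0, 1]

pred = head(pooled).squeeze(-1)     # one continuous prediction per post
loss = loss_fn(pred, strength)
loss.backward()
optimizer.step()
print(float(loss))
```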
Predicting Personal Opinion on Future Events with Fingerprints
Fan Yang, Eduard Dragut and Arjun Mukherjee
Predicting users' opinions in their responses to social events has important real-world applications, many of which have political and social impacts. Existing approaches derive a population's opinion on an ongoing event from large volumes of user-generated content. In certain scenarios, we may not be able to acquire such content and thus cannot infer an unbiased opinion on those emerging events. To address this problem, we propose to explore opinion on unseen articles based on one's fingerprint: the prior reading and commenting history. This work presents a focused study on modeling and leveraging fingerprinting techniques to predict a user's future opinion. We introduce a recurrent neural network based model that integrates fingerprinting. We collect a large dataset that consists of event-comment pairs from six news websites. We evaluate the proposed model on this dataset. The results show substantial performance gains, demonstrating the effectiveness of our approach.
Predicting Stance Change Using Modular Architectures
Aldo Porco and Dan Goldwasser
The ability to change a person's mind on a given issue depends both on the arguments they are presented with and on their underlying perspectives and biases on that issue. Predicting stance changes requires characterizing both aspects and the interaction between them, especially in realistic settings in which stance changes are very rare.
Priorless Recurrent Networks Learn Curiously
Jeff Mitchell and Jeffrey Bowers
Recently, domain general recurrent neural networks, without explicit inductive biases, have been shown to successfully reproduce a range of human linguistic behaviours, such as accurately predicting number agreement between nouns and verbs. We show that such networks will also learn number agreement within unnatural sentence structures, i.e. structures that are not found within any natural languages and which humans struggle to process. These results suggest that the models are learning from their input in a manner that is substantially different from human language acquisition, and we undertake an analysis of how the learned knowledge is stored in the weights of the network. We find that while the model has an effective understanding of singular versus plural for individual sentences, there is a lack of a unified concept of number agreement connecting these processes across the full range of inputs. Moreover, the weights handling natural and unnatural structures overlap substantially, in a way that underlines the non-human-like nature of the knowledge learned by the network.
Probabilistic Interpretation with Bag of Latent Features for Text Classification
Phong Le and Willem Zuidema
Interpreting how a neural model works is important for building more robust and trustworthy models, but challenging because of the complexity of their structure. Following the success of utilising probability, such as within the attention mechanism, we propose a sub-network architecture called BoLF, which represents the input by a bag of latent features (a latent feature has a score ranging from 0 to 1 indicating to what extent it is in the bag). We show that, unlike attention and salience maps, it is straightforward to compute the probability that an input component supports a category. For a demonstration, applying BoLF to text classification, we show that BoLF slightly outperforms the classical CNN and BiLSTM text classifiers on the SST2 and AG-news datasets.
Probing classifiers may just learn from linear context features
Jenny Kunz and Marco Kuhlmann
Analyses that seek to interpret the representations learned by neural sentence encoders such as BERT and ELMo have become popular recently, with probing classifiers trained for auxiliary tasks being the most widespread approach. While many researchers are aware of the difficulty of distinguishing between "revealing the linguistic structure encoded in the representations" and "learning the task", the strategies that have been proposed to address this problem and the question of their validity call for further research. Using a word identity prediction task, we show that the token embeddings learned by neural sentence encoders contain a lot of information about the exact linear context of the token, and suggest that, with such information, learning standard probing tasks may be feasible even without traditional hierarchical structure. Based on this observation, we propose a framework in which analysis efforts can be scrutinized and argue that, with current models and baselines, conclusions that representations contain linguistic structure are not well-grounded. Current probing methodology, such as restricting the probe's expressivity or using strong baselines, can help to better estimate the complexity of learning, but not build a foundation for speculations about the linguistic structure encoded in the learned representations.
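A typical probing setup trains a shallow classifier on frozen token embeddings to predict an auxiliary property; a minimal sketch with placeholder embeddings and labels follows (not the authors' word identity prediction setup).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder "token embeddings": in a real probe these would come from a
# frozen sentence encoder such as BERT or ELMo.
rng = np.random.default_rng(0)
n_tokens, dim, n_labels = 500, 64, 5
embeddings = rng.normal(size=(n_tokens, dim))
labels = rng.integers(0, n_labels, size=n_tokens)   # e.g. a property of the linear context

# The probe itself: a simple linear classifier trained on top of frozen features.
probe = LogisticRegression(max_iter=1000).fit(embeddings[:400], labels[:400])
print("probe accuracy:", probe.score(embeddings[400:], labels[400:]))
```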
Probing Multilingual BERT for Genetic and Typological Signals
Taraka Rama, Lisa Beinborn and Steffen Eger
We probe the layers of multilingual BERT (mBERT) for phylogenetic and geographic language signals across 100 languages and compute language distances based on the mBERT representations. We 1) employ the language distances to infer and evaluate language trees, finding that they are close to the reference family tree in terms of quartet tree distance, 2) perform distance matrix regression analysis, finding that the language distances are best explained by genetic and worst by typological factors, and 3) present a novel measure of diachronic stability for meanings which in turn correlates significantly with published ranked lists based on computational historical linguistic approaches. Our results contribute to the nascent field of typological interpretability of black-box cross-lingual text representations, and we bridge cross-lingual approaches with those of computational historical linguistics.
Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case
Adam Dahlgren Lindström, Johanna Björklund, Suna Bensch and Frank Drewes
Semantic embeddings have advanced the state of the art for countless natural language processing tasks, and various extensions to multimodal domains, such as visual-semantic embeddings, have been proposed. While the power of visual-semantic embeddings comes from the distillation and enrichment of information through machine learning, their inner workings are poorly understood and there is a shortage of analysis tools. To address this problem, we generalize the notion of probing tasks to the visual-semantic case. To this end, we (i) discuss the formalization of probing tasks for embeddings of image-caption pairs, (ii) define three concrete probing tasks within our general framework, (iii) train classifiers to probe for those properties, and (iv) compare various state-of-the-art embeddings under the lens of the proposed probing tasks. Our experiments reveal an up to 13% increase in accuracy on visual-semantic embeddings compared to the corresponding unimodal embeddings, which suggests that the text and image dimensions represented in the former do complement each other.
QANom: Question-Answer driven SRL for Nominalizations
Ayal Klein, Jonathan Mamou, Valentina Pyatkin, Daniela Stepanov, Hangfeng He, Dan Roth, Luke Zettlemoyer and Ido Dagan
We propose a new semantic scheme for capturing predicate-argument relations for nominalizations, termed QANom. This scheme extends the QA-SRL formalism (He et al., 2015), modeling the relations between nominalizations and their arguments via natural language question-answer pairs. We construct the first QANom dataset using controlled crowdsourcing, analyze its quality and compare it to expertly annotated nominal-SRL annotations, as well as to other QA-driven annotations. In addition, we train a baseline QANom parser for identifying nominalizations and labeling their arguments with question-answer pairs. Finally, we demonstrate the extrinsic utility of our annotations for downstream tasks using both indirect supervision and zero-shot settings.
QE-Anon: Translation Quality Estimation with Cross-lingual Transformers
Tharindu Ranasinghe, Constantin Orasan and Ruslan Mitkov
Recent years have seen big advances in the field of sentence-level quality estimation (QE), largely as a result of using neural-based architectures. However, the majority of these methods work only on the language pair they are trained on and need retraining for new language pairs. This process can prove difficult from a technical point of view and is usually computationally expensive. In this paper we propose a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. Our evaluation shows that the proposed methods achieve state-of-the-art results when trained on datasets from WMT. In addition, the framework proves very useful in transfer learning settings, especially when dealing with low-resourced languages, allowing us to obtain very competitive results.
R-VGAE: Relational-variational Graph Autoencoder for Unsupervised Prerequisite Chain Learning
Irene Li, Alexander Fabbri, Swapnil Hingmire and Dragomir Radev
The task of concept prerequisite chain learning is to automatically determine the existence of prerequisite relationships among concept pairs. In this paper, we frame learning prerequisite relationships among concepts as an unsupervised task with no access to labeled concept pairs during training. We propose a model called the Relational-Variational Graph AutoEncoder (R-VGAE) to predict concept relations within a graph consisting of concept and resource nodes. Results show that our unsupervised approach outperforms graph-based semi-supervised methods and other baseline methods by up to 9.77% and 10.47% in terms of prerequisite relation prediction accuracy and F1 score. Our method is notably the first graph-based model that attempts to make use of deep learning representations for the task of unsupervised prerequisite learning. We also expand an existing corpus which totals 1,717 English Natural Language Processing (NLP)-related lecture slide files and manual concept pair annotations over 322 topics.
RANCC: Rationalizing Neural Networks via Concept Clustering
Housam Khalifa Bashier, Mi-Young Kim and Randy Goebel
We propose a new self-explainable model for Natural Language Processing (NLP) text classification tasks. Our approach constructs explanations concurrently with the formulation of classification predictions. To do so, we extract a rationale from the text then use it to predict a concept of interest as the final prediction. We provide three types of explanations: 1) rationale extraction, 2) a measure of feature importance, and 3) a clustering of concepts. In addition, we show how our model can be compressed without applying complicated compression techniques. We experimentally demonstrate our explainability approach on a number of text classification datasets.
RatE: Relation-Adaptive Translating Embedding for Knowledge Graph Completion
Hao Huang, Guodong Long, Tao Shen, Jing Jiang and Chengqi Zhang
Many graph embedding approaches have been proposed for knowledge graph completion via link prediction. Among those, translating embedding approaches enjoy the advantages of light-weight structure, high efficiency and great interpretability. Especially when extended to complex vector space, they show the capability of handling various relation patterns including symmetry, antisymmetry, inversion and composition. However, previous translating embedding approaches defined in complex vector space suffer from two main issues: 1) the representing and modeling capacities of the model are limited by the translation function with rigorous multiplication of two complex numbers; and 2) the embedding ambiguity caused by one-to-many relations is not explicitly alleviated. In this paper, we propose a relation-adaptive translation function built upon a novel weighted product in complex space, where the weights are learnable, relation-specific and independent of the embedding size. The translation function only requires eight more scalar parameters per relation, but improves expressive power and alleviates the embedding ambiguity problem. Based on this function, we then present our Relation-adaptive translating Embedding (RatE) approach to score each graph triple. Moreover, a novel negative sampling method is proposed to utilize both prior knowledge and self-adversarial learning for effective optimization. Experiments verify that RatE achieves state-of-the-art performance on four link prediction benchmarks.
Re-framing Incremental Deep Language Models for Dialogue Processing with Multi-task Learning
Morteza Rohanian and Julian Hough
We present a multi-task learning framework to enable the training of one universal incremental dialogue processing model with four tasks, disfluency detection, language modelling, part-of-speech tagging and utterance segmentation, in a simple deep recurrent setting. We show that these tasks provide positive inductive biases to each other, with the optimal contribution of each depending on the severity of the noise from that task. Our live multi-task model outperforms comparable single-task models, delivers competitive performance, and is useful for future use in psychiatric conversation agents.
Read and Reason with MuSeRC and RuCoS: Datasets for Machine Reading Comprehension for Russian
Alena Fenogenova, Vladislav Mikhailov and Denis Shevelev
The paper introduces two machine reading comprehension (MRC) datasets for Russian, called MuSeRC and RuCoS, which require reasoning over multiple sentences and commonsense knowledge to infer the answer. The datasets are designed in accordance with the SuperGLUE methodology and included in RussianSuperGLUE, the Russian general language understanding benchmark. We provide a comparative analysis and demonstrate that the proposed tasks may be more complex as compared to the original tasks, namely MultiRC and ReCoRD. Besides, the performance results of human solvers and BERT-based models show that MuSeRC and RuCoS represent a challenge for the advanced neural models. The goal of MuSeRC and RuCoS is thus to facilitate research in the field of MRC for Russian.
Real-Valued Logics for Typological Universals: Framework and Application
Tillmann Dönicke, Xiang Yu and Jonas Kuhn
This paper proposes a framework for the expression of typological statements which uses real-valued logics to capture the empirical truth value (truth degree) of a formula on a given data source, e.g. a collection of multilingual treebanks with comparable annotation. The formulae can be arbitrarily complex expressions of propositional logic. To illustrate the usefulness of such a framework, we present experiments on the Universal Dependencies treebanks for two use cases: (i) empirical (re-)evaluation of established formulae against the spectrum of available treebanks and (ii) evaluating new formulae (i.e. potential candidates for universals) generated by a search algorithm.
Reasoning Requirements for Indirect Speech Act Interpretation
Vasanth Sarathy, Alexander Tsuetaki, Antonio Roque and Matthias Scheutz
We perform a corpus analysis to develop a representation of the knowledge and reasoning used to interpret indirect speech acts. An indirect speech act (ISA) is an utterance whose intended meaning is different from its literal meaning. We focus on those speech acts in which slight changes in situational or contextual information can switch the dominant intended meaning of an utterance from direct to indirect or vice-versa. We computationalize how various contextual features can influence a speaker's beliefs, and how these beliefs can influence the intended meaning and choice of the surface form of an utterance. We axiomatize the domain-general patterns of reasoning involved, and implement a proof-of-concept architecture using Answer Set Programming. Our model is presented as a contribution to cognitive science and psycholinguistics, so representational decisions are justified by theoretical work.
Reasoning Step-by-Step: Temporal Sentence Localization in Videos via Deep Rectification-Modulation Network
Daizong Liu, Xiaoye Qu, Jianfeng Dong and Pan Zhou
Temporal sentence localization in videos aims to ground the best matched segment in an untrimmed video according to a given sentence query. Previous works in this field mainly rely on attentional frameworks to align the temporal boundaries by a soft selection. Although they focus on the visual content relevant to the query, these single-step attention mechanisms are insufficient to model complex video contents and fall short of the higher-level reasoning this task demands. In this paper, we propose a novel deep rectification-modulation network (RMN), transforming this task into a multi-step reasoning process by repeating rectification and modulation. In each rectification-modulation layer, unlike existing methods that directly conduct the cross-modal interaction, we first devise a rectification module to correct implicit attention misalignment that focuses on the wrong position during the cross-interaction process. Then, a modulation module is developed to capture the frame-to-frame relation with the help of sentence information for better correlating and composing the video contents over time. With multiple such layers cascaded in depth, our RMN progressively refines video and query interactions, thus enabling further precise localization. Experimental evaluations on three public datasets show that the proposed method achieves state-of-the-art performance. Extensive ablation studies are carried out for the comprehensive analysis of the proposed method.
Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey
Samuel Louvan and Bernardo Magnini
In recent years, fostered by deep learning technologies and by the high demand for conversational AI, various approaches have been proposed that address the capacity to elicit and understand user’s needs in task-oriented dialogue systems. We focus on two core tasks, slot filling (SF) and intent classification (IC), and survey how neural based models have rapidly evolved to address natural language understanding in dialogue systems. We introduce three neural architectures: independent models, which model SF and IC separately, joint models, which exploit the mutual benefit of the two tasks simultaneously, and transfer learning models, that scale the model to new domains. We discuss the current state of the research in SF and IC, and highlight challenges that still require attention.
Recognizing Paragraph-level Chinese Discourse Relation via Discourse Argument Graph
Zhenhua Sun, Peifeng Li and Qiaoming Zhu
Most previous studies on discourse analysis used various sequence learning models to encode discourse arguments, which not only limits the model's ability to perceive global information, but also makes it difficult to deal with long-distance dependencies, especially at the paragraph or document level. To address the above issues, we propose a GCN-based neural network model on a discourse argument graph to transform discourse relation recognition into a node classification task. Specifically, we first convert all paragraph-level discourse arguments in the entire corpus into a heterogeneous text graph that integrates word-related global information and argument-related keyword information. Then, we use a graph learning method to encode argument semantics and recognize the relationship between arguments. The experimental results on the Chinese paragraph-level discourse corpus MCDTB show that our proposed model can effectively recognize the paragraph-level discourse relations and outperforms the state-of-the-art models.
Reference and Document Aware Semantic Evaluation Methods for Korean Language Summarization
Dongyub Lee, Myeong Cheol Shin, Taesun Whang, Seungwoo Cho, Byeongil Ko, Daniel Lee, EungGyun Kim and Jaechoon Jo
Text summarization refers to the process that generates a shorter form of text from the source document, preserving salient information. Recently, many models for text summarization have been proposed. Most of those models were evaluated using recall-oriented understudy for gisting evaluation (ROUGE) scores. However, as ROUGE scores are computed based on n-gram overlap, they do not reflect semantic meaning correspondences between generated and reference summaries. Because Korean is an agglutinative language that combines various morphemes into a word that expresses several meanings, ROUGE is not suitable for Korean summarization. In this paper, we propose evaluation metrics that reflect the semantic meanings of a reference summary and the original document, the Reference and Document Aware Semantic Score (RDASS). We then propose a method for improving the correlation of the metrics with human judgment. Evaluation results show that the correlation with human judgment is significantly higher for our evaluation metrics than for ROUGE scores.
Referring to what you know and do not know: Making Referring Expression Generation Models Generalize To Unseen Entities
Rossana Cunha, Thiago Castro Ferreira, Adriana Pagano and Fabio Alves
Data-to-text Natural Language Generation (NLG) is the computational process of generating natural language in the form of text or voice from non-linguistic data. A core micro-planning task within NLG is referring expression generation (REG), which aims to automatically generate proper noun phrases to refer to entities mentioned as discourse unfolds. A limitation of novel REG models is not being able to generate referring expressions to entities not encountered during the training process. To solve this problem, we propose two extensions to NeuralREG, a state-of-the-art encoder-decoder REG model. The first is a copy mechanism, whereas the second consists in representing the gender and type of the referent as inputs to the model. Using the WebNLG corpus, automatic and human evaluations, and an ablation study, we contend that our proposal contributes to generating more meaningful referring expressions to unseen entities than the original system and related work. Code and all produced data will be made publicly available.
Regrexit or not Regrexit: Aspect-based Sentiment Analysis in Polarized Contexts
Vorakit Vorakitphan, Marco Guerini, Elena Cabrio and Serena Villata
Emotion analysis in polarized contexts represents a challenge for Natural Language Processing modelling. As a step in this direction, in this work we present a methodology to extend the task of Aspect-based Sentiment Analysis (ABSA) toward affect and emotion representation in polarized settings. In particular, we adopt the three-dimensional model of affect based on Valence, Arousal, and Dominance (VAD). We then present a Brexit scenario that shows how affect varies toward the same aspect when politically polarized stances are presented. Our approach captures aspect-based polarization at the sentence level from newspaper coverage of the Brexit scenario, covering 1.2m entities. We demonstrate how basic constituents of emotions can be mapped to the VAD model, along with their interactions within the polarized context in ABSA settings, using biased key-concepts (e.g., “stop brexit” vs. “support brexit”). Quite intriguingly, the framework manages to produce coherent aspect-based evidence of Brexit stances from key-concepts, showing that the VAD dimensions influence the support and opposition aspects.
Regularized Attentive Capsule Network for Overlapped Relation Extraction
Tianyi Liu, Xiangyu Lin, Weijia Jia, Mingliang Zhou and Wei Zhao
Distantly supervised relation extraction has been widely applied in knowledge base construction because it requires less human effort. However, the automatically established training datasets in distant supervision contain low-quality instances with noisy words and overlapped relations, posing great challenges to the accurate extraction of relations. To address this problem, we propose a novel Regularized Attentive Capsule Network (RA-CapNet) to better identify highly overlapped relations in each informal sentence. To discover multiple relation features in an instance, we embed multi-head attention into the capsule network as the low-level capsules, where the subtraction of two entities acts as a new form of relation query to select salient features regardless of their positions. To further discriminate overlapped relation features, we devise disagreement regularization to explicitly encourage diversity among both multiple attention heads and low-level capsules. Extensive experiments conducted on widely used datasets show that our model achieves significant improvements in relation extraction.
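To make the disagreement idea concrete, here is a small illustrative sketch (not the authors' code) of a penalty that discourages attention heads from producing similar representations; `head_outputs` is a hypothetical per-head pooled representation:

```python
import torch
import torch.nn.functional as F

def disagreement_penalty(head_outputs):
    # head_outputs: (num_heads, dim) pooled representation of each attention head.
    # Returns the mean pairwise cosine similarity between heads; adding it to the
    # training loss (with a weight) encourages the heads to stay diverse.
    h = F.normalize(head_outputs, dim=-1)        # unit-normalize each head
    sim = h @ h.t()                              # pairwise cosine similarities
    n = h.size(0)
    off_diag = sim.sum() - sim.diag().sum()      # drop self-similarities
    return off_diag / (n * (n - 1))
```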
Reinforced Multi-task Approach for Multi-hop Question Generation
Deepak Gupta, Hardik Chauhan, Ravi Tej Akella, Asif Ekbal and Pushpak Bhattacharyya
Question generation (QG) attempts to solve the inverse of the question answering (QA) problem by generating a natural language question given a document and an answer. While sequence-to-sequence neural models surpass rule-based systems for QG, they are limited in their capacity to focus on more than one supporting fact. For QG, we often require multiple supporting facts to generate high-quality questions. Inspired by recent works on multi-hop reasoning in QA, we take up multi-hop question generation, which aims at generating relevant questions based on supporting facts in the context. We employ multi-task learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator. In addition, we propose a question-aware reward function in a Reinforcement Learning (RL) framework to maximize the utilization of the supporting facts. We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset, HotPotQA. Empirical evaluation shows that our model outperforms single-hop neural question generation models on both automatic evaluation metrics, such as BLEU, METEOR, and ROUGE, and human evaluation metrics for quality and coverage of the generated questions.
Resource Constrained Dialog Policy Via Differentiable Inductive Logic
Zhenpeng Zhou, Ahmad Beirami, Paul Crook, Pararth Shah, Rajen Subba and Alborz Geramifard
Motivated by the needs of resource constrained dialog policy learning, we introduce dialog policy via differentiable inductive logic (DILOG). We explore the tasks of one-shot learning and zero-shot domain transfer with DILOG on SimDial and MultiWoZ. Using a single representative dialog from the restaurant domain, we train DILOG on the SimDial dataset and obtain 99+% in-domain test accuracy. We also show that the trained DILOG zero-shot transfers to all other domains with 99+% accuracy, demonstrating the suitability of DILOG for slot-filling dialogs. We further extend our study to the MultiWoZ dataset, achieving 90+% inform and success metrics. We also observe that these metrics do not capture some of the shortcomings of DILOG in terms of false positives, prompting us to measure an auxiliary Action F1 score. We show that DILOG is 100x more data efficient than state-of-the-art neural approaches on MultiWoZ while achieving similar performance metrics. We conclude with a discussion on the strengths and weaknesses of DILOG.
Rethinking Residual Connection with Layer Normalization
Fenglin Liu, Xuancheng Ren, Zhiyuan Zhang and Xu SUN
Residual connection and layer normalization have shown great success in improving the performance of deep neural networks as well as facilitating convergence. In the literature, the relation between residual networks and batch normalization has been explored only in limited contexts, while the combination of residual connection and layer normalization has not been well studied. In this work, we investigate three different ways of combining residual connection and layer normalization. The effectiveness of these combinations is further examined by extensive experiments. In particular, we find that one of the proposed variants, which we name Recursive Residual Connection with Layer Normalization, improves performance substantially and generalizes well across diverse tasks. The recursive function allows enhanced propagation of the residual signals, and the layer normalization mitigates the gradient explosion problem. We report improved results using a Transformer model on the WMT-2014 EN-DE, EN-FR and IWSLT-2015 EN-VI machine translation tasks, and a ResNet-110 on the CIFAR-10 and CIFAR-100 image classification tasks. More encouragingly, we establish a new state of the art on the IWSLT-2015 EN-VI dataset.
Rethinking the Value of Transformer Components
Wenxuan Wang and Zhaopeng Tu
The Transformer has become the state-of-the-art translation model, yet it is not well understood how each intermediate component contributes to model performance, which poses significant challenges for designing optimal architectures. In this work, we bridge this gap by evaluating the impact of individual components (sub-layers) in trained Transformer models from different perspectives. Experimental results across language pairs, training strategies, and model capacities show that certain components are consistently more important than others. We also report a number of interesting findings that might help researchers better analyze, understand and improve Transformer models. Based on these observations, we further propose a new training strategy that improves translation performance by distinguishing the unimportant components during training.
Retrieving Inductive Bias of Attribute as Reference for Review Generation
Jihyeok Kim, Seungtaek Choi, Reinald Kim Amplayo and Seung-won Hwang
In this paper, we study review generation given a set of attribute identifiers: user ID, product ID and rating. This is a difficult subtask of natural language generation since models are limited to the given identifiers, without any specific descriptive information regarding the inputs, when generating the text. The capacity of these models is thus confined and dependent on how well the models can capture vector representations of attributes. We thus propose to additionally leverage references, which are selected from a large pool of texts labeled with one of the attributes, as textual information that enriches the inductive biases of the given attributes. With these references, we can pose the problem as an instance of text-to-text generation, which makes the task easier since texts that are syntactically and semantically similar to the output text are provided as input. Using this framework, we address issues such as selecting references from a large candidate set without textual context and improving the model complexity for generation. Our experiments show that our models improve over previous approaches on both automatic and human evaluation metrics.
Retrieving Skills from Job Descriptions: A Language Model Based Extreme Multi-label Classification Framework
Akshay Bhola, Kishaloy Halder, Animesh Prasad and Min-Yen Kan
We introduce a deep learning model to learn the set of enumerated job skills associated with a job description. In our analysis of a large-scale government job portal, we observe that as much as 65% of job descriptions omit a significant number of relevant skills. Our model addresses this task from the perspective of an extreme multi-label classification (XMLC) problem, where descriptions are the evidence for the binary relevance of thousands of individual skills. Building upon current state-of-the-art language modeling approaches such as BERT, we show that our XMLC method improves over an existing baseline solution by over 9% and 7% absolute in terms of recall and normalized discounted cumulative gain, respectively. We further show that our model effectively addresses the missing-skills problem and can recover relevant skills missed in the job posting process. To facilitate future research and replication of our work, we have made our dataset and our system’s relevance judgements publicly available.
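As a rough sketch of the XMLC formulation described above (binary relevance of thousands of skills), assuming a BERT-style pooled sentence vector as input; class and variable names are hypothetical:

```python
import torch.nn as nn

class SkillHead(nn.Module):
    # One sigmoid output per skill on top of a pooled text encoding.
    def __init__(self, hidden_size, num_skills):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_skills)

    def forward(self, pooled):          # pooled: (batch, hidden_size), e.g. a [CLS] vector
        return self.classifier(pooled)  # raw logits, one per skill

# Binary relevance over thousands of labels: sigmoid + BCE rather than a single softmax.
loss_fn = nn.BCEWithLogitsLoss()
```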
Rhetoric, Logic, and Dialectic: Advancing Theory-based Argument Quality Assessment in Natural Language Processing
Anne Lauscher, Lily Ng, Courtney Napoles and Joel Tetreault
Though preceding work in computational argument quality (AQ) mostly focuses on assessing overall AQ, researchers agree that writers would benefit from feedback targeting individual dimensions of argumentation theory. However, a large-scale theory-based corpus and corresponding computational models are missing. We fill this gap by presenting AQCorpus: the first large-scale English multi-domain (Q&A forums, debate forums, review forums) corpus annotated with theory-based AQ scores. We then propose the first computational approaches to theory-based assessment, which can serve as strong baselines for future work. We demonstrate the feasibility of large-scale AQ annotation, show that exploiting relations between dimensions yields performance improvements, and explore the synergies between theory-based prediction and practical AQ assessment.
RIVA: A Pre-trained Tweet Multimodal Model Based on Text-image Relation for Multimodal NER
Lin Sun, Jiquan Wang, Yindu Su, Fangsheng Weng, Yuxuan Sun, Zengwei Zheng and Yuanyi Chen
Multimodal named entity recognition (MNER) for tweets has received increasing attention recently. Most multimodal methods use attention mechanisms to capture text-related visual information. However, unrelated or weakly related text-image pairs account for a large proportion of tweets, and visual clues unrelated to the text can have uncertain or even negative effects on multimodal model learning. In this paper, we propose a novel pre-trained multimodal model based on Relationship Inference and Visual Attention (RIVA) for tweets. The RIVA model controls the attention-based visual clues with a gate regarding the role of the image with respect to the semantics of the text. We use a teacher-student semi-supervised paradigm to leverage a large unlabeled multimodal tweet corpus together with a labeled dataset for text-image relation classification. In the multimodal NER task, the experimental results show the significance of text-related visual features for the visual-linguistic model, and our approach achieves SOTA performance on the MNER datasets.
RoBERT – A Romanian BERT Model
Mihai Masala, Stefan Ruseti and Mihai Dascalu
Deep pre-trained language models have become ubiquitous in the field of Natural Language Processing (NLP). These models learn contextualized representations from huge amounts of unlabeled text and obtain state-of-the-art results on a multitude of NLP tasks by enabling efficient transfer learning. For languages other than English, the options for such models are limited, and most are trained only on multi-lingual corpora. In this paper we introduce a Romanian-only pre-trained BERT model – RoBERT – and compare it with different multi-lingual models on seven Romanian-specific NLP tasks grouped into three categories: sentiment analysis, dialect and cross-dialect topic identification, and diacritics restoration. Our model surpasses the multi-lingual models, as well as another monolingual implementation of BERT, on all tasks.
Robust Machine Reading Comprehension by Learning Soft labels
Zhenyu Zhao, Shuangzhi Wu, Muyun Yang, Kehai Chen and Tiejun Zhao
Neural models have achieved great success on the task of machine reading comprehension (MRC) and are typically trained on hard labels. We argue that hard labels limit the model's ability to generalize due to the label sparseness problem. In this paper, we propose a robust training method for MRC models to address this problem. Our method consists of three strategies: 1) label smoothing, 2) word overlapping, and 3) distribution prediction. All of them help to train models on soft labels. We validate our approach on a representative architecture, ALBERT. Experimental results show that our method greatly boosts the baseline by 1% on average, and achieves state-of-the-art performance on NewsQA and QUOREF.
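As one way to picture the soft-label idea, a minimal sketch of label smoothing over answer-span positions (only one of the three strategies, and not the authors' implementation):

```python
import torch

def smoothed_span_target(length, gold_index, epsilon=0.1):
    # Soft label for a start (or end) position: keep 1 - epsilon on the gold index
    # and spread epsilon uniformly over the remaining positions.
    target = torch.full((length,), epsilon / (length - 1))
    target[gold_index] = 1.0 - epsilon
    return target

# The model can then be trained with a distribution-matching loss, e.g.
# torch.nn.functional.kl_div(log_probs, smoothed_span_target(L, gold), reduction="batchmean"),
# instead of hard cross-entropy on the gold index alone.
```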
Robust Unsupervised Neural Machine Translation with Adversarial Denoising Training
Haipeng Sun, Rui Wang, Kehai Chen, Xugang Lu, Masao Utiyama, Eiichiro Sumita and Tiejun Zhao
Unsupervised neural machine translation (UNMT) has recently attracted great interest in the machine translation community. The main advantage of UNMT lies in how easily the required large collections of training sentences can be gathered, at the cost of only slightly worse performance than supervised neural machine translation, which requires expensive annotated translation pairs. In most studies, UNMT is trained on clean data without considering its robustness to noisy data. However, in real-world scenarios, the collected input sentences usually contain noise, which degrades the performance of the translation system since UNMT is sensitive to small perturbations of the input sentences. In this paper, we explicitly take noisy data into consideration for the first time to improve the robustness of UNMT-based systems. We first define two types of noise in training sentences, i.e., word noise and word order noise, and empirically investigate their effects on UNMT; we then propose adversarial training methods with a denoising process for UNMT. Experimental results on several language pairs show that our proposed methods substantially improve the robustness of conventional UNMT systems in noisy scenarios.
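For intuition, a small sketch of how the two noise types could be injected into training sentences (drop probabilities, the blank token, and the window size are illustrative choices, not the paper's settings):

```python
import random

def add_word_noise(tokens, drop_prob=0.1, blank_token="<blank>"):
    # Word noise: randomly drop a token or replace it with a placeholder.
    out = []
    for tok in tokens:
        r = random.random()
        if r < drop_prob:
            continue                    # drop the word
        elif r < 2 * drop_prob:
            out.append(blank_token)     # blank the word out
        else:
            out.append(tok)
    return out

def add_order_noise(tokens, k=3):
    # Word order noise: shuffle tokens within a local window of size k.
    keys = [i + random.uniform(0, k) for i in range(len(tokens))]
    return [tok for _, tok in sorted(zip(keys, tokens), key=lambda p: p[0])]
```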
RuSemShift: a dataset of historical lexical semantic change in Russian
Julia Rodina and Andrey Kutuzov
We present RuSemShift, a large-scale manually annotated test set for the task of semantic change modeling in Russian for two long-term time period pairs: from the pre-Soviet through the Soviet times and from the Soviet through the post-Soviet times. Target words were annotated by multiple crowd-source workers. The annotation process was organized following the DURel framework and was based on sentence contexts extracted from the Russian National Corpus. Additionally, we report the performance of several distributional approaches on RuSemShift, achieving promising results, which at the same time leave room for other researchers to improve.
SaSAKE: Syntax and Semantics Aware Keyphrase Extraction from Research Papers
T.Y.S.S Santosh, Debarshi Kumar Sanyal, Plaban Kumar Bhowmick and Partha Pratim Das
Keyphrases in a research paper succinctly capture the primary content of the paper and also assist in indexing papers at a concept level. Given the huge rate at which scientific papers are being published today, it is important to have effective ways of automatically extracting keyphrases from a research paper. In this paper, we present our method, Syntax and Semantics Aware Keyphrase Extraction (SaSAKE) from research papers. It uses a transformer architecture, stacking up sentence encoders to incorporate sequential information, and graph encoders to incorporate syntactic and semantic dependency graph information. Incorporation of these dependency graphs helps to alleviate long-range dependency problems and identify the boundaries of multi-word keyphrases effectively. Experimental results on three benchmark datasets show that our proposed method SaSAKE achieves state-of-the-art performance in keyphrase extraction from scientific papers.
Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model
Sungrae Park, Geewook Kim, JUNYEOP LEE, Junbum Cha, Ji-Hoon Kim and Hwalsuk Lee
This paper introduces a method that efficiently reduces the computational cost and parameter size of the Transformer. The proposed model, referred to as Group-Transformer, splits the feature space into multiple groups, factorizes the calculation paths, and reduces computation for the group interaction. Extensive experiments on two benchmark tasks, enwik8 and text8, demonstrate our model's effectiveness and efficiency for small-scale Transformers. To the best of our knowledge, Group-Transformer is the first attempt to design the Transformer with the group strategy widely used for efficient CNN architectures.
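A toy sketch of the grouping idea, splitting the feature dimension into groups with one small projection per group, which cuts a d-by-d projection down to roughly d*d/groups parameters (an illustration of the general strategy, not the Group-Transformer architecture itself):

```python
import torch
import torch.nn as nn

class GroupedLinear(nn.Module):
    # Apply an independent small linear map to each feature group.
    def __init__(self, dim, groups):
        super().__init__()
        assert dim % groups == 0
        self.groups = groups
        self.proj = nn.ModuleList(
            [nn.Linear(dim // groups, dim // groups) for _ in range(groups)]
        )

    def forward(self, x):                          # x: (..., dim)
        chunks = x.chunk(self.groups, dim=-1)      # one chunk per feature group
        return torch.cat([p(c) for p, c in zip(self.proj, chunks)], dim=-1)
```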
Schema Aware Semantic Reasoning for Interpreting Natural Language Queries in Enterprise Settings
Jaydeep Sen, Tanaya Babtiwale, Kanishk Saxena, Yash Butala, Sumit Bhatia and Karthik Sankaranarayanan
Natural Language Query interfaces allow end-users to access the desired information without needing to know any specialized query language, data storage, or schema details. Even with the recent advances in NLP research, state-of-the-art QA systems fall short of understanding the implicit intents of real-world Business Intelligence (BI) queries in enterprise systems, since Natural Language Understanding remains an AI-hard problem. We posit that deploying ontology reasoning over domain semantics can help achieve better natural language understanding for QA systems. In this paper, we focus on building a Schema Aware Semantic Reasoning Framework that translates natural language interpretation into a sequence of tasks solvable by an ontology reasoner. We apply our framework on top of ATHENA, an ontology-based, state-of-the-art natural language question-answering system, and experiment with four benchmarks focused on BI queries. Our experiments show that Schema Aware Semantic Reasoning indeed helps in handling BI queries, with an average accuracy improvement of ~30%.
Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning
Seoyeon Park and Cornelia Caragea
Scientific keyphrase identification and classification is the task of detecting keyphrases in scholarly texts and classifying them into types from a set of predefined classes. This task has a wide range of benefits but remains challenging due to the lack of the large amounts of labeled data required for training deep neural models. To overcome this challenge, we explore the pre-trained language models BERT and SciBERT with intermediate task transfer learning, using 42 data-rich related intermediate-target task combinations. We find that intermediate task transfer learning on SciBERT induces a better starting point for target task fine-tuning and achieves competitive performance in scientific keyphrase identification and classification compared to both previous works and strong baselines. Interestingly, however, we observe that BERT with intermediate task transfer learning fails to improve performance on scientific keyphrase identification and classification, potentially due to significant catastrophic forgetting. This highlights that the scientific knowledge acquired during pre-training of language models on large scientific collections plays an important role in the target tasks. We also observe that sequence tagging related intermediate tasks, especially syntactic structure learning tasks such as POS tagging, tend to work best for scientific keyphrase identification and classification.
Second-Order Unsupervised Neural Dependency Parsing
Songlin Yang, Yong Jiang, Wenjuan Han and Kewei Tu
Most unsupervised dependency parsers are based on first-order probabilistic generative models that only consider local parent-child information. Inspired by second-order supervised dependency parsing, we propose a second-order extension of unsupervised neural dependency models that incorporates grandparent-child or sibling information. We also propose a novel design of the neural parameterization and optimization methods for the dependency models. In second-order models, the number of grammar rules grows cubically with the vocabulary size, making it difficult to train lexicalized models that may contain thousands of words. To circumvent this problem while still benefiting from both second-order parsing and lexicalization, we use the agreement-based learning framework to jointly train a second-order unlexicalized model and a first-order lexicalized model. Experiments on multiple datasets show the effectiveness of our second-order models compared with recent state-of-the-art methods. Our joint model achieves a 10% improvement over the previous state-of-the-art parser on the full WSJ test set.
Seeing Both the Forest and the Trees: Multi-head Attention for Joint Classification on Different Compositional Levels
Miruna Pislar and Marek Rei
In natural languages, words are used in association to construct sentences. It is not words in isolation, but the appropriate use of hierarchical structures that conveys the meaning of the whole sentence. Neural networks have the ability to capture expressive language features; however, insights into the link between words and sentences are difficult to acquire automatically. In this work, we design a deep neural network architecture that explicitly wires lower and higher linguistic components; we then evaluate its ability to perform the same task at different hierarchical levels. Settling on broad text classification tasks, we show that our model, MHAL, learns to simultaneously solve them at different levels of granularity by fluidly transferring knowledge between hierarchies. Using a multi-head attention mechanism to tie the representations between single words and full sentences, MHAL systematically outperforms equivalent models that are not incentivized towards developing compositional representations. Moreover, we demonstrate that, with the proposed architecture, the sentence information flows naturally to individual words, allowing the model to behave like a sequence labeler (which is a lower, word-level task) even without any word supervision, in a zero-shot fashion.
Semantic Role Labeling with Heterogeneous Syntactic Knowledge
Qingrong Xia, Rui Wang, Zhenghua Li, Yue Zhang and Min Zhang
Recently, owing to the correlation between syntax and semantics, incorporating syntactic knowledge into neural semantic role labeling (SRL) has attracted much attention. Most previous syntax-aware SRL work focuses on explicitly modeling homogeneous syntactic knowledge over tree outputs. In this work, we propose to encode heterogeneous syntactic knowledge for SRL from both explicit and implicit representations. First, we introduce graph convolutional networks to explicitly encode multiple automatic heterogeneous dependency parse trees. Second, we extract implicit syntactic representations from a syntactic parser trained with heterogeneous treebanks. Finally, we inject the two kinds of heterogeneous syntax-aware representations into the base SRL model as extra inputs. We conduct experiments on two widely-used benchmark datasets, i.e., Chinese Proposition Bank 1.0 and the English CoNLL-2005 dataset. Experimental results show that incorporating heterogeneous syntactic knowledge brings significant improvements over strong baselines. We further conduct detailed analysis to gain insights on the usefulness of heterogeneous (vs. homogeneous) syntactic knowledge and the effectiveness of our proposed approaches for modeling such knowledge.
Semi-supervised Autoencoding Projective Dependency Parsing
Xiao Zhang and Dan Goldwasser
We describe two end-to-end autoencoding models for semi-supervised graph-based dependency parsing. The first model is a Local Autoencoding Parser (LAP) that encodes the input sequentially using continuous latent variables; the second model is a Global Autoencoding Parser (GAP) that encodes the input into dependency trees as latent variables, with exact inference. Both models consist of two parts: an encoder enhanced by deep neural networks (DNN) that can utilize contextual information to encode the input into latent variables, and a decoder, a generative model able to reconstruct the input. Both LAP and GAP admit a unified structure with different loss functions for labeled and unlabeled data with shared parameters. We conducted experiments on the WSJ and UD dependency parsing datasets, showing that our models can exploit unlabeled data to boost performance given a limited amount of labeled data.
Semi-Supervised Dependency Parsing with Arc-Factored Variational Autoencoding
Ge Wang and Kewei Tu
Manual annotation for dependency parsing is both laborious and time-consuming, making it difficult to learn practical dependency parsers for many languages due to the lack of labelled training corpora. To compensate for the scarcity of labelled data, semi-supervised dependency parsing methods have been developed to utilize unlabelled data in the training procedure of dependency parsers. In previous work, the autoencoder framework is a prevalent approach for utilizing unlabelled data. In this framework, training sentences are reconstructed by a decoder conditioned on dependency trees predicted by an encoder. The tree structure requirement brings challenges for both the encoder and the decoder. Sophisticated techniques are employed to tackle these challenges at the expense of model complexity and approximations in encoding and decoding. In this paper, we propose a model based on the variational autoencoder framework. By relaxing the tree constraint in both the encoder and the decoder during training, we make the learning of our model fully arc-factored and thus circumvent the challenges brought by the tree constraint. We evaluate our model on datasets across several languages, and the results demonstrate the advantage of our model over previous approaches in both parsing accuracy and speed.
Semi-supervised Domain Adaptation for Dependency Parsing via Improved Contextualized Word Representations
Ying Li, Zhenghua Li and Min Zhang
In recent years, parsing performance on in-domain texts has improved dramatically thanks to the rapid progress of deep neural network models. The major challenge for current parsing research is to improve performance on out-of-domain texts that are very different from the in-domain training data when only small-scale out-of-domain labeled data is available. To deal with this problem, we propose to improve contextualized word representations via adversarial learning and BERT fine-tuning. Concretely, we apply adversarial learning to three representative semi-supervised domain adaptation methods, i.e., direct concatenation (CON), feature augmentation (FA), and domain embedding (DE), with two useful strategies, i.e., fused target-domain word representations and orthogonality constraints, thus enabling the model to learn purer yet effective domain-specific and domain-invariant representations. Simultaneously, we utilize large-scale target-domain unlabeled data to fine-tune BERT with only the language model loss, thus obtaining reliable contextualized word representations that benefit cross-domain dependency parsing. Experiments on a benchmark dataset show that our proposed adversarial approaches achieve consistent improvements, and fine-tuning BERT further boosts parsing accuracy by a large margin. Our single model achieves the same state-of-the-art performance as the top submitted system in the NLPCC-2019 shared task, which uses ensemble models and BERT.
Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities
Hao Zhang, Jae Ro and Richard Sproat
Breaking domain names such as openresearch into component words open and research is important for applications like Text-to-Speech synthesis and web search. We link this problem to the classic problem of Chinese word segmentation and show the effectiveness of a tagging model based on Recurrent Neural Networks (RNNs) using characters as input. To compensate for the lack of training data, we propose a pre-training method on concatenated entity names in a large knowledge database. Pre-training improves the model by 33% and brings the sequence accuracy to 85%.
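As a rough illustration of how such pre-training data might be synthesized from knowledge-base entity names, a short sketch (the function and tag scheme below are hypothetical, not the authors' pipeline):

```python
def boundary_tags(entity_name):
    # Turn a multi-word entity name into a spaceless character string with
    # per-character boundary tags (B = first character of a word, I = inside).
    chars, tags = [], []
    for word in entity_name.lower().split():
        for i, ch in enumerate(word):
            chars.append(ch)
            tags.append("B" if i == 0 else "I")
    return "".join(chars), tags

# boundary_tags("open research")
# -> ("openresearch", ['B', 'I', 'I', 'I', 'B', 'I', 'I', 'I', 'I', 'I', 'I', 'I'])
```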
Sentence Matching with Syntax- and Semantics-Aware BERT
Tao Liu, Xin Wang, Chengguo Lv, Ranran Zhen and Guohong Fu
Sentence matching aims to identify the relationship between two sentences, and it plays a key role in many natural language processing tasks. However, previous studies mainly focused on exploiting either syntactic or semantic information for sentence matching, and none considered integrating both. In this study, we propose integrating syntax and semantics into BERT for sentence matching. In particular, we use an implicit syntax and semantics integration method that is less sensitive to the output structure information, so the implicit integration can alleviate the error propagation problem. The experimental results show that our approach achieves state-of-the-art or competitive performance on several sentence matching datasets, demonstrating the benefits of implicitly integrating syntactic and semantic features in sentence matching.
Sentiment Analysis for Emotional Speech Synthesis in a News Dialogue System
Hiroaki Takatsu, Ryota Ando, Yoichi Matsuyama and Tetsunori Kobayashi
As smart speakers and conversational robots become ubiquitous, the demand for expressive speech synthesis has increased. In this paper, to control the emotional parameters of the speech synthesis according to certain dialogue contents, we construct a news dataset with emotion labels (“positive,” “negative,” or “neutral”) annotated for each sentence. We then propose a method to identify emotion labels using a model combining BERT and BiLSTM-CRF, and evaluate its effectiveness using the constructed dataset. The results showed that the classification model performance can be efficiently improved by preferentially annotating news articles with low confidence in the human-in-the-loop machine learning framework.
Sentiment Forecasting in Dialog
Zhongqing Wang, Xiujun Zhu, Yue Zhang, Shoushan Li and Guodong Zhou
Sentiment forecasting in dialog aims to predict the polarity of the next utterance to come, which can help speakers revise their utterances when generating sentimental utterances. However, the polarity of the next utterance is normally hard to predict, because the content of the next utterance is not yet available. In this study, we propose a Neural Sentiment Forecasting (NSF) model to address these inherent challenges. In particular, we employ a neural simulation model to simulate the next utterance based on the context (the previous utterances encountered). Moreover, we employ a sequence influence model to learn both pair-wise and sequence-wise influence. Empirical studies illustrate the importance of the proposed sentiment forecasting task and demonstrate the effectiveness of our NSF model over several strong baselines.
SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis
Jie Zhou, Junfeng Tian, Rui Wang, Yuanbin Wu, Wenming Xiao and liang he
Pre-trained language models have been widely applied to cross-domain NLP tasks such as sentiment analysis, achieving state-of-the-art performance. However, due to the variety of users' emotional expressions across domains, fine-tuning the pre-trained models on the source domain tends to overfit, leading to inferior results on the target domain. In this paper, we pre-train a sentiment-aware language model (SentiX) via domain-invariant sentiment knowledge from large-scale review datasets, and utilize it for cross-domain sentiment analysis without fine-tuning. We propose several pre-training tasks based on existing lexicons and annotations at both the token and sentence levels, such as emoticons, sentiment words, and ratings, without human interference. A series of experiments are conducted and the results indicate the great advantages of our model. We obtain new state-of-the-art results on all of the cross-domain sentiment analysis tasks, and our proposed SentiX can be trained with only 1% of the samples (18 samples), achieving better performance than BERT trained with 90% of the samples.
Similarity or deeper understanding? Analyzing the TED-Q dataset of evoked questions
Matthijs Westera, Jacopo Amidei and Laia Mayol
We take a close look at a recent dataset of TED-talks annotated with the questions they implicitly evoke, TED-Q (Westera et al., 2020). We test whether the evoked questions may reflect deep semantic/pragmatic interpretation or a more superficial notion of similarity or association. We do so by turning the TED-Q dataset into a binary classification task, constructing an analogous task from explicit questions we extract from the BookCorpus (Zhu et al., 2015), and fitting a BERT-based classifier alongside models based on different notions of similarity. The BERT classifier outperforms all similarity-based models, suggesting that there is more to identifying true evoked questions than plain similarity.
Situated and Interactive Multimodal Conversations
Seungwhan Moon, Satwik Kottur, Paul Crook, Ankita De, Shivani Poddar, Theodore Levin, David Whitney, Daniel Difranco, Ahmad Beirami, Eunjoon Cho, Rajen Subba and Alborz Geramifard
Next generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, and the user's utterances), and perform multimodal actions (e.g., displaying a route while generating the system's utterance). We introduce Situated Interactive MultiModal Conversations (SIMMC) as a new direction aimed at training agents that take multimodal actions grounded in a co-evolving multimodal input context in addition to the dialog history. We provide two SIMMC datasets totalling ~13K human-human dialogs (~169K utterances) collected using a multimodal Wizard-of-Oz (WoZ) setup, on two shopping domains: (a) furniture -- grounded in a shared virtual environment; and (b) fashion -- grounded in an evolving set of images. Datasets include multimodal context of the items appearing in each scene, and contextual NLU, NLG and coreference annotations using a novel and unified framework of SIMMC conversational acts for both user and assistant utterances.
SLICE: Supersense-based Lightweight Interpretable Contextual Embeddings
Cindy ALOUI, Alexis Nasr, Lucie Barque and Carlos Ramisch
Contextualised embeddings such as BERT have become de facto state-of-the-art references in many NLP applications, thanks to their impressive performance. However, their opaqueness makes it hard to interpret their behaviour. SLICE is a hybrid model that combines supersense labels with contextual embeddings. We introduce a weakly supervised method to learn interpretable embeddings from raw corpora and a small list of seed words. Our model is able to represent both a word and its context as embeddings in the same compact space, whose dimensions correspond to interpretable supersenses. We assess the model on a word sense disambiguation task for French nouns. The small amount of supervision required makes it particularly well suited for low-resourced languages. Thanks to its interpretability, we perform linguistic analyses of the predicted supersenses in terms of input word and context representations.
Solving Math Word Problems with Multi-Encoders and Multi-Decoders
Yibin Shen and Cheqing Jin
Math word problem solving remains a challenging task in which latent semantics and mathematical logic must be mined from natural language. Although previous studies employ the Seq2Seq technique to transform text descriptions into equation expressions, most of them achieve inferior performance due to insufficient consideration in the design of the encoder and decoder. Specifically, these models only treat input/output objects as sequences, ignoring the important structural information contained in text descriptions and equation expressions. To overcome these defects, we propose a model with multiple encoders and multiple decoders, which combines a sequence-based encoder and a graph-based encoder to enhance the representation of text descriptions, and generates different equation expressions via a sequence-based decoder and a tree-based decoder. Experimental results on the Math23K dataset show that our model outperforms existing state-of-the-art methods.
SOME: Reference-less Sub-Metrics Optimized for Manual Evaluations of Grammatical Error Correction
Ryoma Yoshimura, Masahiro Kaneko, Tomoyuki Kajiwara and Mamoru Komachi
We propose a reference-less metric trained on manual evaluations of system outputs for grammatical error correction (GEC). Previous studies have shown that reference-less metrics are promising; however, existing metrics are not optimized for manual evaluations of system outputs because no dataset of system outputs with manual evaluations exists. This study manually evaluates the outputs of GEC systems to optimize the metrics. Experimental results show that the proposed metric improves correlation with manual evaluation in both system- and sentence-level meta-evaluation. Our dataset and metric will be made publicly available.
Span-based Joint Entity and Relation Extraction with Attention-based Span-specific and Contextual Semantic Representations
bin ji, Shasha Li, Jie Yu, Jun Ma, Qingbo Wu and Yusong Tan
Span-based joint extraction models have shown their efficiency on entity recognition and relation extraction. These models regard text spans as candidate entities and span tuples as candidate relation tuples, and span semantic representations are shared by both entity recognition and relation extraction. However, existing models cannot adequately capture the semantics of these candidate entities and relations. To address this problem, we introduce a span-based joint extraction framework with attention-based semantic representations. Specifically, attention is utilized to calculate semantic representations, including span-specific and contextual ones. We further investigate the effects of four attention variants in generating contextual semantic representations. Experiments show that our model outperforms previous systems and achieves state-of-the-art results on ACE2005, CoNLL2004 and ADE.
SpanAlign: Sentence Alignment Method based on Cross-Language Span Prediction and ILP
Katsuki Chousa, Masaaki Nagata and Masaaki Nishino
In this paper, we propose a novel method for automatic sentence alignment from noisy parallel documents. We first formalize the sentence alignment problem as independent predictions of spans in the target document from sentences in the source document, and we then introduce a total optimization method using integer linear programming to prevent span overlapping and obtain non-monotonic alignments. We implement cross-language span prediction by fine-tuning a pre-trained multilingual language model with pseudo-labeled data obtained from an unsupervised sentence alignment method. Since the fine-tuned multilingual language model can capture the context of the target document, our proposed method achieves higher accuracy than the baseline. In sentence alignment experiments on English-Japanese, our method achieved an F1 score of 72.6, which is 10.3 points higher than the baseline method. In particular, our method improved F1 by 53.9 points for extracting non-parallel sentences. Our method also improved downstream machine translation accuracy by 4.1 BLEU points when the extracted bilingual sentences were used for fine-tuning a pre-trained Japanese-to-English translation model.
Speaker-change Aware CRF for Dialogue Act Classification
Guokan Shang, Antoine Tixier, Michalis Vazirgiannis and Jean-Pierre Lorré
Recent work in Dialogue Act (DA) classification approaches the task as a sequence labeling problem, using neural network models coupled with a Conditional Random Field (CRF) as the last layer. CRF models the conditional probability of the target DA label sequence given the input utterance sequence. However, the task involves another important input sequence, that of speakers, which is ignored by previous work. To address this limitation, this paper proposes a simple modification of the CRF layer that takes speaker-change into account. Experiments on the SwDA corpus show that our modified CRF layer outperforms the original one, with very wide margins for some DA labels. Further, visualizations demonstrate that our CRF layer can learn meaningful, sophisticated transition patterns between DA label pairs conditioned on speaker-change in an end-to-end way. Code is publicly available.
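As a rough sketch of the idea (not the authors' code), a CRF path score can pick between two learned transition matrices depending on whether the speaker changed between consecutive utterances; all names below are hypothetical:

```python
import torch

def crf_path_score(emissions, tags, speaker_change, trans_same, trans_change):
    # emissions: (T, num_tags) per-utterance label scores; tags: (T,) gold label ids;
    # speaker_change[t] is True if the speaker changed between utterances t-1 and t;
    # trans_same / trans_change: (num_tags, num_tags) transition matrices.
    score = emissions[0, tags[0]]
    for t in range(1, emissions.size(0)):
        trans = trans_change if speaker_change[t] else trans_same
        score = score + trans[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return score
```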
Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity
Anne Lauscher, Ivan Vulić, Edoardo Maria Ponti, Anna Korhonen and Goran Glavaš
Unsupervised pretraining models have been shown to facilitate a wide range of downstream NLP applications. These models, however, retain some of the limitations of traditional static word embeddings. In particular, they encode only the distributional knowledge available in raw text corpora, incorporated through language modeling objectives. In this work, we complement such distributional knowledge with external lexical knowledge, that is, we integrate the discrete knowledge on word-level semantic similarity into pretraining. To this end, we generalize the standard BERT model to a multi-task learning setting where we couple BERT’s masked language modeling and next sentence prediction objectives with an auxiliary task of binary word relation classification. Our experiments suggest that our “Lexically Informed” BERT (LIBERT), specialized for the word-level semantic similarity, yields better performance than the lexically blind “vanilla” BERT on several language understanding tasks. Concretely, LIBERT outperforms BERT in 9 out of 10 tasks of the GLUE benchmark and is on a par with BERT in the remaining one. Moreover, we show consistent gains on 3 benchmarks for lexical simplification, a task where knowledge about word-level semantic similarity is paramount, and large gains on lexical semantic reasoning probes.
Specializing Word Vectors by Spectral Decomposition on Heterogeneously Twisted Graphs
Yuanhang Ren and Ye Du
Traditional word vectors, such as word2vec and GloVe, have a well-known inclination to conflate semantic similarity with other semantic relations. A retrofitting procedure may be needed to address this issue. In this work, we propose a new retrofitting method called Heterogeneously Retrofitted Spectral Word Embedding. It heterogeneously twists the similarity matrix of word pairs with lexical constraints. A new set of word vectors is generated by a spectral decomposition of the similarity matrix, which has a linear-algebraic analytic form. Our method achieves competitive performance compared with state-of-the-art retrofitting methods such as AR (Mrkšić et al., 2017). In addition, since our embedding has a clear linear-algebraic relationship with the similarity matrix, we carefully study the contribution of each component in our model. Last but not least, our method is very efficient to execute.
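For intuition, a generic sketch of the "twist the similarity matrix, then decompose" recipe; the constraint handling and scaling here are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def spectral_retrofit(sim, synonym_pairs, antonym_pairs, dim, boost=1.0):
    # sim: (n, n) word-pair similarity matrix; lexical constraints raise
    # similarities for synonyms and lower them for antonyms before decomposition.
    S = sim.copy()
    for i, j in synonym_pairs:
        S[i, j] += boost
        S[j, i] += boost
    for i, j in antonym_pairs:
        S[i, j] -= boost
        S[j, i] -= boost
    S = (S + S.T) / 2                       # keep the matrix symmetric
    vals, vecs = np.linalg.eigh(S)          # spectral decomposition
    top = np.argsort(vals)[-dim:]           # keep the top-dim eigenpairs
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))
```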
Spotting Text-to-Text Patterns for Multiple-Choice Question Answering
Jheng-Hong Yang, Sheng-Chieh Lin, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang and Jimmy Lin
While internalized implicit “knowledge base” in pre-trained models brings fruitful progress to natural language understanding tasks, identifying an effective way to retrieve the knowledge from the model remains not fully explored. Based on the state-of-the-art unified text-to-text transfer transformer (T5) model, this work explores an approach to extract implicit knowledge on multiple-choice (MC) question answering tasks. Through our experiments on three representative MC datasets, we find that our text-to-text template approach performs surprisingly well. Furthermore, we verify that the proposed template can be easily extended to other MC tasks with an extra knowledge base. Starting from the MC task, this work initiates further research to find generic natural language patterns that can effectively extract the stored knowledge in the pre-trained models under the text-to-text paradigm.
SQL Generation via Machine Reading Comprehension
ZEYU YAN, Jianqiang Ma, Yang Zhang and Jianping Shen
On the WikiSQL benchmark, state-of-the-art text-to-SQL systems typically take a slot-filling approach by building several specialized models for each type of slot. Such modularized systems are not only complex but also fall short in jointly learning for different slots.
Statistical Parsing of Tree Wrapping Grammars
Tatiana Bladier, Jakub Waszczuk and Laura Kallmeyer
We describe an approach to statistical parsing with Tree-Wrapping Grammars (TWG). TWG is a tree-rewriting formalism which includes the tree-combination operations of substitution, sister-adjunction and tree-wrapping substitution. TWGs can be extracted from constituency treebanks and aim at representing long distance dependencies (LDDs) in a linguistically adequate way. We present a parsing algorithm for TWGs based on neural supertagging and A* parsing. We extract a TWG for English from the treebanks for Role and Reference Grammar and discuss first parsing results with this grammar.
Story Generation with Rich Details
Fangzhou Zhai, Vera Demberg and Alexander Koller
Automatically generated stories need to be not only coherent, but also interesting. Apart from realizing a story line, the text also needs to include rich details to engage the readers. We propose a model that features two different generation components: an outliner, which advances the main story line to realize global coherence, and a detailer, which supplies relevant details to the story in a locally coherent manner. Human evaluations show that our model substantially improves the informativeness of generated text while retaining its coherence, outperforming various baselines.
Studying Taxonomy Enrichment on Diachronic WordNet Versions
Irina Nikishina, Varvara Logacheva, Alexander Panchenko and Natalia Loukachevitch
Ontologies, taxonomies and thesauri have always been in high demand in a large number of NLP tasks. However, most studies focus on the creation of lexical resources rather than on maintaining existing ones and keeping them up-to-date. In this paper we address the problem of taxonomy enrichment; namely, we explore the possibilities of taxonomy extension in a resource-poor setting. We present a set of methods that are applicable to a large number of languages. We also create a novel English dataset for training and evaluation of taxonomy enrichment systems and describe a technique for creating such datasets for other languages.
Style versus Content: A distinction without a (learnable) difference?
Somayeh Jafaritazehjani, Gwénolé Lecorvé, Damien Lolive and John Kelleher
Textual style transfer involves modifying the style of a text while preserving its content. This assumes that it is possible to separate style from content. This paper investigates whether this separation is possible. We use sentiment transfer as our case study for style transfer analysis. Our experimental methodology frames style transfer as a multi-objective problem, balancing style shift with content preservation and fluency. Due to the lack of parallel data for style transfer, we employ a variety of adversarial encoder-decoder networks in our experiments. We also use a probing methodology to analyse how these models encode style-related features in their latent spaces. The results of our experiments, further confirmed by a human evaluation, reveal the inherent trade-off between the multiple style transfer objectives, indicating that style cannot be usefully separated from content within these style-transfer systems.
Summarize before Aggregate: A Global-to-local Heterogeneous Graph Inference Network for Conversational Emotion Recognition
Dongming Sheng, Dong Wang, Ying Shen, Haitao Zheng and Haozhuang Liu
Conversational Emotion Recognition (CER) is a crucial task in Natural Language Processing (NLP) with wide applications. Prior work in CER generally focuses on modeling emotion influences solely with utterance-level features, with little attention paid to phrase-level semantic connections between utterances. Phrases carry sentiment when they refer to emotional events under certain topics, providing a global semantic connection between utterances throughout the entire conversation. In this work, we propose a two-stage Summarization and Aggregation Graph Inference Network (SumAggGIN), which seamlessly integrates inference over topic-related emotional phrases and local dependency reasoning over neighbouring utterances in a global-to-local fashion. Topic-related emotional phrases, which constitute the global topic-related emotional connections, are recognized by our proposed heterogeneous Summarization Graph. Local dependencies, which capture short-term emotional effects between neighbouring utterances, are further injected via an Aggregation Graph to distinguish the subtle differences between utterances containing emotional phrases. The two steps of graph inference are tightly coupled for a comprehensive understanding of emotional fluctuation. Experimental results on three CER benchmark datasets verify the effectiveness of our proposed model, which outperforms state-of-the-art approaches.
Summarizing Medical Conversations via Identifying Important Utterances
Yan Song, Yuanhe Tian, Nan Wang and Fei Xia
Summarization is an important natural language processing (NLP) task for identifying key information in text. For conversations, summarization systems need to extract salient content from spontaneous utterances by multiple speakers. In the task-oriented scenario of medical conversations between patients and doctors, the symptoms, diagnoses, and treatments are highly important because the purpose of such conversations is to find a medical solution to the problem raised by the patient. Current online medical platforms provide millions of publicly available conversations between real patients and registered doctors, in which the patient describes a medical problem and the doctor offers a diagnosis and treatment; such conversations are often long, and the key information is hard to locate. Therefore, summaries of the patients' problems and the doctors' treatments can be highly useful, helping other patients with similar problems find a precise reference for potential medical solutions. In this paper, we focus on medical conversation summarization, using a dataset of medical conversations and corresponding summaries crawled from a well-known online healthcare service provider in China. We propose a hierarchical encoder-tagger model (HET) that generates summaries by identifying important utterances (with respect to problem proposing and solving) in the conversations. For the dataset used in this study, we show that high-quality summaries can be generated by extracting two types of utterances, namely problem statements and treatment recommendations. Experimental results demonstrate that HET outperforms strong baselines and models from previous studies, and that adding conversation-related features can further improve system performance.
SumTitles: a Summarization Dataset with Low Extractiveness
Valentin Malykh, Konstantin Chernis, Ekaterina Artemova and Irina Piontkovskaya
Existing dialogue summarization corpora are significantly extractive. We introduce a methodology for evaluating dataset extractiveness, present a new low-extractive corpus of movie dialogues for abstractive text summarization, and report a baseline evaluation on it. The corpus contains 153k dialogues and consists of three parts: 1) automatically aligned subtitles, 2) automatically aligned scenes from scripts, and 3) manually aligned scenes from scripts. In addition, we present the alignment algorithm used to construct such a corpus.
Supervised Visual Attention for Multimodal Neural Machine Translation
Tetsuroh Nishihara, Akihiro Tamura, Takashi Ninomiya, Yutaro Omote and Hideki Nakayama
In this paper, we propose a supervised visual attention mechanism for multimodal neural machine translation (MNMT), which is trained with constraints based on manual alignments between words in a sentence and their corresponding regions of an image. The proposed visual attention mechanism captures the relationship between a word and an image region more precisely than a conventional visual attention mechanism, which is trained in an unsupervised manner through MNMT training. Our experiments on the English-German and German-English translation tasks using the Multi30k dataset and the English-Japanese and Japanese-English translation tasks using the Flickr30k Entities JP dataset show that a Transformer-based MNMT model can be improved by incorporating our proposed supervised visual attention mechanism and also show that further improvements can be achieved by combining with a supervised cross-lingual attention mechanism (up to +1.61 BLEU, +1.7 METEOR).
SWAFN: Sentimental Words Aware Fusion Network for Multimodal Sentiment Analysis
Minping Chen and Xia Li
Multimodal sentiment analysis aims to predict the sentiment of language text with the help of other modalities, such as vision and acoustic features. Previous studies focused on learning joint representations of multiple modalities, ignoring useful knowledge contained in the language modality. In this paper, we incorporate sentimental-words knowledge into the fusion network to guide the learning of the joint representation of multimodal features. Our method consists of two components: a shallow fusion part and an aggregation part. For the shallow fusion part, we use a crossmodal co-attention mechanism to obtain bidirectional context information for each pair of modalities to get the fused shallow representations. For the aggregation part, we design a sentimental words classification multitask to help and guide the deep fusion of the three modalities and obtain the final sentimental-words-aware fusion representation. We carry out several experiments on the CMU-MOSI, CMU-MOSEI and YouTube datasets. The experimental results show that introducing sentimental words prediction as a multitask can indeed improve the fusion representation of multiple modalities.
Syllable-based Neural Thai Word Segmentation
Pattarawat Chormai, Ponrawee Prasertsom, Jin Cheevaprawatdomrong and Attapol Rutherford
Word segmentation is a challenging pre-processing step for Thai Natural Language Processing due to the lack of explicit word boundaries. Previous systems rely on powerful neural network architectures alone and ignore the linguistic substructures of Thai words. We utilize the linguistic observation that Thai strings can be segmented into syllables, which should narrow down the search space for word boundaries and provide helpful features. Here, we propose a neural Thai word segmenter that uses syllable embeddings to capture linguistic constraints and dilated CNN filters to capture the environment of each character. To this end, we develop the first ML-based Thai orthographical syllable segmenter, which yields syllable embeddings used as features by the word segmenter. Our word segmentation system outperforms the previous state-of-the-art system in both speed and accuracy on both in-domain and out-of-domain datasets.
Synonym Knowledge Enhanced Reader for Chinese Idiom Reading Comprehension
Siyu Long, Ran Wang, Kun Tao, Jiali Zeng and Xinyu Dai
Machine reading comprehension (MRC) is the task that asks a machine to answer questions based on a given context. For Chinese MRC, due to the non-literal and non-compositional semantic characteristics, Chinese idioms pose unique challenges for machines to understand. Previous studies tend to treat idioms separately without fully exploiting the relationship among them. In this paper, we first define the concept of literal meaning coverage to measure the consistency between semantics and literal meanings for Chinese idioms. With the definition, we prove that the literal meanings of many idioms are far from their semantics, and we also verify that the synonymic relationship can mitigate this inconsistency, which would be beneficial for idiom comprehension. Furthermore, to fully utilize the synonymic relationship, we propose the synonym knowledge enhanced reader. Specifically, for each idiom, we first construct a synonym graph according to the annotations from the high-quality synonym dictionary or the cosine similarity between the pre-trained idiom embeddings and then incorporate the graph attention network and gate mechanism to encode the graph. Experimental results on ChID, a large-scale Chinese idiom reading comprehension dataset, show that our model achieves state-of-the-art performance.
Syntactic Graph Convolutional Network for Spoken Language Understanding
Keqing He, Shuyu Lei, Yushu Yang, Huixing Jiang and Zhongyuan Wang
Slot filling and intent detection are two major tasks for spoken language understanding. In most existing work, these two tasks are built as joint models with multi-task learning, with no consideration of prior linguistic knowledge. In this paper, we propose a novel joint model that applies a graph convolutional network over dependency trees to integrate the syntactic structure for learning slot filling and intent detection jointly. Experimental results show that our proposed model achieves state-of-the-art performance on two public benchmark datasets and outperforms existing work. Finally, we apply the BERT model to further improve the performance on both slot filling and intent detection.
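A minimal sketch of the core ingredient the abstract describes, one graph-convolution step over a dependency tree shared by a token-level slot classifier and a sentence-level intent classifier; the toy adjacency matrix, dimensions and label counts are illustrative assumptions, not the paper's architecture.

```python
# Sketch: graph convolution over dependency arcs for joint slot/intent prediction.
import torch
import torch.nn as nn

def gcn_layer(h, adj, linear):
    """h: (n_tokens, dim); adj: (n_tokens, n_tokens) dependency adjacency."""
    adj = adj + torch.eye(adj.size(0))            # add self loops
    deg = adj.sum(dim=1, keepdim=True)
    return torch.relu(linear(adj @ h) / deg)      # normalised neighbourhood mixing

tokens = ["book", "a", "flight", "to", "boston"]
adj = torch.zeros(5, 5)
for head, dep in [(0, 2), (2, 1), (2, 3), (3, 4)]:   # toy dependency arcs
    adj[head, dep] = adj[dep, head] = 1.0

h = torch.randn(5, 16)                               # token encodings (e.g. from BiLSTM/BERT)
out = gcn_layer(h, adj, nn.Linear(16, 16))
slot_logits = nn.Linear(16, 4)(out)                  # per-token slot tags
intent_logits = nn.Linear(16, 3)(out.mean(dim=0))    # sentence-level intent
print(slot_logits.shape, intent_logits.shape)
```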
Syntactically Aware Cross-Domain Aspect and Opinion Terms Extraction
Oren Pereg, Daniel Korat and Moshe Wasserblat
A fundamental task of fine-grained sentiment analysis is aspect and opinion term extraction. Supervised-learning approaches have shown good results for this task; however, they fail to scale across domains where labeled data is lacking. Non-pre-trained unsupervised domain adaptation methods that incorporate external linguistic knowledge have proven effective in transferring aspect and opinion knowledge from a labeled source domain to unlabeled target domains; however, pre-trained transformer-based models like BERT and RoBERTa already exhibit substantial syntactic knowledge. In this paper, we propose a method for incorporating external linguistic information into a self-attention mechanism coupled with the BERT model. This enables leveraging the intrinsic knowledge existing within BERT together with externally introduced syntactic information, to bridge the gap across domains. We successfully demonstrate enhanced results on three benchmark datasets.
Syntax-Aware Graph Attention Network for Aspect-Level Sentiment Classification
Lianzhe Huang, Xin Sun, Sujian Li, Linhao Zhang and Houfeng Wang
Aspect-level sentiment classification aims to distinguish the sentiment polarities over aspect terms in a sentence. Existing approaches mostly focus on modeling the relationship between the given aspect words and their contexts with attention, and ignore more elaborate knowledge implicit in the context. In this paper, we add syntactic awareness to the model with a graph attention network over the dependency tree structure, together with external pre-training knowledge from the BERT language model, which helps to better model the interaction between the context and aspect words. The subwords of BERT are integrated into the dependency tree graphs, which fully exploits the strength of BERT. Experiments demonstrate the effectiveness of our model.
Taking the Correction Difficulty into Account in Grammatical Error Correction Evaluation
Takumi Gotou, Ryo Nagata, Masato Mita and Kazuaki Hanawa
This paper presents performance measures for grammatical error correction which take into account the difficulty of error correction. To the best of our knowledge, no conventional measure has such functionality despite the fact that some errors are easy to correct and others are not. The main purpose of this work is to provide a way of determining the difficulty of error correction and to motivate researchers in the domain to attack such difficult errors. The performance measures are based on the simple idea that the more systems successfully correct an error, the easier it is considered to be. This paper presents a set of algorithms to implement this idea. It evaluates the performance measures quantitatively and qualitatively on a wide variety of corpora and systems, revealing that they agree with our intuition of correction difficulty. A scorer and difficulty weight data based on the algorithms have been made available on the web.
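The weighting idea, that an error corrected by fewer systems should count for more, can be made concrete with a small sketch; the particular formula and normalisation below are illustrative assumptions, while the paper defines its own set of algorithms.

```python
# Sketch: difficulty-weighted scoring for grammatical error correction, where an
# error's difficulty grows as fewer reference systems manage to correct it.
def difficulty_weights(corrections_by_system, all_errors):
    """corrections_by_system: dict system name -> set of error ids it corrected."""
    n_systems = len(corrections_by_system)
    weights = {}
    for err in all_errors:
        n_correcting = sum(err in fixed for fixed in corrections_by_system.values())
        weights[err] = 1.0 - n_correcting / n_systems  # rarely-fixed errors weigh more
    return weights

def weighted_recall(system_fixed, weights):
    total = sum(weights.values())
    gained = sum(w for err, w in weights.items() if err in system_fixed)
    return gained / total if total > 0 else 0.0

# Example: three systems, three errors; "e1" is easy, "e2" and "e3" are hard.
systems = {"A": {"e1", "e2"}, "B": {"e1"}, "C": {"e1", "e3"}}
w = difficulty_weights(systems, {"e1", "e2", "e3"})
print(weighted_recall(systems["C"], w))
```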
Target Word Masking for Location Metonymy Resolution
Haonan Li, Maria Vasardani, Martin Tomko and Timothy Baldwin
Existing metonymy resolution approaches rely on features extracted from external resources like dictionaries and hand-crafted lexical resources. In this paper, we propose an end-to-end word-level classification approach based only on BERT, without dependencies on taggers, parsers, curated dictionaries of place names, or other external resources. We show that our approach achieves the state-of-the-art on 5 datasets, surpassing conventional BERT models and benchmarks by a large margin. We also show that our approach generalises well to unseen data.
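The target-word-masking idea in the title can be sketched as follows: replace the candidate location name with BERT's mask token and classify the masked position as literal or metonymic. The model name, label set and untrained classifier head here are illustrative assumptions, not the authors' released system.

```python
# Sketch: mask the target toponym and classify the masked position with BERT.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased")
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)  # literal vs. metonymic

def classify_target(sentence: str, target: str):
    masked = sentence.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state          # (1, seq_len, dim)
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
    logits = classifier(hidden[0, mask_pos])                  # classify the masked slot
    return logits.softmax(-1)

print(classify_target("Germany won the World Cup in 2014.", "Germany"))
```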
Task-Aware Representation of Sentences for Generic Text Classification
Kishaloy Halder, Alan Akbik, Josip Krapac and Roland Vollgraf
State-of-the-art approaches for text classification leverage a transformer architecture with a linear layer on top that outputs a class distribution for a given prediction problem. However, this powerful approach suffers from conceptual limitations that affect its utility in few-shot or zero-shot transfer learning scenarios. First, the number of classes to predict needs to be pre-defined. In a transfer learning setting, in which new classes are added to an already trained classifier, all information contained in a linear layer is discarded, and a new layer is trained from scratch. Second, this approach only learns the semantics of classes implicitly from training examples, as opposed to leveraging the explicit semantic information provided by the natural language names of the classes. For instance, a classifier trained to predict the topics of news articles might have classes like
Temporal Relations Annotation and Extrapolation Based on Semi-intervals and Bounding Relations
Alejandro Pimentel, Gemma Bel Enguix, Gerardo Sierra Martínez and Azucena Montes
Computational treatment of temporal relations is based on the work of Allen, who establishes 13 different types, and Freksa, who designs a cognitive procedure to manage them. Freksa’s notation is not widely used because, although it has cognitive and expressive advantages, it is too complex from a computational perspective. This paper proposes a system for the annotation and management of temporal relations that combines the richness and expressiveness of Freksa’s approach with the simplicity of Allen’s notation. Our method rests on the application of bounding relations, thanks to which it is possible to obtain the temporal representation of complete neighborhoods capable of representing vague temporal relationships such as those frequently found in text. These advantages are obtained without greatly increasing the complexity of the labeling process, since the markup language is almost the same as TimeML, to which only a second temporal relationship type label “relType” is added. Our experiments show that temporal relationships that present vagueness are in fact much more common than those in which a single relationship can be established precisely. Because of this, our new labeling system achieves a representation of time that is closer to reality.
TeRo: A Time-aware Knowledge Graph Embedding via Temporal Rotation
Chengjin Xu, Mojtaba Nayyeri, Fouad Alkhoury, Hamed Shariat Yazdi and Jens Lehmann
In the last few years, there has been a surge of interest in learning representations of entities and relations in knowledge graphs (KG). However, the recent availability of temporal knowledge graphs (TKGs) that contain time information for each fact created the need for reasoning over time in such TKGs. In this regard, we present a new approach to TKG embedding, TeRo, which defines the temporal evolution of an entity embedding as a rotation from the initial time to the current time in the complex vector space. Specifically, for facts involving time intervals, each relation is represented as a pair of dual complex embeddings to handle the beginning and the end of the relation, respectively. We show that our proposed model overcomes the limitations of existing KG and TKG embedding models and can learn and infer various relation patterns over time. Experimental results on three different TKGs show that TeRo significantly outperforms existing state-of-the-art models for link prediction. In addition, we analyze the effect of time granularity on link prediction over TKGs, which as far as we know has not been investigated in previous literature.
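The rotation idea can be illustrated with a small numeric sketch: entity embeddings are complex vectors rotated by a time-specific phase, and a quadruple is scored by how closely the rotated head plus the relation matches the conjugate of the rotated tail. The scoring form, dimensions and random vectors below are illustrative assumptions; the paper's exact formulation (including interval handling) may differ.

```python
# Sketch: time-aware scoring by element-wise rotation in the complex plane.
import numpy as np

dim = 4
rng = np.random.default_rng(0)
head = rng.normal(size=dim) + 1j * rng.normal(size=dim)
tail = rng.normal(size=dim) + 1j * rng.normal(size=dim)
relation = rng.normal(size=dim) + 1j * rng.normal(size=dim)
time_phase = rng.uniform(0, 2 * np.pi, size=dim)   # one rotation angle per dimension

def rotate(entity, phase):
    return entity * np.exp(1j * phase)              # rotation from initial to current time

def score(h, r, t, phase):
    h_t, t_t = rotate(h, phase), rotate(t, phase)
    return -np.linalg.norm(h_t + r - np.conj(t_t))  # higher score = more plausible fact

print(score(head, relation, tail, time_phase))
```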
Text Classification by Contrastive Learning and Cross-lingual Data Augmentation for Alzheimer’s Disease Detection
Zhiqiang Guo, Zhaoci Liu, Zhenhua Ling, Shijin Wang, Lingjing Jin and Yunxia Li
Data scarcity is always a constraint on analyzing speech transcriptions for automatic Alzheimer’s disease (AD) detection, especially when the subjects are non-English speakers. To deal with this issue, this paper first proposes a contrastive learning method to obtain effective representations for text classification based on monolingual embeddings of BERT. Furthermore, a cross-lingual data augmentation method is designed by building autoencoders to learn the text representations shared by both languages. Experiments on a Mandarin AD corpus show that the contrastive learning method can achieve better detection accuracy than conventional CNN-based and BERT-based methods. Our cross-lingual data augmentation method also outperforms other compared methods when using another English AD corpus for augmentation. Finally, a best detection accuracy of 81.6% is obtained by our proposed methods on the Mandarin AD corpus.
The ApposCorpus: a new multilingual, multi-domain dataset for factual appositive generation
Yova Kementchedjhieva, Di Lu and Joel Tetreault
News articles, image captions, product reviews and many other texts mention people and organizations whose name recognition could vary for different audiences. In such cases, background information about the named entities could be provided in the form of an appositive noun phrase, either written by a human or generated automatically. We expand on the previous work in appositive generation with a new, more realistic, end-to-end definition of the task, instantiated by a dataset that spans four languages (English, Spanish, German and Polish), two entity types (person and organization) and two domains (Wikipedia and News). We carry out an extensive analysis of the data and the task, pointing to the various modeling challenges it poses. The results we obtain with standard language generation methods show that the task is indeed non-trivial, and leaves plenty of room for improvement.
The Devil is in the Details: Evaluating Limitations of Transformer-based Methods for Granular Tasks
Brihi Joshi, Leonardo Neves, Neil Shah and Francesco Barbieri
Contextual embeddings derived from transformer-based neural language models have shown state-of-the-art performance for various tasks such as question answering, sentiment analysis, and textual similarity in recent years. Extensive work shows how accurately such models can represent abstract, semantic information present in text. In this expository work, we explore a tangent direction and analyze such models' performance on tasks that require a more granular level of representation. We focus on the problem of textual similarity from two perspectives: matching documents on a granular level (requiring embeddings to capture fine-grained attributes in the text), and an abstract level (requiring embeddings to capture overall textual semantics). We empirically demonstrate, across two datasets from different domains, that despite high performance in abstract document matching as expected, contextual embeddings are consistently (and at times, vastly) outperformed by simple baselines like TF-IDF for more granular tasks. We then propose a simple but effective method to incorporate TF-IDF into models that use contextual embeddings, achieving relative improvements of up to 36% on granular tasks.
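A minimal way to picture the proposed combination is to interpolate a TF-IDF similarity with a contextual-embedding similarity for a document pair; the interpolation weight and the random stand-in embeddings below are assumptions of this sketch, not the paper's actual integration method.

```python
# Sketch: combining TF-IDF and contextual-embedding similarity for document matching.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

docs = ["the camera battery drains quickly", "battery life of this camera is short"]

tfidf = TfidfVectorizer().fit_transform(docs)
tfidf_sim = cosine_similarity(tfidf[0], tfidf[1])[0, 0]        # lexical, granular signal

# Stand-in for contextual embeddings (e.g. mean-pooled BERT vectors).
contextual = np.random.default_rng(0).normal(size=(2, 768))
ctx_sim = cosine_similarity(contextual[0:1], contextual[1:2])[0, 0]  # abstract signal

alpha = 0.5                                                     # illustrative weight
combined = alpha * tfidf_sim + (1 - alpha) * ctx_sim
print(combined)
```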
The Indigenous Languages Technology project: An empowerment-oriented approach to developing language software
Roland Kuhn, Fineen Davis, Alain Désilets, Eric Joanis, Anna Kazantseva, Rebecca Knowles, Patrick Littell, Delaney Lothian, Aidan Pine, Caroline Running Wolf, Eddie Santos, Darlene Stewart, Gilles Boulianne, Vishwa Gupta, Brian Maracle Owennatékha, Akwiratékha’ Martin, Christopher Cox, Marie-Odile Junker, Olivia Sammons, Delasie Torkornoo, Nathan Thanyehténhas Brinklow, Sara Child, Benoît Farley, David Huggins-Daines, Daisy Rosenblum and Heather Souter
Keywords: Indigenous languages in Canada, polysynthetic languages, verb conjugation, bilingual Inuktut-English corpus, transcription bottleneck, text prediction, read-along audiobooks
The SADID Evaluation Datasets for Low-Resource Spoken Language Machine Translation of Arabic Dialects
Wael Abid
Low-resource Machine Translation recently gained a lot of popularity, and for certain languages, it has made great strides. However, it is still difficult to track progress in other languages for which there is no publicly available evaluation data. In this paper, we introduce benchmark datasets for Arabic and its dialects. We describe our design process and motivations and analyze the datasets to understand their resulting properties. Numerous successful attempts use large monolingual corpora to augment low-resource pairs. We try to approach augmentation differently and investigate whether it is possible to improve MT models without any external sources of data. We accomplish this by bootstrapping existing parallel sentences and complement this with multilingual training to achieve strong baselines.
The Transference Architecture for Automatic Post-Editing
Santanu Pal, Hongfei Xu, Nico Herbig, Sudip Kumar Naskar, Antonio Krüger and Josef van Genabith
In automatic post-editing (APE) it makes sense to condition post-editing (pe) decisions on both the source (src) and the machine translated text (mt) as input. This has led to multi-encoder based neural APE approaches. A research challenge now is the search for architectures that best support the capture, preparation and provision of src and mt information and its integration with pe decisions. In this paper we present an efficient multi-encoder based APE model, called transference. Unlike previous approaches, it (i) uses a transformer encoder block for src, (ii) followed by a decoder block, but without masking for self-attention on mt, which effectively acts as a second encoder combining src --> mt, and (iii) feeds this representation into a final decoder block generating pe. Our model outperforms the best performing systems by 1 BLEU point on the WMT 2016, 2017, and 2018 English--German APE shared tasks (PBSMT and NMT). Furthermore, the results of our model on the WMT 2019 APE task using NMT data show performance comparable to the state-of-the-art system. The inference time of our model is similar to that of the vanilla transformer-based NMT system, although our model deals with two separate encoders. We further investigate the importance of our newly introduced second encoder and find that too few layers do hurt the performance, while reducing the number of decoder layers does not matter much.
The Two Shades of Dubbing in Neural Machine Translation
Alina Karakanta, Supratik Bhattacharya, Shravan Nayak, Timo Baumann, Matteo Negri and Marco Turchi
Dubbing has two shades: synchronisation constraints are applied only when the actor's mouth is visible on screen, while the translation is unconstrained for off-screen dubbing. Consequently, different synchronisation requirements, and therefore translation strategies, are applied depending on the type of dubbing. In this work, we manually annotate an existing dubbing corpus (Heroes) for this dichotomy. We show that, even though we did not observe distinctive features between on- and off-screen dubbing at the textual level, on-screen dubbing is more difficult for MT (-4 BLEU points). Moreover, synchronisation constraints dramatically decrease translation quality for off-screen dubbing. We conclude that distinguishing between on-screen and off-screen dubbing is necessary for determining successful strategies for dubbing-customised Machine Translation.
TIMBERT: Toponym Identifier For The Medical Domain Based on BERT
MohammadReza Davari, Leila Kosseim and Tien Bui
In this paper, we propose an approach to automate the process of place name detection in the medical domain to enable epidemiologists to better study and model the spread of viruses. We created a family of Toponym Identification Models based on BERT (TIMBERT), in order to learn in an end-to-end fashion the mapping from an input sentence to the associated sentence labeled with toponyms. When evaluated with the SemEval 2019 task 12 test set (Weissenbacher et al., 2019), our best TIMBERT model achieves an F1 score of 90.85%, a significant improvement compared to the state-of-the-art of 89.10% (Wang et al., 2019).
Tiny Word Embeddings Using Globally Informed Reconstruction
Sora Ohashi, Mao Isogawa, Tomoyuki Kajiwara and Yuki Arase
We reduce the model size of pre-trained word embeddings by a factor of 200 while preserving their quality. Previous studies in this direction created a smaller word embedding model by reconstructing pre-trained word representations from those of subwords, which allows storing only a smaller number of subword embeddings in memory. However, previous studies that train the reconstruction models using only target words cannot reduce the model size drastically while preserving its quality. Inspired by the observation that words with similar meanings have similar embeddings, our reconstruction training learns the global relationships among words, which can be employed in various models for word embedding reconstruction. Experimental results on word similarity benchmarks show that the proposed method improves the performance of all subword-based reconstruction models.
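The basic reconstruction setup can be sketched in a few lines: a small subword embedding table is trained so that composed subword vectors approximate the large pre-trained word vectors. The averaging composition, toy vocabulary and MSE objective are assumptions of this sketch; the paper's globally informed training over word relationships is not reproduced here.

```python
# Sketch: approximating a pre-trained word vector from a tiny subword table.
import torch
import torch.nn.functional as F

vocab_subwords = {"play": 0, "##ing": 1, "##ed": 2}
subword_emb = torch.nn.Embedding(len(vocab_subwords), 300)  # small table kept in memory

def reconstruct(subword_ids):
    vecs = subword_emb(torch.tensor(subword_ids))
    return vecs.mean(dim=0)  # compose subword vectors into a word vector

# One training step: match the (large) pre-trained vector for "playing".
pretrained_playing = torch.randn(300)  # stand-in for a real pre-trained vector
optim = torch.optim.Adam(subword_emb.parameters(), lr=1e-3)
loss = F.mse_loss(
    reconstruct([vocab_subwords["play"], vocab_subwords["##ing"]]), pretrained_playing)
loss.backward()
optim.step()
```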
To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding?
Quynh Do, Judith Gaspers, Tobias Roeding and Melanie Bradford
This paper addresses the question as to what degree a BERT-based multilingual Spoken Language Understanding (SLU) model can transfer knowledge across languages. Through experiments we show that, although it works surprisingly well even on distant languages, there is still a gap to the ideal multilingual performance. In addition, we propose a novel BERT-based model architecture trained adversarially to learn language-shared and language-specific representations for multilingual SLU. In our experiments, the model proved able to narrow the gap to the ideal multilingual performance.
ToHRE: A Top-Down Classification Strategy with Hierarchical Bag Representation for Distantly Supervised Relation Extraction
Erxin Yu, Wenjuan Han, Yuan Tian and Yi Chang
Distantly Supervised Relation Extraction (DSRE) has proven effective at finding relational facts in text, but it still suffers from two main problems: the wrong labeling problem and the long-tail problem. Most existing approaches address these two problems through flat classification, which lacks hierarchical information about relations. To leverage the informative relation hierarchies, we formulate DSRE as a hierarchical classification task and propose a novel hierarchical classification framework, which extracts the relation in a top-down manner. Specifically, in our proposed framework, 1) we use a hierarchically-refined representation method to achieve hierarchy-specific representation; 2) a novel top-down classification strategy is introduced instead of training a set of local classifiers. The experiments on the NYT dataset demonstrate that our approach significantly outperforms other state-of-the-art approaches, especially for the long-tail problem.
Token Drop mechanism for Neural Machine Translation
Huaao Zhang, Shigui Qiu, Xiangyu Duan and Min Zhang
Neural machine translation with millions of parameters is vulnerable to unfamiliar inputs. We propose Token Drop to improve generalization and avoid overfitting for the NMT model. It is similar to word dropout, but we replace dropped tokens with a special token instead of setting word embeddings to zero. We further introduce two self-supervised objectives: Replaced Token Detection and Dropped Token Prediction. Our method forces the model to generate the target translation with less information, and in this way the model can learn better textual representations. Experiments on Chinese-English and English-Romanian benchmarks demonstrate the effectiveness of our approach, and our model achieves significant improvements over a strong Transformer baseline.
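The dropping step itself is simple to sketch: replace randomly chosen source tokens with a special token and keep the originals as labels for an auxiliary prediction objective. The drop rate and the token name are illustrative assumptions.

```python
# Sketch: Token Drop on a source sentence, producing targets for
# Dropped Token Prediction as a by-product.
import random

DROP_TOKEN = "<drop>"

def token_drop(tokens, drop_rate=0.15, seed=None):
    rng = random.Random(seed)
    dropped, targets = [], []
    for tok in tokens:
        if rng.random() < drop_rate:
            dropped.append(DROP_TOKEN)
            targets.append(tok)      # gold label for the auxiliary objective
        else:
            dropped.append(tok)
            targets.append(None)     # not dropped; nothing to predict here
    return dropped, targets

print(token_drop("we propose token drop for machine translation".split(), seed=3))
```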
Topic-driven Ensemble for Online Advertising Generation
Egor Nevezhin, Nikolay Butakov, Maria Khodorchenko, Maxim Petrov and Denis Nasonov
Online advertising is one of the most widespread ways to reach and increase a target audience for those selling products. Usually taking the form of a banner, advertising engages users into visiting a corresponding webpage. Professional generation of banners requires creative and writing skills and a basic understanding of target products. The great variety of goods in the online market forces professionals to spend more and more time creating new advertisements different from existing ones. In this paper, we propose a neural network-based approach for the automatic generation of online advertising using texts from given webpages as sources. An important part of the approach is training on open data available online, which avoids costly manual labeling procedures. The collected open data consist of multiple subdomains with high data heterogeneity. The subdomains belong to different topics and vary in vocabulary, phrases and style, which reduces the quality of advert generation. We address the problem of identifying existing subdomains and propose a new ensemble approach based on exploiting multiple instances of a seq2seq model. Our experimental study on a dataset in the Russian language shows that our approach can significantly improve the quality of advert generation.
Topic-relevant Response Generation using Optimal Transport for an Open-domain Dialog System
Shuying Zhang, Tianyu Zhao and Tatsuya Kawahara
Conventional neural generative models tend to generate safe and generic responses which have little semantic connection with previous utterances and would disengage users in a dialog system. To generate relevant responses, we propose a method that employs two types of constraints: a topical constraint and a semantic constraint. Under the hypothesis that a response and its context have higher relevance when they share the same topics, the topical constraint encourages the topics of a response to match its context by conditioning response decoding on topic word embeddings. The semantic constraint encourages a response to be semantically related to its context by regularizing the decoding objective function with a semantic distance; optimal transport is applied to compute a weighted semantic distance between the representation of a response and that of the context. Generated responses are evaluated by automatic metrics as well as human judgment, showing that the proposed method can generate more topic-relevant and content-rich responses than conventional models.
Towards A Friendly Online Community: An Unsupervised Style Transfer Framework for Profanity Redaction
Minh Tran, Yipeng Zhang and Mohammad Soleymani
Offensive and abusive language is a pressing problem on social media platforms. In this work, we propose a method for transforming offensive comments, statements containing profanity or offensive language, into non-offensive ones. We design a Retrieve, Generate and Edit unsupervised style transfer pipeline to redact the offensive comments in a word-restricted manner while maintaining a high level of fluency and preserving the content of the original text. We extensively evaluate our method's performance and compare it to previous style transfer models using both automatic metrics and human evaluations. Experimental results show that our method outperforms other models on human evaluations and is the only approach that consistently performs well on all automatic evaluation metrics.
Towards Accurate and Consistent Evaluation: A Dataset for Distantly-Supervised Relation Extraction
Tong Zhu, Haitao Wang, Junjie Yu, Wenliang Chen, Wei Zhang and Min Zhang
In recent years, distantly-supervised relation extraction has achieved a certain success by using deep neural networks. Distant Supervision (DS) can automatically generate large-scale annotated data by aligning entity pairs from Knowledge Bases (KB) to sentences. However, these DS-generated datasets inevitably contain wrong labels that result in incorrect evaluation scores during testing, which may mislead researchers. To solve this problem, we build a new dataset, NYT-H, where we use the DS-generated data as training data and hire annotators to label the test data. Compared with previous datasets, NYT-H has much larger test data, so we can perform more accurate and consistent evaluation. Finally, we present the experimental results of several widely used systems on NYT-H. The experimental results show that the ranking lists of the compared systems on the DS-labelled test data and the human-annotated test data are different. This indicates that our human-annotated data is necessary for the evaluation of distantly-supervised relation extraction.
Towards automatically generating Questions under Discussion to link information and discourse structure
Kordula De Kuthy, Madeeswaran Kannan, Haemanth Santhi Ponnusamy and Detmar Meurers
Questions under Discussion (QUD; Roberts, 2012) are emerging as a conceptually fruitful approach to spelling out the connection between the information structure of a sentence and the nature of the discourse in which the sentence can function. To make this approach useful for analyzing authentic data, Riester, Brunetti & De Kuthy (2018) presented a discourse annotation framework based on explicit pragmatic principles for determining a QUD for every assertion in a text. De Kuthy et al. (2018) demonstrate that this supports more reliable discourse structure annotation, and Ziai and Meurers (2018) show that based on explicit questions, automatic focus annotation becomes feasible. But both approaches are based on manually specified questions. In this paper, we present an automatic question generation approach to partially automate QUD annotation by generating all potentially relevant questions for a given sentence. While transformation rules can concisely capture the typical question formation process, a rule-based approach is challenged by the substantial general variability of authentic data. We therefore use a transformation-based approach to generate a large set of question-answer pairs and train a neural question generation model to obtain both systematic question type coverage and robustness.
Towards Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning
Weipeng Huang, Xingyi Cheng, Kunlong Chen, Taifeng Wang and Wei Chu
The ambiguous annotation criteria lead to divergence of Chinese Word Segmentation (CWS) datasets in various granularities. Multi-criteria Chinese word segmentation aims to capture various annotation criteria among datasets and leverage their common underlying knowledge. In this paper, we propose a domain adaptive segmenter to exploit diverse criteria of various datasets. Our model is based on Bidirectional Encoder Representations from Transformers (BERT), which is responsible for introducing open-domain knowledge. Private and shared projection layers are proposed to capture domain-specific knowledge and common knowledge, respectively. We also optimize computational efficiency via distillation, quantization, and compiler optimization. Experiments show that our segmenter outperforms previous state-of-the-art (SOTA) models on 10 CWS datasets with superior efficiency.
Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers
Robert Litschko, Ivan Vulić, Željko Agić and Goran Glavaš
Current methods of cross-lingual parser transfer focus on predicting the best parser for a low-resource target language globally, that is, "at treebank level". In this work, we propose and argue for a novel cross-lingual transfer paradigm: instance-level parser selection (ILPS), and present a proof-of-concept study focused on instance-level selection in the framework of delexicalized parser transfer. We start from an empirical observation that different source parsers are the best choice for different Universal POS sequences in the target language. We then propose to predict the best parser at the instance level. To this end, we train a supervised regression model, based on the Transformer architecture, to predict parser accuracies for individual POS-sequences. We compare ILPS against two strong single-best parser selection baselines (SBPS): (1) a model that compares POS n-gram distributions between the source and target languages (KL) and (2) a model that selects the source based on the similarity between manually created language vectors encoding syntactic properties of languages (L2V). The results from our extensive evaluation, coupling 42 source parsers and 20 diverse low-resource test languages, show that ILPS outperforms KL and L2V on 13/20 and 14/20 test languages, respectively. Further, we show that by predicting the best parser "at treebank level" (SBPS), using the aggregation of predictions from our instance-level model, we outperform the same baselines on 17/20 and 16/20 test languages.
Towards Knowledge-Augmented Visual Question Answering
Maryam Ziaeefard and Freddy Lecue
Visual Question Answering (VQA) remains algorithmically challenging while it is effortless for humans. Humans combine visual observations with general and commonsense knowledge to answer a question about a given image. In this paper, we address the problem of incorporating general knowledge into VQA models while leveraging the visual information. We propose a model that captures the interactions between objects in a visual scene and entities in an external knowledge source. Our model is a graph-based approach that combines scene graphs with entity graphs, which learns a question-adaptive graph representation of related knowledge instances. We use Graph Attention Networks to set higher importance to key knowledge instances that are mostly relevant to each question. We exploit ConceptNet as the source of general knowledge and evaluate the performance of our model on the challenging OK-VQA dataset.
Towards Privacy by Design in Learner Corpora Research: A Case of On-the-fly Pseudonymization of Swedish Learner Essays
Elena Volodina, Yousuf Ali Mohammed, Sandra Derbring, Arild Matsson and Beata Megyesi
This article reports on an ongoing project aiming at the automatization of pseudonymization of learner essays. The process includes three steps: identification of personal information in an unstructured text, labeling it with a category, and pseudonymizing it. We experiment with rule-based methods for the detection of 15 categories out of the suggested 19 that we deem important and/or feasible with automatic approaches. For the detection and labeling steps, we use resources covering personal names, geographic names, company and university names, and others. For the pseudonymization step, we use only the most relevant (sometimes in the sense of most frequent) items in the above-mentioned resources. Evaluation of the detection and labeling steps is carried out on a set of manually anonymized essays.
Towards the First Machine Translation System for Sumerian Transliterations
Ravneet Punia and Niko Schenk
The Sumerian cuneiform writing script was invented more than 5,000 years ago and represents one of the oldest in history. In this paper, we present the first attempt to translate Sumerian texts into English automatically. We publicly release high-quality corpora for standardized training and evaluation and report results on experiments with supervised, phrase-based, and transfer learning techniques for machine translation. Both a quantitative and qualitative evaluation indicate the usefulness of the resulting translations. Our proposed methodology provides a broader audience of researchers with novel access to the data, accelerates the costly and time-consuming manual translation process, and helps them better explore the relationships between Sumerian cuneiform and Mesopotamian culture.
Towards Topic-Guided Conversational Recommender System
Kun Zhou, Yuanhang Zhou, Wayne Xin Zhao, Xiaoke Wang and Ji-Rong Wen
Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations. To develop an effective CRS, the support of high-quality datasets is essential. Existing CRS datasets mainly focus on immediate requests from users, while lacking proactive guidance toward the recommendation scenario. In this paper, we contribute a new CRS dataset named TG-ReDial (Recommendation through Topic-Guided Dialog). Our dataset has two major features. First, it incorporates topic threads to enforce natural semantic transitions towards the recommendation scenario. Second, it is created in a semi-automatic way, hence human annotation is more reasonable and controllable. Based on TG-ReDial, we present the task of topic-guided conversational recommendation, and propose an effective approach to this task. Extensive experiments have demonstrated the effectiveness of our approach on three sub-tasks, namely topic prediction, item recommendation and response generation.
TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking
Yucheng Wang, Bowen Yu, Yueyang Zhang, Tingwen Liu, Hongsong Zhu and Limin Sun
Extracting entities and relations from unstructured text has attracted increasing attention in recent years but remains challenging, due to the intrinsic difficulty of identifying overlapping relations with shared entities. Prior works show that joint learning can result in a noticeable performance gain. However, they usually involve sequential, interrelated steps and suffer from the problem of exposure bias: at training time, they predict conditioned on the ground truth, while at inference time extraction has to be made from scratch. This discrepancy leads to error accumulation. To mitigate the issue, we propose in this paper a one-stage joint extraction model, namely TPLinker, which is capable of discovering overlapping relations sharing one or both entities while being immune to the exposure bias. TPLinker formulates joint extraction as a token pair linking problem and introduces a novel handshaking tagging scheme that aligns the boundary tokens of entity pairs under each relation type. Experiment results show that TPLinker performs significantly better on overlapping and multiple relation extraction, and achieves state-of-the-art performance on two public datasets.
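The handshaking (token-pair) idea can be pictured with a toy example: upper-triangular token-pair matrices mark entity-head-to-entity-tail links and per-relation subject-to-object head/tail links, so a triple is decoded when all of its links fire together. The exact tag set and decoding procedure of the paper are simplified here to a single gold-annotated sentence.

```python
# Sketch: token-pair (handshaking) tagging for single-stage entity/relation extraction.
import numpy as np

tokens = ["New", "York", "is", "in", "the", "United", "States"]
n = len(tokens)

# One matrix per link type (simplified): entity spans and one relation's links.
ent_head_to_tail = np.zeros((n, n), dtype=int)
located_in_head_to_head = np.zeros((n, n), dtype=int)
located_in_tail_to_tail = np.zeros((n, n), dtype=int)

# Gold annotation: ("New York", located_in, "United States")
ent_head_to_tail[0, 1] = 1          # "New" .. "York"
ent_head_to_tail[5, 6] = 1          # "United" .. "States"
located_in_head_to_head[0, 5] = 1   # subject head -> object head
located_in_tail_to_tail[1, 6] = 1   # subject tail -> object tail

# Decoding a triple requires all three links, which is what lets the scheme
# recover overlapping relations without sequential, exposure-biased steps.
print(ent_head_to_tail[0, 1] and located_in_head_to_head[0, 5]
      and located_in_tail_to_tail[1, 6])
```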
Train Once, and Decode As You Like
Chao Tian, Yifei Wang, Hao Cheng, Yijiang Lian and Zhihua Zhang
In this paper, we propose a unified approach for supporting different generation manners of machine translation, including autoregressive, semi-autoregressive, and refinement-based non-autoregressive models. Our approach works by repeatedly selecting positions and generating tokens at these selected positions. After being trained once, our approach achieves better or competitive translation performance compared with some strong task-specific baseline models in all settings. This generalization ability benefits mainly from the new training objective that we propose. We validate our approach on the WMT'14 English-German and IWSLT'14 German-English translation tasks. The experimental results are encouraging.
Transformation of Dense and Sparse Text Representations
Wenpeng Hu, Mengyu Wang, Bing Liu, Feng Ji, Jinwen Ma and Dongyan Zhao
Sparsity is regarded as a desirable property of representations, especially in terms of explanation. However, its usage has been limited due to the gap with dense representations. Most recent NLP research progress is based on dense representations, and thus the desirable property of sparsity cannot be leveraged. Inspired by the Fourier transform, in this paper we propose a novel Semantic Transformation method to bridge the dense and sparse spaces, which can help NLP research shift from dense space to sparse space or jointly use both spaces. Experiments on classification tasks and a natural language inference task show that the proposed Semantic Transformation is effective.
Translation vs. Dialogue: A Comparative Analysis of Sequence-to-Sequence Modeling
Wenpeng Hu, Ran Le, Bing Liu, Jinwen Ma, Dongyan Zhao and Rui Yan
Understanding neural models is a major topic of interest in the deep learning community. In this paper, we propose to interpret a general neural model comparatively. Specifically, we study the sequence-to-sequence (Seq2Seq) model in the contexts of two mainstream NLP tasks, machine translation and dialogue response generation, as both use the Seq2Seq model. We investigate how the two tasks differ and how their task difference results in major differences in the behaviors of the resulting translation and dialogue generation systems. This study allows us to make several interesting observations and gain valuable insights, which can be used to help develop better translation and dialogue generation models. To our knowledge, no such comparative study has been done so far.
Tree Representations in Transition System for RST Parsing
Jinfen Li and Lu Xiao
Transition-based systems in past studies propose a series of actions to build a right-heavy binarized tree for RST parsing. However, in the binary tree structure, the nodes of binary-nuclear relations (e.g., Contrast) have the same nuclear type as those of multi-nuclear relations (e.g., Joint). In addition, the reduce action only constructs binary trees instead of multi-branch trees, which is the original RST tree structure. In this paper, we design a new nuclear type for multi-nuclear relations and a new action to construct a multi-branch tree. We enrich the feature set by extracting additional refined dependency features of texts from the Bi-Affine model. We also compare the performance of two approaches for RST parsing in the transition-based system: a joint action of reduce-shift and nuclear type (i.e., Reduce-SN) vs. a separate one that applies the Reduce action first and then assigns the nuclear type. We find that the newly devised nuclear type and action are more capable of capturing multi-nuclear relations and that the joint action is more suitable than the separate one. Our multi-branch tree structure obtains state-of-the-art performance for all 18 coarse relations.
Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet
Bairu Hou, Fanchao Qi, Yuan Zang, Xurui Zhang, Zhiyuan Liu and Maosong Sun
Word sense disambiguation (WSD) is a fundamental natural language processing task. Unsupervised knowledge-based WSD relies only on a lexical knowledge base as the sense inventory and has wider practical use than supervised WSD. HowNet is the most widely used lexical knowledge base in Chinese WSD. Because of its uniqueness, however, most existing unsupervised WSD methods cannot work for HowNet-based WSD, and tailor-made methods have not obtained satisfying results. In this paper, we propose a new unsupervised method for HowNet-based Chinese WSD, which exploits the masked language model task of pre-trained language models. In experiments, considering that the existing evaluation dataset is small and out-of-date, we build a new and larger HowNet-based WSD dataset. Experimental results demonstrate that our model achieves significantly better performance than all the baseline methods. All the code and data of this paper will be made public.
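A substitution-based use of the masked language model task, in the spirit of the title, can be sketched as follows: mask the ambiguous word, ask a masked LM how plausible sense-indicative substitute words are in that slot, and pick the sense whose substitutes score highest. The paper works with HowNet senses and Chinese; the English model, senses and substitute words below are purely illustrative assumptions.

```python
# Sketch: choosing a word sense by scoring sense-indicative substitutes with an MLM.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

sense_substitutes = {                       # illustrative sense inventory for "bank"
    "financial_institution": ["branch", "office"],
    "river_side": ["shore", "river"],
}

def disambiguate(sentence_with_mask: str):
    inputs = tokenizer(sentence_with_mask, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
    with torch.no_grad():
        logits = mlm(**inputs).logits[0, mask_pos].squeeze(0)
    probs = logits.softmax(-1)
    scores = {sense: max(probs[tokenizer.convert_tokens_to_ids(w)].item() for w in subs)
              for sense, subs in sense_substitutes.items()}
    return max(scores, key=scores.get)

print(disambiguate("She deposited the money at the [MASK]."))
```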
TWEETSUM: Event oriented Social Summarization Dataset
Ruifang He, Liangliang Zhao and Huanyu Liu
With social media becoming popular, a vast number of short and noisy messages are produced by millions of users when a hot event happens. Developing social summarization systems becomes more and more critical for people to quickly grasp core and essential information. However, publicly available, high-quality, large-scale social summarization datasets are rare. Constructing such a corpus is not easy and is very expensive, since short texts have very complex social characteristics. In this paper, we construct TWEETSUM, a new event-oriented dataset for social summarization. The original data is collected from Twitter and contains 12 real-world hot events with a total of 44,034 tweets and 11,240 users. Each event has four expert summaries, and we also provide an annotation quality evaluation. In addition, we collect additional social signals (i.e., user relations, hashtags and user profiles) and further establish a user relation network for each event. Besides the detailed dataset description, we show the performance of several typical extractive summarization methods on TWEETSUM to establish baselines. To support further research, we will release this dataset to the public.
Two-level classification for dialogue act recognition in task-oriented dialogues
Philippe Blache, Massina Abderrahmane, Stéphane Rauzy, Magalie Ochs and Houda Oufaida
Dialogue act classification becomes a complex task when dealing with fine-grained labels. Many applications require such a level of labelling, typically automatic dialogue systems. We present in this paper a two-level classification technique, distinguishing between generic and specific dialogue acts (DA). This approach makes it possible to benefit from the very good accuracy of generic DA classification at the first level and proposes an efficient approach for specific DA, based on high-level linguistic features. Our results show the interest of involving such features in the classifiers, outperforming all other feature sets, in particular those classically used in DA classification.
Understanding Pre-trained BERT for Aspect-based Sentiment Analysis
Hu Xu, Lei Shu, Philip Yu and Bing Liu
This paper aims to analyze the hidden representations learned from BERT for tasks in aspect-based sentiment analysis (ABSA). Our work is motivated by the recent progress in BERT-based LMs for ABSA. However, it is not clear how the proxy task of (masked) language modeling trained on unlabeled corpora without annotations of aspects or opinions helps learn important features for end-tasks in ABSA. We propose several evaluations to investigate both the attentions and the learned representations of BERT trained from reviews on tasks in ABSA. We found that most features in the representation of an aspect are dedicated to the fine-grained semantics of the domain (or product category) and the aspect itself, instead of carrying summarized opinions from its context. BERT uses very few self-attention heads to encode context words (such as prepositions or pronouns that indicate an aspect) and opinion words for an aspect. We hope this investigation can help future research in improving self-supervised learning for ABSA.
Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English
Gongbo Tang, Rico Sennrich and Joakim Nivre
Recent work has shown that deeper character-based neural machine translation (NMT) models can outperform subword-based models. However, it is still unclear what makes deeper character-based models successful. In this paper, we conduct an investigation into pure character-based models in the case of translating Finnish into English, including exploring the ability to learn word senses and morphological inflections and the attention mechanism. We reveal that word-level information is distributed over the entire character sequence rather than over a single character, and that characters at different positions play different roles in learning linguistic knowledge. In addition, character-based models need more layers to encode word senses, which explains why only deeper models outperform subword-based models. The attention distribution pattern shows that separators attract a lot of attention, and we explore a sparse word-level attention to enforce that character hidden states capture the full word-level information. Experimental results show that the word-level attention with a single head only captures limited information from the source.
Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation
Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz and Felipe Sánchez-Martínez
This paper studies the effects of word-level linguistic annotations in under-resourced neural machine translation, for which there is incomplete evidence in the literature. The study covers eight language pairs, different training corpus sizes, two architectures, and three types of annotation: dummy tags (with no linguistic information at all), part-of-speech tags, and morpho-syntactic description tags, which consist of part of speech and morphological features. These linguistic annotations are interleaved in the input or output streams as a single tag placed before each word. In order to measure the performance under each scenario, we use automatic evaluation metrics and perform automatic error classification. Our experiments show that, in general, source-language annotations are helpful and morpho-syntactic descriptions outperform part of speech for some language pairs. On the contrary, when words are annotated in the target language, part-of-speech tags systematically outperform morpho-syntactic description tags in terms of automatic evaluation metrics, even though the use of morpho-syntactic description tags improves the grammaticality of the output. We provide a detailed analysis of the reasons behind this result.
Understanding Translationese in Multi-view Embedding Spaces
Koel Dutta Chowdhury, Cristina España-Bonet and Josef van Genabith
The term "translationese" refers to systematic differences between translations and text originally authored in the target language of the translation (in the same genre and style). In this paper, we use departures from isomorphism between embedding-based vector spaces built from translations and from originally authored data to estimate phylogenetic language family relations induced from single-target-language translations from multiple source languages. We explore multi-view embedding spaces based on words, Part-of-Speech, Semantic Tags, and Synsets, to capture lexical, morphological and semantic aspects of translationese and to investigate the impact of topic on the data. Our results show that (i) language family relationships can be inferred from the monolingual embedding data, providing evidence for shining-through (source language interference) translationese effects in the data, and (ii) that, perhaps surprisingly, even delexicalised embeddings exhibit significant source language interference, indicating that the lexicalised results are not "just" due to possible differences in topic between original and translated texts.
Understanding Unnatural Questions Improves Reasoning over Text
Xiaoyu Guo, Yuan-Fang Li and Gholamreza Haffari
Complex question answering (CQA) over raw text is a challenging task. A prominent approach to this task is based on the programmer-interpreter framework, where the programmer maps the question into a sequence of reasoning actions which is then executed on the raw text by the interpreter. Learning an effective CQA model requires large amounts of human-annotated data, consisting of the ground-truth sequence of reasoning actions, which is time-consuming and expensive to collect at scale. In this paper, we address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions which are more convenient to parse. We first generate synthetic (question, action sequence) pairs with a data generator, and train a semantic parser that associates synthetic questions with their corresponding action sequences. To capture the diversity when applied to natural questions, we learn a projection model to map natural questions into their most similar unnatural questions for which the parser can work well. Without any natural training data, our projection model provides high-quality action sequences for the CQA task. Experimental results show that the QA model trained exclusively on synthetic data generated by our method outperforms its state-of-the-art counterpart trained on human-labeled data.
Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis
Michael Lepori
We present a new approach for detecting human-like social biases in word embeddings, using representational similarity analysis. Specifically, we probe contextualized and non-contextualized embeddings for evidence of intersectional biases against Black women. We show that these embeddings represent Black women as simultaneously less feminine than White women, and less black than Black men. This finding aligns with intersectionality theory, which argues that multiple identity categories (such as race or sex) layer on top of each other in order to create unique modes of discrimination that are not shared by any individual category.
Unifying Input and Output Smoothing in Neural Machine Translation
Yingbo Gao, Baohao Liao and Hermann Ney
Soft contextualized data augmentation is a recent method that replaces one-hot representation of words with soft posterior distributions of an external language model, smoothing the input of neural machine translation systems. Label smoothing is another effective method that penalizes over-confident model outputs by discounting some probability mass from the true target word, smoothing the output of neural machine translation systems. Having the benefit of updating all word vectors in each optimization step and better regularizing the models, the two smoothing methods are shown to bring significant improvements in translation performance. In this work, we study how to best combine the methods and stack the improvements. Specifically, we vary the prior distributions to smooth with, the hyperparameters that control the smoothing strength, and the token selection procedures. We conduct extensive experiments on small datasets, evaluate the recipes on larger datasets, and examine the implications when back-translation is further used. Our results confirm cumulative improvements when input and output smoothing are used in combination, giving up to +1.9 BLEU scores on standard machine translation tasks and reveal reasons why these smoothing methods should be preferred.
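The output-side half of the combination, label smoothing, is easy to show concretely; the input-side half (soft posteriors from an external language model in place of one-hot rows) is only indicated by a comment. The smoothing strength below is an illustrative hyperparameter, not the paper's setting.

```python
# Sketch: label-smoothed cross-entropy, the output-smoothing side of the recipe.
import torch
import torch.nn.functional as F

def label_smoothed_nll(logits, target, epsilon=0.1):
    """logits: (batch, vocab); target: (batch,) gold token ids."""
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)   # true-class term
    smooth = -log_probs.mean(dim=-1)                                # uniform-prior term
    return ((1.0 - epsilon) * nll + epsilon * smooth).mean()

# Input-side smoothing (sketch): rather than one-hot rows, the encoder would be
# fed soft distributions from an external LM, i.e. emb = soft_posteriors @ E.
logits = torch.randn(2, 10)
target = torch.tensor([3, 7])
print(label_smoothed_nll(logits, target))
```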
Unleashing the Power of Neural Discourse Parsers - A Context and Structure Aware Approach Using Large Scale Pretraining
Grigorii Guz, Patrick Huber and Giuseppe Carenini
RST-based discourse parsing is an important NLP task with numerous downstream applications, such as summarization, machine translation and opinion mining. In this paper, we demonstrate a simple, yet highly accurate discourse parser, incorporating recent contextual language models. Our parser establishes the new state-of-the-art (SOTA) performance on two key RST datasets, RST-DT and Instr-DT. We further demonstrate that pretraining our parser on the recently available large-scale "silver-standard" discourse treebank MEGA-DT provides even larger performance benefits, suggesting a novel and promising research direction in the field of discourse analysis.
Unsupervised Deep Language and Dialect Identification for Short Texts
Koustava Goswami, Rajdeep Sarkar, Bharathi Raja Chakravarthi, Theodorus Fransen and John P. McCrae
Automatic Language Identification (LI) or Dialect Identification (DI) of short texts of closely related languages or dialects, is one of the primary steps in many natural language processing pipelines. Language identification is considered a solved task in many cases; however, in the case of very closely related languages, or in an unsupervised scenario (where the languages are not known in advance), performance is still poor. In this paper, we propose the Unsupervised Deep Language and Dialect Identification (UDLDI) method, which can simultaneously learn sentence embeddings and cluster assignments from short texts. The UDLDI model understands the sentence constructions of languages by applying attention to character relations which helps to optimize the clustering of languages. We have performed our experiments on three short-text datasets for different language families, each consisting of closely related languages or dialects, with very minimal training sets. Our experimental evaluations on these datasets have shown significant improvement over state-of-the-art unsupervised methods and our model has outperformed state-of-the-art LI and DI systems in supervised settings.
Unsupervised Fact Checking by Counter-Weighted Positive and Negative Evidential Paths in A Knowledge Graph
Jiseong Kim and KEY-SUN CHOI
Misinformation spreads across media, communities, and knowledge graphs on the Web, created not only by human agents but also by information extraction algorithms that extract factual statements from unstructured textual data to populate existing knowledge graphs. Traditional fact checking by experts or crowds finds it increasingly difficult to keep pace with the volume of newly created misinformation on the Web. Therefore, it is important and necessary to enhance the computational ability to determine whether a given factual statement is truthful or not. We view this problem as a truth scoring task in a knowledge graph. We present a novel rule-based approach that finds positive and negative evidential paths in a knowledge graph for a given factual statement and calculates a truth score for the given statement by an unsupervised ensemble of the found positive and negative evidential paths. For example, we can determine the factual statement “United States is the birth place of Barack Obama” as truthful if there is the positive evidential path (Barack Obama, birthPlace, Hawaii) ∧ (Hawaii, country, United States) in a knowledge graph. For another example, we can determine the factual statement “Canada is the nationality of Barack Obama” as untruthful if there is the negative evidential path (Barack Obama, nationality, United States) ∧ (United States, ≠, Canada) in a knowledge graph. For evaluation in a real-world situation, we constructed an evaluation dataset by labeling truth or untruth labels on factual statements that were extracted from Wikipedia texts by the state-of-the-art BERT-based information extraction system. Our evaluation results show that our approach outperforms the state-of-the-art unsupervised approaches significantly, by up to 0.12 AUC-ROC, and even outperforms the supervised approach by up to 0.05 AUC-ROC, not only on our dataset but also on two different standard datasets.
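The Obama examples above can be turned into a toy sketch: look for a positive two-hop path supporting the statement, and treat a conflicting value of the same relation as negative evidence. The path rules and the final scoring rule below are simplified assumptions, not the paper's counter-weighted ensemble.

```python
# Sketch: truth scoring from positive and negative evidential paths in a tiny KG.
kg = {
    ("Barack Obama", "birthPlace"): "Hawaii",
    ("Hawaii", "country"): "United States",
    ("Barack Obama", "nationality"): "United States",
}

def truth_score(subject, relation, obj):
    # Positive evidence: a two-hop path subject -> x -> obj exists in the graph.
    positive = any(val2 == obj
                   for (s, _), mid in kg.items() if s == subject
                   for (m, _), val2 in kg.items() if m == mid)
    existing = kg.get((subject, relation))
    if existing == obj or positive:
        return 1.0        # directly stated or supported by a positive path
    if existing is not None and existing != obj:
        return 0.0        # negative evidence: the relation points elsewhere
    return 0.5            # no evidence either way

print(truth_score("Barack Obama", "birthPlace", "United States"))   # positive path
print(truth_score("Barack Obama", "nationality", "Canada"))         # negative path
```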
Unsupervised Fine-tuning for Text Clustering
Shaohan Huang, Furu Wei, Lei Cui, Xingxing Zhang and Ming Zhou
Fine-tuning with pre-trained language models (e.g. BERT) has achieved great success in many language understanding tasks in supervised settings (e.g. text classification). However, relatively little work has focused on applying pre-trained models in unsupervised settings, such as text clustering. In this paper, we propose a novel method to fine-tune pre-trained models in an unsupervised manner for text clustering, which simultaneously learns text representations and cluster assignments using a clustering-oriented loss. Experiments on three text clustering datasets (namely TREC-6, Yelp, and DBpedia) show that our model outperforms the baseline methods and achieves state-of-the-art results.
User Memory Reasoning for Conversational Recommendation
Hu Xu, Seungwhan Moon, Honglei Liu, Bing Liu, Pararth Shah, Bing Liu and Philip Yu
We study an end-to-end approach for conversational recommendation that dynamically manages and reasons over users' past (offline) preferences and current (online) requests through a structured and cumulative user memory knowledge graph. For this study, we create a new Memory Graph (MG) <-> Conversational Recommendation parallel corpus called MGConvRex with 7K+ human-to-human role-playing dialogs, grounded on a large-scale user memory bootstrapped from real-world user scenarios. MGConvRex captures human-level reasoning over user memory and has disjoint training/testing sets of users for zero-shot (cold-start) reasoning for recommendation. We propose a simple yet expandable formulation for constructing and updating the MG, and an end-to-end graph-based reasoning model that updates MG from unstructured utterances and predicts optimal dialog policies (e.g. recommendation) based on updated MG. The prediction of our proposed model inherits the graph structure, providing a natural way to explain the model's recommendation. Experiments are conducted for both offline metrics and online simulation, showing competitive results. The dataset, code and models will be released for future research.
Using a Penalty-based Loss Re-estimation Method to Improve Implicit Discourse Relation Classification
xiao li, Yu Hong, Huibin Ruan and Zhen Huang
We tackle implicit discourse relation classification, the task of automatically determining semantic relationships between arguments. The attention-worthy words in arguments are crucial clues for classifying the discourse relations. Attention mechanisms have proven effective in highlighting the attention-worthy words during encoding. However, our survey shows that some inessential words are unintentionally misjudged as attention-worthy and, therefore, assigned heavier attention weights than they should be. We propose a penalty-based loss re-estimation method to regulate the attention learning process, integrating penalty coefficients into the computation of the loss based on the overstability of attention weight distributions. We conduct experiments on the Penn Discourse TreeBank (PDTB) corpus. The test results show that our loss re-estimation method leads to substantial improvements for a variety of attention mechanisms, and it obtains highly competitive performance compared to the state-of-the-art methods.
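The abstract does not spell out the exact penalty, but one plausible instantiation is to penalise attention distributions whose entropy is low relative to the maximum (i.e. overly peaked, "over-stable"); the sketch below is purely illustrative under that assumption.

```python
import math
import torch

def attention_entropy(attn):
    # attn: (batch, seq_len), each row sums to 1.
    return -(attn * (attn + 1e-12).log()).sum(dim=-1)

def re_estimated_loss(ce_loss, attn, max_entropy, lam=0.1):
    # Low entropy relative to the maximum signals over-stability; penalise it.
    overstability = 1.0 - attention_entropy(attn) / max_entropy
    return ce_loss + lam * overstability.mean()

attn = torch.softmax(torch.randn(4, 20), dim=-1)   # toy attention weights
ce = torch.tensor(0.7)                             # stand-in classification loss
print(re_estimated_loss(ce, attn, max_entropy=math.log(20)))
```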
Using Bilingual Patents for Translation Training
John Lee, Benjamin Tsou and Tianyuan Cai
While bilingual corpora have contributed to the development of machine translation systems and translation memory, they have been under-explored as a pedagogical tool for translation. We investigate the use of bilingual corpora for translation training in the technical writing domain. A user study shows that students were better able to improve translation accuracy of technical terms through concordancing with a Chinese-English corpus of patents, compared to a general-domain corpus.
Using Eye-tracking Data to Predict the Readability of Brazilian Portuguese Sentences in Single-task, Multi-task and Sequential Transfer Learning Approaches
Sidney Evaldo Leal, Erica dos Santos Rodrigues and Sandra Aluísio
Sentence complexity assessment is a relatively new task in Natural Language Processing. One of its goals is to highlight which sentences in a text are more complex, to support the simplification of contents for a target audience (e.g., children, cognitively impaired users, non-native speakers and low-literacy readers (Scarton and Specia, 2018)). The task is evaluated using datasets of pairs of aligned sentences, with the complex and simple version of the same sentence. For Brazilian Portuguese, the task was approached by Leal et al. (2018), who set up the first dataset to evaluate the task in this language, reaching 87.8% accuracy with linguistic features. The present work advances these results, using models inspired by the work of Gonzalez-Garduño and Søgaard (2018), which holds the state of the art for English, with multi-task learning and eye-tracking measures. First-Pass Duration, Total Regression Duration and Total Fixation Duration were used in two stages: first to select a subset of linguistic features, and then as an auxiliary task in the multi-task and sequential learning models. The best model proposed here reaches a new state of the art for Portuguese with 97.5% accuracy, an increase of almost 10% over the best previous result, in addition to proposing improvements to the public dataset after analyzing the errors of our best model.
Utilizing Subword Entities in Character-Level Sequence-to-Sequence Lemmatization Models
Nasser Zalmout and Nizar Habash
In this paper we present a character-level sequence-to-sequence lemmatization model, utilizing several subword features in multiple configurations. In addition to generic n-gram embeddings (using FastText), we experiment with concatenative (stems) and templatic (roots and patterns) morphological subwords. We present several architectures that embed these features directly at the encoder side, or learn them jointly at the decoder side with a multitask learning architecture. The results indicate that using the generic n-gram embeddings (through FastText) outperforms the other linguistically-driven subwords. We use Modern Standard Arabic and Egyptian Arabic as test cases, with up to 22% and 13% relative error reduction, respectively, from a strong baseline. An error analysis shows that our best system is even able to handle word/lemma pairs that are both unseen in the training data.
Variation in Coreference Strategies across Genres and Production Media
Berfin Aktaş and Manfred Stede
In response to (i) inconclusive results in the literature as to the properties of coreference chains in written versus spoken language, and (ii) a general lack of work on automatic coreference resolution on both spoken language and social media, we undertake a corpus study involving the various genre sections of Ontonotes, the Switchboard corpus, and a corpus of Twitter conversations. Using a set of measures that previously have been applied individually to different data sets, we find fairly clear patterns of "behavior" for the different genres/media. Besides their role for psycholinguistic investigation (why do we employ different coreference strategies when we write or speak?) and for the placement of Twitter in the spoken–written continuum, we see our results as a contribution to approaching genre-/media-specific coreference resolution.
Variational Autoencoder with Embedded Student-t Mixture Model for Authorship Attribution
Benedikt Boenninghoff, Steffen Zeiler, Robert Nickel and Dorothea Kolossa
Traditional computational authorship attribution describes a classification task in a closed-set scenario. Given a finite set of candidate authors and corresponding labeled texts, the objective is to determine which of the authors has written another set of anonymous or disputed texts. In this work, we propose a probabilistic autoencoding framework to deal with this supervised classification task. Variational autoencoders (VAEs) have had tremendous success in learning latent representations. However, existing VAEs are still bound by limitations imposed by the assumed Gaussianity of the underlying probability distributions in the latent space. In this work, we extend a VAE with an embedded Gaussian mixture model to a Student-t mixture model, which allows for independent control of the "heaviness" of the respective tails of the implied probability densities. Experiments on an Amazon review dataset indicate superior performance of the proposed method.
Vec2Sent: Probing Sentence Embeddings with Natural Language Generation
Steffen Eger and Martin Kerscher
We unveil the language encoded in sentence embeddings by generating from them. We view this as a new unsupervised probing task and show that it correlates well with downstream task performance. We also illustrate how the language generated from different encoders differs, finding that BERT does not seem to fully store lexical information. We apply our approach to generate sentence analogies from sentence embeddings.
Verbal Multiword Expression Identification: Do We Need a Sledgehammer to Crack a Nut?
Caroline Pasquer, Agata Savary, Carlos Ramisch and Jean-Yves Antoine
Automatic identification of multiword expressions (MWEs), like "to cut corners" (to do an incomplete job), is a pre-requisite for semantically-oriented downstream applications. This task is challenging because MWEs, especially verbal ones (VMWEs), exhibit surface variability. This paper deals with a subproblem of VMWE identification: the identification of occurrences of previously seen VMWEs. A simple language-independent system based on a combination of filters competes with the best systems from a recent shared task: it obtains the best averaged F-score over 11 languages (0.6653) and even the best score for both seen and unseen VMWEs due to the high proportion of seen VMWEs in texts. This highlights the fact that focusing on the identification of seen VMWEs could be a strategy to improve VMWE identification in general.
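A minimal sketch of one such filter for seen VMWEs is shown below: it matches a sentence's lemmas against the lemma sets of VMWEs observed in training, subject to a window constraint. The lemma-set representation, window size, and example expressions are assumptions for illustration; the actual system combines several filters and is language-independent.

```python
# VMWEs observed in the training data, represented as lemma sets.
seen_vmwes = {frozenset(["cut", "corner"]), frozenset(["take", "place"])}

def find_seen_vmwes(lemmas, max_gap=3):
    """Return seen VMWEs whose lemmas all occur within a small window."""
    hits = []
    for vmwe in seen_vmwes:
        positions = [i for i, lemma in enumerate(lemmas) if lemma in vmwe]
        found = {lemmas[i] for i in positions}
        if found == set(vmwe) and max(positions) - min(positions) <= max_gap:
            hits.append(sorted(vmwe))
    return hits

# Lemmatized sentence: "they cut a few corners yesterday"
print(find_seen_vmwes(["they", "cut", "a", "few", "corner", "yesterday"]))
```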
VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks
Caren Han, SIQU LONG, Siwen Luo, Kunze Wang and Josiah Poon
Text-to-image multimodal tasks, which generate or retrieve an image from a given text description, are extremely challenging since raw text descriptions contain only limited information for fully describing visually realistic images. We propose a new visual contextual text representation for text-to-image multimodal tasks, VICTR, which captures rich visual semantic information about objects from the text input. First, we use the text description as initial input and conduct dependency parsing to extract the syntactic structure and analyse semantic aspects, including object quantities, in order to extract the scene graph. Then, we train Graph Convolutional Networks over the extracted objects, attributes, and relations in the scene graph, together with the corresponding geometric relation information, generating a text representation that integrates textual and visual semantic information. This representation is aggregated with word-level and sentence-level embeddings to produce both visual contextual word and sentence representations. For evaluation, we attach VICTR to state-of-the-art models in text-to-image generation. VICTR is easily added to existing models and improves both quantitative and qualitative results.
Visual-Textual Alignment for Graph Inference in Visual Dialog
Tianling Jiang, Yi Ji, Chunping Liu and Hailin Shao
As a conversational intelligence task, visual dialog entails answering a series of questions grounded in an image, using the dialog history as context. To generate correct answers, comprehension of the semantic dependencies among implicit visual and textual contents is critical. Prior works usually ignored the underlying relation and failed to infer it reasonably. In this paper, we propose a Visual-Textual Alignment for Graph Inference (VTAGI) network. Compared with other approaches, it makes up for the lack of structural inference in visual dialog. The whole system consists of two modules, Visual and Textual Alignment (VTA) and Visual Graph Attended by Text (VGAT). Specifically, the VTA module represents an image with a set of integrated visual regions and corresponding textual concepts, reflecting certain semantics. The VGAT module views the visual features with semantic information as observed nodes, and each node learns its relationship with others in the visual graph. We qualitatively and quantitatively evaluate the model on the VisDial v1.0 dataset, showing that our VTAGI outperforms previous state-of-the-art models.
Weighed Domain-Invariant Representation Learning for Cross-domain Sentiment Analysis
Minlong Peng and Qi Zhang
Cross-domain sentiment analysis is currently a hot topic in both research and industry. One of the most popular frameworks for the task is domain-invariant representation learning (DIRL), which aims to learn a distribution-invariant feature representation across domains. However, in this work, we find that applying DIRL may degrade domain adaptation performance when the label distribution P(Y) changes across domains. To address this problem, we propose a modification to DIRL, obtaining a novel weighted domain-invariant representation learning (WDIRL) framework. We show that it is easy to transfer existing models of the DIRL framework to the WDIRL framework. Empirical studies on extensive cross-domain sentiment analysis tasks verify our statements and show the effectiveness of our proposed solution.
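To make the label-shift issue concrete, the sketch below re-weights source-domain examples by the ratio of target to source class priors, a standard correction when P(Y) changes across domains; the priors and weighting scheme here are toy assumptions, not necessarily the WDIRL formulation.

```python
import torch
import torch.nn.functional as F

def class_weights(p_source, p_target):
    # w(y) = P_target(Y = y) / P_source(Y = y)
    return torch.tensor(p_target) / torch.tensor(p_source)

def weighted_ce(logits, labels, weights):
    # Weight each source example's loss by its class-ratio weight.
    per_example = F.cross_entropy(logits, labels, reduction="none")
    return (weights[labels] * per_example).mean()

logits = torch.randn(6, 2)                      # toy sentiment logits
labels = torch.tensor([0, 0, 1, 1, 1, 0])       # toy source-domain labels
w = class_weights(p_source=[0.5, 0.5], p_target=[0.3, 0.7])
print(weighted_ce(logits, labels, w))
```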
What Can We Learn from Noun Substitutions in Revision Histories?
Talita Anthonio and Michael Roth
In community-edited resources such as wikiHow, sentences are subject to revisions on a daily basis. Recent work has shown that the resulting improvements over time can be modelled computationally, assuming that each revision contributes to the improvement. We take a closer look at a subset of such revisions, for which we attempt to improve a computational model and validate to what extent the assumption that ‘revised means better’ actually holds. The revisions considered here are noun substitutions, which often involve interesting semantic relations, including synonymy, antonymy and hypernymy. Despite the high semantic relatedness, we find that a supervised classifier can distinguish the revised version of a sentence from the original version with an accuracy close to 70% when taking context into account. In a human annotation study, we observe that annotators identify the revised sentence as the ‘better version’ with similar performance. Our analysis reveals fair agreement among annotators when a revision improves fluency. In contrast, noun substitutions that involve other lexical-semantic relationships are often perceived as being equally good or tend to cause disagreements. While these findings are also reflected in classification scores, a comparison of results shows that our model fails in cases where humans can resort to factual knowledge or intuitions about the required level of specificity.
What Does This Acronym Mean? Introducing a New Dataset for Acronym Identification and Disambiguation
Amir Pouran Ben Veyseh, Franck Dernoncourt, Quan Hung Tran and Thien Huu Nguyen
Acronyms are the short forms of phrases that facilitate conveying lengthy sentences in documents and serve as one of the mainstays of writing. Due to their importance, identifying acronyms and their corresponding phrases (i.e., acronym identification (AI)) and finding the correct meaning of each acronym (i.e., acronym disambiguation (AD)) are crucial for text understanding. Despite recent progress on this task, there are limitations in the existing datasets which hinder further improvement. More specifically, the limited size of manually annotated AI datasets and the noise in automatically created acronym identification datasets obstruct the design of advanced high-performing acronym identification models. Moreover, the existing datasets are mostly limited to the medical domain and ignore other domains. In order to address these two limitations, we first create a manually annotated large AI dataset for the scientific domain. This dataset contains 17,506 sentences, substantially more than previous scientific AI datasets. Next, we prepare an AD dataset for the scientific domain with 62,441 samples, significantly larger than the previous scientific AD dataset. Our experiments show that the existing state-of-the-art models fall far behind human-level performance on both datasets proposed by this work. In addition, we propose a new deep learning model which utilizes the syntactic structure of the sentence to expand an ambiguous acronym in a sentence. The proposed model outperforms the state-of-the-art models on the new AD dataset, providing a strong baseline for future research on this dataset.
What Meaning-Form Correlation Has to Compose With
Timothee Mickus, Timothée Bernard and Denis Paperno
Compositionality is a widely discussed property of natural languages, although its exact definition has been elusive. We focus on the proposal that compositionality can be assessed by measuring meaning-form correlation. We analyze meaning-form correlation on three sets of languages: (i) synthetic toy languages tailored to be compositional, (ii) a set of English dictionary definitions, and (iii) a set of English sentences drawn from literature. We find that confounding factors weigh on meaning-form correlation measurements, and that straightforward methods to mitigate their effects have widely varying results depending on the dataset they are applied to.
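One simple way to operationalise meaning-form correlation is to correlate pairwise form distances (e.g. edit distance) with pairwise meaning distances (e.g. cosine distance between embeddings); the sketch below uses random stand-in vectors and plain Pearson correlation, whereas the paper's measurement setup may differ (e.g. a Mantel test).

```python
import itertools
import numpy as np

def edit_distance(a, b):
    # Standard Levenshtein dynamic program over character strings.
    dp = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    dp[:, 0] = np.arange(len(a) + 1)
    dp[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i, j] = min(dp[i - 1, j] + 1, dp[i, j - 1] + 1,
                           dp[i - 1, j - 1] + (a[i - 1] != b[j - 1]))
    return dp[len(a), len(b)]

def cosine_distance(u, v):
    return 1 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

forms = ["blackbird", "blackboard", "songbird", "keyboard"]
meanings = {f: np.random.rand(50) for f in forms}   # stand-in meaning vectors

form_d, meaning_d = [], []
for a, b in itertools.combinations(forms, 2):
    form_d.append(edit_distance(a, b))
    meaning_d.append(cosine_distance(meanings[a], meanings[b]))

# Pearson correlation between form distances and meaning distances.
print(np.corrcoef(form_d, meaning_d)[0, 1])
```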
When and Who? Conversation Transition Based on Bot-Agent Symbiosis Learning Network
Yipeng Yu, Ran Guan, Jie Ma, Zhuoxuan Jiang and Jingchang Huang
In online customer service applications, multiple chatbots specialized in various topics are typically developed separately and then merged with human agents onto a single platform, presenting users with a unified interface. Ideally, the conversation can be transferred transparently between different sources of customer support so that domain-specific questions are answered in a timely manner; we refer to this as Bot-Agent symbiosis. Conversation transition is a major challenge in such online customer service. Our work formalises the challenge as two core problems, namely when to transfer and which bot or agent to transfer to, and introduces a deep neural network based approach that addresses both. Inspired by the net promoter score (NPS), our research reveals how these problems can be effectively solved by leveraging user feedback and developing deep neural networks that predict the conversation category distribution and the NPS of the dialogues. Experiments on realistic data generated from an online service support platform demonstrate that the proposed approach outperforms state-of-the-art methods and shows promise for transparent conversation transition.
When Beards Start Shaving Men: A Subject-object Resolution Test Suite for Morpho-syntactic and Semantic Model Introspection
Patricia Fischer, Daniël de Kok and Erhard Hinrichs
In this paper, we introduce the SORTS Subject-Object Resolution Test Suite of German minimal sentence pairs for model introspection. The full test suite consists of 18,502 transitive clauses with manual annotations of 8 word order types, 5 morpho-syntactic and 11 semantic property classes. The test suite has been constructed such that sentences are minimal pairs with respect to a property class. Each property has been selected with a particular focus on its effect on subject-object resolution, the second-most error-prone task within syntactic parsing of German after prepositional phrase attachment (Fischer et al., 2019). The size and detail of the annotations make the test suite a valuable resource for natural language processing applications with syntactic and semantic tasks. We use dependency parsing to demonstrate how the test suite allows insights into the process of subject-object resolution. Based on the test suite annotations, word order and case syncretism can be identified as the most important factors affecting subject-object resolution. SORTS is available online at www.url-final-submission.com and will continuously be extended.
WikiUMLS: Aligning UMLS to Wikipedia via Cross-lingual Neural Ranking
Afshin Rahimi, Timothy Baldwin and Karin Verspoor
We present our work on aligning the Unified Medical Language System (UMLS) to Wikipedia, to facilitate manual alignment of the two resources. We propose a cross-lingual neural reranking model to match a UMLS concept with a Wikipedia page, which achieves a recall at 1 of 71%, a substantial improvement of 20% over word- and char-level BM25, enabling manual alignment with minimal effort. We release our resources, including ranked Wikipedia pages for 700k UMLS concepts, and WikiMesh, a dataset for training and evaluation of alignment models between UMLS and Wikipedia. This will provide easier access to Wikipedia for health professionals, patients, and NLP systems, including in multilingual settings.
Wiktionary Normalization of Translations and Morphological Information
Winston Wu and David Yarowsky
We extend the Yawipa Wiktionary Parser (Wu and Yarowsky, 2020) to extract and normalize translations from etymology glosses, and form-of relations, resulting in 300K unique translations and over 4 million instances of 168 annotated morphological relations. We propose a method to identify typos in translation annotations. Using the extracted morphological data, we develop multilingual neural models for predicting three types of word formation---clipping, contraction, and eye dialect---and improve upon a standard attention baseline by using copy attention.
Word Embedding Binarization with Semantic Information Preservation
Samarth Navali, Praneet Sherki, Ramesh Inturi and Vanraj Vala
With the growing application of machine learning in daily life, Natural Language Processing (NLP) has emerged as a heavily researched area. NLP models are vital in tasks ranging from simple Q/A chatbots to fully fledged conversational AI. Word and sentence embeddings are among the most common starting points of any NLP task. A word embedding represents a given word in a predefined vector space while maintaining vector relations with similar or dissimilar entities. Accordingly, various pretrained embeddings such as Word2Vec, GloVe, and fastText have been developed. These embeddings, generated over millions of words, are however very large in size, and their floating-point precision also makes downstream evaluation slow. In this paper we present a novel method to convert continuous embeddings into a binary representation, reducing the overall size of the embeddings while keeping the semantic and relational knowledge intact. This makes it possible to port such large embeddings onto devices where space is limited. We also present different approaches suited to different downstream tasks, depending on the required contextual and semantic information. Experiments show comparable results in downstream tasks with a 7- to 15-fold reduction in file size and about a 5% change in evaluation metrics.
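A minimal sketch of the basic idea, thresholding each embedding dimension at its median to obtain binary codes and packing the bits, is given below; the threshold choice and the resulting compression ratio are illustrative assumptions and do not reproduce the paper's method or its 7- to 15-fold figures.

```python
import numpy as np

def binarize(embeddings):
    # One simple scheme: set a bit when a value exceeds the per-dimension median.
    thresholds = np.median(embeddings, axis=0)
    return (embeddings > thresholds).astype(np.uint8)

emb = np.random.randn(10000, 300).astype(np.float32)   # stand-in for GloVe/Word2Vec
binary = np.packbits(binarize(emb), axis=1)             # pack 8 bits per byte

# float32 dimensions become single bits, so the in-memory footprint shrinks sharply.
print(emb.nbytes / binary.nbytes)
```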
Words are the Window to the Soul: Language-based User Representations for Fake News Detection
Marco Del Tredici and Raquel Fernández
Cognitive and social traits of individuals are reflected in language use. Moreover, individuals who are prone to spread fake news online often share common traits. Building on these ideas, we introduce a model that creates representations of individuals on social media based only on the language they produce, and use them to detect fake news. We show that language-based user representations are beneficial for this task. We also present an extended analysis of the language of fake news spreaders, showing that its main features are mostly domain independent and consistent across two English datasets. Finally, we exploit the relation between language use and connections in the social graph to assess the presence of the Echo Chamber effect in our data.
Would you describe a leopard as yellow? Evaluating crowd-annotations with justified and informative disagreement
Pia Sommerauer, Antske Fokkens and Piek Vossen
Semantic annotation tasks contain ambiguity and vagueness and require varying degrees of world knowledge. Disagreement is an important indication of these phenomena. Most traditional evaluation methods, however, critically hinge upon the notion of inter-annotator agreement. While alternative frameworks have been proposed, they do not move beyond agreement as the most important indicator of quality. Critically, evaluations usually do not distinguish between instances in which agreement is expected and instances in which disagreement is not only valid, but desired because it captures the linguistic and cognitive phenomena in the data. We attempt to overcome these limitations using the example of a dataset that provides semantic representations for diagnostic experiments on language models. Ambiguity, vagueness and difficulty are highly relevant for semantic representations and diagnostic experiments require highly informative data. We establish an additional, agreement-independent quality metric based on answer-coherence and evaluate it in comparison to existing metrics. We compare against a gold standard and evaluate on expected disagreement. Despite generally low agreement, annotations follow expected behavior and have high accuracy when selected based on coherence. We show that combining different quality metrics enables a more comprehensive evaluation than relying exclusively on agreement.
WSL-DS: Weakly Supervised Learning with Distant Supervision for Query Focused Multi-Document Abstractive Summarization
Md Tahmid Rahman Laskar, Enamul Hoque and Jimmy Xiangji Huang
In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given, and the goal is to generate a summary from these documents based on the given query. However, one major challenge for this task is the lack of labeled training datasets. To overcome this issue, in this paper, we propose a novel weakly supervised approach that utilizes distant supervision. In particular, we use datasets similar to the target dataset as our training data, where we utilize sentence similarity models to generate a weak reference summary for each individual document from the multi-document gold summaries to train our summarization model. Experimental results on Document Understanding Conferences (DUC) datasets show that our proposed approach sets a new state-of-the-art result on various evaluation metrics.
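The sketch below illustrates the general distant-supervision step, selecting, for each document, the sentences most similar to the multi-document gold summary as a weak reference; TF-IDF cosine similarity and the top-k cut-off are stand-ins for whatever sentence similarity model the authors actually use.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def weak_reference(doc_sentences, gold_summary_sentences, k=2):
    """Pick the k document sentences closest to any gold summary sentence."""
    vec = TfidfVectorizer().fit(doc_sentences + gold_summary_sentences)
    doc_m = vec.transform(doc_sentences)
    gold_m = vec.transform(gold_summary_sentences)
    # Score each document sentence by its best match against the gold summary.
    scores = cosine_similarity(doc_m, gold_m).max(axis=1)
    top = scores.argsort()[::-1][:k]
    return [doc_sentences[i] for i in sorted(top)]

doc = ["The storm hit the coast on Monday.",
       "Officials ordered evacuations.",
       "Local sports results were unaffected."]
gold = ["A storm forced coastal evacuations."]
print(weak_reference(doc, gold))
```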
XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection
Emily Öhman, Kaisla Kajava, Marc Pàmies and Jörg Tiedemann
We introduce XED, a multilingual fine-grained human-annotated emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 43 additional languages, providing new resources to many low-resource languages. We use Plutchik's core emotions to annotate the dataset with the addition of neutral. The dataset is carefully evaluated using language-specific BERT models to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
XHate-999: Analyzing and Detecting Abusive Language Across Domains and Languages
Goran Glavaš, Mladen Karan and Ivan Vulić
We present XHate-999, a multi-domain and multilingual evaluation data set for abusive language detection. By aligning test instances across six typologically diverse languages, XHate-999 for the first time allows for disentanglement of the domain transfer and language transfer effects in abusive language detection. We conduct a series of domain- and language-transfer experiments with state-of-the-art monolingual and multilingual transformer models, setting strong baseline results and profiling XHate-999 as a comprehensive evaluation resource for abusive language detection. Finally, we show that domain and language adaptation, via intermediate masked language modeling on abusive corpora in the target language, can lead to substantially improved abusive language detection in the target language in zero-shot transfer setups.
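The intermediate adaptation step can be approximated with standard masked language modeling, as in the hedged sketch below using Hugging Face Transformers; the model name, placeholder corpus, and hyperparameters are assumptions, not the authors' configuration.

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Placeholder in-domain, target-language corpus; in practice this would be
# abusive-language text in the target language.
texts = ["<target-language in-domain sentence 1>",
         "<target-language in-domain sentence 2>"]

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

encodings = tokenizer(texts, truncation=True, max_length=128)
dataset = [{"input_ids": ids, "attention_mask": mask}
           for ids, mask in zip(encodings["input_ids"], encodings["attention_mask"])]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm-adapted", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
# Afterwards, the adapted encoder would be fine-tuned for abusive language detection.
```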
“Suggest me a movie for tonight”: Leveraging Knowledge Graphs for Conversational Recommendation
Rajdeep Sarkar, Koustava Goswami, Mihael Arcan and John P. McCrae
Conversational recommender systems focus on the task of suggesting products to users based on the conversation flow. Recently, the use of external knowledge in the form of knowledge graphs has been shown to improve the performance of recommendation and dialogue systems. Information from knowledge graphs aids in enriching those systems by providing additional information such as closely related products and textual descriptions of the items. However, knowledge graphs are incomplete since they do not contain all factual information present on the web. Moreover, when working in a specific domain, a knowledge graph used in its entirety contributes extraneous information and noise. In this work, we study several subgraph construction methods and compare their performance on the recommendation task. We incorporate pre-trained embeddings from the subgraphs along with positional embeddings in our models. Extensive experiments show that our method yields a relative improvement of at least 5.62% over the state of the art on multiple metrics for the recommendation task.
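As an illustration of one possible subgraph construction strategy, the sketch below keeps the k-hop neighbourhood around items mentioned in the conversation; the toy triples and the choice of k are assumptions, and the paper compares several such strategies.

```python
from collections import deque

def k_hop_subgraph(triples, seed_entities, k=2):
    """Return the triples whose endpoints lie within k hops of the seed entities."""
    adj = {}
    for h, r, t in triples:
        adj.setdefault(h, []).append((r, t))
        adj.setdefault(t, []).append((r, h))
    keep = set(seed_entities)
    frontier = deque((e, 0) for e in seed_entities)
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for _, neighbour in adj.get(node, []):
            if neighbour not in keep:
                keep.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return [(h, r, t) for h, r, t in triples if h in keep and t in keep]

# Toy movie knowledge graph; the seed comes from items mentioned in the dialogue.
kg = [("Inception", "directedBy", "Christopher Nolan"),
      ("Interstellar", "directedBy", "Christopher Nolan"),
      ("Christopher Nolan", "bornIn", "London")]
print(k_hop_subgraph(kg, ["Inception"], k=1))
```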
“What is on your mind?” Automated Scoring of Mindreading in Childhood and Early Adolescence
Venelin Kovatchev, Philip Smith, Mark Lee, Imogen Grumley Traynor, Irene Luque Aguilera and Rory Devine
In this paper we present the first work on the automated scoring of mindreading ability in middle childhood and early adolescence. We create MIND-CA, a new corpus of 11,726 question-answer pairs in English from 1066 children aged from 7 to 14. We perform several machine learning experiments and carry out extensive quantitative and qualitative evaluation proposed by our interdisciplinary team of psychologists, computer scientists, and linguists. We obtain promising results, demonstrating the applicability of state-of-the-art NLP solutions to a new domain and task.