We introduce a series of six papers on graph-to-text and text-to-graph generation:
- Method Papers:
- (1) Top-1 supervised system at WebNLG 2020 Challenge,
- (2) Unsupervised cycle training of graph-to-text and text-to-graph generation (Oral@INLG 2020 Workshop),
- (3) Unsupervised one-to-many cycle training (AISTATS 2021).
- Dataset Paper:
- Distant supervision dataset (COLING 2020).
- Text-to-Graph Sub-Method Papers:
- (1) Document-level relation extraction,
- (2) Document-level named entity recognition (NAACL 2019).
What are Graph-to-Text and Text-to-Graph Tasks?
Text-to-Graph (T2G) parses graphical knowledge out of text. Its important applications include knowledge graph and database construction from text documents.
Graph-to-Text (G2T) generates a text description from graphical data. It can be applied to verbalize knowledge graphs, which is very useful for intelligent bots.
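To make the two directions concrete, here is a small hypothetical example (the triple and sentence are illustrative, not taken from any of the datasets below):

```python
# A knowledge graph is typically a set of (subject, relation, object) triples.
graph = [("Alan_Bean", "occupation", "Astronaut")]

# A matching natural-language description of the same content.
text = "Alan Bean worked as an astronaut."

# G2T maps graph -> text; T2G maps text -> graph.
print(graph[0])  # ('Alan_Bean', 'occupation', 'Astronaut')
```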
Paper Series (by Amazon AI, Shanghai)
Method Paper #1 (Top 1 @WebNLG 2020 Challenge)
Our first model, P2, approaches the supervised G2T task with a plan-and-pretrain approach based on the T5 model. It ranked first on the leaderboard of the WebNLG 2020 Challenge at the INLG conference.
Method Paper #2 (INLG 2020 Workshop)
Our second model, CycleGT, addresses unsupervised G2T and T2G. We formulate the two directions as a joint learning task trained by cycle training, and the performance is on par with supervised baselines. This paper was an oral presentation at the WebNLG workshop at INLG 2020.
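The core idea of cycle training can be sketched as follows. This is a toy illustration only, with rule-based stand-ins where CycleGT uses learned neural models: each direction supervises the other through reconstruction cycles over non-parallel graphs and texts.

```python
# Toy stand-in for a G2T model: verbalize each (subject, relation, object) triple.
def g2t(graph):
    return " . ".join(f"{s} {r} {o}" for s, r, o in graph)

# Toy stand-in for a T2G model: parse "subject relation object" clauses back to triples.
def t2g(text):
    return [tuple(clause.split()) for clause in text.split(" . ")]

def cycle_losses(graphs, texts):
    # Cycle 1 (graph -> text -> graph): reconstruction error on unpaired graphs.
    g_loss = sum(t2g(g2t(g)) != g for g in graphs)
    # Cycle 2 (text -> graph -> text): reconstruction error on unpaired texts.
    t_loss = sum(g2t(t2g(t)) != t for t in texts)
    return g_loss, t_loss

# Non-parallel data: the graphs and texts need not describe the same facts.
graphs = [[("Alan_Bean", "birthPlace", "Wheeler")]]
texts = ["Alan_Bean occupation Astronaut"]
print(cycle_losses(graphs, texts))  # (0, 0): both toy cycles reconstruct exactly
```

In the real model both directions are trainable, and the reconstruction errors become differentiable losses that are minimized jointly.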
Method Paper #3 (AISTATS 2021)
Our third model, CycleCVAE, extends the cycle training framework to one-to-many mappings between graphs and text. In contrast with cycle training methods that assume a one-to-one mapping (i.e., one graph corresponds to one text description), our work enables a one-to-many mapping (i.e., one graph corresponds to multiple text descriptions). We achieve this by combining cycle training with a conditional variational autoencoder (CVAE).
Our CycleCVAE paper has been accepted at AISTATS 2021.
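The one-to-many behavior can be illustrated with a toy sketch (the templates and the integer latent code below are illustrative stand-ins, not the actual CVAE): a latent variable z selects among several valid verbalizations of the same graph.

```python
# Two equally valid surface realizations of a single triple.
TEMPLATES = [
    "{s}'s {r} is {o}.",
    "The {r} of {s} is {o}.",
]

def g2t_with_latent(triple, z):
    # z plays the role of the CVAE latent code: different z, different text,
    # same underlying graph content.
    s, r, o = triple
    return TEMPLATES[z % len(TEMPLATES)].format(s=s, r=r, o=o)

triple = ("Alan_Bean", "birthPlace", "Wheeler")
texts = {g2t_with_latent(triple, z) for z in range(4)}
print(len(texts))  # 2 distinct verbalizations of one graph
```

In CycleCVAE, z is a continuous latent variable learned jointly with the cycle-training objective, so sampling different z at inference time yields diverse descriptions.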
Dataset paper (COLING 2020)
We build a distantly supervised dataset, called GenWiki, for G2T and T2G conversion. GenWiki contains 1.3 million texts and graphs that share content, and it can be used for unsupervised or distantly supervised training. The GenWiki paper appeared at the top NLP conference COLING 2020.
Sub-Method Paper #1
A component in our CycleGT/CycleCVAE framework is the T2G task. T2G is usually done in two steps: named entity recognition (NER) and relation extraction (RE). In our paper “Relation of the Relations” (RoR), we address document-level relation extraction: the RoR model extracts relations among given entities in a text document. We use graph neural networks (GNNs) to model both the relations among multiple entities and the meta-level interdependencies among multiple relations.
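The standard two-step T2G pipeline described above can be sketched as follows, with toy rule-based stand-ins for the neural NER and RE components:

```python
def ner(text):
    # Step 1: named entity recognition.
    # Toy rule: treat capitalized tokens as entity mentions.
    return [tok for tok in text.replace(".", "").split() if tok[0].isupper()]

def relation_extraction(text, entities):
    # Step 2: relation extraction over the recognized entities.
    # Toy rule: a keyword pattern maps to a relation between the first two entities.
    relations = []
    if "born in" in text:
        relations.append((entities[0], "birthPlace", entities[1]))
    return relations

text = "Alan_Bean was born in Wheeler."
entities = ner(text)
print(relation_extraction(text, entities))  # [('Alan_Bean', 'birthPlace', 'Wheeler')]
```

In RoR and GraphIE, both steps are neural models, and GNNs propagate information across the whole document rather than matching local patterns.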
Our model outperforms the state-of-the-art approaches by +1.12% on the ACE05 dataset and +2.55% on SemEval 2018 Task 7.2.
Sub-Method Paper #2 (NAACL 2019)
Besides relation extraction, the other component of T2G is named entity recognition (NER). We have an earlier paper, GraphIE, on document-level NER using graph neural networks (GNNs). The GraphIE paper appeared at NAACL 2019.
Since many of our papers use graph neural networks (GNNs), we have also written a brief survey on GNNs for NLP:
Graph Neural Net Applications for Natural Language Processing
Xipeng Qiu, Zhijing Jin, Xiangkun Hu
The researchers behind the above papers are from Amazon AI Shanghai (China), the Max Planck Institute for Intelligent Systems, Tübingen (Germany), Fudan University (China), Tsinghua University (China), Shanghai Jiao Tong University (China), and the Massachusetts Institute of Technology (US).