publications

2024

NAACL

M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation

Benjamin Hsu, Xiaoyu Liu, Huayang Li, Yoshinari Fujinuma, Maria Nadejde, Xing Niu, Ron Litman, Yair Kittenplon, and Raghavendra Pappagari

In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), Jun 2024

PDF Code Data

2023

EMNLP Findings

A Multi-Modal Multilingual Benchmark for Document Image Classification

Yoshinari Fujinuma*, Siddharth Varia*, Nishant Sankaran, Srikar Appalaraju, Bonan Min, and Yogarshi Vyas

In Proceedings of the Findings of Empirical Methods in Natural Language Processing (EMNLP), Jun 2023

License: CC-BY-SA 4.0 for Wikipedia texts, CC-BY-4.0 for Eurlex documents

PDF Data

EMNLP

Comparing Biases and the Impact of Multilingual Training across Multiple Languages

Sharon Levy, Neha Anna John, Ling Liu, Yogarshi Vyas, Jie Ma, Yoshinari Fujinuma, Miguel Ballesteros, Vittorio Castelli, and Dan Roth

In Proceedings of the Empirical Methods in Natural Language Processing (EMNLP), Jun 2023

PDF

ACL Findings

Diable: Efficient Dialogue State Tracking as Operations on Tables

Pietro Lesci, Yoshinari Fujinuma, Momchil Hardalov, Chao Shang, and Lluis Marquez

In Proceedings of the Findings of Association for Computational Linguistics (ACL), Jun 2023

PDF Code

2022

ACL

Match the Script, Adapt if Multilingual: Analyzing the Effect of Multilingual Pretraining on Cross-lingual Transferability

Yoshinari Fujinuma, Jordan Boyd-Graber, and Katharina Kann

In Proceedings of the Association for Computational Linguistics (ACL), Jun 2022

PDF Code Talk (English) Talk (Japanese) Poster Slides

2021

Workshop

Semi-Supervised Joint Estimation of Word and Document Readability

Yoshinari Fujinuma and Masato Hagiwara

In Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15), Jun 2021

PDF Code Data

2020

ACL

Why Overfitting Isn’t Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries

Mozhi Zhang*, Yoshinari Fujinuma*, Michael J. Paul, and Jordan Boyd-Graber

In Proceedings of the Association for Computational Linguistics (ACL), Jun 2020

PDF Code

AAAI

Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification

Mozhi Zhang, Yoshinari Fujinuma, and Jordan Boyd-Graber

In The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), Jun 2020

PDF

2019

ACL

A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings based on Graph Modularity

Yoshinari Fujinuma, Jordan Boyd-Graber, and Michael J. Paul

In Proceedings of the Association for Computational Linguistics (ACL), Jun 2019

PDF Code Talk (English) Slides

PLOS ONE

Zika discourse in the Americas: A multilingual topic analysis of Twitter

Dasha Pruss, Yoshinari Fujinuma, Ashlynn R. Daughton, Michael J. Paul, Brad Arnot, Danielle Albers Szafir, and Jordan Boyd-Graber

PLOS ONE, May 2019

PDF

This work examines Twitter discussion surrounding the 2015 outbreak of Zika, a virus that is most often mild but has been associated with serious birth defects and neurological syndromes. We introduce and analyze a collection of 3.9 million tweets mentioning Zika geolocated to North and South America, where the virus is most prevalent. Using a multilingual topic model, we automatically identify and extract the key topics of discussion across the dataset in English, Spanish, and Portuguese. We examine the variation in Twitter activity across time and location, finding that rises in activity tend to follow major events, and geographic rates of Zika-related discussion are moderately correlated with Zika incidence (ρ = .398).

2018

Workshop

Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification

Mozhi Zhang, Yoshinari Fujinuma, and Jordan Boyd-Graber

In ACL Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing, May 2018

2017

IJCNLP

Substring Frequency Features for Segmentation of Japanese Katakana Words with Unlabeled Corpora

Yoshinari Fujinuma and Alvin Grissom II

In Proceedings of the Eighth International Joint Conference on Natural Language Processing (IJCNLP), May 2017

PDF Code

2015

PACLIC

Distant-supervised Language Model for Detecting Emotional Upsurge on Twitter

Yoshinari Fujinuma, Hikaru Yokono, Pascual Martínez-Gómez, and Akiko Aizawa

In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation (PACLIC), May 2015

PDF