Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training • Paper • 2506.01732 • Published Jun 2, 2025
NLP for Economics 1.2 • Collection • NLP tools for sentiment analysis and relevance detection • 4 items • Updated Mar 25, 2025
Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language • Paper • 2103.08052 • Published Mar 14, 2021