arxiv:2307.06018

PolyLM: An Open Source Polyglot Large Language Model

Published on Jul 12, 2023
Submitted by AK on Jul 13, 2023
#3 Paper of the day

Abstract

PolyLM, a multilingual LLM trained on 640 billion tokens, enhances multilingual capabilities through bilingual data and curriculum learning, outperforming other models on multilingual tasks while maintaining English performance.

Large language models (LLMs) demonstrate remarkable ability to comprehend, reason, and generate following natural language instructions. However, the development of LLMs has been primarily focused on high-resource languages, such as English, thereby limiting their applicability and research in other languages. Consequently, we present PolyLM, a multilingual LLM trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B. To enhance its multilingual capabilities, we 1) integrate bilingual data into the training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training. Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning. To assess the model's performance, we collect several existing multilingual tasks, including multilingual understanding, question answering, generation, and translation. Extensive experiments show that PolyLM surpasses other open-source models such as LLaMA and BLOOM on multilingual tasks while maintaining comparable performance in English. Our models, along with the instruction data and multilingual benchmark, are available at: https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation.
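
The curriculum described in the abstract can be pictured as a data-mixing schedule. The sketch below is illustrative only and is not the authors' implementation: the abstract gives just the two proportions (30% non-English in the first stage, 60% in the final stage), so the two-stage step function, the `stage1_steps` boundary, and the helper names `non_english_fraction` and `sample_bucket` are all assumptions.

```python
import random

# Proportions quoted in the abstract.
FIRST_STAGE_NON_ENGLISH = 0.30
FINAL_STAGE_NON_ENGLISH = 0.60


def non_english_fraction(step: int, stage1_steps: int) -> float:
    """Target share of non-English data at a given training step.

    Assumption: a hard switch between two stages at `stage1_steps`.
    The abstract does not say where the boundary lies or whether
    the transition is gradual.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < stage1_steps:
        return FIRST_STAGE_NON_ENGLISH
    return FINAL_STAGE_NON_ENGLISH


def sample_bucket(step: int, stage1_steps: int, rng: random.Random) -> str:
    """Pick which (hypothetical) data bucket the next example comes from."""
    p = non_english_fraction(step, stage1_steps)
    return "non_english" if rng.random() < p else "english"
```

A data loader could call `sample_bucket` once per example, so the mix shifts toward non-English text once training passes the assumed stage boundary.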

Community

PolyLM

Breaking Language Barriers: PolyLM - The Open Source Polyglot LLM

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix

Get this paper in your agent:

hf papers read 2307.06018
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 8

Datasets citing this paper 0

Spaces citing this paper 4

Collections including this paper 1