Papers
arxiv:2407.19527

Open Sentence Embeddings for Portuguese with the Serafim PT* encoders family

Published on Jul 28, 2024
Authors:
,
,
,
,

Abstract

A systematic study of sentence encoders for Portuguese, focusing on learning objectives and parameters for optimal performance.

Sentence encoder encode the semantics of their input, enabling key downstream applications such as classification, clustering, or retrieval. In this paper, we present Serafim PT*, a family of open-source sentence encoders for Portuguese with various sizes, suited to different hardware/compute budgets. Each model exhibits state-of-the-art performance and is made openly available under a permissive license, allowing its use for both commercial and research purposes. Besides the sentence encoders, this paper contributes a systematic study and lessons learned concerning the selection criteria of learning objectives and parameters that support top-performing encoders.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2407.19527
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 6

Browse 6 models citing this paper

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2407.19527 in a dataset README.md to link it from this page.

Spaces citing this paper 3

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.