prometheus-eval

university

Recent Activity

jinulee-v authored a paper 5 days ago

Evaluating Step-by-step Reasoning Traces: A Survey

jinulee-v authored a paper 5 days ago

Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators

jinulee-v authored a paper 5 days ago

LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation

authored 8 papers 5 days ago

Evaluating Step-by-step Reasoning Traces: A Survey

Paper • 2502.12289 • Published Feb 17, 2025

Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators

Paper • 2503.19877 • Published Mar 25, 2025 • 2

LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation

Paper • 2505.23832 • Published May 28, 2025 • 1

Cognitive Foundations for Reasoning and Their Manifestation in LLMs

Paper • 2511.16660 • Published Nov 20, 2025 • 11

Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics

Paper • 2512.01020 • Published Nov 30, 2025 • 1

Entailment-Preserving First-order Logic Representations in Natural Language Entailment

Paper • 2502.16757 • Published Feb 24, 2025

SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning

Paper • 2402.12806 • Published Feb 5, 2025

ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces

Paper • 2606.05402 • Published 11 days ago • 1

authored 2 papers 8 days ago

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

Paper • 2605.26457 • Published 19 days ago • 6

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published 13 days ago • 56

authored 4 papers 11 days ago

A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls

Paper • 2412.01340 • Published Dec 2, 2024

K-EXAONE Technical Report

Paper • 2601.01739 • Published Jan 5 • 95

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Paper • 2605.20668 • Published 25 days ago • 12

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published 13 days ago • 56

updated a dataset 11 days ago

prometheus-eval/k-browsecomp

Viewer • Updated 11 days ago • 700 • 904 • 6

submitted a paper to Daily Papers 11 days ago

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published 13 days ago • 56

updated a dataset 12 days ago

prometheus-eval/k-browsecomp

Viewer • Updated 11 days ago • 700 • 904 • 6

published a dataset 12 days ago

prometheus-eval/k-browsecomp

Viewer • Updated 11 days ago • 700 • 904 • 6

authored 2 papers 15 days ago

CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation

Paper • 2505.24456 • Published May 30, 2025

PRISM: Fine-Grained Paper-to-Paper Retrieval with Multi-Aspect-Aware Query Optimization

Paper • 2507.10057 • Published Jul 14, 2025