aidevblogs
Uncovering Competency Gaps in Large Language Models and Their Benchmarks
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
Large Language Models Approach Expert Pedagogical Quality in Math Tutoring but Differ in Instructional and Linguistic Profiles
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
Investigating Model Editing for Unlearning in Large Language Models
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
Semantic Deception: When Reasoning Models Can't Compute an Addition
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
EssayCBM: Rubric-Aligned Concept Bottleneck Models for Transparent Essay Grading
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
MediEval: A Unified Medical Benchmark for Patient-Contextual and Knowledge-Grounded Reasoning in LLMs
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
How important is Recall for Measuring Retrieval Quality?
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
Architectural Trade-offs in Small Language Models Under Compute Constraints
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
Neural Probe-Based Hallucination Detection for Large Language Models
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
MultiMind at SemEval-2025 Task 7: Crosslingual Fact-Checked Claim Retrieval via Multi-Source Alignment
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
Reflection Pretraining Enables Token-Level Self-Correction in Biological Sequence Models
arXiv CS.CL · arxiv.org · 1 day ago · LLMs
Automatic Replication of LLM Mistakes in Medical Conversations
arXiv CS.CL · arxiv.org · 1 day ago · LLMs