aidevblogs
Uncovering Competency Gaps in Large Language Models and Their Benchmarks
arXiv CS.CL·arxiv.org·1 day ago·LLMs
SA-DiffuSeq: Addressing Computational and Scalability Challenges in Long-Document Generation with Sparse Attention
arXiv CS.CL·arxiv.org·1 day ago·Computer Vision
TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior
arXiv CS.CL·arxiv.org·1 day ago·LLMs
Adversarial Training for Failure-Sensitive User Simulation in Mental Health Dialogue Optimization
arXiv CS.CL·arxiv.org·1 day ago·MLOps
Large Language Models Approach Expert Pedagogical Quality in Math Tutoring but Differ in Instructional and Linguistic Profiles
arXiv CS.CL·arxiv.org·1 day ago·LLMs
Investigating Model Editing for Unlearning in Large Language Models
arXiv CS.CL·arxiv.org·1 day ago·LLMs
Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?
arXiv CS.CL·arxiv.org·1 day ago·LLMs
Semantic Deception: When Reasoning Models Can't Compute an Addition
arXiv CS.CL·arxiv.org·1 day ago·LLMs
EssayCBM: Rubric-Aligned Concept Bottleneck Models for Transparent Essay Grading
arXiv CS.CL·arxiv.org·1 day ago·LLMs
MediEval: A Unified Medical Benchmark for Patient-Contextual and Knowledge-Grounded Reasoning in LLMs
arXiv CS.CL·arxiv.org·1 day ago·LLMs
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
arXiv CS.CL·arxiv.org·1 day ago·LLMs
How important is Recall for Measuring Retrieval Quality?
arXiv CS.CL·arxiv.org·1 day ago·LLMs
NVIDIA Nemotron 3: Efficient and Open Intelligence
arXiv CS.CL·arxiv.org·1 day ago·Agents
Architectural Trade-offs in Small Language Models Under Compute Constraints
arXiv CS.CL·arxiv.org·1 day ago·LLMs
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation
arXiv CS.CL·arxiv.org·1 day ago·LLMs
Foundation Model-based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study
arXiv CS.CL·arxiv.org·1 day ago·Research
Neural Probe-Based Hallucination Detection for Large Language Models
arXiv CS.CL·arxiv.org·1 day ago·LLMs
MultiMind at SemEval-2025 Task 7: Crosslingual Fact-Checked Claim Retrieval via Multi-Source Alignment
arXiv CS.CL·arxiv.org·1 day ago·LLMs
Reflection Pretraining Enables Token-Level Self-Correction in Biological Sequence Models
arXiv CS.CL·arxiv.org·1 day ago·LLMs
Automatic Replication of LLM Mistakes in Medical Conversations
arXiv CS.CL·arxiv.org·1 day ago·LLMs