#nlproc


I'll give a talk this Thursday at 5:45 at Universität zu Köln: "Adapting Language Models for the Analysis of Real World Textual Data – Train the model, change the prompt, or adapt the data?" – the talk is based on aclanthology.org/2025.coling-... and arxiv.org/abs/2412.11653 #NLProc

ACL Anthology: MOPO: Multi-Objective Prompt Optimization for Affective Text Generation. Yarik Menchaca Resendiz, Roman Klinger. Proceedings of the 31st International Conference on Computational Linguistics, 2025.

Someone from U Zurich did an undisclosed persuasion experiment on Reddit users in r/ChangeMyView using #LLM bots. This kind of social media research is absolutely unethical and the "results" should not be published.
Additional shame on the ethics committee for arguing *for* publication. In my view, this is outrageous scientific misconduct. #nlproc #academia #ethics #socialMedia
reddit.com/r/changemyview/comm


#PhD job in the Dept. of Language and Information Sciences at the University of Lausanne: my colleague Davide Picca has an open PhD position starting on October 1, 2025 in an SNSF-funded project focused on the computational analysis of Charles S. #Peirce’s manuscripts.

Deadline for application: May 19, 2025

career5.successfactors.eu/care

career5.successfactors.eu: Career opportunity: SNSF doctoral researcher in Digital Humanities and Computational Semiotic Studies (22226)

Moved all my stuff out of Dropbox (didn’t have much there); Google Drive is next (but it’s a bit more messy and complicated).
I have a colleague who, years back, was concerned about keeping work material (e.g. paper drafts, grant proposals) on Google (we’re in #nlproc, roughly the same area as they are), and I thought he was a bit paranoid. Now I think it’s probably best to keep our stuff closer to home instead of on US clouds. #academicChatter #europe #warOnScience

Yesterday the podcast "Sockenpuppenzoo - Angriff auf Wikipedia" was released, in which the investigative journalists @daniellaufer and @Schattleitner document how German Wikipedia articles were systematically manipulated by far-right networks over a period of years.

In episode 3, the two met with my students and me, among others, to discuss whether automatic authorship attribution could help uncover the identities. Three students then carried out projects on the Wikipedia data! #RUB #nlproc #wikipedia #forensischeLinguistik #podcast

ardaudiothek.de/sendung/socken


8/n

[2] Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, and Kelvin Guu. 2024. Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? arxiv.org/abs/2406.13121

arXiv.org: Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-end modeling that minimizes cascading errors in complex pipelines, and allows for the application of sophisticated prompting techniques across the entire system. To assess this paradigm shift, we introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks. However, LCLMs still face challenges in areas like compositional reasoning that are required in SQL-like tasks. Notably, prompting strategies significantly influence performance, emphasizing the need for continued research as context lengths grow. Overall, LOFT provides a rigorous testing ground for LCLMs, showcasing their potential to supplant existing paradigms and tackle novel tasks as model capabilities scale.
#NLP #NLProc #RAG

7/

REFERENCES

[1] Yifu Qiu, Varun Embar, Yizhe Zhang, Navdeep Jaitly, Shay B. Cohen, and Benjamin Han. 2025. Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models. arxiv.org/abs/2501.08248

arXiv.org: Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models
Recent advancements in long-context language models (LCLMs) promise to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their expanded context windows, LCLMs can process entire knowledge bases and perform retrieval and reasoning directly -- a capability we define as In-Context Retrieval and Reasoning (ICR^2). However, existing benchmarks like LOFT often overestimate LCLM performance by providing overly simplified contexts. To address this, we introduce ICR^2, a benchmark that evaluates LCLMs in more realistic scenarios by including confounding passages retrieved with strong retrievers. We then propose three methods to enhance LCLM performance: (1) retrieve-then-generate fine-tuning, (2) retrieval-attention-probing, which uses attention heads to filter and de-noise long contexts during decoding, and (3) joint retrieval head training alongside the generation head. Our evaluation of five well-known LCLMs on LOFT and ICR^2 demonstrates significant gains with our best approach applied to Mistral-7B: +17 and +15 points by Exact Match on LOFT, and +13 and +2 points on ICR^2, compared to vanilla RAG and supervised fine-tuning, respectively. It even outperforms GPT-4-Turbo on most tasks despite being a much smaller model.
#NLP #NLProc #RAG

6/

Through extensive experiments on five LCLMs using both the LOFT and ICR² benchmarks, our best approach on Mistral-7B with a 32K token limit outperformed Vanilla RAG and SFT baselines by an average of +17 and +15 points (Exact Match) on LOFT, and by +13 and +2 points on ICR², respectively (picture). It even achieved performance comparable to the state-of-the-art GPT-4, despite having only 7B parameters.

#NLP #NLProc #RAG
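For readers unfamiliar with the "Exact Match" metric behind the point gains above, it is typically computed by normalizing both strings and comparing them. The SQuAD-style normalization recipe below (lowercase, drop punctuation and articles, collapse whitespace) is a common convention and an assumption about the exact variant used:

```python
import re
import string


def normalize(s: str) -> str:
    """Normalize an answer string: lowercase, strip punctuation,
    remove English articles, and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(pred: str, gold: str) -> int:
    """Return 1 if the prediction matches the gold answer after
    normalization, else 0."""
    return int(normalize(pred) == normalize(gold))
```

With this recipe, `exact_match("The Eiffel Tower!", "eiffel tower")` counts as a match, while a semantically close but differently worded answer does not.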

4/

With a more realistic benchmark in hand, we systematically explored three approaches to enhance model performance:

1. Retrieve-then-generate supervised fine-tuning (picture): we train LCLMs to first retrieve relevant information from the context and then generate the final responses.

2. Retrieval-attention-probing: During inference, we probe attention heads activated for in-context retrieval, and use their top predictions to filter out confounders.

#NLP #NLProc #RAG
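A minimal sketch of the two approaches listed above. The "Retrieved:"/"Answer:" markers and the per-passage attention scores are illustrative assumptions, not the paper's exact formats:

```python
# Method 1 (retrieve-then-generate fine-tuning): the training label asks the
# model to first name its supporting passages, then produce the answer.
def make_sft_target(gold_ids: list[str], answer: str) -> str:
    """Build a hypothetical fine-tuning target: cite passages, then answer."""
    return f"Retrieved: {', '.join(gold_ids)}\nAnswer: {answer}"


# Method 2 (retrieval-attention-probing): keep only the passages that probed
# retrieval heads attend to most, dropping likely confounders before decoding.
# In real use the scores would come from the model's attention tensors; here
# they are supplied directly.
def filter_by_attention(attention_mass: dict[str, float], k: int = 2) -> list[str]:
    """Return IDs of the k passages with the highest probed attention mass."""
    return sorted(attention_mass, key=attention_mass.get, reverse=True)[:k]


target = make_sft_target(["D3", "D7"], "Lake Geneva")
kept = filter_by_attention({"D1": 0.05, "D3": 0.61, "D7": 0.22, "D9": 0.12}, k=2)
```

Here `kept` is `["D3", "D7"]`: the two passages the probed heads weighted most, with the low-attention candidates filtered out.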

3/

This limitation often leads to inflated results. To address it, we created a more realistic dataset, ICR². It uses five retrievers to generate challenging negative documents (picture 1). Our results show a significant performance drop with standard RAG setups. For example, with GPT-4-Turbo, accuracy on NQ dropped from 0.85 to 0.67, and on HPQA it fell from 0.78 to 0.64 (picture 2).

#NLP #NLProc #RAG
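The confounder-mining idea above can be sketched in a few lines: instead of sampling negatives at random, score every non-gold passage against the query with a retriever and keep the top hits. The toy lexical-overlap scorer stands in for the five strong retrievers used in the paper:

```python
import re


def overlap_score(query: str, passage: str) -> float:
    """Toy retriever score: fraction of query word types found in the passage."""
    tokenize = lambda s: set(re.findall(r"\w+", s.lower()))
    q, p = tokenize(query), tokenize(passage)
    return len(q & p) / max(len(q), 1)


def mine_confounders(query: str, passages: dict[str, str],
                     gold_ids: set[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring passages that are NOT gold answers:
    topically related but misleading, unlike randomly sampled negatives."""
    candidates = [(pid, overlap_score(query, text))
                  for pid, text in passages.items() if pid not in gold_ids]
    candidates.sort(key=lambda x: x[1], reverse=True)
    return [pid for pid, _ in candidates[:k]]


passages = {
    "P1": "Lausanne sits on the shore of Lake Geneva.",      # gold
    "P2": "Quarterly earnings rose for most tech firms.",    # random negative
    "P3": "Lake Geneva borders Switzerland and France.",     # confounder
}
confounders = mine_confounders("Which lake does Lausanne sit on?",
                               passages, gold_ids={"P1"}, k=1)
```

The miner selects P3 rather than the unrelated P2: exactly the kind of relevant-but-wrong context that random sampling misses.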

2/

But are current LCLMs up to the task? If not, how can we improve their performance?

In our preprint [1], we evaluated five popular LCLMs using the LOFT benchmark [2], which involves answering questions paired with documents. However, LOFT relies on random sampling to create irrelevant (negative) documents for each query, failing to include confounding documents — those that are relevant but misleading — which are common in real-world scenarios.

#NLP #NLProc #RAG

1/

What if #LLMs had context windows so large that an entire knowledge base could fit into a single prompt? This would revolutionize Retrieval-Augmented Generation (RAG) applications by enabling retrieval, re-ranking, reasoning, and generation all in one step. With a Long-Context Language Model (LCLM), we could simplify RAG architecture by leveraging the model’s capability for In-Context Retrieval and Reasoning (ICR²).

#NLP #NLProc #RAG
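The single-prompt setup described above can be sketched as follows; the document markers and prompt layout are illustrative assumptions, not a format from the paper:

```python
def build_icr_prompt(corpus: dict[str, str], question: str) -> str:
    """Place an entire knowledge base in one prompt so a long-context model
    can retrieve, reason, and answer in a single step, replacing the usual
    retrieve -> re-rank -> generate pipeline."""
    parts = ["You are given a corpus of documents. Answer using only them.\n"]
    for doc_id, text in corpus.items():
        parts.append(f"[{doc_id}] {text}")
    parts.append(f"\nQuestion: {question}\nAnswer:")
    return "\n".join(parts)


corpus = {
    "D1": "Lausanne is a city on Lake Geneva in Switzerland.",
    "D2": "Mistral-7B is a 7-billion-parameter language model.",
}
prompt = build_icr_prompt(corpus, "Which lake is Lausanne on?")
```

One prompt now carries the whole corpus, so retrieval quality becomes a property of the model's in-context attention rather than of an external retriever.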