
Generating Shakespeare-like text with an n-gram language model is straightforward and quite simple. But don't expect too much of it. It will not be able to recreate a lost Shakespeare play for you ;-) It's merely a parrot, making up well-sounding sentences out of fragments of original Shakespeare texts...

#ise2025 #lecture #nlp #llm #languagemodel @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique #shakespeare #generativeAI #statistics
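The n-gram "parrot" above can be sketched in a few lines of Python. The toy corpus and the bigram order are assumptions for illustration; a real experiment would load the full plays and typically use higher-order n-grams:

```python
import random
from collections import defaultdict

# Toy corpus standing in for a real Shakespeare text file (assumption:
# you would load the complete plays here instead).
corpus = (
    "to be or not to be that is the question "
    "whether tis nobler in the mind to suffer"
).split()

# Build a bigram model: map each word to the list of observed successors.
model = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    model[w1].append(w2)

def generate(seed, length=8):
    """Generate text by repeatedly sampling a successor of the last word."""
    out = [seed]
    for _ in range(length):
        successors = model.get(out[-1])
        if not successors:  # dead end: no observed successor
            break
        out.append(random.choice(successors))
    return " ".join(out)

print(generate("to"))
```

Because successors are sampled in proportion to their counts, the output recombines original fragments, which is exactly why it sounds Shakespearean without ever being new Shakespeare.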

This week, we were discussing the central question "Can we predict a word?" as the basis for statistical language models in our #ISE2025 lecture. Of course, I was using Shakespeare quotes to motivate the (international) students to complete the quotes with "predicted" missing words ;-)

"All the world's a stage, and all the men and women merely...."

#nlp #llms #languagemodel #Shakespeare #AIart lecture @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique #brushUpYourShakespeare
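Filling in the "predicted" missing word boils down to picking the most frequent successor of the preceding word in a bigram table. The counts below are invented for illustration, not taken from a real corpus:

```python
from collections import Counter

# Hypothetical bigram counts (assumption: in practice these would be
# gathered from a large text collection).
bigram_counts = Counter({
    ("merely", "players"): 5,
    ("merely", "a"): 2,
    ("merely", "parrots"): 1,
})

def predict(context):
    """Return the most likely word to follow `context` under the counts."""
    candidates = {w2: c for (w1, w2), c in bigram_counts.items() if w1 == context}
    return max(candidates, key=candidates.get) if candidates else None

print(predict("merely"))  # → players
```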

Last week, our students learned how to conduct a proper evaluation of an NLP experiment. To this end, we introduced a small text corpus with sentences about Joseph Fourier, who counts as one of the discoverers of the greenhouse effect responsible for global warming.

github.com/ISE-FIZKarlsruhe/IS

#ise2025 #nlp #lecture #climatechange #globalwarming #historyofscience #climate @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique
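A minimal sketch of such an evaluation, assuming the task is extracting items from the Fourier corpus and comparing them against a gold standard. The gold and predicted sets below are hypothetical:

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 for sets of extracted items."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical extraction results on the Fourier corpus.
gold = {"Joseph Fourier", "greenhouse effect", "global warming"}
pred = {"Joseph Fourier", "greenhouse effect", "France"}
print(precision_recall_f1(gold, pred))
```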

The last leg of our brief history of NLP (so far) is the advent of large language models with GPT-3 in 2020 and the introduction of learning from the prompt (aka few-shot learning).

T. B. Brown et al. (2020). Language models are few-shot learners. NIPS'20

proceedings.neurips.cc/paper/2

#llms #gpt #AI #nlp #historyofscience @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique #ise2025
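"Learning from the prompt" means the task is specified entirely by demonstrations placed in the input text, with no weight updates. A sketch of such a few-shot prompt, following the translation example from the GPT-3 paper:

```python
# A few-shot prompt in the GPT-3 style: a task description, a handful of
# demonstrations, and an unfinished example for the model to complete.
prompt = """Translate English to French:

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""

print(prompt)
```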

Next stop in our NLP timeline is 2013, the introduction of low-dimensional dense word vectors - so-called "word embeddings" - based on distributional semantics, e.g. word2vec by Mikolov et al. from Google, which enabled representation learning on text.

T. Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space.
arxiv.org/abs/1301.3781

#NLP #AI #wordembeddings #word2vec #ise2025 #historyofscience @fiz_karlsruhe @fizise @tabea @sourisnumerique @enorouzi
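What dense vectors buy you is a notion of semantic similarity via cosine similarity. A toy sketch with hand-picked 3-dimensional vectors; real word2vec embeddings are learned from data and have hundreds of dimensions:

```python
import math

# Toy "embeddings" (assumption: hand-picked for illustration only).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.5, 0.9, 0.0],
    "woman": [0.5, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# The famous analogy: king - man + woman should land near queen.
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]
best = max((w for w in vectors if w != "king"),
           key=lambda w: cosine(target, vectors[w]))
print(best)  # → queen
```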

From the 1990s on, statistical n-gram language models, trained on vast text collections, became the backbone of NLP research. They fueled advancements in nearly all NLP techniques of the era, laying the groundwork for today's AI.

F. Jelinek (1997), Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA

#NLP #LanguageModels #HistoryOfAI #TextProcessing #AI #historyofscience #ISE2025 @fizise @fiz_karlsruhe @tabea @enorouzi @sourisnumerique

Next stop on our NLP timeline (as part of the #ISE2025 lecture) was Terry Winograd's SHRDLU, an early natural language understanding system developed in 1968-70 that could manipulate blocks in a virtual world.

Winograd, T. Procedures as a Representation for Data in a Computer Program for Understanding Natural Language. MIT AI Technical Report 235.
dspace.mit.edu/bitstream/handl

#nlp #lecture #historyofscience @fiz_karlsruhe @fizise @tabea @sourisnumerique @enorouzi #AI

With the advent of ELIZA, Joseph Weizenbaum's first psychotherapist chatbot, NLP took another major step with pattern-based substitution algorithms based on simple regular expressions.

Weizenbaum, Joseph (1966). ELIZA - a computer program for the study of natural language communication between man and machine. Communications of the ACM 9(1): 36–45.

dl.acm.org/doi/pdf/10.1145/365

#nlp #lecture #chatbot #llm #ise2025 #historyofScience #AI @fizise @fiz_karlsruhe @tabea @enorouzi @sourisnumerique
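ELIZA's core mechanism, pattern matching plus template substitution, fits in a few lines. The rules below are illustrative stand-ins, not Weizenbaum's original DOCTOR script:

```python
import re

# A minimal ELIZA-style rule set: (pattern, response template) pairs.
rules = [
    (re.compile(r"i need (.*)", re.IGNORECASE), r"Why do you need \1?"),
    (re.compile(r"i am (.*)", re.IGNORECASE), r"How long have you been \1?"),
    (re.compile(r".*mother.*", re.IGNORECASE), "Tell me more about your family."),
]

def respond(utterance):
    """Return the response of the first matching rule, echoing captures."""
    for pattern, template in rules:
        match = pattern.fullmatch(utterance.strip())
        if match:
            return match.expand(template)
    return "Please go on."  # default when no rule matches

print(respond("I am sad"))  # → How long have you been sad?
```

The illusion of understanding comes entirely from reflecting the user's own words back via the captured groups.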

We've been working on a little library that might be useful if you work with #TEI and NER or text analysis:

• Extract plaintext from TEI
• Run your NER/NLP tools
• Map results back into the original TEI—without breaking anything!

Perfect for adding automated annotations to existing markup.

👉 github.com/recogito/tei-stando

GitHub: recogito/tei-standoffconverter-js - Converts between an XML tree and a flat plaintext and standoff (position-based table) representation.
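The core idea, a standoff table linking plaintext character positions back to offsets in the XML source, can be sketched as follows. This simplified scanner ignores entities, comments and CDATA, unlike the actual library:

```python
def standoff_table(xml):
    """Extract plaintext from markup and record, for each plaintext
    character, its offset in the original XML source string."""
    plain, table, in_tag = [], [], False
    for i, ch in enumerate(xml):
        if ch == "<":
            in_tag = True
        elif ch == ">":
            in_tag = False
        elif not in_tag:
            plain.append(ch)
            table.append(i)
    return "".join(plain), table

tei = "<p>Joseph <hi>Fourier</hi> studied heat.</p>"
text, table = standoff_table(tei)

# A (hypothetical) NER hit on the plaintext maps back to source offsets,
# so an annotation can be anchored without rewriting the markup.
start = text.index("Fourier")
end = start + len("Fourier")
print(text[start:end], table[start], table[end - 1])
```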

Next stop in our NLP timeline are the (mostly) futile attempts at machine translation during the Cold War era. The rule-based machine translation approach relied mostly on dictionaries and grammar programs. Its major drawback was that absolutely everything had to be made explicit.

#nlp #historyofscience #ise2025 #lecture #machinetranslation #coldwar #AI #historyofAI @tabea @enorouzi @sourisnumerique @fiz_karlsruhe @fizise
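A word-for-word sketch shows the drawback of the rule-based approach: every lexicon entry and rule must be made explicit, and anything missing simply fails. The tiny lexicon below is made up for illustration:

```python
# Hypothetical English-French lexicon; a real system needed thousands of
# explicit entries plus grammar rules for agreement and word order.
lexicon = {"the": "le", "spirit": "esprit", "is": "est", "willing": "fort"}

def translate(sentence):
    """Naive word-for-word substitution; unknown words are flagged."""
    return " ".join(lexicon.get(w, f"[{w}?]") for w in sentence.lower().split())

print(translate("The spirit is willing"))  # → le esprit est fort
```

Even this "successful" output is wrong French (it should be "l'esprit"), which is exactly the kind of failure that plagued Cold War era systems.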

Next stop in our NLP timeline is Claude Elwood Shannon, who laid the foundations of statistical language modeling by recognising the relevance of n-grams for modelling properties of language and for predicting the likelihood of word sequences.

C.E. Shannon (1948). "A Mathematical Theory of Communication." web.archive.org/web/1998071501

#ise2025 #nlp #lecture #languagemodel #informationtheory #historyofscience @enorouzi @tabea @sourisnumerique @fiz_karlsruhe @fizise
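Shannon's idea can be sketched as a maximum-likelihood bigram model that scores the probability of a whole word sequence. The corpus below is a toy assumption:

```python
from collections import Counter

# Toy training corpus (assumption: stand-in for a large text collection).
corpus = "the cat sat on the mat the cat ran".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def sequence_probability(words):
    """P(w1..wn) ≈ P(w1) * Π P(wi | wi-1), with MLE bigram estimates."""
    p = unigrams[words[0]] / len(corpus)
    for w1, w2 in zip(words, words[1:]):
        if unigrams[w1] == 0:
            return 0.0
        p *= bigrams[(w1, w2)] / unigrams[w1]
    return p

print(sequence_probability(["the", "cat", "sat"]))
```

Sequences that recombine frequent bigrams score high; sequences with an unseen bigram score zero, which is why later work added smoothing.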

"Asking scientists to identify a paradigm shift, especially in real time, can be tricky. After all, truly ground-shifting updates in knowledge may take decades to unfold. But you don’t necessarily have to invoke the P-word to acknowledge that one field in particular — natural language processing, or NLP — has changed. A lot.

The goal of natural language processing is right there on the tin: making the unruliness of human language (the “natural” part) tractable by computers (the “processing” part). A blend of engineering and science that dates back to the 1940s, NLP gave Stephen Hawking a voice, Siri a brain and social media companies another way to target us with ads. It was also ground zero for the emergence of large language models — a technology that NLP helped to invent but whose explosive growth and transformative power still managed to take many people in the field entirely by surprise.

To put it another way: In 2019, Quanta reported on a then-groundbreaking NLP system called BERT without once using the phrase “large language model.” A mere five and a half years later, LLMs are everywhere, igniting discovery, disruption and debate in whatever scientific community they touch. But the one they touched first — for better, worse and everything in between — was natural language processing. What did that impact feel like to the people experiencing it firsthand?

Quanta interviewed 19 current and former NLP researchers to tell that story. From experts to students, tenured academics to startup founders, they describe a series of moments — dawning realizations, elated encounters and at least one “existential crisis” — that changed their world. And ours."

quantamagazine.org/when-chatgp

Quanta Magazine · When ChatGPT Broke an Entire Field: An Oral History. Researchers in "natural language processing" tried to tame human language. Then came the transformer.

We are starting #ISE2025 lecture 02 with a (very) brief history of #NLP, pointing out only some selected highlights. Linguist Ferdinand de Saussure laid the foundations of today's NLP by describing languages as "systems." He argued that meaning is created inside language, in the relations and differences between its parts.

Course in general linguistics. ia600204.us.archive.org/0/item

#linguistics #historyofscience @fiz_karlsruhe @fizise @enorouzi @tabea @sourisnumerique @KIT_Karlsruhe #AIFB