
Generating Shakespeare-like text with an n-gram language model is straightforward and quite simple. But don't expect too much of it. It will not be able to recreate a lost Shakespeare play for you ;-) It's merely a parrot, making up well-sounding sentences out of fragments of original Shakespeare texts...

#ise2025 #lecture #nlp #llm #languagemodel @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique #shakespeare #generativeAI #statistics
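The n-gram "parrot" above can be sketched in a few lines of Python. The toy corpus and the bigram order are assumptions for illustration; a real experiment would load the full plays and typically use higher-order n-grams:

```python
import random
from collections import defaultdict

# Toy corpus standing in for a real Shakespeare text file (assumption:
# you would load the complete plays here instead).
corpus = (
    "to be or not to be that is the question "
    "whether tis nobler in the mind to suffer"
).split()

# Build a bigram model: map each word to the list of observed successors.
model = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    model[w1].append(w2)

def generate(seed, length=8):
    """Generate text by repeatedly sampling a successor of the last word."""
    out = [seed]
    for _ in range(length):
        successors = model.get(out[-1])
        if not successors:  # dead end: no observed successor
            break
        out.append(random.choice(successors))
    return " ".join(out)

print(generate("to"))
```

Because successors are sampled in proportion to their counts, the output recombines original fragments, which is exactly why it sounds Shakespearean without ever being new Shakespeare.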

This week, we were discussing the central question "Can we predict a word?" as the basis for statistical language models in our #ISE2025 lecture. Of course, I was using Shakespeare quotes to motivate the (international) students to complete the quotes with "predicted" missing words ;-)

"All the world's a stage, and all the men and women merely...."

#nlp #llms #languagemodel #Shakespeare #AIart lecture @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique #brushUpYourShakespeare
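Filling in the "predicted" missing word boils down to picking the most frequent successor of the preceding word in a bigram table. The counts below are invented for illustration, not taken from a real corpus:

```python
from collections import Counter

# Hypothetical bigram counts (assumption: in practice these would be
# gathered from a large text collection).
bigram_counts = Counter({
    ("merely", "players"): 5,
    ("merely", "a"): 2,
    ("merely", "parrots"): 1,
})

def predict(context):
    """Return the most likely word to follow `context` under the counts."""
    candidates = {w2: c for (w1, w2), c in bigram_counts.items() if w1 == context}
    return max(candidates, key=candidates.get) if candidates else None

print(predict("merely"))  # → players
```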

Last week, our students learned how to conduct a proper evaluation of an NLP experiment. To this end, we introduced a small text corpus with sentences about Joseph Fourier, who counts as one of the discoverers of the greenhouse effect responsible for global warming.

github.com/ISE-FIZKarlsruhe/IS

#ise2025 #nlp #lecture #climatechange #globalwarming #historyofscience #climate @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique
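A minimal sketch of such an evaluation, assuming the task is extracting items from the Fourier corpus and comparing them against a gold standard. The gold and predicted sets below are hypothetical:

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 for sets of extracted items."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical extraction results on the Fourier corpus.
gold = {"Joseph Fourier", "greenhouse effect", "global warming"}
pred = {"Joseph Fourier", "greenhouse effect", "France"}
print(precision_recall_f1(gold, pred))
```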

The last leg of our brief history of NLP (so far) is the advent of large language models with GPT-3 in 2020 and the introduction of learning from the prompt (aka few-shot learning).

T. B. Brown et al. (2020). Language models are few-shot learners. NIPS'20

proceedings.neurips.cc/paper/2

#llms #gpt #AI #nlp #historyofscience @fiz_karlsruhe @fizise @tabea @enorouzi @sourisnumerique #ise2025
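"Learning from the prompt" means the task is specified entirely by demonstrations placed in the input text, with no weight updates. A sketch of such a few-shot prompt, following the translation example from the GPT-3 paper:

```python
# A few-shot prompt in the GPT-3 style: a task description, a handful of
# demonstrations, and an unfinished example for the model to complete.
prompt = """Translate English to French:

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""

print(prompt)
```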

Next stop in our NLP timeline is 2013, the introduction of low-dimensional dense word vectors - so-called "word embeddings" - based on distributional semantics, e.g. word2vec by Mikolov et al. from Google, which enabled representation learning on text.

T. Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space.
arxiv.org/abs/1301.3781

#NLP #AI #wordembeddings #word2vec #ise2025 #historyofscience @fiz_karlsruhe @fizise @tabea @sourisnumerique @enorouzi
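What dense vectors buy you is a notion of semantic similarity via cosine similarity. A toy sketch with hand-picked 3-dimensional vectors; real word2vec embeddings are learned from data and have hundreds of dimensions:

```python
import math

# Toy "embeddings" (assumption: hand-picked for illustration only).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.5, 0.9, 0.0],
    "woman": [0.5, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# The famous analogy: king - man + woman should land near queen.
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]
best = max((w for w in vectors if w != "king"),
           key=lambda w: cosine(target, vectors[w]))
print(best)  # → queen
```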

From the 1990s on, statistical n-gram language models, trained on vast text collections, became the backbone of NLP research. They fueled advancements in nearly all NLP techniques of the era, laying the groundwork for today's AI.

F. Jelinek (1997), Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA

#NLP #LanguageModels #HistoryOfAI #TextProcessing #AI #historyofscience #ISE2025 @fizise @fiz_karlsruhe @tabea @enorouzi @sourisnumerique

Next stop on our NLP timeline (as part of the #ISE2025 lecture) was Terry Winograd's SHRDLU, an early natural language understanding system developed in 1968-70 that could manipulate blocks in a virtual world.

Winograd, T. Procedures as a Representation for Data in a Computer Program for Understanding Natural Language. MIT AI Technical Report 235.
dspace.mit.edu/bitstream/handl

#nlp #lecture #historyofscience @fiz_karlsruhe @fizise @tabea @sourisnumerique @enorouzi #AI

With the advent of ELIZA, Joseph Weizenbaum's first psychotherapist chatbot, NLP took another major step with pattern-based substitution algorithms based on simple regular expressions.

Weizenbaum, Joseph (1966). ELIZA - a computer program for the study of natural language communication between man and machine. Communications of the ACM 9(1): 36–45.

dl.acm.org/doi/pdf/10.1145/365

#nlp #lecture #chatbot #llm #ise2025 #historyofScience #AI @fizise @fiz_karlsruhe @tabea @enorouzi @sourisnumerique
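ELIZA's core mechanism, pattern matching plus template substitution, fits in a few lines. The rules below are illustrative stand-ins, not Weizenbaum's original DOCTOR script:

```python
import re

# A minimal ELIZA-style rule set: (pattern, response template) pairs.
rules = [
    (re.compile(r"i need (.*)", re.IGNORECASE), r"Why do you need \1?"),
    (re.compile(r"i am (.*)", re.IGNORECASE), r"How long have you been \1?"),
    (re.compile(r".*mother.*", re.IGNORECASE), "Tell me more about your family."),
]

def respond(utterance):
    """Return the response of the first matching rule, echoing captures."""
    for pattern, template in rules:
        match = pattern.fullmatch(utterance.strip())
        if match:
            return match.expand(template)
    return "Please go on."  # default when no rule matches

print(respond("I am sad"))  # → How long have you been sad?
```

The illusion of understanding comes entirely from reflecting the user's own words back via the captured groups.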

We've been working on a little library that might be useful if you work with #TEI and NER or text analysis:

• Extract plaintext from TEI
• Run your NER/NLP tools
• Map results back into the original TEI—without breaking anything!

Perfect for adding automated annotations to existing markup.

👉 github.com/recogito/tei-stando

GitHub: recogito/tei-standoffconverter-js - Converts between an XML tree and a flat plaintext and standoff (position-based table) representation.
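The core idea, a standoff table linking plaintext character positions back to offsets in the XML source, can be sketched as follows. This simplified scanner ignores entities, comments and CDATA, unlike the actual library:

```python
def standoff_table(xml):
    """Extract plaintext from markup and record, for each plaintext
    character, its offset in the original XML source string."""
    plain, table, in_tag = [], [], False
    for i, ch in enumerate(xml):
        if ch == "<":
            in_tag = True
        elif ch == ">":
            in_tag = False
        elif not in_tag:
            plain.append(ch)
            table.append(i)
    return "".join(plain), table

tei = "<p>Joseph <hi>Fourier</hi> studied heat.</p>"
text, table = standoff_table(tei)

# A (hypothetical) NER hit on the plaintext maps back to source offsets,
# so an annotation can be anchored without rewriting the markup.
start = text.index("Fourier")
end = start + len("Fourier")
print(text[start:end], table[start], table[end - 1])
```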

Next stop in our NLP timeline are the (mostly) futile attempts at machine translation during the Cold War era. The rule-based machine translation approach relied mostly on dictionaries and grammar programs. Its major drawback was that absolutely everything had to be made explicit.

#nlp #historyofscience #ise2025 #lecture #machinetranslation #coldwar #AI #historyofAI @tabea @enorouzi @sourisnumerique @fiz_karlsruhe @fizise
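A word-for-word sketch shows the drawback of the rule-based approach: every lexicon entry and rule must be made explicit, and anything missing simply fails. The tiny lexicon below is made up for illustration:

```python
# Hypothetical English-French lexicon; a real system needed thousands of
# explicit entries plus grammar rules for agreement and word order.
lexicon = {"the": "le", "spirit": "esprit", "is": "est", "willing": "fort"}

def translate(sentence):
    """Naive word-for-word substitution; unknown words are flagged."""
    return " ".join(lexicon.get(w, f"[{w}?]") for w in sentence.lower().split())

print(translate("The spirit is willing"))  # → le esprit est fort
```

Even this "successful" output is wrong French (it should be "l'esprit"), which is exactly the kind of failure that plagued Cold War era systems.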

Next stop in our NLP timeline is Claude Elwood Shannon, who laid the foundations of statistical language modeling by recognising the relevance of n-grams for modelling properties of language and for predicting the likelihood of word sequences.

C.E. Shannon (1948). "A Mathematical Theory of Communication." web.archive.org/web/1998071501

#ise2025 #nlp #lecture #languagemodel #informationtheory #historyofscience @enorouzi @tabea @sourisnumerique @fiz_karlsruhe @fizise
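Shannon's idea can be sketched as a maximum-likelihood bigram model that scores the probability of a whole word sequence. The corpus below is a toy assumption:

```python
from collections import Counter

# Toy training corpus (assumption: stand-in for a large text collection).
corpus = "the cat sat on the mat the cat ran".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def sequence_probability(words):
    """P(w1..wn) ≈ P(w1) * Π P(wi | wi-1), with MLE bigram estimates."""
    p = unigrams[words[0]] / len(corpus)
    for w1, w2 in zip(words, words[1:]):
        if unigrams[w1] == 0:
            return 0.0
        p *= bigrams[(w1, w2)] / unigrams[w1]
    return p

print(sequence_probability(["the", "cat", "sat"]))
```

Sequences that recombine frequent bigrams score high; sequences with an unseen bigram score zero, which is why later work added smoothing.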

"Asking scientists to identify a paradigm shift, especially in real time, can be tricky. After all, truly ground-shifting updates in knowledge may take decades to unfold. But you don’t necessarily have to invoke the P-word to acknowledge that one field in particular — natural language processing, or NLP — has changed. A lot.

The goal of natural language processing is right there on the tin: making the unruliness of human language (the “natural” part) tractable by computers (the “processing” part). A blend of engineering and science that dates back to the 1940s, NLP gave Stephen Hawking a voice, Siri a brain and social media companies another way to target us with ads. It was also ground zero for the emergence of large language models — a technology that NLP helped to invent but whose explosive growth and transformative power still managed to take many people in the field entirely by surprise.

To put it another way: In 2019, Quanta reported on a then-groundbreaking NLP system called BERT without once using the phrase “large language model.” A mere five and a half years later, LLMs are everywhere, igniting discovery, disruption and debate in whatever scientific community they touch. But the one they touched first — for better, worse and everything in between — was natural language processing. What did that impact feel like to the people experiencing it firsthand?

Quanta interviewed 19 current and former NLP researchers to tell that story. From experts to students, tenured academics to startup founders, they describe a series of moments — dawning realizations, elated encounters and at least one “existential crisis” — that changed their world. And ours."

quantamagazine.org/when-chatgp

Quanta Magazine · When ChatGPT Broke an Entire Field: An Oral History. Researchers in "natural language processing" tried to tame human language. Then came the transformer.

We are starting #ISE2025 lecture 02 with a (very) brief history of #NLP, pointing out only some selected highlights. Linguist Ferdinand de Saussure laid the foundations of today's NLP by describing languages as "systems." He argued that meaning is created inside language, in the relations and differences between its parts.

Course in general linguistics. ia600204.us.archive.org/0/item

#linguistics #historyofscience @fiz_karlsruhe @fizise @enorouzi @tabea @sourisnumerique @KIT_Karlsruhe #AIFB