
#textmining


📯 This week at #DigitalHistoryOFK: Torsten Hiltmann and @DigHisNoah present "RAG den Spiegel", an innovative RAG system for analysing the SPIEGEL archive. The talk shows how #LLMs are changing historical scholarship and combining hermeneutic with computational methods.
📅 25 June, 4-6 pm, online (access on request)
ℹ️ Abstract: dhistory.hypotheses.org/10912 #TextMining #4memory #DigitalHistory @historikerinnen @histodons @digitalhumanities

Open Access book edited by Silke Schwandt: Digital Methods in the Humanities.
Explore interdisciplinary challenges, case studies, and innovative perspectives on digital tools in textual research.
Includes: From Serial Sources to Modeled Data, OCR, text mining & more.
transcript-verlag.de/978-3-837
#DigitalHumanities #OpenAccess #DigitalMethods #TextMining #HumanitiesResearch #SilkeSchwandt #transcriptVerlag

transcript Verlag · Digital Methods in the Humanities · Volume 1 of »Digital Humanities Research« offers a unique perspective on digital methods for and in the humanities.

From archive (#Archiv) to database (#Datenbank). What #TextMining and #GraphModelling methods can contribute to a comparative social history (#Sozialgeschichte) of coercion in the late Middle Ages (#Spätmittelalter): Juliane Schiel (Univ. Wien) at tomorrow's #Jeudi lecture, with a comment by Simona Cerutti (EHESS)

10.04. | 18:00 | hybrid | DE-FR

dhi-paris.fr/veranstaltungsdet

@histodons #WORCK #DH #digitaleTextanalyse #DigitalHumanties #DigitalHistory

From archive (#Archiv) to database (#Datenbank). What #TextMining and #GraphModelling methods can contribute to a comparative social history (#Sozialgeschichte) of coercion in the late Middle Ages (#Spätmittelalter): Juliane Schiel (Univ. Wien) at the next #Jeudi lecture, with a comment by Simona Cerutti (EHESS)

10.04. | 18:00 | hybrid | DE-FR

dhi-paris.fr/veranstaltungsdet

@histodons #WORCK #DH #digitaleTextanalyse #DigitalHumanties #DigitalHistory

Resulting from an @snsf_ch SPARK grant, this took some time to mature, but the outcome is very informative and builds a foundation for where to head next: how to liberate facts/information locked in the published literature #textmining #biodiversity preprints.arphahub.com/article

ARPHA Preprints · From literature to biodiversity data: mining arthropod organismal and ecological traits with machine learning · The fields of taxonomy and biodiversity research have witnessed an exponential growth in published literature. This vast corpus of articles holds information on the diverse biological traits of organisms and their ecologies. However, access to and extraction of relevant data from this extensive resource remain challenging. Advances in text and data mining (TDM) and Natural Language Processing (NLP) techniques offer new opportunities for liberating such information from the literature. Testing and using such approaches to annotate articles in machine actionable formats is therefore necessary to enable the exploitation of existing knowledge in new biology, ecology, and evolution research. Here we explore the potential of these methods to annotate and extract organismal and ecological trait data for the most diverse animal group on Earth, the arthropods. The article processing workflow uses manually curated trait dictionaries with trained NLP models to perform labelling of entities and relationships of thousands of articles. A subset of manually annotated documents facilitated the formal evaluation of the performance of the workflow in terms of entity recognition and normalisation, and relationship extraction, highlighting several important technical challenges. The results are made available to the scientific community through an interactive web tool and queryable resource, the ArTraDB Arthropod Trait Database. These methodological explorations provide a framework that could be extended beyond the arthropods, where TDM and NLP approaches applied to the taxonomy and biodiversity literature will greatly facilitate data synthesis studies and literature reviews, the identification of knowledge gaps and biases, as well as the data-informed investigation of ecological and evolutionary trends and patterns.
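
The abstract mentions labelling entities with manually curated trait dictionaries. As a rough illustration of the dictionary half of that idea only, here is a minimal Python sketch. It is not the authors' workflow: the function name `label_traits`, the example terms, and the `TRAIT:` identifiers are all hypothetical, and the trained NLP models and relationship extraction described in the abstract are left out entirely.

```python
import re


def label_traits(text, trait_dictionary):
    """Find dictionary terms in text.

    trait_dictionary maps a surface term to a trait identifier
    (both hypothetical here). Returns a sorted list of
    (start, end, matched text, trait identifier) tuples.
    """
    annotations = []
    for term, trait_id in trait_dictionary.items():
        # Whole-word, case-insensitive match of the literal term.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append(
                (match.start(), match.end(), match.group(0), trait_id)
            )
    return sorted(annotations)


if __name__ == "__main__":
    # Illustrative dictionary; real curated dictionaries would be far larger.
    example_dictionary = {
        "nocturnal": "TRAIT:activity",
        "pollen": "TRAIT:diet",
    }
    sentence = "Adults are nocturnal and feed on pollen."
    for start, end, surface, trait_id in label_traits(sentence, example_dictionary):
        print(start, end, surface, trait_id)
```

Plain dictionary lookup like this cannot disambiguate terms or handle spelling variants, which is presumably part of why the abstract pairs dictionaries with trained models.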

🌍 Automating Nature Detection in Historical Travelogues?

At #Dhd2025 Michela Vignoli & Doris Gruber (ONiT Project) explore how #LLM Llama 3.1 70B can analyze nature representations in multilingual travel reports

⚠️ Challenges remain:
❌ LLMs always produce results, even with flawed data
❌ LLM-corrected texts did not improve searchability in vector databases (3–14% drop)
🔎 Conclusion: LLMs aid discovery, but manual review is essential for a reliable dataset.

[Data workshop] The INA lab is organising a workshop @iscpif on 12 March at 5:30 pm devoted to exploring (#statistique, #TAL…) transcripts of TF1 and FR2 television news broadcasts
A few places are still available: framaforms.org/atelier-donnees

Some autonomy with quantitative analysis tools (Python or R, CSV, etc.) is needed to get the most out of the workshop.

framaforms.org · Atelier données INA | Framaforms.org

Resulting from a Swiss National Science Foundation (SNSF) SPARK grant, this took some time to mature, but the outcome is very informative and builds a foundation for where to head next: how to liberate facts/information locked in the published literature #textmining #biodiversity biorxiv.org/content/10.1101/20

bioRxiv · From literature to biodiversity data: mining arthropod organismal and ecological traits with machine learning

Excited that Keli Du is going to be presenting work we talked about a lot: "Shifting Sentiments? What happens to BERT-based Sentiment Classification when derived text formats are used for fine-tuning".

Results show that, as long as you don't remove too much information from the texts, the performance stays at pretty acceptable levels, even when DTFs are used for fine-tuning.

More reasons to love @joss ... this journal is a DREAM, from a text & data mining perspective.

I wanted to do a deeper dive on research acknowledgement sections in JOSS. I briefly started fiddling about with the rcrossref package, then realised I could just clone the joss-papers repository: github.com/openjournals/joss-p

Sincerely, I wish all OA journals were this easy to download!!!
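
For anyone curious what mining acknowledgement sections from a local clone might look like, here is a minimal Python sketch. It rests on assumptions the post does not state: that the papers are available as Markdown files and that the section sits under a heading spelled "Acknowledgements" or "Acknowledgments". The function names and the `*.md` glob pattern are hypothetical, and the file formats actually stored in the repository may differ.

```python
import re
from pathlib import Path

# A Markdown ATX heading, e.g. "# Acknowledgements" or "## Acknowledgments ##".
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")
# Accepts both common spellings, singular or plural.
ACK_TITLE = re.compile(r"^acknowledge?ments?$", re.IGNORECASE)


def extract_acknowledgements(markdown_text):
    """Return the text under the first acknowledgements heading, or None."""
    collected = None
    for line in markdown_text.splitlines():
        heading = HEADING.match(line)
        if heading:
            if collected is not None:
                break  # the next heading closes the section
            if ACK_TITLE.match(heading.group(2).strip()):
                collected = []
            continue
        if collected is not None:
            collected.append(line)
    if collected is None:
        return None
    return "\n".join(collected).strip()


def collect_from_clone(repo_dir, pattern="*.md"):
    """Map each matching file under repo_dir to its acknowledgements text.

    The glob pattern is an assumption; adjust it to the files that are
    actually present in the local clone.
    """
    results = {}
    for path in Path(repo_dir).rglob(pattern):
        text = path.read_text(encoding="utf-8", errors="replace")
        section = extract_acknowledgements(text)
        if section:
            results[str(path)] = section
    return results
```

A call such as `collect_from_clone("joss-papers")` would then return a dictionary of file paths and acknowledgement texts, ready for counting funders or named contributors.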

GitHub · GitHub - openjournals/joss-papers: Accepted JOSS papers · Accepted JOSS papers. Contribute to openjournals/joss-papers development by creating an account on GitHub.