**Computo** @computo@mathstodon.xyz · Jul 15

Summer read: a new paper on model-based clustering just appeared in Computo!

Julien Jacques and Brendan Thomas Murphy publish a new method for clustering multivariate count data. The method combines feature selection and clustering, and is based on conditionally independent Poisson mixture models and Poisson generalized linear models.

On simulations, the Adjusted Rand Index (ARI) of the model with selected variables is close to the optimal ARI obtained with the true clustering variables.

The paper and accompanying R code are available at https://computo-journal.org/published-202507-jacques-count-data/

Figure showing three plots, one per scenario. Each plot shows two box plots of the difference in ARI between a model fitted with the true clustering variables and one fitted with either all variables (left) or the selected variables (right). Variable selection always yields an ARI close to the optimal ARI.

#machineLearning #clustering #Rstats
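
For readers unfamiliar with the Adjusted Rand Index (ARI) mentioned in the Computo post above, here is a minimal Python sketch of the standard pair-counting formula. It is not the paper's R code; the function name `adjusted_rand_index` and the handling of the degenerate single-cluster case are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    # Contingency counts: items per (true cluster, predicted cluster) cell.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs that share a cell, a true cluster, a predicted cluster.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    total = comb(n, 2)

    expected = sum_rows * sum_cols / total if total else 0.0
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (assumption): e.g. both partitions are one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


same_up_to_relabeling = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

An ARI of 1 means the two partitions agree up to a relabeling of the clusters, while values near 0 are what chance agreement would give.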

**Europe Says** @europesays@pubeurope.com · Jul 8

https://www.europesays.com/2228357/ Optimizing machine learning for network inference through comparative analysis of model performance in synthetic and real-world networks #Clustering #ComputationalBiologyAndBioinformatics #ComputationalComplexity #Data #engineering #HumanitiesAndSocialSciences #LogisticRegression #MachineLearning #ModelSelection #Modularity #multidisciplinary #NetworkInference #NetworkScience #RandomForest #ScaleFreeNetworks #science

**Greg Cocks** @GregCocks@techhub.social · Feb 26

A Methodology For The Multitemporal Analysis Of Land Cover Changes And Urban Expansion Using Synthetic Aperture Radar (SAR) Imagery - A Case Study Of The Aburrá Valley In Colombia
--
https://doi.org/10.3390/rs17030554 <-- shared paper
--
#GIS #spatial #mapping #SyntheticApertureRadar #SAR #remotesensing #multitemporalanalysis #landcover #landcoverchange #clustering #kurtosis #fuzzylogic #kernelbasedmethod #machinelearning #spatialanalysis #spatiotemporal #geostatistics #model #modeling #AburráValley #Columbia #urban #urbanexpansion #population #growth #topography #monitoring #satellite #sentinel #valley #landuse #distribution #infrastructure #building #roads #naturalresources #environmental #conservation #monitoring #multitemporal

photo - looking down into Medellín, Aburrá Valley, Colombia, from surrounding high ground

annotated maps with imagery - Areas of analysis of the results by the SMA1 methodological route and kurtosis. (A). Central Park in Bello (B). Parques del Río Medellín (C). Arkadia Shopping center; (D). Peldar Plant (E). La García water supply reservoir (F). Conasfaltos dam (G). La Ayurá stream basin in Envigado (H). Central Park in Bello (I). Avenida Regional Norte (J). Vía Distribuidora Sur.

schematic / work flow - proposed methodology for analysis of zonal land cover changes

annotated maps - The Aburrá Valley (white line) between the valleys of the Magdalena and Cauca rivers. Data were acquired from ALOS PALSAR Terrain Corrected and data from IGAC.

**JuliaR** @jromanowska@fosstodon.org · Feb 3

Hi all #Rstats enthusiasts!
I'm looking for someone who has time now to review a piece of software for the Journal of Open Source Software (JOSS). Details are here:
https://github.com/openjournals/joss-reviews/issues/7319

The review process is quite simple - you get a checklist and you run some tests. It's all open, on GitHub.

GitHub · [REVIEW]: corrp: An R package for multiple correlation-like analysis and clustering in mixed data · Issue #7319 · openjournals/joss-reviews · By editorialbot

#PeerReview #softwaredevelopment #OpenSource

**Gilgwath** @gilgwath@social.tchncs.de · Jan 15 *

When you are reading up on deploying #databases, the most frequent piece of drive-by advice is "don't use networked storage". Before you can ask the smart ass what they suggest instead in an age of #virtualization, #clustering and #kubernetes, they have already disappeared into the ether. Not an easy nut to crack, especially in a #homelab. This guy has an actual workable answer: https://medium.com/@camphul/cloudnative-pg-in-the-homelab-with-longhorn-b08c40b85384 using #longhorn and #cloundnativepg and some smart scheduling. #k8s #selfhosting

Medium · Jun 24, 2024 · CloudNative-PG in the homelab with Longhorn - Luca Camphuisen - Medium · By Luca Camphuisen

**Europe Says** @europesays@pubeurope.com · Jan 12

https://www.europesays.com/1760442/ Statistical and data visualization techniques to study the role of one-electron in the energy of neutral and charged clusters of Na39 #Clustering #Data #DataVisualization #DensityFunctionalTheory #Energy #HumanitiesAndSocialSciences #multidisciplinary #Regression #science #SodiumCluster #StatisticalAnalysis #StatisticalPhysics #statistics #TimeSeries

**Barry Schwartz** @rustybrick@c.im · Dec 6, 2024

How clustering works with localization in Google Search https://www.seroundtable.com/google-search-clustering-localization-38531.html

#google #seo #localizations

**Barry Schwartz** @rustybrick@c.im · Dec 6, 2024

Google on the difference between clustering and canonicalization: "Clustering is basically taking the pages that we think are the same. And then canonicalization is, from those pages, which one is the best one" @johnmu said https://www.seroundtable.com/google-search-clustering-canonicalization-38529.html

#seo #google #canonicalization
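
The two steps in the quote above can be illustrated with a toy Python sketch: first group pages that look the same, then choose one page per group. This is only an illustration of the distinction, not how Google does it. The text fingerprint and the selection rule (prefer https, then the shortest URL) are assumptions invented for the example, as are the `example.com` URLs.

```python
import hashlib
from collections import defaultdict


def fingerprint(text):
    """Hash of case- and whitespace-normalized text (stand-in for real duplicate detection)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cluster_pages(pages):
    """Step 1, clustering: group URLs whose content looks the same."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return list(groups.values())


def pick_canonical(urls):
    """Step 2, canonicalization: choose one URL per cluster.

    Hypothetical rule: prefer https, then the shortest URL, then alphabetical order.
    """
    return min(urls, key=lambda u: (not u.startswith("https://"), len(u), u))


pages = {
    "http://example.com/shoes?ref=mail": "Red  shoes on sale",
    "https://example.com/shoes": "red shoes on sale",
    "https://example.com/hats": "Blue hats",
}
canonicals = sorted(pick_canonical(group) for group in cluster_pages(pages))
```

Here the two shoe pages fall into one cluster and the https URL without the tracking parameter is picked as its representative.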

Replied in thread

**Kevin Karhan** @kkarhan@infosec.space · Nov 24, 2024

@ai6yr @dthacker9 @fuchsiii I just found them cheap as surplus - there are also others from Dell (WYSE), Fujitsu (Futro) & IGEL.

Basically almost all of them are cheap (like €50 at most, sometimes <€10 in a 10-pack lot) and fanless, so ideal to do some #BareMetal #clustering or just to have chugging along silently in the background...

**Greg Cocks** @GregCocks@techhub.social · Oct 21, 2024 *

Stanford Researchers Map ‘White-Only’ Properties In Santa Clara Co. Using AI [ historic deeds / covenants ]
--
https://www.kron4.com/news/bay-area/stanford-researchers-map-white-only-properties-in-santa-clara-co-using-ai/ <-- shared media article
--
https://dho.stanford.edu/wp-content/uploads/Covenants.pdf <-- shared research
--
https://reglab.github.io/racialcovenants/static/maps/dotmap_lot_level.html <-- link to shared webmap
--
#GIS #spatial #mapping #California #deeds #property #racial #racism #redlining #covenenants #race #minorities #propertyrecords #discrimination #history #historical #USHistory #legalreform #records #AI #machinelearning #openlargelanguagemodel #model #modeling #geography #clustering #demographics #spatialanalysis #spatiotemporal

snapshot - 1913 housing advertisement for the Palm Haven neighborhood in San Jose. 1913 is several years before Buchanan found “restricted districts” based on race unconstitutional, and the advertisement emphasizes the “restricted district[].” Palm Haven construction dates straddled Buchanan. It was developed by Thomas Herschbach who came to be responsible for 161 racial covenants in the County

snapshot - portion of deed - Although racially restrictive covenants are no longer legally enforceable and are considered illegal under the Fair Housing Act today, they still exist in thousands, possibly even millions, of historical property records in California. One such example, found in a 1940 real property deed from Santa Clara County’s archives, contains the following discriminatory language: “No persons not of the Caucasian Race shall be allowed to occupy, except as servants of residents, said real property or any part thereof.” The deed further specifies that “[t]hese covenants are to run with the land and shall be binding on all parties,” thereby affecting not only the tenants at the time but also the potential future owners of the land.

Charts –
Top: Number of property deeds with restrictive covenants from 1905–1974, divided by whether specific racial groups were excluded or only white/Caucasian individuals were permitted. Most pre-1915 covenants specifically exclude Black and Asian individuals, but the vast majority of later covenants are white-only. The small number of restrictive covenants matched after 1970 consists largely of older deeds filed for reference, rather than new restrictive covenants being introduced.
Bottom: The number of occurrences of specific racial groups in covenants that exclude specific groups. East Asian and Black were by far the most commonly excluded demographics, but some covenants targeted other groups, such as Italian, Portuguese, Indian, and Mexican individuals.

Maps –
Top: Clusters of racial covenants on a map of modern-day Santa Clara County. Some of the largest and most notable racially restricted developments – discussed in this section – are shown in red.
Bottom left: Racial covenants in south Palo Alto and Mountain View. Bottom right: Racial covenants in downtown San Jose. Dots represent individual subdivisions and are scaled in proportion to the number of racial covenants within the subdivision.

**Vis Lab @ Khoury, Northeastern** @KhouryVis@vis.social · Oct 14, 2024

ICYMI you can find @ebertini & friends' paper "Towards a Visual Perception-Based Analysis of Clustering Quality Metrics" from Sunday's VDS workshop here: https://www.visualdatascience.org/2024/index.html #IEEEVIS #Perception #Clustering #DataViz #VDS

Subset of the 1000 scatterplots judged by the 34 human subjects, with the percentage of them judging that they display more than one cluster.

**Daniel Pomarède** @pomarede@mastodon.social · Oct 8, 2024

in the #arXiv

2D watershed void clustering for probing the cosmic large-scale structure

by Yingxiao Song and co-authors
https://arxiv.org/abs/2410.04898

#Cosmology #universe #voids

Continued thread

**Tim Kellogg** @kellogh@hachyderm.io · Jan 2, 2024

I’ve also gone deep into #clustering algorithms. I’m coming to the conclusion that K-Means has assumptions that don’t work well for me, and probably usually don’t work. Some big ones:

- clusters are the same size
- the number of clusters is known

I’m clustering posts by embedding (text content/meaning). Most of the time I don’t know how many posts there are, and my feed is too dynamic for these assumptions to hold.

I’m learning about other algorithms, like DBSCAN
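
To make the contrast with K-Means concrete, here is a minimal pure-Python sketch of DBSCAN, which needs no cluster count up front and can mark outliers as noise. It assumes Euclidean distance on small point lists; for text embeddings a cosine distance would be the more usual choice, and the `eps` and `min_samples` values below are arbitrary picks for the toy data.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point when at least `min_samples` points (itself
    included) lie within distance `eps` of it.
    """
    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_samples:
                queue.extend(reach)  # j is a core point: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
```

The number of clusters (two here) falls out of the density parameters instead of being supplied, and the isolated point at (50, 50) is labeled as noise. The trade-off is that `eps` and `min_samples` now have to be tuned.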

Continued thread

**Fabrice Tshimanga** @fabrice13@neuromatch.social · Nov 9, 2023

5/5

Our dataset also comprises CT and MRI scans with patients' lesions segmented by an expert.
This allowed us to look at the distribution of lesions cluster-wise, and validate the associations between symptoms and lesions.

Check out our pre-print and comment, ask questions, offer suggestions!
Although it is not simple to share data, we will release code soon, as a means to replicate the approach on similar data and more.
The link is already in the paper!
And let us know if you have data you'd like to share and analyse with our developing methods

We are deciding on the best match for a journal to review and possibly publish this work, of which I am super proud and thankful to co-authors Andrea Zanola, Antonio Bisogno, Silvia Facchini, Lorenzo Pini, Manfredo Atzori, and Maurizio Corbetta!

#scicomm #paperthread #preprints

Continued thread

**Fabrice Tshimanga** @fabrice13@neuromatch.social · Nov 9, 2023

4/n

Reverting our General Distance matrix into the General Similarity matrix yields an ambiguous spectrum, whose eigenvalues do not help to determine the number of clusters in the data.
But repeating clustering and tracing which subjects consistently get clustered together, actually yields the right information, encoded in a co-occurrence matrix.
This latter is quite evidently composed of 5 main clusters.
Our second approach, affinity propagation, autonomously found 7 clusters, which are mainly finer-grained partitions of the former 5.

#machinelearning #clustering
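
The co-occurrence idea from the post above can be sketched in a few lines of Python: run a clustering several times and record, for each pair of subjects, the fraction of runs in which they landed in the same cluster. The function name `co_occurrence` and the three hand-written runs are hypothetical; in the pre-print the runs come from repeated spectral clustering.

```python
def co_occurrence(labelings):
    """Fraction of runs in which each pair of items shared a cluster.

    `labelings` is a list of runs; each run holds one cluster label per item.
    Labels only need to be consistent within a run, not across runs.
    """
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("every run must label the same items")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]


# Three hypothetical runs over five subjects; label names differ between runs on purpose.
runs = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [2, 2, 5, 5, 5],
]
matrix = co_occurrence(runs)
```

Because only within-run equality of labels is used, the matrix is unaffected by clusters being numbered differently from run to run. Blocks of entries close to 1 then indicate groups of subjects that are consistently clustered together.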

Continued thread

**Fabrice Tshimanga** @fabrice13@neuromatch.social · Nov 9, 2023

3/n

We thus decided to use the General Distance Measure to compute pairwise similarities between our 172 subjects, and obtained a matrix which, as math-savvy people know, is also the description of a network (an "adjacency matrix" for a "weighted undirected graph").
The problem was then to find cliques, communities or clusters of similar patients in such a network, and we used spectral clustering.
Spectral clustering is a family of techniques that use spectra of matrices describing networks, i.e. use eigenvalues of matrices to understand the structure of those networks.

#spectralanalysis #spectralclustering #clustering
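
As a rough illustration of how the spectrum of a network matrix reveals cluster structure, here is a pure-Python sketch of the simplest member of the spectral clustering family: splitting a weighted undirected graph in two using the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian). It is not the pre-print's pipeline. The power-iteration shortcut, the function names and the toy similarity matrix are all assumptions made for the example.

```python
def graph_laplacian(weights):
    """Unnormalized Laplacian L = D - W of a weighted undirected graph."""
    n = len(weights)
    degrees = [sum(row) for row in weights]
    return [[(degrees[i] if i == j else 0.0) - weights[i][j] for j in range(n)]
            for i in range(n)]


def fiedler_vector(weights, iterations=500):
    """Approximate the eigenvector of the second-smallest Laplacian eigenvalue.

    Power iteration on (c*I - L), kept orthogonal to the constant vector,
    where c = 2 * max diagonal entry bounds the largest eigenvalue of L.
    Assumes a graph with at least one edge.
    """
    n = len(weights)
    lap = graph_laplacian(weights)
    c = 2.0 * max(lap[i][i] for i in range(n))
    # Deterministic start for reproducibility; a random start is more robust.
    v = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the trivial constant eigenvector
        v = [c * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


def spectral_bipartition(weights):
    """Split nodes into two groups by the sign of the Fiedler vector."""
    return [1 if x > 0 else 0 for x in fiedler_vector(weights)]


# Toy similarity matrix: two groups of three, strong links inside, weak links across.
weights = [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.01)
            for j in range(6)] for i in range(6)]
labels = spectral_bipartition(weights)
```

For more than two clusters, the usual approach is to take several of the smallest eigenvectors and cluster the rows of the resulting matrix, typically with a linear algebra library instead of hand-written iteration.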

**Fabrice Tshimanga** @fabrice13@neuromatch.social · Nov 9, 2023

1/n
Our pre-print is finally out!
Here's my first #paperthread
In this work, my co-authors and I clustered ischaemic stroke patient profiles and recovered common patterns of cognitive and sensorimotor damage.

...Historically many focal lesions to specific cortical areas were associated with specific distinction, but most strokes involve subcortical regions and bring multivariate patterns of deficits.
To characterize those patterns, many studies have turned to correlation analysis, factor analysis, PCA, focusing on the relations among variables==domains of impairments...

https://www.medrxiv.org/content/10.1101/2023.11.08.23297808

medRxiv · Nov 9, 2023 · Behavior Clusters in Ischemic Stroke using NIHSS

BACKGROUND: Stroke is one of the leading causes of death and disability. The resulting behavioral deficits can be measured with clinical scales of motor, sensory, and cognitive impairment. The most common of such scales is the National Institutes of Health Stroke Scale, or NIHSS. Computerized tomography (CT) and magnetic resonance imaging (MRI) scans show predominantly subcortical or subcortical-cortical lesions, with pure cortical lesions occurring less frequently. While many experimental studies have correlated specific deficits (e.g. motor or language impairment) with stroke lesion locations, the mapping between symptoms and lesions is not straightforward in clinical practice. The advancement of machine learning and data science in recent years has shown unprecedented opportunities even in the biomedical domain. Nevertheless, their application to medicine is not simple, and the development of data driven methods to learn general mathematical models of diseases from healthcare data is still an unsolved challenge.

METHODS: In this paper we measure statistical similarities of stroke patients based on their NIHSS scores, and we aggregate symptoms profiles through two different unsupervised machine learning techniques: spectral clustering and affinity propagation.

RESULTS: We identify clusters of patients with largely overlapping, coherent lesions, based on the similarity of behavioral profiles.

CONCLUSIONS: Overall, we show that an unsupervised learning workflow, open source and transferable to other conditions, can identify coherent mathematical representations of stroke lesions based only on NIHSS data.

Competing Interest Statement: The authors have declared no competing interest.

Funding Statement: This work was supported by the Department of excellence 2018-2022 initiative of the Italian Ministry of education (MIUR) awarded to the Department of Neuroscience-University of Padua.

Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: For data of patients of the Saint Louis cohort: the Internal Review Board of Washington University School of Medicine (WUSM) gave ethical approval for this work. For data of patients of the Padua cohort: the Ethics Committee of the Azienda Ospedale Università Padova (AOUP) gave ethical approval for this work. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes.

Data can be made available upon reasonable request to Maurizio Corbetta at maurizio.corbetta{at}unipd.it.

AP: Affinity Propagation. GDM: General Distance Measure. GSM: General Similarity Measure. NIHSS: National Institutes of Health Stroke Scale. RSC: Repeated Spectral Clustering.

#stroke #neuroscience #machinelearning

**Françoise Bahoken** @fbahoken@mapstodon.space · Aug 31, 2023

[TTT] Guessing Czechia (Deviner la Tchéquie)
Searching, with @recifs (many thanks!), for former territorial partitions by using linear algebra to classify/cluster flows
#TTT #cartostats #flux #clustering #tobler

In Néocarto https://neocarto.hypotheses.org/16937

Carnet (neo)cartographique · Deviner la Tchéquie · Preamble: This work is part of a research strand of the Tribute to Tobler (TTT) project on applying linear algebra methods to the analysis of origin-destination matrices for thematic mapping. This post contextualizes, comments on, and then presents in French (after a free translation) the notebook Guessing Czechoslovakia [access] made by Philippe Rivière (Visions Carto) for the 2021 #30DayMapChallenge: Day22-Boundaries. Research context: The aim of this work is to examine, following a request from Waldo Tobler himself, the theoretical and methodological conditions for transferring the methods of...

**Karsten Schmidt** @toxi@mastodon.thi.ng · Aug 26, 2023 *

#HowToThing #006 - Clustering arbitrary n-dimensional data using https://thi.ng/k-means and customizable distance functions and/or centroid strategies. For example, here to cluster 20 world cities into 5 groups based on their latitude/longitude...

Snippet source code:
https://gist.github.com/postspectacular/3a7970af491304fe7262e4701efa7d52

For a visual example (also fully commented) using thousands of items and SVG output, check out:

Demo:
https://demo.thi.ng/umbrella/kmeans-viz/

Source code:
https://github.com/thi-ng/umbrella/blob/develop/examples/kmeans-viz/src/index.ts

TypeScript source code of the linked world cities k-means clustering example. Link to source code is in the toot...

Screenshot of the other linked k-means example/visualization, showing thousands of particles clustered into 18 groups, which are shown in different colors with their convex hulls drawn as outlines

#ThingUmbrella #TypeScript #JavaScript
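
The idea in the post above, k-means with a swappable distance function and centroid strategy, can be sketched in Python as follows. This is not the thi.ng/k-means API or the linked TypeScript gist. The function names, the first-k seeding and the six rounded city coordinates are assumptions chosen to keep the example short and deterministic.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def mean_centroid(members):
    """Default centroid strategy: component-wise mean (naive near the antimeridian)."""
    return tuple(sum(xs) / len(members) for xs in zip(*members))


def kmeans(points, k, distance, centroid=mean_centroid, max_iter=100):
    """Lloyd's algorithm with a pluggable distance and centroid strategy.

    Initial centroids are the first k points (an assumption made to keep the
    sketch deterministic; random or k-means++ seeding is more common).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = centroid(members)
    return labels, centroids


# Approximate (lat, lon): London, Tokyo, Paris, Berlin, Seoul, Osaka.
cities = [(51.5, -0.1), (35.7, 139.7), (48.9, 2.4), (52.5, 13.4), (37.6, 127.0), (34.7, 135.5)]
labels, centroids = kmeans(cities, 2, haversine_km)
```

Passing a different `distance` (for example Euclidean or cosine) or a different `centroid` function (for example a medoid) changes the behavior without touching the loop, which is the point of making both pluggable.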

**preLights** @preLights@biologists.social · Aug 17, 2023

Weak supervision - Strong results!

Smith and team introduce Perturbational Metric Learning (PeML) to extract biological relationships from noisy high-throughput perturbational datasets.

Team effort from preLighters Benjamin Dominik Maier & Anna Foix Romero – read their preLight!

#preLight https://prelights.biologists.com/highlights/similarity-metric-learning-on-perturbational-datasets-improves-functional-identification-of-perturbations/

Schematic of the weakly supervised ML similarity metric learning method Perturbational Metric Learning (PeML).

#bioinformatics #SystemsBiology #clustering


#clustering