mastouille.fr is one of the many independent Mastodon servers you can use to participate in the fediverse.
Mastouille is a sustainable, open Mastodon instance hosted in France.

#scrapers

Website owner? Not keen on the Mellowtel browser library building a botnet of untraceable scrapers from unwitting users who are using a browser plugin that contains Mellowtel? I've raised a GitHub issue for them to explain how much contempt they have for our consent. Join in, politely, make them look like the jerks they are. github.com/mellowtel-inc/mello

GitHub · As a website owner, I want to be able to identify or prevent scraping by Mellowtel · Issue #41 · mellowtel-inc/mellowtel-js · By futzle

List of AI bots to add to robots.txt (although they may not obey -- may need to throw them in the bitbucket and 404 or 444 them). In addition to these, you may have to block specific random browser versions for the most aggressive bots who ignore robots.txt.

github.com/ai-robots-txt/ai.ro

GitHub · ai.robots.txt/robots.txt at main · ai-robots-txt/ai.robots.txt · A list of AI agents and robots to block. Contribute to ai-robots-txt/ai.robots.txt development by creating an account on GitHub.
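
The two-step approach in that post (ask politely in robots.txt, then refuse at the server) can be sketched roughly as below. This is only an illustration under stated assumptions: the function names are hypothetical, and the two agent names are example crawler tokens rather than the full linked list. The sketch only returns 404; the 444 mentioned in the post is an nginx-specific convention for closing the connection without a response and would be configured in the web server itself.

```python
# Hypothetical, deliberately short blocklist of crawler User-Agent tokens.
# A real deployment would load the maintained list instead.
BLOCKED_AGENT_TOKENS = ("GPTBot", "CCBot")


def render_robots_txt(agents):
    """Build a robots.txt group that disallows everything for the given agents."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


def status_for_request(user_agent, blocked=BLOCKED_AGENT_TOKENS):
    """Return 404 for requests whose User-Agent contains a blocked token, else 200."""
    ua = user_agent.lower()
    if any(token.lower() in ua for token in blocked):
        return 404
    return 200


if __name__ == "__main__":
    print(render_robots_txt(BLOCKED_AGENT_TOKENS))
    print(status_for_request("Mozilla/5.0 (compatible; GPTBot/1.0)"))
```

Substring matching on the User-Agent only catches bots that identify themselves honestly, which is why the post also mentions blocking specific browser versions for the ones that do not.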

You might have heard this already, but if you haven’t it’s important:

Russian #propaganda has been intentionally targeting #ai data #scrapers with disinformation for some time.

”Despite having minimal organic reach among human audiences, the network’s focus on SEO ensures its content ranks highly in search results. This makes it more likely for AI chatbots ..”

What US oligarchs think makes them rich, #Russia uses to make itself more powerful.

disa.org/russian-propaganda-in

DISA · Russian Propaganda Infiltrates Global AI Tools via Moscow-Backed Network | DISA · Kremlin-Backed Disinformation Network "Pravda" Targets AI Chatbots to Spread Propaganda

I’ve said this before, but the fact that pretty much every fedi scraper/bot/etc. developer gets immediately shot with a gun is bad, and seriously misaligns incentives. :kjubej_pafita:

A strange developer that announces their obvious scraper bot (that individuals and instances can block) is bleeding out, while the evil ones with secret bots sip their tea contentedly. People need to be okay with opt-out here, or we just functionally encourage the evil ones. :gutkato_malica_kunmetas_manojn:

Posts on fedi are literally public, in any case. This isn’t a private chat-room, I don’t think it’s reasonable to expect others to treat it that way. :jamada_gesto_stulteta:

#lang_en #fedi #fediverse #privacy #scrapers #bots [Re: 🔗]

jam.xwx.moe · Mansardo Jamada
In reply to Toni Aittoniemi

@gimulnautti @khobochka The only thing that #RoyalitySchemes like that have created is rich #CollectingAgencies that act as #ValueRemoving #Rentseekers (e.g. #GEMA only pays out 9.0909% of all the royalties collected, and in a non-transparent manner!) on every "Reproduction Device" (e.g. printers, burners, copiers, scanners) and "Blank (Recordable) Media" (e.g. USB drives, SD cards, recordable BD-RWs), AND more criminalization.

Anything else is just not gonna work...
youtube.com/watch?v=9XN57BhyZw
infosec.space/@kkarhan/1137257

Replied in a thread

If #Cloudflare is to be believed, #Lemmy instances have a built-in AI scraping bot operating beneath the covers. Do you think the developers have snuck it in?

Looking through my logs, these requests have all been blocked by Cloudflare because they are identified as "AI Bots". There are many more requests by Lemmy instances blocked in the logs. This is just a sample. Other Lemmy requests from these servers get through. Only a few are blocked as AI Bots.

Cloudflare says they use AI to determine if a request is a legitimate request or an AI bot trying to scrape.

207.204.58.144
AS19045 DIRECTCOM
United States
User agent: Lemmy/0.19.5; +lemmy.cryonex.net

23.127.223.238
AS7018 ATT-INTERNET4
United States
User agent: Lemmy/0.19.3; +lemux.minnix.dev

2a01:cb19:f85:ec00:82fa:5bff:fe51:ed4a
AS3215 France Telecom - Orange
France
User agent: Lemmy/0.19.5; +lemmy.sidh.bzh

50.247.53.42
AS7922 COMCAST-7922
United States
User agent: Lemmy/0.19.5; +toast.ooo

69.42.19.234
AS11404 AS-WAVE-1
United States
User agent: Lemmy/0.19.5; +lemmy.schlunker.com

155.138.226.183
AS20473 AS-CHOOPA
United States
User agent: Lemmy/0.19.5; +lemmy.mbl.social

lemmy.cryonex.net
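
For anyone going through similar logs, the User-Agent strings in the sample above follow a visible `Software/version; +instance` shape, so they can be split and tallied with a short script. This is a minimal sketch that assumes only the shape seen in those six lines; the names `FediAgent` and `parse_fedi_user_agent` are hypothetical, and other fediverse software may format its User-Agent differently.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class FediAgent(NamedTuple):
    software: str
    version: str
    instance: str


# Matches strings shaped like "Lemmy/0.19.5; +lemmy.cryonex.net".
_UA_PATTERN = re.compile(
    r"^(?P<software>[^/\s]+)/(?P<version>[^;\s]+);\s*\+(?P<instance>\S+)$"
)


def parse_fedi_user_agent(user_agent: str) -> Optional[FediAgent]:
    """Split a User-Agent into software, version and instance, or return None."""
    match = _UA_PATTERN.match(user_agent.strip())
    if match is None:
        return None
    return FediAgent(match["software"], match["version"], match["instance"])


# The six User-Agent strings quoted in the post.
SAMPLE_AGENTS = [
    "Lemmy/0.19.5; +lemmy.cryonex.net",
    "Lemmy/0.19.3; +lemux.minnix.dev",
    "Lemmy/0.19.5; +lemmy.sidh.bzh",
    "Lemmy/0.19.5; +toast.ooo",
    "Lemmy/0.19.5; +lemmy.schlunker.com",
    "Lemmy/0.19.5; +lemmy.mbl.social",
]

if __name__ == "__main__":
    parsed = [parse_fedi_user_agent(ua) for ua in SAMPLE_AGENTS]
    print(Counter(agent.version for agent in parsed if agent is not None))
```

Tallying by version or instance does not explain why Cloudflare labelled these requests as AI bots; it only makes it easier to see whether the blocked requests share anything beyond being Lemmy federation traffic.
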
Continued thread

What I do expect, however, is more #CAPTCHA bs cropping up in the ugly way, like all the #Google #reCaptcha, #hCaptcha and, worst of all, #ClownFlare / #CloudFlare #bloatware...

web.archive.org/web/2018010301

New #blog: Autodetecting and Announcing #Mastodon Scrapers and Crawlers

There've been quite a few #fedisearch issues recently, but the common thread is that there's usually a gap in reporting - they're often live for weeks before people are made aware.

It's not just people's pet projects either, there are other #scrapers active, quietly consuming posts

So, I built a bot to detect and out them so that fedi admins can block as necessary

bentasker.co.uk/posts/blog/sec

www.bentasker.co.uk · Creating A Log-Analysis System To Autodetect and Announce Mastodon Scr · I decided to build a scraper bot detection system to run against my mastodon instance. It uses behavioural scoring to find scrapers and then toots details to help other instance admins protect their u
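
To give a feel for what behavioural scoring over access logs can look like, here is a toy sketch. It is not the system described in the linked blog post: the single signal used (share of distinct paths among a client's requests), the thresholds, and the function names are all assumptions chosen for illustration.

```python
from collections import defaultdict


def score_clients(requests, min_requests=20):
    """Score each client by the fraction of its requests that hit distinct paths.

    `requests` is an iterable of (client_ip, path) pairs. Clients with fewer
    than `min_requests` requests are skipped as too quiet to judge.
    """
    paths_by_client = defaultdict(list)
    for client_ip, path in requests:
        paths_by_client[client_ip].append(path)

    scores = {}
    for client_ip, paths in paths_by_client.items():
        if len(paths) < min_requests:
            continue
        scores[client_ip] = len(set(paths)) / len(paths)
    return scores


def flag_scrapers(requests, threshold=0.9, min_requests=20):
    """Return the sorted client IPs whose score meets the (assumed) threshold."""
    scores = score_clients(requests, min_requests)
    return sorted(ip for ip, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    # Synthetic log lines using documentation-only IP ranges.
    crawler = [("203.0.113.5", f"/@user{i}/statuses") for i in range(30)]
    reader = [("198.51.100.7", f"/timeline/{i % 3}") for i in range(30)]
    print(flag_scrapers(crawler + reader))
```

A client that walks many profiles exactly once scores close to 1.0, while one that keeps refreshing a few pages scores low. A real detector would presumably combine several such signals before announcing anything publicly.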