Max Resing<p>Also, <a href="https://infosec.exchange/tags/Wikimedia" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Wikimedia</span></a> complains about the constant surge in <a href="https://infosec.exchange/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> <a href="https://infosec.exchange/tags/crawling" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>crawling</span></a>. It is a strain on free projects such as <a href="https://infosec.exchange/tags/Wikipedia" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Wikipedia</span></a>. For my MSc thesis, I fetched the Wikipedia <a href="https://infosec.exchange/tags/archives" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>archives</span></a>, which they regularly snapshot and hand out for free.</p><p>Is this resource simply ignored because it would require additional processing effort on top of the <a href="https://infosec.exchange/tags/webcrawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>webcrawlers</span></a> that are running anyway? <span class="h-card" translate="no"><a href="https://wikimedia.social/@wikimediafoundation" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>wikimediafoundation</span></a></span>, can you briefly comment on whether you have numbers on LLM companies that turn to the archives instead of crawling?</p>