1/
Thoughts after a cyberattack on @internetarchive (the #WaybackMachine seems to be back, in read-only mode)
Facts:
- https://mastodon.archive.org/@internetarchive/113290094683712789
- By @brewsterkahle: https://mastodon.archive.org/@brewsterkahle/113304263804387160
- Among the key services, the global Persistent URL system (#PURL, https://en.wikipedia.org/wiki/Persistent_uniform_resource_locator#History) is still offline: https://purl.org/
It is disturbing how this seems to align with other attempts to erase key parts of the digital memory of our age, and therefore part of our culture.
On the #InternetArchive:
https://help.archive.org/help/where-does-my-donation-go/
2/
Some context on what's at stake for #science.
A 2007 study on #InformationScience (IS) literature found that its "half-life value obtained was approximately 5 years, indicating that 50 % of the Web citations in our body of IS literature will become inaccessible after that period" [1].
In 2012, [2] noted that "if readers are not able to gain access to the original source of the cited material, the link as well as the cited material would be much less useful than references to print sources".
3/
[2] noted:
"Apart from the quality and narrow scope there are several other reasons which may affect the accessibility of the web resources. Frequent change of URLs, funding issues, lack of time and resources to maintain the published applications are some of them"
and suggested:
"Possibly, the best solution to prevent decay or disappearance of web citations and to diminish URLs decay will be use of WebCite-enhanced reference" [2]
However, WebCite (https://webcitation.org) is now read-only!
4/
A 2021 study [3] identified 174 #OpenAccess (OA) "journals that have vanished from the web. [...] We want to emphasize that this should be considered as a lower-bound count and that the number of vanished journals is likely to be much greater"
[3] suggests "that vanishing of OA journals occurs across all academic disciplines and geographical regions. Furthermore, this issue should be considered as an ongoing process that will continue unless we fully commit to preserving the scholarly record"
5/
In 2023, [4] focused on #LinkRot concerning website-based references in academic papers.
"In #academic papers, citations are integral data and information integrators that link facts and statements with sources [...]. Over time, [...] the URLs of web-based references might become dysfunctional (i.e., links may become broken), or the URL might disappear altogether if the website shuts down or ceases to exist, a phenomenon known as #ReferenceRot" [4]
6/
[4] notes that "a large gap (in terms of solutions and practical understanding) still exists between information and data creators, and data and information preservation"
The authors [4] "propose the #InternetArchive as a possible solution to address reference rot, but also note its weaknesses and limitations"
For example, "if the content of a website changes, unless there is a regular or automated archive, there may be information gaps, especially if websites are not regularly updated"
7/
In [4] the "rationale for suggesting the #InternetArchive as a practically useful solution is threefold"
"First, it is a fairly simple process to archive a URL [...]
Second, it is a free service (for now), so there are no financial burdens on authors, editors or publishers.
Third, given its fairly long history [...] it seems to be a stable and lasting tool"
However, if "charitable funds dry up, the existence and continuity of this website and service may [...] be at risk of disappearing"
8/
More generally, Cerf noted [5]:
"Of course, newer media have not been around as long as the older ones so their longevity has not been demonstrated but I think it is arguable that the more recent media do not have the resilience of stone or baked clay. Modern photographs may not last more than 150-200 years before they fade or disintegrate. Modern books, unless archival paper is used, may not last more than 100 years. [...] modern media from the 1800s forward seem to have shrinking lifetimes"
9/
In his commentary [5], Cerf continues: "It seems inescapable that our society will need to find its own formula for underwriting the cost of preserving knowledge in media that will have some permanence. That many of the digital objects to be preserved will require executable software for their rendering is also inescapable. Unless we face this challenge in a direct way, the truly impressive knowledge we have collectively produced in the past 100 years or so may simply evaporate with time."
10/
This is why initiatives such as the #InternetArchive are a foundation that keeps the culture of our age, which has a strong digital component, from sinking.
The ability to digitally preserve some expressions of our global culture (texts, software, music, videos, ...) is a prerequisite for part of that culture to remain accessible in, say, 100 or 1000 years, so that new generations can freely use and transform these ancient expressions into new culture, should they wish.
It is a precondition for that possibility.
11/
[1] Goh, D.H., Ng, P.K., 2007. Link decay in leading information science journals. Journal of the American Society for Information Science and Technology 58 (1), 15–24. https://doi.org/10.1002/asi.20513
(#FreeAccess: https://scholar.google.com/scholar?cluster=4291520808481480763 )
[2] Saberi, M.K., Abedi, H., 2012. Accessibility and decay of web citations in five open access ISI journals. Internet Research 22 (2), 234–247. https://doi.org/10.1108/10662241211214584
(free access versions: https://scholar.google.com/scholar?cluster=4408705938273344234 )
12/
[3] Laakso, M., Matthias, L., Jahn, N., 2021. Open is not forever: a study of vanished open access journals. Journal of the Association for Information Science and Technology 72 (9), 1099–1112. https://doi.org/10.1002/asi.24460
(free access versions: https://scholar.google.com/scholar?cluster=5769716549982138875 )
13/
[4] Teixeira Da Silva, J.A., Nazarovets, M., 2023. Archiving website‐based references in academic papers: problems caused by reference rot, potential solutions and limitations. Learned Publishing 36 (3), 477–487. https://doi.org/10.1002/leap.1560
(free access versions: https://scholar.google.com/scholar?cluster=576780938888356916 )
[5] Cerf, V.G., 2016. “We’re going backward!” Communications of the ACM 59 (10), 7. https://doi.org/10.1145/2993746
(#PDF: https://dl.acm.org/doi/pdf/10.1145/2993746)
14/
By #InternetArchive ( https://mastodon.archive.org/@internetarchive/113349650637331168 ):
https://blog.archive.org/2024/10/21/internet-archive-services-update-2024-10-21/
"services will have limited availability as we continue maintenance"
"We stand with all libraries that have faced similar attacks—British Library, Seattle Public Library, Toronto Public Library, and Calgary Public Library—and with the communities we serve"
By @textfiles
https://mastodon.archive.org/@textfiles/113342937735712879
A key service still offline: the global Persistent URL system (#PURL, https://en.wikipedia.org/wiki/Persistent_uniform_resource_locator#History) https://purl.org/
15/
Following the full recovery from the cyberattack on @internetarchive in October 2024, and given the increasing attacks on #DigitalPreservation and #KnowledgeFreedom, worst-case scenarios are unfortunately no longer unthinkable. When planning, perhaps they should instead be treated as entirely possible.
The context is changing, and preserving our fragile #DigitalCulture may require an ever-deeper awareness of possible failures in core foundations that, until recently, were taken for granted.
16/
@brewsterkahle interview (noted by @remixtures https://tldr.nettime.org/@remixtures/114196771823446685)
https://www.kqed.org/news/12031980/what-happens-if-the-internet-archive-goes-dark
"The bigger picture [...] and the real contest is not about money, it’s actually about control. Can #libraries own anything in the digital world? Is there digital ownership?" [...] "The average life of a web page is 100 days before it’s changed or deleted. If we do not actively collect them and preserve them and keep them accessible, we’re living in the memory [hole] universe of #GeorgeOrwell"
17/
The interview with @brewsterkahle (https://www.kqed.org/news/12031980/what-happens-if-the-internet-archive-goes-dark ) continues:
"those who control the past control the present, those who control the present control the future. The idea of a #library is part of an ecosystem of how society remembers. That’s how it thinks of itself. If you were to erase the #InternetArchive and the libraries, which is in many ways happening now, then we will live in a danger of having people be able to recast what happened"
18/
In Europe, in the context of Internet Archive Europe @stichtinginternetarchive (https://www.internetarchive.eu/brewster-kahle-on-the-future-of-internet-archive-europe-highlights-from-the-14-march-borrel/ ), @brewsterkahle underlined the concept of "Public/Collective Intelligence" noting "the importance of freely accessible knowledge across cultural and linguistic barriers"
As #redundancy in #DigitalPreservation infrastructure becomes ever more vital, the extent to which @stichtinginternetarchive can support part of this redundancy may matter even more.