Archiving the Hub: Not a simple job

- by Anne Finlay-Stewart, Editor

I often refer to old minutes, newspaper articles, campaign literature, newsletters and other material from the past that helps me put the present into perspective. It has all been catalogued and saved in a form I can find and use for my research.

Before a technical issue or retirement takes the Owen Sound Hub offline, I want to archive it so that this decade of our community life – the letters, stories, news items, opinion pieces, politics and irreplaceable photographs – is not lost to researchers, students, writers or the simply curious.

Anyone who is interested in assisting with this project is encouraged to contact us at owensoundhub@gmail.com.

When I began to research how to do this, I was directed to the Digital Preservation Archivist at Simcoe County Archives, and I am sharing her thoughts below:

I have a few suggestions listed below. Unfortunately, there isn’t a one-size-fits-all solution for saving web content; it depends on what features you wish to prioritize preserving (e.g. the appearance of the website, the experience of scrolling through the website, the content, etc.).

1. Archive-It https://archive-it.org/ is a subscription service provided by the Internet Archive that allows you to crawl websites.
2. The Wayback Machine https://archive.org/web/ is run by the Internet Archive and automatically captures websites. It appears to have captured instances of the Owen Sound Hub from 2014 to 2023. https://web.archive.org/web/20230000000000*/https://www.owensoundhub.org

The downside is that the Internet Archive may not save every page on a website. There are extensions you can download to manually save pages from a website to the Internet Archive (detailed in this blog post: https://blog.archive.org/2017/01/25/see-something-save-something/).

However, you would be relying on a third party to preserve the website, with less control over what happens to the information over time. The benefit is that someone can see what the website looked like in 2014, for example.
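To check which pages the Wayback Machine already holds, it can help to generate the relevant links for each page address. The sketch below only builds the links and does not contact the Internet Archive. The function names are hypothetical, and the URL patterns are those commonly used for Wayback Machine snapshots, its availability lookup and its "Save Page Now" feature.

```python
from urllib.parse import urlencode

def snapshot_url(page_url: str, timestamp: str) -> str:
    """Link to the capture closest to `timestamp` (digits, e.g. '2014' or '20140601')."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"

def availability_query(page_url: str, timestamp: str) -> str:
    """Link that asks the Wayback Machine whether a capture exists near `timestamp`."""
    return "https://archive.org/wayback/available?" + urlencode(
        {"url": page_url, "timestamp": timestamp}
    )

def save_request_url(page_url: str) -> str:
    """Link that requests a fresh capture of the page ("Save Page Now")."""
    return "https://web.archive.org/save/" + page_url

if __name__ == "__main__":
    hub = "https://www.owensoundhub.org"
    print(snapshot_url(hub, "2014"))
    print(availability_query(hub, "2014"))
    print(save_request_url(hub))
```

Opening the first link in a browser shows the capture nearest to 2014, if one exists. The third link would need to be visited once for each page that should be captured.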

3. Conifer https://conifer.rhizome.org/ and ArchiveWeb.page https://webrecorder.net/tools#archivewebpage are both user-driven web crawlers that produce a WARC file (which essentially saves the appearance and content of the website). WARC files can be uploaded to ReplayWeb.page https://replayweb.page/ to access a saved version of a website. These tools require users to manually scroll through the website, clicking through any videos and opening any documents, to save content. The benefit is that you can save the website in its original appearance and “replay” the experience of using the website. However, I’m not sure where you could put the WARC file to have other people access it in the long term.

4. Another option could be to manually save each photo, opinion piece, letter and news item as an individual file, apply a consistent naming system (e.g. name of author, date, title), upload the files to a cloud service such as Google Drive or Dropbox, and provide a link to interested parties. The downside is that the appearance of the website would not be preserved, and the context of the documents would be less apparent.
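A consistent naming system of the kind described above might look like the following minimal sketch. The function names `slugify` and `archive_filename`, the underscore separators and the 60-character limit are assumptions chosen for illustration, and the date in the example is a placeholder rather than a real publication date.

```python
import re
import unicodedata
from datetime import date

def slugify(text: str, max_len: int = 60) -> str:
    """Lowercase, drop accents, and turn runs of other characters into single hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug[:max_len].rstrip("-")

def archive_filename(author: str, published: date, title: str, ext: str) -> str:
    """Build a name in the order author, date, title, followed by the file extension."""
    clean_ext = ext.lstrip(".").lower()
    return f"{slugify(author)}_{published.isoformat()}_{slugify(title)}.{clean_ext}"

if __name__ == "__main__":
    # Placeholder date, used only to show the shape of the resulting name.
    print(
        archive_filename(
            "Anne Finlay-Stewart",
            date(2023, 5, 1),
            "Archiving the Hub: Not a simple job",
            "pdf",
        )
    )
```

Writing the date as year-month-day keeps each author's files in chronological order when a folder is sorted by name. Putting the date first instead would sort the whole collection by date.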

I hope these resources can provide some assistance.

All the best,

Sincerely,

Olivia White, MI, MMSt
Digital Preservation Archivist – Simcoe County Archives