About This Archive

MediaOCU Historical Archive (2010-2024)

Archive Overview

This archive preserves content from MediaOCU, Oklahoma City University's student newspaper, covering the period from 2010 to 2024. The original WordPress site was compromised in an attack, and this static archive was created using data recovered from the Internet Archive's Wayback Machine.

Years archived: 15
Full articles: 1,043
Article headers: 2,023
Total article pages: 3,066

Purpose: This archive serves as a historical record of OCU student journalism and campus life. While the university has rebuilt its live site at mediaocu.com, this archive preserves the past for research, reference, and remembrance.

How This Archive Was Built

1. Data Recovery (Wayback Machine)

All content was recovered from the Internet Archive's Wayback Machine using the open-source tool wayback-machine-downloader by Hartator.

Recovery Process:

  1. Identified snapshots of mediaocu.com in the Wayback Machine (2010-2024)
  2. Downloaded HTML, CSS, JavaScript, and image files using wayback-machine-downloader
  3. Retrieved approximately 1.4 GB of data, including 10,595 HTML files
  4. Extracted WordPress database exports when available
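In this project, wayback-machine-downloader handled snapshot discovery. The sketch below is not that tool. It is a minimal illustration of what "identifying snapshots" involves, using the Wayback Machine's public CDX index API directly.

The sketch makes several assumptions that are not the project's documented settings:
- it keeps only HTTP 200 captures;
- it collapses captures whose content digest matches the previous one;
- it requests a specific set of fields.

The helper names `build_cdx_query` and `parse_cdx_json` are hypothetical.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def build_cdx_query(domain: str, start_year: int, end_year: int) -> str:
    """Build a CDX query URL listing captures of every URL under `domain`.

    Parameter choices below are illustrative assumptions.
    """
    params = {
        "url": f"{domain}/*",        # prefix match: every page on the site
        "from": str(start_year),     # CDX accepts partial timestamps such as YYYY
        "to": str(end_year),
        "output": "json",
        "fl": "timestamp,original,mimetype,statuscode",
        "filter": "statuscode:200",  # keep only successful captures
        "collapse": "digest",        # drop captures identical to the previous one
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

def parse_cdx_json(body: str) -> list[dict]:
    """Turn a CDX JSON response (a header row plus data rows) into dicts."""
    rows = json.loads(body)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]
```

To run a real query, open the URL with `urllib.request.urlopen` and pass the decoded response body to `parse_cdx_json`. A site with many captures may need to be queried in pages instead of a single request.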

2. Data Cleanup (Automated + AI)

The recovered WordPress site contained significant bloat and artifacts. Cleanup was performed using custom Python scripts and AI assistance (Claude by Anthropic).
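The specific cleanup tasks are not listed on this page. As a hypothetical illustration, here is one common kind of Wayback artifact. Pages saved through the Wayback web interface can carry an injected toolbar. They can also contain links rewritten with a `web.archive.org/web/<timestamp>/` prefix.

The sketch rests on two assumptions:
- the toolbar sits between the HTML comments shown below;
- timestamps are the usual 14-digit form, optionally followed by a two-letter modifier such as `im_`.

Whether the recovered MediaOCU files actually contained these artifacts is not stated here. `strip_wayback_artifacts` is a hypothetical name. A production cleanup would more likely use a full HTML parser than regular expressions.

```python
import re

# Assumed markers around the toolbar the Wayback web UI injects into pages.
TOOLBAR_RE = re.compile(
    r"<!-- BEGIN WAYBACK TOOLBAR INSERT -->.*?<!-- END WAYBACK TOOLBAR INSERT -->",
    re.DOTALL,
)

# Rewritten links look like https://web.archive.org/web/20120315000000im_/http://...
# The host part and the two-letter modifier (im_, cs_, js_, ...) are optional.
# The prefix is removed only when an absolute original URL follows it.
WAYBACK_PREFIX_RE = re.compile(
    r"(?:https?://web\.archive\.org)?/web/\d{14}(?:[a-z]{2}_)?/(?=https?://)"
)

def strip_wayback_artifacts(html: str) -> str:
    """Remove the injected toolbar and restore original absolute URLs."""
    html = TOOLBAR_RE.sub("", html)
    return WAYBACK_PREFIX_RE.sub("", html)
```

For example, `<img src="/web/20120315000000im_/http://mediaocu.com/logo.png">` becomes `<img src="http://mediaocu.com/logo.png">`.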

Cleanup Tasks:

3. Redesign (AI-Assisted)

The original WordPress theme was stripped away and replaced with a clean, minimal design created with assistance from Claude (AI). The new design emphasizes:

Data Coverage by Year

The archive contains varying levels of completeness for each year, depending on what was captured by the Wayback Machine and how the content was stored in WordPress.
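This page does not describe how a recovered page was classified as a full article or as headers only. The sketch below shows one plausible heuristic, and it is entirely hypothetical: count the words inside the post-body element and call the page a full article above a threshold.

Two assumptions are involved:
- `entry-content` is a class many WordPress themes use for post bodies, but whether MediaOCU's theme used it is an assumption;
- the 150-word cutoff is arbitrary.

`EntryContentWords` and `classify_page` are illustrative names.

```python
from html.parser import HTMLParser

class EntryContentWords(HTMLParser):
    """Count words inside <div> elements carrying an assumed post-body class."""

    def __init__(self, content_class: str = "entry-content"):
        super().__init__()
        self.content_class = content_class
        self.depth = 0   # > 0 while inside a matching div (tracks nested divs)
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # nested div inside the post body
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.content_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.words += len(data.split())

def classify_page(html: str, min_words: int = 150) -> str:
    """Return "full" or "header-only" using a hypothetical word threshold."""
    parser = EntryContentWords()
    parser.feed(html)
    parser.close()
    return "full" if parser.words >= min_words else "header-only"
```

Nested divs are tracked by depth, so text in a sidebar or footer after the post body is not counted.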

Year   Full Articles   Headers Only   Total Pages   Status
2010              64            139           203   ✓ Good
2011             219            459           678   ✓ Excellent
2012             244            218           462   ✓ Excellent
2013             175            259           434   ✓ Excellent
2014              29             97           126   ◐ Partial
2015              55            136           191   ✓ Good
2016             206            132           338   ✓ Excellent
2017              51            104           155   ✓ Good
2018               0            128           128   ✗ Headers Only
2019               0             28            28   ✗ Headers Only
2020               0            102           102   ✗ Headers Only
2021               0            154           154   ✗ Headers Only
2022               0             46            46   ✗ Headers Only
2023               0             10            10   ✗ Headers Only
2024               0             11            11   ✗ Headers Only
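The per-year counts in the coverage table should add up to the summary figures at the top of this page. A short script can check that and compute the share of full-text pages per year. The numbers are copied from the table. The helper names `totals` and `full_share` are illustrative.

```python
# (year, full_articles, headers_only), copied from the coverage table.
COVERAGE = [
    (2010, 64, 139), (2011, 219, 459), (2012, 244, 218), (2013, 175, 259),
    (2014, 29, 97), (2015, 55, 136), (2016, 206, 132), (2017, 51, 104),
    (2018, 0, 128), (2019, 0, 28), (2020, 0, 102), (2021, 0, 154),
    (2022, 0, 46), (2023, 0, 10), (2024, 0, 11),
]

def totals(rows):
    """Sum full articles and header-only pages across all years."""
    full = sum(f for _, f, _ in rows)
    headers = sum(h for _, _, h in rows)
    return full, headers, full + headers

def full_share(full, headers):
    """Fraction of pages that are full articles (0.0 when there are no pages)."""
    pages = full + headers
    return full / pages if pages else 0.0

if __name__ == "__main__":
    full, headers, pages = totals(COVERAGE)
    print(f"{full:,} full + {headers:,} headers = {pages:,} pages "
          f"({full_share(full, headers):.1%} full text)")
    for year, f, h in COVERAGE:
        print(f"{year}: {f + h:>4} pages, {full_share(f, h):6.1%} full text")
```

The totals come to 1,043 full articles, 2,023 headers, and 3,066 pages, matching the summary figures. About 34.0% of all archived pages are full text. The highest per-year share is 2016 at roughly 60.9%.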

Status Definitions

Known Limitations & Missing Data

What We Know Is Missing

What Might Be Missing (Unknown Unknowns)

Note on Completeness: This archive represents the best possible recovery given the available data from the Wayback Machine. The Internet Archive's crawlers visit sites periodically, so content published between crawls or removed before archiving cannot be recovered.
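Captures are periodic, so the intervals between consecutive snapshots show where unrecoverable content is most likely to be missing. The sketch below takes a list of 14-digit capture timestamps (YYYYMMDDhhmmss, the format the Wayback Machine's CDX index reports) and returns the longest gaps. The function name `largest_crawl_gaps` is hypothetical. The example timestamps are invented for illustration and are not MediaOCU capture data.

```python
from datetime import datetime, timedelta

def largest_crawl_gaps(timestamps, top_n=3):
    """Return the top_n longest (gap, start, end) intervals between captures."""
    times = sorted(datetime.strptime(ts, "%Y%m%d%H%M%S") for ts in timestamps)
    gaps = [(later - earlier, earlier, later)
            for earlier, later in zip(times, times[1:])]
    gaps.sort(key=lambda g: g[0], reverse=True)  # longest interval first
    return gaps[:top_n]

if __name__ == "__main__":
    example = ["20180101000000", "20180301000000",
               "20180302000000", "20190101000000"]
    for gap, start, end in largest_crawl_gaps(example):
        print(f"{gap.days:>4} days: {start:%Y-%m-%d} -> {end:%Y-%m-%d}")
```

Any article published and removed within one of the long intervals would never have been captured.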

Percentage Analysis

Based on article file analysis:

Tools & Technology Used

Data Recovery

Processing & Cleanup (Python Scripts)

The project includes 16 Python scripts organized in the maintenance/ directory:

All scripts are documented in maintenance/README.md and in the README file of each subfolder.

AI Assistance (Claude by Anthropic)

Claude, an AI assistant by Anthropic, was used extensively throughout this project:

Project Timeline: The entire recovery, cleanup, and redesign process was completed in a single collaborative session between the archive maintainer and Claude.

Documentation

Comprehensive documentation is organized in the docs/ directory:

Hosting & Deployment

Future Improvements

Potential enhancements for this archive:

Contact & Contributing

This archive is maintained as a service to the OCU community. If you have:

Please contact the archive maintainer or submit contributions via the project repository.

Acknowledgments: Thank you to the Internet Archive for preserving web history, to Hartator for the wayback-machine-downloader tool, and to Anthropic for Claude AI assistance. Most importantly, thank you to all MediaOCU student journalists whose work is preserved here.