(Public mirrors: https://www.reddit.com/r/roblox/comments/7xg6ps/roblox_forum_archive_beta/, Roblox Forum Archive Beta)
I archived the entire Roblox forum last month and it is now available for public access at https://archive.froast.io. The source of this archive is available. Read below for more info.
First of all I’d like to thank the contributors who made this project possible. This was not cheap at all in terms of human and technical resources. I really appreciate donations and contributions from all of the following (and more).
- Metsfan2009
- Ravenshield
- Jaxon (anonymous - did not make contact with me after donating)
- CPTKurkIV
- FractalGeometry
- Happywalker
- Growncool7
- sanjay2003
Continued donations will allow me to keep the site ad-free and possibly add more features. You can donate on the site.
I did this for the community and will not keep this archive to myself or abuse it. You will not be charged for viewing any posts or making any searches; however, please do not run bots on my website. If you want the archive for yourself, you can download it without slowing down the website for everyone else (see the Source section). I would also like to make the website code open source when I have the chance.
The archive is currently in beta but the beta is mature: it is in beta because I have not had the time to add many features that I would like to. A list of currently available features and planned features is below.
Introduction
I was disappointed when Roblox announced that the forums were closing and that they weren't leaving a read-only copy anywhere on the internet. A lot of the forum deserved to be deleted, but much of it is real history of real people that deserves to be preserved. Even though I learned about the closure late and had very little time, I quickly and sloppily wrote a program to archive the forum.
I succeeded in archiving the entire forum and was left with tens of millions of little files containing hundreds of millions of posts. I am releasing this archive over a month after the forum shut down (and after the archive itself was complete) because I’ve been working on a website to actually make the archive accessible (and school started after December, which meant I didn’t have many chances to work on it).
I did not archive the forum just for the sake of archiving it. Although this is part of the reason, a big reason is allowing people easy access to any posts they may want to see or need from the past. Another archive was supposedly made by ArchiveTeam but this archive remains difficult to access, does not preserve the structure of the whole forum, and has no viable way of being searched. My archive actually fulfilled another project I had thought of before but never tried: an actual forum search.
Roblox’s searching by keyword didn’t work and searching by user barely worked, meaning posts that were not tracked or in a user's recently posted list were essentially lost if the ID had not been recorded. The big feature I have added is the ability to view any single user’s posts. You can go back in time to see the first post you made and jump to anywhere in between. I would like to add searching by keyword as well, but I have not had much time and I do not want to delay the release of the archive any longer.
Source
All files, including raw files, are stored in cheap archive storage indefinitely. The AWS Glacier vault, however, is difficult to access and will not be made public for the time being. Right now, the files are also stored on AWS EBS drives. Although these are much easier to access, the storage costs are very high. Funds are limited: if you plan on working with the raw files, I highly recommend downloading them now. Once the SQL dump of the archive is released, it will remain available for longer.
Even if you don’t need the files, I encourage you to download the source files of the archive and spread them to others.
The following data are not available in this archive, but they are the most trivial pieces of forum posts:
- How many views the post has
- Whether the post was marked “popular”
- Whether the post was locked or not
- Whether the post was pinned or not
Raw
Every _thread_, including all its pages, was saved to its own file in the format `postId-page.gz` (for example, `20128158-1.gz`). This was done on 10 servers for 10 post ranges onto 10 different hard disks, all of which are on AWS.
Below are the snapshot IDs for every single drive. Each drive is 100GB with about 50GB used. Each contains a range of 23 million posts: the first has posts 23 million through 46 million, the second has posts 46 million through 69 million, etc.
- snap-08acac73d5c493f0e
- snap-01ab1ecbbf927da8d
- snap-020e0bac39caed5c5
- snap-058d5c6fa4586f253
- snap-0220d0241806fb57c
- snap-005056bd489ad8f9a
- snap-0fdf1dcaca43114cc
- snap-02a98683b2874d0ff
- snap-064920a7e4cd5c612
- snap-0e8b4974f0f4bcbb1
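For anyone working with the raw files, here is a minimal Python sketch of how the `postId-page.gz` names could be handled. It is not the code used to build the archive. It assumes that each file holds one page of one thread (which the name format suggests), that the contents are gzip-compressed text, and that all pages of a thread sit in the same directory; the function names `parse_name`, `read_page`, and `thread_pages` are made up for illustration.

```python
import gzip
import re
from pathlib import Path

# Matches names such as "20128158-1.gz": post ID, a dash, then the page number.
FILENAME_RE = re.compile(r"^(\d+)-(\d+)\.gz$")


def parse_name(name: str) -> tuple[int, int]:
    """Split a 'postId-page.gz' file name into (post_id, page)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def read_page(path: Path) -> str:
    """Decompress one saved page. UTF-8 is an assumption about the encoding,
    so undecodable bytes are replaced rather than raising an error."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return handle.read()


def thread_pages(directory: Path, post_id: int) -> list[Path]:
    """Return the saved pages of one thread, sorted by page number.
    Assumes every page of the thread is stored directly in `directory`."""
    candidates = [
        path
        for path in directory.glob(f"{post_id}-*.gz")
        if FILENAME_RE.match(path.name)
    ]
    # Sort numerically so that page 10 comes after page 2.
    return sorted(candidates, key=lambda path: parse_name(path.name)[1])


print(parse_name("20128158-1.gz"))  # (20128158, 1)
```

Sorting on the parsed page number rather than on the file name matters because a plain string sort would put `-10.gz` before `-2.gz`.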
Compiled
This is a compilation of the 10 drives into one drive of 500GB. It contains 11 tar files with the format `archive[number].tar`. Each archive contains 23 million posts. The 11th file is a compilation of all log files from both the archive process and the import process into a database.
The snapshot ID is: snap-01cb07f8ec7df3c9b
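If you would rather not unpack a whole tar file to disk, the pages can be read straight out of it. The sketch below is only an illustration under assumptions: it guesses that the members inside each `archive[number].tar` are the same gzip-compressed `postId-page.gz` files as on the raw drives, and `iter_pages` is a hypothetical helper name, not part of the archive's tooling.

```python
import gzip
import tarfile
from typing import Iterator


def iter_pages(tar_path: str) -> Iterator[tuple[str, bytes]]:
    """Yield (member name, decompressed bytes) for every .gz member in a tar file.

    Assumes the tar members are individually gzip-compressed pages;
    anything that is not a regular file ending in .gz is skipped.
    """
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            fileobj = tar.extractfile(member)
            if fileobj is None:
                continue
            yield member.name, gzip.decompress(fileobj.read())
```

A loop such as `for name, data in iter_pages("archive1.tar"): ...` would then visit one page at a time. Note that `tarfile` keeps a record of every member it has seen, so memory use can grow on archives with millions of files.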
SQL
This is by far the easiest form to work with. If you do not need to work with the raw files, you should just use the SQL database.
A SQL database exists but I won’t be able to provide a dump right now. I will release information once I can produce one.
Features
- View all threads and posts made on the forum, including all pages of threads (that existed on December 21st)
- View threads made in subforums
- Search for posts made by a user
- Search for posts starting at a certain date for a user
- Roblox links are clickable; Roblox forum links go to the archived post instead of Roblox
(Planned)
- Search for posts by keyword (posts in subforums, by a specific user, all posts, etc.) This may or may not be viable.
- Sort subforums by ascending or descending
- Search for posts after a certain date
(Would like to have)
- Go to a specific page in a subforum (seems to be unviable with the current schema)
Known Issues
- Posts with over 1000 pages will not display page counts correctly. All posts will still exist and every page can be accessed, but the page counter will only display the last three digits of the actual total (for example, a thread with 1234 pages shows 234). This pretty much exclusively affects botted posts and isn’t a huge deal. I plan to fix this eventually.
- Some posts with hundreds of pages were not fully saved. Most, if not all of these are botted posts.
- Some posts were missed. I am not sure how many, but so far I have only one confirmed post that should have been saved but was not.
Feel free to comment with questions or suggestions that are not already planned features.