Great work! Glad to see it come together properly! Support well spent.
Thank you everyone for the immense support, I really appreciate it!
Special thank you to Kegan and Baheeg for their generous donations; it really helps with keeping the servers up!
I've gotten a few requests from people to remove posts they made in the past, out of fear that the posts are embarrassing, expose personal information, or are regrettable. I am stuck on whether to accept these requests, especially given some of the reasoning. The archive is not only something for people to explore but also a preservation of history, so I'm torn between the historical integrity of the archive and a few people's comfort.
I would like to clarify that editing the raw archive files shared here is difficult and I have no desire to do so; if I decide to remove posts at all, it will only be on the publicly accessible website. This will not stop people who use the raw files and host their own websites.
I would appreciate people's thoughts on the topic because I'm unsure how to approach it.
Do not remove them! These are funny classics, and they don't harm us on the real Roblox site! And if you still look at Roblox when you're older, you'll wanna see how cringy you were, because of how funny and simple it was! Blast from the past > deleting now. Just because you might get embarrassed doesn't mean you should remove the posts.
I really hope you don't remove them.
I'd say it sounds good. Considering that people have likely already downloaded the raw files, removing the info in there wouldn't work too well anyhow.
Raw snapshots will be deleted 24 hours from the time of this post. If you want to work with the raw archive files easily, copy them as soon as possible. The raw files will remain in .tar files on the compiled disk; however, untarring them is time-consuming.
Man, there is some pretty cringe stuff in here…
(I mean hey, at least I got over 50,000 Robux now…)
Perhaps in the future. There are actually collections of very complete forum archives on archive.org (the Internet Archive) made by a different team, but they are completely unprocessed and include data for every single page (including replies and nonexistent posts), not to mention being difficult to access. My archive is a bit more practical and includes only existing threads with replies. The raw files for my archive will still exist but will be inaccessible to the public, and once I am able to make the SQL dump available, I will keep it up for longer.
I've been notified about an issue with the post count shown for users. If you changed your username, your post count may be displayed lower than it really is, even though all your posts are actually archived.
This is because, when displaying old posts, Roblox shows the post count of that specific username, so the post count for an old username stopped increasing as soon as you changed it. Even though your post count carries over to your next username, the old one does not include the new username's posts.
The import did not account for this, so it often copied the post count from an old username instead of the newest one. I am looking into fixing this soon, along with the error that prevents threads from displaying more than 1000 pages (it turns out some posts are over 18000 pages long).
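Below is a minimal Python sketch of the kind of fix described above, not the archive's actual import code. It assumes (hypothetically) that each archived post row carries a stable user ID, the username shown on the post, a timestamp, and the post count displayed next to it; `ArchivedPost` and `latest_post_counts` are illustrative names. Because the count carries over to a new username while old usernames stop counting, the count and username shown on a user's most recent post are taken as the correct ones.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ArchivedPost:
    # Hypothetical row shape; the real archive schema is not described.
    user_id: int
    username: str  # username shown on this post (may be an old one)
    posted_at: datetime
    displayed_post_count: int  # count Roblox showed next to this post


def latest_post_counts(posts):
    """Return {user_id: (username, post_count)} from each user's newest post.

    Counts on old usernames froze when the name changed, so the newest
    post (under the newest username) carries the full total.
    """
    latest = {}
    for post in posts:
        current = latest.get(post.user_id)
        if current is None or post.posted_at > current.posted_at:
            latest[post.user_id] = post
    return {uid: (p.username, p.displayed_post_count) for uid, p in latest.items()}
```

Picking by timestamp also recovers the newest username, which is the other thing an import has to get right once names have changed.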
Now I'm curious which one has the most. I'm assuming one of those forum game threads? I remember one that got huge and pretty much never stopped getting bumped to the top of my recent threads for as long as I had my first username.
Most if not all of the posts over 1000 pages are botted posts that were only able to get so long because they were in deleted subforums where mods couldn't see the spam.
Aww, that's a shame. It would've been amazing to see such a vast thread of legitimate posts. Guess there's gotta be something interesting in a similar regard if I start digging deep enough.
Yeah. Also, unfortunately, some posts with very high page counts were not saved correctly, so even the one with the most pages that I have may not be the actual post with the most pages (but it could be; the failures were random and not based on the number of pages).
Posts over 1000 pages should display correctly now (some posts may take up to 24 hours to update). The longest thread (that I have) is thread 8242472, with 18527 pages and 463167 posts. This thread alone constitutes 0.2% of the entire Roblox forum. There have been suggestions that the longest thread is around 60 thousand pages, but if this is true, not all of its pages were successfully archived.
I'm late, but this is an AWESOME idea!
Hi guys.
I apologize for the lack of updates. Unfortunately, I've been extremely busy with other things, and I also took on rewriting the entire site's codebase, so work has been extremely slow and I don't have much to show for it.
I've fixed one bug that displayed post counts (and usernames) incorrectly if a user had changed their username before. It was due to Roblox not updating post counts on old posts if a user changed their username, something I was not aware of when I initially constructed the archive.
Despite the very generous donations from many of you, and me putting 100% of those donations toward server costs, money ran out a couple months ago. Server costs are extremely high because of the large database and I have been paying (quite a lot) out of pocket recently.
The site is still running at the time I'm writing this, but without donations I cannot see the site being kept up much longer. I would really like to keep the database somehow, because it can be very tedious and even expensive to construct it from the raw files. I will continue paying out of pocket to keep the raw files because they are cheaper to store.
Downloads Available
The one update I have is that the raw files are once again publicly available (as the snapshot downloads went down quite a while ago). This time they are in a public S3 bucket named "rbx-archive" located in the US West (Oregon) region. The bucket contains 154964 gzipped folders, and the name of each folder describes which threads it contains. For example, the folder "100000229-1.gz~100001184-1.gz" contains all threads from page 1 of thread 100000229 through page 1 of thread 100001184, inclusive. Each folder typically contains up to 200 gzipped files, each representing one page of one thread. For example, the file "100000229-1.gz" has page 1 of the thread with the post ID 100000229. Note that this does not allow you to find a post directly: you have to know which thread it belongs to and which page of the thread it is on. An index exists but is currently not publicly available.
You will need knowledge of Amazon S3 to extract these files. To download the entire archive, you will likely need to do it programmatically. Please refer to Amazon's documentation if you are interested in doing this.
The S3 bucket is "requester pays", which means that the person downloading pays for the data transfer costs associated with the download. On a small scale this is nothing; if you want to download the entire 482.8 GB, however, it will cost somewhere around $50 (there may be a loophole using Lightsail: if you are very serious about downloading the archive and know how to set that up, you can contact me about a cheaper method).
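Since a page is only reachable through its folder, a small helper can work out which folder to fetch for a given thread and page. This is a minimal Python sketch, assuming (not confirmed above) that folder ranges do not overlap and are ordered numerically by (thread ID, page), and that you already have the list of folder names from listing the bucket; `parse_folder_key`, `page_file_name`, and `find_folder` are hypothetical helpers.

```python
import bisect
import re

# Folder names look like "100000229-1.gz~100001184-1.gz": the first and last
# thread page in the folder, inclusive. re.match only anchors at the start,
# so a trailing suffix on the S3 key (an assumption) is tolerated.
_FOLDER_RE = re.compile(r"(\d+)-(\d+)\.gz~(\d+)-(\d+)\.gz")


def parse_folder_key(key):
    """Return ((first_thread, first_page), (last_thread, last_page))."""
    m = _FOLDER_RE.match(key)
    if m is None:
        raise ValueError(f"unexpected folder name: {key!r}")
    first_thread, first_page, last_thread, last_page = map(int, m.groups())
    return (first_thread, first_page), (last_thread, last_page)


def page_file_name(thread_id, page):
    """Name of the file holding one page of one thread, e.g. '100000229-1.gz'."""
    return f"{thread_id}-{page}.gz"


def find_folder(folder_keys, thread_id, page):
    """Return the folder whose inclusive range covers (thread_id, page), or None."""
    ranges = sorted((*parse_folder_key(k), k) for k in folder_keys)
    starts = [start for start, _end, _key in ranges]
    target = (thread_id, page)
    # Rightmost folder whose first page is <= target.
    i = bisect.bisect_right(starts, target) - 1
    if i >= 0 and target <= ranges[i][1]:
        return ranges[i][2]
    return None
```

Comparing parsed integer tuples instead of raw strings matters here: as strings, "10" sorts before "9", which would misplace threads with multi-digit page numbers.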
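For a programmatic download, a minimal sketch using boto3 (the AWS SDK for Python) might look like the following. `download_archive` is a hypothetical helper, and the S3 client is passed in so the logic can be tried without AWS. Passing `RequestPayer="requester"` is how each request acknowledges that the downloader is billed on a requester-pays bucket; without it, such requests are rejected.

```python
import os

# boto3 turns this into the x-amz-request-payer header, acknowledging that
# the requester (not the bucket owner) pays for the transfer.
REQUESTER_PAYS = {"RequestPayer": "requester"}


def download_archive(s3, bucket, dest_dir, prefix="", limit=None):
    """Download objects from a requester-pays S3 bucket into dest_dir.

    `s3` is a boto3 S3 client (or any object with the same two methods).
    Returns the keys downloaded during this run.
    """
    list_args = {"Bucket": bucket, **REQUESTER_PAYS}
    if prefix:
        list_args["Prefix"] = prefix
    downloaded = []
    # list_objects_v2 returns at most 1000 keys per call, so page through them.
    for page in s3.get_paginator("list_objects_v2").paginate(**list_args):
        for obj in page.get("Contents", []):
            if limit is not None and len(downloaded) >= limit:
                return downloaded
            key = obj["Key"]
            path = os.path.join(dest_dir, key)
            # Skip files that already exist so an interrupted run can resume.
            if os.path.exists(path):
                continue
            os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
            s3.download_file(bucket, key, path, ExtraArgs=dict(REQUESTER_PAYS))
            downloaded.append(key)
    return downloaded
```

Usage, assuming boto3 is installed and AWS credentials are configured: `download_archive(boto3.client("s3", region_name="us-west-2"), "rbx-archive", "rbx-archive-local", limit=5)`, where `us-west-2` is the US West (Oregon) region. Transfer charges go to your own AWS account, so a small `limit` is a sensible first run.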
Again, because the cost of hosting the files on S3 is not significant, I will continue to pay it for the time being. It is really the database, which allows for searching and immediate retrieval, that is very expensive.
I will be forced to shut down the archive in the next few days without donations. I may or may not be able to get a database dump out.
…this is my first time seeing this thread. Holy smokes, man, you've captured history right here. I just looked back through my post history and dang:
- 2008 HanSolo996 should not have been allowed on a computer
- The memories. Wow.
- For being such a large online community, there were still tons of little tight-knit clusters of friends
What are your monthly costs? What would it take to keep this running? I'm interested in pitching in some funding to help keep this alive (at least for a database dump).
I deeply appreciate you having done this. You keeping this archive alive for so long has given me ample time to reread and archive everything on the old forum that I cared about.
Thank you.
The cost is around $50 a month, which adds up really quickly. Considering the archive went live 7 months ago, the total cost just for hosting is around $350. This isn't even counting the original archive process last December or the database import, which cost an additional $100-$200 (at least).
I'd say it's already a feat that the donations paid for hosting for 6 or so months (the archiving costs came from me), but they've been dwindling ever since release, and it's going to start digging back into my pockets, which just aren't deep enough.