Neural Network Training Data (408k+ Tweets)

krissynull · July 10, 2018, 6:59pm

I was thinking of making a neural network similar to Word2Vec and decided to use Twitter’s APIs to record tweets from all the people I’ve spoken to in the past week, then the people they’ve spoken to in the past week, and so on. In 49 minutes I was able to save 408,000+ tweets from 2,500+ users. If anybody else has a use for this data set, here are the files needed.
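
The "people I’ve spoken to, then the people they’ve spoken to, and repeat" idea is essentially a breadth-first crawl. Below is a minimal sketch of that idea in Python, not the attached Luau script. The callbacks `fetch_tweets` and `fetch_contacts` are hypothetical stand-ins for whatever Twitter API wrapper is used, and the `max_users` cap is an assumption added so the crawl terminates.

```python
from collections import deque

def crawl(seed_user, fetch_tweets, fetch_contacts, max_users=2500):
    """Breadth-first crawl over "spoke to recently" links.

    fetch_tweets(user)   -> list of tweet texts      (hypothetical callback)
    fetch_contacts(user) -> users they spoke to      (hypothetical callback)
    """
    seen = {seed_user}          # users already queued, to avoid revisiting
    queue = deque([seed_user])  # users still waiting to be downloaded
    tweets = []
    while queue:
        user = queue.popleft()
        tweets.extend(fetch_tweets(user))
        for contact in fetch_contacts(user):
            # Only queue users we have not seen, and stop growing at the cap.
            if contact not in seen and len(seen) < max_users:
                seen.add(contact)
                queue.append(contact)
    return tweets, seen

# Toy data standing in for real API responses.
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["a"]}
posts = {"a": ["hi"], "b": ["yo", "sup"], "c": [], "d": ["hey"]}
tweets, users = crawl("a", lambda u: posts[u], lambda u: graph[u])
```

With the toy data, the crawl visits `a`, `b`, `c`, `d` in that order and collects each user’s tweets exactly once, even though the graph contains cycles.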

It had to be uploaded to Mega since Discourse wouldn’t let me upload it. (The file is 23 MB.)
408k Tweets File

Place the rbxm folder in this script and it’ll convert it all into one table. (I was too lazy to convert it all into a module script.)
Twitter Data to Table.rbxm (920 Bytes)

If you plan on doing a similar project to mine, here is a script that checks whether something is a word, using a module that returns a table of 80,000+ English words.
English Module.rbxm (310.9 KB)
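
For anyone doing the filtering outside Roblox, here is a rough Python sketch of the same kind of check. It assumes the word list has been loaded into a set (the tiny `vocab` below is made-up sample data, not the attached module), and the normalization rule of lowercasing and stripping everything except letters and apostrophes is my own assumption.

```python
import re

def normalize(token):
    """Lowercase a token and drop everything except letters and apostrophes."""
    return re.sub(r"[^a-z']", "", token.lower())

def english_words(tweet, vocabulary):
    """Return the normalized tokens of a tweet that appear in the vocabulary."""
    words = (normalize(token) for token in tweet.split())
    return [word for word in words if word in vocabulary]

# Made-up sample vocabulary; the real list would have 80,000+ entries.
vocab = {"hello", "world", "the"}
kept = english_words("Hello, wrld! the #world", vocab)
```

Using a set keeps each lookup constant-time on average, which matters when checking every token of 408,000+ tweets.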

This script downloads the tweets from a specified UserId and from the people they have spoken to in the past week. It doesn’t store any data other than the text of the tweet, and it ignores protected accounts. (I did not make the API module. I don’t know who made the original, but credit to them.)
Twitter Tweet Downloader.rbxm (4.9 KB)
To get a Twitter account’s UserId I used this website. You’ll also need to create a Twitter App; the consumer keys are under “Keys and Access Tokens”. For the app’s website I just put roblox.com.

I’m sure I could’ve made the storage method better, as I don’t think storing it as a table and using string manipulation was the best method. Anyway, I hope somebody finds this data set useful.
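
One possible alternative to hand-rolled string manipulation is to store one JSON object per line, so quotes and newlines inside tweets are escaped by the encoder. This is only a sketch in Python under that assumption, not how the files above are stored. The `{"text": ...}` record shape and the function names are made up for illustration.

```python
import io
import json

def dump_tweets(tweets, fp):
    """Write each tweet as one JSON object per line."""
    for text in tweets:
        # json.dumps escapes quotes and newlines, so each record stays on one line.
        fp.write(json.dumps({"text": text}) + "\n")

def load_tweets(fp):
    """Read tweets back from a file written by dump_tweets."""
    return [json.loads(line)["text"] for line in fp if line.strip()]

# Round trip through an in-memory buffer; a real file object works the same way.
original = ['He said "hi"\nthen left', "plain tweet"]
buffer = io.StringIO()
dump_tweets(original, buffer)
buffer.seek(0)
restored = load_tweets(buffer)
```

The round trip returns the tweets unchanged, including the one with embedded quotes and a line break.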