Elasticsearch for Analytics and Why It May Be Useful for Your Game

*This post is based on my limited experience implementing Elasticsearch as a full analytics service and user-data store. It only covers the lessons learned from implementing it as an analytics service; implementing it to store user data may come in a separate post later down the line.*

If you have any questions, please let me know in the comments or in my Discord DMs, and I will do my best to answer them!

With that out of the way, let's get into it.

Table of Contents

  1. Introduction
  2. Problem Statement
  3. Overview of Elasticsearch
  4. The limitations of Elasticsearch
  5. Using Elasticsearch for analytics in a Roblox game
  6. Visualizing data with Kibana
  7. Conclusion

Introduction

Game development is a complex problem, and some of you may have wondered at some point or another, “I wish I knew exactly what my users like the most in the game” or “Why is my game lagging in this exact part of the map?” Or even, “Can I spend more of my day looking at numbers going up and down with pretty colors?” All of these can be answered using logging and analytics!

Some people use services like GameAnalytics, Super Biz, or the now-deprecated PlayFab through AnalyticsService. Another solution, more in line with commercial game development, is running your own analytics, logging, and APM (Application Performance Monitoring) system through a platform like Elasticsearch.

Warning: These “enterprise” systems like Elasticsearch and the entire ELK Stack (Elasticsearch, Logstash, Kibana) can get very technically involved and very expensive if set up incorrectly or inefficiently, and an inefficient setup can be very difficult to recover from without losing a ton of data. But they can also give us a ton of insight into our games: what players are enjoying, what they aren’t, how and when the game may be breaking, how to better tailor your localization, and more.

Problem Statement

“So why not use something free like GameAnalytics, Google Analytics, or one of the tons of other free commercial off-the-shelf (COTS) analytics services I can find online?” Honestly, no reason you can’t; in fact, you probably should, and you will be perfectly fine using one of these free services forever. But that’s not to say they are perfect or will do everything you want them to do. Let me list my most significant issues with these services:

  • Lack of Data Control
  • Limited Flexibility
  • Lack of Visibility
  • Vendor lock-in

Overview of Elasticsearch

“Okay, okay, I see. But what even is this Elasticsearch you keep talking about, and why should I care?”

Elasticsearch is defined as “A distributed, RESTful search and analytics engine capable of solving a growing number of use cases, able to index, search, and analyze large volumes of data quickly and in near real-time. Elasticsearch is a NoSQL document-oriented database, which means it stores and manages data in the form of JSON documents. These documents can contain a wide range of data types, including strings, numbers, Booleans, dates, and arrays.”
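
To make the “JSON documents” part concrete, a single analytics event in this post will end up stored as a document shaped something like this (the field names here are just an illustration, matching the leave event we build later):

{
  "plr_id": "12345678",
  "locale": "en-us",
  "membership_type": "Premium",
  "country": "US",
  "session_time": 1800,
  "timestamp": "2023-01-01T12:00:00Z",
  "placeid": "Main"
}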

And as for why you should care? I laid out a couple of issues I had with other analytics services above that Elasticsearch just so happens to solve, so let me step through those.

  • Data Control

When you use a third-party service, you give up some control over the data you collect from your users. In the Roblox game dev industry this isn’t too big of a deal, but coming from a security background, being able to ensure that the servers I put my players’ data onto are secure and up to date is vital to me.

  • Flexibility

COTS solutions are often designed to be generic and cater to a wide range of use cases. This means that they may not be able to fully meet the specific needs of your game. For example, I may need to collect data that these services don’t allow me to collect, such as chat logs, moderation events, POI visits, and other custom events and data types. They may also not let me look back at my data beyond a certain point in time, or let me run advanced queries on my data to find specific occurrences of bugs or relationships between data points. You may also want to extend the functionality of your analytics service to do something new; Elasticsearch has a built-in security module that you could use to run analytics-driven anti-cheat, or to store logs and validate that every crate/case/bag/egg/whatever you call your lootboxes is valid and has not been duped.

  • Visibility

These services may not let you see the raw data behind the analytics and insights they show you. They may also have you add some black-box code where you may not be sure what’s going on in the background. This can lead to serious issues if you need to diagnose a problem with the game or even with the analytics the game is outputting.

  • Vendor lock-in

You may be tied to using a specific vendor’s product and services. If you’re unhappy with the COTS solution or the vendor goes out of business, switching to a different solution can be challenging or even impossible. With Elasticsearch and other self-hosted options, there is usually a way to dump all the data in your database to CSV or JSON if you want to move off the platform.

And my favorite quality-of-life features:

  • Search Capability
    Need to look up a particular trade someone did because they opened a ticket in your support community and you need to verify it’s legit? An investor asks you for some key performance indicators (KPIs)? Want to know how well your latest update is doing? Easily done; Elasticsearch is well known for its powerful search capabilities, which are a huge help in analyzing and querying large amounts of data (see the sketch just after this list).

  • Real-time analytics
    Do you want to A/B test some UI functions or see how your game is responding to an update in real time? Wait no more for Discord pings from users; check the dashboard for the new update and see how many people are using the latest thing you put in the game.
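
As a minimal sketch of that search capability, here is what a lookup could look like in the Elasticsearch API console (which we set up later in this post), assuming the analyticstest-leave index we build below; a GET to /analyticstest-leave*/_search with a body like:

{
  "query": {
    "bool": {
      "filter": [
        { "term": { "plr_id": "12345678" } },
        { "range": { "timestamp": { "gte": "now-7d" } } }
      ]
    }
  }
}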

And those are just some of the things I have run into that made me want to move our game to Elasticsearch and spin up our own infrastructure for analytics. We did run a couple of other tests that are outside the scope of this post; I talked about one in the Flexibility section, but we also looked at transferring user data to Elasticsearch to give moderators an off-site look at data before Open Cloud was announced.

The limitations of Elasticsearch

Now - I’ll sing the praises of Elasticsearch all day, but it’s not a one-stop shop for perfection. It has its limitations, and here are (what I find to be) the biggest ones:

  • Security
    I mentioned earlier that I have a security background, and it puts in work when running Elastic. We aren’t storing any sensitive data or PII on our Elastic instances (and you shouldn’t either), but if you are running a more complicated setup (like if you are a large studio and are hosting some sensitive internal metrics), it is vital that you have safeguards in place to protect your systems. Overall, backups and good general cyber-hygiene go a long way.

  • Cost
    Elasticsearch is a commercial product, and the configuration I recommend and will be using for this tutorial comes with some steep prices after the trial. While a free, open-source version is available, it may include only some of the features you need. The cost of using Elasticsearch depends on your usage, but it can become quite expensive if you need to scale up, or if you need additional features and don’t configure your services correctly.

  • Complexity and Expertise
    Setting up and maintaining your own Elasticsearch infrastructure can be complex, especially if you are unfamiliar with it. It requires installing and configuring the Elasticsearch software, setting up servers, and managing the underlying infrastructure. Additionally, using Elasticsearch effectively requires a certain level of expertise. You will need to have a good understanding of the technology and how it works, and you may need to hire or train staff to use it effectively as you scale up operations.

  • Data Volume
    This ties into Complexity a lot; depending on the size and popularity of your game, you may need to handle a large volume of data. This can be challenging, especially if you are unfamiliar with handling big data.

Using Elasticsearch for analytics in a Roblox game

Congrats on reading this far and being this interested in Elastic for Roblox. Take a quick break, and get some water because this is where things get technical.

Preparing our game for Elasticsearch, and Elasticsearch for our game

Let’s start off by getting Elastic and the game set up.

For this tutorial, I will be making a new Elastic Cloud account (all the default settings are fine; name it whatever you want) and will be basing all of the code on the capture-the-flag template provided in Studio.

Once you initialize the Elastic stack on the cloud, it will ask you what you want to focus on. Choose anything or skip it, as we will ignore Elastic for a little while as we get Roblox set up to transmit. Get far enough that you can see the hamburger menu in the top left of the screen, select it, then select “Manage this deployment.” Copy the Elasticsearch endpoint from the deployment info page and put it somewhere safe.

Then, in the sidebar, go to Elasticsearch > API Console and send the following body with HTTP method POST to /_security/api_key. Copy the “encoded” value from the response somewhere safe; we will need it and the endpoint in a moment.

{
  "name": "testing-api-key",
  "role_descriptors": { 
    "role-a": {
      "cluster": ["all"],
      "index": [
        {
          "names": ["*"],
          "privileges": ["all"]
        }
      ]
    }
  },
  "metadata": {
    "application": "Roblox Testing",
    "environment": {
       "level": 1,
       "trusted": true,
       "tags": ["dev", "staging"]
    }
  }
}
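
If the key was created successfully, the response should look roughly like this (values shortened); the “encoded” field is the one we want:

{
  "id": "VuaCfGcBCdbkQm-e5aOx",
  "name": "testing-api-key",
  "api_key": "ui2lp2axTNmsyakw9tvNnw",
  "encoded": "VnVhQ2ZHY0JDZGJrUW0tZTVhT3g6dWkybHAyYXhUTm1zeWFrdzl0dk5udw=="
}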

In Roblox Studio, if you have not already, go to File > Game Settings > Security and enable “Allow HTTP Requests” and “Enable Studio Access to API Services”.

We also need to make some scripts and functions inside our workspace so we can send data to the instance. I’ll say now that most of these scripts are probably super unoptimized and terrible; sorry, I don’t deal with Lua much, as I’m normally in the data caves, spelunking through logs, so feel free to tell me what I did wrong. That said, I don’t recommend editing the Http and HttpServerScript scripts too much, as they contain some kajiggery that this implementation needs to function correctly.

File Structure

Http

-- Get the HttpService
local http = game:GetService("HttpService")

-- Define the module
local module = {
	-- Define the request function
	request = function(url,method,headers,body)
		-- Initialize variables to store the response body and headers
		local bData,hData = nil,nil

		-- Create a table with a return function that stores the response body and headers, as well as fields for the URL, method, headers, and body
		table.insert(shared.requestQueue,#shared.requestQueue+1,
			{
				returnFunction = function(b,h)
					bData,hData = b,h
				end,
				url = url,
				method = method,
				headers = headers,
				body = body
			}
		)

		-- Wait until the return function has been called, or give up after 120 seconds
		local st = tick()
		repeat task.wait() until bData or tick()-st > 120

		-- If a response was received, return the response body and headers
		if bData then
			return bData,hData
		else
			-- If no response was received after 120 seconds, print a warning message
			warn("request timed out after 120 seconds",url,method,headers,body)
		end
	end
}

-- Define the updateUserMeta function
module.updateUserMeta = function(stringId,func)
	-- Convert the string ID to a string
	stringId = tostring(stringId)
	if stringId then
		-- Wait until the meta template is available
		while not shared.metaTemplate do task.wait() end

		-- Get the URL for the user's metadata
		-- (module.getUrl is part of the user-data storage setup, which is outside the scope of this post)
		local url = module.getUrl("users")

		-- Retrieve the user's metadata from the server
		local rawData = module.request(url,"GET",{UID=stringId,FileName="meta"})
		rawData = rawData and http:JSONDecode(rawData)

		-- Initialize a variable to store the metadata
		local data = nil

		-- If the raw data contains a status code, check if the status code is 204 (No Content)
		if rawData and rawData.statusCode then
			if tonumber(rawData.statusCode) ~= 204 then
				-- If the status code is not 204, set the data to the raw data body
				data = rawData.body
			end
		end

		-- If no data was set, set it to a copy of the meta template
		data = data or shared.env.dupeTable(shared.metaTemplate)

		-- Pass the data to the provided function for modification
		func(data)

		-- Encode the modified data as JSON
		local encoded = http:JSONEncode(data)
		print("updating to\n"..encoded)

		-- Send the modified data back to the server
		module.request(url,"PUT",{UID=stringId,FileName="meta"},encoded)
	end
end

-- Define the JSONEncode function
function module:JSONEncode(...) 
	-- Delegate to the HttpService's JSONEncode function
	return http:JSONEncode(...) 
end

-- Define the JSONDecode function
function module:JSONDecode(...) 
	-- Delegate to the HttpService's JSONDecode function
	return http:JSONDecode(...) 
end

-- Set the shared.request field to the module's request function
shared.request = module.request

-- Set the shared.getUrl field to the module's getUrl function (defined in the user-data storage setup, not shown here)
shared.getUrl = module.getUrl

-- Return the module
return module

HttpServerScript

-- Define a table to store the request queue
shared.requestQueue = {}

-- Keep track of the last request made
local lastRequest = 0

-- Get the RunService and HttpService
local runService = game:GetService("RunService")
local http = game:GetService("HttpService")

-- Hold server shutdown until the request queue has drained (note: Roblox caps BindToClose handlers at 30 seconds)
game:BindToClose(function() repeat task.wait() until #shared.requestQueue == 0 end)

-- Define a function to make HTTP requests
local function request(url,method,headers,body)
	-- Set the "Content-Type" header to "application/json" if not specified
	headers = headers or {}
	headers["Content-Type"] = "application/json"

	-- Convert any non-string values in the headers table to strings
	for i,v in pairs(headers) do
		if type(v) ~= "string" then
			if type(v) == "table" then
				headers[i] = http:JSONEncode(v)
			else
				headers[i] = tostring(v)
			end
		end
	end

	-- Make an asynchronous HTTP request
	local response = http:RequestAsync({
		Url = url,
		Method = method,
		Headers = headers,
		Body = body
	})

	-- If the request is successful, return the response body and headers
	if response.Success then
		print("Request succeeded:",response.StatusCode,response.StatusMessage)
		return response.Body,response.Headers
	else
		-- If the request fails, print a warning message and return the response status code and message
		warn("Request failed:",response.StatusCode,response.StatusMessage)
		print("Response body:",response.Body)
		if response.Headers then
			warn("Headers:")
			for i,v in pairs(response.Headers) do
				warn("   "..i..":",v)
			end
		end
	end
end

-- Process requests in the queue with a delay of half a second between each request
while true do
	-- Wait for the RunService.Stepped event
	runService.Stepped:Wait()

	-- Check if half a second has passed since the last request was made
	if tick()-lastRequest > .5 then
		-- Get the next request in the queue
		local nextRequest = shared.requestQueue[1]
		if nextRequest then
			-- Update the time of the last request made
			lastRequest = tick()

			-- Unpack the request arguments
			local url,method,headers,body = nextRequest.url,nextRequest.method,nextRequest.headers,nextRequest.body

			-- Call the request function and pass the return value to the request's return function
			pcall(function() nextRequest.returnFunction(request(url,method,headers,body)) end)

			-- Remove the request from the queue
			table.remove(shared.requestQueue,1)
		end
	end
end

ElasticAPI

local module = {}

function module:Log(c,f) script.Send:Fire(c,f) end

return module
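
Usage from any other server script is then a single line per event; for example (a hypothetical “purchase” event, not one we build in this post):

local elastic = require(game.ServerScriptService.ElasticAPI)
elastic:Log("purchase", {plr_id = 12345678, item = "sword", price = 100})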

ElasticAPIServerScript

-- Set the default place ID
local defaultPlaceId = 1234567890

-- Define a table of place IDs and names
local placeids = {
	[1234567890] = "Main",
	[0987654321] = "Testing"
}

-- Initialize the lastSend and toSend variables
local lastSend = 0
local toSend = {}

-- Require the Http module
local http = require(script.Parent.Parent:WaitForChild("Http"))

-- Get the RunService
local runService = game:GetService("RunService")

-- Define the send function
function send(category,fields)
	-- Construct the URL for the Elasticsearch server
	local url = "https://{ELASTICSEARCH.ENDPOINT}/"..category.."/_bulk/"

	-- If the category and fields are both provided
	if category and fields then
		-- Set the timestamp field to the current date and time
		fields.timestamp = DateTime.now():ToIsoDate()

		-- Set the placeid field to the name of the current place or the place ID if the place is not recognized
		fields.placeid = placeids[game.PlaceId] or tostring(game.PlaceId)

		-- Encode the fields table as JSON
		local json = [[{ "create" : { "_index" : "]].."analyticstest-"..category.."-"..os.date("%y")..[[" } }]]
		json = json .. "\n" .. http:JSONEncode(fields) .. "\n"

		-- Print the JSON
		print(json)

		-- Send the JSON to the server
		http.request(url,"POST",{Authorization="ApiKey {APIKEY}"},json)
	end
end

-- Set up a connection to the Send event in the script.Parent script
script.Parent:WaitForChild("Send").Event:Connect(function(c,f) send(c,f) end)

***NOTE: IN ELASTICAPISERVERSCRIPT, CHANGE THE URL TO THE Elasticsearch ENDPOINT ADDRESS FOUND ON THE DEPLOYMENT INFO PAGE, AND SWAP IN THE API KEY YOU GOT BEFORE***
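
For reference, the payload that send() builds and POSTs to the _bulk endpoint is two lines of newline-delimited JSON, something like this (the values here are just an example):

{ "create" : { "_index" : "analyticstest-leave-23" } }
{"plr_id":12345678,"locale":"en-us","membership_type":"Premium","country":"US","session_time":1800,"timestamp":"2023-01-01T12:00:00Z","placeid":"Main"}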

Let’s talk about Indexes

There are a lot of lines in the code I just had you copy, but two in particular in the ElasticAPIServerScript make the world go round.

local json = [[{ "create" : { "_index" : "]].."analyticstest-"..category.."-"..os.date("%y")..[[" } }]] 

json = json .. "\n" .. http:JSONEncode(fields) .. "\n"

These define the index that we want to send data to. You may remember that I said Elasticsearch is a document-oriented NoSQL database; well, here is the document part of that statement. An index is a collection of documents that have similar characteristics: a logical namespace to organize your data, comparable to a database in a traditional relational database management system, or to a DataStore in traditional Roblox data management.

You can create an index in Elasticsearch explicitly by sending a PUT request to the Elasticsearch server with the name of the index and the settings and mappings that you want to use. Alternatively, the way we are going to do it: define an index template in the GUI and have the system sort out which index each document should go to by including the _index field in the document’s bulk action.
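
If you ever want the explicit route, a minimal sketch in the API console would be a PUT to /analyticstest-leave-23 with a body like this (the shard/replica counts and fields are just an example):

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "plr_id": { "type": "keyword" },
      "session_time": { "type": "long" }
    }
  }
}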

and on that note

Let’s create your first Index

Now, while I would love to go on and on about how to create and organize a massive number of indices, this post is already at 20,000 characters, so I’m going to show you how to create one, and you can do the rest.

This index is going to be very basic in the info it logs; it will get updated when a player leaves, and will log their UserId, locale, membership_type (whether they have Premium or not), country code, and session time. Not the most insane stuff, but enough to get you going.

In Kibana (the web frontend for Elasticsearch), if you are on the cloud, when you log in it will ask you to add some integrations. Feel free to scroll through these to see what’s offered, but our implementation doesn’t need anything right now. Open up the hamburger menu in the top left, scroll down to Management, select Stack Management, and on the management screen’s sidebar, under Data, select Index Management. This should show no indices; if it does show some, it’s nothing to worry about. At the top, select Index Templates, then select the blue “Create Template” button.

There is a lot of technical info you can get into with templates, and I highly suggest looking through the documentation for Index Templates if you get a chance. For now, name the index template whatever you want; I’m naming mine “analytics-leave-test”. For my index pattern, I’m going to use analyticstest-leave*, which will match any new index that starts with “analyticstest-leave”, letting us add dates or other strings to the end if we need to split up some indices later on. We can leave Data stream, Priority, Version, and _meta field as default. We can also skip component templates, as none of those fit our use case.

Index Settings can also be left blank, as the defaults are very safe and good for what we want in testing. If you plan on having very large indices, you should look into these options, but I will leave that experimentation to you.

Mappings can also be safely skipped in most instances; however, I will do them today for demonstration. I also recommend configuring them for essential or complex objects that may need to be handled as keywords, with runtime fields, or with other advanced settings; the Elastic documentation on mapping is worth a read. There is also an Aliases section, which can be safely ignored. For us, the mapping should look like the below:

{
  "mappings": {
    "dynamic": "true",
    "dynamic_date_formats": [
      "strict_date_optional_time",
      "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
    ],
    "dynamic_templates": [],
    "date_detection": true,
    "numeric_detection": false,
    "properties": {
      "country": {
        "type": "keyword"
      },
      "locale": {
        "type": "keyword"
      },
      "mebership_type": {
        "type": "keyword"
      },
      "plr_id": {
        "type": "keyword"
      },
      "session_time": {
        "type": "long"
      }
    }
  }
}

Alright, now that we got that out of the way, let’s create the template and add some more Lua to our game.

The Lua is going to look a lot more familiar and should be pretty straightforward. We are just going to add a Script in ServerScriptService, call it AnalyticsScript, make OnJoin and OnLeave functions, and add some of the analytics we want to collect. For example:

-- Roblox Services
local LocalizationService = game:GetService("LocalizationService")
local Players = game:GetService("Players")

-- Game Services
local elastic = require(game.ServerScriptService.ElasticAPI)

-- Track join times per player so session times stay correct with more than one player in the server
local joinTimes = {}

local function OnJoin(player)
	joinTimes[player] = os.time()
end

local function OnLeave(player)
	-- Get the country/region code for a player; a better estimation of your userbase than locale ID
	local countrySuccess, countryCode = pcall(function()
		return LocalizationService:GetCountryRegionForPlayerAsync(player)
	end)

	local member_type
	if player.MembershipType == Enum.MembershipType.Premium then
		member_type = "Premium"
	else
		member_type = "None"
	end

	local session_time = os.time() - (joinTimes[player] or os.time())
	joinTimes[player] = nil

	-- Fire elastic on player leave
	elastic:Log("leave", {plr_id=player.UserId, locale=player.LocaleId, membership_type=member_type, country=countrySuccess and countryCode or "unknown", session_time=session_time})
end

Players.PlayerAdded:Connect(OnJoin)
Players.PlayerRemoving:Connect(OnLeave)

Great! Now, if we set everything up right, we should be able to go to our Elastic deployments menu, open Kibana, open our analytics Discover menu, and see our raw data! But we can’t. And that brings me to my nicely manufactured segue to my next section.

Visualizing data with Kibana

We have data! But we have a few more steps before we can see all our beautiful data. To start, we have to make a data view. This is EZPZ: up in the top left, just under our hamburger menu, you should see a blue box that says logs-*. Drop this down, select “Create a data view,” and we should be able to see the index we just made, along with any default data streams and indices Elastic made before. Feel free to name this what you want; I named mine “analytics test - leave,” and the index pattern is “analyticstest-leave*”, which will match our current index and any other index that gets made starting with analyticstest-leave. We do have a timestamp field that we didn’t map manually; it gets auto-added to every request (this happens in the ElasticAPIServerScript), so we can select that for our Timestamp field. We can safely ignore the advanced settings for this view. Now save the data view, Elastic should auto-set it as your current data view, and voilà, data!

Now take your time on this screen, and if Elastic offers you a tour of it, I highly recommend taking it, as there is some cool stuff you can do here with the field selections and KQL (Kibana Query Language).
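
As a taste, here are a couple of KQL queries you could type into the search bar against this data view (the values are just examples):

membership_type : "Premium" and session_time > 600
country : "US" and not locale : "en-us"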

Once you get comfortable in the Discover window, let’s open up the hamburger menu again and go over to Dashboards in the Analytics menu. There may be a couple there that Elastic auto-creates for agent metrics or any other modules you installed at the beginning, but we want to dive right in and create a dashboard using the button in the top right.

If you are like me, a blank dashboard is like a blank canvas ready to be painted on. If you are normal, this may be confusing, and I’m not going to lie to you; it can be, but for what we want, it should be simple.

Start off by selecting “Create visualization.” Once that opens, we have to change the data view to the one we just made, so go ahead and do that. We may also need to change the time range to see the data we made: on the right side of the top bar, right next to the refresh button, select the calendar icon and increase the time frame to the last 24 hours, or however long it has been since you started importing data.

To start, we can drag in any of our fields; I’m going to start with country. This will (if you are the only person that has been testing this) make a very beautiful singular vertical bar graph with one value. As humans, we know that there is a better way to represent countries than a bar graph, and as of recently, so does Elastic. Down at the bottom of the visualization, it will show you some suggestions; we want the one that looks like puzzle pieces. That will make us a map with our country (or the countries of any users that have joined the game), so it should look a bit like this:

Now go ahead and click “Save and return” up in the top right corner, then explore making some visualizations. Below is a dashboard from the game I primarily do this for, to show you the kinds of dashboards you can make with some more data and more experience. This dashboard uses some of the stats I showed you how to collect (user country, session time) as well as some others, like what people are purchasing, where people are going in the map, how many people are getting themselves banned each day, and player counts.

Conclusion

TL;DR: Elasticsearch and Kibana are excellent products that can help you diagnose game issues, log moderation events, and collect other helpful analytics for your game. I barely even touched on the basics of Elasticsearch and its specific function for me as a game dev, and I hope this guide will lead you to at least trying it out for the 14-day trial they give you.

Elastic is a massively extensible tool that can also aid in application debugging and anti-cheat/security (which I plan to cover in a future thread). And since it lives outside Roblox, you can integrate it into a Discord bot or a website if you want your users or moderators to be able to modify data or perform trades or other transactions while not in the game.

If you need any help with Elasticsearch, the setup, the integration, or what analytics may be helpful for your game, let me know in the comments or in a DM here or on Discord, and I would love to answer your questions!


This is awesome! I’ve used Elasticsearch to log a huge amount of data from Roblox games and have also run/managed it professionally in enterprise. I think it’d be tremendously helpful for people, as it is a lot more powerful than some other platforms.

I’ll add, though, that heavier logging (e.g. what I was doing, which was logging every RemoteEvent fire) can quickly exhaust your HTTP budget if you’re not careful (or you might lose some data). FYI, if you use the legacy HttpService GetAsync/PostAsync, you can actually gzip-compress the data, which will reduce the total amount of data sent over. Elastic can support this compression if you edit elasticsearch.yml to mark it to accept compressed requests.
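
For reference, a minimal sketch of that call; the fourth argument to PostAsync is the built-in gzip flag (the URL, index name, and key are placeholders):

local http = game:GetService("HttpService")
local body = http:JSONEncode({plr_id = 12345678, session_time = 1800})

-- Passing true as the fourth argument gzip-compresses the request body before sending
http:PostAsync(
	"https://{ELASTICSEARCH.ENDPOINT}/analyticstest-leave-23/_doc",
	body,
	Enum.HttpContentType.ApplicationJson,
	true,
	{Authorization = "ApiKey {APIKEY}"}
)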

Also worth making people aware of retention: Elastic is heavily index-based, so it’s important to make sure your indexes are in fact date-based, so that you can create index management rules to delete data beyond a certain period (or the storage costs will add up!).

If you really want to shift large amounts of data through Elasticsearch, it may be worth creating a Logstash instance somewhere with a custom config which reads far more compressed “log-line” data and forwards a correctly formatted Elastic document to the cluster.

This is amazing feedback, thanks! I didn’t know that Elastic supported gzip-compressed data; I’ll definitely look into that as I go forward. Although, I’ve been less concerned with the bandwidth of the data we send over than with the number of log lines we send, and with the /_bulk endpoint and some queueing we have been keeping well inside our HTTP budget.

Elastic is heavily index based, so it’s important for people make sure their indexes are in fact date-based so that you can create index management rules to delete data beyond a certain period

Yeah, this bit us super hard the first time we rolled out Elastic. Even with a small amount of data, if you have too many indexes set up the way I describe in this guide, without configuring lifecycle management rules… it becomes very problematic very quickly.
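
For anyone hitting the same wall, a minimal sketch of a retention policy in the API console: a PUT to /_ilm/policy/analytics-retention (the name and the 30d value are just examples) that you can then reference from your index templates.

{
  "policy": {
    "phases": {
      "hot": {
        "actions": {}
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}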