Realistic AI Player Bot Character With Vision (Bring Your Own API Key)

Ok, so first off I'll give an introduction:

This chatbot I have created can speak like a normal Roblox player. I have put in effort so it sounds less like the refined ChatGPT and more like some random kid on their iPad.

This is so players can have a more natural conversation with your bot, e.g. this:

[Screenshot: example conversation with the bot]

The vision is based on invisible parts in the workspace (you can add visuals) and distance in meters (roughly). The example place (once you add an API key) is a basic escape room. I provide a sign whose text changes and is fed to the AI, so it can tell when it has done things like pick up a key without needing extra inventory tokens in the context.
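
For reference, here is a minimal sketch of what a part-plus-distance vision check like this can look like; the tag name, range, and function names are illustrative assumptions, not the actual scripts from the place:

-- Hypothetical sketch of a distance-based "vision" check.
local CollectionService = game:GetService("CollectionService")

local VISION_RANGE = 40 -- studs; roughly meters

-- Invisible sensor parts are tagged "VisionPoint" in the workspace (assumed tag).
local function getVisiblePoints(npcRoot: BasePart)
	local visible = {}
	for _, point in CollectionService:GetTagged("VisionPoint") do
		local dist = (point.Position - npcRoot.Position).Magnitude
		if dist <= VISION_RANGE then
			-- Describe the point and its rough distance for the prompt.
			table.insert(visible, ("%s (%.0fm away)"):format(point.Name, dist))
		end
	end
	return visible
end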

On the note of tokens, I want to mention that this project is also designed to optimize token usage (tokens are the chunks of text the API bills for, roughly a few letters of context each) so you are not paying as much to run it. This takes away advanced reasoning capability but provides a more usable solution. The chat history is also clipped to a certain number of messages so your API key doesn't get bricked.
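
As an illustration, clipping the history can be as simple as dropping the oldest entries before each request (the table name and the limit of 12 here are assumptions, not the real implementation):

-- Sketch of clipping chat history to a fixed window before sending it to the API.
local MAX_MESSAGES = 12

local function clipHistory(history)
	while #history > MAX_MESSAGES do
		table.remove(history, 1) -- drop the oldest message first
	end
	return history
end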

The AI can have conversations with multiple bots at once, unlike any solution I have seen, and it can move around using the /goto command to points in its vision.
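
Here is a hedged sketch of how a /goto reply could be parsed and executed; all names are illustrative, and the real place resolves targets from the bot's vision list:

-- Hypothetical handler for a /goto command found in the model's reply.
local function handleGoto(npc: Model, reply: string, visionPoints: {BasePart})
	local target = reply:match("/goto%s+(%S+)")
	if not target then return end
	for _, point in visionPoints do
		if point.Name == target then
			local humanoid = npc:FindFirstChildOfClass("Humanoid")
			if humanoid then
				humanoid:MoveTo(point.Position) -- walk to the named vision point
			end
			break
		end
	end
end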

One main downside is that it uses slang WAY too much. This can easily be fixed by changing the prompt's examples of how a Roblox player should talk.

I do apologize if the code is messy. I made it for personal use, but changed my mind about keeping it private when I saw what others were doing.

The vision also allows for separate rooms and sees players as well as other NPCs, but bear in mind not to rotate the areas, as I only implemented a position-based check.
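
To make that limitation concrete, a position-based check like the sketch below compares world coordinates against an axis-aligned box, so rotating a room part makes the box stop matching the actual room (names are illustrative):

-- Axis-aligned room containment check; ignores the part's orientation,
-- which is why rotated areas break it.
local function isInsideRoom(position: Vector3, roomPart: BasePart): boolean
	local center, size = roomPart.Position, roomPart.Size
	return math.abs(position.X - center.X) <= size.X / 2
		and math.abs(position.Y - center.Y) <= size.Y / 2
		and math.abs(position.Z - center.Z) <= size.Z / 2
end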

Thanks for reading and I hope you have fun with it - the scripts are in ServerScriptService.
Uncopylocked place: AI Thing (Bring your own key) - Roblox

I would like to credit this post for inspiring me:

I kind of had an idea to do this but never acted on it; once I saw this post I knew I wanted to.

Secondly, I would like to credit this post for making me release this. (If I had not seen it, this whole concept would have rotted in my Roblox account for months.) When I saw it I knew my tech was going to get surpassed sooner or later, so I decided to release it:

Lastly, if you enjoy it, I would appreciate it if you checked out my game Spell Wars (🔮 Spell Wars [BETA] - Roblox) on my main account (@Enovinx). I am fine going without credit in your game for the AI character, but this game had a lot of scripting effort put into it, so I would appreciate it if you checked it out.

Oh, and I forgot: enable HTTP Requests in the Security tab of Game Settings.


I really like this project! I have a deployed version of the Awareness at this link.

I still use it to power my chatbot's text vision in my game. I have an updated version that is much improved, and I have also developed this decision-making algorithm, which I will be publishing in the next update; I've just been very busy and no one posted any feedback.
I'm currently using the decision-making algorithm to give intelligent behavior to the NPCs.

function aware.npcbehavior()
	-- Run on a cadence: every 9th call, and only if a pass isn't already running.
	if runningsimulation == false then
		everyother += 1
	end
	if everyother == 9 and runningsimulation == false then
		local cameraViewportSize = camera.ViewportSize
		runningsimulation = true
		everyother = 0

		local player = game.Players.LocalPlayer
		local processor = player.PlayerGui.Chatbot.LocalProcessor
		local data = {}
		if entities > 78 then -- reset the position cache once it grows too large
			npcpositions = {}
			entities = 0
		end
		local array = aware.Gettable("npcs")
		for index, npcobject in array do -- collect on-screen NPCs that have moved since the last pass
			if npcobject.PrimaryPart and npcobject:IsDescendantOf(workspace.NPCS) and IsInView(npcobject.PrimaryPart, cameraViewportSize) then
				local name = npcobject.Humanoid.DisplayName
				if npcpositions[name] == nil then
					entities += 1
					npcpositions[name] = {Position = Vector3.new(0, 0, 0)}
				end
				local current = snapToGrid(npcobject.PrimaryPart.Position)
				-- Compare the cached position, not the wrapper table (the original compared
				-- the table itself to a Vector3, which is never equal).
				if npcpositions[name].Position ~= current then
					npcpositions[name] = {Position = current}
					data[index] = {object = npcobject, primary = npcobject.PrimaryPart}
				end
			end
		end
		task.spawn(function()
			for index, object in data do -- observe each moved NPC and act on the result
				local npcobject = object.object
				task.wait(.3)
				if object.primary.Parent ~= nil then
					local thoughts, classesmap = aware.data_model(object.primary, nil, 240)
					if classesmap then
						task.spawn(function()
							for i, v in classesmap do
								local message = thoughts[i]
								processor:Invoke("displayString", {nil, message, npcobject})
								if isLocal and v and v.Classification then
									if keytable[v.Classification] then
										message ..= keytable[v.Classification]
									end
									task.wait(math.random(100, 700) * .01)
									if object.primary.Parent ~= nil then
										local action = processor:Invoke("ProcessCommand", {message, player, npcobject})
										if action then
											processor:Invoke("displayString", {nil, action, npcobject})
											task.wait(math.random(100, 700) * .01)
										end
									end
								end
							end
						end)
					end
				end
			end
			runningsimulation = false
		end)
	end
end

If you want to read the decision-making algorithm, here it is. It's kind of complicated, but if you paste it into an AI it will tell you what it is doing.

function aware.data_model(root, data_structure, distance)
	if distance == nil then distance = 200 end
	if not root then return {""} end
	local pos = root.Position
	local humanoid = root.Parent.Humanoid
	if data_structure == true then
		print("Data Model is optimized with previous observation data.")
	end

	-- Score every feature category by proximity, count, size, and its activation function.
	local matrixsum = {}
	for i, v in features do
		local closest, num, size, dist, arr
		if data_structure == true then
			local data = world_data[i]
			closest = data.closest
			num = data.num
			size = data.size or Vector3.new(1, 1, 1)
			dist = data.dist
			arr = data.arr
		else
			closest, num, size, dist, arr = aware.near[i](root, distance, "array")
		end
		if not arr then -- fall back to a fresh query if the cached data had no array
			closest, num, size, dist, arr = aware.near[i](root, distance, "array")
		end
		if closest then
			-- Reward objects that are close, numerous, large, and activated.
			if dist then
				features[i].matrix[2] = distance / dist -- the closest scores highest by far
			else
				features[i].matrix[2] = 0 -- no reward
			end
			features[i].matrix[3] = num -- influence a little
			size = size or Vector3.new(1, 1, 1)
			features[i].matrix[4] = (size.X + size.Y + size.Z) / 3 -- average dimension (the original parenthesization divided only Z by 3)
			if features[i].active then
				features[i].matrix[5] = features[i].active(closest, num, size, dist, arr)
			else
				features[i].matrix[5] = 0
			end
			matrixsum[i] = sumMatrix(features[i].matrix) / #features[i].matrix
			resultcache[i] = {
				close = closest, size = size, array = arr, array_length = num,
				matrix_sum = matrixsum[i], Classification = i,
			}
		end
	end

	local result = {}
	for i, v in matrixsum do
		table.insert(result, {key = i, matrix_sum = v})
	end
	table.sort(result, function(a, b) return a.matrix_sum > b.matrix_sum end)

	-- Group each feature's nearby objects by direction and rescore them per object.
	local directional = {}
	for i, v in result do
		if resultcache[v.key] and resultcache[v.key].array then
			for key, closeFurniture in resultcache[v.key].array do
				local identifier = aware.judge.object(closeFurniture.maininstance)
				if identifier ~= "" then
					local dir_id = groupid(root, pos, closeFurniture)
					if directional[dir_id] == nil then
						directional[dir_id] = {}
					end
					closeFurniture["Class"] = features[v.key].Class
					closeFurniture["Key"] = v.key
					closeFurniture["Name"] = identifier
					closeFurniture["Classification"] = resultcache[v.key].Classification
					features[v.key].matrix[2] = distance / closeFurniture.distance -- the closest scores highest by far
					features[v.key].matrix[3] = 0 -- influence a little
					features[v.key].matrix[4] = (closeFurniture.size.X + closeFurniture.size.Y + closeFurniture.size.Z) / 3
					if features[v.key].activate then
						features[v.key].matrix[5] = features[v.key].activate(closeFurniture.instance, 1, closeFurniture.size, closeFurniture.distance, root)
					else
						features[v.key].matrix[5] = 0
					end
					matrixsum[v.key] = sumMatrix(features[v.key].matrix)
					table.insert(directional[dir_id], closeFurniture)
				end
			end
		end
	end

	-- Re-rank the features with the per-object scores.
	local result = {}
	for i, v in matrixsum do
		table.insert(result, {key = i, matrix_sum = v})
	end
	table.sort(result, function(a, b) return a.matrix_sum > b.matrix_sum end)

	-- Score each direction by a weighted average distance (smaller is better).
	local average = {}
	for dir_id, items in pairs(directional) do
		local classes2 = {}
		local classmult = 0
		local totalDistance = 0
		for _, item in ipairs(items) do
			-- Divide distance by the feature's score so important objects count as "closer".
			totalDistance = totalDistance + (item.distance / matrixsum[item.Key])
			if classes2[item.Class] == nil then
				classmult += 1
				classes2[item.Class] = true
			end
		end
		local avgDistance = totalDistance / math.floor(#items * 1.2) / (math.max(1, classmult / 3))
		table.insert(average, {dir_id = dir_id, avgDistance = avgDistance})
	end
	table.sort(average, function(a, b) return a.avgDistance < b.avgDistance end)

	-- Pick one representative object per class, starting from the best direction.
	local Classify = {}
	local v = average[1]
	local numofclass = 0
	if v and directional[v.dir_id] then
		for i, v5 in average do
			if v5 and directional[v5.dir_id] then
				for t, obj in directional[v5.dir_id] do
					if Classify[obj.Class] == nil then
						numofclass += 1
						Classify[obj.Class] = obj
					end
				end
			end
			if v.avgDistance > 20 then -- if even the best direction averages far away, only use it
				break
			end
		end

		-- Order the chosen representatives by their matrix score.
		local classesmap = {}
		for t, o in result do
			for i, v in Classify do
				if o.key == v.Key then
					v.matrix_sum = o.matrix_sum
					table.insert(classesmap, v)
				end
			end
		end
		table.sort(classesmap, function(a, b) return a.matrix_sum > b.matrix_sum end)

		-- Build the descriptive text from phrase templates.
		local outputtext = {}
		local classindex = 0
		local str = phraselib.opener[aware.mathrandom(1, #phraselib.opener)] .. " an interesting area " .. v.dir_id .. ". "
		local support = nil
		local Settings = {Distance = true, " that is "}
		local interestingobjects = {}
		for i, v in classesmap do
			local feat = features[v.Key]
			if feat then
				local start = feat.context[aware.keyrandom(1, #feat.context, v.Key)]
				local ending = ""
				if feat.ending then
					ending = feat.ending[aware.keyrandom(1, #feat.ending, v.Key .. "end")] or ""
				end
				-- Mention distance only for the first/last item and notable classes.
				if i == 1 or i == #classesmap or v.Class == "loot" or v.Class == "mapobj" then
					Settings = {Distance = true, " that is "}
				else
					Settings = {Distance = false, ""}
				end
				local objid = aware.description[v.Key](root, pos, v.maininstance, v.distance, 1, nil, Settings[2], Settings)
				if objid and objid ~= "" then
					local construct = start .. objid .. ending
					table.insert(outputtext, construct .. ". ")
					classindex += 1
				else
					table.remove(classesmap, i)
					classindex += 1
				end

				-- If the previous feature "supports" this class, swap in its support phrases.
				if support and support.Support ~= nil then
					local function supports()
						for i, v in support.Support do
							if feat.Class == v then
								return v
							end
						end
						if support.Support2 ~= nil then
							for i, v in support.Support2 do
								if feat.Class == v then
									return nil, v
								end
							end
						end
						return nil
					end
					local supporter, supporter2 = supports()
					if supporter then
						start = support.SupportContext.context[aware.keyrandom(1, #support.SupportContext.context, v.Key .. "support")]
						if support.SupportContext.ending then
							ending = support.SupportContext.ending[aware.keyrandom(1, #support.SupportContext.ending, v.Key .. "supend")] or ""
						else
							ending = ""
						end
					elseif supporter2 then
						start = support.SupportContext2.context[aware.keyrandom(1, #support.SupportContext2.context, v.Key .. "support2")]
						if support.SupportContext2.ending then
							ending = support.SupportContext2.ending[aware.keyrandom(1, #support.SupportContext2.ending, v.Key .. "supend2")] or ""
						else
							ending = ""
						end
					end
					local endingconcat = Classes[feat.Class].judge
					if endingconcat then
						local result = endingconcat(v.maininstance)
						if result and result ~= "" then
							ending = ending .. ". " .. result
						end
					end
				else
					local endingconcat = Classes[feat.Class].judge
					if endingconcat then
						local result = endingconcat(v.maininstance)
						if result then
							ending = ending .. result
						end
					end
				end
				if objid and objid ~= "" then
					-- Both branches of the original classindex check appended the same text.
					str ..= start .. objid .. ending .. ". "
				end
				table.insert(interestingobjects, v)
				support = feat
			end
		end
		if #outputtext > 1 then
			table.insert(outputtext, str)
		end
		return outputtext, interestingobjects
	else
		return {""}
	end
end

Function Overview

The aware.data_model function is a complex Luau function that is part of a larger system for generating descriptive text about a 3D environment, likely in a game or simulation. The function takes three inputs:

  • root: the part whose surroundings are described (e.g. an NPC's PrimaryPart, as in aware.npcbehavior above)
  • data_structure: a boolean indicating whether to use a pre-existing data structure or not
  • distance: an optional parameter specifying the maximum distance to consider when generating the descriptive text (defaults to 200)

Function Flow

The function can be broken down into several main sections:

Initialization

The function first checks if the root object is valid and returns a table containing a single empty string if it's not. It then initializes several variables, including pos (the position of the root object), humanoid (the humanoid associated with the root object), and matrixsum (an empty table to store matrix sums).

Data Structure Setup

If data_structure is true, the function reads each feature's nearby-object data from a pre-existing data structure (world_data). Otherwise, it calls the aware.near function to generate the data.

Feature Processing

The function then iterates over a set of features (stored in the features table) and performs the following steps for each feature:

  1. Retrieves the closest object, number of objects, size, distance, and array of objects for the current feature using the aware.near function or the pre-existing data structure.
  2. Updates the feature’s matrix with the retrieved values.
  3. If the feature has an active function, calls it with the closest object, number of objects, size, distance, and array of objects.
  4. Calculates the matrix sum for the feature and stores it in the matrixsum table.

Directional Processing

The function then groups the nearby objects by direction, which involves:

  1. Creating a table (directional) keyed by direction group.
  2. Iterating over the features and assigning each of their nearby objects to a direction group.
  3. For each direction, calculating a weighted average distance over its objects.
  4. Sorting the direction groups by average distance.

Classification

The function then performs classification on the directional data, which involves:

  1. Creating a table (Classify) that stores one representative object per class.
  2. Iterating over the directions, sorted by average distance, and taking the first object encountered for each class.
  3. Creating a table (classesmap) that orders these representatives by their matrix score.

Text Generation

The function then generates descriptive text for each classified feature, which involves:

  1. Creating a table (outputtext) to store the generated text.
  2. Iterating over the classified features and generating text for each one using a set of predefined phrases and templates.
  3. Combining the generated text into a single string.

Return Values

The function returns two values:

  1. The generated descriptive text as a table of strings.
  2. A table of interesting objects found in the environment.
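
Putting it together, a call site consistent with aware.npcbehavior above would look roughly like this (the guard and printed output are illustrative):

-- Illustrative call, mirroring how aware.npcbehavior uses it above.
local thoughts, interesting = aware.data_model(npc.PrimaryPart, nil, 240)
if interesting then
	for i, obj in interesting do
		-- The printed values here are invented placeholders.
		print(obj.Classification, thoughts[i])
	end
end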

In short, I'm trying to weigh things like the importance of the current amount of hit-points, distance, and direction to make a textual plan, by classifying direction along with distance as well as the categorical features, which capture what is unique to each category. I was mostly inspired by this paper for the idea of this function:
Voyager: An Open-Ended Embodied Agent with Large Language Models
The specific component I'm referencing is 1) the automatic curriculum that maximizes exploration.

I'm also doing some tricks to offload processing of the NPC behavior to the client while executing the actions on the server. Processing the actions is not much work, since the data_model function can run in parallel Luau.
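
For anyone unfamiliar with parallel Luau, the pattern is roughly the following sketch; note that task.desynchronize only works in scripts parented under an Actor, and Instance writes must wait until the serial phase:

task.desynchronize() -- move this thread off the serial phase; no Instance writes allowed here
local thoughts, interesting = aware.data_model(root, nil, 240)
task.synchronize() -- return to the serial phase before touching Instances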


You'd put that into the context/system prompt, right? Is that not costly because of the increase in input tokens? I saw that you are using a Hugging Face key, but I'm using an OpenAI key and I have a limited budget. (I prefer OpenAI because I feel the keys take less time to respond, and I feel the GPT-4o series is cheaper and smarter than Mixtral-8e, even though Mixtral-8e gets a good benchmark on the lmsys leaderboard (lmsys is now lmarena).)

If it did not add so much to the context, I would probably integrate it.
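
For a rough sense of scale: with the common ~4 characters per token rule of thumb and an example price of $0.15 per million input tokens (both assumptions; check your model's pricing), a scene description adds roughly:

-- Back-of-the-envelope input-token cost for an injected scene description.
local promptChars = 2000 -- e.g. a scene description added to the system prompt
local approxTokens = promptChars / 4 -- ≈ 500 input tokens
local costPerCall = approxTokens / 1e6 * 0.15 -- example rate: $0.15 per 1M input tokens
print(costPerCall) -- ≈ $0.000075 per request; it adds up across NPCs and turns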

My primary objective for this project was to create a cheap solution for fun AI characters that could almost pass for actual Roblox players and solve basic problems with a basic movement and vision function.

I have seen the Voyager paper and was deeply inspired by it as well, but the limitation I find is that open-source models are quite inconsistent once the context grows, and closed-source models are expensive if you want to provide room for thought tokens (like o1, but with multi-shot prompting instead of pre-training).

I appreciate the work you have done around making aware LLMs, ranking the surroundings, and prioritising actions, but I feel it would be WAY better if Roblox had a proper screenshot/camera system and LLMs could actually judge Vector3 locations, or even if we could just compile scripts in real time into functions (which is what Voyager does).

Perhaps I will make a better version that features some of your work and costs me more, once I'm finished with my current scripting project.

Thanks for the feedback.