Giving LLMs Text Vision and Self-Awareness in Luau [OPEN SOURCE]

Here’s an interesting YouTube video discussing building conscious AI systems.

Yes, but it would be more performant to use a GPU-powered model such as BLIP instead of 50,000 if statements. You are burning a lot of CPU here on hand-picked objects, while BLIP could identify more forms and objects with less compute.
For the LLM itself I only see the OpenAI API as an option here, since the GPT infrastructure is built for parallel processing and can answer many people at the same time. Running Llama 2 on llama.cpp would require batching. Edit: Hugging Face can give you free hosting through Gradio, but you will get rate limited and banned if you run many instances.

Base64 does the job here.
Also, in each screenshot you sent I see “FPS: 15” (I can’t assume it’s caused by the AI).
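As far as I know Luau has no built-in base64 helper, so if you ever did have raw image bytes to send to an external captioning service, a minimal pure-Luau encoder would look something like the sketch below (the function name and sample string are just for illustration):

-- Minimal base64 encoder in plain Luau.
local ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

local function base64Encode(data)
	local out = {}
	for i = 1, #data, 3 do
		local a, b, c = data:byte(i, i + 2)
		local n = a * 65536 + (b or 0) * 256 + (c or 0)
		local chunk = {}
		for j = 1, 4 do
			local index = math.floor(n / 64 ^ (4 - j)) % 64
			chunk[j] = ALPHABET:sub(index + 1, index + 1)
		end
		-- Pad the output when the input length is not a multiple of 3.
		if not c then chunk[4] = "=" end
		if not b then chunk[3] = "=" end
		out[#out + 1] = table.concat(chunk)
	end
	return table.concat(out)
end

print(base64Encode("FPS: 15")) --> RlBTOiAxNQ==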

This forces people to use your procedural map generation algorithm so that the AI can recognize objects.

In conclusion, the code you posted in the first post of this thread is not even close to AI: it does not use natural language processing techniques and has no ability to do so without some external API. I am pretty sure there is no way to “screenshot” something in-game through Lua, get the bytes, base64-encode them, and stream them to some BLIP inference.
At the moment this module is basically useless, as it won’t understand anything you say in chat and will just output random strings based on the nearest objects.
There are also custom ways of doing something BLIP-like by constructing points in space, which doesn’t require taking a “screenshot” of the environment (see the sketch after this post). The problem might be textured objects, which won’t have a single color variable, but aside from that it would recognize objects and forms.

Edit: on the subject of performance, BLIP currently knows orders of magnitude more things than your small script does, while taking under a second to describe an image.
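To make the “points in space” idea concrete, here is a rough Luau sketch that samples the surroundings with raycasts and records the position, name and color of whatever each ray hits. The fan-shaped ray pattern and the origin part are assumptions for illustration, not anything from the module above.

-- Sample the surroundings of an NPC by casting rays in a horizontal fan,
-- building a crude "point cloud" of named, colored hit points.
local function samplePointsInSpace(origin, rayCount, range)
	local points = {}
	local params = RaycastParams.new()
	params.FilterDescendantsInstances = {origin.Parent} -- ignore the NPC's own character
	params.FilterType = Enum.RaycastFilterType.Exclude

	for i = 1, rayCount do
		local angle = (i / rayCount) * 2 * math.pi
		local direction = Vector3.new(math.cos(angle), 0, math.sin(angle)) * range
		local result = workspace:Raycast(origin.Position, direction, params)
		if result and not result.Instance:IsA("Terrain") then
			points[#points + 1] = {
				position = result.Position,
				name = result.Instance.Name,
				color = result.Instance.Color, -- textured parts may look nothing like this color
			}
		end
	end
	return points
end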

What I see here is nowhere close to any modern LLM technique; it’s just math.random choosing an answer.
This type of code has existed for decades and was never considered an “LLM”. Also, table.insert is relatively slow; you could use direct array indexing instead (see the short example at the end of this post).

Moral of the story: this is not an LLM (Large Language Model), since none of the modern techniques associated with that term are used here, and the title is wrong.
Edit: also, in most of your videos only one instance is running. One instance that has to iterate through this many big tables will lag phone players if run in a client script, or will lag the server if run there for 30 players at the same time.
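On the table.insert point, the usual alternative looks like this (a micro-example, not code from the module); in practice the difference only matters inside hot loops:

-- Library call:
local squares = {}
for i = 1, 10 do
	table.insert(squares, i * i)
end

-- Equivalent direct array indexing, which skips the function-call overhead:
local squares2 = {}
for i = 1, 10 do
	squares2[#squares2 + 1] = i * i
end

-- When the final size is known up front, preallocate and write by index:
local cubes = table.create(10)
for i = 1, 10 do
	cubes[i] = i * i * i
end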


You probably do not realize that there are people who play Roblox at around 15 FPS because of their hardware; that is enough to give them freezes.
I said that running this on the server is not even an option, since running 30 instances of it will eat a lot of server CPU, while running it on the client might cause issues on some people’s phones. Not everyone has a new-generation phone running Android 13 with a better CPU than some laptops.

My game runs incredibly smoothly; I’ve always paid strict attention to that, and I’ve always made modifications to reduce performance costs. I’ve already done multiple tests on mobile and it works great, and future improvements will be made. Just last night I improved my search-query function’s performance by 33%.

“My game”, perfectly said.
Some people struggle at under 20 FPS in games that only run a few scripts. You said this is a module, so it will presumably run alongside your chatbox script that I saw on the forum a few days ago; now add the rest of a game’s scripts on top of those two.
Eh, this seems fine for Roblox, but I don’t really see any in-game use or anything revolutionary, since you still need to put a lot of time into adding game-specific objects to the awareness algorithm. If your goal was recreating that YouTube clip posted in the first post, then yes, it’s fine.

Yes, it is much deeper than that. I haven’t updated that post in a while because I’ve been completing this new system, but I’m not releasing it because it’s too powerful and would make my game less competitive in the market. Also, all the game objects for the algorithm are already set up; I’ve just had the map generator disabled for testing convenience. The world generator is a multi-threaded system that does take a fair bit of processing from the perspective of a local machine, but I’ve done high-speed flying tests to validate its performance and made it run as efficiently as possible, with the desired results on a server.
[screenshot]

Yes, when I see this screenshot I can agree that this is a chatbot algorithm, but the title was not very clear; the “(Complete)” tricked me into thinking this was an LLM-related post. Yes, in some cases you could pass this output to an LLM, but it might give you fairly poor results. A better approach would be a multimodal LLM that can understand both images and text; that is where an awareness-powered AI NPC would be at its full capacity.

I’ve done a lot of testing, and it works great. My system is very elegant and creates very convincing strings. I did this to limit API usage so the bot can scale. It records all of the interactions it makes with the AI models.
But in regard to your idea for a visual model, you should understand how those work: they literally turn the picture into words.
So you can say “oh, that’s not AI because it’s not a machine learning function,” but it actually is an AI that I specifically designed. You can evaluate its form of ‘intelligence’ by interacting with it. It’s mainly designed to be an NPC companion for an MMORPG that produces educational and character-relevant outputs.
Utilizing servers to process everything is not very workable from a performance perspective, which is why I designed my bot to call the AI only when it needs to. It also utilizes zero-shot classification.
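On the zero-shot classification point, here is a hedged sketch of what calling a hosted zero-shot endpoint from a server script could look like. The Hugging Face Inference API URL, the bart-large-mnli model, the payload shape and the candidate labels are all assumptions for illustration and may not match the author’s setup or the current API.

local HttpService = game:GetService("HttpService")

local HF_TOKEN = "hf_..." -- placeholder; keep real tokens in server-side secrets

-- Classify a chat message against a few intent labels and return the best guess.
local function classifyIntent(message)
	local response = HttpService:RequestAsync({
		Url = "https://api-inference.huggingface.co/models/facebook/bart-large-mnli",
		Method = "POST",
		Headers = {
			["Content-Type"] = "application/json",
			["Authorization"] = "Bearer " .. HF_TOKEN,
		},
		Body = HttpService:JSONEncode({
			inputs = message,
			parameters = {candidate_labels = {"greeting", "question about the area", "combat request"}},
		}),
	})
	if response.Success then
		local decoded = HttpService:JSONDecode(response.Body)
		-- The API returns labels sorted by score, so the first label is the top prediction.
		return decoded.labels and decoded.labels[1] or nil
	end
	return nil
end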

Yeah, I’m pretty familiar with them, as I have fine-tuned a model for my own application to help my users with problems. The LLaVA model could power any awareness module out there, since it is a multimodal LLM.
The difference between an actual AI-powered model and a plain Lua script module is that the AI can work out what an object or person is doing inside the game faster than anything you would implement by hand in code.

A machine learning function can be used to learn how to act in an open world, but that is not text related. If you are going for text output, you can either (a) make an API call every time someone sends a query, or (b) have an architecture that handles most queries locally in a conversational context.

Cool idea. Instead of just awareness of objects, you could also try adding explanations for zones, since it’s an MMORPG: explain which mobs can spawn there, what cool items you could get from there, maybe ways to defeat a boss, and things like that.

Yes, I am utilizing the Zone module posted in Community Resources! All of the areas created by my world generator have randomly generated names based on their theme; for example, Greek/Roman themed areas are called things like “Island of Athens”, “Mountain of Sparta”, “Tower of Olympus”, etc. It’s a very interesting project! Overall, this architecture is like a base model, and my future model will be a subscription option for players to access GPT-4.

-- Look up the area name once instead of re-evaluating it for every template.
local areaName = aware.judgeobject(closestDungeon)
local strut = {
	"We are in a place called " .. areaName,
	"I believe we are in an area called " .. areaName,
	areaName .. "... I believe that's what this place is called",
	"This area is called " .. areaName,
	"If I recall correctly, this place is known as " .. areaName,
	"If my navigation skills are correct, this area is called " .. areaName,
	"This place is known as " .. areaName,
	areaName .. " is the name of this land.",
	"According to my map, this place is called " .. areaName,
	"I have heard that this place is called " .. areaName,
	areaName .. ", that's the name of this place.",
	"This location is called " .. areaName,
	"From what I know, this place is known as " .. areaName,
	"My compass tells me that this area is called " .. areaName,
	"I have learned that this place is called " .. areaName,
	areaName .. ", that's the name of this area.",
	"This spot is called " .. areaName,
	"Based on my information, this place is known as " .. areaName,
	"My guidebook says that this area is called " .. areaName,
	"I have been told that this place is called " .. areaName,
}

I have updated this module; it is now complete! I included all of the different strings, and it can provide thousands of different outputs!
The main update I did today, and the one that spurred me to share this, was my super-efficient table handler.
After I tried to integrate my procedural world generator, the ChildAdded listeners caused up to 1200 ms of ping, so here is a ‘lazy’ table handler that is very efficient. :slight_smile:

local ticktime = workspace.GlobalSpells.Tick -- value object incremented by the game's tick loop

-- Returns a snapshot of the children of the given container folder.
local function tablehandler(location)
	return location:GetChildren()
end

-- Cache registry: each key maps to { {datatable, writetime, refreshrate}, containerInstance }.
-- 1 tick = 0.6 seconds, so 100 ticks = 60 seconds = 1 minute.
local tablekeys = {
	["mapobject"] = {{["datatable"] = nil, ["writetime"] = ticktime.Value, ["refreshrate"] = 200}, workspace.MapObjects},
	["plants"] = {{["datatable"] = nil, ["writetime"] = ticktime.Value, ["refreshrate"] = 300}, workspace.Plants},
	["enemys"] = {{["datatable"] = nil, ["writetime"] = ticktime.Value, ["refreshrate"] = 100}, workspace.Enemys},
	["rubble"] = {{["datatable"] = nil, ["writetime"] = ticktime.Value, ["refreshrate"] = 200}, workspace.Rubble},
	["trees"] = {{["datatable"] = nil, ["writetime"] = ticktime.Value, ["refreshrate"] = 600}, workspace.Trees},
	["chests"] = {{["datatable"] = nil, ["writetime"] = ticktime.Value, ["refreshrate"] = 300}, workspace.Chests},
	["grounditems"] = {{["datatable"] = nil, ["writetime"] = ticktime.Value, ["refreshrate"] = 100}, workspace.GroundItems},
	["houses"] = {{["datatable"] = nil, ["writetime"] = ticktime.Value, ["refreshrate"] = 500}, workspace.Houses},
	["players"] = {{["datatable"] = nil, ["writetime"] = ticktime.Value, ["refreshrate"] = 100}, game.Players},
	["dungeons"] = {{["datatable"] = nil, ["writetime"] = ticktime.Value, ["refreshrate"] = 100}, workspace.AmbientAreas},
	["npcs"] = {{["datatable"] = nil, ["writetime"] = ticktime.Value, ["refreshrate"] = 100}, workspace.NPCS},
}
-- aware.Gettable(key): returns the cached array of children for a key, refreshing it lazily.

function aware.Gettable(Key)
	if tablekeys[Key] then
		local results, location = tablekeys[Key][1], tablekeys[Key][2]
		local timer = results["writetime"]
		local refresh = results["refreshrate"]
		local data = results["datatable"]
		-- Reuse the cached snapshot while it is still within its refresh window.
		if data ~= nil and ticktime.Value < timer + refresh then
			return data
		else
			-- Cache miss or stale entry: take a fresh snapshot and record when it was written.
			tablekeys[Key] = {{["datatable"] = tablehandler(location), ["writetime"] = ticktime.Value, ["refreshrate"] = refresh}, location}
			return tablekeys[Key][1]["datatable"]
		end
	else
		return nil
	end
end
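For context, a minimal usage sketch of the lazy cache above, assuming the aware module table, the Tick value and the workspace folders exist as in the snippet:

-- Fetch the cached enemy list; it is only re-read from the workspace
-- once its refresh window has expired.
local enemies = aware.Gettable("enemys")
if enemies then
	for _, enemy in ipairs(enemies) do
		print("Nearby enemy:", enemy.Name)
	end
end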

I don’t think it’s right to say you’re creating an A.I. if all you’re doing is converting a 3D area into text and prompting ChatGPT (or any other LLM, which you absolutely did not create). My point is that you are not creating the A.I. part of this whatsoever, and you should stop thinking you are. This module might be useful for having a bot answer questions in-game via ChatGPT, but to be fair, it’d be much easier and more comprehensible to hard-code responses. (To be honest, you already “hardcode” most of the responses anyway; an A.I. that can actually recognize objects based on their appearance would need a LOT of data.)

Though to be completely honest the idea of creating an “algorithm” which can describe a 3D area in text is very interesting.

Long story short it’d be good if you changed the name of this post as it is very misleading and ends up confusing a lot of people, this way you would avoid fruitless arguments about what you “actually meant” and “what this actually is”.


Regardless of your opinions on the subject, my original example was an AI, and it was doing something relatively interesting. The code example I shared can recognize who it’s talking to, and it has an emotional state based on an evaluation of each personality’s dataset; these work independently. I also provided my chat module to use in conjunction with this module so you can make use of the emotional output. I will reiterate: skip to the bottom of the open-source module I provided and you’ll see it can currently produce over 4 million different outputs, and that is before considering the names of the objects, which makes the number effectively unbounded. That is the module alone. Furthermore, these observations are used to run a database-specific search query: the winning observation entry is connected to a database entry related to that observation, and that output is combined with a third result from a search query over the wisdom and other databases.
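As a generic illustration only (this is not the author’s code), a keyword-scoring search query over a table of database entries might look like the sketch below; the entry table, scoring rule and responses are invented for the example.

-- Hypothetical database: each entry pairs keywords with a canned response.
local database = {
	{keywords = {"dungeon", "cave"}, response = "Dungeons hide treasure, but also dangerous mobs."},
	{keywords = {"chest", "treasure"}, response = "Chests can contain weapons, potions and gold."},
}

-- Score each entry by how many of its keywords appear in the query,
-- then return the response of the highest-scoring ("winning") entry.
local function searchQuery(query)
	local loweredQuery = string.lower(query)
	local bestScore, bestEntry = 0, nil
	for _, entry in ipairs(database) do
		local score = 0
		for _, keyword in ipairs(entry.keywords) do
			if string.find(loweredQuery, keyword, 1, true) then
				score += 1
			end
		end
		if score > bestScore then
			bestScore, bestEntry = score, entry
		end
	end
	return bestEntry and bestEntry.response or nil
end

print(searchQuery("Is there treasure in that chest?")) --> the chest entry wins with two keyword matches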

If you were to train an AI model on Roblox it would have to be much smaller, so training a language model locally wouldn’t work unless you had a lot of supporting systems in place. You can host your own fine-tuned model, but you would still incur server costs; this approach minimizes both server costs and API calls.

Also, logic is based on if/then rules, while a neural network trained by a machine learning algorithm uses a structure like x(or), which weights each condition according to its learned parameter in order to make generalizations, based on the stored weights of the matrix it builds during training. This is done by tokenizing the input/output through its predictive layers, which is what provides the solution to x(or). Saying this is not AI is beyond arrogant when nowhere is a coded machine learning algorithm defined as the only form of AI, since you do not always require an x(or) condition when the solution only needs logic. (A small worked example follows the list below.)

  • AI is the field of computer science that aims to create systems that can perform tasks that normally require human intelligence, such as learning, reasoning, problem-solving, etc.
  • Logic is a way of expressing and manipulating information using rules and symbols, such as if, or, and, then, etc.
  • Machine learning is a branch of AI that focuses on creating systems that can learn from data and improve their performance without explicit programming.
  • Neural networks are a type of machine learning model that can learn complex patterns and functions from data by adjusting the weights of their connections.
  • x(or) is a type of logic that assigns a weight to each condition and then combines them using the or operator; for example, x(or)(A, B) means A or B, but with a different weight for each condition.
  • Machine learning can be generalized to an x(or) condition by treating the weights of the neural network as the weights of the conditions, and the output of the neural network as the result of the x(or) operation. For example, if we have a neural network that takes two inputs A and B and outputs C, we can write C = x(or)(A, B), where the weights of A and B are determined by the network.
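As a concrete illustration of that last bullet, here is a minimal Luau sketch of the classic two-layer network that computes XOR. The weights are hard-coded by hand rather than learned, purely to show that XOR needs a hidden layer of weighted units rather than a single if/then rule.

-- Step activation: returns 1 when the weighted sum crosses zero.
local function step(x)
	return x >= 0 and 1 or 0
end

-- Hidden layer: an OR unit and an AND unit, each a weighted sum plus a bias.
-- Output layer: "OR but not AND", which is exactly XOR.
local function xorNet(a, b)
	local hOr  = step(1 * a + 1 * b - 0.5) -- fires when at least one input is 1
	local hAnd = step(1 * a + 1 * b - 1.5) -- fires only when both inputs are 1
	return step(1 * hOr - 1 * hAnd - 0.5)  -- fires when OR is on and AND is off
end

for _, pair in ipairs({{0, 0}, {0, 1}, {1, 0}, {1, 1}}) do
	print(pair[1], pair[2], "->", xorNet(pair[1], pair[2]))
end
--> 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0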

In conclusion, this module is about Creating Self-Aware AI: a (Super Complete) LLM Utility Text Vision Module (Open Source).


Uploaded a new demonstration of using this module to illustrate Candyland :smiley:

The Text Vision output is given to the LLM as part of its system message, giving it full immersion in the environment and ticking off one piece of being conscious and aware.
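For anyone wanting to try this wiring themselves, here is a hedged sketch of passing a text-vision description as the system message of an OpenAI chat completion from a Roblox server script. The model name, prompt text and key handling are assumptions, HttpService must be enabled for the game, and this is not the author’s implementation.

local HttpService = game:GetService("HttpService")

local OPENAI_KEY = "sk-..." -- placeholder; never ship a real key to clients

-- textVision is assumed to be the description string produced by the awareness module.
local function askNpc(textVision, playerMessage)
	local response = HttpService:RequestAsync({
		Url = "https://api.openai.com/v1/chat/completions",
		Method = "POST",
		Headers = {
			["Content-Type"] = "application/json",
			["Authorization"] = "Bearer " .. OPENAI_KEY,
		},
		Body = HttpService:JSONEncode({
			model = "gpt-3.5-turbo", -- assumed model name
			messages = {
				{role = "system", content = "You are an NPC companion. Your surroundings: " .. textVision},
				{role = "user", content = playerMessage},
			},
		}),
	})
	if response.Success then
		local decoded = HttpService:JSONDecode(response.Body)
		return decoded.choices[1].message.content
	end
	return nil
end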


I would not recommend calling it ChatModule, seeing as that’s quite similar to Roblox’s name for its chat system, and it also isn’t very descriptive of what this module does.
You could try naming it something that relates to describing surroundings.
(This is supposed to be constructive btw)
EDIT: If you want to get a little less hate for your project, I would avoid using the words ‘conscious’ and ‘aware’, seeing as AI cannot be conscious.
