StringDiff - String Similarity Metric Diffing Algorithm

You can use my StringDiff module that uses my very own diffing algorithm to find the similarity metric / ratio between two sentences.

Works within the confines of Roblox without making external API (HttpService) calls.

If you have the sentences:

  • “Can I have a cheeseburger?”
  • “Can you give me a cheeseburger please?”

It will print a similarity ratio of ~42.655… between those sentences and optionally return the longest match that it got from that:

image

This algorithm has a few use cases, 1 that I can think of right now is for primitive chat bots. If your dataset is small and you have a lookup table that looks something like this:

local commands = {
    ['Hey.'] = "Hey!!",
    ['Hi.'] = "Oh hi!",
    ['Hello.'] = "Hello there.",
    ["How's it going?"] = "Hello there.",
    ['How are you.'] = "I am good.",
    ['Nice to meet you.'] = "It's very nice to meet you as well.",
}

It doesn’t matter if you write “Hey it’s so nice to meet you”, it’s going to response to you correctly with “It’s very nice to meet you as well.”.

The way this would work (I’ve left out the code for this) is to iterate over every key and calculate the similarity ratio between your original input towards each key in the lookup table. This however becomes very performance heavy with large datasets due to the complexity of O(n), so a proper trained language model neural network is advised for that.

If you don’t want to use a language model, you can alternatively use a stemmer library, tokenize the sentence and create lexemes as lookup keys with larger datasets (“Full-Text Search” in database querying works like this and I’m currently working on that as my next project within Roblox), then iterate and apply the ratio calculation on the narrower return queryset to find out a more specific response.

This could be very useful in for example Roblox cafés or restaurants if you want automated order systems if you only have say 10 to 30 orders.

The code is open sourced on github:
splinestein/splinestein-diffing-algorithm: A string similarity metric diffing algorithm invented by splinestein for chat bot use. (github.com)

Here’s a link to the module:
StringDiff - Roblox

How do you use it?

  1. Put the module into ReplicatedStorage.
  2. In your script, require it with: local sdiff = require(game:GetService("ReplicatedStorage"):FindFirstChild("StringDiff"))
  3. Run it with ratio, _ = sdiff.compare("Hey is this working?, "Hey this is working?")
  4. First return value is the ratio from 0 - 100, second optional return value is the longest match.
  5. print(ratio)
9 Likes

I was looking for a way to do this a while back, very neat!

1 Like

I’m nowhere near the level of understanding AI and neural networks, but I don’t think this is the way to go about it. Hardcoding input messages and responses won’t work in the long run, even if it’s just for the conversation starter.

Maybe a topic like this could help? Maybe.
It’s something to do with generating text with AI.

Not in the long run no with large datasets but it is definitely a viable choice if your dataset is small for primitive chat bot use. If you apply FTS to the lookup table for smarter querying it absolutely becomes viable even with larger lookup tables. If you want a real chat bot that can ‘hallucinate’ humanlike responses you would need a language model neural network yes so for that type of use-cases it isn’t viable.

My library works as intended by solely finding the similarity metric between two sentence and for what it is worth, it works better than other diffing algorithms I’ve had the privilege to get my hands on in the confines of Roblox. Perhaps it is a bit misleading but the only use case I could think right out of the blue for this was using it for primitive chat bots. An AI algorithm is definitely the way to go in the long run if your dataset is larger but you need to train it and use a proper language model for that, let alone doing something of that magnitude within Roblox is quite the task (If you don’t want to make API calls). This solution is the way to go if you only have a handful of chat commands, perfect for small cafes with say 10 to 30 menu orders and you want to use it within the confines of Roblox without making requests to an external API.

I left out the part where I show how this could be built to work as a primitive chat bot. You need to iterate over every key in the lookup table and calculate the similarity metric between both of those inputs. This becomes quite performance heavy due to the complexity of O(n) with large lookup tables which is why as my next project right now I am building a stemmer library with tokenization of the sentence, then creating lexemes based on what the commands are for lookup much like how FTS (Full text search) in database querying works.

After you have those lexemes (once I actually finish my next project) you can then further proceed to use my module to find the closest result within that smaller return dataset since it’ll again be like 10 to 30 results depending on how large your queryset is.

I will edit the thread to explain this.