I’m sorry I’m going to rant about large language model bots now
Instagram have started creating fake accounts that post entirely machine-generated images and text. They will reply to you if you comment under their posts or direct message them.
Update: Apparently the specific account I was referencing here is an experiment from 2023 and doesn’t post any more, though it still replies to direct messages. People thought it was new because of a recent announcement about bot accounts, and I didn’t dig into it myself. The rest of my frustrations with how people treat these things remain.
I mostly want to talk about stuff I like on here rather than add to the endless bitching about bad stuff, but I needed to write out some frustrations I have with how people talk about these kinds of large language model bots.
I have seen people posting their conversations with it, asking it various things about itself and how it was created, either to dig for information on it or just to create gotcha moments they can screenshot, and it makes me want to scream.
I won’t link to any of those for a few reasons, but the one I want to state is the same reason these screenshots make me want to tear my hair out: large language models are machines that make up nonsense constantly. The text they generate does not carry semantic content, least of all meaningful information about the model itself. It is not worth looking at the specific responses it gives. So: stop asking it about itself. It is not capable of knowing anything. It takes a processed set of training data and whatever input it has (the prompting that you are not able to see, plus whatever text you send it) and generates output that is a statistically likely continuation of that.
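If it helps to see the shape of that, here is a toy sketch in Python. It is nothing like the actual model Instagram runs: a word-pair frequency table built from a made-up three-sentence corpus, standing in for billions of learned parameters. But the principle is the same: the output is sampled from whatever tended to follow similar text in the training data, not looked up from any store of facts.

```python
import random
from collections import Counter, defaultdict

# Toy "language model": a table of which word tends to follow which,
# built from a tiny made-up corpus. A real LLM is unimaginably bigger,
# but the principle is the same: no store of facts, just statistically
# likely continuations of whatever text it is given.
corpus = (
    "the team that built the bot was mostly engineers . "
    "the team that built the bot was mostly white people . "
    "the capital of france is paris ."
).split()

follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def continuation(word: str, length: int = 8) -> str:
    out = []
    for _ in range(length):
        options = follows.get(word)
        if not options:
            break
        # Sample the next word in proportion to how often it followed
        # the current one in the corpus. That is all the "knowing" there is.
        word = random.choices(list(options), weights=list(options.values()))[0]
        out.append(word)
    return " ".join(out)

print(continuation("team"))  # plausible-sounding output, not a fact lookup
```

Run it a few times and you get different, equally confident continuations about who built the bot; none of them are the model telling you anything, they are just what the corpus made likely.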
People are asking it about the diversity of the team that created it and getting responses saying it was mostly white people. It doesn’t know that. Again: it doesn’t know anything. It is incapable of knowledge. But it will generate something that is a likely response to a question of this nature. The training data used to create it contains many texts about how American software development is very white, and it will produce a response in this vein.
People have asked it who led the project that created it; it spat out a name, and people are now trying to dig up information from the LinkedIn profile of a Facebook employee with a similar name.
People are asking it to repeat its prompt and then analysing the response to make judgements about the people who wrote it. This is at least asking for something that the model actually has as part of its input, so it might result in text that is a copy of, or close to, what it was actually prompted with. But it mightn’t. It could be anything. People asking bots to repeat their prompts is by now an established trope in online writing, writing that gets pulled into training data for these models, so its output to this type of question will be influenced by the presence of such texts in its training corpus as well.
There is no way to trick these bots into meaningfully divulging information, because facts are not a thing that exists to them. You are at best nudging the model in a direction where it is more likely to output text that happens to line up with reality. That works best for widely known and repeated information that appears all over its training data: if you ask it the capital of France it will probably say Paris. But there is no reason to think its training data contains any meaningful information about the bot itself, and even if it was prompted with information like that, what information it was given would be entirely under Instagram’s control, and why the fuck would you trust Instagram to be honest either?