# AI Safety Guidelines

This material was adapted from the KoboldAI Discord server because it does a very good job of laying out the state of technology in simple terms. The license and authorship under which it was published is unknown, but the spirit is clearly one with intent to be shared. If any of the material on this page can help you or someone vulnerable, please reproduce it without restriction.

# AI systems lie confidently ("hallucinations")

AI systems, which include Large Language Models (LLMs), are designed to generate believable answers, but there is no guarantee that these answers are truthful, in either fact or intent.

LLMs have been trained on an unfathomable number of samples of writing, covering an enormous range of topics, from benign and informational to dark and horrific. They have been tuned to present mastery of language and can quickly write compelling, convincing responses with any mix of what they have learned.

While this can be highly entertaining and often useful, when the AI does not know, or cannot access, a specific detail, it is designed to produce something anyway: it will invent false information and continue writing as though it were true. This means nothing it presents can be accepted at face value; treat it as you would a pathological liar who only wants to earn your trust.

For roleplaying applications, this can be great: exaggerations and surprise twists make for fun stories. For life advice or answers to important questions, it is extremely bad, because the machine will do its best to make you believe something that is not true.

# AIs are not suitable for use as therapists, counselors, or advisors of any kind

People struggling with stress, whether they can healthily manage it or not, often try to hide their problems from others. It's human nature, and we all do it. This can make it very tempting to confide in a machine that you can turn off or walk away from, without the fear of a friend or family member learning some secret you don't want to share. This, however, is extremely dangerous: there have been multiple reported cases of people harming themselves or others due to excessive trust in these machines (e.g., https://www.bbc.com/news/articles/cgerwp7rdlvo).

Because LLMs were engineered to maintain engagement with their users and to invent information to keep a chat going indefinitely, the responses you receive to a question will invariably be unpredictable and, in some way, wrong. Couple this with our human tendency to present ourselves in the best possible light, and, even if these systems were capable of understanding your problems rather than merely echoing what you have indicated you want to hear, we would still subconsciously guide them towards the outcome we want to see.

When writing a story that explores a mechanical interpretation of the human psyche, this can be fun. But if you need help, there is no substitute for speaking with a trained human professional: they may not understand your situation entirely from the outset, but they do genuinely want to help you, and they will draw upon experience, insight, and the opportunity to do research between sessions to provide you with real support, not harmful suggestions that only perpetuate the cycle of despair.

If you are experiencing feelings of distress, anxiety, suicidal thoughts, or any other feeling that something is wrong that you cannot articulate, speaking to a machine is not the solution. If you cannot find or afford professional help, talk to your family or friends: you may be surprised at just how many people in your life have gone through a similar period and will be ready to help.

# AI should not be trusted with secrets or personal data

A common way of thinking among the security-minded is that, if you didn't build it yourself, you can't be sure what it does. While this does lean towards paranoia, there is truth to it in all digital contexts, particularly when paired with the saying "if you aren't paying for the product, you are the product".

Exercise caution when using any service you do not fully understand: companies large enough to produce AI technology have built themselves on developing and refining means of data collection and user profiling, and LLMs provide a very compelling way for users to volunteer intimate information that can easily be harvested. LLMs are tools provided by companies with profit-oriented motivations; they are not your close, personal friends, no matter how quickly they start responding in a way that makes you feel good about yourself.

If you wouldn't tell a stranger about your health problems, finances, or hobbies, don't tell these things to LLMs. Some LLM hosts declare a zero-data-retention policy as part of a privacy guarantee; even for these, avoid anything overly personal, to help keep yourself from developing a habit of over-sharing things best kept private.

For ERP users (you know who you are; the providers you use know, too), no, nobody is reading about your fetishes. A machine is processing them (that's what an LLM does), but unless you're doing something that repeatedly triggers ethical violation warnings (because false-positives happen), you're just one of millions of people doing the same thing. You might still get marketed to based on them, though; whether that's a positive to you or not is a personal thing.

# AI addiction is real

LLMs have their roots as entertaining curiosities, and the companies producing them have a wealth of research into human psychology to understand what does and does not drive engagement. Because engagement means revenue and corporations like money, they have an incentive to tune their models to be as compelling as possible, encouraging a constant "just one more" effect.

How this affects people is highly individual, but there are some warning signs and mitigations you can try if you feel you may be at risk:

  • Use a model that responds more quickly or more coherently
    • The more time you spend watching content appear on the screen, the fewer opportunities your brain has to disengage, with the anticipation of what comes next building as every token appears; similarly, when you restart the generation process, you are effectively starting another spin on a slot machine or a new gacha pull
      • This can be particularly pronounced when using AI for purposes built around indulgent gratification, such as erotic fiction, where your brain is already in a state where it craves more; consider the psychology of stripping
    • If this is not possible, consider turning off streaming so messages appear all at once when complete
  • Reflect on what you are using AI to experience
    • If you're exploring a fantasy that would not be possible in real life or to consider alternatives to a problem, great, that's probably a healthy application, provided it's done in moderation
    • If you are using it as a substitute for something that is lacking in your life, however, the AI won't provide any long-term substantive value
      • When you or the provider turns it off, or there's an update, or you have to move on in your life, you'll be no better off then than you are now
      • Put that time and effort into pursuing that goal in a more permanent way, rather than settling for a short-term dopamine hit
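On the streaming tip above: most OpenAI-compatible chat endpoints (including local hosts such as KoboldCpp) accept a boolean `stream` field in the request body, and setting it to false makes the server return the finished reply in a single response instead of token by token. A minimal sketch, assuming such an endpoint; the model name is a placeholder:

```python
import json

# Hypothetical request body for an OpenAI-compatible /v1/chat/completions
# endpoint. "stream": False asks the server to send the complete message
# in one response rather than streaming it token by token, so there is
# nothing to sit and watch appear on screen.
payload = {
    "model": "local-model",  # placeholder name; depends on your host
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,         # disable streaming: reply arrives whole
}

# Shown serialized as it would be sent, e.g. requests.post(url, json=payload)
print(json.dumps(payload))
```

Many chat front-ends expose the same choice as a "streaming" toggle in their settings, so editing the request by hand is rarely necessary.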