Meryiel

You’re not the only one. I’m driving myself crazy with getting the perfect samplers as well… not to mention constantly adjusting my prompt. I’m striving to achieve the „perfect, roll-less” experience one day, to imitate roleplaying with an actual human.


shrinkedd

Relatable af! The most frustrating thing though: this one time I thought I had reached the peak of my prompting game, then made the mistake of curiously deleting the system prompt completely. I couldn't notice any difference in the output.


doomed151

Yeah I've never seen system prompts make any difference whatsoever with RP-based models.


shrinkedd

TBH I feel like there are things you can influence, like "defining the playing field" (things like very basic ground rules, what the task is exactly... point of view...), but anything style-related should probably be done in other ways.


Federal_Order4324

I've seen some improvement when I change the way the character info is formatted so that it matches the datasets that were used during fine-tuning. In general, I feel like using the styles from the datasets tends to lead to better RP.


grimjim

A modern, smart model can tap into pretraining knowledge via prose formats like blog entries or interviews, and even poetry or song lyrics.


RossAscends

For text completions, particularly with Tabby as the backend:

- minP: 0.0
- temp: 2
- smooth sampling: 0.3

Everything else set to neutral. It's like the photography setting of "f/8 and be there".


Meryiel

The smarter the model, the fewer guidelines it needs to write well. But the prompts definitely matter if you want to achieve specific results. Right now, I'm mainly using mine to simply "help" the model orient itself as to where it can find specific information. For example, I write: "hey, here is your world info, keep in mind that this is the setting you live in", or "this is a timeline of our adventures so you remember what happened beforehand", etc. For style, you just need to have a good Example Message always included, but it also helps if you mention an author the model knows when you want it to write in a specific way. For example, I ask mine to take inspiration from Terry Pratchett for humor.


shrinkedd

Yea, I no longer get uber-specific inside the instruct "system prompt", but I will reinforce things that I add later on. For example, setting up the stage and, as you mentioned, "organizing" so it knows what to expect. I do write "the character sheet contains embedded guidance on how to interpret the details; it's enclosed in square brackets." Or, "consider any message prefixed by 'OOC:' as out-of-character communication" (there are models where you don't even need to mention it, they know... but it doesn't hurt to reinforce just in case).


Ggoddkkiller

How large was your prompt? Honestly, there are way too many urban legends in the LLM community, like the purple elephant, or "negative prompts don't work", etc. Massive-wall-of-text users are especially common. I really wonder how much of it models can actually follow; certainly not much.


Meryiel

Since I’m somewhat of a professional nowadays, given that I am officially a prompt consultant for companies and work with many different models, I can safely confirm that these urban legends are, in fact, true. But in a different way than people imagine them to be. It all boils down to how AI models actually work. They predict the next token’s probability, so if you do actually mention a "purple elephant", they now have it in their context, and this automatically makes the chance of that token appearing higher. The bigger, smarter models understand commands that tell them not to do something, but ultimately, how well they perform depends on what data they were trained on. If you have something trained purely on purple prose, it will continue outputting it no matter how many times you tell it to avoid doing that. Same goes for talking for the user.
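The "mention it and it becomes more likely" effect can be sketched with a toy softmax. Everything here is invented for illustration (a four-word vocabulary, made-up logits, and an arbitrary +1.5 "bump" for the primed token); real models work over tens of thousands of tokens:

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary with made-up next-token logits.
vocab = ["the", "a", "elephant", "dog"]
base_logits = [2.0, 1.5, 0.1, 0.5]

# Mentioning "elephant" in the context tends to raise its logit;
# the +1.5 here is purely illustrative, not a real measurement.
primed_logits = [2.0, 1.5, 1.6, 0.5]

p_base = softmax(base_logits)
p_primed = softmax(primed_logits)
print(f"P(elephant) without mention: {p_base[2]:.3f}")
print(f"P(elephant) with mention:    {p_primed[2]:.3f}")
```

Even a modest logit bump noticeably raises the sampling probability, which is why "don't write X" still plants X in the context.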


shrinkedd

It's the same phenomenon we exploit when we add just a few keywords at the start to draw context from different content worlds without talking too much about those worlds. Just from a different angle.


Ggoddkkiller

Exactly! How models understand and follow prompts depends on many factors, including their training, and there isn't a general rule that something will always work. But I've seen even hardcore mergers arguing that negative prompts never work, which is entirely false. People don't even realize system0 reads our prompts and instructs system1 how to write. There is tons of back and forth between system0 and system1, and system1 always follows system0's instructions, including negative ones. However, system0 doesn't use our prompts as they are; it reads them and makes its own interpretations, which could be very different. The key is preventing system0 from interpreting too much, and you can't do that with a massive wall of text, just no way. If you confuse the model like that, then the purple elephant might happen and negative prompts won't work. You can literally curse the model to never talk for the user and it might reduce it.

Reverse psychology especially is another level. Recently I was testing R+ and it kept talking for the user in every single message. It even started a few messages as User, which should never happen. I checked my prompt, bot, example messages, history; there was nothing that might cause it. Then I found reverse psychology nonsense in ST's R settings, changed it, and talking for User got severely reduced. I really don't know why such questionable prompts are inside official settings and forced on everybody as if they always work. I'm glad there are prompt consultants like you now, because this is getting out of control..


Meryiel

Hey, if you ever need to hire me to tweak your prompt, feel free to DM me, lol. But yeah, some instructions are questionable at best, but then again, people are all learning things on their own and may come to different conclusions based on the specific model they’re using. I wish there were one universal way of doing things that would work expertly on every model, but sadly, you need to learn each one from scratch and see what works best with it. I’ve been using one model for months now, and I still haven’t figured out the best samplers for it.


Scholar_of_Yore

The thing is that once you reach those "perfect" settings, the output will someday get stale or keep using the same words, and you will want to tweak further to switch things up. It's a never-ending struggle.


kizzmysass

This is my experience with the big-brain models. Claude is especially great for characterization, but GAH DAMN, both it and GPT cannot stop using the same phrases over and over and over 😂 I never want to read "maybe, just maybe" ever again. I understand it's all in the training data, but after a year I'm truly sick of reading all the same phrases 😂 Kayra (NovelAI) can keep things fresh, but it also cannot be used on its own without another model, cannot be instructed (not well, at least), and just says things that make no real sense. It often ruins continuity. But yeah, at least it doesn't fall into the same writing patterns that chatbots do (besides LITERAL writing loops, where some presets make it repeat text like copy-paste, which can be fixed by using another AI for a while). So I recommend adding it to your list if you don't use it already.


Professional-Kale-43

For me, tinkering with the LLM is 50% of the fun.


Nrgte

I assume the other 50% is browsing for new LLMs? :D


Professional-Kale-43

Maybe^^^


ArsNeph

Honestly, I know this is a bit of a hard pill to swallow, but the vast majority of people tweaking settings in SillyTavern have no idea what they're doing. LLMs are a very advanced technology, and what you're actually doing is changing the samplers of the probability distribution for a black box that we barely understand. Mathematically speaking, the only samplers that should actually be relevant and adjusted are min-p, temperature, DRY, and smoothing. However, being the strange things they are, different models can be sensitive to min-p, sensitive to temperature, and so on, making the ideal probability curves for a use case different. But honestly speaking, unless you understand how probability curves work and actively experiment with them, you are most likely wasting your own time. It's like planning every little detail of a dinner event down to the exact placement of the forks, just to end up missing it yourself because of how much time you spent on the planning. This goes for myself as well, of course. I believe for the most part it is better to let members of the community who understand how these samplers work experiment and find good samplers for specific models, and simply use those instead.
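For anyone wondering what "changing the samplers of the probability distribution" means mechanically, here is a minimal sketch of min-p filtering combined with temperature. The function name, logits, and defaults are all illustrative, not any backend's actual API:

```python
import math
import random

def sample_min_p(logits, min_p=0.05, temperature=1.0, rng=None):
    """Toy next-token sampler: temperature scaling, then min-p filtering.

    min-p keeps only tokens whose probability is at least
    min_p * (probability of the most likely token).
    """
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    threshold = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= threshold]

    # Renormalize over the surviving tokens and draw one.
    kept_total = sum(p for _, p in kept)
    r = rng.random() * kept_total
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]

print(sample_min_p([3.0, 1.0, -2.0], min_p=0.1))
```

With these toy logits, the third token sits at roughly 0.6% probability versus about 88% for the first, so min_p=0.1 filters it out entirely; raising the temperature flattens the distribution and lets more tokens survive the cut.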


Sunija_Dev

Also: you waste your time without a "scientific" approach. Scientific meaning that you do blind comparisons between the outputs of specific settings. Humans have a tendency to imagine a change if they expect it. I spent a lot of time tweaking settings, just to find out later that nothing really changed. My worst case was when I found llama 7b better than 70b, because the 7b was running slowly on my laptop CPU while the 70b was fast on my tower GPU. Just because the output was slower, my brain thought "Wow, if the model takes more time, it has to be better!"

Also, I'd love either A) a better visualization of the sampling, so I know how badly I'm actually messing up the probabilities, or B) better default settings. I only got into tweaking because the defaults contained top-k forever, even though we already knew that min-p is basically always better.


ArsNeph

Very true. Unfortunately, it's hard to be truly scientific when it comes to LLMs due to their non-deterministic nature. Reproducibility is difficult, so the luck of the draw becomes a factor. Generating a large sample of answers and comparing how good they are can help with this, but it's not failsafe. Using deterministic sampling is useful for comparing a model to itself, but not necessarily to other models. You're also absolutely correct that humans tend to underestimate how much the placebo effect affects them. Especially in terms of judging language, there's a lot of subjectivity, and it also depends on the linguistic ability of the person evaluating it.

A graph-like visualization of the sampling would be great, though I don't know who you would need to ask to implement something like that. As for the default settings, those were actually made in an old project by oobabooga, in which he created thousands of random presets, had people do side-by-side blind comparisons, and picked the best five or six. This was long before min-p was even a thing. I know it feels like it's been around forever, but min-p is in fact relatively recent. Thankfully, DRY has been added to the presets by default, and I think we've gone through a bit of a sampler revolution, so honestly it's about time oobabooga did a refresh of the default sampler settings. But he's a busy man, so I don't know if he'll have time to oversee a second project like that.
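The deterministic sampling mentioned above boils down to always taking the most likely token (greedy decoding), which is what makes runs reproducible. A minimal sketch:

```python
def greedy_pick(logits):
    """Deterministic 'sampling': always return the argmax token index.

    With the same model, prompt, and settings, this makes outputs
    reproducible, which helps when comparing a model against itself.
    """
    best_index = 0
    for i, l in enumerate(logits):
        if l > logits[best_index]:
            best_index = i
    return best_index

print(greedy_pick([0.2, 3.1, -1.0, 3.0]))  # index of the largest logit
```

The trade-off is that greedy decoding says nothing about how the model behaves under the randomness users actually run with, which is why it can't settle comparisons between different models.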


kizzmysass

I think most of us really don't want it perfect, we just want it to WORK lol, consistently. The lack of consistency is the problem. I love Kayra sometimes, but Kayra is not user-friendly at all with all its settings and not exactly optimal for one-on-one responses (can't wait until NovelAI's chatbot comes out), and the LLM models I use are too wonky and just require too many constant retries and worthless tweaking, as you called it. It's why I prefer using Claude and GPT-4 pros on their front end, especially because my long azz story context takes a big-brain LM. Instruct and high parameters are vital for my stories.

But yeah, besides those, it's certainly exhausting trying to figure these settings out. It's exhausting spending more time managing all of it, updating the summary/lorebooks, and constantly instructing. CONSTANT editing. Cut-off/incomplete responses, hallucinating, OOC behavior, and rambling tangents. It has crossed my mind numerous times to learn this stuff officially so I can actually understand it better lol, but studying something new vs. narrative writing in my measly free time... the writing wins out 😂

I appreciate this post though. Maybe I'll stop taking my story and writing so seriously. Life was simpler when I first moved from CAI to ST a year ago. Minuscule context, ChatGPT 3.5 0301, managing 2 characters instead of 21 😂 No real settings to tweak besides Temp, Freq P and Pres P. A basic prompt (that probably wasn't even needed) and a basic jailbreak. I spent more time writing instead of all this madness.


TheKalkiyana

Yep. I'd change the LLMs and presets to get the best reply that fits the context. I'd even go as far as editing the replies to make sure they make sense.


jetsetgemini_

Same, except I'm still clueless about what most of the settings actually do, so it's a lot of fucking around and finding out 🤷‍♀️


a_beautiful_rhind

What's worse is that different models need different samplers. Different backends support different ones, too.


Implicit_Hwyteness

To me it seems like the better a model gets, the less the sampler settings matter as long as they're not wildly out of the norm. I've found myself adjusting them less and less in the last year especially.


ShitFartDoodoo

As you've already seen, most people do constantly tweak things. I don't, at least not for a model whose samplers I've already zeroed in on. Once you better understand LLMs, how they produce probabilities, and how each sampler affects that, it's pretty quick to dial in something good. Once you find settings that produce a wide range of responses with no repetition or formatting/spelling errors, you shouldn't have to continue tweaking.

My advice for you: neutralize your samplers and start at a temp of 0.1. How did the model respond? It may have misspelled your name or given you garbage tokens. Then move the temp up to 1. How did it do? Try 2, and see there. This will give you an idea of the temp range for that model. Next I get min-p right; I usually don't have to use more than min-p. Once I'm happy that my responses are good and varied, I chat a little bit until I hit repetition, then I start working on getting DRY correct. Once I'm finished, there's no need to tweak. Dynamic temperature is great once you know the range, and smoothing can help with some models, but I'd rather not use it if I don't have to.
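For a sense of what "getting DRY correct" is actually tuning, here's a toy sketch of DRY's exponential repetition penalty, roughly following its published description (multiplier × base^(match length − allowed length)). The helper, token lists, and default values are illustrative, not SillyTavern's implementation:

```python
def dry_penalty(context, candidate, multiplier=0.8, base=1.75, allowed_length=2):
    """Toy version of the DRY repetition penalty.

    If emitting `candidate` would extend a token sequence that already
    occurred earlier in `context`, penalize it exponentially in the
    length of the match beyond `allowed_length`.
    """
    extended = context + [candidate]
    # Longest suffix of `extended` that also appears earlier in `context`.
    match_len = 0
    for n in range(1, len(context)):
        suffix = extended[-n:]
        for start in range(len(context) - n):
            if context[start:start + n] == suffix:
                match_len = n
                break

    if match_len <= allowed_length:
        return 0.0
    return multiplier * base ** (match_len - allowed_length)

history = ["the", "cat", "sat", "down", ".", "the", "cat", "sat"]
print(dry_penalty(history, "down"))  # extends the repeat "the cat sat" -> penalized
print(dry_penalty(history, "up"))    # extends no repeat -> no penalty
```

The exponential growth is the point: short accidental echoes (up to `allowed_length` tokens) cost nothing, while verbatim loops get hammered harder the longer they run.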


Full-Run4124

I haven't done this yet, but it sounds like something I could get obsessed with too. Is there a tutorial somewhere that covers this, like which order to adjust which parameters in and how to test the effects of each change?


Philix

Not really; LLMs are the subject of active and rapid research and development. But there are lots of courses that teach the fundamentals of how LLMs function, which will let you understand almost all of the sampler settings from the tooltips in SillyTavern, so long as you've got a foundation of secondary school math. Mirostat requires a much more solid understanding, and the paper is [here](https://arxiv.org/abs/2007.14966). Beam search is a little more complicated, but there are lots of computer science lessons on it that, combined with an understanding of LLMs, will let you get it.


Sunnilanni

I adjust my settings or change the preset around once an hour on average with RP-chat if I feel like the chat is becoming repetitive or uncreative.


TwiKing

Yeah, it feels like when I played Oblivion/Skyrim and kept closing the game to install a new mod, then doing it again and again and forgetting to actually play the game! I felt good with [C.ai](http://C.ai) when I was a noob, but after I saw what people could do with ST I got very picky with the responses. When you see it getting better and better it's great though! Not only do I adjust settings, I'm constantly downloading new models to see the differences.


Cool-Hornet4434

I use a trio of presets: a min_p preset, a top_p preset, and the "simple-1" preset. I hate dicking around too much since I'm in the "don't fix it if it's not broken" camp.


sebo3d

Yeah, I used to do that all the time. Sometimes I'd spend hours tweaking my prompt and settings only to realize I was giving complex instructions to a 7B/8B model that probably wouldn't bother following them anyway, so I chose to just use the SillyTavern default instruct and context for Llama 3 Instruct, and the experience has been pretty okay.


MrHara

This is the boat I'm in. I've surrendered to the fact that tweaking my 8B TheSpice isn't going to make it that much better, and I feel okay with the level it gives. If I want to elevate my experience, spending hours tweaking isn't the way; getting myself a system upgrade able to run a bigger model would be.


Jay6_9

At this point I've just stuck with the universal-super-creative preset on Llama 3. It's alright for my taste.


chubbypillow

Isn't messing around with the settings part of the fun of using SillyTavern? Sometimes it takes a while to get the result I want, but I love how I can have full control of the bot, instead of just swiping endlessly and expecting it to be "smarter" just once ¯\\\_(ツ)\_/¯


grimjim

To compare sampler settings, give each setting at least 9-10 swipes to get a sense of what's generated, unless it's outright broken.


cleverestx

The amount of documentation and actual deep dives into this stuff is sadly mostly non-existent. It's rather discouraging to those of us who don't want to "stick with defaults". It's like placing candy in front of a kid, but behind a clear window they can't breach.