A step-by-step guide to creating highly tailored LLM responses in your automations.
Imagine you have a super-smart robot (that's your LLM in LM Studio). This robot can talk like anyone – a pirate, a scientist, or just a helpful friend.
We're using n8n (which is like a digital LEGO set for connecting apps) to tell the robot WHO to be and WHAT to talk about.
That's it! We're just making n8n the boss that tells our smart robot how to act for different questions.
Here's how your request travels through the system: your question and a persona ID arrive at an n8n Webhook, a Function node picks the matching system prompt, an HTTP Request node sends both to your LLM in LM Studio, a Set node pulls out the answer, and a Respond to Webhook node sends it back to you.
Welcome! If you're here, you already know the power of n8n for workflow automation and the incredible potential of Large Language Models (LLMs). This tutorial focuses on a particularly potent combination: using n8n to dynamically instruct a locally-run LLM (via LM Studio) with different "personas" or system prompts to achieve highly specific and context-aware responses.
Imagine asking the same LLM to be a formal business analyst for one task, a witty creative writer for another, and a precise technical documenter for a third – all within different n8n workflows, or even dynamically within the same workflow! This is what we'll build.
Key Idea: By changing the `system_prompt` sent to your LLM, you can dramatically alter its tone, style, knowledge focus, and overall behavior, making a single model incredibly versatile.
We'll talk to LM Studio through its local OpenAI-compatible server (by default at `http://localhost:1234/v1/chat/completions`). We'll create an n8n workflow that can take a user's query and a "persona" identifier. Based on this identifier, it will fetch a specific system prompt and use it when querying your LLM through LM Studio.
For this tutorial, we'll manage our personas and system prompts directly within an n8n Function node for simplicity. In a production scenario, you might use a database, Google Sheet, or a dedicated API for this.
Let's start with a Webhook trigger so we can easily send data to our workflow. Add a Webhook node (set to accept POST requests) and plan to send it a JSON body like this:
{ "user_query": "Explain quantum computing in simple terms.", "persona_id": "eli5_explainer", "session_id": "user123_chat789", // Add for potential advanced integration "user_id": "user123" // Add for potential advanced integration }
Note: `session_id` and `user_id` are shown for forward compatibility with the advanced Postgres integration discussed later.
[Imagine an image here showing the n8n Webhook node configuration panel.]
Next, add a Function node. It will store our personas and their corresponding system prompts, and it will select the correct system prompt based on the `persona_id` from the Webhook.
```javascript
// Define our personas and their system prompts
const personas = {
  "eli5_explainer": {
    "name": "ELI5 Explainer",
    "system_prompt": "You are an expert at explaining complex topics in a very simple way, as if explaining to a 5-year-old. Use analogies and avoid jargon."
  },
  "formal_analyst": {
    "name": "Formal Business Analyst",
    "system_prompt": "You are a professional business analyst. Provide concise, data-driven, and formal responses. Use bullet points for key takeaways."
  },
  "creative_writer": {
    "name": "Witty Creative Writer",
    "system_prompt": "You are a witty and creative writer. Your responses should be engaging, imaginative, and perhaps a little humorous. Feel free to use storytelling."
  },
  "default": {
    "name": "Helpful Assistant",
    "system_prompt": "You are a helpful AI assistant. Provide clear and accurate information."
  }
};

// Get data from the input (Webhook)
const inputData = items[0].json;
const personaId = inputData.persona_id;
const userQuery = inputData.user_query;

// Select the persona, or use default if not found
let selectedPersona = personas[personaId] || personas["default"];

// Prepare the data for the next node (HTTP Request to LLM)
// Pass through session_id and user_id if they exist for advanced integrations
return [{
  json: {
    user_query: userQuery,
    system_prompt: selectedPersona.system_prompt,
    persona_name: selectedPersona.name,
    session_id: inputData.session_id,
    user_id: inputData.user_id
  }
}];
```
[Imagine an image here showing the n8n Function node with the code.]
Now add an HTTP Request node. This is where we send the user's query and the selected system prompt to your LLM running in LM Studio. Configure it as follows:

- Method: `POST`
- URL: `http://localhost:1234/v1/chat/completions` (the default LM Studio endpoint). Replace it if yours is different.
- Send Body: `true`
- Body Content Type: `JSON`
- JSON Body:
{ "model": "loaded-model-name", // IMPORTANT: Replace with the actual model identifier from LM Studio "messages": [ { "role": "system", "content": "{{ $json.system_prompt }}" // From our Function node }, { "role": "user", "content": "{{ $json.user_query }}" // From our Function node } ], "temperature": 0.7, // Adjust as needed (can also be set dynamically) "max_tokens": 500, // Adjust as needed (can also be set dynamically) // "stream": false, // Set to true if you want to stream responses // "stop": ["\nUser:", "###"] // Optional stop strings }
Important: In the JSON body above, replace `"loaded-model-name"` with the actual name or identifier of the model you have loaded in LM Studio (e.g., `"local-model"` or the specific GGUF file name if that's what your server expects). You might need to check your LM Studio server logs or documentation for the exact model identifier it uses in the API; if your LM Studio version exposes the OpenAI-style `/v1/models` endpoint, that also lists the identifiers the server accepts.
The `{{ $json.system_prompt }}` and `{{ $json.user_query }}` are expressions that pull data from the previous Function node. Parameters like `temperature`, `max_tokens`, and `stop` strings can also be set dynamically by n8n based on the selected persona or task type if you add them to the output of the Function node.
Authentication: `None` for local LM Studio, but adjust if you've set up API keys.

[Imagine an image here showing the n8n HTTP Request node configuration.]
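If you want `temperature`, `max_tokens`, or `stop` to follow the persona rather than being hard-coded, one option is to extend the Function node's return statement. Here is a minimal sketch, assuming you add optional `temperature`, `max_tokens`, and `stop` keys to the `personas` object shown earlier (these keys are illustrative, not anything n8n or LM Studio requires):

```javascript
// Inside the Function node, after selecting the persona:
// fall back to sensible defaults when the persona doesn't define a value
const generation = {
  temperature: selectedPersona.temperature ?? 0.7,
  max_tokens: selectedPersona.max_tokens ?? 500,
  stop: selectedPersona.stop ?? []   // optional stop strings
};

return [{
  json: {
    user_query: userQuery,
    system_prompt: selectedPersona.system_prompt,
    persona_name: selectedPersona.name,
    ...generation,                   // exposes temperature, max_tokens, stop to later nodes
    session_id: inputData.session_id,
    user_id: inputData.user_id
  }
}];
```

In the HTTP Request body you would then reference `{{ $json.temperature }}`, `{{ $json.max_tokens }}`, and `{{ $json.stop }}` instead of fixed values.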
The LLM response will come back in JSON format, and we need to extract the actual message content. Add a Set node with two fields:

- `llm_response`: `{{ $json.choices[0].message.content }}`
- `persona_used`: `{{ $item(0).$node["Function"].json.persona_name }}` (adjust the node name if yours is different)

Inspect your data! After running a test, always inspect the output of the HTTP Request node in n8n to confirm the correct path to the LLM's message content. It might be nested differently.
[Imagine an image here showing the n8n Set node configuration.]
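If you prefer to extract the content in code (or want a guard against an unexpected structure), here is a minimal Code/Function node sketch. The response shape shown in the comment is the standard OpenAI-compatible format that LM Studio's server mimics; verify it against your own test output as noted above:

```javascript
// Typical OpenAI-compatible chat completion response (abridged):
// {
//   "choices": [
//     { "message": { "role": "assistant", "content": "..." }, "finish_reason": "stop" }
//   ],
//   "usage": { "prompt_tokens": 12, "completion_tokens": 34 }
// }
const response = items[0].json;

// Fail loudly if the path is different, instead of passing undefined downstream
const content = response?.choices?.[0]?.message?.content;
if (content === undefined) {
  throw new Error('Could not find choices[0].message.content - inspect the HTTP Request output');
}

return [{ json: { llm_response: content } }];
```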
If you want to send the LLM's response back to whatever triggered the webhook:
Add a Respond to Webhook node and configure it to return the `llm_response` and perhaps the `persona_used`. Example Body:
{ "personaResponse": "{{ $json.llm_response }}", "personaUsed": "{{ $json.persona_used }}" }
[Imagine an image here showing the n8n Respond to Webhook node.]
To test the workflow, use `curl` to send a POST request to your n8n Webhook Test URL with JSON data like:

```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{
    "user_query": "What are the benefits of n8n?",
    "persona_id": "formal_analyst",
    "session_id": "chat1",
    "user_id": "dev_user"
  }' \
  YOUR_N8N_WEBHOOK_TEST_URL
```

Replace `YOUR_N8N_WEBHOOK_TEST_URL` with your actual URL.
Try different `persona_id` values!

To get the best out of your local LLMs when driven by n8n, consider how you name your models in LM Studio and how you configure their settings for different types of tasks. This allows n8n to call the right model with the right parameters for optimal performance and response quality.
Clear naming is crucial, especially when you begin to manage multiple models, potentially specialized for different tasks or response lengths. A good convention helps you (and your n8n workflows, if you dynamically specify model names) instantly identify a model's purpose and characteristics.
Consider a structure like: `[Purpose/Task]_[BaseModelFamily]_[Size/Quantization]_[Optional:Version/FineTuneID]`

- Purpose/Task: e.g., `QuickAnswer`, `ContentSummarizer`, `ReportGenerator`, `Sentiment`.
- BaseModelFamily: e.g., `Mistral`, `Llama3`, `Phi3`, `Gemma`, `Mixtral`.
- Size/Quantization: e.g., `7B-Q4_K_M`, `Mini-BF16`, `8x7B-Q5_0`. This indicates the model's parameter count and the quantization method used, which affects performance and resource usage.
- Version/FineTuneID (optional): e.g., `v0.2`, `FinetuneCorpusX`. Useful for tracking iterations or specific fine-tuned versions.

Examples:
- `QuickAnswer_Mistral-7B_Q4_K_M`
- `SentimentAnalysis_Phi3-Mini_BF16_v1.1`
- `CreativeWriter-BlogDraft_Solar-10.7B_Q5_K_S`
- `ReportGenerator_Mixtral-8x7B_Q4_0_FinetuneLegalV2`
This structured naming makes it easier for your n8n workflow to potentially select different models for different tasks by constructing or looking up these names.
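As a sketch of that idea, a small Function node could map a task type supplied by the caller to one of the model names above and pass it on for the HTTP Request node's `model` field. The `task_type` field, the task keys, and the model names are all illustrative (the names are just the examples from this section; use whatever identifiers your LM Studio server actually reports):

```javascript
// Map task types to LM Studio model identifiers (illustrative names from above)
const modelsByTask = {
  "quick_answer": "QuickAnswer_Mistral-7B_Q4_K_M",
  "sentiment": "SentimentAnalysis_Phi3-Mini_BF16_v1.1",
  "blog_draft": "CreativeWriter-BlogDraft_Solar-10.7B_Q5_K_S",
  "report": "ReportGenerator_Mixtral-8x7B_Q4_0_FinetuneLegalV2"
};

const taskType = items[0].json.task_type;  // hypothetical field sent by the caller
const model = modelsByTask[taskType] || "QuickAnswer_Mistral-7B_Q4_K_M";  // fallback model

// Pass everything through, plus the chosen model name
return [{ json: { ...items[0].json, model } }];
```

The HTTP Request body would then use `{{ $json.model }}` instead of a hard-coded model name.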
The settings you configure in LM Studio (some at model load, many per API call which n8n can control) significantly impact the LLM's response. Tailor them to the desired output length and style.
Remember: These are starting points! The ideal settings depend heavily on the specific LLM you are using and your exact use case. Always experiment! Your n8n workflow can dynamically pass many of these parameters (like `temperature`, `max_tokens`, and `stop` strings) in the API call to LM Studio.
| Feature/Setting | Short Requests (e.g., quick facts, commands, classification) | Medium Requests (e.g., summaries, short explanations, creative snippets) | Long Requests (e.g., reports, detailed analysis, story generation) |
|---|---|---|---|
| Primary Goal | Speed, accuracy, conciseness | Balance of detail & creativity, coherence | Depth, comprehensiveness, sustained creativity/logic |
| System Prompt | Highly directive, focused on the specific task. E.g., "You are an AI that answers in one sentence." or "Classify sentiment as POSITIVE, NEGATIVE, or NEUTRAL. Output only the class name." | Clear role, desired style, and output format. E.g., "You are a helpful explainer. Explain the following concept clearly and concisely using an analogy." | Detailed persona, context, desired structure, and constraints. E.g., "You are a historian writing a chapter on X. Cover points A, B, C. Maintain a formal tone and provide citations." |
| Stop Strings / Additional Stop Strings | Crucial. Use specific phrases, sentence-ending punctuation (like `.` or `!` followed by a newline if appropriate), or even single newlines (`\n`) for extremely short, predictable output. Essential for tasks like "YES/NO" answers or single-word classifications. | Useful for defining section breaks, ensuring a concluding phrase, or preventing rambling. Can help structure medium-length content. | Less critical for the overall end of the output but can be used to signal transitions between major sections if the model is part of a larger generation chain. |
| Reasoning Section Parsing (Start/End Strings), e.g., `<think>...</think>` | Usually not needed unless the short task is surprisingly complex and you want to see the "thought process" of a very small, efficient model. | Can be useful if the model is designed to output its thought process (chain-of-thought) before the final answer. This helps in debugging or understanding its logic for moderately complex tasks. | Very helpful for complex tasks to understand the model's reasoning path, especially if you're chaining LLM calls, need to verify its approach, or if the task involves multi-step reasoning. |
| Temperature | Low (e.g., 0.1 - 0.4). For deterministic, factual, and consistent answers. | Medium (e.g., 0.5 - 0.8). Allows for some creativity and variation while maintaining coherence. | Medium to High (e.g., 0.7 - 1.0+). Encourages more creative, diverse, and less predictable output. Adjust based on how "exploratory" you want the generation. |
| max_tokens / Limit Response Length | Very Low. Set aggressively (e.g., 10-50 tokens) to ensure brevity and speed. | Moderate. Enough for the explanation/summary but prevents excessive length (e.g., 150-500 tokens). | High. Allow ample space (e.g., 1000-4000+ tokens, up to the model's context limit and your needs). |
| Context Length (Model Load Setting) | Can be lower if interactions are stateless. If any follow-up or minor context is needed, ensure it's adequate for that. | Should be sufficient to hold the current interaction and some recent history if part of a conversation or multi-turn task. | Maximize this (within reason for your hardware performance) to allow for long-form generation and remembering earlier parts of the text or provided documents. |
| Repeat Penalty | 1.0 - 1.1. Good to prevent simple token repetition even in short outputs. | 1.1 - 1.2. Helps keep text fresh and varied. | 1.1 - 1.2. Important for long text to avoid sounding monotonous or getting stuck in loops. |
| Top K Sampling | Higher (e.g., 40-50) or even off if Temperature is very low. Focuses on the most probable words. | Moderate (e.g., 40). Balances predictability and variety. | Lower (e.g., 20-40) or adjust with Top P. Allows for more diverse word choices in creative tasks. |
| Top P Sampling | Higher (e.g., 0.9 - 0.95). Narrows the field of likely words, good for factual recall. | Moderate (e.g., 0.9). A common default that works well. | Lower (e.g., 0.7-0.9) if you want more surprising word choices with higher temperatures, or higher to stay focused. |
By fine-tuning these settings in LM Studio (many of which can be passed via the API from n8n for each call), you can significantly enhance the performance and relevance of the LLM responses within your automated workflows.
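One way to apply the table in practice is to keep small parameter profiles in a Function node and merge the matching one into the data sent to LM Studio. This is a minimal sketch assuming a hypothetical `request_size` field ("short", "medium", or "long") supplied by the caller; the numbers are starting points taken from the table, not tuned recommendations for any specific model:

```javascript
// Parameter profiles derived from the table above (starting points only)
const profiles = {
  short:  { temperature: 0.2, max_tokens: 50,   stop: ["\n"] },
  medium: { temperature: 0.7, max_tokens: 400,  stop: [] },
  long:   { temperature: 0.9, max_tokens: 2000, stop: [] }
};

const size = items[0].json.request_size || "medium";  // hypothetical field from the webhook
const profile = profiles[size] || profiles.medium;

// Pass the original data through with the chosen generation settings merged in
return [{ json: { ...items[0].json, ...profile } }];
```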
Such per-call settings could just as easily be keyed off the `persona_id` or the nature of the `user_query`.

While dynamic personas are powerful, an AI agent becomes truly intelligent when it can remember past interactions and access relevant external data. Integrating a Postgres database allows your LM Studio-powered agent (orchestrated by n8n) to achieve this.
Heads up! This section describes a more complex setup involving database interactions and potentially specific n8n LangChain nodes. Ensure you have your Postgres database ready and the necessary n8n credentials configured.
Your incoming webhook data should now include fields such as `query`, `user_id`, `request_id`, and crucially, `session_id`. You will use three Postgres-related node types:

- Postgres Chat Memory (`@n8n/n8n-nodes-langchain.memoryPostgresChat`): to load and save conversation history.
- Postgres Tool (`n8n-nodes-base.postgresTool`): to define functions the AI agent can use to query your database.
- Postgres (`n8n-nodes-base.postgres`): for general database operations like logging.

The following steps outline how to modify your n8n workflow. You'll likely be using an "AI Agent" type node from n8n's LangChain collection (or a similar agent-focused node) instead of just a simple HTTP Request node for this advanced setup, as agents are designed to use memory and tools.
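The steps below reference a node called "Prep Input Fields" in their expressions. Here is a minimal sketch of what such a Code/Function node might return, assuming the webhook body shown earlier in this tutorial (the node name, field fallbacks, and defaults are illustrative):

```javascript
// "Prep Input Fields": normalize the incoming webhook data for the rest of the workflow
const body = items[0].json.body || items[0].json;  // webhook payload location can vary by n8n version

return [{
  json: {
    query: body.user_query || body.query,  // the user's question
    persona_id: body.persona_id || "default",
    session_id: body.session_id,           // used by Postgres Chat Memory
    user_id: body.user_id,
    request_id: body.request_id            // optional correlation id
  }
}];
```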
If you are using a dedicated AI Agent node in n8n, ensure the user's query (e.g., `{{ $('Prep Input Fields').item.json.query }}`) is correctly mapped to the primary input/prompt field of your AI Agent node.

Next, give your agent conversation history:
- Add a `@n8n/n8n-nodes-langchain.memoryPostgresChat` node. Typically, this comes after "Prep Input Fields."
- Session ID: `{{ $('Prep Input Fields').item.json.session_id }}`.
- User reference (if your setup uses it): `{{ $('Prep Input Fields').item.json.user_id }}`.
- Table name (e.g., `chat_history`). Ensure this table exists with columns like `session_id` (TEXT or VARCHAR), `message_type` (TEXT, e.g., 'human', 'ai'), `content` (TEXT), `timestamp` (TIMESTAMPTZ).

[Imagine an image here showing the n8n Postgres Chat Memory node configuration.]
These are functions your AI Agent can call to query your database.
- For each tool, add an `n8n-nodes-base.postgresTool` node. These nodes are made available to the agent but might not be in the direct execution flow unless called by the agent.
- Give each tool a descriptive name (e.g., `search_product_docs`, `get_customer_details`).
- In the tool's query, use `{{ $fromAI('parameter_name') }}` where `parameter_name` is what you've instructed the LLM to provide in the tool description.

```sql
-- Example for search_product_docs:
SELECT document_text
FROM product_manuals
WHERE content @@ to_tsquery('english', {{ $fromAI('search_query') }});
```

Note: The exact syntax for `$fromAI` might vary based on the specific n8n agent node. Check its documentation.

- Connect each `postgresTool` node to the "Tools" (or similarly named) input of your AI Agent node. An agent can be equipped with multiple tools.

[Imagine an image here showing an n8n Postgres Tool node configuration.]
The AI Agent needs to understand how to use the memory and tools effectively.
```
// Example System Message Snippet for AI Agent:
You are a helpful AI assistant for our company.
Use the provided chat history to understand the context of the conversation.
You have access to the following tools:
- 'search_product_docs': Use this to search our product documentation when the user asks about product features or troubleshooting. Input should be a JSON object like {"search_query": "search terms"}.
- 'get_customer_details': Use this to fetch customer information if you have a customer ID. Input should be {"customer_id": "ID"}.
Think step by step. If you need information from the database, use the appropriate tool.
If you can answer from the conversation history or general knowledge, do so.
```
After the agent generates a response, you might want to log the complete exchange.
- Add a standard `n8n-nodes-base.postgres` node. Place it after the AI Agent node (or after your "Prep Output Fields" node).
- Configure it to insert a row into your log table (this could be the `chat_history` table or a separate audit log table).
- Map columns such as:
  - `session_id`: `{{ $('Prep Input Fields').item.json.session_id }}`
  - `user_id`: `{{ $('Prep Input Fields').item.json.user_id }}`
  - `user_message`: `{{ $('Prep Input Fields').item.json.query }}`
  - `ai_response`: `{{ $('AI Agent').item.json.output }}` (adjust the path based on your AI Agent node's output structure)
  - `timestamp`: `{{ $now.toISO() }}`
[Imagine an image here showing the n8n Postgres node for logging.]
```
1. User sends message (triggers n8n Webhook).
   Input: { query, persona_id, session_id, user_id }
        |
        V
2. Prep Input Fields (n8n node - extracts and prepares data)
   Output: { query, session_id, user_id, persona_id, ... }
        |
        V
3. Postgres Chat Memory (n8n LangChain node - loads history for session_id)
   Input: session_id
   Output: chat_history_object (to be fed into AI Agent)
        |
        V
4. AI Agent (n8n LangChain Agent node - powered by LM Studio via HTTP)
   Inputs:
     - Current User Query (from Prep Input Fields)
     - System Prompt (dynamically set, possibly based on persona_id)
     - Chat History (from Postgres Chat Memory node)
     - Available Tools (PostgresTool nodes connected to its 'Tools' input)
   Process:
     - LLM (in LM Studio) processes inputs.
     - Decides if it needs to use a tool (e.g., search_product_docs).
     - If yes:
         - AI Agent triggers the specific postgresTool.
         - postgresTool queries Postgres with parameters from LLM.
         - postgresTool returns data to AI Agent.
     - LLM formulates final response using all available info.
   Output: { ai_response_text, any_tool_calls_made, ... }
        |
        V
5. Prep Output Fields (n8n Set node - structures final response)
   Input: ai_response_text
   Output: { personaResponse: ai_response_text, ... }
        |
        V
   (Optional Path)
6. Postgres Logging Node (n8n node - logs interaction)
   Input: session_id, user_query, ai_response_text, timestamp
   Action: Inserts log into Postgres table.
        |
        V
7. Respond to Webhook (n8n node - sends response back to user)
   Input: { personaResponse, ... }
```
By implementing these steps, your n8n workflow will leverage Postgres for persistent memory and dynamic data retrieval, making your LM Studio-powered agent significantly more context-aware and capable. Remember to test each component incrementally and consult the documentation for the specific n8n LangChain or agent nodes you choose to use, as their exact configuration details can vary.