The Data Analyst’s LLM Masterclass (Part 1): How to Write Better LLM Prompts


Date: November 5, 2025

filed in: AI, Analysis

Doubtless you’re using LLMs like ChatGPT, Gemini, and Claude. But you probably have a nagging feeling that you’re not using them to their full potential. When you use an LLM for a real analytical task, I’ll bet the results are frustrating. You ask for an in-depth analysis; you get a generic summary. You ask for a Python script; you get a buggy file. These experiences confirm your fear: You’re missing out on the LLM’s real power.

The problem isn’t the tool. It’s your approach. You’re asking the LLM for output. You’re not directing it. We’re going to fix that, starting today.

This post is the first in a new six-part series: The Data Analyst’s LLM Masterclass. Today, we lay the foundation. I’m giving you the three-part framework for building powerful analytical prompts. This framework is the key.

The Core Concept

You hear the term “Prompt Engineering” in tech circles. I have a problem with this term. It implies a specialized, technical skill. It suggests a separate class of operator. This is gatekeeping. It’s wrong.

The truth is simple. You use an LLM. You write prompts. By definition, you are already a “Prompt Engineer.” Calling it engineering is as silly as calling someone who writes an Excel formula a “Spreadsheet Engineer.”

So, congratulations, you’re in the club. But just because you can do it doesn’t mean you’re good at it.

To get good, you must understand the tool. An LLM is a prediction engine. Your job isn’t to ask it a question. Your job is to set up the problem so the LLM can predict the correct answer.

This is true whether the answer is a block of code or a narrative summary.

Think of it this way.

A novice analyst treats the LLM like a search box and asks: “What’s going on with our declining sales?” The model returns a stack of unrelated generalities. The output answers the lazy question but fails the analyst’s intent. That’s a bad prompt. It’s lazy and lacks direction.

A professional analyst does it differently. They issue a directive:

“Act as a forensic accountant. Your task is to identify the three most likely drivers for our recent sales decline. Analyze our weekly sales and our customer segment performance data. For each driver you find, explain how it contributed to our sales decline and provide your supporting evidence.”

You see the difference? This is a powerful prompt. It defines a persona. It provides context. It sets a crystal-clear goal.

You’re not just asking for information. You are leading an investigation. This prompt transforms the AI from a passive data provider into an active, intelligent partner. Prompting is a structured analytical process, not random artistry. To get consistent, high-value output, you need discipline.
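In practice, the persona and the directive map cleanly onto the system/user message split that most chat APIs use. A minimal sketch, assuming a chat-completions-style client such as the `openai` Python SDK (the client call is shown only as a comment; the message names are illustrative):

```python
# Hypothetical sketch: the persona/directive split maps onto the system/user
# messages most chat APIs expect. Model name and client are illustrative.
persona = "Act as a forensic accountant."
task = (
    "Identify the three most likely drivers for our recent sales decline. "
    "Analyze our weekly sales and our customer segment performance data. "
    "For each driver you find, explain how it contributed and provide "
    "your supporting evidence."
)

messages = [
    {"role": "system", "content": persona},  # the persona steers the model's framing
    {"role": "user", "content": task},       # the directive carries the actual goal
]

# With a chat client you would then send the messages, e.g.:
# client.chat.completions.create(model="gpt-4o", messages=messages)
```

Keeping the persona in the system message and the task in the user message means you can swap either one independently without rewriting the other.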

The Strategic Framework

I call this discipline the “Command Framework.” It has three pillars.

First, Provide the Context.

The AI knows nothing about your specific problem. You must give it intelligence. This means two things: role and context. The role defines its expertise: “Act as a senior data analyst specializing in e-commerce subscription models.”

The role is your creative lever. It points the AI to the section of its training data most relevant to your needs. Don’t use vague roles like “data analyst.” Instead, try “Act as a skeptical CFO” to produce a risk-focused analysis. This is the art of framing. The context provides your facts. This includes your target audience, the structure of your data, and key business rules.

Second, Define the Goal.

You must state a clear, explicit goal. Use a precise action verb. Don’t say, “Help me with this data.” Say, “Analyze this dataframe…”

Then, state the business objective. The task—”Analyze this dataframe”—is the science. The purpose—”…to identify customers at high risk of churn so we can proactively help them”—is the art.

That “why” is what separates an ineffective prompt from an effective one, and a poor analyst from a great one.

Third, Set the Rules.

You control the output. Be explicit. Demand a specific structure:

“Output this result as a JSON object.”

“Format this as a CSV file.”

“Write in a friendly tone for a non-technical audience.”

Just as important, use negative constraints. Tell the AI what not to do: “Do not use any libraries outside the standard Python installation.” “Exclude customers who signed up in the last 30 days.” Setting these rules ensures the output is not just correct, but immediately usable.
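Rules like “output JSON” pay off because you can verify them mechanically. A minimal sketch, assuming you already have the model’s raw reply in a string (`raw_reply` and its contents are illustrative):

```python
import json

# Illustrative stand-in for a model reply you requested in JSON format.
raw_reply = '{"at_risk_customers": 128, "top_driver": "billing failures"}'

def parse_llm_json(text: str) -> dict:
    """Fail loudly if the model ignored the 'output JSON' constraint."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"Model broke the JSON constraint: {err}") from None

result = parse_llm_json(raw_reply)
print(result["top_driver"])
```

If the constraint is violated, you get an immediate, specific error instead of silently passing malformed text downstream.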

The Analyst’s Playbook

This framework is the theory. Now, here is the most effective way to put it into practice: XML-style tags.

LLMs were trained on vast swaths of the internet, which is full of highly structured markup like HTML and XML. These models learned that content inside a tag is a specific, self-contained block of information.

When you write a prompt as one large paragraph, the AI has to guess where your role definition ends and your data schema begins. When you use tags, you are drawing bright, clear lines. You are showing it the precise blueprint of your request. This is the single most effective way to remove guesswork and get a reliable, consistent output.

Your playbook is to map your tags directly to the three-pillar framework:

For Pillar 1 (Context), use two tags: <role> and <context>. The <role> tag is where you input your “Act as” command:

<role>
Act as an expert R developer specializing in data wrangling with dplyr.
</role>

The <context> tag is where you paste your dataframe schema, audience description, and other facts:

<context>
The target audience is the segment labeled as ’03’ in the column named ‘segment’.
</context>

For Pillar 2 (Goal), use the single most important tag: <instructions>. This is your mission. This is your primary command:

<instructions>
Analyze the provided customer_churn_data to identify the top 3 statistically significant features that predict churn. Your objective is to output your findings as a summary report.
</instructions>

For Pillar 3 (Rules), use two tags: <constraints> and <examples>. The <constraints> tag is for all your rules:

<constraints>
Only use libraries X and Y.
The output must be in JSON.
Do not use a for-loop.
</constraints>

The <examples> tag is for when “showing” is better than “telling.” When you need a specific output format, you provide a few concrete, in-text examples:

<examples>
Input: ‘U.S.A.’ -> Output: ‘USA’
Input: ‘CA’ -> Output: ‘CAN’
</examples>

These five tags create a perfectly structured analytical command. You remove all ambiguity. You tell the AI exactly how to process every part of your request. This is how you move from asking to directing.

Here are two pro-tactics to use with this playbook immediately:

  • Build Your Role Library: Stop starting from scratch. Keep a document with pre-built personas (<role> tags) for your common tasks.
  • Provide Data Schemas: This is the most critical tactic for an analyst. Never ask an LLM to work with data it cannot see. For any dataframe, paste the column names, their data types, and the first few rows into the <context> tag. This gives the AI the blueprint it needs to write code that actually works.
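The schema tactic is easy to automate. A minimal sketch, assuming your data lives in a pandas DataFrame (the helper name and sample data are illustrative):

```python
# Hypothetical helper: summarize a DataFrame's schema as plain text
# you can paste into a <context> tag.
import io
import pandas as pd

def schema_for_prompt(df: pd.DataFrame, n_rows: int = 3) -> str:
    """Return column names, dtypes, and the first few rows as plain text."""
    buf = io.StringIO()
    buf.write("Columns and dtypes:\n")
    for col, dtype in df.dtypes.items():
        buf.write(f"- {col}: {dtype}\n")
    buf.write(f"\nFirst {n_rows} rows:\n")
    buf.write(df.head(n_rows).to_string(index=False))
    return buf.getvalue()

df = pd.DataFrame({"customer_id": [1, 2, 3, 4],
                   "segment": ["01", "03", "03", "02"],
                   "monthly_spend": [42.0, 19.5, 88.2, 7.3]})
print(schema_for_prompt(df))
```

Run it once per dataframe and paste the result; the model now writes code against your real column names and types instead of guessing.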

Final Thoughts

Artificial intelligence will not replace the data analyst who masters both the art and the science of their craft. It will replace the analyst who masters neither.

The analyst who asks vague questions will get vague answers and fall behind. The analyst who masters the discipline of prompting—who directs the ‘science’ of the AI with the ‘art’ of analytical curiosity—will multiply their output, accelerate their insights, and become indispensable.

Your career trajectory is now a function of your ability to translate business problems into precise analytical commands. Stop asking. Start directing.

Keep Analyzing!
