The Data Analyst’s LLM Masterclass (Part 3): How to Analyze Data with an LLM

We are back with Part 3 of The Data Analyst’s LLM Masterclass.

In Part 1, you learned the Command Framework to direct the AI. In Part 2, you learned to treat the LLM as a Data Engineer to build surgical cleaning pipelines.

Now, you have a clean dataset. The typical analyst’s next move is predictable. They open a spreadsheet or launch Posit Cloud. They make a descriptive statistics table. They make a few visuals. They look for the trends they already expect to find.

This is not Exploratory Data Analysis (EDA). This is “Confirmatory Data Analysis.” You are just checking boxes to prove what you already think you know.

When analysts try to use AI to fix this, they make a fatal mistake. They upload their data and ask: “What insights can you find in this data?”

Stop. Do not do this.

An LLM cannot find insights. It does not know your business context, it does not know “truth,” and it is prone to hallucination. If you ask for insights, it will fabricate plausible-sounding narratives based on its training data, not your actual data. It will tell you what you want to hear.

You don’t need an insight generator. You need a Sparring Partner.

The Core Concept

Confirmation bias is the silent killer of analytics. You assume price drives volume, so you plot price vs. volume. You see a correlation, you high-five your boss, and you move on. You miss the fact that the correlation inverts on Tuesdays because of a competitor’s promo schedule.

You miss it because you never look for it.
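To see how a pooled correlation can hide a subgroup that runs the other way, here is a minimal sketch. The rows are made up purely for illustration (they are not from any real dataset), and the names pearson and corr_for are my own helpers, not part of any library.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up rows: (weekday, price, volume). Illustrative only.
rows = [
    ("Mon", 10, 100), ("Mon", 12, 90), ("Mon", 14, 80),
    ("Wed", 10, 102), ("Wed", 12, 91), ("Wed", 14, 79),
    ("Tue", 10, 60), ("Tue", 12, 70), ("Tue", 14, 80),
]

def corr_for(day=None):
    """Correlation of price vs. volume, for one weekday or for all rows."""
    subset = [(p, v) for d, p, v in rows if day is None or d == day]
    prices, volumes = zip(*subset)
    return pearson(prices, volumes)

pooled = corr_for()        # all days together
tuesday = corr_for("Tue")  # Tuesdays only

print(f"pooled: {pooled:+.2f}, Tuesday only: {tuesday:+.2f}")
```

In this toy data the pooled correlation is negative while the Tuesday-only correlation is positive. If you only ever plot the pooled view, you never see the Tuesday pattern.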

The LLM’s role in EDA is not to give you answers. Its role is to stress-test your questions.

We are going to flip the script. Instead of asking the AI to analyze the data, you will feed it your initial hypothesis, and you will command it to try to destroy that hypothesis.

You are using the LLM as a hostile peer reviewer. It has been trained on a vast body of statistics writing, so it can recall the common edge cases, confounding variables, and statistical fallacies on demand. Your job is to force it to use that knowledge to expose your blind spots before you present your findings to stakeholders.

The Strategic Framework

This approach relies on a loop I call the “Clash-Code-Confirm” cycle.

Pillar 1: The Clash (The Sparring Match)

You provide the metadata (column names, data types) and your working hypothesis. You then demand that the LLM play “Devil’s Advocate.” You ask for specific reasons why your hypothesis might be wrong. You ask for confounding variables and alternative explanations.

Pillar 2: The Code (The Counterpunch)

Once the LLM gives you an alternative explanation (e.g., “Seasonality might be skewing your price/volume correlation”), you do not ask it if that is true. You ask it for the mechanism to check if it is true.

You ask for the specific code (Python/R) or the specific spreadsheet formula to visualize that specific relationship.

Pillar 3: The Confirmation (The Knockout)

You run the code or the formula. You look at the chart. You decide if the insight is real. The AI is the critic; you are the judge.

The Analyst’s Playbook

Here is how to run the Clash-Code-Confirm cycle in your daily workflow, whether you are in Python, R, or Excel.

Step 1: Define the Context & Hypothesis

Do not paste your data rows. Paste your schema.

Python/R Users: Paste the output of df.info() or glimpse(df).
Excel Users: Paste the header row and a brief description of what each column represents.
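If your data lives in a CSV and you want a schema summary without loading any analysis library, a rough sketch like the one below can work. It is not a replacement for df.info() or glimpse(df). The function names infer_type and describe_schema, the 50-row sample size, and the demo columns are all my own assumptions, and the type guessing is deliberately coarse.

```python
import csv
import io

def infer_type(values):
    """Guess a coarse type label ('int', 'float', 'text') from string values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for label, caster in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                caster(v)
            return label
        except ValueError:
            continue
    return "text"

def describe_schema(csv_file, sample_rows=50):
    """Return 'column: type' lines built from the header and a small sample."""
    reader = csv.DictReader(csv_file)
    # Read at most sample_rows rows; only the inferred types leave this function.
    sample = [row for _, row in zip(range(sample_rows), reader)]
    lines = []
    for name in reader.fieldnames:
        column = [(row[name] or "") for row in sample]
        lines.append(f"{name}: {infer_type(column)}")
    return "\n".join(lines)

# Hypothetical file contents standing in for your real CSV.
demo = io.StringIO(
    "date,marketing_spend,signups\n"
    "2024-01-01,120.5,14\n"
    "2024-01-02,98.0,11\n"
)
schema_text = describe_schema(demo)
print(schema_text)
```

With a real file you would pass open("your_file.csv", newline="") instead of the demo buffer. Only column names and guessed types end up in the text you paste. The rows themselves stay on your machine.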

Then, state what you think is happening.

Step 2: The “Sparring Partner” Prompt

Use this prompt structure to force the LLM to attack your assumption.

<role>

Act as a Skeptical Senior Statistician. Your goal is to disprove my findings.

</role>

<context>

My dataset schema is: [PASTE HEADERS/SCHEMA HERE]

My primary hypothesis is: “Higher marketing spend is driving the increase in new user signups.”

</context>

<instructions>

Do not agree with me. Provide 3 statistical reasons or confounding variables that could explain this relationship other than direct causation. For each reason, suggest a specific visualization or pivot table structure I should create to test if your counter-argument is true.

</instructions>
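If you run this prompt often, you may want to fill it in programmatically rather than by hand. The sketch below is one way to do that with a plain string template. PROMPT_TEMPLATE, build_sparring_prompt, and the n_reasons parameter are names I made up for illustration, and sending the text to a model is left out on purpose.

```python
PROMPT_TEMPLATE = """<role>
Act as a Skeptical Senior Statistician. Your goal is to disprove my findings.
</role>

<context>
My dataset schema is:
{schema}

My primary hypothesis is: "{hypothesis}"
</context>

<instructions>
Do not agree with me. Provide {n_reasons} statistical reasons or confounding variables that could explain this relationship other than direct causation. For each reason, suggest a specific visualization or pivot table structure I should create to test if your counter-argument is true.
</instructions>"""

def build_sparring_prompt(schema, hypothesis, n_reasons=3):
    """Fill the sparring-partner template with a schema and a hypothesis."""
    return PROMPT_TEMPLATE.format(
        schema=schema.strip(),
        hypothesis=hypothesis.strip(),
        n_reasons=n_reasons,
    )

# Hypothetical schema text; in practice, paste your own df.info() output or headers.
prompt = build_sparring_prompt(
    schema="date: text\nmarketing_spend: float\nsignups: int",
    hypothesis="Higher marketing spend is driving the increase in new user signups.",
)
print(prompt)
```

Keeping the template in one place means every hypothesis gets attacked with the same instructions, so you cannot quietly soften the prompt when you like your own idea.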

Step 3: The Execution (Code or Formulas)

The LLM might return a counter-argument like: “The increase might be due to a ‘Day of Week’ effect where both spend and signups naturally spike on weekends.”

Now, you need the tools to check that.

Option A (Python/R): Ask for the script. “Write the [Python/R] code to plot ‘Signups’ vs ‘Marketing Spend’, colored by ‘Day of Week’. Use a scatterplot with separate regression lines for each day.”
Option B (Excel/Sheets): Ask for the formula. “How do I check this in Excel? Give me the specific Pivot Table setup (Rows, Columns, Values) and the formula to extract ‘Day of Week’ from my Date column.”
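For reference, here is a rough idea of what the Option A check computes under the hood. Instead of drawing the scatterplot, this sketch calculates the slope of each day's regression line directly, using only the standard library. The rows are made up for illustration, and slope and slopes_by_weekday are my own helper names. The script an LLM hands you will more likely use a plotting library.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (nan if xs has no variation)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return float("nan")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / var_x

def slopes_by_weekday(rows):
    """rows: iterable of (iso_date, spend, signups). Returns {weekday: slope}."""
    groups = defaultdict(lambda: ([], []))
    for iso_date, spend, signups in rows:
        # Extract 'Day of Week' from the date column.
        day = WEEKDAYS[date.fromisoformat(iso_date).weekday()]
        groups[day][0].append(spend)
        groups[day][1].append(signups)
    return {day: slope(xs, ys) for day, (xs, ys) in groups.items()}

# Made-up rows: three Mondays and three Saturdays. Illustrative only.
rows = [
    ("2024-01-01", 100, 10), ("2024-01-08", 120, 12), ("2024-01-15", 140, 14),
    ("2024-01-06", 300, 40), ("2024-01-13", 320, 40), ("2024-01-20", 340, 40),
]
result = slopes_by_weekday(rows)
for day, s in result.items():
    print(f"{day}: {s:.3f} signups per unit of spend")
```

In this toy data the Monday line slopes upward while the Saturday line is flat. Those are the numbers behind the "separate regression lines for each day" in the chart.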

Step 4: The Verification

You take that code or that formula instruction. You run it in your tool.

If the chart shows that the regression lines are flat on weekends despite high spend, signups are not responding to spend on the very days you spend the most, and the AI was right to doubt you. You just saved yourself from presenting a spurious correlation as causation to your VP.
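If you want a number to sit next to the chart, one option is to compare the pooled slope with the slope you get after subtracting each weekday's average from both columns (a simple "within-group" comparison). This is my own suggested check, not something the steps above prescribe. The data is made up, the helper names are hypothetical, and what counts as "flat enough" is still your call.

```python
from collections import defaultdict

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / var_x

def demean_by_group(groups, values):
    """Subtract each group's mean from that group's values."""
    buckets = defaultdict(list)
    for g, v in zip(groups, values):
        buckets[g].append(v)
    means = {g: sum(vs) / len(vs) for g, vs in buckets.items()}
    return [v - means[g] for g, v in zip(groups, values)]

# Made-up data: weekends have both higher spend and higher signups.
days    = ["Mon", "Mon", "Mon", "Sat", "Sat", "Sat"]
spend   = [100, 120, 140, 300, 320, 340]
signups = [11, 10, 12, 41, 40, 42]

pooled = slope(spend, signups)  # ignores day of week
within = slope(demean_by_group(days, spend),
               demean_by_group(days, signups))  # holds day of week fixed

print(f"pooled slope: {pooled:.3f}, within-day slope: {within:.3f}")
```

In this toy example most of the pooled slope disappears once day of week is held fixed, which is the pattern the counter-argument predicted. You still have to decide whether the remaining slope matters.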

Final Thoughts

An analyst who stops digging the moment the data confirms their bias is not an asset. They are a liability.

Your value is not in your ability to generate charts that please your stakeholders. Your value is in your ability to protect them from false confidence.

Use the LLM to guard against your own blind spots. Use it as a sparring partner to suggest the tests you didn’t think to run. But never, ever let it tell you what the data means. That is your job.

Next week, in Part 4, we advance to the main event: Data Analysis & Modeling. We will take our verified data and use the Command Framework to build advanced predictive models and interpret the results without losing our strategic edge.
