Unraveling the Complexity of AI: Insights from the Core of ChatGPT


Date: January 20, 2024

filed in: AI, Marketing

When I ask ChatGPT to describe itself, its response includes terms like “a transformative technology” offering users “a broad tapestry of ideas” and that I can “think of ChatGPT as an AI that has been trained on vast amounts of information and data that will continue to update and evolve over time.”

This response illustrates at once the folly and the fascination of ChatGPT and other Large Language Models (LLMs).

Words like “transformative” and “tapestry,” repetitive sentence structures, and the tendency to jam too many ideas into a single statement (“It’s about X and Y and providing X and Y to the X and Y for X and Y results!”) are dead giveaways that text was generated by AI, as James Presbitero Jr. points out expertly in the Medium article linked here. Spotting these tells is easy, and it matters. When people fail to respect the thorny sides of AI and use it to supplant, rather than complement, human intelligence, they typically find themselves in very bad places. The same goes for organizations.

Yet, despite such obvious folly, one can’t help but be awestruck by this fascinating technology: “Just how the cuss does it do what it does??” 

Understanding AI’s capabilities and limitations is the first step in applying it ethically in our professional and personal lives. This is a demonstration of the ethical consideration we call “Transparency and Explainability” in the classroom at the University of Notre Dame Mendoza College of Business and at Nore Analytics. At its root is the idea of ensuring AI algorithms are understandable and explainable to stakeholders in order to build organizational trust in the technology. It is one of many critical ethical considerations we will explore in this blog, but it’s a big one. It’s where everything starts. Without understanding what something is, you can’t possibly know whether you’re using it right.

For a marketing executive, however, does understanding ChatGPT mean being able to recite the Transformer Architecture paper or debate the computational limitations of recurrent and convolutional models? Certainly not. It means understanding the key concepts behind the AI well enough to dispel its mysticism, respect its power, and maintain healthy expectations.

I would like now to offer just such a peek behind the curtain. In a very accessible (and admittedly oversimplified) way, I want to explain how ChatGPT does what it does. The goal here, Dear Reader, is to equip you with what I call an “Executive Understanding” of the technology. I am going to cut corners. I am going to explain things in ways that will make my former Google engineering colleagues cringe. But that’s okay. Because even if you aren’t an AI engineer, you need to understand this stuff. Particularly ChatGPT because, as we will see, the predictive principles behind LLMs underpin a wide range of AI applications – including data analysis, demand forecasting, and even image generation.

So let’s get in there.

Keep(ing) It Simple, Smarty

With all due respect to Lockheed’s Kelly Johnson, I believe that understanding the concepts behind ChatGPT in a simple way means you’re smart. I like to illuminate nine important concepts behind the ChatGPT AI when I discuss these ideas in depth, but we will simplify things even more here and make you even smarterer (Err, more smart? Clearly these posts are not written by an AI). If you’d like to strike a balance between the “Executive” treatment I will give the technology and full-on AI Coding Wizardry, I cannot recommend highly enough this YouTube video from Spotify’s co-President Gustav Söderström. Söderström does an exceptional job of piercing the conspiracy to make AI harder than it is. The video is worth a view.

But for our purposes, we will consider ChatGPT’s functionality a testament to the simplicity underlying AI’s complexity. In the simplest terms, ChatGPT operates by translating language into numbers, allowing its algorithms to perceive sentences as numerical sequences and perform statistical analysis on them. This process is akin to how AI in marketing understands consumer behavior — by translating human interactions into data that can be analyzed and predicted.
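
To make that idea concrete, here is a minimal sketch of the “language into numbers” step. The tiny vocabulary and the encode function are my own illustration; real systems use far more sophisticated tokenizers that break text into sub-word pieces:

```python
# A minimal sketch of turning words into numbers (illustrative only).
vocab = {"hi": 0, "there": 1, "how": 2, "are": 3, "you": 4}

def encode(sentence: str) -> list[int]:
    """Translate a sentence into the numeric sequence an algorithm can analyze."""
    return [vocab[word] for word in sentence.lower().split()]

print(encode("Hi there how are you"))  # [0, 1, 2, 3, 4]
```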

Let’s dig in a little more.

Predicting Words, LLM-Style

At its core, the statistical analysis an AI conducts predicts the likelihood of future events, such as the next word in a sentence or a consumer’s next buying decision. This predictive capability is grounded in pattern recognition and probability. 

How? Well, in simple terms, when ChatGPT “sees” a word, the AI can scour the vast corpus of text it has been trained on and determine the probability of any other word pairing with that word. For example, if ChatGPT sees the word “ARE,” the AI can find every time the word “ARE” (or, more precisely, the number the computer associates with the word “ARE”) appears in its training text and calculate the percentage of the time each other word (number) follows it. In this way, words like “YOU” and “THERE” and “WE” carry a higher probability of following the word “ARE” than rarer “ARE + ______” combinations like “ANIMALS” or “SINKS” or “ZEBRAS”. Certainly you can think of sentences where those latter words come after an “ARE,” but they are doubtless less frequent in our speech than phrases that feature “ARE YOU” or “ARE THERE” or “ARE WE”.
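
Here is a toy version of that counting exercise in Python. The miniature corpus is invented for illustration; ChatGPT’s real training corpus is astronomically larger, but the spirit of the counting is the same:

```python
from collections import Counter

# An invented mini-corpus for the demo.
corpus = "are you okay are you sure are we there are there snacks are you".split()

# Count which word follows "are" each time it appears.
followers = Counter(
    corpus[i + 1] for i, word in enumerate(corpus[:-1]) if word == "are"
)
total = sum(followers.values())
for word, count in followers.most_common():
    print(f"P({word!r} | 'are') = {count / total:.2f}")
# 'you' wins: P('you' | 'are') = 0.60; 'we' and 'there' trail far behind.
```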

In this way, ChatGPT begins to formulate a response: it predicts the next word in a sequence of words, building toward a phrase, a sentence, a paragraph, based on how words have been used before. But, clearly, if we are predicting the next word in a sequence based on just a single word, our predictions are going to be pretty bad. We need more words (numbers, data!) to make our predictions more accurate. When I bolt the three words “HI,” “THERE,” and “HOW” onto the word “ARE” – forming the phrase “HI THERE, HOW ARE ______” – it becomes much, much easier to predict which word comes next. “YOU”? Pretty likely. “CABINETS”? Not very likely.
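
Extending the same toy counter to a four-word context shows how extra words sharpen the prediction (the mini-corpus is again invented):

```python
from collections import Counter

# Predict the word after the four-word context "hi there how are".
corpus = ("hi there how are you . hi there how are you . "
          "hi there how are things . are cabinets heavy .").split()

context = ("hi", "there", "how", "are")
n = len(context)
followers = Counter(
    corpus[i + n]
    for i in range(len(corpus) - n)
    if tuple(corpus[i:i + n]) == context
)
total = sum(followers.values())
for word, count in followers.most_common():
    print(f"P({word!r} | 'hi there how are') = {count / total:.2f}")
# 'you' dominates; 'cabinets' never follows this context at all.
```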

Overcoming Computational Hurdles

I know I said at the top that you don’t need to understand it, but it is important to introduce the Transformer Architecture at this point. Why? Because while adding more words, numbers, and data to our predictions improves their accuracy, it massively expands an algorithm’s computational demands. So much so that this computational hurdle is, in essence, what kept us from realizing the power of AI until now, even though the underlying ideas have been around since the 1940s. With more than 174,000 dictionary-recognized words in the English language (and estimates of over a million when slang and vernacular dialects are included), computing probabilities across ever-swelling word combinations is a hardware-crushing burden.
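
A quick back-of-the-envelope calculation shows how fast this blows up, using the 174,000-word figure from above:

```python
vocab_size = 174_000  # dictionary-recognized English words

# The number of possible contexts grows exponentially with context length.
for context_len in (1, 2, 3):
    print(f"{context_len}-word contexts: {vocab_size ** context_len:,}")
# 1-word contexts: 174,000
# 2-word contexts: 30,276,000,000          (~30 billion)
# 3-word contexts: 5,268,024,000,000,000   (~5.3 quadrillion)
```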

Enter the Transformer Architecture. Introduced by a team of Google researchers in the paper “Attention Is All You Need,” published on June 12, 2017, the Transformer Architecture offered an elegant way for LLMs to use thousands of words as context – models now happily handle even 100,000 words, roughly the size of a thick book – just to guess the next word in a sequence. Tech companies have been building and refining the Transformer model into their products ever since, most visibly when ChatGPT took on 100 million users less than two months after its launch. The architecture will remain a vital part of AI technology for the foreseeable future.
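
For the curious, the heart of that paper is an operation called scaled dot-product attention: every word’s vector gets to “look at” every other word’s vector and decide how much each one matters. Here is a bare-bones numpy sketch of that single operation, nothing close to a production implementation:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: score every position against every
    other position, softmax the scores, then blend the values accordingly."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ V

# Three "words," each represented by a 4-dimensional vector (toy data).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
print(attention(X, X, X))  # each row is now a context-aware blend of all three words
```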

You can thank the Transformer Architecture team for the fact that everything today seems to be about AI. You can also thank them for giving ChatGPT the power it needed to become a commercially viable product.

Fine-Tuning By Real Humans

Despite its technical faculties, ChatGPT owes its abilities (and many of its liabilities) to human guidance. The human touch played a huge role in its design and continues to play a huge role in its evolution. The model was trained on human-produced text from sources including Wikipedia, social media networks, academic papers, and books. Why, you may ask, can ChatGPT write you a sonnet in Shakespearean style? Because its corpus includes Shakespeare’s entire life’s work. ChatGPT uses this data to perform its predictive calculations and learn, in a primitive way, the nuances of human speech patterns. And how to write like Shakespeare.

Further human guidance in the form of supervised and reinforcement learning teaches ChatGPT how to structure answers to questions, produce summaries of unwieldy articles, and appreciate which responses people like and which ones they dislike. Through this guidance, ChatGPT learns how to respond in context and pick up on subtleties in our language. Though it hasn’t yet learned to mix up its sentence lengths from time to time (hat tip, again, to Presbitero), it keeps learning and will perhaps one day overcome this wart. Case in point: have you noticed when ChatGPT offers two responses and asks which is better? Bingo. It is learning from you, Dear (Human) Reader, how to improve its responses based on your preferences.
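
That “which response is better?” prompt maps directly onto how preference data is gathered. Here is a minimal sketch of the idea; the record format is my invention, and real pipelines collect millions of such comparisons to train a separate reward model:

```python
# An invented record format for human preference data: for each prompt,
# a human marks which of two candidate responses they preferred.
preferences = [
    {
        "prompt": "Summarize this article in two sentences.",
        "response_a": "A crisp, faithful two-sentence summary...",
        "response_b": "A rambling, repetitive summary...",
        "human_choice": "response_a",
    },
    # ...millions more comparisons in a real pipeline...
]

# These pairs teach a reward model to score the chosen response higher than
# the rejected one; reinforcement learning then tunes the chatbot toward
# responses the reward model scores highly.
for record in preferences:
    chosen = record[record["human_choice"]]
    rejected = record["response_b" if record["human_choice"] == "response_a" else "response_a"]
    print(f"reward model target: score({chosen!r}) > score({rejected!r})")
```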

Understanding Vectors, Victor

I’m a child of the ‘70s so I will never, ever pass up a good Airplane! reference. I’m trying to stop but I just can’t help myself. Joking aside, vectors are taking on larger (like Leon…darn it! sorry.) roles in how ChatGPT understands our language. In the simplest of terms, words appear to the AI not only as numbers but as multi-dimensional word vectors (referred to as embeddings) that capture the relationship between the word (number) the AI is analyzing and the other words (numbers) in its memory. This allows ChatGPT to gain a contextual understanding of words, so that they become pieces of an intricate puzzle (tapestry?) rather than standalone entities.

Here’s an example of this idea from Söderström that I like. He uses it to explain a Spotify playlist, but it serves our needs just as well. Söderström describes a world where only three types of music exist: Rock, Classical, and EDM. In this very simple musical world, every song is translated to the AI as having some percentage of each of those three dimensions. So maybe 1% of Beethoven’s “Für Elise” has rock characteristics, while 99% of it is classical and 5% of it feels like EDM. Conversely, 99% of Avicii’s “Levels” is EDM, while only 2% and 1% of its elements are rock and classical, respectively. Queen’s “Bohemian Rhapsody” is a unique blend, coming in at 99% rock and 99% classical, but only 5% EDM. In this way, each song has a comparable “profile,” and those profiles can be used to determine which songs are alike and which are distinct – something that is important to know when you are serving up “Next Song” suggestions or product recommendations.
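
Söderström’s three-genre world translates directly into code. Each song becomes a vector, and a standard similarity measure (cosine similarity, in this sketch) reveals which songs are neighbors:

```python
import numpy as np

# Each song as a [rock, classical, edm] vector, using the percentages above.
songs = {
    "Für Elise":         np.array([0.01, 0.99, 0.05]),
    "Levels":            np.array([0.02, 0.01, 0.99]),
    "Bohemian Rhapsody": np.array([0.99, 0.99, 0.05]),
}

def cosine_similarity(a, b):
    """Closer to 1.0 means more similar; closer to 0.0 means unrelated."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(songs["Für Elise"], songs["Bohemian Rhapsody"]))  # ~0.71: shared classical DNA
print(cosine_similarity(songs["Für Elise"], songs["Levels"]))             # ~0.06: barely related
```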

For ChatGPT, the idea of vectors means that the AI can interpret profiles of words based on how they have been encoded. More importantly, the AI can understand the profile of a sentence by rolling together profiles from all of the sentence’s words. This is how the AI is able to fit sentences together in its responses and understand the full context of your 16+ sentence prompt. And when we are talking about an AI that is analyzing data points instead of language, the idea of vectors gives it the context of seasonality, competitor impacts, market momentum, or any other dimension it has learned.
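
And “rolling together” word profiles into a sentence profile can be as simple as averaging the word vectors. Real models use cleverer pooling, and the toy three-dimensional embeddings below are invented, but the intuition holds:

```python
import numpy as np

# Invented toy embeddings; real ones have hundreds or thousands of dimensions.
embeddings = {
    "great":   np.array([0.9, 0.1, 0.2]),
    "quarter": np.array([0.1, 0.8, 0.3]),
    "results": np.array([0.2, 0.7, 0.4]),
}

def sentence_vector(sentence: str) -> np.ndarray:
    """Average the word vectors into a single profile for the whole sentence."""
    return np.mean([embeddings[w] for w in sentence.lower().split()], axis=0)

print(sentence_vector("Great quarter results"))  # one vector summarizing all three words
```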

Turning Up (or Down) the Temperature

None of this, however, explains how AI employs creativity or, in the worst cases, affixes some vile hallucination to a response. There’s one final concept we must introduce: “Temperature.” When you ask ChatGPT to write that poem in Shakespearean style, this is the concept that produces a Shakespeare-like response, rather than a verbatim quote from one of his works.

In the most basic sense, the temperature setting in LLMs like ChatGPT is a parameter that governs the randomness of its responses. A higher temperature results in more varied and creative outputs, while a lower temperature leads to more predictable and conservative responses. In terms of the prediction idea we discussed, a low temperature setting tells the AI to pick the most likely next word in a sequence, while a higher temperature setting may see the AI skip over the obvious next word for something with a little more pizazz. Returning a third time to our Shakespeare example, I doubt you’ll receive a Shakespeare-inspired ChatGPT poem containing the phrase “To be, or not to be: that is the question,” even though that word sequence appears countless times in its training text. ChatGPT’s temperature setting steps in to help prevent such obvious text “lifts & shifts.”
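
Mechanically, the knob works by dividing the model’s raw next-word scores by the temperature before converting them into probabilities: high temperatures flatten the distribution, low temperatures sharpen it. A minimal sketch with invented scores:

```python
import numpy as np

def next_word_probs(scores, temperature):
    """Softmax with temperature: scale the raw scores, then normalize."""
    scaled = np.array(scores) / temperature
    return np.exp(scaled) / np.exp(scaled).sum()

words, scores = ["you", "we", "there", "cabinets"], [5.0, 3.0, 2.5, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = next_word_probs(scores, t)
    print(f"T={t}: {dict(zip(words, probs.round(3)))}")
# Low T: "you" is nearly certain. High T: even "cabinets" gets a small chance.
```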

The temperature setting is fixed by ChatGPT’s developers in a way that balances the need for accurate information with a degree of variability to handle a wide range of queries. In the consumer ChatGPT interface, the setting is not adjustable by users. In developer-facing implementations and specialized AI applications, however, it is often exposed as a parameter that can be tuned to specific requirements.
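
For developers, that adjustment is typically a one-line parameter. For example, OpenAI’s Python SDK (as of this writing) exposes temperature directly on chat completion calls; a sketch:

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in your environment

# Lower temperature -> more predictable; higher -> more varied output.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a poem in Shakespearean style."}],
    temperature=1.2,  # nudged upward for more creative phrasing
)
print(response.choices[0].message.content)
```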

Conclusion

So, there you have it. Hopefully you now have a better understanding of how an LLM like ChatGPT works. In summary, ChatGPT operates on a simple yet powerful set of key ideas:

  • Statistical Prediction: ChatGPT treats words as numbers and predicts the next word (number) in a sentence based on probabilities derived from its training data, honing its accuracy with each word (number) included in the sequence.

  • Transformer Architecture: ChatGPT operates with a transformer architecture that allows it to process inputs in a more sophisticated, elegant manner, capturing contextual relevance at a massive scale.

  • Training and Learning: After its initial training, ChatGPT undergoes fine-tuning through supervised learning and reinforcement learning from human feedback. This process further hones its responses to be more accurate and contextually relevant.

  • Use of Word Vectors: ChatGPT employs word vectors (embeddings) to understand and manipulate language semantically. These vectors represent words in a high-dimensional space, capturing their meanings based on their usage in the training data.

  • Temperature Setting for Creativity: ChatGPT developers can adjust its ‘temperature’ to vary the creativity of its responses. A higher temperature allows for more creative (less predictable) responses, while a lower temperature results in more conservative and expected outputs.

By understanding these five fundamental, straightforward principles behind ChatGPT, we can better appreciate AI’s capabilities and limitations and see how the same predictive capability translates to a variety of AI tools. This understanding is pivotal in demystifying AI’s role in various applications, including marketing, and it is the first step toward using AI ethically. You can now go forth and use tools like ChatGPT with better comprehension and build them into your AI strategy with a clearer sense of how they operate and why they respond the way they do.

If these insights resonate with you and you’d like to explore AI’s potential in your business, Nore Analytics is here to help. Our AI expertise can help you realize new efficiencies and market opportunities. Reach out to us at kevin@noreanalytics.com to discover what Nore can do for your business.

