Beyond the Hype: A Realistic Look at Large Language Models • Jodie Burchell • GOTO 2024
Thank you for the warm introduction. My name is Jodie Burchell, and I’m a developer advocate at JetBrains. I’ve spent almost a decade as a data scientist, specializing in natural language processing (NLP).
Amid the hype surrounding large language models (LLMs) such as GPT-3.5 and GPT-4, I’ve become concerned about the way these models are being presented. I aim to provide a balanced perspective, cutting through the hype to examine what these models can actually do and where their limits lie.
The Origins of LLMs
LLMs belong to a family of models known as neural networks, which were originally inspired by the human brain. Although neural networks date back to the 1940s, technological limitations long hindered their practical application.
However, breakthroughs such as CUDA, which lets GPUs perform matrix multiplication efficiently, and the creation of large text datasets such as Common Crawl paved the way for training more complex language models.
Challenges and Innovations
Despite these advancements, working with text remained challenging. Because language is complex and the meaning of a word depends heavily on its context, language models needed a neural network capable of capturing the relationships between words.
A type of neural network called Long Short-Term Memory (LSTM), first proposed in 1997 (a decade before Common Crawl emerged in 2007), was able to capture these intricate relationships, although it processes text one word at a time. This innovation marked a significant step in the evolution of language models.
GPT Models
LLMs have revolutionized natural language processing, and most of them are built on the Transformer architecture, introduced in 2017. Transformers avoid sequential processing, so training can be parallelized and models can grow significantly in size.
GPTs (Generative Pre-trained Transformers) are autoregressive models: they generate text one word at a time, with each new word predicted from the words before it. They have been highly successful and have formed the basis of many large language models released in recent years, including ChatGPT, GPT-4, and Claude.
Encoder-Decoder Models for Translation
One of the earliest use cases for Transformer models was machine translation. To translate between languages, encoder-decoder models are used:
Encoder: Learns about the source language.
Decoder: Learns about the target language and generates the translation word by word.
GPTs for Text Prediction
Researchers realized that decoders on their own were valuable. By eliminating the encoder and training decoders to predict the next word in the same language, they created models with a strong understanding of word sequences.
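To make next-word prediction concrete, here is a toy sketch. It assumes a tiny made-up corpus and uses counts of which word follows which as a stand-in for a trained decoder; a real GPT predicts subword tokens with a neural network. Generation is autoregressive: each predicted word is appended to the input, and the model is asked again.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count, for each word, which words follow it (a stand-in for a trained decoder)."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def generate(follows: dict[str, Counter], prompt: str, max_new_words: int = 5) -> str:
    """Greedy autoregressive generation: append the most likely next word, then repeat."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # no known continuation for the last word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


corpus = [
    "belgium is a country in europe",
    "belgium is known for chocolate",
    "belgium is a small country",
]
model = train_bigram(corpus)
print(generate(model, "Belgium is"))  # belgium is a country in europe
```

Greedy decoding always picks the most frequent continuation; real systems often sample from the predicted distribution instead, which gives more varied output.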
This ability to predict the next word proved to be a significant breakthrough, not so much for text generation as for data acquisition. Because the training target is simply the next word in existing text, no manual labelling is needed, so training these models on vast amounts of text solved the problem of obtaining enough data to train large neural networks.
Training and Scaling of GPT Models
GPT models build their training set with a technique called “sentence splitting”: a sentence is broken into two parts, with the first part as the input and the second as the target. Because any text can be split this way, the training data scales easily.
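A minimal sketch of how one sentence yields several training examples, assuming whitespace-separated words instead of the subword tokens real GPT models use. Each split point gives an input (the words so far) and a target (the next word), matching the next-word prediction objective described above; the function name is hypothetical.

```python
def make_training_pairs(sentence: str) -> list[tuple[str, str]]:
    """Split a sentence at every position: the words before the split are the
    input, and the word right after it is the target."""
    words = sentence.split()
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]


for context, target in make_training_pairs("Belgium is a country in Europe"):
    print(f"{context!r} -> {target!r}")
```

A six-word sentence produces five examples, which is why raw text turns so easily into a large training set.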
The architecture stacks decoder blocks, and each added block increases the number of model parameters. Starting with GPT-1 (about 117 million parameters), the models have grown exponentially, up to GPT-4, whose size has not been disclosed but is estimated at around a trillion parameters.
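The link between stacking decoder blocks and parameter count can be checked with a back-of-the-envelope estimate. This sketch assumes the common approximation of about 12·d² weights per block (4·d² for the attention projections and 8·d² for the feed-forward layer, ignoring biases and layer norms), plus token and position embeddings. The layer counts and widths are the published GPT-1 and GPT-3 configurations; GPT-4’s configuration is not public, so it cannot be estimated this way.

```python
def approx_gpt_params(n_layers: int, d_model: int, vocab_size: int, context_len: int) -> int:
    """Rough parameter count for a GPT-style decoder stack.

    Each block: ~4*d^2 attention weights (query, key, value, output projections)
    plus ~8*d^2 feed-forward weights (d -> 4d -> d), i.e. ~12*d^2.
    Biases and layer norms are ignored; token and position embeddings are added.
    """
    per_block = 12 * d_model ** 2
    embeddings = vocab_size * d_model + context_len * d_model
    return n_layers * per_block + embeddings


# Published configurations: (layers, width, vocabulary size, context length)
print(f"GPT-1: {approx_gpt_params(12, 768, 40478, 512) / 1e6:.0f}M")      # GPT-1: 116M
print(f"GPT-3: {approx_gpt_params(96, 12288, 50257, 2048) / 1e9:.0f}B")   # GPT-3: 175B
```

Both estimates land close to the reported sizes (about 117 million and 175 billion), and the d² term shows why making the stack both deeper and wider grows models so quickly.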
Evolution of LLMs
The evolution of LLMs can be observed by comparing their responses to the same sentence completion prompt (“Belgium is…”):
GPT-1: Grammatically correct but lacks context
GPT-2: More polished, but the output is still rough
GPT-3: Encodes information and provides on-topic responses
ChatGPT (GPT-3.5): Demonstrates advanced language skills, generating essay-length responses
Perception and Assessment of LLMs
Hype surrounding LLMs suggests they exhibit signs of artificial general intelligence (AGI). However, these claims are exaggerated.
The problem with assessing intelligence based on task performance is that machine learning models optimize for their specific training goal, often by finding shortcuts. The skills LLMs demonstrate therefore do not necessarily indicate underlying general intelligence.
This phenomenon, known as the “Kaggle effect,” occurs when models perform exceptionally well on specific tasks but fail on examples outside their training domain. It highlights the brittleness of LLMs and the difficulty of assessing true intelligence.
LLM research also focuses heavily on performance on specific tasks while neglecting how intelligence is defined in humans. Researchers in this area often lack expertise in psychology, the field that defines and measures intelligence.
Skill-based assessments of LLM intelligence can be misleading. For example, GPT-4’s success on medical and law exams may reflect memorization of similar questions in its training data rather than true intelligence.
A more accurate assessment of intelligence is how well systems handle unseen tasks (generalization). Researchers have proposed a hierarchy of generalization, ranging from no generalization to extreme generalization. Extreme generalization aligns with human intelligence, enabling problem-solving beyond training data.
Universality, the ability to solve any problem, is not a realistic initial goal for artificial systems. Instead, they should target the levels below universality that align with human intelligence, such as broad abilities and specific skills.
Conceptualizing Artificial General Intelligence (AGI)
We can draw lessons from established methods of measuring human intelligence to guide the development of AGI systems. François Chollet proposed a design framework for an AGI system: it should possess innate priors such as elementary geometry, physics, and an understanding of agency, which enable it to refine skill programs based on experience and keep learning over time.
Assessing Generalization Ability
DeepMind researchers built on Chollet’s work, treating generalization difficulty as a key factor. They classified systems by both performance and generality, creating a spectrum from narrow to general systems. Narrow systems may perform well, sometimes better than humans, on a specific category of tasks, while AGI systems aim to generalize across a wide range of tasks.
Current State of AGI Development
Despite advances in narrow AI, researchers argue that we are still far from achieving AGI. DeepMind’s framework classifies ChatGPT as an “emerging AGI” system, which highlights how limited current LLMs are at generalizing across multiple task domains.
Natural Language Tasks and LLMs
LLMs excel at natural language tasks, which encompass a wide range of applications such as text processing, translation, and information retrieval. This has been a focus of research for several decades, and LLMs represent the latest advancement in this field.
Natural Language Problems and RAG Techniques
LLMs excel in several tasks, including:
Language translation (with sufficient training data)
Text classification (inferring text topics and assigning categories)
Text summarization (condensing longer texts into shorter ones)
Question answering (responding to questions posed in natural language)
Question Answering Methods
LLMs can answer questions using various methods:
Parametric knowledge: During training, large LLMs acquire knowledge encoded in their training data, allowing them to answer questions based solely on this information.
Fine-tuning: LLMs can be further trained to enhance their question answering abilities in specific domains.
Retrieval-augmented generation (RAG): RAG incorporates additional context from external sources into the question prompt, improving the LLM’s ability to answer questions accurately.
RAG Pipeline
A RAG pipeline for question answering involves the following steps:
Create prompt: Formulate a question prompt.
Gather additional context: Retrieve relevant context from external sources (e.g., web search, document database).
Encode documents: Convert documents into document embeddings using an encoder model.
Retrieve information: Encode the prompt in the same way and find the document embeddings most similar to it in a vector database.
Augment prompt: Instruct the LLM to ignore parametric knowledge and use only the retrieved information when answering the question.
Answer question: The LLM answers the question based on the augmented prompt.
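The encode, retrieve, and augment steps can be sketched without any external services. This toy version makes several assumptions: word-count vectors stand in for the embeddings an encoder model would produce, a plain Python list stands in for the vector database, the documentation snippets are made up, and the prompt wording is illustrative. The final LLM call is left out; the augmented prompt is what would be sent to it.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (real pipelines use an encoder model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are most similar to the question's."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def augment_prompt(question: str, context: list[str]) -> str:
    """Instruct the LLM to answer only from the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )


docs = [  # hypothetical documentation snippets
    "To create a project, choose New Project from the File menu.",
    "The debugger supports breakpoints and stepping through code.",
    "Plugins are installed from the Plugins page in the Settings dialog.",
]
question = "How do I install plugins?"
print(augment_prompt(question, retrieve(question, docs, k=1)))
```

Replacing `embed` with a real encoder model and the list with a vector database keeps the same structure; `k` is the number of chunks retrieved per query.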
Building a RAG Pipeline in PyCharm
Using Python and the LangChain package, it is possible to build a simple RAG pipeline for question answering:
Ingest documentation: Load the PDF documentation and convert it into text that the pipeline can process.
Encode documents: Convert the processed documentation into document embeddings.
Retrieve information: Implement the retrieval mechanism to find relevant document embeddings for a given question.
Augment prompt: Create an augmented prompt for the LLM using the retrieved information.
Answer question: Pass the augmented prompt to the LLM to obtain the answer.
Language Model and Data Preparation
We start by choosing an LLM, such as one of OpenAI’s models. Next, we load our documentation into the application; LangChain can load documents in many formats. We split the documents into chunks, convert the chunks into document embeddings, and store them in a vector database.
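LangChain ships text splitters for the chunking step; the sketch below is a dependency-free illustration of the idea, not LangChain’s implementation. It assumes fixed-size character chunks with an overlap between neighbours, so text cut at a chunk boundary also appears, with some context, in the adjacent chunk. The function name and default sizes are hypothetical placeholders.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters, where each
    chunk repeats the last `overlap` characters of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Start a new chunk only while it would add characters not already covered.
    starts = range(0, max(len(text) - overlap, 1), step)
    return [text[start:start + chunk_size] for start in starts]


print(split_into_chunks("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Larger chunks carry more context per retrieved piece but make each piece less focused, which is one reason chunk size is worth tuning.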
Question Retrieval and Answering
To retrieve relevant chunks for answering questions, we create a retriever from our vector database. Its parameters specify, for example, how many chunks are retrieved for each query. We then build an application that bundles these components together.
Optimization and Parameter Tuning
The application’s parameters can be customized. The choice of model (GPT-3.5, the model behind ChatGPT), the number of chunks retrieved (five), and other parameters all influence its performance, so adjusting them is how the application is optimized.
Question Answering
The application is now ready to answer questions. It uses the LLM to generate answers that summarize the relevant information. Users can ask follow-up questions because previous questions and answers are passed back in, allowing conversation-like interactions.
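One simple way to support follow-up questions is to replay the earlier turns in the prompt. This sketch assumes a plain-text “User:/Assistant:” transcript format, which is illustrative only; chat-style LLM APIs generally accept a list of role-tagged messages instead. The function name and the example conversation are hypothetical.

```python
def build_followup_prompt(history: list[tuple[str, str]], question: str) -> str:
    """Prepend earlier question/answer turns so the LLM can resolve references
    such as 'it' or 'that' in a follow-up question."""
    lines = []
    for past_question, past_answer in history:
        lines.append(f"User: {past_question}")
        lines.append(f"Assistant: {past_answer}")
    lines.append(f"User: {question}")
    lines.append("Assistant:")
    return "\n".join(lines)


history = [("How do I install plugins?", "Open Settings and go to the Plugins page.")]
print(build_followup_prompt(history, "Can I do that offline?"))
```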
Multilingual Translation
Advanced LLMs like GPT-3.5 can perform multiple tasks, including translation. This enables the application to handle queries in other languages, for example translating from German to English. By leveraging the model’s capabilities, the application can retrieve the relevant English document chunks for questions posed in German.
RAG (Retrieval-Augmented Generation) Considerations
Tuning RAG parameters, such as chunk size and retrieval method, can significantly impact performance.
LLM (Large Language Model) Selection
Not all LLMs are suitable for all tasks. Consider the model’s training data and specialization. Using a model untrained on a specific language or domain may lead to poor translations or hallucinations.
Performance Evaluation
Objective benchmarks exist for measuring LLM performance on general language tasks. However, specific task performance depends on the domain and use case. Domain-specific benchmarks or custom benchmarking data sets may be necessary.
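A custom benchmark can start as a small list of domain questions with reference answers. The sketch below assumes exact match after normalizing case and whitespace, which is strict for free-form LLM output; looser metrics or human review may suit a real application better. The benchmark items and the canned answers standing in for the pipeline are hypothetical.

```python
from typing import Callable


def exact_match_accuracy(answer_fn: Callable[[str], str], benchmark: list[tuple[str, str]]) -> float:
    """Fraction of benchmark questions whose answer matches the reference
    after normalizing case and surrounding/internal whitespace."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())

    if not benchmark:
        raise ValueError("benchmark is empty")
    correct = sum(normalize(answer_fn(q)) == normalize(ref) for q, ref in benchmark)
    return correct / len(benchmark)


benchmark = [  # hypothetical domain questions with reference answers
    ("What is the capital of Belgium?", "Brussels"),
    ("How many official languages does Belgium have?", "Three"),
]
canned = {  # stand-in for the real pipeline's answers
    "What is the capital of Belgium?": "brussels",
    "How many official languages does Belgium have?": "two",
}
print(exact_match_accuracy(lambda q: canned.get(q, ""), benchmark))  # 0.5
```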
Conclusion
LLMs are powerful but limited models. They have practical applications, but only with careful use case selection, tuning, and performance monitoring. These challenges echo familiar software development and machine learning issues.