Firelog @firelog_io

Interesting content and insights from the tech world
firelog.io Insights Feed
Joined December 2024

Firelog @firelog_io

a year ago

youtube.com/watch?v=Pv0cfs…

Beyond the Hype: A Realistic Look at Large Language Models • Jodie Burchell • GOTO 2024

Thank you for the warm introduction. My name is Jodie Burchell, and I'm a developer advocate at JetBrains. I've spent almost a decade as a data scientist, specializing in natural language processing (NLP). Despite the hype surrounding large language models (LLMs) like GPT-3.5 and GPT-4, I've been concerned about the messaging around them. My aim is to give a balanced perspective: to cut through the hype and examine what these models can actually do and where their limits lie.

The Origins of LLMs
LLMs belong to a class of models known as neural networks, which were loosely inspired by the human brain. Neural networks date back to the 1940s, but for decades technological limitations kept them from being practical. Two breakthroughs changed that: CUDA, which let GPUs perform matrix multiplication efficiently, and large datasets such as Common Crawl, which provided enough text to train much more complex language models.

Challenges and Innovations
Even with these advances, text proved hard to work with. Language is complex and context-dependent, so a neural network has to capture the relationships between words. Long Short-Term Memory (LSTM) networks, introduced in 1997, were a significant step in this direction because they could capture some of these relationships across a sequence of words.

GPT Models
The Transformer architecture, introduced in 2017, avoids processing text sequentially, which allowed models to grow significantly in size. GPTs (Generative Pre-trained Transformers) are autoregressive Transformer models: they generate text one word at a time, with each step conditioned on the words produced so far. GPT-style models have been highly successful and form the basis of many of the large language models released in recent years, including ChatGPT, GPT-4, and Claude.
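A minimal sketch of what "autoregressive" means, assuming a hypothetical hand-written next-word table in place of a trained Transformer: the model predicts one word, appends it to the output, and feeds the result back in to predict the next. It uses greedy decoding (always picking the most probable word) and, unlike a real GPT, looks only at the previous word rather than the whole context.

```python
# Hypothetical stand-in for a trained model: previous word -> next-word probabilities.
# A real GPT conditions on the entire preceding context, not just the last word.
NEXT_WORD_PROBS = {
    "<start>": {"belgium": 1.0},
    "belgium": {"is": 0.9, "has": 0.1},
    "is": {"a": 0.6, "famous": 0.4},
    "a": {"country": 0.7, "kingdom": 0.3},
    "country": {"<end>": 1.0},
}


def generate(max_words: int = 10) -> list[str]:
    """Greedy autoregressive decoding: each predicted word becomes the next input."""
    words = ["<start>"]
    for _ in range(max_words):
        candidates = NEXT_WORD_PROBS.get(words[-1])
        if not candidates:
            break
        next_word = max(candidates, key=candidates.get)  # most probable continuation
        if next_word == "<end>":
            break
        words.append(next_word)  # feed the prediction back in
    return words[1:]  # drop the <start> marker


print(" ".join(generate()))
```

With this table the loop produces "belgium is a country" and stops at the end marker; the design point is simply that generation is a loop over the model's own outputs.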
Encoder-Decoder Models for Translation
One of the earliest use cases for Transformer models was machine translation, which uses encoder-decoder models:
Encoder: Learns a representation of the text in the source language.
Decoder: Learns about the target language and generates the translation word by word.

GPTs for Text Prediction
Researchers realized that decoders were valuable on their own. By dropping the encoder and training the decoder to predict the next word in the same language, they created models with a strong grasp of word sequences. The breakthrough here was less about text generation than about data: because the target is simply the next word in ordinary text, any text can serve as training data without manual labeling. This solved the problem of obtaining enough data to train very large neural networks.

Training and Scaling of GPT Models
GPT training data is created by "sentence splitting": the words before a split point become the input, and the word that follows becomes the target. Repeating this at every position turns each sentence into many training examples, so the training data scales easily. Architecturally, GPT models stack decoder units, and adding more of them increases the number of parameters. The models have grown exponentially, from GPT-1 (about 117 million parameters) to GPT-4 (whose size OpenAI has not disclosed, though estimates run to around a trillion parameters).

Evolution of LLMs
The evolution of LLMs shows up clearly when they are given the same sentence-completion prompt ("Belgium is…"):
GPT-1: Grammatically correct but lacks context.
GPT-2: More polished, but the output is still unrefined.
GPT-3: Encodes factual information and gives on-topic responses.
GPT-3.5 (ChatGPT): Demonstrates advanced language skills, generating extensive essays.

Perception and Assessment of LLMs
Some of the hype surrounding LLMs claims they show signs of artificial general intelligence (AGI). These claims are exaggerated.
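The "sentence splitting" idea can be sketched in a few lines of Python. This is an illustrative assumption, not GPT's actual training code: it splits on whitespace, whereas real GPT models operate on subword tokens and compute the loss for all positions in parallel.

```python
def make_training_pairs(sentence: str) -> list[tuple[list[str], str]]:
    """Split a sentence at every position: the words before the split form
    the input (context) and the word right after it is the target."""
    words = sentence.split()  # whitespace split stands in for real tokenization
    return [(words[:i], words[i]) for i in range(1, len(words))]


for context, target in make_training_pairs("Belgium is a country in Europe"):
    print(" ".join(context), "->", target)
```

For this six-word sentence the sketch yields five (input, target) pairs, from "Belgium -> is" to "Belgium is a country in -> Europe", with no manual labeling, which is why raw text scales so well as training data.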
The mistake in judging intelligence by task performance is that machine learning models optimize for their specific training objective, often by finding shortcuts, so the skills LLMs demonstrate do not necessarily indicate underlying general intelligence. This phenomenon, known as the "Kaggle effect", occurs when models perform exceptionally well on a specific task but fail on examples outside their training domain. It highlights how brittle LLMs are and how hard it is to assess true intelligence.

LLM research has focused heavily on performance on specific tasks and largely neglected how intelligence is defined in humans; many researchers in this area have little background in psychology, the field that defines and measures intelligence. As a result, skill-based assessments of LLM intelligence can be misleading. For example, GPT-4's ability to pass medical and law exams may have come from memorizing material in its training data rather than from genuine intelligence.

A more accurate measure of intelligence is how well a system handles tasks it has not seen before (generalization). Researchers have proposed a hierarchy of generalization, ranging from no generalization to extreme generalization. Extreme generalization corresponds to human intelligence: the ability to solve problems well beyond one's training data. Universality, the ability to solve any problem, is not a realistic initial goal for artificial systems; instead, they should target the lower levels of generalization that match human intelligence, such as broad abilities and specific skills.

Conceptualizing Artificial General Intelligence (AGI)
We can draw on established methods for measuring human intelligence to guide the development of AGI systems. François Chollet proposed a design framework for an AGI system: it should possess innate priors, such as elementary geometry, physics, and an understanding of agency, which let it refine skill programs based on experience and keep learning over time.
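The gap between performance on seen and unseen examples is the crux of the generalization argument. The toy sketch below is hypothetical and deliberately extreme: a "model" that memorizes its training data scores perfectly on questions it has already seen (analogous to exam questions leaking into training data) and fails on new ones. Reporting only the first number would badly overstate its ability.

```python
def train_lookup_model(examples: list[tuple[str, str]]):
    """Hypothetical 'model' that only memorizes question -> answer pairs."""
    table = dict(examples)
    return lambda question: table.get(question, "I don't know")


def accuracy(model, examples: list[tuple[str, str]]) -> float:
    """Fraction of examples the model answers exactly right."""
    correct = sum(model(question) == answer for question, answer in examples)
    return correct / len(examples)


seen = [("2 + 2", "4"), ("3 + 5", "8"), ("7 + 1", "8")]
unseen = [("4 + 4", "8"), ("6 + 3", "9")]

model = train_lookup_model(seen)
print("seen:", accuracy(model, seen))      # 1.0
print("unseen:", accuracy(model, unseen))  # 0.0
```

Real LLMs are not lookup tables, but the same evaluation principle applies: only held-out data the model could not have seen says anything about generalization.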
Assessing Generalization Ability
DeepMind researchers built on Chollet's work by treating generalization difficulty as a key factor. They classified systems by both performance and generality, creating a spectrum from narrow to general systems: narrow systems can excel at a specific task or category of tasks, while AGI systems aim to generalize across a wide range of tasks.

Current State of AGI Development
Despite advances in narrow AI, researchers indicate that we are still far from achieving AGI. DeepMind's evaluation classified ChatGPT as an "emerging AGI", the lowest of its AGI levels, highlighting how limited current LLMs are at generalizing across task domains.

Natural Language Tasks and LLMs
Where LLMs do excel is natural language tasks, a broad family of applications that includes text processing, translation, and information retrieval. These tasks have been a focus of research for decades, and LLMs are the latest advance in the field.

Natural Language Problems and RAG Techniques
LLMs excel at several tasks, including:
Language translation (given sufficient training data for the languages involved)
Text classification (inferring a text's topic and assigning it a category)
Text summarization (condensing longer texts into shorter ones)
Question answering (answering questions posed in natural language)

Question Answering Methods
LLMs can answer questions in several ways:
Parametric knowledge: During training, large LLMs absorb knowledge from their training data, which lets them answer questions from that information alone.
Fine-tuning: LLMs can be trained further to improve their question answering in a specific domain.
Retrieval-augmented generation (RAG): RAG adds context from external sources to the question prompt, improving the LLM's ability to answer accurately.

RAG Pipeline
A RAG pipeline for question answering involves the following steps:
Create prompt: Formulate the question prompt.
Gather additional context: Retrieve relevant material from external sources (e.g., web search or a document database).
Encode documents: Convert the documents into document embeddings using an encoder model.
Retrieve information: Encode the question with the same model and find the most similar document embeddings in a vector database.
Augment prompt: Instruct the LLM to ignore its parametric knowledge and use only the retrieved information when answering.
Answer question: The LLM answers based on the augmented prompt.

Building a RAG Pipeline in PyCharm
Using Python and the LangChain package, you can build a simple RAG pipeline for question answering:
Ingest documentation: Convert the unindexed PDF documentation into a format the pipeline can process.
Encode documents: Convert the processed documentation into document embeddings.
Retrieve information: Implement a retrieval mechanism that finds the document embeddings most relevant to a given question.
Augment prompt: Build an augmented prompt for the LLM from the retrieved information.
Answer question: Pass the augmented prompt to the LLM to get the answer.

Language Model and Data Preparation
We start by choosing an LLM, such as one of OpenAI's models. Next, we load the documentation into the application, which accepts various formats. We split the documents into chunks, convert the chunks into document embeddings, and store these in a vector database.

Question Retrieval and Answering
To retrieve the chunks relevant to a question, we create a retriever from the database. Its parameters include the number of chunks retrieved for each query. We then bundle these components together into an application.

Optimization and Parameter Tuning
The application's parameters are customizable. The choice of model (GPT-3.5), the number of chunks retrieved (five), and other parameters all affect the application's performance, so tuning them is how the pipeline is optimized.
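The RAG steps above can be sketched without LangChain or an LLM API. This is a self-contained toy under explicit assumptions: word counts stand in for a real embedding model, a plain Python list stands in for the vector database, the prompt wording and the documentation snippets are hypothetical, and the final call to the LLM is left out. `chunk_size` and `k` correspond to the chunk-size and number-of-chunks parameters discussed above (the default `k=5` mirrors the five chunks mentioned).

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split a document into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts, standing in for an encoder model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str], chunk_size: int = 50) -> list[tuple[str, Counter]]:
    """Chunk and embed every document; the list plays the role of a vector database."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc, chunk_size)]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 5) -> list[str]:
    """Embed the question with the same encoder and return the k most similar chunks."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def augment_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical template telling the LLM to answer only from the retrieved context."""
    context = "\n---\n".join(chunks)
    return (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say that you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical documentation snippets.
docs = [
    "The Python console is opened from the View menu under Tool Windows.",
    "Run configurations define how a script is started, including its interpreter and arguments.",
    "The debugger pauses execution at breakpoints so you can inspect variables.",
]
index = build_index(docs)
question = "How do I set breakpoints in the debugger?"
top_chunks = retrieve(index, question, k=1)
prompt = augment_prompt(question, top_chunks)
print(prompt)  # in a real pipeline, this string would be sent to the LLM
```

Here the debugger snippet ranks highest because it shares the most words with the question. Swapping `embed` for a real encoder and the list for a vector store would improve retrieval quality without changing the shape of the pipeline.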
Question Answering
The application is now ready to answer questions. It generates answers with the LLM, providing summaries and relevant information. Users can ask follow-up questions by passing in the previous questions and answers, which allows conversation-like interactions.

Multilingual Translation
Advanced LLMs like GPT-3.5 support multiple tasks, including translation, so the application can handle queries in other languages. For example, for a question posed in German, the application can still retrieve the relevant English document chunks and answer it.

RAG Considerations
Tuning RAG parameters, such as chunk size and retrieval method, can significantly affect performance.

LLM Selection
Not every LLM suits every task. Consider the model's training data and specialization: a model that was not trained on a particular language or domain may produce poor translations or hallucinate.

Performance Evaluation
Objective benchmarks exist for measuring LLM performance on general language tasks, but performance on a specific task depends on the domain and use case, so domain-specific benchmarks or custom benchmark datasets may be necessary.

Conclusion
LLMs are powerful but limited models. They have practical applications, but only with careful use-case selection, tuning, and performance monitoring. These challenges echo familiar issues in software development and machine learning.
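Passing earlier questions and answers back to the model, as described for follow-up questions, can be sketched as plain prompt construction. This is an assumption for illustration: chat APIs usually take a list of role-tagged messages, and libraries such as LangChain provide their own memory mechanisms; the `User:`/`Assistant:` format and the `build_chat_prompt` name here are hypothetical.

```python
def build_chat_prompt(history: list[tuple[str, str]], question: str) -> str:
    """Replay earlier (question, answer) turns so the model can resolve a follow-up."""
    lines = []
    for past_question, past_answer in history:
        lines.append(f"User: {past_question}")
        lines.append(f"Assistant: {past_answer}")
    lines.append(f"User: {question}")
    lines.append("Assistant:")  # the model continues the text from here
    return "\n".join(lines)


history = [("How many chunks does the retriever return?", "Five, in this setup.")]
print(build_chat_prompt(history, "Can I change that number?"))
```

Without the first turn, "that number" in the follow-up would be unresolvable. The trade-off is that the prompt grows with every turn, so long conversations eventually hit the model's context-window limit and older turns have to be truncated or summarized.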

Firelog @firelog_io

a year ago

Even a little bit of alcohol is creating potholes; it's disrupting the highways in the brain. The speaker describes how even small amounts of alcohol can disrupt brain function by affecting the white matter, likening it to creating potholes in the brain's highways. The Diary Of A CEO - The No.1 Brain Doctor: This Parenting Mistake Ruins Your Kids Brain & Alcohol Will Ruin Yours!

Firelog @firelog_io

a year ago

The web is going to go away, and when it does, it will go away very fast because something better will come along. The discussion predicts the eventual decline of the web, suggesting that a new technology will replace it quickly. ThePrimeTime - The Greatest Software Engineers of All Time

Firelog @firelog_io

a year ago

"I am waiting for some sort of compelling evidence that something extraterrestrial is going on." The speaker is skeptical of claims that something extraterrestrial is going on and says he has yet to see compelling evidence. PowerfulJRE - Joe Rogan Experience #2269 - Bret Weinstein

Naval @naval

"The School of Life: An Emotional Education" (2019) – Alain de Botton: 'School of Life, the organization that I started: if you follow our stuff, every day I'm writing for our website, for our app, etc., so there's content coming out all the time. And we've got a lot of books: I've written 15 books under my own name, and about 70 books under the School of Life.'