June 20, 2023 – <CONTENT /> v.6

June 20, 2023

vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention

LLMs promise to fundamentally change how we use AI across all industries. However, actually serving these models is challenging and can be surprisingly slow even on expensive hardware. Today we are excited to introduce vLLM, an open-source library for fast LLM inference and serving. vLLM utilizes PagedAttention, our new attention algorithm that effectively manages attention keys and values. vLLM equipped with PagedAttention redefines the new state of the art in LLM serving: it delivers up to 24x higher throughput than HuggingFace Transformers, without requiring any model architecture changes.

vLLM has been developed at UC Berkeley and deployed at Chatbot Arena and Vicuna Demo for the past two months. It is the core technology that makes LLM serving affordable even for a small research team like LMSYS with limited compute resources. Try out vLLM now with a single command at our GitHub repository.
— https://vllm.ai/
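
The excerpt above says PagedAttention "effectively manages attention keys and values" without showing how. As a rough illustration only, here is a toy sketch of the general idea of paging a key/value cache: each request gets fixed-size blocks on demand from a shared pool, and a per-request block table records which blocks it owns. This is not vLLM's code or API. Every name here (`BlockAllocator`, `Sequence`, `BLOCK_SIZE`) is hypothetical, and the block size is an arbitrary small number chosen for readability.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per block; hypothetical value picked for illustration


class BlockAllocator:
    """Hands out fixed-size cache blocks from one shared pool (toy model)."""

    def __init__(self, num_blocks: int) -> None:
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("no free cache blocks")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class Sequence:
    """One request: a token count plus the list of blocks it owns."""

    num_tokens: int = 0
    block_table: list = field(default_factory=list)

    def append_token(self, allocator: BlockAllocator) -> None:
        # Claim a new block only when the current one is full (or none exists).
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.allocate())
        self.num_tokens += 1

    def free_all(self, allocator: BlockAllocator) -> None:
        # Return every block to the pool so other requests can reuse them.
        for block_id in self.block_table:
            allocator.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


allocator = BlockAllocator(num_blocks=8)
a, b = Sequence(), Sequence()
for _ in range(5):
    a.append_token(allocator)
for _ in range(2):
    b.append_token(allocator)

print(len(a.block_table), len(b.block_table), len(allocator.free))
```

In this toy model a 5-token request holds two blocks and a 2-token request holds one, so memory is claimed in small steps as each request grows instead of being reserved up front for a maximum length.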

June 20, 2023

Emerging Architectures for LLM Applications | Andreessen Horowitz

Large language models are a powerful new primitive for building software. But since they are so new—and behave so differently from normal computing resources—it’s not always obvious how to use them.

In this post, we’re sharing a reference architecture for the emerging LLM app stack. It shows the most common systems, tools, and design patterns we’ve seen used by AI startups and sophisticated tech companies. This stack is still very early and may change substantially as the underlying technology advances, but we hope it will be a useful reference for developers working with LLMs now.
https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/

June 20, 2023

GitHub :: free-response-scoring by David Colarusso

This repository shares code used to implement the methods described in Unsupervised Machine Scoring of Free Response Answers—Validated Against Law School Final Exams, presented at the Computational Legal Studies Conference, March 2022, hosted by the Center for Computational Law at Singapore Management University.

You can find links to all relevant content either in, or linked to from, the notebook titled Score Exams.

—colarusso/free-response-scoring

Good alternative to LLM text comparison. Note: patent pending (Suffolk University).

June 20, 2023

GitHub :: Flowise – Drag & drop UI to build your customized LLM flow using LangchainJS

Code on GitHub: https://github.com/FlowiseAI/Flowise

June 20, 2023

Reddit :: Tutorial – train your own llama.cpp mini-ggml-model from scratch!

by u/Evening_Ad6637 in LocalLLaMA

Here I show how to train your own mini ggml model from scratch with llama.cpp! These are currently very small models (20 MB when quantized), and I think this is more for educational reasons (it helped me a lot to understand much more when I “created” my own model from.. nothing. And it helps to understand the parameters and their effects much better).

Otherwise, these mini models could be good enough to be experts in very specific fields, like: only giving text in the style of someone. One model could speak like Cartman from South Park, another could be a poet, and you could implement these ‘persons’ in your general chat or role-play conversations as supporting roles or minor roles.. to make “group” chats, brainstorming sessions, etc.

And: the discussions on GitHub seem very promising that we will soon be able to fine-tune pre-trained big models like llama or vicuna and so on. Especially creating (q)lora adapters should be possible soon : )

This will be the next game changer, I think (imagine your model could be fine-tuned in real time, incrementally, on top of its lora adapter and with your current conversation as the dataset – what awesome implications would this have?)

EDIT:

You may need the training script

— Tutorial – train your own llama.cpp mini-ggml-model from scratch!

June 20, 2023

5 Most Valuable Ways To Convert Unstructured Text To Structured Data | Width.ai

Here’s 5 of the most valuable ways to convert unstructured text to structured data with natural language processing

Source: 5 Most Valuable Ways To Convert Unstructured Text To Structured Data | Width.ai

June 20, 2023

From Medium :: Run Very Large Language Models on Your Computer | by Benjamin Marie | Towards AI

New large language models are publicly released almost every month. They are getting better and larger.

You may assume that these models can only be run on big clusters or in the cloud.

Fortunately, this is not the case. Recent versions of PyTorch propose several mechanisms that make the use of large language models relatively easy on a standard computer and without much engineering, thanks to the Hugging Face Accelerate package.

Source: Run Very Large Language Models on Your Computer | by Benjamin Marie | Towards AI

June 20, 2023

From Medium :: Mastering AI Summarization: Your Ultimate Productivity Hack

Unlock Your Second Brain with Streamlit and Hugging Face’s Free LLM Summarization: build a Python Webapp running on your PC.

Source: Mastering AI Summarization: Your Ultimate Productivity Hack

This uses a smaller language model tailored to text summarization. Maybe a good path for assessing student short answers and essays.
