Get Started
Ling is a workflow framework that supports streaming of structured content generated by large language models (LLMs). It lets clients react to the content streams produced by agents or bots within the workflow as they arrive, reducing waiting time.
Introduction
Complex AI workflows, such as those behind Bearbobo Learning Companion, require multiple agents/bots to collaborate on structured data. For real-time responsiveness, however, structured output is a poor fit for a streaming interface: the client cannot use the data until the structure is complete.
The commonly used JSON format is flexible, but its structural integrity means it is difficult to parse correctly until the whole output has been produced. Other structured formats such as YAML could be adopted instead, but they are not as powerful and convenient as JSON. Ling is a streaming framework created to address this issue. Its core is a real-time converter that parses an incoming JSON stream character by character and emits the content as jsonuri deltas.
For example, consider the following JSON format:
{
  "outline": [
    {
      "topic": "What are clouds made of?"
    },
    {
      "topic": "Why do clouds look soft?"
    }
  ]
  // ...
}
During streaming input, this content is converted in real time into the following data output (delivered as Server-Sent Events):
data: {"uri": "outline/0/topic", "delta": "clo"}
data: {"uri": "outline/0/topic", "delta": "uds"}
data: {"uri": "outline/0/topic", "delta": "are"}
data: {"uri": "outline/0/topic", "delta": "mad"}
data: {"uri": "outline/0/topic", "delta": "e"}
data: {"uri": "outline/0/topic", "delta": "of"}
data: {"uri": "outline/0/topic", "delta": "?"}
data: {"uri": "outline/1/topic", "delta": "Why"}
data: {"uri": "outline/1/topic", "delta": "do"}
data: {"uri": "outline/1/topic", "delta": "clo"}
data: {"uri": "outline/1/topic", "delta": "uds"}
data: {"uri": "outline/1/topic", "delta": "loo"}
data: {"uri": "outline/1/topic", "delta": "k"}
data: {"uri": "outline/1/topic", "delta": "sof"}
data: {"uri": "outline/1/topic", "delta": "t"}
data: {"uri": "outline/1/topic", "delta": "?"}
...
This method of real-time data transmission lets the front end start processing and rendering content immediately, before the full JSON document has arrived.
Usage
Install using pnpm
pnpm i @bearbobo/ling
Build a chat workflow API with Express.js
import { Ling } from '@bearbobo/ling';
// ChatConfig is Ling's chat configuration type; its exact import path may
// vary by version (e.g. '@bearbobo/ling' or '@bearbobo/ling/types').
import type { ChatConfig } from '@bearbobo/ling/types';

// model_name, apiKey, endpoint, promptTpl and promptTpl2 are defined
// elsewhere (see the notes after this snippet).
function workflow(question: string, sse: boolean = false) {
  const config: ChatConfig = {
    model_name,
    api_key: apiKey,
    endpoint,
  };
  const ling = new Ling(config);
  ling.setSSE(sse); // stream the output as Server-Sent Events

  // Create a bot and ask it the question.
  const bot = ling.createBot();
  bot.addPrompt(promptTpl); // set the system prompt
  bot.chat(question);
  bot.on('string-response', ({ uri, delta }) => {
    // Fired when a string field in the streamed JSON completes;
    // `uri` is its jsonuri path and `delta` the completed string.
    console.log('bot string-response', uri, delta);

    // Set up another bot that continues the workflow with that string.
    const bot2 = ling.createBot();
    bot2.addPrompt(promptTpl2); // set the system prompt
    bot2.chat(delta);
    bot2.on('response', (content) => {
      console.log('bot2 response finished', content);
    });
    ...
  });
  ling.close(); // safe to call here: Ling waits for all bots to finish before closing
  return ling;
}
import { pipeline } from 'node:stream/promises';

app.get('/', async (req, res) => {
  // Set the headers required for streaming via Server-Sent Events.
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
  });
  const question = req.query.question as string;
  const ling = workflow(question, true);
  try {
    // Await the pipeline so errors are actually caught below.
    await pipeline(ling.stream as any, res);
  } catch (ex) {
    ling.cancel();
  }
});
Fetch data in the front end
const es = new EventSource(
  'http://localhost:3000/?question=' + encodeURIComponent('Can I lie on a cloud?')
);
es.onmessage = (e) => {
  console.log(e.data);
};
es.onopen = () => {
  console.log('connected');
};
es.onerror = (e) => {
  console.log(e);
};