Get Started

Ling is a workflow framework that supports streaming of structured content generated by large language models (LLMs). It enables quick responses to content streams produced by agents or bots within the workflow, thereby reducing waiting times.

Introduction

Complex AI workflows, such as those found in Bearbobo Learning Companion, require multiple agents/bots to collaboratively process structured data. However, when real-time responsiveness matters, waiting for a complete structured output before responding defeats the purpose of a streaming interface.

JSON, the most commonly used structured data format, is flexible but structurally rigid: a document is difficult to parse correctly until it has been output in full. Other structured formats such as YAML can be adopted instead, but they are not as powerful and convenient as JSON. Ling is a streaming framework created to address this issue. Its core is a real-time converter that parses an incoming JSON data stream character by character and outputs the content as jsonuri-style path/delta events.

For example, consider the following JSON format:

json
{
  "outline": [
    {
      "topic": "What are clouds made of?"
    },
    {
      "topic": "Why do clouds look soft?"
    }
  ]
  // ...
}

During streaming input, the content may be converted in real time into the following data outputs (using Server-sent Events):

data: {"uri": "outline/0/topic", "delta": "Wha"}
data: {"uri": "outline/0/topic", "delta": "t a"}
data: {"uri": "outline/0/topic", "delta": "re "}
data: {"uri": "outline/0/topic", "delta": "clo"}
data: {"uri": "outline/0/topic", "delta": "uds"}
data: {"uri": "outline/0/topic", "delta": " ma"}
data: {"uri": "outline/0/topic", "delta": "de "}
data: {"uri": "outline/0/topic", "delta": "of?"}
data: {"uri": "outline/1/topic", "delta": "Why"}
data: {"uri": "outline/1/topic", "delta": " do"}
data: {"uri": "outline/1/topic", "delta": " cl"}
data: {"uri": "outline/1/topic", "delta": "oud"}
data: {"uri": "outline/1/topic", "delta": "s l"}
data: {"uri": "outline/1/topic", "delta": "ook"}
data: {"uri": "outline/1/topic", "delta": " so"}
data: {"uri": "outline/1/topic", "delta": "ft?"}
...

This method of real-time data transmission facilitates immediate front-end processing.
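To make the conversion concrete, here is a toy sketch of such a converter (an illustration, not Ling's actual implementation): a small state machine that consumes JSON text chunk by chunk and emits a `{uri, delta}` event the moment each character of a string value arrives. It is deliberately simplified — it handles only objects, arrays, and string values, and does not decode escape sequences.

```ts
type DeltaEvent = { uri: string; delta: string };

interface Frame {
  type: 'object' | 'array';
  key: string;        // current key (for objects)
  index: number;      // current index (for arrays)
  afterColon: boolean; // inside an object: are we past "key":?
}

// Toy incremental converter: feed it JSON text in arbitrary chunks,
// and it emits {uri, delta} events for string-value characters as they arrive.
class JsonStreamEmitter {
  private stack: Frame[] = [];
  private state: 'idle' | 'key' | 'string' = 'idle';
  private escaped = false;
  private keyBuf = '';

  // Feed a chunk of JSON text; returns the events it produced.
  write(chunk: string): DeltaEvent[] {
    const out: DeltaEvent[] = [];
    for (const ch of chunk) {
      if (this.state === 'key' || this.state === 'string') {
        if (this.escaped) {
          // simplified: pass the escaped character through undecoded
          this.escaped = false;
          if (this.state === 'key') this.keyBuf += ch;
          else out.push({ uri: this.uri(), delta: ch });
        } else if (ch === '\\') {
          this.escaped = true;
        } else if (ch === '"') {
          if (this.state === 'key') this.top().key = this.keyBuf;
          this.state = 'idle';
        } else if (this.state === 'key') {
          this.keyBuf += ch;
        } else {
          out.push({ uri: this.uri(), delta: ch });
        }
        continue;
      }
      // state === 'idle': structural characters
      if (ch === '{') this.stack.push({ type: 'object', key: '', index: 0, afterColon: false });
      else if (ch === '[') this.stack.push({ type: 'array', key: '', index: 0, afterColon: false });
      else if (ch === '}' || ch === ']') this.stack.pop();
      else if (ch === ':') this.top().afterColon = true;
      else if (ch === ',') {
        const top = this.top();
        if (top.type === 'array') top.index += 1;
        else top.afterColon = false;
      } else if (ch === '"') {
        // a quote before ":" inside an object starts a key; otherwise a string value
        if (this.top().type === 'object' && !this.top().afterColon) {
          this.state = 'key';
          this.keyBuf = '';
        } else {
          this.state = 'string';
        }
      }
      // whitespace and non-string scalars are ignored in this sketch
    }
    return out;
  }

  private uri(): string {
    return this.stack
      .map((f) => (f.type === 'object' ? f.key : String(f.index)))
      .join('/');
  }

  private top(): Frame {
    return this.stack[this.stack.length - 1];
  }
}
```

Feeding it the example JSON above, split at arbitrary chunk boundaries, yields one event per character of each topic string, addressed by `outline/0/topic` and `outline/1/topic`.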

Usage

Install using pnpm

pnpm i @bearbobo/ling

Build a chat workflow API with Express.js

ts
import { Ling } from '@bearbobo/ling';
import type { ChatConfig } from '@bearbobo/ling'; // import path may vary by package version

// model_name, apiKey, endpoint, and the prompt templates are assumed
// to be defined elsewhere (e.g. loaded from environment variables).
function workflow(question: string, sse: boolean = false) {
  const config: ChatConfig = {
    model_name,
    api_key: apiKey,
    endpoint: endpoint,
  };

  const ling = new Ling(config);
  ling.setSSE(sse); // use Server-sent Events

  // Create a bot
  const bot = ling.createBot();
  bot.addPrompt(promptTpl); // set the system prompt
  bot.chat(question);
  bot.on('string-response', ({uri, delta}) => {
    // fired when a string field in the JSON response has finished streaming
    console.log('bot string-response', uri, delta);

    // set up another bot that consumes the first bot's output
    const bot2 = ling.createBot();
    bot2.addPrompt(promptTpl2); // set the system prompt
    bot2.chat(delta);
    bot2.on('response', (content) => {
      console.log('bot2 response finished', content);
    });

    // ...
  });

  ling.close();

  return ling;
}

import { pipeline } from 'node:stream/promises';

app.get('/', async (req, res) => {
  // set the headers required for streaming the response
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
  });

  const question = req.query.question as string;
  const ling = workflow(question, true);
  try {
    await pipeline(ling.stream as any, res);
  } catch (ex) {
    ling.cancel();
  }
});

Fetch data on the front end

js
const es = new EventSource('http://localhost:3000/?question=' + encodeURIComponent('Can I lie on a cloud?'));

es.onmessage = (e) => {
  console.log(e.data);
};
es.onopen = () => {
  console.log('connected');
};
es.onerror = (e) => {
  console.log(e);
};