Run Machine Learning Models at the Edge with Grafbase AI

Fredrik Björk, Hugo Barrigas

The AI feature has been sunset and is no longer available. If you have any questions, please reach out to us on Discord.

Incorporating AI into your product can greatly improve its performance and effectiveness for your end-users. But today that usually involves building separate workflows for developers, juggling different services and their costs, and accepting slower response times as queries are sent to centralized models.

That’s why we’re introducing Grafbase AI. With Grafbase AI, you can run models directly in your GraphQL API, without having to configure any separate services. As part of Grafbase, these models will run on the edge, close to your users for the fastest response times.

The friction involved in deploying ML models has hugely decreased in the past year. Services such as LangChain, Hugging Face, and OpenAI have made it easy to get models working with only a few lines of code.

Grafbase AI makes it even easier, bringing that down to a single line of code. Previously, if you wanted to call an ML model from inside your GraphQL resolver, you’d do something like this:

import { OpenAI } from 'langchain/llms/openai'

export default async function Resolver(_, { question }) {
  const model = new OpenAI({
    modelName: 'gpt-3.5-turbo-instruct',
    openAIApiKey: process.env.OPENAI_API_KEY,
  })

  const response = await model.invoke(question)

  return response
}

With Grafbase AI, that call becomes:

export default function Resolver(_, { prompt }, { ai }) {
  const answer = ai.textLlm({
    model: 'meta/llama-2-7b-chat-int8',
    prompt,
  })

  return answer
}

No extra libraries, no extra API keys, no extra API calls. Just run it directly within your resolver.
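
For context, a resolver like this is wired up to a field in grafbase.config.ts. Below is a minimal sketch assuming a query field called answer backed by a resolver file named answer; the field name, argument shape, and file name are illustrative, and the exact g.query options may differ between SDK versions:

import { config, graph } from '@grafbase/sdk'

const g = graph.Standalone()

// Expose an answer query that takes a prompt and is handled by the
// resolver file at grafbase/resolvers/answer.ts (illustrative names).
g.query('answer', {
  args: { prompt: g.string() },
  returns: g.string(),
  resolver: 'answer',
})

export default config({
  graph: g,
  experimental: {
    ai: true,
  },
})

A client could then fetch a completion with a query such as { answer(prompt: "What is GraphQL?") }.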

Not only is this less code, you also avoid the overhead of managing each of your services. Grafbase AI manages every service and you just call whichever one you need. This means no additional API keys, no separate bills, and no extra configuration. The entire process is streamlined, simplifying the workflow and reducing the barriers to implementing AI solutions.

This also means no service lock-in. With the service details abstracted away, you can easily mix and match models to find the best one for your solution simply by changing the model name in your resolver.
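
As a rough sketch of that flexibility, the model name could even come from configuration so you can switch models without touching resolver logic; the GRAFBASE_AI_MODEL environment variable below is purely illustrative, not a built-in setting:

export default function Resolver(_, { prompt }, { ai }) {
  // Read the model name from an environment variable so it can be swapped
  // without code changes. GRAFBASE_AI_MODEL is an illustrative name.
  const model = process.env.GRAFBASE_AI_MODEL ?? 'meta/llama-2-7b-chat-int8'

  return ai.textLlm({ model, prompt })
}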

By abstracting complexities and embedding machine learning directly within the API, Grafbase AI ensures developers can focus on building, testing, and iterating without getting bogged down by the intricacies of AI integrations.

Grafbase AI is built on Cloudflare Workers AI. Workers AI is a GPU-powered edge network running in multiple data centers around the world. Any request from a user is routed to the nearest data center to reduce the round-trip time. The request is then run directly on that optimized GPU, without the need to be routed to a central service.

This means users get lower latency thanks to:

- The reduced round trip from the user to the data center and back again.
- The lack of an additional hop from the data center to a central service to run the model.
- The lower computational time for the model to run on the optimized GPUs.

You can then use two other Grafbase features to lower latencies for end users even further.

First, using Grafbase KV you can store your model responses at the edge for quick retrieval by users. Again, this is embedded directly in your resolvers to simplify development. If we want to store our question/answer pairs in the KV store, all we need to do is call kv.set():

export default async function Resolver(_, { prompt }, { ai, kv }) {
  try {
    // Return the cached answer if we've already seen this prompt.
    const { value } = await kv.get(prompt)

    if (value === null) {
      const answer = ai.textLlm({
        model: 'meta/llama-2-7b-chat-int8',
        prompt,
      })

      // Cache the answer at the edge for 60 seconds.
      await kv.set(prompt, answer, { ttl: 60 })

      return answer
    } else {
      return value
    }
  } catch (e) {
    return 'error'
  }
}

That response is now stored and can be returned to users more quickly, without re-invoking the model.

Second, you can go further and use Edge Caching to cache queries and responses at locations close to your end-users, significantly reducing response times for them. This two-layered approach, fast responses when uncached and even quicker ones when cached, greatly improves the user experience.
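
As a minimal sketch of what that could look like, edge caching is configured with cache rules in grafbase.config.ts; the rule below assumes a cache.rules shape with types and maxAge, and the exact field names and values may differ in the current SDK, so treat it as illustrative:

import { config, graph } from '@grafbase/sdk'

const g = graph.Standalone()

export default config({
  graph: g,
  cache: {
    rules: [
      {
        // Cache query responses at the edge for 60 seconds
        // (illustrative values; tune for your API).
        types: 'Query',
        maxAge: 60,
      },
    ],
  },
})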

To start, Grafbase AI is free to use. But even when pricing is introduced, we think this paradigm for building and invoking models will be more cost-efficient for companies.

You won’t have to establish and maintain machine learning infrastructure. You won’t have to pay fees to a myriad of services like OpenAI or Hugging Face. There’s no separate billing per service and no hidden costs: billing will be usage-based. Instead of flat fees or unpredictable expenses, companies can scale their costs according to their actual usage.

This model ensures that businesses only pay for what they consume, making it easier to forecast expenses and optimize budgets. The consolidation also translates to savings on overheads and streamlines budgeting.

Finally, with efficient edge storage and caching you can reduce model calls and therefore costs.

You can start using Grafbase AI in your resolvers today. The models are preconfigured and ready to go. Note: this is still experimental, and we’d love to have your feedback on how this streamlined AI workflow is working for you and your team.

You can enable Grafbase AI for your project using the SDK:

import { config, graph } from '@grafbase/sdk'

const g = graph.Standalone()

export default config({
  graph: g,
  experimental: {
    ai: true,
  },
})
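
With the experimental ai flag enabled, the ai helper used in the examples above is available as part of the third argument passed to your resolvers.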

We want this to be an efficient way for developers to work with AI, reducing the burden of managing multiple models, libraries, and costs, and giving them a single, straightforward workflow for deploying fast machine learning models to their users.
