Beginner's Guide to Semantic Kernel with Dapper, PostgreSQL, and PGVector in C#
Introduction
Semantic Kernel is an open-source framework by Microsoft that integrates machine learning and natural language processing to build AI applications.
It offers tools for tasks like language understanding, text generation, and semantic search, emphasizing modularity and easy customization.
This tutorial walks through setting up a C# Web API for uploading and querying text files, using PostgreSQL with PGVector as the vector database, configuring the database services and dependencies, creating embeddings with Azure OpenAI, and using the Semantic Kernel Memory plugin for persistent storage and intelligent querying.
Prerequisites
A C# compatible IDE, such as Visual Studio, Rider, or VS Code
An account with a compatible AI service (for this tutorial, I am using Azure OpenAI)
Docker
Setting up Postgres with PGVector
To use Semantic Kernel, we need a vector database. Vector databases are specialized databases designed to store, index, and query high-dimensional vectors, enabling efficient similarity search and retrieval for tasks like machine learning and natural language processing.
For this tutorial, I’m going to use PostgreSQL. PostgreSQL is an open-source relational database management system known for its robustness, extensibility, and support for advanced data types and complex queries. Through the use of the PGVector extension, it is also very capable of handling vector embeddings. The easiest way to get started with this is to set up Docker and deploy the pgvector container with the following command:
docker run --name postgres -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 -d pgvector/pgvector:pg16
Creating the Application
For this tutorial, I’m going to set up a new C# Web API that allows us to upload and query a text file.
Open up the IDE and start a new web project.
Installing the Dependencies
Before we start, we’re going to need to install the following packages from NuGet into our project:
Dapper
Pgvector.Dapper
And if you’re using Azure OpenAI, then the following:
Microsoft.AspNetCore.OpenApi
Microsoft.KernelMemory.Core
Microsoft.KernelMemory.MemoryDb.Postgres
Microsoft.KernelMemory.SemanticKernelPlugin
Microsoft.SemanticKernel
Microsoft.SemanticKernel.Connectors.OpenAI
Different AI providers will require different connector packages in place of some of those above.
Configuring the Database
The first thing we need to do is set up our database and enable the PGVector extension. The full DbContext.cs is here, but in summary, we have an Init() function that is called during program start-up to create the database:
private async Task CreateDatabase()
{
    // create database if it doesn't exist; connect to the default "postgres"
    // database, since ours may not exist yet
    await using var connection = new NpgsqlConnection(
        DbConstants.ConnectionString.Replace(DbConstants.DatabaseName, "postgres"));

    var sqlDbCount = $"SELECT EXISTS(SELECT datname FROM pg_catalog.pg_database WHERE datname = '{DbConstants.DatabaseName}');";
    var dbCount = await connection.ExecuteScalarAsync<int>(sqlDbCount);
    if (dbCount == 0)
    {
        var sql = $"CREATE DATABASE \"{DbConstants.DatabaseName}\"";
        await connection.ExecuteAsync(sql);
    }
}
and then enable the PGVector extension:
private async Task ConfigureVectors()
{
    await using var connection = CreateConnection() as NpgsqlConnection;

    // enable the pgvector extension if it isn't already enabled
    await using (var cmd = new NpgsqlCommand("CREATE EXTENSION IF NOT EXISTS vector", connection))
    {
        await cmd.ExecuteNonQueryAsync();
    }

    // reload types so Npgsql picks up the newly created vector type
    await connection.ReloadTypesAsync();
}
It is important to call ReloadTypesAsync() on the connection to ensure the vector types are loaded into our application.
In our CreateConnection() function, we also configure our data source to use vectors:
public IDbConnection CreateConnection()
{
    var dataSourceBuilder = new NpgsqlDataSourceBuilder(DbConstants.ConnectionString);
    dataSourceBuilder.UseVector();
    var dataSource = dataSourceBuilder.Build();
    return dataSource.OpenConnection();
}
We also need to configure Dapper to handle vector types, by calling SqlMapper.AddTypeHandler(new VectorTypeHandler()); in our startup code. In this example, I have moved the startup code that creates and configures the database into an extension method, called when the program starts.
public static void ConfigureDatabaseServices(this WebApplication app)
{
    SqlMapper.AddTypeHandler(new VectorTypeHandler());

    using var scope = app.Services.CreateScope();
    var context = scope.ServiceProvider.GetRequiredService<DbContext>();
    Task.Run(() => context.Init()).Wait();
}
Adding AI
To use AI in our application, we need to split the process into two parts: loading the data, which involves creating embeddings from it, and querying those embeddings once they are stored.
Embeddings
Embeddings are dense vector representations of data, like words or images, that capture relationships and similarities. This makes processing and analysis in machine learning and natural language processing tasks more efficient.
In our example, we use AzureOpenAITextEmbeddingGeneration to create embeddings. After creating the embeddings, we store them in our database for querying.
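As a rough sketch, generating an embedding and storing it with Dapper might look something like the following. Note the embedding service API is marked experimental and may differ between Semantic Kernel versions, and the deployment name, endpoint, key, and `documents` table here are placeholders, not part of the tutorial's actual code:

```csharp
// Sketch only: deployment, endpoint, key, and table name are placeholders.
var embeddingService = new AzureOpenAITextEmbeddingGenerationService(
    deploymentName: "text-embedding-ada-002",
    endpoint: "https://your-resource.openai.azure.com/",
    apiKey: "your-api-key");

// Generate a vector for a piece of text
ReadOnlyMemory<float> embedding =
    await embeddingService.GenerateEmbeddingAsync("Some text to embed");

// Store it via Dapper, relying on the VectorTypeHandler registered earlier
await using var connection = CreateConnection() as NpgsqlConnection;
await connection.ExecuteAsync(
    "INSERT INTO documents (content, embedding) VALUES (@content, @embedding)",
    new { content = "Some text to embed", embedding = new Vector(embedding) });
```

In practice the Kernel Memory plugin (below) handles this chunk-and-embed step for us, but it is useful to see what is happening under the hood.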
Adding the Memory
To store the vectors in our PostgreSQL database, we can use the Semantic Kernel Memory plugin.
The Semantic Kernel Memory plugin enhances the Semantic Kernel framework by enabling persistent memory storage for embeddings and other data. This allows applications to remember context and improve their ability to retrieve relevant information over time. It helps in tasks like conversational AI and recommendation systems by facilitating more intelligent interactions and better performance.
Again, we set up the Kernel and Memory at startup through an extension method. The full code for setting up Semantic Kernel and Memory is available here, but to summarise:
Firstly, I had to specify the embedding dimensions as 1536 (EmbeddingDimensions = 1536), since PGVector has a limit on vector dimensions, which the newest OpenAI models exceed. 1536 should be sufficient for most use cases.
You can also see some other configurations around partition size and tokens. You can adjust these to best suit your use case or leave them as they are.
Then, we add the PostgreSQL connection string and build the kernel memory.
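A minimal sketch of that memory configuration, assuming the Kernel Memory packages listed earlier (property and method names may vary between Kernel Memory versions, and all endpoint, key, and deployment values are placeholders):

```csharp
// Sketch only - all credential and deployment values are placeholders.
var embeddingConfig = new AzureOpenAIConfig
{
    Auth = AzureOpenAIConfig.AuthTypes.APIKey,
    APIKey = "your-api-key",
    Endpoint = "https://your-resource.openai.azure.com/",
    Deployment = "text-embedding-ada-002",
    EmbeddingDimensions = 1536 // keep within PGVector's dimension limit
};

var chatConfig = new AzureOpenAIConfig
{
    Auth = AzureOpenAIConfig.AuthTypes.APIKey,
    APIKey = "your-api-key",
    Endpoint = "https://your-resource.openai.azure.com/",
    Deployment = "gpt-4o"
};

// Wire the AI services and the Postgres vector store into Kernel Memory
var memory = new KernelMemoryBuilder()
    .WithAzureOpenAITextEmbeddingGeneration(embeddingConfig)
    .WithAzureOpenAITextGeneration(chatConfig)
    .WithPostgresMemoryDb(new PostgresConfig { ConnectionString = DbConstants.ConnectionString })
    .Build<MemoryServerless>();
```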
Finally, we build the Kernel Memory, build the Kernel, then add the memory to the Kernel through the plugin:
//...omitted for brevity
var kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.Services.AddAzureOpenAIChatCompletion(
    AzureOpenAiChatCompletionDeployment, AzureOpenAiEndpoint, AzureOpenAiApiKey);
var kernel = kernelBuilder.Build();
var plugin = new MemoryPlugin(memory, "kernelMemory");
kernel.ImportPluginFromObject(plugin, "memory");
Consuming the Kernel
To use the Kernel we have just built, for this tutorial I have created a simple service that can import text into memory or query the memory to produce a response:
public class SemanticKernelService(Kernel kernel, IKernelMemory memory) : ISemanticKernelService
{
    public async Task ImportText(string text)
    {
        await memory.ImportTextAsync(text);
    }

    public async Task<string> Query(string query)
    {
        var result = await memory.AskAsync(query);
        return result.Result;
    }
}
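The service's constructor needs the Kernel and the memory from the dependency injection container. The registrations aren't shown in this tutorial, but a minimal sketch, assuming `kernel` and `memory` were built as above inside Program.cs, might look like:

```csharp
// Sketch only - assumes `kernel` and `memory` are the instances built earlier.
// Registering them as singletons lets SemanticKernelService resolve both.
builder.Services.AddSingleton(kernel);
builder.Services.AddSingleton<IKernelMemory>(memory);
builder.Services.AddScoped<ISemanticKernelService, SemanticKernelService>();
```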
We also create a couple of endpoints in our Program.cs: one to upload a text file and create embeddings from it, and one to query the data:
//...omitted for brevity
app.MapPost("Query",
        async (ISemanticKernelService semanticKernelService, string query) =>
        {
            var result = await semanticKernelService.Query(query);
            return TypedResults.Ok(result);
        })
    .WithOpenApi(operation =>
    {
        operation.Summary = "Query Transcriptions";
        operation.Description = "Query Transcriptions";
        operation.Tags = new List<OpenApiTag> { new() { Name = "AI" } };
        return operation;
    });
app.MapPost("/upload",
        async (IFormFile file) =>
        {
            // Read the whole file as text; ReadToEndAsync avoids the partial
            // reads a single ReadAsync call can return
            using var reader = new StreamReader(file.OpenReadStream());
            var text = await reader.ReadToEndAsync();

            var semanticKernelService = app.Services.GetRequiredService<ISemanticKernelService>();
            await semanticKernelService.ImportText(text);
            return TypedResults.Ok();
        })
    .DisableAntiforgery();
app.Run();
Running Our Application
We can run our application, upload a text file, and query it through the built-in Swagger page.
In the example below, I simply uploaded this text file and then asked a question via the query endpoint:
I hope you enjoyed this tutorial and found it useful. If you did, please give me a shout out.