Building serverless text prediction from training to deployment

What we'll be doing

We'll build a simple text prediction model trained on the book Pride and Prejudice. We'll then create an API that accepts a prompt and returns the three words most likely to come next. Finally, we'll deploy it to the cloud so that it can be used from another project.

Prerequisites

uv - for Python dependency management
The Nitric CLI
(optional) Your choice of an AWS, GCP or Azure account

Getting started

We'll start by creating a new project for our API.

nitric new prediction-api py-starter

Next, open the project in your editor of choice.

cd prediction-api

Make sure all dependencies are resolved using uv:

uv sync

Exploring our data

We'll start by downloading the Pride and Prejudice text file from Project Gutenberg. This will form the basis of our training data and, as you'll find at the end, it gives the predictions a Jane Austen spin.

An important first step in training a model is exploring and pre-processing the training data. Looking through the file, we find that Project Gutenberg adds a header and a footer to the text. As the book is split into three volumes, there are also volume headers that need to be removed. Along with these, we need to remove chapter headings and punctuation, expand contractions, and convert numbers to number words, e.g. 8 -> eight. This normalizes the training data so that the same word is always written the same way, which makes the predictions more coherent.

To start we can manually remove the headers and footers. The header starts with The Project Gutenberg eBook, Pride and Prejudice, by Jane Austen, Edited and ends with CHAPTER I.. The footer starts with Transcriber's note: and ends with subscribe to our email newsletter to hear about new eBooks..
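If you would rather not trim the file by hand, the same cut can be sketched in code. This is a minimal sketch, assuming the two marker strings quoted above each appear in the download exactly as written and that the first occurrence is the right one; the function name strip_gutenberg_wrapper is hypothetical.

```python
HEADER_END = "CHAPTER I."             # last text of the header
FOOTER_START = "Transcriber's note:"  # first text of the footer


def strip_gutenberg_wrapper(text: str) -> str:
    """Keep only the text between the end of the header and the start of the footer."""
    header_end = text.find(HEADER_END)
    if header_end == -1:
        raise ValueError("header marker not found")
    body_start = header_end + len(HEADER_END)

    footer_start = text.find(FOOTER_START, body_start)
    if footer_start == -1:
        raise ValueError("footer marker not found")

    return text[body_start:footer_start].strip()


sample = (
    "The Project Gutenberg eBook, Pride and Prejudice\n"
    "CHAPTER I.\n"
    "It is a truth universally acknowledged.\n"
    "Transcriber's note: spelling has been retained.\n"
)
print(strip_gutenberg_wrapper(sample))
```

Raising an error when a marker is missing is a deliberate choice here: silently returning the whole file would leave licence text in the training data.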

We can then either manually remove the section headers, or do it programmatically.

def remove_section_headers(lines: list[str]):
  section = False
  new_lines = []
  for line in lines:
    if line.lower().startswith(("end of the second", "end of vol")):
      section = True
    elif line.lower().startswith("chapter") and section:
      section = False

    if not section:
      new_lines.append(line)
  return new_lines

Remove the chapter headings.

import re

def remove_chapters(data: str) -> str:
  # "." does not match a newline, so only the heading line itself is removed
  return re.sub(r'CHAPTER .+', '', data)

Expand contractions.

def remove_contractions(data: str) -> str:
  return (data.
    replace("shan't", "shall not").
    replace("here's", "here is").
    replace("you'll", "you will").
    replace("what's", "what is").
    replace("don't", "do not").
    replace("i'm", "i am").
    replace("there's", "there is")
  )

Remove punctuation.

import string

def remove_punctuation(data: str) -> str:
  return data.translate(str.maketrans(string.punctuation, ' ' * len(string.punctuation)))
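The order of these two steps matters: if punctuation is stripped first, the apostrophe in "don't" becomes a space and the contraction can no longer be matched. A small self-contained illustration, using cut-down copies of the two functions (the two-entry contraction table is only for this example):

```python
import string


def remove_contractions(data: str) -> str:
    # Shortened table, for illustration only
    return data.replace("don't", "do not").replace("i'm", "i am")


def remove_punctuation(data: str) -> str:
    # Map every punctuation character to a space
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return data.translate(table)


sample = "i'm sure you don't mean it!"

contractions_first = remove_punctuation(remove_contractions(sample))
punctuation_first = remove_contractions(remove_punctuation(sample))

print(contractions_first.split())
# ['i', 'am', 'sure', 'you', 'do', 'not', 'mean', 'it']
print(punctuation_first.split())
# ['i', 'm', 'sure', 'you', 'don', 't', 'mean', 'it']
```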

Convert numbers to words using num2words. First, we need to install it.

uv add num2words

We can then write our convert numbers function.

from num2words import num2words

def convert_numbers(data: str) -> str:
  numberless_data = []
  for word in data.split():
    if word.isdigit():
      # Pass an int so the conversion does not rely on string parsing
      numberless_data.append(num2words(int(word)))
    else:
      numberless_data.append(word)
  return " ".join(numberless_data)

Putting it all together we can get our cleaned data.

# Open the text data and read it into a list of lines
with open("data.txt", "r") as file:
  lines = file.readlines()

data = remove_section_headers(lines)
data = remove_chapters(" ".join(data))
data = data.lower()
data = remove_contractions(data)
data = remove_punctuation(data)
data = convert_numbers(data)

# Save the cleaned data in a new file
with open('clean_data.txt', 'w') as f:
  f.write(data)

Before we are done, we need to tokenize the data so that it can be processed by the model. Once the tokenizer is fit to the text, we will save it so we can reuse it later. To tokenize the data, we will use Keras' pre-processing module, which requires the keras package.

uv add keras

We can then create and fit the tokenizer to the text. We will initialize the Out of Vocabulary (OOV) token as <oov>.

import pickle
from keras.preprocessing.text import Tokenizer

# Tokenize the data and fit it to the text
tokenizer = Tokenizer(oov_token='<oov>')
tokenizer.fit_on_texts(data.split())

# Save tokenizer
with open('tokenizer.pickle', 'wb') as handle:
  pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)
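To get a feel for what the fitted tokenizer holds, here is a rough pure-Python approximation of how a word index is built: words are ranked by frequency and given integer IDs, with index 1 reserved for the OOV token. This is an illustrative sketch, not the Keras implementation; build_word_index and encode are hypothetical helper names.

```python
from collections import Counter


def build_word_index(words: list[str], oov_token: str = "<oov>") -> dict[str, int]:
    """Assign IDs by descending frequency, keeping ID 1 for unknown words."""
    word_index = {oov_token: 1}
    for word, _count in Counter(words).most_common():
        word_index[word] = len(word_index) + 1
    return word_index


def encode(words: list[str], word_index: dict[str, int], oov_token: str = "<oov>") -> list[int]:
    """Replace each word with its ID, falling back to the OOV ID."""
    return [word_index.get(word, word_index[oov_token]) for word in words]


word_index = build_word_index("it is a truth it is".split())
print(word_index)
# {'<oov>': 1, 'it': 2, 'is': 3, 'a': 4, 'truth': 5}
print(encode("it is a lie".split(), word_index))
# [2, 3, 4, 1]
```

The word "lie" never appeared in the sample text, so it is encoded as 1, the OOV ID.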

Training the model

To train the model, we will be using a Bidirectional Long Short-Term Memory recurrent neural network, or Bi-LSTM for short. This type of recurrent neural network is well suited to this problem because it keeps state for both long-term and short-term memory, and it reads each input sequence in both directions. This lets the network retain the context of the previous words in the sentence.

Start by loading the tokenizer from the pre-processing stage.

import pickle

with open('tokenizer.pickle', 'rb') as handle:
  tokenizer = pickle.load(handle)

We can then create the input sequences to train our model. This works by taking every run of 6 consecutive words in the text. First add numpy as a dependency.

uv add numpy

Then we'll write the function to create the input sequences from the data.

import numpy as np
from keras.utils import pad_sequences

def create_input_sequences(data: list[str], n_gram_size=6):
  # Create n-gram input sequences based on an n-gram size of 6
  input_sequences = []
  token_list = tokenizer.texts_to_sequences([data])[0]

  # Sliding window which takes every 6 consecutive tokens as an input sequence,
  # starting from the first token and including the final full window
  for i in range(len(token_list) - n_gram_size + 1):
    n_gram_sequence = token_list[i:i+n_gram_size]
    input_sequences.append(n_gram_sequence)

  # Pad sequences
  max_sequence_len = max([len(x) for x in input_sequences])
  return np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre')), max_sequence_len
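To make the sliding window concrete, the following sketch runs the same idea over a short list of made-up token IDs with a window of 3 rather than 6. The function name sliding_windows and the sample IDs are illustrative only; this version yields every full window in the list.

```python
def sliding_windows(tokens: list[int], size: int) -> list[list[int]]:
    """Return every run of `size` consecutive tokens, in order."""
    return [tokens[i:i + size] for i in range(len(tokens) - size + 1)]


tokens = [4, 9, 2, 7, 5]
windows = sliding_windows(tokens, 3)
print(windows)
# [[4, 9, 2], [9, 2, 7], [2, 7, 5]]
```

A list of n tokens produces n - size + 1 windows, so the training set grows almost one-for-one with the length of the book.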

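We'll then split each input sequence into features (the first five tokens) and a label (the sixth token), and divide the result into training and testing data. The split uses scikit-learn, so add it as a dependency first with uv add scikit-learn.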

from keras.utils import to_categorical
from sklearn.model_selection import train_test_split

# Create the features and labels and split the data into training and testing
def create_training_data(input_sequences):
  # Create features and labels
  xs, labels = input_sequences[:,:-1], input_sequences[:,-1]
  ys = to_categorical(labels, num_classes=total_words)

  # Split data
  return train_test_split(xs, ys, test_size=0.1, shuffle=True)
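The slicing and one-hot encoding can be illustrated without Keras. In this sketch, assuming a tiny vocabulary of 6 IDs and 3-token windows, the last token of each window becomes the label and is expanded into a one-hot vector, which is roughly what to_categorical produces. split_features_labels and one_hot are hypothetical names.

```python
def split_features_labels(windows: list[list[int]]) -> tuple[list[list[int]], list[int]]:
    """Use all but the last token as features and the last token as the label."""
    features = [window[:-1] for window in windows]
    labels = [window[-1] for window in windows]
    return features, labels


def one_hot(label: int, num_classes: int) -> list[float]:
    """Return a vector of zeros with a single 1.0 at position `label`."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


windows = [[4, 1, 2], [1, 2, 5]]
features, labels = split_features_labels(windows)
ys = [one_hot(label, num_classes=6) for label in labels]

print(features)  # [[4, 1], [1, 2]]
print(labels)    # [2, 5]
print(ys[0])     # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```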

The next part is building, compiling, and training the model. We will pass in the X and y training data, along with the vocabulary size and the sequence length. We are using an Adam optimizer, a callback that reduces the learning rate when the loss plateaus, and a checkpoint callback that saves the best model so far.

# Create callbacks
checkpoint = ModelCheckpoint("model.h5", monitor='loss', verbose=1, save_best_only=True, mode='auto')

reduce = ReduceLROnPlateau(monitor='loss', factor=0.2, patience=3, min_lr=0.0001, verbose=1)

# Create optimizer
optimizer = Adam(learning_rate=0.01)

Then we will add layers to the sequential model.

# Create model
model = Sequential()
model.add(Embedding(total_words, 100, input_length=max_sequence_len-1))
model.add(Bidirectional(LSTM(512)))
model.add(Dense(total_words, activation='softmax'))

model.summary()

Putting it all together, we build and compile the model, then train it on the training data.

from keras.models import Sequential
from keras.layers import LSTM, Dense, Embedding, Bidirectional
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

# Train the model
def train_model(X_train, y_train, total_words, max_sequence_len):
  # Create callbacks
  checkpoint = ModelCheckpoint("model.h5", monitor='loss', verbose=1, save_best_only=True, mode='auto')

  reduce = ReduceLROnPlateau(monitor='loss', factor=0.2, patience=3, min_lr=0.0001, verbose=1)

  # Create optimizer
  optimizer = Adam(learning_rate=0.01)

  # Create model
  model = Sequential()
  model.add(Embedding(total_words, 100, input_length=max_sequence_len-1))
  model.add(Bidirectional(LSTM(512)))
  model.add(Dense(total_words, activation='softmax'))

  model.summary()

  # Compile model
  model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

  # Train model; the checkpoint callback saves the best weights to model.h5
  model.fit(
    X_train, y_train, epochs=20, batch_size=2000,
    callbacks=[
      checkpoint,
      reduce,
    ]
  )

With all the functions defined, we can train our model with the cleaned data.

# Read the cleaned text and split it into words on any whitespace
with open('clean_data.txt', 'r') as f:
  data = f.read().split()
total_words = len(tokenizer.word_index) + 1

input_sequences, max_sequence_len = create_input_sequences(data)
X_train, X_test, y_train, y_test = create_training_data(input_sequences)

train_model(X_train, y_train, total_words, max_sequence_len)

The model checkpoint save callback will save the model as model.h5. We will then be able to load the model when we create our API.

Predicting text

Starting with the api.py file, we will first load the model and tokenizer. Keras is imported dynamically, only when it is first needed, to reduce the cold start time when the API is deployed.

import pickle
import importlib
import numpy as np

model = None
tokenizer = None

def load_tokenizer():
  global tokenizer
  if tokenizer is None:
    # Load the tokenizer
    with open('prediction/tokenizer.pickle', 'rb') as handle:
      tokenizer = pickle.load(handle)
  return tokenizer

def load_model():
  global model
  if model is None:
    models = importlib.import_module("keras.models")
    # Load the model
    model = models.load_model('prediction/model.h5')
  return model

Once the model is loaded, we can write a function to predict the 3 most likely next words. This uses the tokenizer to convert the prompt into the same kind of token list that was used to train the model. The model returns a probability for every word in the vocabulary, which we reduce to the 3 most likely. The predictions we end up with are token numbers, so we look each one up in the tokenizer's word index to get the actual word. The word index is in the form { "word": token_num }, e.g. { "the": 1, "and": 2 }.

# Predict text based on a set of seed text
# Returns a list of 3 top choices for the next word
def predict_text(seed_text: str) -> list[str]:
  # Dynamically load the utils
  utils = importlib.import_module("keras.utils")

  # Convert the seed text into a token list using the same process as the previous tokenization
  token_list = load_tokenizer().texts_to_sequences([seed_text])[0]
  token_list = utils.pad_sequences([token_list], maxlen=5, padding='pre')

  # Make the prediction
  m = load_model()

  predict_x = m.predict(token_list, batch_size=500, verbose=0)

  # Find the top three token numbers, ordered from most to least likely
  # (argsort is ascending, so take the last three and reverse them)
  predictions = list(np.argsort(predict_x[0])[-3:][::-1])

  # Iterate over the predicted words, and find the word in the tokenizer dictionary that matches
  output_words = []
  for prediction in predictions:
    for word, index in tokenizer.word_index.items():
      if prediction == index:
        output_words.append(word)
        break

  return output_words
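Scanning the word index for every prediction is fine for three lookups, but the lookup is simpler if the index is inverted once. The sketch below is a pure-Python illustration with a made-up probability vector and word index; top_k_words is a hypothetical helper, and heapq.nlargest is used in place of the NumPy call so the example has no dependencies.

```python
import heapq


def top_k_words(probabilities: list[float], word_index: dict[str, int], k: int = 3) -> list[str]:
    """Return the k most probable words, most likely first."""
    # Invert { "word": token_num } into { token_num: "word" } once
    index_word = {index: word for word, index in word_index.items()}

    # Positions of the k largest probabilities, in descending order
    best = heapq.nlargest(k, range(len(probabilities)), key=probabilities.__getitem__)

    # Position 0 is the padding slot and has no word, so skip unknown positions
    return [index_word[i] for i in best if i in index_word]


word_index = {"<oov>": 1, "have": 2, "think": 3, "say": 4, "go": 5}
probabilities = [0.0, 0.05, 0.40, 0.25, 0.20, 0.10]

print(top_k_words(probabilities, word_index))
# ['have', 'think', 'say']
```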

Creating the API

Using the predictive text function, we can create our API. First we will make sure that the necessary modules are imported.

from nitric.resources import api
from nitric.application import Nitric
from nitric.context import HttpContext

We will then define the API and our first route.

mainApi = api("main")

@mainApi.get("/prediction")
async def create_prediction(ctx: HttpContext) -> None:
  pass

Nitric.run()

Within this function block we want to define the code that is run on a request. We will accept the prompt to predict from via the query parameters. This means requests take the form: /prediction?prompt=where should I.

@mainApi.get("/prediction")
async def create_prediction(ctx: HttpContext) -> None:
  prompt = ctx.req.query.get("prompt")

  if prompt is None:
    return

  prompt = " ".join(prompt)

Nitric.run()

With the user's prompt, we can then run the prediction and return the result to the user.

@mainApi.get("/prediction")
async def create_prediction(ctx: HttpContext) -> None:
  ...

  prompt = " ".join(prompt)

  prediction = predict_text(prompt)

  ctx.res.body = f"{prompt} {prediction}"

Nitric.run()

That's all there is to it. To test the function locally, we will start the Nitric server.

nitric start

You can then make a request to the API using any HTTP client.

curl "http://localhost:4001/prediction?prompt=what%20should%20I"

what should I ['have', 'think', 'say']
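Since the goal is to call this API from another project, a small client helper may be useful. This is a sketch using only the standard library; build_prediction_url and fetch_prediction are hypothetical names, and the base URL assumes the local address used in the curl example above.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_prediction_url(base_url: str, prompt: str) -> str:
    """Build the request URL, encoding spaces in the prompt as %20."""
    query = urlencode({"prompt": prompt}, quote_via=quote)
    return f"{base_url.rstrip('/')}/prediction?{query}"


def fetch_prediction(base_url: str, prompt: str) -> str:
    """Call the API and return the raw response body as text."""
    with urlopen(build_prediction_url(base_url, prompt)) as response:
        return response.read().decode("utf-8")


print(build_prediction_url("http://localhost:4001", "what should I"))
# http://localhost:4001/prediction?prompt=what%20should%20I
```

With the local server running, fetch_prediction("http://localhost:4001", "what should I") would return the same body that curl prints.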

Deploy to the cloud

At this point, you can deploy what you've built to any of the supported cloud providers. In this example we'll deploy to AWS. Start by setting up your credentials and configuration for the nitric/aws provider.

Next, we'll need to create a stack file (deployment target). A stack is a deployed instance of an application. You might want separate stacks for each environment, such as stacks for dev, test, and prod. For now, let's start by creating a file for the dev stack.

The stack new command below will create a stack named dev that uses the aws provider.

nitric stack new dev aws

This project will run fine with the default memory configuration of 512 MB. However, to get faster predictions we will increase the memory to 1 GB. Edit the stack file nitric.dev.yaml and set your preferred AWS region and memory configuration.

provider: nitric/aws@latest
region: us-east-1
config:
  lambda:
    memory: 1024

You are responsible for staying within the limits of the free tier or any costs associated with deployment.

Let's try deploying the stack with the up command:

nitric up

When the deployment is complete, go to the relevant cloud console and you'll be able to see and interact with your application.

To tear down your application from the cloud, use the down command:

nitric down