Inference
Prerequisites
A model checkpoint is required. You can use the one provided in data or download other Llama2 models online:
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
Inferencing
You can either generate a single text, optionally giving an input prompt, or have an interactive chat. For an interactive chat, the maximum number of tokens that can be generated is limited by the sequence length of the chosen model; for ours it is 256.
By default no temperature is set, which leads to the usage of wsample. If a temperature is given, sample is used instead. For more information, read the sample.jl documentation.
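To make the roles of temperature and topp concrete, here is a minimal standalone sketch of a common sampling scheme consistent with the keyword descriptions below. This is an assumed illustration, not the package's internal code (the package reportedly draws via StatsBase's wsample/sample; the sketch uses a dependency-free inverse-CDF draw instead):

```julia
# Sketch (assumed scheme, NOT the package internals) of next-token selection:
# temperature == 0 falls back to greedy decoding; otherwise logits are
# softmaxed with temperature and optionally filtered by top-p (nucleus).
function sample_token(logits::Vector{Float32}; temperature::Float32 = 0.0f0,
                      topp::Float32 = 1.1f0)
    temperature == 0.0f0 && return argmax(logits)    # greedy: most likely token
    p = exp.((logits .- maximum(logits)) ./ temperature)
    p ./= sum(p)                                     # softmax with temperature
    if 0.0f0 < topp < 1.0f0                          # nucleus (top-p) filtering
        order = sortperm(p; rev = true)
        cut = findfirst(>=(topp), cumsum(p[order]))  # smallest top-p prefix
        keep = falses(length(p))
        keep[order[1:cut]] .= true
        p .*= keep                                   # zero out the tail...
        p ./= sum(p)                                 # ...and renormalize
    end
    r, acc = rand(Float32), 0.0f0                    # weighted random draw
    for i in eachindex(p)
        acc += p[i]
        r < acc && return i
    end
    return lastindex(p)
end
```

With the default temperature of 0 the function is deterministic and always returns the index of the largest logit, which matches the greedy behavior described above.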
Llama2.talktollm — Function
talktollm(modelpath::String, [prompt::String]; max_tokens::Int, vocabpath::String, verbose::Bool) -> String
Generate text using a pretrained Llama2 transformer model. Return that text as a String. Load the model from modelpath and the corresponding tokenizer from vocabpath (which defaults to "data/tokenizer.bin"). Take an initial prompt String to start the text generation and generate up to max_tokens tokens. If verbose, print the text during generation. Set temperature and topp to configure the Sampler.
Arguments
modelpath::String: Path to the binary model file (e.g., "stories15M.bin").
prompt::String: Initial text to condition generation (default: empty, starts with BOS token).
Keyword Arguments
max_tokens::Int=255: Maximum number of tokens to generate.
vocabpath::String: Path to tokenizer binary (default: "data/tokenizer.bin").
verbose::Bool=false: If true, print tokens as they are generated.
temperature::Float32=0.0f0: Sampling temperature (0 = greedy, higher = more random).
topp::Float32=1.1f0: Nucleus sampling threshold (≤0 or ≥1 disables nucleus sampling).
Returns
String: The generated text.
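The examples below use the greedy default. Combining the documented sampling keywords yields more varied output; for instance (the model path is a placeholder for a local checkpoint):

```julia
# Hypothetical call illustrating the documented sampling keywords;
# "/PATH/TO/YOUR/MODEL.bin" is a placeholder for a local checkpoint.
text = talktollm("/PATH/TO/YOUR/MODEL.bin", "Once upon a time";
                 max_tokens = 128,
                 temperature = 0.8f0,  # enable stochastic sampling
                 topp = 0.9f0)         # keep only the top 90% probability mass
```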
julia> print(talktollm("/PATH/TO/YOUR/MODEL.bin"))
Once upon a time, there was a little girl named Lily. She loved to play outside in the park with her friends. One day, Lily was running and she fell and hit her head on a rock. She got a big ouchie and it started to bleed.
Lily's mom took her to the doctor and the doctor said she needed a stitch. Lily was scared, but her mom was very dependable and told her they would be coming back home soon.
After the doctor fixed Lily's knee, they went home and Lily's friends came to play again. But Lily's mom noticed that she was playing with a ball and some new toys. This made her very happy.
julia> print(talktollm("/PATH/TO/YOUR/MODEL.bin", "\"What is this?\""))
"What is this?" the woman asked.
The little girl looked at the bookion and said, "This is a book about a princess. Maybe we can use it together."
They decided to sit down and read the book together. They read about a beautiful garden with lovely flowers. The little girl loved the book very much and said, "I want to be a princess again!"
"Maybe, if you read me another book," the woman said.
From that day on, they would sit down and read the book every night before bed. They hoped that when they finished reading it, something magical would happen.
Llama2.ChatBot — Type
ChatBot(path::String; vocabpath::String)
Create a ChatBot constructing a Transformer from path.
vocabpath defaults to "data/tokenizer.bin". The ChatBot struct is used with chatwithllm for continuous text generation.
Llama2.chatwithllm — Function
chatwithllm(bot::ChatBot, [prompt::String]; max_tokens::Int, verbose::Bool)
Generate text using a pretrained Llama2 transformer model. Return that text as a String.
Multiple calls on the same instance of ChatBot respect the previously generated tokens and continue generation from there. Take an initial prompt String to start the text generation and generate up to max_tokens tokens. If verbose, print the text during generation.
julia> c = ChatBot("data/stories15M.bin");
julia> print(chatwithllm(c; max_tokens = 63))
Once upon a time, there was an old house with an ancient sign inside. The sign was very big and had many words on it. One day, a little girl went to visit the old house. She wanted to see what was inside.
The old house said, "Hello? Can I come in?"
julia> print(chatwithllm(c, "
The little girl said:"; max_tokens = 63))
The little girl said: "Yes please! Can I come in too?"
The old house thought for moments before it said, "Yes. This light is available for you 30 cent a nightmare."
The little girl was very excited. She said thank you and then, followed her favorite sign
julia> print(chatwithllm(c, "until"; max_tokens = 63))
until she saw there was a beautiful light online.
When the old house passed, the girl happily went inside. It was very old, but it had been there for a long time. The old house was very special, and she thought the light was the prettiest thing ever.