Llama2.jl
What is Llama2?
Llama2 is a family of pre-trained large language models (LLMs) by Meta AI. More information can be found at: https://www.llama.com/
What is Llama2.jl?
Llama2.jl can run inference on a given model from within Julia. For this purpose you have to provide your own model checkpoint. This project follows the procedure outlined by the run.c file from llama2.c.
Getting started
Clone the repository to a desired location:
cd /PATH/TO/DESIRED/LOCATION/
git clone git@github.com:ConstantConstantin/Llama2.git
Start Julia, activate a desired environment, and add the package; it can then be loaded in your session:
(@v1.11) pkg> activate .
Activating new project at `PATH/TO/MY/ENVIRONMENT/myLlama2`
(myLlama2) pkg> add /PATH/TO/DESIRED/LOCATION/Llama2/
Cloning git-repo `/PATH/TO/DESIRED/LOCATION/Llama2`
Updating git-repo `/PATH/TO/DESIRED/LOCATION/Llama2`
Updating registry at `~/.julia/registries/General.toml`
Resolving package versions...
Updating `PATH/TO/MY/ENVIRONMENT/Project.toml`
[0e620e9f] + Llama2 v1.0.0-DEV `/PATH/TO/DESIRED/LOCATION/Llama2#aj/docs`
Updating `PATH/TO/MY/ENVIRONMENT/Manifest.toml`
[0e620e9f] + Llama2 v1.0.0-DEV `/PATH/TO/DESIRED/LOCATION/Llama2#aj/docs`
Precompiling project...
1 dependency successfully precompiled in 1 seconds
julia> using Llama2

Llama2.Config
Llama2.RunState
Llama2.TokenIndex
Llama2.Tokenizer
Llama2.Transformer
Llama2.TransformerWeights
Llama2.compare_tokens
Llama2.encode
Llama2.str_lookup
Llama2.Config — Type
Config

Create a Config containing 7 Int32 fields. These describe metadata used to read values from an input file.
Developer Notes
This is an internal struct.
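Since the project follows run.c from llama2.c, the seven Int32 values most likely correspond to the checkpoint header used there. The sketch below is illustrative only; the actual field names in Llama2.jl may differ.

struct ConfigSketch
    dim::Int32         # transformer embedding dimension
    hidden_dim::Int32  # hidden dimension of the feed-forward layers
    n_layers::Int32    # number of transformer layers
    n_heads::Int32     # number of attention (query) heads
    n_kv_heads::Int32  # number of key/value heads
    vocab_size::Int32  # vocabulary size
    seq_len::Int32     # maximum sequence length
end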
Llama2.RunState — Type
RunState

Create a RunState containing several Float32 containers. These reflect the state of the Transformer at run time.
Developer Notes
This is an internal struct.
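As a rough orientation, run.c in llama2.c keeps activation buffers such as the current hidden state, attention scores, the key/value cache, and the output logits. A hypothetical sketch along those lines (field names and shapes are assumptions, not the actual definition):

struct RunStateSketch
    x::Vector{Float32}             # activation at the current position (dim,)
    q::Vector{Float32}             # query vector (dim,)
    att::Matrix{Float32}           # attention scores (seq_len, n_heads)
    logits::Vector{Float32}        # output logits (vocab_size,)
    key_cache::Array{Float32,3}    # cached keys (kv_dim, seq_len, n_layers)
    value_cache::Array{Float32,3}  # cached values (kv_dim, seq_len, n_layers)
end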
Llama2.TokenIndex — Type
TokenIndex(str::String, id::Integer)

Create a TokenIndex from a string and an integer identifier.
The byte sequence is converted to a String and the ID is converted to an Int16. Throws a DomainError if id ≤ 0.
Examples
julia> using Llama2;
julia> TokenIndex("Julia", 1)
TokenIndex("Julia", 1)
julia> TokenIndex("Julia", -1)
ERROR: DomainError with Token index must be > 0.
[...]

Developer Notes
This is an internal struct.
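A minimal sketch of what the struct and its validation could look like, assuming the DomainError is raised in an inner constructor; this is illustrative, not the actual definition:

struct TokenIndexSketch
    str::String
    id::Int16
    function TokenIndexSketch(str::String, id::Integer)
        # reject non-positive identifiers, as documented above
        id > 0 || throw(DomainError(id, "Token index must be > 0."))
        return new(str, Int16(id))
    end
end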
Llama2.Tokenizer — Type
Tokenizer

Construct a tokenizer storing vocabulary entries, scores, and byte-piece mappings.
Constructors
Tokenizer(vocab, vocab_scores, sorted_vocab, vocab_size, max_token_length, byte_pieces)
Construct a tokenizer directly from the provided fields. Validates that max_token_length > 0 and that byte_pieces has length 256.
Tokenizer(path::String, vocab_size::Integer)
Load a tokenizer from a binary file.
Fields
vocab: Token string sequences.
vocab_scores: Scores for each token.
sorted_vocab: Sorted token indices.
vocab_size: Number of vocabulary entries.
max_token_length: Maximum token length in bytes.
byte_pieces: Byte mapping (length 256).
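For example, loading a tokenizer from a llama2.c-style binary file might look like the following; the path is a placeholder, and 32000 is only the typical Llama 2 vocabulary size, so adjust both to your files:

julia> using Llama2;

julia> tok = Llama2.Tokenizer("/PATH/TO/tokenizer.bin", 32000);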
Llama2.Transformer — Method
Transformer(path::String)

Load a binary file located at path and construct a Transformer from its content. The file is expected to have a header of 7 Int32 values followed by Float32 data.
Example
julia> t = Llama2.Transformer("/PATH/TO/YOUR.bin");

Llama2.TransformerWeights — Type
TransformerWeights

Create a TransformerWeights containing several Float32 containers. These hold the actual weight data that is loaded from an input file.
Developer Notes
This is an internal struct.
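Again mirroring run.c from llama2.c, which this project follows, the weights presumably group into a token embedding table plus per-layer attention and feed-forward matrices. The sketch below lists plausible containers and is not the actual internal definition:

struct TransformerWeightsSketch
    token_embedding_table::Matrix{Float32}  # token embeddings (dim, vocab_size)
    rms_att_weight::Matrix{Float32}         # per-layer RMSNorm weights (attention)
    wq::Array{Float32,3}                    # query projections, one matrix per layer
    wk::Array{Float32,3}                    # key projections
    wv::Array{Float32,3}                    # value projections
    wo::Array{Float32,3}                    # attention output projections
    rms_ffn_weight::Matrix{Float32}         # per-layer RMSNorm weights (feed-forward)
    w1::Array{Float32,3}                    # feed-forward up projections
    w2::Array{Float32,3}                    # feed-forward down projections
    w3::Array{Float32,3}                    # feed-forward gate projections
    rms_final_weight::Vector{Float32}       # final RMSNorm weights
    wcls::Matrix{Float32}                   # classifier weights for the output logits
end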
Llama2.compare_tokens — Method
compare_tokens(first_token::TokenIndex, second_token::TokenIndex) -> Bool

Compare two TokenIndex objects by their string values. It returns true if the first token's string is lexicographically less than the second's, and false otherwise.
Examples
julia> using Llama2;
julia> compare_tokens(TokenIndex("A", 1), TokenIndex("B", 2))
true
julia> compare_tokens(TokenIndex("B", 1), TokenIndex("A", 2))
false
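Because compare_tokens implements a lexicographic less-than, it can also serve as the lt comparator when building a sorted vocabulary. A hypothetical snippet, not taken from the package internals:

julia> sorted = sort([TokenIndex("bb", 2), TokenIndex("aa", 1)]; lt=compare_tokens);

julia> sorted[1]
TokenIndex("aa", 1)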
Llama2.encode — Method

encode

Converts a text string into a sequence of token IDs using a Tokenizer. The tokenizer's vocabulary is first sorted if necessary, then each character is encoded into its corresponding ID. After that, adjacent token pairs with the highest merge scores are iteratively merged into longer tokens until no more merges are possible, and the final token ID sequence is returned.
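The merging step can be pictured roughly as follows. This is a simplified, hypothetical sketch of a greedy merge loop, assuming token IDs index directly into the vocab and vocab_scores fields; it is not the package's actual implementation of encode:

function greedy_merge_sketch!(ids::Vector{Int16}, t::Llama2.Tokenizer)
    while true
        best_score, best_pos, best_id = -Inf32, 0, Int16(-1)
        for i in 1:length(ids)-1
            # Would merging the adjacent pair (ids[i], ids[i+1]) give a known token?
            merged = t.vocab[ids[i]] * t.vocab[ids[i+1]]
            id = Llama2.str_lookup(merged, t.sorted_vocab)
            if id != -1 && t.vocab_scores[id] > best_score
                best_score, best_pos, best_id = t.vocab_scores[id], i, id
            end
        end
        best_pos == 0 && break          # no mergeable pair left
        ids[best_pos] = best_id         # replace the pair with the merged token
        deleteat!(ids, best_pos + 1)
    end
    return ids
end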
Llama2.str_lookup — Method
str_lookup(str::String, sorted_vocab::Vector{TokenIndex}) -> Int16

Search for a given string str within a sorted vocabulary sorted_vocab of TokenIndex objects. If the string is found, it returns the corresponding token ID; otherwise, it returns -1. It uses a binary search for efficient lookup.
Examples
julia> using Llama2;
julia> str_lookup("aa", [TokenIndex("aa", 1), TokenIndex("bb", 2)])
1
julia> str_lookup("ba", [TokenIndex("aa", 1), TokenIndex("bb", 2)])
-1
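For reference, the binary search mentioned above can be sketched as follows; this is an illustrative reimplementation over the sorted_vocab vector (assuming TokenIndex fields named str and id), not the package's internal code:

function str_lookup_sketch(str::String, sorted_vocab::Vector{TokenIndex})
    lo, hi = 1, length(sorted_vocab)
    while lo <= hi
        mid = (lo + hi) >>> 1
        if sorted_vocab[mid].str == str
            return sorted_vocab[mid].id       # found: return the token ID
        elseif sorted_vocab[mid].str < str
            lo = mid + 1                      # search the upper half
        else
            hi = mid - 1                      # search the lower half
        end
    end
    return Int16(-1)                          # not found
end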