Wildnerve-tlm01 Hybrid Model
This is a chat interface for the Wildnerve-tlm01 language model, a transformer-based model enhanced with Spike-Timing-Dependent Plasticity (STDP) for improved learning capabilities.
TinyLanguageModel (TLM) System
Table of Contents
- Overview
  - Chatbot Interface (app.py)
  - Architecture
- Getting Started
- Usage
- Additional Features
- Future Enhancements
- Contributing
- License
Overview
The TinyLanguageModel (TLM) system is a modular, scalable framework designed to handle multiple specialized language models concurrently. Each model is tailored to specific tasks, such as Sentiment Analysis or Natural Language Processing, and is trained on dedicated datasets. After each training epoch, models synchronize their weights to foster collaborative learning, enhancing overall performance.
Chatbot Interface (app.py)
The chatbot interface is a simple web interface for interacting with the TinyLanguageModel (TLM) system. Built with Gradio, it lets you generate text responses from the TLM system.
git clone <your_repository_url>
cd <repository_folder>
git submodule update --init --recursive
python app.py
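For reference, a minimal sketch of the kind of Gradio wiring app.py could use; the respond function and its connection to the TLM inference pipeline are placeholders, not the actual implementation.

```python
# Hypothetical sketch of a Gradio front end similar to app.py (names are illustrative).
import gradio as gr

def respond(prompt: str) -> str:
    # In the real app this would call the TLM inference pipeline (e.g. main.py / inference.py).
    return f"Echo from Wildnerve-tlm01: {prompt}"

demo = gr.Interface(fn=respond, inputs="text", outputs="text", title="Wildnerve-tlm01 Chat")

if __name__ == "__main__":
    demo.launch()
```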
Architecture
1. Configuration Management
Files Involved:
- config.json
- config.py
Description:
- config.json: Central repository for all configuration parameters, including hyperparameters, model specializations, dataset paths, and dataset lists.
- config.py: Parses config.json and exposes the configurations as Python variables for accessibility across the system.
Workflow:
- Initialization:
  - config.py loads and parses config.json.
  - Assigns values to variables like INPUT_SIZE, OUTPUT_SIZE, NUM_EPOCHS, LEARNING_RATE, SPECIALIZATIONS, DATASET_PATHS, and DATASETS.
- Usage:
  - Other modules import config.py to ensure consistent and centralized configuration management.
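As an illustration, a minimal sketch of how config.py might load config.json and expose those variables; the key names follow the list above, but the exact structure of config.json is an assumption.

```python
# Hypothetical sketch of config.py (the real key names/structure may differ).
import json
from pathlib import Path

# config.json is assumed to look roughly like:
# {"INPUT_SIZE": 128, "OUTPUT_SIZE": 2, "NUM_EPOCHS": 30, "LEARNING_RATE": 1e-4,
#  "SPECIALIZATIONS": ["Sentiment Analysis", "NLP"], "DATASET_PATHS": {...}, "DATASETS": [...]}
_config = json.loads(Path("config.json").read_text())

INPUT_SIZE = _config["INPUT_SIZE"]
OUTPUT_SIZE = _config["OUTPUT_SIZE"]
NUM_EPOCHS = _config["NUM_EPOCHS"]
LEARNING_RATE = _config["LEARNING_RATE"]
SPECIALIZATIONS = _config["SPECIALIZATIONS"]
DATASET_PATHS = _config["DATASET_PATHS"]
DATASETS = _config["DATASETS"]
```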
2. Tokenizer Initialization
File Involved:
- tokenizer.py
Description:
- TokenizerWrapper: Manages the tokenizer's lifecycle by either loading a pre-trained tokenizer or training a new one based on provided datasets.
Workflow:
Initialization:
- Checks for the existence of tokenizer.json.
- Loads the tokenizer if available; otherwise, initializes and trains a new WordLevel tokenizer with specified special tokens using predefined training files.
Usage:
- Provides methods to tokenize text into token IDs and decode token IDs back to text.
- Exposes get_vocab_size to retrieve the tokenizer's vocabulary size.
Integration:
- The tokenizer instance is available for import in other modules like preprocess.py.
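A rough sketch of what TokenizerWrapper could look like using the Hugging Face tokenizers library; the special tokens, file names, and method signatures are assumptions based on the description above.

```python
# Hypothetical sketch of TokenizerWrapper in tokenizer.py (special tokens / file list are illustrative).
import os
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordLevelTrainer

class TokenizerWrapper:
    def __init__(self, tokenizer_path="tokenizer.json", training_files=None):
        if os.path.exists(tokenizer_path):
            # Re-use the previously trained tokenizer.
            self.tokenizer = Tokenizer.from_file(tokenizer_path)
        else:
            # Train a new WordLevel tokenizer on the predefined training files.
            self.tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
            self.tokenizer.pre_tokenizer = Whitespace()
            trainer = WordLevelTrainer(special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]"])
            self.tokenizer.train(training_files or [], trainer)
            self.tokenizer.save(tokenizer_path)

    def tokenize(self, text):
        # Text -> token IDs.
        return self.tokenizer.encode(text).ids

    def decode(self, ids):
        # Token IDs -> text.
        return self.tokenizer.decode(ids)

    def get_vocab_size(self):
        return self.tokenizer.get_vocab_size()
```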
3. Model Definition
Files Involved:
- model_List.py
- model_Custm.py
- model_PrTr.py
- model_Combn.py
Description:
- model_List.py: Selects the most suitable of the three base models (model_Custm.py, model_PrTr.py, and model_Combn.py) based on the complexity and topics of the user prompt. The chosen model is then used by the model_manager to create virtual neuron instances (VNIs).
- model_Custm.py: The custom-built model, designated Wildnerve-tlm01-0.05Bx12 (Wildnerve tiny language model version 01, 0.05 billion parameters per VNI x 12 VNIs). Architecturally it combines a transformer with a Spiking Neural Network (SNN) feature called STDP (Spike-Timing-Dependent Plasticity), which powers the communicator modules (more details later). This custom model is trained on technical datasets covering computer programming.
- model_PrTr.py: A pretrained model (bert-base-uncased) that acts as a fallback in the event the custom model has issues, or when the user prompt is non-technical and has little to do with computer programming (i.e., general-topic questions unrelated to software).
- model_Combn.py: Last but not least, an amalgamation of the Wildnerve-tlm01-0.05Bx12 architecture and the pretrained model, used to handle highly complex, multilayered prompts that mix very specific technical content with general topics.
Description:
- PositionalEncoding: The custom model (Wildnerve-tlm01-0.05Bx12) implements sinusoidal positional encodings to inject positional information into token embeddings, which is crucial for transformer models.
- TinyLanguageModel: Defines a transformer-based language model comprising embedding layers, positional encodings, transformer encoder layers, and a classification head.
Workflow:
- Positional Encoding:
  - Adds positional information to token embeddings to help the model understand token positions within sequences.
- TinyLanguageModel:
  - Embedding Layer: Transforms token IDs into dense vectors.
  - Positional Encoding: Adds positional information to embeddings.
  - Transformer Encoder: Processes embeddings through multiple transformer layers.
  - Aggregation: Averages the transformer outputs across the sequence length.
  - Output Layer: Maps aggregated embeddings to output classes (for classification tasks).
- Specialization:
  - Each TLM instance has a specialization attribute defining its focus area (e.g., Sentiment Analysis).
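To make this workflow concrete, here is a minimal PyTorch sketch of how PositionalEncoding and TinyLanguageModel might be structured; the hyperparameter names and default values are assumptions, not the actual Wildnerve-tlm01 code.

```python
# Hypothetical sketch of the model classes (dimensions and defaults are illustrative).
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encodings added to token embeddings."""
    def __init__(self, d_model, max_len=512):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # shape (1, max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        return x + self.pe[:, : x.size(1)]

class TinyLanguageModel(nn.Module):
    """Embedding -> positional encoding -> transformer encoder -> mean pooling -> classifier."""
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=4, num_classes=2,
                 specialization="Sentiment Analysis"):
        super().__init__()
        self.specialization = specialization
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.pos_encoding = PositionalEncoding(d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, input_ids, attention_mask=None):
        x = self.pos_encoding(self.embedding(input_ids))
        key_padding_mask = (attention_mask == 0) if attention_mask is not None else None
        x = self.encoder(x, src_key_padding_mask=key_padding_mask)
        pooled = x.mean(dim=1)  # aggregate across the sequence length
        return self.classifier(pooled)
```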
4. Data Loading and Preprocessing
Files Involved:
- models/dataloader.py
- preprocess.py
- model_manager.py
Description:
- Data Loaders: Utilize PyTorch’s DataLoader to prepare training and validation datasets, handling batching and shuffling.
- Preprocessing:
  - Tokenizes text data.
  - Pads or truncates sequences to a fixed max_length.
  - Creates attention masks for transformer models.
Workflow:
- Custom Dataset Classes:
  - TextClassificationDataset: Handles text and label retrieval from dataframes.
  - CustomDataset: Handles feature and label tensors.
- Preprocessing Functions (preprocess.py):
  - preprocess_text: Tokenizes and pads/truncates a single text instance.
  - preprocess_batch: Applies preprocess_text to a batch of texts.
- Data Loader Preparation (models/dataloader.py):
  - Defines prepare_data_loaders to create training and validation DataLoader instances for each model based on its specialization.
- Integration with Model Manager (model_manager.py):
  - create_models_and_loaders: Creates multiple TLM instances and their corresponding data loaders.
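Below is a rough sketch of what preprocess_text, TextClassificationDataset, and prepare_data_loaders could look like under the description above; the padding token ID, dataframe column names, and batch size are assumptions.

```python
# Hypothetical sketch of preprocess.py / dataloader pieces (column names and pad ID are illustrative).
import torch
from torch.utils.data import Dataset, DataLoader

PAD_ID = 0  # assumed padding token ID

def preprocess_text(text, tokenizer, max_length=128):
    """Tokenize one text, pad/truncate to max_length, and build an attention mask."""
    ids = tokenizer.tokenize(text)[:max_length]
    attention_mask = [1] * len(ids) + [0] * (max_length - len(ids))
    ids = ids + [PAD_ID] * (max_length - len(ids))
    return torch.tensor(ids), torch.tensor(attention_mask)

class TextClassificationDataset(Dataset):
    """Pulls (text, label) pairs out of a pandas dataframe."""
    def __init__(self, dataframe, tokenizer, max_length=128):
        self.df, self.tokenizer, self.max_length = dataframe, tokenizer, max_length

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        input_ids, attention_mask = preprocess_text(row["text"], self.tokenizer, self.max_length)
        return input_ids, attention_mask, torch.tensor(row["label"])

def prepare_data_loaders(train_df, val_df, tokenizer, batch_size=32):
    """Build a shuffled training loader and a sequential validation loader for one specialization."""
    train_loader = DataLoader(TextClassificationDataset(train_df, tokenizer), batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(TextClassificationDataset(val_df, tokenizer), batch_size=batch_size)
    return train_loader, val_loader
```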
5. Model Management
File Involved:
model_manager.py
Description:
- Manages the creation and organization of multiple VNI and their associated data loaders.
- Note that because there are three base models and model_List.py chooses only one, based entirely on the topics and complexity of the user prompts, the created VNIs may end up as purely custom models, purely pre-trained models, or a blend of the two. This positions the system to handle both specific and general topics.
Workflow:
Model Creation:
- create_models: Instantiates TinyLanguageModel instances, assigning specializations cyclically based on the SPECIALIZATIONS list.
Data Loader Association:
- For each model, prepares corresponding training and validation data loaders using prepare_data_loaders.
Return:
- Returns a tuple containing a list of models and a list of their respective data loader tuples.
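A minimal sketch of how create_models and create_models_and_loaders might tie these pieces together; the function arguments, module locations, and dataframe handling are assumptions that build on the earlier sketches.

```python
# Hypothetical sketch of model_manager.py (signatures and module paths are illustrative).
from config import SPECIALIZATIONS, OUTPUT_SIZE
from model_Custm import TinyLanguageModel           # assumed module location for the model class
from models.dataloader import prepare_data_loaders  # per the files listed above

def create_models(num_models, vocab_size):
    """Instantiate TLMs, cycling through the SPECIALIZATIONS list."""
    models = []
    for i in range(num_models):
        spec = SPECIALIZATIONS[i % len(SPECIALIZATIONS)]
        models.append(TinyLanguageModel(vocab_size, num_classes=OUTPUT_SIZE, specialization=spec))
    return models

def create_models_and_loaders(num_models, vocab_size, tokenizer, dataframes):
    """Return (models, loaders) where loaders[i] is the (train_loader, val_loader) tuple for models[i]."""
    models = create_models(num_models, vocab_size)
    loaders = []
    for model in models:
        train_df, val_df = dataframes[model.specialization]
        loaders.append(prepare_data_loaders(train_df, val_df, tokenizer))
    return models, loaders
```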
6. Main, Training and Evaluation Workflow
Files Involved:
- train_model.py: contains the training scripts for the transformer component of this architecture
- train_stdp.py: contains the training scripts for the SNN component of this architecture
- evaluate.py: contains scripts to evaluate the performance of the transformer part
- main.py: contains scripts enabling this module to be the model's entry point for inferencing
Description:
- main.py: Entry point orchestrating the inferencing and evaluation of multiple models.
- evaluate.py: Contains the evaluate function to assess model performance on validation data.
- train_model.py: The module that trains the transformer component separately.
Workflow:
Initialization (main.py):
- Calls create_models_and_loaders to instantiate 5 TLMs and their data loaders.
- Initializes Communicator with the list of models.
- Defines the loss function (CrossEntropyLoss) and a common optimizer (Adam) for all models' parameters.
Training Loop (train_model.py):
- Iterates over epochs (NUM_EPOCHS).
- For each epoch:
  - Iterates over each model:
    - Trains the model using the train function in train_model.py.
    - Evaluates the model using the evaluate function in evaluate.py.
  - After training all models for the current epoch, shares weights among them via the Communicator.
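The following sketch shows how this epoch loop could be wired together per the description above; it reuses the hypothetical helpers from the earlier sketches and is not the actual main.py.

```python
# Hypothetical sketch of the main training/synchronization loop (names follow the earlier sketches).
import torch
import torch.nn as nn
from config import NUM_EPOCHS, LEARNING_RATE
from train_model import train   # training function described below
from evaluate import evaluate   # evaluation function described below

def run_training(models, loaders, communicator):
    criterion = nn.CrossEntropyLoss()
    # One common Adam optimizer over every model's parameters, as described above.
    all_params = [p for m in models for p in m.parameters()]
    optimizer = torch.optim.Adam(all_params, lr=LEARNING_RATE)

    for epoch in range(NUM_EPOCHS):
        for model, (train_loader, val_loader) in zip(models, loaders):
            train(model, train_loader, criterion, optimizer)
            evaluate(model, val_loader, criterion)
        # After every model has trained for this epoch, synchronize weights.
        communicator.share_weights()
```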
Training Function (train_model.py):
- Sets the model to training mode.
- Iterates over the training data loader.
- For each batch:
  - Zeroes the optimizer's gradients.
  - Performs a forward pass through the model.
  - Computes the loss.
  - Backpropagates the error.
  - Updates model parameters using the optimizer.
  - Logs loss at intervals.
- Logs average training loss after each epoch.
Evaluation Function (evaluate.py):
- Sets the model to evaluation mode.
- Iterates over the validation data loader without gradient computation.
- For each batch:
  - Performs a forward pass through the model.
  - Computes loss and predictions.
  - Accumulates loss and calculates accuracy.
- Logs average validation loss and accuracy.
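For reference, minimal sketches of train and evaluate that match the two lists above; the logging interval and batch unpacking mirror the earlier dataset sketch and are assumptions.

```python
# Hypothetical sketches of train() and evaluate() (batch layout follows the dataset sketch above).
import torch

def train(model, train_loader, criterion, optimizer, log_every=50):
    model.train()
    total_loss = 0.0
    for step, (input_ids, attention_mask, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        logits = model(input_ids, attention_mask)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        if step % log_every == 0:
            print(f"step {step}: loss {loss.item():.4f}")
    print(f"average training loss: {total_loss / max(len(train_loader), 1):.4f}")

def evaluate(model, val_loader, criterion):
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():
        for input_ids, attention_mask, labels in val_loader:
            logits = model(input_ids, attention_mask)
            total_loss += criterion(logits, labels).item()
            correct += (logits.argmax(dim=-1) == labels).sum().item()
            seen += labels.size(0)
    print(f"validation loss: {total_loss / max(len(val_loader), 1):.4f}, accuracy: {correct / max(seen, 1):.2%}")
```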
7. Communicator for Weight Sharing
Files Involved:
- communicator.py
- communicator_STDP.py
Description:
- Handles synchronization of model weights among multiple TLM instances to enable collaborative learning.
- Meanwhile, communicator_STDP.py is a sub-module of communicator.py that uses the SNN's STDP feature to learn contextually via spiking neural network dynamics. It is tasked with learning the pattern of specializations versus incoming prompts so that it can accurately select the best specialization, i.e., the right VNI, to inference a particular part of the prompt. This is a more refined and dynamic method than the conventional Mixture-of-Experts gating/routing mechanism, which is more rigid.
- Eventually the parent communicator.py is in charge of sharing each VNI's learning patterns with all other VNIs (without eliminating each VNI's core specialization, because every VNI's weights are protected by a learning-preservation mechanism called the adapter). The end result is that each VNI also carries similar learning from all other VNIs' specializations, which helps achieve generalization somewhat like the gigantic language models trained for general purposes.
Workflow:
- Initialization:
  - Receives a list of TLM instances.
- Weight Sharing (share_weights):
  - Averages the parameters of all models.
  - Updates each model with the averaged parameters to synchronize them.
- Output Exchange (exchange_outputs):
  - Placeholder for more complex interactions (currently not utilized in the provided workflow).
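A minimal sketch of what share_weights could look like, assuming parameter-wise averaging across models with identical architectures; the real communicator.py (and its adapter-based specialization preservation) is more involved.

```python
# Hypothetical sketch of communicator.py's weight averaging (assumes all models share one architecture).
import torch

class Communicator:
    def __init__(self, models):
        self.models = models

    @torch.no_grad()
    def share_weights(self):
        """Average each parameter across all models and copy the mean back into every model."""
        for params in zip(*(m.parameters() for m in self.models)):
            mean = torch.stack([p.data for p in params]).mean(dim=0)
            for p in params:
                p.data.copy_(mean)

    def exchange_outputs(self, *args, **kwargs):
        # Placeholder for richer interactions (not used in the workflow above).
        pass
```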
8. Service Registry
Files Involved:
- service_registry.py
Description:
- The service registry exists to break circular dependencies that could otherwise send the system into endless recursive loops. Because this architecture has many simultaneously moving parts and demands heavy multi-threading and parallel processing, we introduced a workaround built on queuing and a central registry where all interactions share a single source of truth (similar in spirit to a blockchain), simplifying the already complex operation of this hybrid architecture.
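A minimal sketch of how a central registry like service_registry.py could break circular imports: modules register the services they provide and look up the ones they need at call time instead of importing each other directly. The method names here are assumptions.

```python
# Hypothetical sketch of service_registry.py (API names are illustrative).
import threading

class ServiceRegistry:
    """Single shared lookup table so modules depend on the registry, not on each other."""
    def __init__(self):
        self._services = {}
        self._lock = threading.Lock()  # the system is heavily multi-threaded

    def register(self, name, service):
        with self._lock:
            self._services[name] = service

    def get(self, name):
        with self._lock:
            return self._services[name]

registry = ServiceRegistry()

# Example usage: the communicator registers itself once...
#   registry.register("communicator", Communicator(models))
# ...and other modules fetch it lazily at call time, avoiding a circular import:
#   registry.get("communicator").share_weights()
```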
9. Checkpointing and Saving Models
Files Involved:
- main.py
- train_model.py
- train_stdp.py
- Wildnerve-tlm01-0.05Bx12.bin
- stdp_model_epochs30.bin
Description:
- Checkpointing: Saving model states at intervals to enable resuming training or for model versioning.
- Saving Models: At the end of training, each model’s state is saved to a file corresponding to its specialization.
Workflow:
- Saving Models (main.py):
  - After all training epochs, saves each model's state dictionary to a file named after its specialization (e.g., Sentiment_Analysis.pth) using torch.save.
- Potential Enhancements:
- Implementing periodic checkpointing within each epoch to safeguard against interruptions and facilitate resuming training.
- Loading from checkpoints to resume training seamlessly.
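As an illustration of the saving step and the suggested periodic checkpointing, a short sketch using torch.save and torch.load; the file-naming scheme follows the example above, while the checkpoint dictionary layout and paths are assumptions.

```python
# Hypothetical checkpointing helpers (dictionary layout and paths are illustrative).
import torch

def save_final_model(model):
    # e.g., "Sentiment_Analysis.pth" for a model specialized in Sentiment Analysis.
    path = f"{model.specialization.replace(' ', '_')}.pth"
    torch.save(model.state_dict(), path)

def save_checkpoint(model, optimizer, epoch, path="checkpoints/checkpoint.pth"):
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path="checkpoints/checkpoint.pth"):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1  # epoch to resume training from
```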
Getting Started
Prerequisites
- Python 3.7+
- PyTorch
- Tokenizers (pip install tokenizers)
- Other dependencies in requirements.txt
Installation
Clone the Repository
Note: This repository is part of the EvolphTech Solutions private projects. To clone the latest version of the model, ensure you have the necessary access permissions, then run:
git clone https://github.com/EvolphTech/Wildnerve-tlm01.git
cd tlm
Install Dependencies
It's recommended to use a virtual environment.
pip install -r requirements.txt
Prepare Datasets
Ensure all dataset CSV files are placed in the data/ directory as specified in config.json.
Configuration
Edit config.json
Configure the following parameters as needed:
Hyperparameters
- INPUT_SIZE: Size of the input layer.
- OUTPUT_SIZE: Number of output classes.
- NUM_EPOCHS: Number of training epochs.
- LEARNING_RATE: Learning rate for the optimizer.
Model Specializations
- SPECIALIZATIONS: List of specializations for TLM instances.
Dataset Paths
- DATASET_PATHS: Mapping from specialization to dataset paths.
Verify Configurations
Ensure all paths and parameters are correctly set to match your environment and dataset locations.
Usage
Run the following command to train the transformer portion of the model and create checkpoints: 'python trainer.py'
Run the following command to train the STDP portion of the model and create checkpoints: 'python -m STDP_Communicator.train_stdp'
Upon completion of training, the model will be saved in the checkpoints directory.
Run validation after each training epoch to monitor the model's performance. To validate the model, run:
'python -m Analyzers.validation'
Then repeat the training process for the next epoch.
'python trainer.py' 'python -m STDP_Communicator.train_stdp'
Validation results are written to the 'validation_results' directory. The validation results as of 23 February 2025 show that the model performs poorly on the routing and weight-sharing process. This is expected because the model has not yet been trained for enough epochs, so continue training it. Note that this is a new model and the current validation results are not a reflection of its eventual performance.
After the first training run, you can start subsequent training by executing the main.py script.
Requirements
pip install -r requirements.txt
Installation Instructions
To set up the environment with the necessary dependencies for the TinyLanguageModel (TLM) system, follow these steps:
Clone the Repository
Create a Virtual Environment (Optional but Recommended)
Install Dependencies
Ensure you have pip updated to the latest version:
Then install the required packages.
Dependency Details
- torch==2.0.1: PyTorch is an open-source machine learning library used for applications such as computer vision and natural language processing. It provides flexibility and speed when building deep learning models.
- tokenizers==0.13.3: The tokenizers library offers fast and efficient tokenization tools, essential for preprocessing text data in natural language processing tasks.
- pandas==1.5.2: Pandas is a powerful data manipulation and analysis library for Python. It provides data structures like DataFrames, which are integral for handling and preprocessing datasets.
Additional Notes
- Python Version: Ensure you are using Python 3.7 or higher.
- CUDA Support: If you intend to utilize GPU acceleration, make sure you have the appropriate CUDA version installed that is compatible with the specified PyTorch version.
- Environment Management: Using virtual environments (like venv or conda) is recommended to manage dependencies and avoid conflicts with other projects.
Troubleshooting
Torch Installation Issues:
If you encounter issues installing PyTorch, refer to the official PyTorch installation guide to select the correct version compatible with your system and CUDA installation.
Tokenizers Training Data:
Ensure that all the training files specified in tokenizer.py exist in the data/ directory. Missing files can cause the tokenizer training to fail.
Dataset Paths:
Verify that all dataset paths specified in config.json are correct and that the files are present in the designated locations.
Contact
For any issues or contributions, please open an issue or submit a pull request on the GitHub repository.
License
Mozilla Public License 2.0 (MPL 2.0)
File Layout
Wildnerve_Chatbot / Spaces Repo will contain the following files:
- Chatbot front‑end & inference glue
- app.py
- api.py (or api_wp.py)
- main.py
- interface.py
- Core services & registry
- service_registry.py
- communicator.py
- communicator_STDP.py
- Model data & preprocessing
- preprocess.py
- inference.py
- dataset.py
- dataloader.py
- find_weights.py
- repository_status.py
- Training & tasks
- trainer.py / train_model.py (if used at inference)
- celery_app.py / tasks.py (if using Celery)
- Deployment & dependencies
- Dockerfile
- docker-compose.yml
- entrypoint.sh
- requirements_spaces.txt (or requirements.txt)
- .dockerignore