Wildnerve-tlm01 Hybrid Model

This is a chat interface for the Wildnerve-tlm01 language model, a transformer-based model enhanced with Spike-Timing-Dependent Plasticity (STDP) for improved learning capabilities.

TinyLanguageModel (TLM) System

TLM Architecture

Table of Contents

  1. Overview
    1.1 Chatbot Interface (app.py)
  2. Architecture
  3. Getting Started
  4. Usage
  5. Additional Features
  6. Future Enhancements
  7. Contributing
  8. License

Overview

The TinyLanguageModel (TLM) system is a modular, scalable framework designed to handle multiple specialized language models concurrently. Each model is tailored to specific tasks, such as Sentiment Analysis or Natural Language Processing, and is trained on dedicated datasets. After each training epoch, models synchronize their weights to foster collaborative learning, enhancing overall performance.

Chatbot Interface (app.py)

The chatbot interface is a simple web interface that allows you to interact with the TinyLanguageModel (TLM) system. It is built using Gradio and allows you to generate text responses from the TLM system.

To run the chatbot interface locally:

    git clone <your_repository_url>
    cd <repository_folder>
    git submodule update --init --recursive
    python app.py
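
For orientation, the sketch below shows what a minimal Gradio entry point in the style of app.py could look like; the generate_response helper is a hypothetical stand-in for the TLM system's actual inference path.

    # Hypothetical sketch of a Gradio front end for the TLM system (not the actual app.py).
    import gradio as gr

    def generate_response(prompt: str) -> str:
        # Placeholder: the real app would route the prompt through model_List / model_manager
        # and return the selected VNI's generated text.
        return f"TLM response to: {prompt}"

    demo = gr.Interface(
        fn=generate_response,               # function called on each submission
        inputs=gr.Textbox(label="Prompt"),
        outputs=gr.Textbox(label="Response"),
        title="Wildnerve-tlm01 Chatbot",
    )

    if __name__ == "__main__":
        demo.launch()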

Architecture

1. Configuration Management

Files Involved:

  • config.json
  • config.py

Description:

  • config.json: Central repository for all configuration parameters, including hyperparameters, model specializations, dataset paths, and dataset lists.
  • config.py: Parses config.json and exposes the configurations as Python variables for accessibility across the system.

Workflow:

  1. Initialization:
    • config.py loads and parses config.json.
    • Assigns values to variables like INPUT_SIZE, OUTPUT_SIZE, NUM_EPOCHS, LEARNING_RATE, SPECIALIZATIONS, DATASET_PATHS, and DATASETS.
  2. Usage:
    • Other modules import config.py to ensure consistent and centralized configuration management.
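
As a rough illustration of this workflow (only the variable names listed above come from the project; everything else is an assumption), config.py could be as simple as:

    # config.py - minimal sketch: load config.json and expose its values as module-level variables.
    import json

    with open("config.json", "r", encoding="utf-8") as f:
        _cfg = json.load(f)

    INPUT_SIZE = _cfg["INPUT_SIZE"]
    OUTPUT_SIZE = _cfg["OUTPUT_SIZE"]
    NUM_EPOCHS = _cfg["NUM_EPOCHS"]
    LEARNING_RATE = _cfg["LEARNING_RATE"]
    SPECIALIZATIONS = _cfg["SPECIALIZATIONS"]   # e.g. ["Sentiment Analysis", ...]
    DATASET_PATHS = _cfg["DATASET_PATHS"]       # mapping: specialization -> dataset path
    DATASETS = _cfg["DATASETS"]                 # list of dataset files

    # Other modules then simply do: from config import NUM_EPOCHS, LEARNING_RATE, ...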

2. Tokenizer Initialization

File Involved:

  • tokenizer.py

Description:

  • TokenizerWrapper: Manages the tokenizer's lifecycle by either loading a pre-trained tokenizer or training a new one based on provided datasets.

Workflow:

  1. Initialization:

    • Checks for the existence of tokenizer.json.
    • Loads the tokenizer if available; otherwise, initializes and trains a new WordLevel tokenizer with specified special tokens using predefined training files.
  2. Usage:

    • Provides methods to tokenize text into token IDs and decode token IDs back to text.
    • Exposes get_vocab_size to retrieve the tokenizer's vocabulary size.
  3. Integration:

    • The tokenizer instance is available for import in other modules like preprocess.py.
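
The sketch below shows one plausible shape for TokenizerWrapper using the Hugging Face tokenizers library; the constructor arguments and special tokens are assumptions, not the project's actual values.

    # Hypothetical sketch of TokenizerWrapper (method names follow the description above).
    import os
    from tokenizers import Tokenizer
    from tokenizers.models import WordLevel
    from tokenizers.pre_tokenizers import Whitespace
    from tokenizers.trainers import WordLevelTrainer

    class TokenizerWrapper:
        def __init__(self, path="tokenizer.json", training_files=None,
                     special_tokens=("[UNK]", "[PAD]", "[CLS]", "[SEP]")):
            if os.path.exists(path):
                # Reuse the previously trained tokenizer.
                self.tokenizer = Tokenizer.from_file(path)
            else:
                # Train a new WordLevel tokenizer on the provided files.
                self.tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
                self.tokenizer.pre_tokenizer = Whitespace()
                trainer = WordLevelTrainer(special_tokens=list(special_tokens))
                self.tokenizer.train(training_files or [], trainer)
                self.tokenizer.save(path)

        def encode(self, text):
            return self.tokenizer.encode(text).ids      # text -> token IDs

        def decode(self, ids):
            return self.tokenizer.decode(ids)           # token IDs -> text

        def get_vocab_size(self):
            return self.tokenizer.get_vocab_size()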

3. Model Definition

File Involved:

  • model_List.py
  • model_Custm.py
  • model_PrTr.py
  • model_Combn.py

Description:

  • model_List.py: This module is in charge of selecting the right base model based on the user prompt's complexity and topics, choosing the best of the three base models (model_Custm.py, model_PrTr.py, and model_Combn.py), which the model_manager then uses to create virtual neuron instances (VNIs).

  • model_Custm.py: The custom-built model, specified as 'Wildnerve-tlm01-0.05Bx12' (Wildnerve's tiny language model version 01, 0.05 billion parameters per VNI x 12 VNIs). Its architecture combines a transformer with a Spiking Neural Network (SNN) feature called STDP (Spike-Timing-Dependent Plasticity), which powers the 'communicator' modules (more details later). This custom model is trained on technical datasets about computer programming.

  • model_PrTr.py: A pretrained model ('Bert-Base-Uncased') that acts as a fallback in the event that the custom model has issues, or when the user prompt is non-technical and has nothing to do with computer programming, i.e. general-topic questions unrelated to software.

  • model_Combn.py: Last but not least, the amalgamation of the Wildnerve-tlm01-0.05Bx12 architecture and the pretrained model, used to handle highly complex, multilayered prompts that mix very specific technical topics with general ones.

Description:

  • PositionalEncoding: The custom model (Wildnerve-tlm01-0.05Bx12) implements sinusoidal positional encodings to inject positional information into token embeddings, which is crucial for transformer models.
  • TinyLanguageModel: Defines a Transformer-based language model comprising embedding layers, positional encodings, transformer encoder layers, and a classification head.

Workflow:

  1. Positional Encoding:
    • Adds positional information to token embeddings to help the model understand token positions within sequences.
  2. TinyLanguageModel:
    • Embedding Layer: Transforms token IDs into dense vectors.
    • Positional Encoding: Adds positional information to embeddings.
    • Transformer Encoder: Processes embeddings through multiple transformer layers.
    • Aggregation: Averages the transformer outputs across the sequence length.
    • Output Layer: Maps aggregated embeddings to output classes (for classification tasks).
  3. Specialization:
    • Each TLM instance has a specialization attribute defining its focus area (e.g., Sentiment Analysis).
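
A condensed sketch of the two classes described above, in PyTorch; the layer sizes, defaults, and forward signature are illustrative assumptions.

    # Simplified sketch of the transformer component; hyperparameters are placeholders.
    import math
    import torch
    import torch.nn as nn

    class PositionalEncoding(nn.Module):
        def __init__(self, d_model, max_len=512):
            super().__init__()
            pe = torch.zeros(max_len, d_model)
            position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
            div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
            pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
            pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
            self.register_buffer("pe", pe.unsqueeze(0))

        def forward(self, x):                              # x: (batch, seq_len, d_model)
            return x + self.pe[:, : x.size(1)]

    class TinyLanguageModel(nn.Module):
        def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=4,
                     num_classes=2, specialization="Sentiment Analysis"):
            super().__init__()
            self.specialization = specialization
            self.embedding = nn.Embedding(vocab_size, d_model)          # token IDs -> dense vectors
            self.pos_encoding = PositionalEncoding(d_model)             # add positional information
            encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(encoder_layer, num_layers)
            self.classifier = nn.Linear(d_model, num_classes)           # classification head

        def forward(self, input_ids, attention_mask=None):
            x = self.pos_encoding(self.embedding(input_ids))
            key_padding_mask = (attention_mask == 0) if attention_mask is not None else None
            x = self.encoder(x, src_key_padding_mask=key_padding_mask)
            return self.classifier(x.mean(dim=1))                       # average over sequence length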

4. Data Loading and Preprocessing

Files Involved:

  • models/dataloader.py
  • preprocess.py
  • model_manager.py

Description:

  • Data Loaders: Utilize PyTorch’s DataLoader to prepare training and validation datasets, handling batching and shuffling.
  • Preprocessing:
    • Tokenizes text data.
    • Pads or truncates sequences to a fixed max_length.
    • Creates attention masks for transformer models.

Workflow:

  1. Custom Dataset Classes:
    • TextClassificationDataset: Handles text and label retrieval from dataframes.
    • CustomDataset: Handles feature and label tensors.
  2. Preprocessing Functions (preprocess.py):
    • preprocess_text: Tokenizes and pads/truncates a single text instance.
    • preprocess_batch: Applies preprocess_text to a batch of texts.
  3. Data Loader Preparation (models/dataloader.py):
    • Defines prepare_data_loaders to create training and validation DataLoader instances for each model based on its specialization.
  4. Integration with Model Manager (model_manager.py):
    • create_models_and_loaders: Creates multiple TLM instances and their corresponding data loaders.

5. Model Management

File Involved:

  • model_manager.py

Description:

  • Manages the creation and organization of multiple VNIs and their associated data loaders.
  • Note that because there are three base models and model_List.py chooses only one of them, based entirely on the prompt's topics and complexity, the created VNIs tend to be a mixture: purely custom, purely pretrained, or a blend of the two. This positions the model to handle both specific and general topics.

Workflow:

  1. Model Creation:

    • create_models: Instantiates TinyLanguageModel instances, assigning specializations cyclically based on the SPECIALIZATIONS list.
  2. Data Loader Association:

    • For each model, prepares corresponding training and validation data loaders using prepare_data_loaders.
  3. Return:

    • Returns a tuple containing a list of models and a list of their respective data loader tuples.
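
The following sketch shows how the cyclic assignment of specializations might be wired up; model_cls and prepare_loaders stand in for TinyLanguageModel and prepare_data_loaders.

    # Illustrative sketch of create_models_and_loaders; argument names are placeholders.
    def create_models_and_loaders(num_models, vocab_size, specializations, model_cls, prepare_loaders):
        models, loaders = [], []
        for i in range(num_models):
            spec = specializations[i % len(specializations)]   # cycle through SPECIALIZATIONS
            model = model_cls(vocab_size=vocab_size, specialization=spec)
            models.append(model)
            loaders.append(prepare_loaders(spec))              # (train_loader, val_loader) per model
        return models, loaders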

6. Main, Training and Evaluation Workflow

Files Involved:

  • train_model.py - contains the training scripts for the transformer component of this architecture
  • train_stdp.py - contains the training scripts for the SNN component of this architecture
  • evaluate.py - contains scripts to evaluate the performance of the transformer component
  • main.py - contains scripts enabling this module to act as the model's entry point for inference

Description:

  • main.py: Entry point orchestrating the inferencing and evaluation of multiple models.

  • evaluate.py: Contains the evaluate function to assess model performance on validation data.

  • train_model.py: This module trains the transformer component separately.

Workflow:

  1. Initialization (main.py):

    • Calls create_models_and_loaders to instantiate 5 TLMs and their data loaders.
    • Initializes Communicator with the list of models.
    • Defines the loss function (CrossEntropyLoss) and a common optimizer (Adam) for all models' parameters.
  2. Training Loop (train_model.py):

    • Iterates over epochs (NUM_EPOCHS).
    • For each epoch:
      • Iterates over each model:
        • Trains the model using the train function in train_model.py.
        • Evaluates the model using the evaluate function in evaluate.py.
      • After training all models for the current epoch, shares weights among them via the Communicator.
  3. Training Function (train_model.py):

    • Sets the model to training mode.
    • Iterates over the training data loader.
    • For each batch:
      • Zeroes the optimizer's gradients.
      • Performs forward pass through the model.
      • Computes loss.
      • Backpropagates the error.
      • Updates model parameters using the optimizer.
      • Logs loss at intervals.
    • Logs average training loss after each epoch.
  4. Evaluation Function (evaluate.py):

    • Sets the model to evaluation mode.
    • Iterates over the validation data loader without gradient computation.
    • For each batch:
      • Performs forward pass through the model.
      • Computes loss and predictions.
      • Accumulates loss and calculates accuracy.
    • Logs average validation loss and accuracy.
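
A condensed sketch of the train and evaluate functions described above; the (input_ids, attention_mask, labels) batch layout is an assumption.

    # Sketch of the per-epoch training and evaluation loop (PyTorch).
    import torch

    def train(model, loader, optimizer, criterion, device="cpu"):
        model.train()
        total_loss = 0.0
        for input_ids, attention_mask, labels in loader:
            optimizer.zero_grad()                                   # reset gradients
            logits = model(input_ids.to(device), attention_mask.to(device))
            loss = criterion(logits, labels.to(device))             # e.g. CrossEntropyLoss
            loss.backward()                                         # backpropagate
            optimizer.step()                                        # update parameters
            total_loss += loss.item()
        return total_loss / max(len(loader), 1)                     # average training loss

    @torch.no_grad()
    def evaluate(model, loader, criterion, device="cpu"):
        model.eval()
        total_loss, correct, seen = 0.0, 0, 0
        for input_ids, attention_mask, labels in loader:
            logits = model(input_ids.to(device), attention_mask.to(device))
            labels = labels.to(device)
            total_loss += criterion(logits, labels).item()
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
        return total_loss / max(len(loader), 1), correct / max(seen, 1)  # avg loss, accuracy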

7. Communicator for Weight Sharing

File Involved:

  • communicator.py
  • 'communicator_STDP.py'

Description:

  • Handles synchronization of model weights among multiple TLM instances to enable collaborative learning.
  • communicator_STDP.py is a sub-module of communicator.py that carries the SNN's STDP feature. It learns contextually, via the spiking neural network, from the pattern of specializations versus incoming prompts, so that it can accurately select the best specialization, in the right VNI, to inference a particular part of the prompt. This is a more refined and dynamic method than the conventional gating/routing mechanism of Mixture of Experts, which is more rigid.
  • The parent communicator.py is in charge of sharing each VNI's learning patterns with all the other VNIs (without eliminating each VNI's core specialization, since each VNI's weights are protected by a learning-preservation mechanism called the 'adapter'). The end result is that every VNI also acquires similar learning from every other VNI's specialization. This helps achieve a degree of generalization somewhat similar to the gigantic language models trained for general purposes.

Workflow:

  1. Initialization:
    • Receives a list of TLM instances.
  2. Weight Sharing (share_weights):
    • Averages the parameters of all models.
    • Updates each model with the averaged parameters to synchronize them.
  3. Output Exchange (exchange_outputs):
    • Placeholder for more complex interactions (currently not utilized in the provided workflow).
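
A minimal sketch of share_weights under the simplest interpretation, parameter averaging across all models; it deliberately ignores the adapter-based learning-preservation mechanism mentioned above.

    # Simplified sketch of the Communicator's weight-sharing step.
    import torch

    class Communicator:
        def __init__(self, models):
            self.models = models                # list of TLM instances

        @torch.no_grad()
        def share_weights(self):
            state_dicts = [m.state_dict() for m in self.models]
            averaged = {
                key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
                for key in state_dicts[0]
            }
            for m in self.models:
                m.load_state_dict(averaged)     # every model now holds the averaged weights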

8. Service Registry

Files Involved:

  • service_registry.py

Description:

  • This module breaks circular dependencies that could otherwise send the model into endless recursive loops. Because the architecture has so many simultaneously moving parts, it requires heavy multi-threading and parallel processing; to manage that, a workaround was introduced that combines queuing with a central registry where all interactions share a single point of truth (much like a blockchain), simplifying the already complex operation of this hybrid architecture. A sketch of such a registry follows.
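
A hypothetical sketch (class and method names are illustrative, not the project's actual API) of a thread-safe registry that lets modules look each other up at call time instead of importing each other at module load:

    # Hypothetical central registry to avoid circular imports between modules.
    import threading

    class ServiceRegistry:
        def __init__(self):
            self._services = {}
            self._lock = threading.Lock()          # safe for multi-threaded access

        def register(self, name, service):
            with self._lock:
                self._services[name] = service

        def get(self, name):
            with self._lock:
                return self._services[name]

    registry = ServiceRegistry()

    # Usage: the model manager registers itself once, and communicator.py fetches it lazily,
    # so neither module needs to import the other at import time.
    # registry.register("model_manager", manager)
    # manager = registry.get("model_manager")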

9. Checkpointing and Saving Models

Files Involved:

  • 'main.py'
  • train_model.py
  • 'train_stdp.py'
  • Wildnerve-tlm01-0.05Bx12.bin
  • stdp_model_epochs30.bin

Description:
  • Checkpointing: Saving model states at intervals to enable resuming training or for model versioning.
  • Saving Models: At the end of training, each model’s state is saved to a file corresponding to its specialization.

Workflow:

  1. Saving Models (main.py):
    • After all training epochs, saves each model's state dictionary to a file named after its specialization (e.g., Sentiment_Analysis.pth) using torch.save.
  2. Potential Enhancements:
    • Implementing periodic checkpointing within each epoch to safeguard against interruptions and facilitate resuming training.
    • Loading from checkpoints to resume training seamlessly.
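
A sketch of the saving and checkpointing logic described above; apart from the specialization-named .pth files mentioned in the workflow, the file names and checkpoint dictionary layout are assumptions.

    # Sketch of end-of-training saving plus the suggested periodic checkpointing.
    import torch

    def save_final_models(models):
        for model in models:
            filename = f"{model.specialization.replace(' ', '_')}.pth"   # e.g. Sentiment_Analysis.pth
            torch.save(model.state_dict(), filename)

    def save_checkpoint(model, optimizer, epoch, path="checkpoint.pth"):
        torch.save({
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        }, path)

    def load_checkpoint(model, optimizer, path="checkpoint.pth"):
        ckpt = torch.load(path, map_location="cpu")
        model.load_state_dict(ckpt["model_state"])
        optimizer.load_state_dict(ckpt["optimizer_state"])
        return ckpt["epoch"]                                             # resume from this epoch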

Getting Started

Prerequisites

  • Python 3.7+
  • PyTorch
  • Tokenizers (pip install tokenizers)
  • Other dependencies in requirements.txt

Installation

  1. Clone the Repository

    Note: This repository is part of the EvolphTech Solutions private projects. To clone the latest version of the model, ensure you have the necessary access permissions, then run:

    git clone https://github.com/EvolphTech/Wildnerve-tlm01.git
    cd tlm
    
  2. Install Dependencies

    It's recommended to use a virtual environment.

    pip install -r requirements.txt
    
  3. Prepare Datasets

    Ensure all dataset CSV files are placed in the data/ directory as specified in config.json.

Configuration

  1. Edit config.json

    Configure the following parameters as needed:

    • Hyperparameters

      • INPUT_SIZE: Size of the input layer.
      • OUTPUT_SIZE: Number of output classes.
      • NUM_EPOCHS: Number of training epochs.
      • LEARNING_RATE: Learning rate for the optimizer.
    • Model Specializations

      • SPECIALIZATIONS: List of specializations for TLM instances.
    • Dataset Paths

      • DATASET_PATHS: Mapping from specialization to dataset paths.
  2. Verify Configurations

    Ensure all paths and parameters are correctly set to match your environment and dataset locations.


Usage

Run the following command to train the transformer portion of the model and create checkpoints:

    python trainer.py

Run the following command to train the STDP portion of the model and create checkpoints:

    python -m STDP_Communicator.train_stdp

Upon completion of training, the model is saved in the checkpoints directory.

Validate after each training epoch to monitor the model's performance. Run the following command to validate the model:

    python -m Analyzers.validation

Then repeat the training process for the next epoch:

    python trainer.py
    python -m STDP_Communicator.train_stdp

Validation results are written to the validation_results directory. The validation results as of 23 February 2025 show that the model performs poorly on the routing and weight-sharing process. This is expected because the model has not yet been trained, so you have to train it for more epochs. Note that this is a new model, and the current validation results are not a reflection of its final performance.

After the first training run, you can start subsequent training by executing the main.py script.

Requirements

pip install -r requirements.txt

Installation Instructions

To set up the environment with the necessary dependencies for the TinyLanguageModel (TLM) system, follow these steps:

  1. Clone the Repository

  2. Create a Virtual Environment (Optional but Recommended)

  3. Install Dependencies

  4. Ensure you have pip updated to the latest version.

  5. Then install the required packages.

Dependency Details

  • torch==2.0.1: PyTorch is an open-source machine learning library used for applications such as computer vision and natural language processing. It provides flexibility and speed when building deep learning models.

  • tokenizers==0.13.3: The tokenizers library offers fast and efficient tokenization tools, essential for preprocessing text data in natural language processing tasks.

  • pandas==1.5.2: Pandas is a powerful data manipulation and analysis library for Python. It provides data structures like DataFrames, which are integral for handling and preprocessing datasets.

Additional Notes

  • Python Version: Ensure you are using Python 3.7 or higher.
  • CUDA Support: If you intend to utilize GPU acceleration, make sure you have the appropriate CUDA version installed that is compatible with the specified PyTorch version.
  • Environment Management: Using virtual environments (like venv or conda) is recommended to manage dependencies and avoid conflicts with other projects.


Troubleshooting

Torch Installation Issues:

If you encounter issues installing PyTorch, refer to the official PyTorch installation guide to select the correct version compatible with your system and CUDA installation.

Tokenizers Training Data:

Ensure that all the training files specified in tokenizer.py exist in the data/ directory. Missing files can cause the tokenizer training to fail.

Dataset Paths:

Verify that all dataset paths specified in config.json are correct and that the files are present in the designated locations.

Contact

For any issues or contributions, please open an issue or submit a pull request on the GitHub repository.


License

Mozilla Public License 2.0 (MPL 2.0)

File Layout

Wildnerve_Chatbot / Spaces Repo will contain the following files:

  • Chatbot front‑end & inference glue
    • app.py
    • api.py (or api_wp.py)
    • main.py
    • interface.py
  • Core services & registry
    • service_registry.py
    • communicator.py
    • communicator_STDP.py
  • Model data & preprocessing
    • preprocess.py
    • inference.py
    • dataset.py
    • dataloader.py
    • find_weights.py
    • repository_status.py
  • Training & tasks
    • trainer.py / train_model.py (if used at inference)
    • celery_app.py / tasks.py (if using Celery)
  • Deployment & dependencies
    • Dockerfile
    • docker‑compose.yml
    • entrypoint.sh
    • requirements_spaces.txt (or requirements.txt)
    • .dockerignore