The Best Generative GPT-2 Model For The Serbian Language
NOTE: This model is locked with a key. If you need the decryption key, feel free to contact us at [email protected]
By sharing this model, we aim to foster further research and applications in Serbian language processing.
Introduction:
This GPT-2 model has been tuned on an extensive Serbian corpus of 750 million tokens. It is designed to generate high-quality Serbian text that captures the nuances of the language.
Dataset Details:
The dataset encompasses a diverse range of topics, representing various aspects of the Serbian language and culture. Size: 750 million tokens.
Model Usage:
This model can be used for various NLP tasks such as text generation, summarization, and translation. Because it was trained on a large Serbian corpus, it is intended to produce accurate, contextually relevant output, especially for tasks involving the Serbian language.
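As a rough illustration of the text-generation use case, the sketch below wraps the standard transformers generation calls. It assumes the decrypted model files (see the download and decryption steps) are stored in a local directory. The names `generate_text` and `load_and_generate`, the sampling settings, and the example prompt are illustrative choices, not values recommended by the authors.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=50):
    """Sample a continuation of `prompt` and return it as a string."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sample instead of greedy decoding
        top_p=0.95,      # nucleus sampling threshold (assumed value)
        pad_token_id=tokenizer.eos_token_id,  # avoids a warning if no pad token is set
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def load_and_generate(model_dir, prompt):
    """Load tokenizer and model from a local directory, then generate."""
    # Imported lazily so generate_text can be reused without importing transformers.
    from transformers import AutoTokenizer, GPT2LMHeadModel

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = GPT2LMHeadModel.from_pretrained(model_dir)
    return generate_text(model, tokenizer, prompt)


# Example call (hypothetical path; requires the decrypted model files):
# print(load_and_generate("path/to/where/you/want/to/store/the/model", "Beograd je"))
```

Because sampling is enabled, repeated calls with the same prompt will generally return different continuations.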
Downloading & Decrypting the Model:
import os
import requests
import shutil
import threading
import time
from transformers import GPT2LMHeadModel
from cryptography.fernet import Fernet

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # Silence TensorFlow log output

# Download the Serbian-GPT-2 model files
print("\nDownloading the Serbian-GPT-2 model...")
model_name = 'edukom/Serbian-GPT-2'
base_url = f'https://huggingface.co/{model_name}/resolve/main/'
files_to_download = ['added_tokens.json', 'config.json', 'generation_config.json', 'merges.txt', 'pytorch_model.bin', 'special_tokens_map.json', 'tokenizer.json', 'tokenizer_config.json', 'vocab.json']
cache_dir = 'path/to/where/you/want/to/store/the/model'
os.makedirs(cache_dir, exist_ok=True)
for file in files_to_download:
    response = requests.get(base_url + file)
    response.raise_for_status()  # Fail early rather than saving an error page
    with open(os.path.join(cache_dir, file), 'wb') as f:
        f.write(response.content)

# Decrypt pytorch_model.bin in place
key = input("\nEnter the decryption key: ").encode()
decryption_data = os.path.join(cache_dir, 'pytorch_model.bin')
try:
    cipher_suite = Fernet(key)  # Raises ValueError if the key is malformed
    with open(decryption_data, 'rb') as file:
        encrypted_data = file.read()
    decrypted_data = cipher_suite.decrypt(encrypted_data)
    with open(decryption_data, 'wb') as file:
        file.write(decrypted_data)

    def find_and_copy():
        # Wait until from_pretrained creates its snapshot directory,
        # then place the decrypted weights inside it.
        base_snapshot_dir = os.path.join(cache_dir, 'models--edukom--Serbian-GPT-2', 'snapshots')
        while not os.path.exists(base_snapshot_dir):
            time.sleep(0.1)
        while True:
            existing_dirs = [d for d in os.listdir(base_snapshot_dir) if os.path.isdir(os.path.join(base_snapshot_dir, d))]
            if existing_dirs:
                destination_path = os.path.join(base_snapshot_dir, existing_dirs[0], 'pytorch_model.bin')
                shutil.copyfile(decryption_data, destination_path)
                break
            time.sleep(0.1)

    # Start the copy process in parallel (daemon, so a failed load cannot hang the script)
    copy_thread = threading.Thread(target=find_and_copy, name="find_and_copy", daemon=True)
    copy_thread.start()

    # Load the Serbian-GPT-2 model
    model = GPT2LMHeadModel.from_pretrained(model_name, cache_dir=cache_dir)

    # Ensure the copying finishes
    copy_thread.join()
    print("\nCongratulations, the Serbian-GPT-2 model is ready for use ヅ\n")
except Exception as e:
    print(f"\nError during decryption or loading: {e}")
    print("\nYou can decrypt the model by contacting the author of this model who will add the key, email: [email protected]")

# Now you can use the Serbian-GPT-2 model for further operations...
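The script above hands whatever is typed at the prompt to `Fernet`. A Fernet key is 32 bytes encoded as URL-safe base64, so a quick shape check can give a clearer error message for a truncated or mistyped key. The sketch below uses only the standard library. `looks_like_fernet_key` is a hypothetical helper, not part of the original script or of the cryptography package, and it checks the key's format only, not whether it is the right key for this model.

```python
import base64
import binascii


def looks_like_fernet_key(key: bytes) -> bool:
    """Return True if `key` is URL-safe base64 that decodes to exactly 32 bytes."""
    try:
        return len(base64.urlsafe_b64decode(key)) == 32
    except (binascii.Error, ValueError):
        # Raised for malformed base64 input, for example incorrect padding.
        return False
```

It could be called right after reading the key, for example `if not looks_like_fernet_key(key): print("That does not look like a valid key.")`, before any decryption is attempted.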
Model Usage License:
The author of this model is the company Edukom AI. The model is protected by encryption and its use requires a decryption key.
This model is available under the following license:
For private and non-public use: This model is freely available for use without any additional obligations. You can use it in your internal projects and experiments without any restrictions.
For commercial use: Commercial users must contact Edukom AI to obtain the appropriate license and agreement.
Please adhere to the license terms when using this model. For any questions or if you need decryption keys, feel free to contact us at [email protected]
Thank you for using our model! ヅ