Infinity reasoning with this prompt

#25
by celsowm - opened

You are a lawyer specializing in Civil Law, and your task is to draft an initial petition for a debt-collection action (ação de cobrança), using only the factual information provided below. Draw on your legal knowledge, applying the technical grounds and rules pertinent to the case, and present the draft in formal, structured language, with the chapters on the facts and on the law written as running prose.
Case Information:

Plaintiff: Carlos Almeida, Brazilian, engineer, CPF 123.456.789-01, residing at Rua das Palmeiras, nº 123, Salvador/BA.
Defendant: Construtora Beta Ltda., CNPJ 98.765.432/0001-09, headquartered at Av. das Torres, nº 456, Salvador/BA.
The plaintiff is a service provider who entered into a contract with the defendant on 01/09/2023 for the performance of technical consulting services in the total amount of R$ 50.000,00. The service was duly performed and completed on 15/09/2023, as per the technical report issued.
The defendant should have made payment by 15/10/2023, under the contract signed between the parties. Despite several extrajudicial notices sent between 01/11/2023 and 15/11/2023, the defendant remained in default, offering no justification for the non-payment.
Requests:
Collection of the amount of R$ 50.000,00, plus:
Default interest of 1% per month from the due date.
Contractual penalty of 2% and monetary correction according to the official index.
An order that the defendant pay the court costs and attorney's fees of 10% of the value of the claim.
Competent Venue: Judicial District (Comarca) of Salvador/BA, Civil Court.

I think the community may need a little clearer description of what you are asking.

If the issue is that it gets stuck in reasoning forever and never replies with an answer, there are a lot of possibilities. One first-glance possibility is that it is going into a loop of repetition.

Applying a repetition_penalty of 1.1 or more may help if repetition is the cause, assuming this model and/or the inference provider you are using supports it.
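For intuition, repetition penalty (as implemented in transformers' RepetitionPenaltyLogitsProcessor, following the CTRL paper) rescales the logits of tokens that already appear in the context, so re-sampling them becomes less likely. A minimal pure-Python sketch of that rule (the function name is mine, not a library API):

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.1):
    """Penalize logits of tokens already present in the context.

    Positive logits are divided by the penalty and negative logits
    are multiplied by it, so both move toward 'less likely'.
    """
    adjusted = list(logits)
    for tok in set(seen_token_ids):
        score = adjusted[tok]
        adjusted[tok] = score / penalty if score > 0 else score * penalty
    return adjusted

# Token 0 was already generated, so its logit 2.0 shrinks toward 0;
# token 2's negative logit -1.0 moves further below 0.
print(apply_repetition_penalty([2.0, 1.0, -1.0], seen_token_ids=[0, 2], penalty=2.0))
# → [1.0, 1.0, -2.0]
```

With penalty=1.0 the logits are unchanged, which is why the default setting does nothing to break a loop.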

For context: one easily correctable behavior that small language models sometimes exhibit is diverging from the writing task into a cycle of endlessly repeating the same phrase, sentence, or paragraph, after having written several sensible paragraphs consistent with the prompt:

Example: ask a small LM: 'If you as an LLM exhibited this behavior: "Infinity reasoning with this prompt", how would an engineer fix this problem?'

Get the response:



The phrase "Infinity reasoning with this prompt" suggests that an LLM might be caught in a loop or recursive process that doesn’t terminate, effectively reasoning forever. While this is a contrived scenario, here are several strategies that a system might use to “fix” or prevent such behavior:

    Recursion or Iteration Limits
    Implementing a maximum depth or a predefined iteration limit can help the system recognize when it is cycling through the same reasoning patterns. Once the limit is reached, the system can either return a partial answer or provide a safe default response.

    Early Termination Conditions
    Design the system so it can detect repeating reasoning patterns or unproductive cycles. This might include checks for repetition of internal thought process tokens or recognizing that the solution isn’t progressing. Upon such detection, the system can trigger a break-out mechanism.

Stop Tokens & Timeouts may also help with this problem.
Stop Tokens & Timeouts may also help with this problem.
Stop Tokens & Timeouts may also help with this problem.
Stop Tokens & Timeouts may also help with this problem.
Stop Tokens & Timeouts may also help with this problem.
Stop Tokens & Timeouts may also help with this problem.
Stop Tokens & Timeouts may also help with this problem.
....
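The early-termination idea in the quoted response can be made concrete: check whether the tail of the generated sequence is just the same block of tokens repeated. A rough sketch (function name and thresholds are my own, not a library API):

```python
def ends_in_repetition(tokens, max_period=50, min_repeats=3):
    """Return True if `tokens` ends with the same block of up to
    `max_period` elements repeated at least `min_repeats` times."""
    for period in range(1, max_period + 1):
        if len(tokens) < period * min_repeats:
            break
        block = tokens[-period:]
        if all(
            tokens[-(i + 1) * period : len(tokens) - i * period] == block
            for i in range(min_repeats)
        ):
            return True
    return False

# The loop above, at the sentence level:
looped = ["Stop Tokens & Timeouts may also help with this problem."] * 7
print(ends_in_repetition(looped, max_period=3))   # → True
print(ends_in_repetition(list("abcdef")))         # → False
```

A check like this could be plugged into a custom transformers StoppingCriteria so generation breaks out of the loop instead of burning the whole max_new_tokens budget.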

This described behavior may be what is happening in the ... section, but with this little information to go by, there are many other possibilities.

I would also consider your max_new_tokens setting and the hardware you are on. Could it just be under-resourced and running slowly?

I got this loop here in the local chat on the model page.

Try this in Google Colab (T4 High-RAM kernel) or any suitable environment with around 65 TFLOPS / 16 GB of GPU RAM and 32 GB of regular system RAM.

modified from: colab.research.google.com/#fileId=https%3A//huggingface.co/HuggingFaceTB/SmolLM3-3B/blob/main/notebook.ipynb

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceTB/SmolLM3-3B"
device = "cuda" # for GPU usage or "cpu" for CPU usage

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
).to(device)

# prepare the model input
PROMPT = """

Replace this with the prompt you are asking the model.
...
...
"""


messages_think = [
    {"role": "user", "content": PROMPT}
]

text = tokenizer.apply_chat_template(
    messages_think,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the output

# <---------------------------------------------------------------------<<
# Pay attention here
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,    # <-------<< Reducing this may help, but try leaving it as-is unless the other changes fail
    do_sample=True,          # <-------<< temperature/top_k/top_p have no effect unless this is set to True
    temperature=0.6,
    top_k=40,
    top_p=0.95,
    repetition_penalty=1.1,  # <-------<< Setting this to ~1.1 or maybe 1.2 **may** prevent the problem
)

# Get and decode the output
output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
print(tokenizer.decode(output_ids, skip_special_tokens=True))

I'm not 100% sure the parameter repetition_penalty is supported for this model:

  • I do know it is supported for HuggingFaceTB/SmolLM2-1.7B-Instruct, so it probably will work ...
  • I can't test it myself at the moment, because I don't have access to anything with enough RAM to run this 3B model, nor inference credits ...
