
Margaux Ammour
mammour
·
AI & ML interests
Instead of looking for what is potentially harmful, better to grasp what we can already make happen
Artificially Augmented Intelligence advocate
Recent Activity
liked
a model
7 days ago
deepcogito/cogito-v1-preview-qwen-32B
Organizations
mammour's activity
reacted to
sr-rai's
post with 🤗
8 days ago
Post
2593
ExLlamaV3 is out. And it introduces EXL3 - a new SOTA quantization format!
"The conversion process is designed to be simple and efficient and requires only an input model (in HF format) and a target bitrate. By computing Hessians on the fly and thanks to a fused Viterbi kernel, the quantizer can convert a model in a single step, taking a couple of minutes for smaller models, up to a few hours for larger ones (70B+) (on a single RTX 4090 or equivalent GPU.)"
Repo: https://github.com/turboderp-org/exllamav3
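(For reference, conversion is a single command. The invocation below is hypothetical: the script name and flags are my assumption, modeled on the ExLlamaV2 converter's conventions, so check the repo README for the actual interface.)
λ python convert.py -i ./my-model-hf -o ./my-model-exl3 -b 4.0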
"The conversion process is designed to be simple and efficient and requires only an input model (in HF format) and a target bitrate. By computing Hessians on the fly and thanks to a fused Viterbi kernel, the quantizer can convert a model in a single step, taking a couple of minutes for smaller models, up to a few hours for larger ones (70B+) (on a single RTX 4090 or equivalent GPU.)"
Repo: https://github.com/turboderp-org/exllamav3
upvoted
a
paper
9 days ago
reacted to
etemiz's
post with 👍👍
28 days ago
Post
2827
My 1 year of work summarized.
TLDR: by carefully curating datasets we can fix misinformation in AI. Then we can use that to measure misinformation in other AI.
https://huggingface.co/blog/etemiz/building-a-beneficial-ai
commented on
syncIAL🍏: A Multi-Purpose Synthetic Debate and Argument Mapping Corpus
2 months ago
Following your example:
Your FoF-2 = FoF-1; as it stands, it biases the dataset by over-weighting/oversaturating the same argument as two different ones.
https://argdown.org/syntax/#equivalence-classes
They should look like this:
<Focus on Fundamentals>: Restricting access to fan fiction and social media in schools allows students to prioritize core academic subjects and develop a solid foundation in STEM fields, literature, and critical thinking.
<Focus on Fundamentals>: By limiting access to non-academic online content, schools can redirect students' attention to foundational subjects, fostering a stronger understanding of complex concepts and better retention of critical information.
leading to the following:
[Learning Over Leisure]: Schools should restrict students' access to fan fiction and social media to protect the integrity of education.
<- <Restriction Infringes on Freedom of Expression>: Restricting access to fan fiction and social media unconstitutionally limits students' right to freedom of expression and stifles their creativity.
<+ <Lifelong Learning>: By exercising their freedom of expression, students develop essential skills in critical thinking, problem-solving, and effective communication, preparing them for success in their future careers and personal lives.
<- <Echo Chamber Effect>: Exercising freedom of expression in an unstructured environment can create an echo chamber where students only communicate with like-minded individuals, failing to develop the skills to engage with diverse perspectives and opposing views.
<- <Silent Observer>: Developing skills to engage with diverse perspectives and opposing views is not essential for effective communication in situations where listening and observing, rather than actively engaging, is the most effective strategy.
<- <Fan Fiction Distortion>: Fan fiction and social media often distort students' creativity by promoting unoriginal and copyrighted content, rather than fostering genuine artistic expression.
<- <Artistic Evolution>: The value of artistic expression lies in its ability to evoke emotions and spark new ideas, regardless of whether it is original or builds upon existing works, making the distinction between original and unoriginal content irrelevant.
<+ <Innovation Incubator>: Unrestricted freedom of expression enables students to develop critical thinking, problem-solving, and communication skills, essential for academic and professional success.
<+ <Focus on Fundamentals>: Restricting access to fan fiction and social media in schools allows students to prioritize core academic subjects and develop a solid foundation in STEM fields, literature, and critical thinking.
<+ <Focus on Fundamentals>: By limiting access to non-academic online content, schools can redirect students' attention to foundational subjects, fostering a stronger understanding of complex concepts and better retention of critical information.
<+ <Knowledge Pyramid>: A strong grasp of foundational subjects allows students to recognize relationships between different ideas and concepts, creating a hierarchical structure of knowledge that enhances retention and recall of critical information.
Problem solved; now we need to fix the dataset.
Pass all the JSON files through:
#!/usr/bin/env python3
"""
Script to fix "almost duplicated" labels in a debate JSON.
It reads an input JSON file (with a "nodes" array where each node has a "label"),
finds labels that are very similar (according to a fuzzy-match threshold),
and then updates all such nodes to share a canonical label.
"""
import json
import sys
import logging
import argparse
from difflib import SequenceMatcher
from typing import List, Dict, Any
# Set up logging configuration
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
def similarity(a: str, b: str) -> float:
"""Return a similarity ratio between two strings (0 to 1)."""
return SequenceMatcher(None, a, b).ratio()
def cluster_labels(labels: List[str], threshold: float = 0.90) -> Dict[str, str]:
"""
Given a list of labels, return a dictionary mapping each label to a canonical label.
Two labels that are at least 'threshold' similar will be treated as duplicates.
(The first label encountered becomes the canonical version.)
"""
canonical: Dict[str, str] = {}
unique_labels = list(set(labels)) # unique labels in no particular order
unique_labels.sort() # sort for consistency
# Build clusters by iterating over the unique labels.
for i, label in enumerate(unique_labels):
if label in canonical:
continue
canonical[label] = label # label becomes its own canonical version
for other_label in unique_labels[i + 1:]:
if other_label in canonical:
continue
if similarity(label, other_label) >= threshold:
canonical[other_label] = label
return canonical
def fix_labels(data: Dict[str, Any], threshold: float = 0.90) -> Dict[str, Any]:
"""
Given a debate JSON object (with a "nodes" key), fix labels by unifying similar ones.
Returns the modified JSON object.
"""
if "nodes" not in data:
logging.error("No 'nodes' key found in JSON data.")
return data
nodes = data["nodes"]
if not isinstance(nodes, list):
logging.error("'nodes' should be a list.")
return data
# Extract all labels; if a node doesn't have a "label", default to an empty string.
labels = [node.get("label", "") for node in nodes if isinstance(node, dict)]
# Build mapping from each label to its canonical version.
mapping = cluster_labels(labels, threshold=threshold)
logging.info("Found %d unique labels; mapping to canonical labels:", len(mapping))
for key, canonical_label in mapping.items():
if key != canonical_label:
logging.info(" %r --> %r", key, canonical_label)
# Update each node's label using the mapping.
for node in nodes:
if isinstance(node, dict):
original_label = node.get("label", "")
if original_label in mapping:
node["label"] = mapping[original_label]
return data
def parse_args() -> argparse.Namespace:
"""Parse command-line arguments."""
parser = argparse.ArgumentParser(
description="Fix almost duplicated labels in a debate JSON file."
)
parser.add_argument("input_file", help="Path to the input JSON file.")
parser.add_argument("output_file", help="Path where the fixed JSON will be saved.")
parser.add_argument(
"--threshold", type=float, default=0.90,
help="Fuzzy matching threshold (default: 0.90)."
)
return parser.parse_args()
def main() -> None:
args = parse_args()
# Load JSON data from file with error handling.
try:
with open(args.input_file, "r", encoding="utf-8") as infile:
data = json.load(infile)
except FileNotFoundError:
logging.error("Input file '%s' not found.", args.input_file)
sys.exit(1)
except json.JSONDecodeError as e:
logging.error("Error decoding JSON from '%s': %s", args.input_file, e)
sys.exit(1)
except Exception as e:
logging.error("An unexpected error occurred while reading '%s': %s", args.input_file, e)
sys.exit(1)
# Fix labels in the data.
fixed_data = fix_labels(data, threshold=args.threshold)
# Write the fixed data to the output file with error handling.
try:
with open(args.output_file, "w", encoding="utf-8") as outfile:
json.dump(fixed_data, outfile, indent=2, ensure_ascii=False)
except Exception as e:
logging.error("An error occurred while writing to '%s': %s", args.output_file, e)
sys.exit(1)
logging.info("Fixed JSON written to '%s'", args.output_file)
if __name__ == "__main__":
main()
we get this stdout:
λ python fix_labels.py input.json output.json
INFO: Found 638 unique labels; mapping to canonical labels:
INFO: 'Algorithmic Bias Amplification' --> 'Algorithmic Amplification'
INFO: 'Biased Benchmarks' --> 'Biased Benchmark'
INFO: 'Crime Deterrent' --> 'Crime Deterrence'
INFO: 'Dataset Augmentation' --> 'Data Augmentation'
INFO: 'Data Deserts' --> 'Data Desert'
INFO: 'Diverse Datasets' --> 'Diverse Data Sets'
INFO: 'Surveillance Slippery Slope' --> 'Mass Surveillance Slippery Slope'
INFO: 'National Security Exemption' --> 'National Security Exception'
INFO: 'Protecting the Vulnerable:' --> 'Protecting the Vulnerable'
INFO: 'Redundant Safeguards' --> 'Redundancy Safeguard'
INFO: Fixed JSON written to 'output.json'
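If you want to sanity-check the threshold in isolation, here is a minimal sketch using the functions above (assuming the script is saved as fix_labels.py, as in the invocation):
from fix_labels import cluster_labels

# Two near-duplicate labels and one distinct label.
labels = ["Biased Benchmarks", "Biased Benchmark", "Crime Deterrence"]
print(cluster_labels(labels, threshold=0.90))
# {'Biased Benchmark': 'Biased Benchmark',
#  'Biased Benchmarks': 'Biased Benchmark',
#  'Crime Deterrence': 'Crime Deterrence'}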
All you need to do is adapt main and make a pass through; as it stands, your dataset is bad practice.
Credits: me, the Argdown docs, AI for [code review] and [error handling].
reacted to
singhsidhukuldeep's
post with 👍
7 months ago
Post
4000
Researchers have developed a novel approach called Logic-of-Thought (LoT) that significantly enhances the logical reasoning capabilities of large language models (LLMs).
Here are the steps on how Logic-of-Thought (LoT) is implemented:
-- 1. Logic Extraction
1. Use Large Language Models (LLMs) to identify sentences containing conditional reasoning relationships from the input context.
2. Generate a collection of sentences with logical relationships.
3. Use LLMs to extract the set of propositional symbols and logical expressions from the collection.
4. Identify propositions with similar meanings and represent them using identical propositional symbols.
5. Analyze the logical relationships between propositions based on their natural language descriptions.
6. Add negation (¬) for propositions that express opposite meanings.
7. Use implication (→) to connect propositional symbols when a conditional relationship exists.
-- 2. Logic Extension
1. Apply logical reasoning laws to the collection of logical expressions from the Logic Extraction phase.
2. Use a Python program to implement logical deduction and expand the expressions.
3. Apply logical laws such as Double Negation, Contraposition, and Transitivity to derive new logical expressions.
-- 3. Logic Translation
1. Use LLMs to translate the newly generated logical expressions into natural language descriptions.
2. Combine the natural language descriptions of propositional symbols according to the extended logical expressions.
3. Incorporate the translated logical information as a new part of the original input prompt.
-- 4. Integration with Existing Prompting Methods
1. Combine the LoT-generated logical information with the original prompt.
2. Use this enhanced prompt with existing prompting methods like Chain-of-Thought (CoT), Self-Consistency (SC), or Tree-of-Thoughts (ToT).
3. Feed the augmented prompt to the LLM to generate the final answer.
What do you think about LoT?
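For what it's worth, here is a minimal Python sketch of how those four steps could be wired together. The prompts, the llm callable, and the restriction to implication and negation are my assumptions for illustration, not the paper's actual implementation:
from typing import Callable, Set, Tuple

# An implication ("p", "q") encodes p → q, where the symbols stand for
# natural-language propositions extracted by the LLM in step 1.
Implication = Tuple[str, str]

def neg(s: str) -> str:
    """Negate a propositional symbol, collapsing double negation (¬¬p = p)."""
    return s[1:] if s.startswith("¬") else "¬" + s

def extend(exprs: Set[Implication]) -> Set[Implication]:
    """Step 2, Logic Extension: close the set under Contraposition
    (p→q gives ¬q→¬p) and Transitivity (p→q, q→r gives p→r)."""
    closed = set(exprs)
    changed = True
    while changed:
        changed = False
        for p, q in list(closed):
            derived = {(neg(q), neg(p))}                         # Contraposition
            derived |= {(p, r) for q2, r in closed if q2 == q}   # Transitivity
            new = derived - closed
            if new:
                closed |= new
                changed = True
    return closed

def logic_of_thought(context: str, question: str, llm: Callable[[str], str]) -> str:
    # Step 1, Logic Extraction: have the LLM symbolize conditional relations.
    raw = llm("List the conditional relations in this text, one per line, "
              "as 'A -> B':\n" + context)
    exprs: Set[Implication] = set()
    for line in raw.splitlines():
        parts = [part.strip() for part in line.split(" -> ")]
        if len(parts) == 2:
            exprs.add((parts[0], parts[1]))
    # Step 2, Logic Extension: deduce new expressions programmatically.
    new_exprs = extend(exprs) - exprs
    # Step 3, Logic Translation: turn the new expressions back into prose.
    translated = llm("Translate these implications into natural language:\n"
                     + "\n".join(f"{p} → {q}" for p, q in new_exprs))
    # Step 4, Integration: append the translated logic to the original prompt
    # (plain prompting here; CoT, SC, or ToT could be layered on top).
    return llm(f"{context}\n\nAdditional logical information:\n{translated}\n\n{question}")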