Commit 8cec78f · Parent: 8c6a471

Migration to RIET-lab/moral-kg-workshop-listener
Files changed:

- .gitignore               +1   -2
- README.md                +0   -1
- SETUP.md                 +0   -116
- SPEC.md                  +0   -132
- config.yaml              +2   -4
- setup.py                 +0   -221
- setup.sh                 +0   -16
- utils/__init__.py        +0   -145
- utils/dataset_utils.py   +0   -351
- utils/log_utils.py       +0   -292
- utils/phase1_utils.py    +0   -240
- utils/setup_utils.py     +0   -200
- utils/user_utils.py      +0   -208
- utils/webhook_utils.py   +0   -340
- utils/wipe_utils.py      +0   -164
- utils/workspace_utils.py +0   -387
- wipe.py                  +0   -165
- wipe.sh                  +0   -16
.gitignore
CHANGED

```diff
@@ -1,5 +1,4 @@
 # Env file nor User config file not to be uploaded to the HF Space!
 .env
-users/
 archive/
-*__pycache__*
+*__pycache__*
```
README.md
CHANGED

```diff
@@ -21,7 +21,6 @@ Part of RIET Lab's initiative to improve AI using moral reasoning.
 **Organization**: [https://huggingface.co/RIET-lab](https://huggingface.co/RIET-lab)
 **Workshop Website**: [https://sites.google.com/view/mereworkshop](https://sites.google.com/view/mereworkshop)
 **Argdown Documentation**:[https://argdown.org/guide/](https://argdown.org/guide/)
-**Listener HF Space**: [https://huggingface.co/spaces/RIET-lab/moral-kg-workshop-listener](https://huggingface.co/spaces/RIET-lab/moral-kg-workshop-listener)
 
 - Creating your own Argilla Spaces, check the [quickstart guide](http://docs.argilla.io/latest/getting_started/quickstart/) and the [Hugging Face Spaces configuration](http://docs.argilla.io/latest/getting_started/how-to-configure-argilla-on-huggingface/) for more details.
 - Discovering the Argilla UI, sign in with your Hugging Face account!
```
SETUP.md
DELETED
@@ -1,116 +0,0 @@

# MERe Workshop Setup Guide

Setup and usage guide for the MERe Workshop dataset annotation process.
A very important note: much of this infrastructure is to avoid paying for a space - there is NO persistent storage in `moral-kg-workshop`.

## Environment

### Required Environment Variables

```bash
export ARGILLA_API_URL="your-argilla-url"
export ARGILLA_API_KEY="your-api-key"
export HF_TOKEN="your-huggingface-token"
```

### Optional Environment Variables

```bash
export SLACK_WEBHOOK_URL="your-slack-webhook-url"
# For error notifications to a Slack channel.
# Requires a custom Slack app setup with a webhook URL.
# See https://api.slack.com/messaging/webhooks
```

### Dependencies

Install required Python packages:
```bash
pip install -r requirements.txt
```

## Configuration

See `config.yaml`

## Space Setup

### Complete Setup

Run all setup operations (users, datasets, webhooks):
```bash
./setup.sh
# or
python setup.py
```

### Partial Setup

Skip specific operations:
```bash
python setup.py --skip-users       # Skip user creation
python setup.py --skip-workspaces  # Skip workspace creation (breaks dataset allocation)
python setup.py --skip-datasets    # Skip dataset creation
python setup.py --skip-webhooks    # Skip webhook creation
```

### Status Check Only

View current space status without making changes:
```bash
python setup.py --status-only
```

## Wipe Operations

### Complete Wipe

Remove everything (users, workspaces, datasets, webhooks):
```bash
./wipe.sh
# or
python3 wipe.py
```

### Selective Wipe

Remove specific components:
```bash
python wipe.py --datasets-only   # Only datasets
python wipe.py --users-only      # Only users
python wipe.py --webhooks-only   # Only webhooks
```

### Force Wipe

Skip confirmation prompts:
```bash
python wipe.py --force
```

### Status Check Only

View current space status without making changes:
```bash
python wipe.py --status-only
```

## Troubleshooting

### Debug Mode

For detailed debugging, set log level to DEBUG in `config.yaml`:
```yaml
logging:
  level: "DEBUG"
```

### Status Commands

```bash
# Check status during setup
python3 setup.py --status-only

# Check status during wipe
python3 wipe.py --status-only
```
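For reference, sending a message to a Slack incoming webhook (the optional `SLACK_WEBHOOK_URL` above) is a single JSON POST; the deleted `log_utils.py` further down uses exactly this pattern. A minimal sketch, assuming `requests` is installed and the variable is exported as in the deleted guide:

```python
import os

import requests

# Minimal sketch: post a test message to the Slack incoming webhook.
# Assumes SLACK_WEBHOOK_URL is set as described in SETUP.md above.
webhook_url = os.environ["SLACK_WEBHOOK_URL"]
response = requests.post(
    webhook_url,
    json={"text": "MERe Workshop: test notification"},
    headers={"Content-Type": "application/json"},
    timeout=10,
)
# Slack returns HTTP 200 on success.
response.raise_for_status()
```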
SPEC.md
DELETED
@@ -1,132 +0,0 @@

# Moral-kg annotation process setup
Notes on the annotation pipeline / data ETL process

## Annotation Pipeline Architecture
Annotation occurs in two phases. During **Phase 1** annotators determine which set
of claims best represents the argument of each paper. During **Phase 2** annotators
map those claims into an Argument Map.

### Setup
1. Create users and user-specific workspaces based off of `users.csv` list
2. Create Phase 1 dataset with records for each user
   - NOTE: depending on space constraints, we could do one Phase 1 dataset for
     all users as only Phase 2 is user-response-dependent.
3. Create webhooks.

### Phase 1 Argilla Dataset
Creation:
- At startup
Records Input:
- Manual batch input via HF dataset `moral-kg-sample`
Response Output:
- Real-time webhook to HF dataset `moral-kg-sample-labels`
- Real-time webhook to Argilla Phase 2 dataset records
Updates:
- Only if moral-kg-sample is updated (this is handled manually)
Fields:
- Title (Author, Year) "title_info"
- Text "text"
Metadata:
- Identifier (visible to annotators) "id"
Questions:
- TextQuestion "claims"
  - Users list the claims which best represent the argument in the paper
  - AI/ML-generated claims are proposed in a list as a suggestion
Webhooks:
- Listen if the dataset is ever published (it shouldn't be) and notify admin if
  it is.
- Response created/updated/deleted -> update `moral-kg-sample-labels`
  -> update Argilla Phase 2 records

### Phase 2 Argilla Dataset
Creation:
- When the first Phase 1 response is created
Records Input:
- Real-time webhook `response.created`/`.updated`/`.deleted` from Phase 1
Response Output:
- Real-time webhook to HF dataset `moral-kg-sample-maps`
Updates:
- When a Phase 1 response is created/updated/deleted
Fields:
- Title (Author, Year) "title_info"
- Argdown Page "argdown"
- Text "text"
Metadata:
- Identifier (visible to annotators) "id"
Questions:
- TextQuestion "argmap"
  - Users are asked to copy and paste their final Argdown input into this box
    as the solution.
Webhooks:
- Listen if the dataset is ever published (it shouldn't be) and notify admin if
  it is.
- Response created/updated/deleted -> update `moral-kg-sample-maps`

## HuggingFace Datasets
There are three huggingface datasets that will be involved in the annotation
process: `moral-kg-sample`, `moral-kg-sample-labels`, and `moral-kg-sample-maps`.

### `moral-kg-sample` (private)
Will store the data associated with each paper in the sample:
- identifier | str      | The Phil-Papers ID associated with each paper
- title      | str      | The title of the paper
- authors    | list:str | The authors attributed to the paper
- year       | str      | The publication year of the paper
- text       | str      | The paper content (in plain text or markdown)
- map        | dict     | The claim:method map that contains each claim
                          extracted from the text and its associated
                          extraction method.

### `moral-kg-sample-labels` (private)
Will store data associated with the claims annotators select for each paper in
the sample:
- identifier | str  | The Phil-Papers ID associated with each paper
- annotator  | str  | The annotator's unique Argilla UUID
- map        | dict | The claim:method map that contains each claim the
                       annotator selects as representative of the paper.
                       Claims not found in the original map are labeled
                       "annotator"

### `moral-kg-sample-maps` (private)
Will store data associated with the argument maps annotators create for each
paper in the sample:
- identifier | str  | The Phil-Papers ID associated with each paper
- annotator  | str  | The annotator's unique Argilla UUID
- argmap     | dict | The argument map (in Argdown format) that
                       represents the paper argument structure.

## Webhooks

### dataset.published
- Stretch goal: implement Slack notification. For now just log that a dataset
  was published.

### response.created
IF data.data.values contains "claims":
- This means it is a phase 1 response
ELSE IF data.data.values contains "argmap":
- This means it is a phase 2 response

### response.updated
IF data.record.questions.name contains "claims":
-
ELSE IF data.record.questions.name contains "argmap":
-

### response.deleted
IF data.record.questions.name contains "claims":
-
ELSE IF data.record.questions.name contains "argmap":
-

## Notes, Comments, and Questions
- I assume that our ultimate moral-kg dataset, that which makes up the entirety
  of the KG and will be public, will be in a separate HF dataset.
- There are no user event webhooks so we must either:
  1. batch create users or
  2. poll every second during the workshop or
  3. track OAuth sign-ins
- Should we put a link to the website pdf alongside its processed text?
- For Phase 2 argmap building: ideally we are able to extract the user text
  inputted into the iFrame, but I'm not confident we will be able to, so this
  solution suffices for now.
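The `response.*` routing in the spec above is only pseudocode. A sketch of how a listener might dispatch on it, where the payload paths `data.data.values` and `data.record.questions` follow the spec's own notation and are assumptions, not a verified Argilla webhook schema:

```python
def route_response_event(event_type: str, payload: dict) -> str:
    """Classify a webhook payload as a Phase 1 or Phase 2 response.

    Sketch only: the payload shape is taken from the pseudocode in SPEC.md
    above, not from a confirmed Argilla payload schema.
    """
    if event_type == "response.created":
        # response.created carries the submitted answer values directly.
        values = payload.get("data", {}).get("values", {})
        keys = set(values.keys())
    else:  # response.updated / response.deleted
        # Later events are matched against the record's question names.
        questions = payload.get("data", {}).get("record", {}).get("questions", [])
        keys = {q.get("name") for q in questions}

    if "claims" in keys:
        return "phase1"  # update moral-kg-sample-labels and Phase 2 records
    if "argmap" in keys:
        return "phase2"  # update moral-kg-sample-maps
    return "unknown"
```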
config.yaml
CHANGED

```diff
@@ -1,4 +1,6 @@
 # moral-kg-workshop config
+#
+# NOTE: See moral-kg-workshop-listener config for updates!
 
 # File Paths Configuration
 paths:
@@ -83,10 +85,6 @@ phase1:
 logging:
   level: "INFO"
   format: "[%(asctime)s] [%(name)s] [%(levelname)s] - %(message)s"
-  # External library log levels (set to WARNING/ERROR to reduce verbosity)
-  external_libraries:
-    httpx: "WARNING"
-    argilla.sdk: "WARNING"
 
 # Error Handling Configuration
 error_handling:
```
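Throughout the deleted utilities below, `get_config()` returns an object whose `.get()` accepts dotted paths such as `'phase1.dataset_name'`. The wrapper itself lives in `setup_utils.py`, which is truncated at the end of this diff, so here is a hedged sketch of one way such a lookup could work over the parsed `config.yaml` (the helper name `get_dotted` is hypothetical):

```python
from typing import Any

import yaml


def get_dotted(config: dict, path: str, default: Any = None) -> Any:
    """Resolve a dotted key like 'phase1.dataset_name' against nested dicts.

    Hypothetical helper: the real Config wrapper in setup_utils.py is not
    visible in this diff; this only illustrates the dotted-path convention
    behind calls like _config.get('datasets.sample').
    """
    node: Any = config
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


with open("config.yaml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

dataset_name = get_dotted(cfg, "phase1.dataset_name", "Phase 1")
```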
setup.py
DELETED
@@ -1,221 +0,0 @@

```python
#!/usr/bin/env python3

"""
setup.py

Setup script for the MERe Workshop Argilla Hugging Face space. This is the
primary annotation pipeline. Creates users, workspaces, datasets, and webhooks.
"""

import argparse
import json
import os

from huggingface_hub import HfApi

from utils import (
    validate_env,
    log_operation_success,
    log_operation_failure,
    get_status,
    log_info,
    log_warning,
    create_users,
    create_user_workspaces,
    create_webhooks,
    create_phase1_datasets,
    list_users,
    get_config,
)


def parse_args():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="Setup MERe Workshop Argilla Hugging Face space",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

    parser.add_argument(
        "-u", "--skip-users",
        action="store_true",
        help="Skip user creation step"
    )

    parser.add_argument(
        "-w", "--skip-workspaces",
        action="store_true",
        help="Skip workspace creation and user assignment step"
    )

    parser.add_argument(
        "-d", "--skip-datasets",
        action="store_true",
        help="Skip dataset creation step"
    )

    parser.add_argument(
        "-l", "--skip-listener",
        action="store_true",
        help="Skip restarting the listener space (skips webhook creation step)."
    )

    parser.add_argument(
        "-s",
        "--status-only",
        action="store_true",
        help="Only show current space status, do not perform setup",
    )

    return parser.parse_args()


def restart_listener():
    """Start the RIET-lab/moral-kg-workshop-listener space."""
    try:
        api = HfApi(token=os.getenv("HF_TOKEN"))
        api.restart_space(repo_id="RIET-lab/moral-kg-workshop-listener")
        log_operation_success("restart listener space", "Space restart initiated successfully")
        return True
    except Exception as e:
        log_operation_failure("restart listener space", e)
        return False


def show_space_status():
    """Display current space status."""
    status = get_status()

    if "error" in status:
        log_operation_failure("check space status", status["error"])
        return False

    print()
    log_info("=== Current Argilla Space Status ===")
    log_info(f"Workspaces: {status['workspaces']}")
    log_info(f"Users: {status['users']}")
    log_info(f"Datasets: {status['datasets']}")
    log_info(f"Records: {status['records']}")
    log_info(f"Webhooks: {status['webhooks']}")
    print()

    return True


def track_user_info(
    filepath=None
):
    """Store Argilla user info to a file or log them if no file is provided."""
    users = list_users()

    if filepath:
        try:
            with open(filepath, 'w', encoding='utf-8') as f:
                json.dump(users, f, indent=2)
            log_info(f"User ID map written to {filepath}")
        except Exception as e:
            log_operation_failure("map user ids", e)
    else:
        log_info(f"User ID map: {users}")


def main():
    """Main setup function."""
    args = parse_args()
    config = get_config()

    # Validate environment
    try:
        validate_env()
        log_operation_success("setup validation", "Environment validated")
    except Exception as e:
        log_operation_failure("setup validation", e)
        return 1

    # Show current status
    if not show_space_status():
        return 1

    # If status-only mode, exit here
    if args.status_only:
        return 0

    # Track overall success
    operations_success = []

    # Step 1: Create users
    if not args.skip_users:
        print()
        log_info("Creating users...")
        success = create_users()
        operations_success.append(success)

        if success:
            log_info("Success: Users created successfully")
            # Track user profiles after creation so we can map users to their UUIDs
            track_user_info(config.get('paths', {}).get('users_info', None))
        else:
            log_info("Failed: Could not create users")
    else:
        log_info("Skipping user creation")

    # Step 2: Create workspaces
    if not args.skip_workspaces:
        print()
        log_info("Creating workspaces and assigning users...")
        success = create_user_workspaces()
        operations_success.append(success)

        if success:
            log_info("Success: Workspaces created and users assigned successfully")
        else:
            log_info("Failed: Could not create workspaces and assign users")
    else:
        log_info("Skipping workspace creation and user assignment")

    # Step 3: Create datasets
    if not args.skip_datasets:
        print()
        log_info("Creating datasets...")
        success = create_phase1_datasets()
        operations_success.append(success)

        if success:
            log_info("Success: Datasets created successfully")
        else:
            log_info("Failed: Could not create datasets")
    else:
        log_info("Skipping dataset creation")

    # Step 4: Restart listener to create webhooks
    if not args.skip_listener:
        print()
        log_info("Restarting RIET-lab/moral-kg-workshop-listener space...")
        success = restart_listener()
        if success:
            log_info("Success: Listener space restart initiated")
        else:
            log_info("Failed: Could not restart listener space")
        return 0 if success else 1

    # Show final status
    show_space_status()

    # Overall result
    if operations_success:
        successful_count = sum(operations_success)
        total_count = len(operations_success)

        if successful_count == total_count:
            log_operation_success("complete setup", "All operations completed successfully", send_to_slack=True)
            return 0
        else:
            log_operation_failure("complete setup", Exception("Some or all operations failed"), send_to_slack=True)
            return 1
    else:
        log_operation_success("complete setup", "No operations were required", send_to_slack=True)
        return 0


if __name__ == "__main__":
    exit(main())
```
setup.sh
DELETED
@@ -1,16 +0,0 @@

```bash
#!/bin/bash

# setup.sh
#
# Shell wrapper for the MERe Workshop setup process.

set -euo pipefail

# Get the directory where this script is located
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &> /dev/null && pwd)"

# Change to the script directory
cd "$SCRIPT_DIR"

# Run the setup script
python setup.py "$@"
```
utils/__init__.py
DELETED
@@ -1,145 +0,0 @@

```python
"""
utils package for MERe Workshop annotation pipeline

This package provides utilities for:
- Configuration management (setup_utils)
- Logging and notifications (log_utils)
- Argilla Phase 1 dataset and Hugging Face dataset management (dataset_utils)
- User management (user_utils.py)
- Argilla webhook management (webhook_utils.py)
- Argilla space wiping/status (wipe_utils)
"""

from .setup_utils import (
    get_root,
    get_config,
    get_client,
    get_hf_api,
    validate_env,
    load_users,
)
from .log_utils import (
    log_error,
    log_warning,
    log_info,
    log_operation_success,
    log_operation_failure,
    log_dataset_operation,
    log_user_operation,
    log_webhook_operation,
)
from .dataset_utils import (
    create_dataset,
    delete_datasets,
    delete_dataset,
    list_datasets,
    update_datasets,
    update_dataset,
    load_moral_kg_sample,
)
from .phase1_utils import (
    create_phase1_datasets,
    delete_phase1_datasets,
    update_phase1_datasets,
)
from .user_utils import (
    create_users,
    create_user,
    delete_users,
    delete_user,
    list_users,
)
from .workspace_utils import (
    create_workspaces,
    create_workspace,
    create_user_workspaces,
    create_user_workspace,
    delete_workspaces,
    delete_workspace,
    delete_user_workspaces,
    delete_user_workspace,
    list_workspaces,
    list_user_workspaces,
)
from .webhook_utils import (
    create_webhooks,
    create_webhook,
    delete_webhooks,
    delete_webhook,
    list_webhooks,
    list_webhook_events,
    update_webhooks,
    update_webhook,
    validate_webhooks,
    webhook_exists,
)
from .wipe_utils import (
    get_status,
    wipe_space,
    wipe_datasets_only,
    wipe_users_only,
    wipe_webhooks_only,
)

__all__ = [
    "get_root",
    "get_config",
    "get_client",
    "get_hf_api",
    "validate_env",
    "load_users",

    "log_error",
    "log_warning",
    "log_info",
    "log_operation_success",
    "log_operation_failure",
    "log_dataset_operation",
    "log_user_operation",
    "log_webhook_operation",

    "create_phase1_datasets",
    "create_dataset",
    "delete_phase1_datasets",
    "delete_datasets",
    "delete_dataset",
    "list_datasets",
    "update_phase1_datasets",
    "update_datasets",
    "update_dataset",
    "load_moral_kg_sample",

    "create_users",
    "create_user",
    "delete_users",
    "delete_user",
    "list_users",

    "create_workspaces",
    "create_workspace",
    "create_user_workspaces",
    "create_user_workspace",
    "delete_workspaces",
    "delete_workspace",
    "delete_user_workspaces",
    "delete_user_workspace",
    "list_workspaces",
    "list_user_workspaces",

    "create_webhooks",
    "create_webhook",
    "delete_webhooks",
    "delete_webhook",
    "list_webhooks",
    "list_webhook_events",
    "update_webhooks",
    "update_webhook",
    "validate_webhooks",
    "webhook_exists",

    "get_status",
    "wipe_space",
    "wipe_datasets_only",
    "wipe_users_only",
    "wipe_webhooks_only",
]
```
utils/dataset_utils.py
DELETED
@@ -1,351 +0,0 @@

```python
"""
dataset_utils.py

Helper functions for dataset creation and management in the MERe Workshop annotation pipeline.
Transformed from the create-datasets.py script to follow a proper helper function paradigm.
"""

import os
import warnings
from typing import Dict, List, Optional

import argilla as rg
from datasets import load_dataset

from .setup_utils import (
    get_config,
    get_client,
    get_hf_api
)
from .log_utils import (
    log_info,
    log_operation_success,
    log_operation_failure,
    log_dataset_operation
)


# Get config
_config = get_config()

# Get client
_client = get_client()


def load_moral_kg_sample(
) -> Optional[List[Dict]]:
    """Load the moral-kg-sample dataset from HuggingFace."""
    global _config

    dataset_name = _config.get('datasets.sample')
    if not dataset_name:
        log_operation_failure("load sample dataset", Exception("Dataset name not configured"))
        return None

    try:
        # Setup HF client to ensure authentication
        get_hf_api()

        dataset = load_dataset(dataset_name, split="train", token=os.getenv("HF_TOKEN"))

        # Convert to list of dictionaries for easier processing
        records = []
        for item in dataset:
            item = dict(item)
            records.append({
                'identifier': item.get('identifier'),
                'title': item.get('title'),
                'authors': item.get('authors'),
                'year': item.get('year'),
                'categories': item.get('categories'),
                'text': item.get('text'),
                'map': item.get('map')
            })

        log_operation_success("load moral-kg-sample dataset", f"Loaded {len(records)} records")
        return records

    except Exception as e:
        log_operation_failure("load moral-kg-sample dataset", e)
        return None


def _get_workspace_names(
) -> List[str]:
    """Get list of available workspaces."""

    try:
        global _client
        workspaces = _client.workspaces
        workspace_names = [ws.name or "" for ws in workspaces]
        return workspace_names
    except Exception as e:
        log_operation_failure("fetch workspaces", e)
        return []


def _format_title_info(
    authors: List[str],
    year: str,
    title: str
) -> str:
    """Format title info as 'Title (Author, Year)'."""
    # Take first author and add et al. if multiple authors
    authors_display = authors[0] if authors else "Unknown"
    if len(authors) > 1:
        authors_display += " et al."

    return f"{title} ({authors_display}, {year})"


def _check_dataset_exists(
    workspace_name: str,
    dataset_name: str
) -> bool:
    """Check if dataset already exists in workspace."""
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            workspace = _client.workspaces(workspace_name)

        if workspace:
            for existing_dataset in workspace.datasets:
                if existing_dataset.name == dataset_name:
                    return True
    except Exception:
        pass
    return False


def create_dataset(
    dataset_name: str,
    workspace_name: Optional[str],
    settings: rg.Settings,
    records: Optional[List[Dict]] = None,
) -> bool:
    """Create a dataset with given settings in specified workspace."""
    global _client

    try:
        dataset = rg.Dataset(
            name=dataset_name,
            workspace=workspace_name,
            settings=settings,
            client=_client,
        )
        dataset.create()
        log_dataset_operation("created", dataset_name, f"in workspace {workspace_name}")

        # Add records if provided
        if records:
            dataset.records.log(records)
            log_operation_success("load records into dataset", f"Added {len(records)} records")

        return True

    except Exception as e:
        log_operation_failure("create dataset", e)
        return False


def delete_datasets(
    dataset_names: Optional[List[str]] = None,
    workspace_name: Optional[str] = None
) -> bool:
    """Delete multiple datasets or all datasets if none specified."""
    global _client

    if dataset_names is None:
        # Delete all datasets from all workspaces or a specific workspace
        if workspace_name:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                workspace = _client.workspaces(workspace_name)
            if not workspace:
                log_operation_failure("delete datasets", Exception(f"Workspace {workspace_name} not found"))
                return False

            datasets = workspace.datasets
            dataset_names = [ds.name for ds in datasets if ds.name]

            success_count = 0
            for ds_name in dataset_names:
                if delete_dataset(workspace_name, ds_name):
                    success_count += 1

            log_operation_success("delete datasets from workspace",
                                  f"Deleted {success_count}/{len(dataset_names)} datasets from {workspace_name}")

            return success_count == len(dataset_names)
        else:
            # Get all datasets from all workspaces
            all_datasets = []
            for ws in _client.workspaces:
                ws_name = ws.name
                if ws_name:
                    datasets = ws.datasets
                    for ds in datasets:
                        if ds.name:
                            all_datasets.append((ws_name, ds.name))

            success_count = 0
            for ws_name, ds_name in all_datasets:
                if delete_dataset(ws_name, ds_name):
                    success_count += 1

            log_operation_success("delete all datasets",
                                  f"Deleted {success_count}/{len(all_datasets)} datasets")

            return success_count == len(all_datasets)
    else:
        # Delete specific datasets
        if not workspace_name:
            log_operation_failure("delete datasets", Exception("Workspace name required when specifying dataset names"))
            return False

        success_count = 0
        for dataset_name in dataset_names:
            if delete_dataset(workspace_name, dataset_name):
                success_count += 1

        log_operation_success("delete datasets",
                              f"Deleted {success_count}/{len(dataset_names)} datasets")

        return success_count == len(dataset_names)


def delete_dataset(
    workspace_name: str,
    dataset_name: str
) -> bool:
    """Delete a specific dataset from a workspace."""
    try:
        global _client
        workspace = _client.workspaces(workspace_name)

        if not workspace:
            log_operation_failure("delete dataset", Exception(f"Workspace {workspace_name} not found"))
            return False

        # Find the dataset in workspace
        dataset = None
        for ds in workspace.datasets:
            if ds.name == dataset_name:
                dataset = ds
                break

        if not dataset:
            log_operation_failure("delete dataset", Exception(f"Dataset {dataset_name} not found in workspace {workspace_name}"))
            return False

        # Delete all records first
        try:
            records = list(dataset.records)
            # Filter out None records to avoid AttributeError
            records = [r for r in records if r is not None]

            if records:
                dataset.records.delete(records=records)
                log_dataset_operation("deleted records", dataset_name, f"{len(records)} records")
            else:
                log_info(f"No records found in dataset {dataset_name}")
        except Exception as e:
            if isinstance(e, AttributeError):
                pass
            else:
                log_operation_failure("delete dataset records", e)

        # Delete the dataset
        dataset.delete()
        log_dataset_operation("deleted", dataset_name, f"from workspace {workspace_name}")

        return True

    except Exception as e:
        log_operation_failure("delete dataset", e)
        return False


def list_datasets(
) -> Dict[str, List[str]]:
    """List all datasets grouped by workspace."""
    global _client

    try:
        workspace_datasets = {}

        for workspace in _client.workspaces:
            workspace_name = workspace.name or "Unknown"
            datasets = [dataset.name for dataset in workspace.datasets if dataset.name]
            workspace_datasets[workspace_name] = datasets

            log_dataset_operation("listed", f"workspace {workspace_name}",
                                  f"Found {len(datasets)} datasets")

        return workspace_datasets

    except Exception as e:
        log_operation_failure("list datasets", e)
        return {}


def update_datasets(
    dataset_updates: List[Dict[str, str]],
    new_settings: Optional[rg.Settings] = None
) -> bool:
    """Update multiple datasets."""
    success_count = 0

    for update_info in dataset_updates:
        workspace_name = update_info.get('workspace', '')
        dataset_name = update_info.get('dataset', '')
        new_workspace = update_info.get('new_workspace')

        if update_dataset(workspace_name, dataset_name, new_settings, new_workspace):
            success_count += 1

    log_operation_success("update datasets",
                          f"Updated {success_count}/{len(dataset_updates)} datasets")

    return success_count == len(dataset_updates)


def update_dataset(
    workspace_name: str,
    dataset_name: str,
    new_settings: Optional[rg.Settings] = None,
    new_workspace: Optional[str] = None
) -> bool:
    """Update a specific dataset's settings or move to new workspace."""
    global _client

    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            workspace = _client.workspaces(workspace_name)
            dataset = workspace.datasets(dataset_name)  # type: ignore

        if not dataset:
            log_operation_failure("update dataset",
                                  Exception(f"Dataset {dataset_name} not found in workspace {workspace_name}"))
            return False

        # Update settings if provided
        if new_settings:
            # Note: Argilla may not support direct settings updates; this might require recreating the dataset
            log_operation_success("update dataset settings",
                                  f"Attempted to update {dataset_name}")

        # Move to new workspace if provided
        if new_workspace:
            # Note: This typically requires recreating the dataset in the new workspace
            log_operation_success("move dataset workspace",
                                  f"Attempted to move {dataset_name} to {new_workspace}")

        return True

    except Exception as e:
        log_operation_failure("update dataset", e)
        return False
```
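A minimal usage sketch of `create_dataset` above, composing the same `rg.Settings` / `rg.TextField` / `rg.TextQuestion` pieces the deleted code uses; the workspace and dataset names here are hypothetical:

```python
import argilla as rg

from utils import create_dataset

# Hypothetical names for illustration; real names come from config.yaml.
settings = rg.Settings(
    guidelines="Select the claims that best represent each paper.",
    fields=[rg.TextField(name="text", title="Text")],
    questions=[rg.TextQuestion(name="claims", title="Claims", required=True)],
)

created = create_dataset(
    dataset_name="Phase 1",
    workspace_name="annotator-workspace",
    settings=settings,
)
```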
utils/log_utils.py
DELETED
@@ -1,292 +0,0 @@

```python
"""
log_utils.py

Logging and notification utilities for the MERe Workshop annotation pipeline.
Handles error logging, Slack notifications, and webhook data logging.
"""

import logging
import os
import textwrap
from typing import Optional

import requests

from .setup_utils import get_config


# Get config
config = get_config()


def _setup_logging(
) -> logging.Logger:
    """Set up logging configuration."""
    global config
    log_config = config.get("logging", {})

    # Configure logging
    logging.basicConfig(
        level=getattr(logging, log_config.get("level", "INFO")),
        format=log_config.get(
            "format", "[%(asctime)s] [%(name)s] [%(levelname)s] - %(message)s"
        ),
    )

    # Configure external library log levels
    external_libs = log_config.get("external_libraries", {})
    for lib_name, log_level in external_libs.items():
        lib_logger = logging.getLogger(lib_name)
        lib_logger.setLevel(getattr(logging, log_level.upper()))

    return logging.getLogger("mere_workshop")

# Get logger
logger = _setup_logging()


def _can_print(
) -> bool:
    """
    Use to test if you can use the print() function.

    returns True: if you can _print_ in the space or locally
    returns False: if you cannot _print_ in the space or locally
    """
    global config

    return config.get("error_handling.log_data_not_slack", False)


def _can_log(
) -> bool:
    """
    Use to test if you can use any log_*() function.

    returns True: if you can _log_ in the space or locally
    returns False: if you cannot _log_ in the space or locally
    """
    global config

    return (config.get("error_handling.log_data_not_slack", False) and
            config.get("error_handling.force_slack_notifications", False))


def _send_slack_notification(
    message: str,
) -> bool:
    """Sends Slack notification if configured."""
    global config
    global logger

    slack_webhook_url = os.getenv("SLACK_WEBHOOK_URL")
    if not slack_webhook_url:
        if _can_log():
            logger.warning("SLACK_WEBHOOK_URL not configured, skipping notification")
        elif _can_print():
            print("SLACK_WEBHOOK_URL not configured, skipping notification")

        return False

    try:
        payload = {"text": message}
        response = requests.post(
            slack_webhook_url,
            json=payload,
            headers={"Content-Type": "application/json"},
            timeout=10,
        )

        if response.status_code == 200:
            if _can_log():
                logger.info(f"Slack notification sent: {message}")
            elif _can_print():
                print(f"Slack notification sent: {message}")

            return True
        else:
            if _can_log():
                logger.error(f"Failed to send Slack notification. Status code: {response.status_code}")
            elif _can_print():
                print(f"Failed to send Slack notification. Status code: {response.status_code}")

            return False

    except Exception as e:
        if _can_log():
            logger.error(f"Error sending Slack notification: {e}")
        elif _can_print():
            print(f"Error sending Slack notification: {e}")

        return False


def _send_to_slack(
    send_to_slack: bool,
    message: str
) -> bool:
    """Determine if a log should be sent as a Slack notification."""
    global config
    global logger

    try:
        if (config.get("error_handling.slack_notifications", False) and
                (send_to_slack or
                 config.get("error_handling.force_slack_notifications", False))):
            return _send_slack_notification(message)
        else:
            return True

    except Exception as e:
        logger.error(f"Error sending Slack notification: {e}")
        return False


def log_error(
    error_msg: str,
    exception: Optional[Exception] = None,
    send_to_slack: bool = False,
) -> None:
    """Log errors."""
    global config
    global logger

    if exception:
        # Format error with indented description using textwrap
        error_detail = textwrap.indent(str(exception), "    ")
        full_msg = (
            f"{error_msg}\n  Exception: {type(exception).__name__}\n{error_detail}"
        )
    else:
        full_msg = error_msg

    if _can_log():
        logger.error(full_msg)
        _send_to_slack(send_to_slack, full_msg)
    elif _can_print():
        print(f"[ERROR] {full_msg}")
        _send_to_slack(send_to_slack, full_msg)


def log_warning(
    warning_msg: str,
    send_to_slack: bool = False
) -> None:
    """Log warnings."""
    global config
    global logger

    if _can_log():
        logger.warning(warning_msg)
        _send_to_slack(send_to_slack, warning_msg)
    elif _can_print():
        print(f"[WARNING] {warning_msg}")
        _send_to_slack(send_to_slack, warning_msg)


def log_info(
    info_msg: str,
    send_to_slack: bool = False
) -> None:
    """Log information."""
    global config
    global logger

    if _can_log():
        logger.info(info_msg)
        _send_to_slack(send_to_slack, info_msg)
    elif _can_print():
        print(f"[INFO] {info_msg}")
        _send_to_slack(send_to_slack, info_msg)


def log_operation_success(
    operation: str,
    details: Optional[str] = None,
    send_to_slack: bool = False
) -> None:
    """Log successful operation."""
    global config

    msg = f"Successfully completed {operation}"
    if details:
        msg += f": {details}"

    log_info(msg)
    _send_to_slack(send_to_slack, msg)


def log_operation_failure(
    operation: str,
    error: Optional[Exception] = None,
    send_to_slack: bool = False,
) -> None:
    """Logs failed operation."""
    global config

    msg = f"Failed to {operation}"

    log_error(msg, error)

    if error:
        error_detail = textwrap.indent(str(error), "    ")
        full_msg = (
            f"{msg}\n  Exception: {type(error).__name__}\n{error_detail}"
        )
        _send_to_slack(send_to_slack, full_msg)
    else:
        _send_to_slack(send_to_slack, msg)


def log_dataset_operation(
    operation: str,
    dataset_name: str,
    details: Optional[str] = None,
    send_to_slack: bool = False,
) -> None:
    """Log dataset-related operations."""
    global config
    global logger

    msg = f"Dataset {operation} ({dataset_name})"
    if details:
        msg += f": {details}"

    logger.info(msg)
    _send_to_slack(send_to_slack, msg)


def log_user_operation(
    operation: str,
    username: str,
    details: Optional[str] = None,
    send_to_slack: bool = False,
) -> None:
    """Log user-related operations."""
    global config
    global logger

    msg = f"User {operation} ({username})"
    if details:
        msg += f": {details}"

    logger.info(msg)
    _send_to_slack(send_to_slack, msg)


def log_webhook_operation(
    operation: str,
    event: str,
    details: Optional[str] = None,
    send_to_slack: bool = False,
) -> None:
    """Log webhook-related operations."""
    global config
    global logger

    msg = f"Webhook {operation} ({event})"
    if details:
        msg += f": {details}"

    logger.info(msg)
    _send_to_slack(send_to_slack, msg)
```
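A short usage sketch of the helpers above; the operation names are hypothetical, and `send_to_slack=True` only notifies when `error_handling.slack_notifications` is enabled in `config.yaml`:

```python
from utils import log_operation_failure, log_operation_success

try:
    # ... some setup step (hypothetical) ...
    log_operation_success("create demo workspace", "1 workspace created")
except Exception as e:
    # Logs the failure and, if configured, mirrors it to Slack.
    log_operation_failure("create demo workspace", e, send_to_slack=True)
```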
utils/phase1_utils.py
DELETED
@@ -1,240 +0,0 @@
|
|
1 |
-
"""
|
2 |
-
phase1_utils.py
|
3 |
-
|
4 |
-
Helper functions for Phase 1 dataset creation and management in the MERe Workshop annotation pipeline.
|
5 |
-
"""
|
6 |
-
|
7 |
-
import json
|
8 |
-
import warnings
|
9 |
-
from typing import Dict, List, Optional
|
10 |
-
|
11 |
-
import argilla as rg
|
12 |
-
|
13 |
-
from .setup_utils import (
|
14 |
-
get_config,
|
15 |
-
get_client,
|
16 |
-
)
|
17 |
-
from .log_utils import (
|
18 |
-
log_info,
|
19 |
-
log_operation_success,
|
20 |
-
log_operation_failure,
|
21 |
-
log_dataset_operation
|
22 |
-
)
|
23 |
-
from .dataset_utils import (
|
24 |
-
load_moral_kg_sample,
|
25 |
-
_get_workspace_names,
|
26 |
-
_format_title_info,
|
27 |
-
_check_dataset_exists,
|
28 |
-
create_dataset,
|
29 |
-
delete_dataset,
|
30 |
-
update_dataset
|
31 |
-
)
|
32 |
-
|
33 |
-
|
34 |
-
# Get config and client
|
35 |
-
_config = get_config()
|
36 |
-
_client = get_client()
|
37 |
-
|
38 |
-
|
39 |
-
def _create_phase1_settings(
|
40 |
-
) -> rg.Settings:
|
41 |
-
"""Create the Phase 1 dataset settings from configuration."""
|
42 |
-
global _config
|
43 |
-
phase1_config = _config.phase1
|
44 |
-
|
45 |
-
# Build fields from config
|
46 |
-
fields = []
|
47 |
-
for field_name, field_config in phase1_config.get('fields', {}).items():
|
48 |
-
fields.append(rg.TextField(
|
49 |
-
name=field_config['name'],
|
50 |
-
title=field_config['title'],
|
51 |
-
use_markdown=field_config.get('use_markdown', False)
|
52 |
-
))
|
53 |
-
|
54 |
-
# Build metadata from config
|
55 |
-
metadata = []
|
56 |
-
for meta_name, meta_config in phase1_config.get('metadata', {}).items():
|
57 |
-
metadata.append(rg.TermsMetadataProperty(
|
58 |
-
name=meta_config['name'],
|
59 |
-
title=meta_config['title'],
|
60 |
-
visible_for_annotators=meta_config.get('visible_for_annotators', True)
|
61 |
-
))
|
62 |
-
|
63 |
-
# Build questions from config
|
64 |
-
questions = []
|
65 |
-
for question_name, question_config in phase1_config.get('questions', {}).items():
|
66 |
-
if question_config.get('type') == 'TextQuestion':
|
67 |
-
questions.append(rg.TextQuestion(
|
68 |
-
name=question_config['name'],
|
69 |
-
title=question_config['title'],
|
70 |
-
description=question_config.get('description', ''),
|
71 |
-
required=question_config.get('required', False)
|
72 |
-
))
|
73 |
-
else:
|
74 |
-
log_operation_failure("add question to Phase 1 dataset",
|
75 |
-
Exception("Haven't implemented non TextQuestions into the process."))
|
76 |
-
|
77 |
-
return rg.Settings(
|
78 |
-
guidelines=phase1_config.get('guidelines', ''),
|
79 |
-
fields=fields,
|
80 |
-
metadata=metadata,
|
81 |
-
questions=questions
|
82 |
-
)
|
83 |
-
|
84 |
-
|
85 |
-
def _create_phase1_dataset(
|
86 |
-
workspace_name: str,
|
87 |
-
records: List[Dict]
|
88 |
-
) -> bool:
|
89 |
-
"""Create Phase 1 dataset for a specific workspace."""
|
90 |
-
global _client
|
91 |
-
|
92 |
-
dataset_name = _config.get('phase1.dataset_name', 'Phase 1')
|
93 |
-
|
94 |
-
# Check if dataset already exists
|
95 |
-
    if _check_dataset_exists(workspace_name, dataset_name):
        log_dataset_operation("created", dataset_name, f"in workspace {workspace_name} (already exists)")
        # Get existing dataset for record loading
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                workspace = _client.workspaces(workspace_name)
                if workspace:
                    for existing_dataset in workspace.datasets:
                        if existing_dataset.name == dataset_name:
                            dataset = existing_dataset
                            break
        except Exception as e:
            log_operation_failure("get existing dataset", e)
            return False
    else:
        # Create new dataset
        try:
            dataset = rg.Dataset(
                name=dataset_name,
                workspace=workspace_name,
                settings=_create_phase1_settings(),
                client=_client,
            )
            dataset.create()
            log_dataset_operation("created", dataset_name, f"in workspace {workspace_name}")
        except Exception as e:
            log_operation_failure("create Phase 1 dataset", e)
            return False

    # Convert records to Argilla format and load them
    try:
        argilla_records = []
        for record in records:
            title_info = _format_title_info(
                record['authors'],
                record['year'],
                record['title']
            ).strip()
            # Parse map from JSON string back to dictionary
            map_data = json.loads(record['map']) if record['map'] else {}
            suggestions = list(map_data.keys())

            argilla_record = rg.Record(
                fields={
                    "title_info": title_info,
                    "text": record['text']
                },
                metadata={
                    "id": record['identifier'],
                    "fields": record['categories']
                },
                suggestions=[
                    rg.Suggestion(
                        question_name="claims",
                        value="\n\n".join(suggestions)
                    )
                ]
            )
            argilla_records.append(argilla_record)

        # Add records to dataset
        dataset.records.log(argilla_records)
        log_operation_success("load records into dataset", f"Added {len(argilla_records)} records")

        return True

    except Exception as e:
        log_operation_failure("load records into dataset", e)
        return False


def create_phase1_datasets(
) -> bool:
    """Create Phase 1 datasets for all available workspaces."""
    try:
        # Load client and get workspaces
        workspace_names = _get_workspace_names()

        if not workspace_names:
            log_operation_failure("create datasets", Exception("No workspaces found"))
            return False

        # Load records from HuggingFace
        records = load_moral_kg_sample()
        if not records:
            log_operation_failure("create datasets", Exception("Failed to load sample records"))
            return False

        # Create datasets for each workspace
        success_count = 0
        failed_count = 0
        for workspace_name in workspace_names:
            if _create_phase1_dataset(workspace_name, records):
                success_count += 1
            else:
                failed_count += 1

        # Use transaction-like logging
        log_info(f"Create Phase 1 datasets: {success_count}/{len(workspace_names)} succeeded, {failed_count} failed.")

        return success_count == len(workspace_names)

    except Exception as e:
        log_operation_failure("create datasets for all workspaces", e)
        return False


def delete_phase1_datasets(
) -> bool:
    """Delete all Phase 1 datasets from all workspaces."""
    global _config

    dataset_name = _config.get('phase1.dataset_name', 'Phase 1')
    workspace_names = _get_workspace_names()

    success_count = 0
    for workspace_name in workspace_names:
        if delete_dataset(workspace_name, dataset_name):
            success_count += 1

    log_operation_success("delete Phase 1 datasets",
                          f"Deleted {success_count}/{len(workspace_names)} datasets")

    return success_count == len(workspace_names)


def update_phase1_datasets(
    new_settings: Optional[rg.Settings] = None,
    new_workspace: Optional[str] = None
) -> bool:
    """Update all Phase 1 datasets with new settings or move to new workspace."""
    global _config

    dataset_name = _config.get('phase1.dataset_name', 'Phase 1')
    workspace_names = _get_workspace_names()

    success_count = 0
    for workspace_name in workspace_names:
        if update_dataset(workspace_name, dataset_name, new_settings, new_workspace):
            success_count += 1

    log_operation_success("update Phase 1 datasets",
                          f"Updated {success_count}/{len(workspace_names)} datasets")

    return success_count == len(workspace_names)
utils/setup_utils.py
DELETED
@@ -1,200 +0,0 @@
"""
setup_utils.py

Initialization utilities for the MERe Workshop annotation pipeline.
Handles setup of clients, configuration loading, and environment validation.
"""

import os
from pathlib import Path
from typing import Any, Dict

import argilla as rg
from huggingface_hub import HfApi
import rootutils
import yaml


# Setup project root
_root = rootutils.setup_root(__file__, indicator=".git", pythonpath=True)


def validate_env(
) -> bool:
    """Validate that all required environment variables are set."""
    required_vars = [
        "ARGILLA_API_URL",
        "ARGILLA_API_KEY",
        "HF_TOKEN"]

    missing_vars = [var for var in required_vars if not os.getenv(var)]

    if missing_vars:
        raise EnvironmentError(
            f"Missing required environment variables: {', '.join(missing_vars)}"
        )

    return True


class Config:
    """Configuration manager for the MERe Workshop application."""

    def __init__(
        self,
        config_path: str = "config.yaml"
    ):
        self._config_path = config_path
        self._config = self._load_config()

    def _load_config(
        self
    ) -> Dict[str, Any] | None:
        """Load configuration from YAML file."""
        if validate_env():
            config_file = _root / self._config_path

            if not config_file.exists():
                raise FileNotFoundError(f"Configuration file not found: {config_file}")

            with open(config_file, "r", encoding="utf-8") as f:
                return yaml.safe_load(f)

    def get(
        self,
        key_path: str,
        default: Any = None
    ) -> Any:
        """Get configuration value using dot notation (e.g., 'datasets.sample')."""
        keys = key_path.split(".")
        value = self._config

        for key in keys:
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                return default

        return value

    @property
    def datasets(
        self
    ) -> Dict[str, str]:
        """Get dataset configuration."""
        return self.get("datasets", {})

    @property
    def webhook_events(
        self
    ) -> Dict[str, Any]:
        """Get webhook configuration."""
        return self.get("webhooks.events", {})

    @property
    def phase1(
        self
    ) -> Dict[str, Any]:
        """Get Phase 1 configuration."""
        return self.get("phase1", {})

    @property
    def users_config(
        self
    ) -> Dict[str, Any]:
        """Get users configuration."""
        return self.get("users", {})

    @property
    def paths(
        self
    ) -> Dict[str, str]:
        """Get file paths configuration."""
        return self.get("paths", {})


# Global config instance
_config = Config()

# Global Argilla client instance
_client = None

# Global Hugging Face API instance
_hf_api = None


def get_root(
) -> Path:
    """Get the project root directory."""
    return _root


def get_config(
) -> Config:
    """Get the configuration manager."""
    return _config


def get_client(
) -> rg.Argilla:  # type: ignore
    """Get the Argilla client."""
    global _client

    if _client is not None:
        return _client

    if validate_env():
        try:
            _client = rg.Argilla(
                api_url=os.getenv("ARGILLA_API_URL"),
                api_key=os.getenv("ARGILLA_API_KEY"),
            )
            return _client

        except Exception as e:
            if "ArgillaCredentialsError" in str(e):
                print(
                    "\n HINT: Did you wipe/restart the space? If you did, ",
                    "you need to update your Argilla API key!\n"
                )
            raise


def get_hf_api(
) -> HfApi:  # type: ignore
    """Get the HuggingFace API client."""
    global _hf_api

    if _hf_api is not None:
        return _hf_api

    if validate_env():
        _hf_api = HfApi(token=os.getenv("HF_TOKEN"))

    return _hf_api


def load_users(
) -> list[Dict[str, str]] | None:
    """Load users from CSV file specified in config."""
    config = get_config()
    csv_path = config.get("paths.users_csv", "users.csv")

    full_path = _root / csv_path
    if not full_path.exists():
        raise FileNotFoundError(f"Users CSV file not found: {full_path}")

    import csv

    users = []
    with open(full_path, "r", newline="", encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            user_data = {key.rstrip(): value.rstrip() for key, value in row.items()}
            users.append(user_data)

    return users
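
A quick sketch of how the accessors above were meant to be used. The `phase1.dataset_name` key is just an example of the dot-notation lookup; the real keys live in config.yaml:

```python
# Illustrative sketch only. Assumes ARGILLA_API_URL, ARGILLA_API_KEY,
# and HF_TOKEN are set, as validate_env() requires.
from utils.setup_utils import get_config, get_client, get_hf_api

config = get_config()
dataset_name = config.get("phase1.dataset_name", "Phase 1")  # dot-notation lookup

client = get_client()   # cached Argilla client singleton
hf_api = get_hf_api()   # cached HfApi singleton
print(dataset_name, len(client.workspaces))
```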
utils/user_utils.py
DELETED
@@ -1,208 +0,0 @@
"""
user_utils.py

Helper functions for user management in the MERe Workshop annotation pipeline.
Transformed from create-users.py script to follow proper helper function paradigm.
"""

from typing import Dict, List, Optional

import argilla as rg

from .setup_utils import (
    get_config,
    get_client,
    load_users
)
from .log_utils import (
    log_info,
    log_operation_success,
    log_operation_failure,
    log_user_operation
)

# Get config
_config = get_config()

# Get client
_client = get_client()


def create_user(
    user_data: Dict[str, str],
) -> bool:
    """Create a single user."""
    global _config
    global _client

    username = user_data['username']

    # Check if user already exists
    try:
        for existing_user in _client.users:
            if existing_user.username == username:
                log_user_operation("created", username, f"role: {existing_user.role} (already exists)")
                log_operation_success("create user", f"{username} (already exists)")
                return True
    except Exception:
        # Continue with creation if check fails
        pass

    try:
        # Create user
        user = rg.User(
            username=username,
            first_name=user_data.get('first_name', ''),
            last_name=user_data.get('last_name', ''),
            role=user_data.get('role', _config.get('users.default_role', 'annotator')),
            password=user_data['password']
        )

        created_user = user.create()
        log_user_operation("created", username, f"role: {user.role}")

        log_operation_success("create user", username)
        return True

    except Exception as e:
        # Check if user already exists
        error_str = str(e).lower()
        if "conflict" in error_str or "already exists" in error_str or "not unique" in error_str:
            log_user_operation("created", username, "role: annotator (already exists)")
            return True
        else:
            log_operation_failure("create user", e)
            return False


def create_users(
    users_data: Optional[List[Dict[str, str]]] = None
) -> bool:
    """Create all users from the CSV file or provided list."""
    try:
        if users_data is None:
            users_data = load_users()

        if not users_data:
            log_operation_failure("create users", Exception("No users found"))
            return False

        # Create each user
        success_count = 0
        for user_data in users_data:
            if create_user(user_data):
                success_count += 1

        log_operation_success("create users",
                              f"Created {success_count}/{len(users_data)} users successfully")

        return success_count == len(users_data)

    except Exception as e:
        log_operation_failure("create users", e)
        return False


def delete_user(
    username: str,
    skip_admin: bool = True
) -> bool:
    """Delete a single user."""
    global _client

    try:
        # Find and delete user
        users = _client.users
        user_to_delete = None
        user_found = False

        for user in users:
            if user.username == username:
                user_found = True
                if skip_admin:
                    if user.role not in ["owner", "admin"]:
                        user_to_delete = user
                        break
                    else:
                        log_info(f"SKIPPED OWNER or ADMIN ({user.username})")
                        # Skipping admin/owner is considered success
                        return True
                else:
                    user_to_delete = user
                    break

        if not user_found:
            log_operation_failure("delete user", Exception(f"User {username} not found"))
            return False

        if not user_to_delete:
            log_operation_failure("delete user", Exception(f"User {username} could not be deleted"))
            return False

        # Delete user
        user_to_delete.delete()
        log_user_operation("deleted", username)

        return True

    except Exception as e:
        log_operation_failure("delete user", e)
        return False


def delete_users(
    usernames: Optional[List[str]] = None
) -> bool:
    """Delete all users or specified users."""
    try:
        global _client

        if usernames is None:
            # Delete all users
            users = _client.users
            usernames = [user.username for user in users if user.username]

        if not usernames:
            log_operation_success("delete users", "No users to delete")
            return True

        # Delete each user
        success_count = 0
        for username in usernames:
            if delete_user(username):
                success_count += 1

        log_operation_success("delete users",
                              f"Deleted {success_count}/{len(usernames)} users")

        return success_count == len(usernames)

    except Exception as e:
        log_operation_failure("delete users", e)
        return False


def list_users(
) -> List[Dict[str, str]]:
    """List all users with their details."""
    try:
        global _client
        users = _client.users
        user_list = []

        for user in users:
            user_info = {
                'username': user.username or '',
                'first_name': user.first_name or '',
                'last_name': user.last_name or '',
                'role': user.role or '',
                'id': str(user.id) if user.id else ''
            }
            user_list.append(user_info)

        log_user_operation("listed all users", f"Found {len(user_list)} users")
        return user_list

    except Exception as e:
        log_operation_failure("list users", e)
        return []
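
A minimal sketch of bulk user management with the helpers above. The users.csv columns mirror what `create_user()` reads: username, password, and optionally first_name, last_name, role; the username in the last line is hypothetical:

```python
# Illustrative sketch only -- not part of the deleted file.
from utils.user_utils import create_users, list_users, delete_user

create_users()                     # loads users.csv via load_users()
for user in list_users():
    print(user["username"], user["role"])
delete_user("example_annotator")   # owners/admins are skipped by default
```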
utils/webhook_utils.py
DELETED
@@ -1,340 +0,0 @@
"""
webhook_utils.py

Helper functions for webhook management in the MERe Workshop annotation pipeline.
Transformed from create-webhooks.py and related scripts to follow proper helper function paradigm.
"""

import os
from typing import List, Optional, Dict

import argilla as rg

from .setup_utils import (
    get_config,
    get_client
)
from .log_utils import (
    log_operation_success,
    log_operation_failure,
    log_webhook_operation
)


# Setup config
_config = get_config()

# Setup client
_client = get_client()


def create_webhook(
    event: str,
    description: str,
) -> Optional[rg.Webhook]:
    """Create a webhook for a specific event."""

    global _client

    webhook_url = os.getenv("ARGILLA_WEBHOOK_URL")
    if not webhook_url:
        log_operation_failure("create webhook",
                              Exception(f"ARGILLA_WEBHOOK_URL environment variable not set for {event}"))
        return None

    try:
        webhook = rg.Webhook(
            url=webhook_url,
            events=[event],  # type: ignore
            description=description
        )

        created_webhook = webhook.create()
        log_webhook_operation("created", event, description)
        return created_webhook  # type: ignore

    except Exception as e:
        log_operation_failure("create webhook", e)
        return None


def list_webhook_events(
) -> List[str]:
    """Return list of webhook events from configuration."""
    global _config
    return _config.get('webhooks.events', [])


def create_webhooks(
) -> bool:
    """Create webhooks for all configured events."""
    try:
        global _client
        events = list_webhook_events()

        if not events:
            log_operation_failure("create webhooks",
                                  Exception("No webhook events configured"))
            return False

        webhook_url = os.getenv("ARGILLA_WEBHOOK_URL")
        if not webhook_url:
            log_operation_failure("create webhooks",
                                  Exception("ARGILLA_WEBHOOK_URL environment variable not set"))
            return False

        # Create webhooks for each event, recreating if they already exist
        success_count = 0
        for event in events:
            # Check if webhook already exists
            if webhook_exists(event):
                log_webhook_operation("already exists", event, "recreating")
                # Delete existing webhook first
                for webhook in _client.webhooks:
                    if webhook.events and event in webhook.events:
                        webhook.delete()
                        log_webhook_operation("deleted existing", event)
                        break

            description = f"Webhook for {event} events to {webhook_url}"
            if create_webhook(event, description):
                success_count += 1

        log_operation_success("create webhooks",
                              f"Created {success_count}/{len(events)} webhooks successfully")

        return success_count == len(events)

    except Exception as e:
        log_operation_failure("create webhooks", e)
        return False


def list_webhooks(
) -> List[Dict[str, str]]:
    """List all existing webhooks."""
    try:
        global _client
        webhooks = _client.webhooks
        webhook_list = []

        for webhook in webhooks:
            webhook_info = {
                'url': webhook.url or '',
                'events': ', '.join(webhook.events) if webhook.events else '',
                'description': webhook.description or ''
            }
            webhook_list.append(webhook_info)

        log_webhook_operation("listed all webhooks", f"Found {len(webhook_list)} webhooks")
        return webhook_list

    except Exception as e:
        log_operation_failure("list webhooks", e)
        return []


def delete_webhook(
    webhook_url: str,
    webhook_events: List[str],
) -> bool:
    """Delete a specific webhook by URL and events."""
    try:
        global _client
        # Find webhook by URL and events
        webhook_to_delete = None
        for webhook in _client.webhooks:
            if (webhook.url == webhook_url and
                    webhook.events and
                    set(webhook.events) == set(webhook_events)):
                webhook_to_delete = webhook
                break

        if not webhook_to_delete:
            log_operation_failure("delete webhook",
                                  Exception(f"Webhook with URL {webhook_url} and events {webhook_events} not found"))
            return False

        # Delete webhook
        webhook_to_delete.delete()
        log_webhook_operation("deleted", f"{webhook_url} ({', '.join(webhook_events)})")

        return True

    except Exception as e:
        log_operation_failure("delete webhook", e)
        return False


def delete_webhooks(
    webhook_specs: Optional[List[Dict[str, str]]] = None
) -> bool:
    """Delete all webhooks or specified webhooks."""
    try:
        global _client

        if webhook_specs is None:
            # Delete all webhooks
            webhooks = _client.webhooks
            webhook_specs = []
            for webhook in webhooks:
                if webhook.url and webhook.events:
                    webhook_specs.append({
                        'url': webhook.url,
                        'events': ','.join(webhook.events)
                    })

        if not webhook_specs:
            log_operation_success("delete webhooks", "No webhooks to delete")
            return True

        # Delete each webhook
        success_count = 0
        for webhook_spec in webhook_specs:
            webhook_url = webhook_spec.get('url', '')
            webhook_events = webhook_spec.get('events', '').split(',') if webhook_spec.get('events') else []

            if delete_webhook(webhook_url, webhook_events):
                success_count += 1

        log_operation_success("delete webhooks",
                              f"Deleted {success_count}/{len(webhook_specs)} webhooks")

        return success_count == len(webhook_specs)

    except Exception as e:
        log_operation_failure("delete webhooks", e)
        return False


def webhook_exists(
    event: str
) -> bool:
    """Check if a webhook already exists for a specific event."""
    try:
        global _client
        webhooks = _client.webhooks

        for webhook in webhooks:
            if webhook.events and event in webhook.events:
                log_webhook_operation("found existing", event, f"webhook URL: {webhook.url}")
                return True

        return False

    except Exception as e:
        log_operation_failure("check webhook exists", e)
        return False


def validate_webhooks(
) -> bool:
    """Validate that webhook configuration is correct."""
    try:
        # Check if webhook URL is set
        webhook_url = os.getenv("ARGILLA_WEBHOOK_URL")
        if not webhook_url:
            log_operation_failure("validate webhook config", Exception("ARGILLA_WEBHOOK_URL environment variable not set"))
            return False

        # Check if events are configured
        events = list_webhook_events()
        if not events:
            log_operation_failure("validate webhook config", Exception("No webhook events configured"))
            return False

        # Check if Argilla client can be created
        try:
            get_client()
        except Exception as e:
            log_operation_failure("validate webhook config", Exception(f"Cannot create Argilla client: {str(e)}"))
            return False

        log_operation_success("validate webhook config", f"Configuration valid for {len(events)} events")
        return True

    except Exception as e:
        log_operation_failure("validate webhook config", e)
        return False


def update_webhook(
    webhook_url: str,
    webhook_events: List[str],
    new_url: Optional[str] = None,
    new_events: Optional[List[str]] = None,
    new_description: Optional[str] = None,
) -> bool:
    """Update a webhook's properties by recreating it (since Argilla doesn't support direct updates)."""
    try:
        global _client
        # Find webhook
        webhook = None
        for w in _client.webhooks:
            if (w.url == webhook_url and
                    w.events and
                    set(w.events) == set(webhook_events)):
                webhook = w
                break

        if not webhook:
            log_operation_failure("update webhook",
                                  Exception(f"Webhook with URL {webhook_url} and events {webhook_events} not found"))
            return False

        # Since Argilla doesn't support direct webhook updates, we need to recreate
        # First delete the existing webhook
        webhook.delete()
        log_webhook_operation("deleted for update", f"{webhook_url} ({', '.join(webhook_events)})")

        # Create new webhook with updated properties
        final_url = new_url if new_url else webhook_url
        final_events = new_events if new_events else webhook_events
        final_description = new_description if new_description else webhook.description

        for event in final_events:
            description = final_description or f"Webhook for {event} events to {final_url}"
            new_webhook = rg.Webhook(
                url=final_url,
                events=[event],  # type: ignore
                description=description
            )
            new_webhook.create()

        updates = []
        if new_url:
            updates.append(f"url: {new_url}")
        if new_events:
            updates.append(f"events: {', '.join(new_events)}")
        if new_description:
            updates.append(f"description: {new_description}")

        log_operation_success("update webhook", f"{webhook_url} - {', '.join(updates)}")

        return True

    except Exception as e:
        log_operation_failure("update webhook", e)
        return False


def update_webhooks(
    webhook_updates: List[Dict[str, str]]
) -> bool:
    """Update multiple webhooks."""
    success_count = 0
    for update_info in webhook_updates:
        webhook_url = update_info.get('url', '')
        webhook_events = update_info.get('events', '').split(',') if update_info.get('events') else []
        new_url = update_info.get('new_url')
        new_events = update_info.get('new_events', '').split(',') if update_info.get('new_events') else None
        new_description = update_info.get('new_description')

        if update_webhook(webhook_url, webhook_events, new_url, new_events, new_description):
            success_count += 1

    log_operation_success("update webhooks",
                          f"Updated {success_count}/{len(webhook_updates)} webhooks")

    return success_count == len(webhook_updates)
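
A minimal sketch of validating and (re)creating webhooks with the helpers above, assuming ARGILLA_WEBHOOK_URL points at the listener that consumes the events:

```python
# Illustrative sketch only -- not part of the deleted file.
from utils.webhook_utils import validate_webhooks, create_webhooks, list_webhooks

if validate_webhooks():       # env var + configured events + client check
    create_webhooks()         # recreates any webhook that already exists
    for hook in list_webhooks():
        print(hook["events"], "->", hook["url"])
```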
utils/wipe_utils.py
DELETED
@@ -1,164 +0,0 @@
"""
wipe_utils.py

Helper functions for wiping/cleaning Argilla space in the MERe Workshop annotation pipeline.
Transformed from wipe-space.py script to follow proper helper function paradigm.
"""

from .setup_utils import get_client
from .dataset_utils import delete_datasets
from .user_utils import delete_users
from .webhook_utils import delete_webhooks
from .workspace_utils import delete_workspaces
from .log_utils import (
    log_operation_success,
    log_operation_failure,
)


# Setup client
_client = get_client()


def wipe_space(
) -> bool:
    """Completely wipe the Argilla space - datasets, users, workspaces, and webhooks."""
    try:
        # Track success of each operation
        operations = [
            ("datasets", delete_datasets),
            ("webhooks", delete_webhooks),
            ("users", delete_users),
            ("workspaces", delete_workspaces)
        ]

        operations_results = {}

        # Execute each operation and continue even if one fails
        for operation_name, operation_func in operations:
            try:
                success = operation_func()
                operations_results[operation_name] = success
                if success:
                    log_operation_success(f"wipe {operation_name}", "Operation completed successfully")
                else:
                    log_operation_failure(f"wipe {operation_name}", Exception("Operation completed with some failures"))
            except Exception as e:
                operations_results[operation_name] = False
                log_operation_failure(f"wipe {operation_name}", e)

        # Calculate summary
        successful_ops = sum(1 for success in operations_results.values() if success)
        total_ops = len(operations_results)

        if successful_ops == total_ops:
            log_operation_success("wipe entire Argilla space", "All components deleted successfully")
            return True
        else:
            failed_ops = [name for name, success in operations_results.items() if not success]
            log_operation_failure("wipe entire Argilla space",
                                  Exception(f"{total_ops - successful_ops}/{total_ops} operations failed: {', '.join(failed_ops)}"))
            # Return True if at least some operations succeeded
            return successful_ops > 0

    except Exception as e:
        log_operation_failure("wipe entire Argilla space", e)
        return False


def wipe_datasets_only(
) -> bool:
    """Wipe only datasets, keeping users and workspaces."""
    try:
        success = delete_datasets()

        if success:
            log_operation_success("wipe datasets only", "All datasets deleted successfully")
        else:
            log_operation_failure("wipe datasets only", Exception("Some datasets could not be deleted"))

        return success

    except Exception as e:
        log_operation_failure("wipe datasets only", e)
        return False


def wipe_users_only(
) -> bool:
    """Wipe only users, keeping datasets and workspaces."""
    try:
        success = delete_users()

        if success:
            log_operation_success("wipe users only", "All users deleted successfully")
        else:
            log_operation_failure("wipe users only", Exception("Some users could not be deleted"))

        return success

    except Exception as e:
        log_operation_failure("wipe users only", e)
        return False


def wipe_webhooks_only(
) -> bool:
    """Wipe only webhooks, keeping everything else."""
    try:
        success = delete_webhooks()

        if success:
            log_operation_success("wipe webhooks only", "All webhooks deleted successfully")
        else:
            log_operation_failure("wipe webhooks only", Exception("Some webhooks could not be deleted"))

        return success

    except Exception as e:
        log_operation_failure("wipe webhooks only", e)
        return False


def get_status(
) -> dict:
    """Get current status of the Argilla space (counts of datasets, users, etc.)."""
    try:
        global _client

        # Count datasets across all workspaces
        total_datasets = 0
        total_records = 0
        for workspace in _client.workspaces:
            workspace_datasets = workspace.datasets
            total_datasets += len(workspace_datasets)

            for dataset in workspace_datasets:
                try:
                    records = list(dataset.records)
                    total_records += len(records)
                except Exception:
                    # Skip if can't access records
                    pass

        status = {
            'workspaces': len(_client.workspaces),
            'users': len(_client.users),
            'datasets': total_datasets,
            'records': total_records,
            'webhooks': len(_client.webhooks)
        }

        log_operation_success("get space status", f"Status retrieved: {status}")
        return status

    except Exception as e:
        log_operation_failure("get space status", e)
        return {
            'workspaces': 0,
            'users': 0,
            'datasets': 0,
            'records': 0,
            'webhooks': 0,
            'error': str(e)
        }
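
The wipe helpers above compose the per-resource `delete_*` functions; `get_status()` is a useful bracket around any wipe. A minimal sketch:

```python
# Illustrative sketch only -- not part of the deleted file.
from utils.wipe_utils import get_status, wipe_datasets_only, wipe_space

print(get_status())     # counts of workspaces/users/datasets/records/webhooks
wipe_datasets_only()    # targeted cleanup
# wipe_space()          # full teardown: datasets, webhooks, users, workspaces
print(get_status())
```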
utils/workspace_utils.py
DELETED
@@ -1,387 +0,0 @@
"""
workspace_utils.py

Helper functions for workspace management in the MERe Workshop annotation pipeline.
Handles workspace creation, deletion, user assignment, and management operations.
"""

from typing import Dict, List, Optional
import warnings

import argilla as rg

from .setup_utils import (
    get_client,
    load_users
)
from .log_utils import (
    log_operation_success,
    log_operation_failure,
    log_user_operation
)


# Setup client
_client = get_client()


def create_workspace(
    workspace_name: str,
) -> bool:
    """Create a single workspace."""
    global _client

    # Check if workspace already exists
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            existing_workspace = _client.workspaces(workspace_name)
            if existing_workspace:
                log_operation_success("create workspace", f"{workspace_name} (already exists)")
                return True
    except Exception:
        # Workspace doesn't exist, continue with creation
        pass

    try:
        workspace = rg.Workspace(name=workspace_name)
        workspace.create()
        log_operation_success("create workspace", workspace_name)
        return True

    except Exception as e:
        # Check if workspace already exists
        error_str = str(e).lower()
        if "conflict" in error_str or "already exists" in error_str or "not unique" in error_str:
            log_operation_success("create workspace", f"{workspace_name} (already exists)")
            return True
        else:
            log_operation_failure("create workspace", e)
            return False


def create_workspaces(
    workspace_names: List[str]
) -> bool:
    """Create multiple workspaces from a list of workspace names."""
    global _client

    success_count = 0
    for workspace_name in workspace_names:
        if create_workspace(workspace_name):
            success_count += 1

    log_operation_success("create workspaces",
                          f"Created {success_count}/{len(workspace_names)} workspaces")

    return success_count == len(workspace_names)


def create_user_workspace(
    username: str,
    workspace_name: str
) -> bool:
    """Add a user to a specific workspace."""
    global _client

    try:
        # Find user
        user = None
        for u in _client.users:
            if u.username == username:
                user = u
                break

        if not user:
            log_operation_failure("add user to workspace", Exception(f"User {username} not found"))
            return False

        # Find workspace
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            workspace = _client.workspaces(workspace_name)
        if not workspace:
            log_operation_failure("add user to workspace", Exception(f"Workspace {workspace_name} not found"))
            return False

        # Check if user is already in workspace
        try:
            workspace_users = list(workspace.users)
            for existing_user in workspace_users:
                if existing_user.username == username:
                    log_user_operation("added to workspace", username, f"{workspace_name} (already assigned)")
                    return True
        except Exception:
            # Continue if check fails
            pass

        # Add user to workspace
        workspace.add_user(user)  # type: ignore
        log_user_operation("added to workspace", username, workspace_name)

        return True

    except Exception as e:
        # Check if user already in workspace
        error_str = str(e).lower()
        if "conflict" in error_str or "already" in error_str:
            log_user_operation("added to workspace", username, f"{workspace_name} (already assigned)")
            return True
        else:
            log_operation_failure("add user to workspace", e)
            return False


def create_user_workspaces(
    user_workspace_map: Optional[Dict[str, List[str]]] = None
) -> bool:
    """Create workspaces for users based on mapping or CSV data."""

    if user_workspace_map is None:
        # Load from CSV and create user workspaces based on usernames
        users = load_users()
        if not users:
            log_operation_failure("create user workspaces", Exception("No users found in CSV"))
            return False

        success_count = 0
        total_count = 0

        for user_data in users:
            username = user_data['username']
            # Create workspace with username as workspace name
            total_count += 1
            if create_workspace(username):
                # Add user to their workspace
                if create_user_workspace(username, username):
                    success_count += 1

        log_operation_success("create user workspaces from CSV",
                              f"Created {success_count}/{total_count} user workspaces")

        return success_count == total_count
    else:
        # Use provided mapping
        success_count = 0
        total_count = 0

        for username, workspace_names in user_workspace_map.items():
            for workspace_name in workspace_names:
                total_count += 1
                if create_user_workspace(username, workspace_name):
                    success_count += 1

        log_operation_success("create user workspaces from mapping",
                              f"Added users to {success_count}/{total_count} workspaces")

        return success_count == total_count


def delete_workspace(
    workspace_name: str, client: Optional[rg.Argilla] = None
) -> bool:
    """Delete a single workspace."""
    global _client

    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            workspace = _client.workspaces(workspace_name)
        if not workspace:
            log_operation_failure("delete workspace", Exception(f"Workspace {workspace_name} not found"))
            return False

        # Check for remaining datasets first
        try:
            datasets = list(workspace.datasets)
            if datasets:
                dataset_names = [ds.name for ds in datasets if ds.name]
                log_operation_failure("delete workspace",
                                      Exception(f"Workspace {workspace_name} still has datasets: {', '.join(dataset_names)}. Delete datasets first."))
                return False
        except Exception as e:
            # If we can't check datasets, try to continue
            log_operation_failure("check workspace datasets", e)

        # Remove all users from workspace first
        try:
            workspace_users = list(workspace.users)
            for user in workspace_users:
                try:
                    workspace.remove_user(user)
                    log_user_operation("removed from workspace", user.username or f"User-{user.id}", workspace_name)
                except Exception as e:
                    log_operation_failure("remove user from workspace", e)
        except Exception as e:
            # Continue if user removal fails
            log_operation_failure("remove users from workspace", e)

        # Delete the workspace
        workspace.delete()
        log_operation_success("delete workspace", workspace_name)

        return True

    except Exception as e:
        # Check if it's a dependency error
        error_str = str(e).lower()
        if "has some datasets linked" in error_str or "dependency" in error_str:
            log_operation_failure("delete workspace",
                                  Exception(f"Workspace {workspace_name} cannot be deleted due to remaining dependencies"))
        else:
            log_operation_failure("delete workspace", e)
        return False


def delete_workspaces(
    workspace_names: Optional[List[str]] = None
) -> bool:
    """Delete multiple workspaces or all workspaces if none specified."""
    global _client
    if workspace_names is None:
        # Delete all workspaces
        workspaces = _client.workspaces
        workspace_names = [ws.name for ws in workspaces if ws.name]

    success_count = 0
    for workspace_name in workspace_names:
        if delete_workspace(workspace_name):
            success_count += 1

    log_operation_success("delete workspaces",
                          f"Deleted {success_count}/{len(workspace_names)} workspaces")

    return success_count == len(workspace_names)


def delete_user_workspace(
    username: str,
    workspace_name: str,
    delete_if_empty: bool = True
) -> bool:
    """Remove a user from a workspace and optionally delete workspace if empty."""
    global _client

    try:
        # Find user
        user = None
        for u in _client.users:
            if u.username == username:
                user = u
                break

        if not user:
            log_operation_failure("remove user from workspace", Exception(f"User {username} not found"))
            return False

        # Find workspace
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            workspace = _client.workspaces(workspace_name)
        if not workspace:
            log_operation_failure("remove user from workspace", Exception(f"Workspace {workspace_name} not found"))
            return False

        # Remove user from workspace
        workspace.remove_user(user)
        log_user_operation("removed from workspace", username, workspace_name)

        # Check if workspace is empty and delete if requested
        if delete_if_empty:
            remaining_users = workspace.users
            if not remaining_users:
                workspace.delete()
                log_operation_success("delete empty workspace", workspace_name)
            else:
                log_operation_success("workspace not empty", f"{workspace_name} still has {len(remaining_users)} users")

        return True

    except Exception as e:
        log_operation_failure("remove user from workspace", e)
        return False


def delete_user_workspaces(usernames: List[str]) -> bool:
    """Remove users from all their workspaces and delete empty workspaces."""

    success_count = 0
    for username in usernames:
        user_workspaces = list_user_workspaces(username)
        user_success = True

        for workspace_name in user_workspaces:
            if not delete_user_workspace(username, workspace_name, delete_if_empty=True):
                user_success = False

        if user_success:
            success_count += 1

    log_operation_success("delete user workspaces",
                          f"Processed {success_count}/{len(usernames)} users")

    return success_count == len(usernames)


def list_workspaces(
) -> List[Dict[str, str]]:
    """List all workspaces with their details."""
    global _client

    try:
        workspaces = _client.workspaces
        workspace_list = []

        for workspace in workspaces:
            workspace_info = {
                'name': workspace.name or '',
                'id': str(workspace.id) if workspace.id else '',
                'user_count': str(len(workspace.users))
            }
            workspace_list.append(workspace_info)

        log_operation_success("list workspaces", f"Found {len(workspace_list)} workspaces")
        return workspace_list

    except Exception as e:
        log_operation_failure("list workspaces", e)
        return []


def list_user_workspaces(
    username: str,
) -> List[str]:
    """Get list of workspaces a user has access to."""
    global _client

    try:
        # Find user
        user = None
        for u in _client.users:
            if u.username == username:
                user = u
                break

        if not user:
            log_operation_failure("get user workspaces", Exception(f"User {username} not found"))
            return []

        # Get workspaces the user has access to
        workspaces = []
        for workspace in _client.workspaces:
            try:
                # Check if user has access to workspace
                workspace_users = workspace.users
                if any(wu.id == user.id for wu in workspace_users):
                    workspaces.append(workspace.name or '')
            except Exception:
                # Skip workspaces we can't access
                continue

        log_user_operation("listed workspaces", username, f"Found {len(workspaces)} workspaces")
        return workspaces

    except Exception as e:
        log_operation_failure("get user workspaces", e)
        return []
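
A minimal sketch of per-annotator workspaces with the helpers above. Without a mapping, each user from users.csv gets a personal workspace named after their username; the names in the mapping example are hypothetical:

```python
# Illustrative sketch only -- not part of the deleted file.
from utils.workspace_utils import create_user_workspaces, list_user_workspaces

create_user_workspaces()   # per-user workspaces from users.csv
# With an explicit mapping, users are assigned to workspaces that must
# already exist (the mapping path does not create them):
create_user_workspaces({"alice": ["shared-ws"]})
print(list_user_workspaces("alice"))
```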
wipe.py
DELETED
@@ -1,165 +0,0 @@
#!/usr/bin/env python3

"""
wipe.py

Clean wipe script for the MERe Workshop annotation pipeline.
Removes users, workspaces, datasets, and webhooks using modular helper functions.
"""

import sys
import argparse
from pathlib import Path

from utils import (
    validate_env,
    log_operation_success,
    log_operation_failure,
    wipe_space,
    wipe_datasets_only,
    wipe_users_only,
    wipe_webhooks_only,
    get_status,
    log_info,
    log_warning
)


def parse_args():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="Wipe MERe Workshop Argilla space",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

    parser.add_argument(
        "-d", "--datasets-only",
        action="store_true",
        help="Only wipe datasets, keep users and workspaces",
    )

    parser.add_argument(
        "-u", "--users-only",
        action="store_true",
        help="Only wipe users, keep datasets and workspaces",
    )

    parser.add_argument(
        "-w", "--webhooks-only",
        action="store_true",
        help="Only wipe webhooks, keep everything else",
    )

    parser.add_argument(
        "-s", "--status-only",
        action="store_true",
        help="Only show current space status, do not perform wipe",
    )

    parser.add_argument("--force", action="store_true", help="Skip confirmation prompt")

    return parser.parse_args()


def show_space_status():
    """Display current space status."""
    status = get_status()

    if "error" in status:
        log_operation_failure("check space status", status["error"])
        return False

    print()
    log_info("=== Current Argilla Space Status ===")
    log_info(f"Workspaces: {status['workspaces']}")
    log_info(f"Users: {status['users']}")
    log_info(f"Datasets: {status['datasets']}")
    log_info(f"Records: {status['records']}")
    log_info(f"Webhooks: {status['webhooks']}")
    print()

    return True


def confirm_wipe(
    operation_description: str,
    force: bool = False
) -> bool:
    """Confirm wipe operation with user."""
    if force:
        return True

    log_warning(f"WARNING: This will {operation_description}")
    log_warning("This action cannot be undone!")

    log_warning("Are you sure you want to proceed? [y/N]:")
    response = input().strip().lower()
    return response in ["y", "yes"]


def main():
    """Main wipe function."""
    args = parse_args()

    # Validate environment
    try:
        validate_env()
        log_operation_success("wipe validation", "Environment validated")
    except Exception as e:
        log_operation_failure("wipe validation", e)
        return 1

    # Show current status
    if not show_space_status():
        return 1

    # If status-only mode, exit here
    if args.status_only:
        return 0

    # Determine operation and confirmation message
    if args.datasets_only:
        operation = "datasets only"
        confirmation_msg = "delete ALL DATASETS (keeping users and workspaces)"
        wipe_function = wipe_datasets_only
    elif args.users_only:
        operation = "users only"
        confirmation_msg = (
            "delete ALL USERS (keeping datasets, workspaces, and webhooks)"
        )
        wipe_function = wipe_users_only
    elif args.webhooks_only:
        operation = "webhooks only"
        confirmation_msg = "delete ALL WEBHOOKS (keeping users and datasets)"
        wipe_function = wipe_webhooks_only
    else:
        operation = "entire space"
        confirmation_msg = "DELETE EVERYTHING (users, workspaces, datasets, webhooks)"
        wipe_function = wipe_space

    # Confirm operation
    if not confirm_wipe(confirmation_msg, args.force):
        log_info("Wipe operation cancelled")
        return 0

    # Perform wipe operation
    print()
    log_info(f"Wiping {operation}...")
    success = wipe_function()

    if success:
        log_operation_success(f"wipe {operation}", "Operation completed successfully")
    else:
        log_operation_failure(f"wipe {operation}", Exception("Operation failed"))
        return 1

    # Show final status
    if not show_space_status():
        return 1

    log_operation_success("Wipe operation completed", send_to_slack=True)
    return 0


if __name__ == "__main__":
    exit(main())
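
For completeness, a sketch of driving this CLI from another script rather than through wipe.sh; the flags mirror `parse_args()` above, and the paths assume the repo root as the working directory:

```python
# Illustrative sketch only -- not part of the deleted file.
import subprocess

subprocess.run(["python", "wipe.py", "--status-only"], check=True)
subprocess.run(["python", "wipe.py", "--datasets-only", "--force"], check=True)
```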
wipe.sh
DELETED
@@ -1,16 +0,0 @@
#!/bin/bash

# wipe.sh
#
# Shell wrapper for the MERe Workshop wipe process.

set -euo pipefail

# Get the directory where this script is located
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &> /dev/null && pwd)"

# Change to the script directory
cd "$SCRIPT_DIR"

# Run the wipe script
python wipe.py "$@"