andrewelawrence committed
Commit 8cec78f · 1 Parent(s): 8c6a471

Migration to RIET-lab/moral-kg-workshop-listener
.gitignore CHANGED
@@ -1,5 +1,4 @@
 # Env file and user config file must not be uploaded to the HF Space!
 .env
-users/
 archive/
-*__pycache__*
+*__pycache__*
README.md CHANGED
@@ -21,7 +21,6 @@ Part of RIET Lab's initiative to improve AI using moral reasoning.
 **Organization**: [https://huggingface.co/RIET-lab](https://huggingface.co/RIET-lab)
 **Workshop Website**: [https://sites.google.com/view/mereworkshop](https://sites.google.com/view/mereworkshop)
 **Argdown Documentation**: [https://argdown.org/guide/](https://argdown.org/guide/)
-**Listener HF Space**: [https://huggingface.co/spaces/RIET-lab/moral-kg-workshop-listener](https://huggingface.co/spaces/RIET-lab/moral-kg-workshop-listener)
 
 - Creating your own Argilla Spaces, check the [quickstart guide](http://docs.argilla.io/latest/getting_started/quickstart/) and the [Hugging Face Spaces configuration](http://docs.argilla.io/latest/getting_started/how-to-configure-argilla-on-huggingface/) for more details.
 - Discovering the Argilla UI, sign in with your Hugging Face account!
SETUP.md DELETED
@@ -1,116 +0,0 @@
# MERe Workshop Setup Guide

Setup and usage guide for the MERe Workshop dataset annotation process.
A very important note: much of this infrastructure exists to avoid paying for
a space - there is NO persistent storage in `moral-kg-workshop`.

## Environment

### Required Environment Variables

```bash
export ARGILLA_API_URL="your-argilla-url"
export ARGILLA_API_KEY="your-api-key"
export HF_TOKEN="your-huggingface-token"
```

### Optional Environment Variables

```bash
export SLACK_WEBHOOK_URL="your-slack-webhook-url"
# For error notifications to a Slack channel.
# Requires a custom Slack app set up with a webhook URL.
# See https://api.slack.com/messaging/webhooks
```
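The notification itself is a single JSON POST to that webhook. A minimal sketch of the call (the pipeline's `utils/log_utils.py`, further down in this commit, wraps the same request with logging and config gating):

```python
import os

import requests


def notify_slack(message: str) -> bool:
    """Post a plain-text notification to the configured Slack webhook."""
    url = os.getenv("SLACK_WEBHOOK_URL")
    if not url:
        return False  # notifications are optional; skip quietly
    response = requests.post(url, json={"text": message}, timeout=10)
    return response.status_code == 200
```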
### Dependencies

Install the required Python packages:
```bash
pip install -r requirements.txt
```

## Configuration

See `config.yaml`.

## Space Setup

### Complete Setup

Run all setup operations (users, workspaces, datasets, webhooks):
```bash
./setup.sh
# or
python setup.py
```

### Partial Setup

Skip specific operations:
```bash
python setup.py --skip-users       # Skip user creation
python setup.py --skip-workspaces  # Skip workspace creation (breaks dataset allocation)
python setup.py --skip-datasets    # Skip dataset creation
python setup.py --skip-webhooks    # Skip webhook creation
```

### Status Check Only

View the current space status without making changes:
```bash
python setup.py --status-only
```

## Wipe Operations

### Complete Wipe

Remove everything (users, workspaces, datasets, webhooks):
```bash
./wipe.sh
# or
python3 wipe.py
```

### Selective Wipe

Remove specific components:
```bash
python wipe.py --datasets-only  # Only datasets
python wipe.py --users-only     # Only users
python wipe.py --webhooks-only  # Only webhooks
```

### Force Wipe

Skip confirmation prompts:
```bash
python wipe.py --force
```

### Status Check Only

View the current space status without making changes:
```bash
python wipe.py --status-only
```

## Troubleshooting

### Debug Mode

For detailed debugging, set the log level to DEBUG in `config.yaml`:
```yaml
logging:
  level: "DEBUG"
```

### Status Commands

```bash
# Check status during setup
python3 setup.py --status-only

# Check status during wipe
python3 wipe.py --status-only
```
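Both `setup.py` and `wipe.py` validate the environment before doing any work; the check (as implemented by `validate_env` in `utils/setup_utils.py` later in this commit) boils down to:

```python
import os

REQUIRED_VARS = ["ARGILLA_API_URL", "ARGILLA_API_KEY", "HF_TOKEN"]


def validate_env() -> bool:
    """Fail fast if any required environment variable is missing."""
    missing = [var for var in REQUIRED_VARS if not os.getenv(var)]
    if missing:
        raise EnvironmentError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return True
```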
SPEC.md DELETED
@@ -1,132 +0,0 @@
# Moral-kg annotation process setup
Notes on the annotation pipeline / data ETL process

## Annotation Pipeline Architecture
Annotation occurs in two phases. During **Phase 1**, annotators determine which set
of claims best represents the argument of each paper. During **Phase 2**, annotators
map those claims into an Argument Map.

### Setup
1. Create users and user-specific workspaces based on the `users.csv` list
2. Create the Phase 1 dataset with records for each user
   - NOTE: depending on space constraints, we could use one Phase 1 dataset for
     all users, as only Phase 2 is user-response-dependent.
3. Create webhooks.

### Phase 1 Argilla Dataset
Creation:
- At startup
Records Input:
- Manual batch input via HF dataset `moral-kg-sample`
Response Output:
- Real-time webhook to HF dataset `moral-kg-sample-labels`
- Real-time webhook to Argilla Phase 2 dataset records
Updates:
- Only if `moral-kg-sample` is updated (this is handled manually)
Fields:
- Title (Author, Year) "title_info"
- Text "text"
Metadata:
- Identifier (visible to annotators) "id"
Questions:
- TextQuestion "claims"
  - Users list the claims which best represent the argument in the paper
  - AI/ML-generated claims are proposed in a list as a suggestion
Webhooks:
- Listen for the dataset ever being published (it shouldn't be) and notify the
  admin if it is.
- Response created/updated/deleted -> update `moral-kg-sample-labels`
  -> update Argilla Phase 2 records
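In SDK terms, the Phase 1 spec above corresponds roughly to the following Argilla settings (a sketch: the field, metadata, and question names come from the spec, while the titles and description string are illustrative):

```python
import argilla as rg

# Sketch of the Phase 1 spec expressed as Argilla dataset settings.
phase1_settings = rg.Settings(
    fields=[
        rg.TextField(name="title_info", title="Title (Author, Year)"),
        rg.TextField(name="text", title="Text", use_markdown=True),
    ],
    metadata=[
        rg.TermsMetadataProperty(
            name="id", title="Identifier", visible_for_annotators=True
        ),
    ],
    questions=[
        rg.TextQuestion(
            name="claims",
            title="Claims",
            description="List the claims which best represent the paper's argument.",
        ),
    ],
)
```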
### Phase 2 Argilla Dataset
Creation:
- When the first Phase 1 response is created
Records Input:
- Real-time webhook `response.created`/`.updated`/`.deleted` from Phase 1
Response Output:
- Real-time webhook to HF dataset `moral-kg-sample-maps`
Updates:
- When a Phase 1 response is created/updated/deleted
Fields:
- Title (Author, Year) "title_info"
- Argdown Page "argdown"
- Text "text"
Metadata:
- Identifier (visible to annotators) "id"
Questions:
- TextQuestion "argmap"
  - Users are asked to copy and paste their final Argdown input into this box
    as the solution.
Webhooks:
- Listen for the dataset ever being published (it shouldn't be) and notify the
  admin if it is.
- Response created/updated/deleted -> update `moral-kg-sample-maps`

## HuggingFace Datasets
There are three HuggingFace datasets involved in the annotation process:
`moral-kg-sample`, `moral-kg-sample-labels`, and `moral-kg-sample-maps`.

### `moral-kg-sample` (private)
Stores the data associated with each paper in the sample:
- identifier | str      | The PhilPapers ID associated with each paper
- title      | str      | The title of the paper
- authors    | list:str | The authors attributed to the paper
- year       | str      | The publication year of the paper
- text       | str      | The paper content (in plain text or markdown)
- map        | dict     | The claim:method map that contains each claim
                          extracted from the text and its associated
                          extraction method.

### `moral-kg-sample-labels` (private)
Stores data associated with the claims annotators select for each paper in
the sample:
- identifier | str  | The PhilPapers ID associated with each paper
- annotator  | str  | The annotator's unique Argilla UUID
- map        | dict | The claim:method map that contains each claim the
                      annotator selects as representative of the paper.
                      Claims not found in the original map are labeled
                      "annotator".

### `moral-kg-sample-maps` (private)
Stores data associated with the argument maps annotators create for each
paper in the sample:
- identifier | str  | The PhilPapers ID associated with each paper
- annotator  | str  | The annotator's unique Argilla UUID
- argmap     | dict | The argument map (in Argdown format) that
                      represents the paper's argument structure.

## Webhooks

### dataset.published
- Stretch goal: implement a Slack notification. For now, just log that a
  dataset was published.

### response.created
IF data.data.values contains "claims":
- This means it is a Phase 1 response
ELSE IF data.data.values contains "argmap":
- This means it is a Phase 2 response

### response.updated
IF data.record.questions.name contains "claims":
-
ELSE IF data.record.questions.name contains "argmap":
-

### response.deleted
IF data.record.questions.name contains "claims":
-
ELSE IF data.record.questions.name contains "argmap":
-

## Notes, Comments, and Questions
- I assume that our ultimate moral-kg dataset, the one which makes up the
  entirety of the KG and will be public, will be in a separate HF dataset.
- There are no user event webhooks, so we must either:
  1. batch create users, or
  2. poll every second during the workshop, or
  3. track OAuth sign-ins
- Should we put a link to the website PDF alongside its processed text?
- For Phase 2 argmap building: ideally we would be able to extract the user
  text inputted into the iFrame, but I'm not confident we will be able to,
  so this solution suffices for now.
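A sketch of the `response.created` dispatch rule described above (the payload field names come from the spec; `handle_phase1` and `handle_phase2` are hypothetical handlers that would update `moral-kg-sample-labels` and `moral-kg-sample-maps` respectively):

```python
def handle_phase1(data: dict) -> None: ...  # hypothetical: update moral-kg-sample-labels
def handle_phase2(data: dict) -> None: ...  # hypothetical: update moral-kg-sample-maps


def on_response_created(data: dict) -> None:
    """Route an Argilla response.created payload by question name (sketch)."""
    values = data["data"]["values"]
    if "claims" in values:    # Phase 1 response
        handle_phase1(data)
    elif "argmap" in values:  # Phase 2 response
        handle_phase2(data)
```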
config.yaml CHANGED
@@ -1,4 +1,6 @@
 # moral-kg-workshop config
+#
+# NOTE: See moral-kg-workshop-listener config for updates!
 
 # File Paths Configuration
 paths:
@@ -83,10 +85,6 @@ phase1:
 logging:
   level: "INFO"
   format: "[%(asctime)s] [%(name)s] [%(levelname)s] - %(message)s"
-  # External library log levels (set to WARNING/ERROR to reduce verbosity)
-  external_libraries:
-    httpx: "WARNING"
-    argilla.sdk: "WARNING"
 
 # Error Handling Configuration
 error_handling:
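These keys are read through the `Config` helper defined in `utils/setup_utils.py` below, which supports dot-notation lookups with defaults, e.g.:

```python
from utils import get_config

config = get_config()

# Dot-notation lookups with defaults, as used throughout the pipeline:
log_level = config.get("logging.level", "INFO")
dataset_name = config.get("phase1.dataset_name", "Phase 1")
```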
setup.py DELETED
@@ -1,221 +0,0 @@
#!/usr/bin/env python3

"""
setup.py

Setup script for the MERe Workshop Argilla Hugging Face space. This is the
primary annotation pipeline. Creates users, workspaces, and datasets, and
restarts the listener space that registers webhooks.
"""

import argparse
import json
import os

from huggingface_hub import HfApi

from utils import (
    validate_env,
    log_operation_success,
    log_operation_failure,
    get_status,
    log_info,
    create_users,
    create_user_workspaces,
    create_phase1_datasets,
    list_users,
    get_config,
)


def parse_args():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="Setup MERe Workshop Argilla Hugging Face space",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

    parser.add_argument(
        "-u", "--skip-users",
        action="store_true",
        help="Skip user creation step"
    )

    parser.add_argument(
        "-w", "--skip-workspaces",
        action="store_true",
        help="Skip workspace creation and user assignment step"
    )

    parser.add_argument(
        "-d", "--skip-datasets",
        action="store_true",
        help="Skip dataset creation step"
    )

    parser.add_argument(
        "-l", "--skip-listener",
        action="store_true",
        help="Skip restarting the listener space (skips webhook creation step)."
    )

    parser.add_argument(
        "-s",
        "--status-only",
        action="store_true",
        help="Only show current space status, do not perform setup",
    )

    return parser.parse_args()


def restart_listener():
    """Restart the RIET-lab/moral-kg-workshop-listener space."""
    try:
        api = HfApi(token=os.getenv("HF_TOKEN"))
        api.restart_space(repo_id="RIET-lab/moral-kg-workshop-listener")
        log_operation_success("restart listener space", "Space restart initiated successfully")
        return True
    except Exception as e:
        log_operation_failure("restart listener space", e)
        return False


def show_space_status():
    """Display current space status."""
    status = get_status()

    if "error" in status:
        log_operation_failure("check space status", status["error"])
        return False

    print()
    log_info("=== Current Argilla Space Status ===")
    log_info(f"Workspaces: {status['workspaces']}")
    log_info(f"Users: {status['users']}")
    log_info(f"Datasets: {status['datasets']}")
    log_info(f"Records: {status['records']}")
    log_info(f"Webhooks: {status['webhooks']}")
    print()

    return True


def track_user_info(filepath=None):
    """Store Argilla user info to a file, or log it if no file is provided."""
    users = list_users()

    if filepath:
        try:
            with open(filepath, 'w', encoding='utf-8') as f:
                json.dump(users, f, indent=2)
            log_info(f"User ID map written to {filepath}")
        except Exception as e:
            log_operation_failure("map user ids", e)
    else:
        log_info(f"User ID map: {users}")


def main():
    """Main setup function."""
    args = parse_args()
    config = get_config()

    # Validate environment
    try:
        validate_env()
        log_operation_success("setup validation", "Environment validated")
    except Exception as e:
        log_operation_failure("setup validation", e)
        return 1

    # Show current status
    if not show_space_status():
        return 1

    # If status-only mode, exit here
    if args.status_only:
        return 0

    # Track overall success
    operations_success = []

    # Step 1: Create users
    if not args.skip_users:
        print()
        log_info("Creating users...")
        success = create_users()
        operations_success.append(success)

        if success:
            log_info("Success: Users created successfully")
            # Track user profiles after creation so we can map users to their UUIDs
            track_user_info(config.get('paths', {}).get('users_info', None))
        else:
            log_info("Failed: Could not create users")
    else:
        log_info("Skipping user creation")

    # Step 2: Create workspaces
    if not args.skip_workspaces:
        print()
        log_info("Creating workspaces and assigning users...")
        success = create_user_workspaces()
        operations_success.append(success)

        if success:
            log_info("Success: Workspaces created and users assigned successfully")
        else:
            log_info("Failed: Could not create workspaces and assign users")
    else:
        log_info("Skipping workspace creation and user assignment")

    # Step 3: Create datasets
    if not args.skip_datasets:
        print()
        log_info("Creating datasets...")
        success = create_phase1_datasets()
        operations_success.append(success)

        if success:
            log_info("Success: Datasets created successfully")
        else:
            log_info("Failed: Could not create datasets")
    else:
        log_info("Skipping dataset creation")

    # Step 4: Restart listener to create webhooks
    if not args.skip_listener:
        print()
        log_info("Restarting RIET-lab/moral-kg-workshop-listener space...")
        success = restart_listener()
        operations_success.append(success)
        if success:
            log_info("Success: Listener space restart initiated")
        else:
            log_info("Failed: Could not restart listener space")

    # Show final status
    show_space_status()

    # Overall result
    if operations_success:
        successful_count = sum(operations_success)
        total_count = len(operations_success)

        if successful_count == total_count:
            log_operation_success("complete setup", "All operations completed successfully", send_to_slack=True)
            return 0
        else:
            log_operation_failure("complete setup", Exception("Some or all operations failed"), send_to_slack=True)
            return 1
    else:
        log_operation_success("complete setup", "No operations were required", send_to_slack=True)
        return 0


if __name__ == "__main__":
    exit(main())
setup.sh DELETED
@@ -1,16 +0,0 @@
#!/bin/bash

# setup.sh
#
# Shell wrapper for the MERe Workshop setup process.

set -euo pipefail

# Get the directory where this script is located
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &> /dev/null && pwd)"

# Change to the script directory
cd "$SCRIPT_DIR"

# Run the setup script
python setup.py "$@"
utils/__init__.py DELETED
@@ -1,145 +0,0 @@
"""
utils package for the MERe Workshop annotation pipeline

This package provides utilities for:
- Configuration management (setup_utils)
- Logging and notifications (log_utils)
- Argilla Phase 1 dataset and Hugging Face dataset management (dataset_utils, phase1_utils)
- User management (user_utils)
- Workspace management (workspace_utils)
- Argilla webhook management (webhook_utils)
- Argilla space wiping/status (wipe_utils)
"""

from .setup_utils import (
    get_root,
    get_config,
    get_client,
    get_hf_api,
    validate_env,
    load_users,
)
from .log_utils import (
    log_error,
    log_warning,
    log_info,
    log_operation_success,
    log_operation_failure,
    log_dataset_operation,
    log_user_operation,
    log_webhook_operation,
)
from .dataset_utils import (
    create_dataset,
    delete_datasets,
    delete_dataset,
    list_datasets,
    update_datasets,
    update_dataset,
    load_moral_kg_sample,
)
from .phase1_utils import (
    create_phase1_datasets,
    delete_phase1_datasets,
    update_phase1_datasets,
)
from .user_utils import (
    create_users,
    create_user,
    delete_users,
    delete_user,
    list_users,
)
from .workspace_utils import (
    create_workspaces,
    create_workspace,
    create_user_workspaces,
    create_user_workspace,
    delete_workspaces,
    delete_workspace,
    delete_user_workspaces,
    delete_user_workspace,
    list_workspaces,
    list_user_workspaces,
)
from .webhook_utils import (
    create_webhooks,
    create_webhook,
    delete_webhooks,
    delete_webhook,
    list_webhooks,
    list_webhook_events,
    update_webhooks,
    update_webhook,
    validate_webhooks,
    webhook_exists,
)
from .wipe_utils import (
    get_status,
    wipe_space,
    wipe_datasets_only,
    wipe_users_only,
    wipe_webhooks_only,
)

__all__ = [
    "get_root",
    "get_config",
    "get_client",
    "get_hf_api",
    "validate_env",
    "load_users",

    "log_error",
    "log_warning",
    "log_info",
    "log_operation_success",
    "log_operation_failure",
    "log_dataset_operation",
    "log_user_operation",
    "log_webhook_operation",

    "create_phase1_datasets",
    "create_dataset",
    "delete_phase1_datasets",
    "delete_datasets",
    "delete_dataset",
    "list_datasets",
    "update_phase1_datasets",
    "update_datasets",
    "update_dataset",
    "load_moral_kg_sample",

    "create_users",
    "create_user",
    "delete_users",
    "delete_user",
    "list_users",

    "create_workspaces",
    "create_workspace",
    "create_user_workspaces",
    "create_user_workspace",
    "delete_workspaces",
    "delete_workspace",
    "delete_user_workspaces",
    "delete_user_workspace",
    "list_workspaces",
    "list_user_workspaces",

    "create_webhooks",
    "create_webhook",
    "delete_webhooks",
    "delete_webhook",
    "list_webhooks",
    "list_webhook_events",
    "update_webhooks",
    "update_webhook",
    "validate_webhooks",
    "webhook_exists",

    "get_status",
    "wipe_space",
    "wipe_datasets_only",
    "wipe_users_only",
    "wipe_webhooks_only",
]
utils/dataset_utils.py DELETED
@@ -1,351 +0,0 @@
"""
dataset_utils.py

Helper functions for dataset creation and management in the MERe Workshop
annotation pipeline. Adapted from the create-datasets.py script to follow
the helper-function paradigm.
"""

import os
import warnings
from typing import Dict, List, Optional

import argilla as rg
from datasets import load_dataset

from .setup_utils import (
    get_config,
    get_client,
    get_hf_api
)
from .log_utils import (
    log_info,
    log_operation_success,
    log_operation_failure,
    log_dataset_operation
)


# Get config
_config = get_config()

# Get client
_client = get_client()


def load_moral_kg_sample() -> Optional[List[Dict]]:
    """Load the moral-kg-sample dataset from HuggingFace."""
    dataset_name = _config.get('datasets.sample')
    if not dataset_name:
        log_operation_failure("load sample dataset", Exception("Dataset name not configured"))
        return None

    try:
        # Set up the HF client to ensure authentication
        get_hf_api()

        dataset = load_dataset(dataset_name, split="train", token=os.getenv("HF_TOKEN"))

        # Convert to a list of dictionaries for easier processing
        records = []
        for item in dataset:
            item = dict(item)
            records.append({
                'identifier': item.get('identifier'),
                'title': item.get('title'),
                'authors': item.get('authors'),
                'year': item.get('year'),
                'categories': item.get('categories'),
                'text': item.get('text'),
                'map': item.get('map')
            })

        log_operation_success("load moral-kg-sample dataset", f"Loaded {len(records)} records")
        return records

    except Exception as e:
        log_operation_failure("load moral-kg-sample dataset", e)
        return None


def _get_workspace_names() -> List[str]:
    """Get the list of available workspaces."""
    try:
        global _client
        workspaces = _client.workspaces
        workspace_names = [ws.name or "" for ws in workspaces]
        return workspace_names
    except Exception as e:
        log_operation_failure("fetch workspaces", e)
        return []


def _format_title_info(
    authors: List[str],
    year: str,
    title: str
) -> str:
    """Format title info as 'Title (Author, Year)'."""
    # Take the first author and add "et al." if there are multiple authors
    authors_display = authors[0] if authors else "Unknown"
    if len(authors) > 1:
        authors_display += " et al."

    return f"{title} ({authors_display}, {year})"


def _check_dataset_exists(
    workspace_name: str,
    dataset_name: str
) -> bool:
    """Check whether a dataset already exists in a workspace."""
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            workspace = _client.workspaces(workspace_name)

        if workspace:
            for existing_dataset in workspace.datasets:
                if existing_dataset.name == dataset_name:
                    return True
    except Exception:
        pass
    return False


def create_dataset(
    dataset_name: str,
    workspace_name: Optional[str],
    settings: rg.Settings,
    records: Optional[List[Dict]] = None,
) -> bool:
    """Create a dataset with the given settings in the specified workspace."""
    global _client

    try:
        dataset = rg.Dataset(
            name=dataset_name,
            workspace=workspace_name,
            settings=settings,
            client=_client,
        )
        dataset.create()
        log_dataset_operation("created", dataset_name, f"in workspace {workspace_name}")

        # Add records if provided
        if records:
            dataset.records.log(records)
            log_operation_success("load records into dataset", f"Added {len(records)} records")

        return True

    except Exception as e:
        log_operation_failure("create dataset", e)
        return False


def delete_datasets(
    dataset_names: Optional[List[str]] = None,
    workspace_name: Optional[str] = None
) -> bool:
    """Delete multiple datasets, or all datasets if none are specified."""
    global _client

    if dataset_names is None:
        # Delete all datasets from all workspaces or from a specific workspace
        if workspace_name:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                workspace = _client.workspaces(workspace_name)
            if not workspace:
                log_operation_failure("delete datasets", Exception(f"Workspace {workspace_name} not found"))
                return False

            datasets = workspace.datasets
            dataset_names = [ds.name for ds in datasets if ds.name]

            success_count = 0
            for ds_name in dataset_names:
                if delete_dataset(workspace_name, ds_name):
                    success_count += 1

            log_operation_success("delete datasets from workspace",
                                  f"Deleted {success_count}/{len(dataset_names)} datasets from {workspace_name}")

            return success_count == len(dataset_names)
        else:
            # Get all datasets from all workspaces
            all_datasets = []
            for ws in _client.workspaces:
                ws_name = ws.name
                if ws_name:
                    datasets = ws.datasets
                    for ds in datasets:
                        if ds.name:
                            all_datasets.append((ws_name, ds.name))

            success_count = 0
            for ws_name, ds_name in all_datasets:
                if delete_dataset(ws_name, ds_name):
                    success_count += 1

            log_operation_success("delete all datasets",
                                  f"Deleted {success_count}/{len(all_datasets)} datasets")

            return success_count == len(all_datasets)
    else:
        # Delete specific datasets
        if not workspace_name:
            log_operation_failure("delete datasets", Exception("Workspace name required when specifying dataset names"))
            return False

        success_count = 0
        for dataset_name in dataset_names:
            if delete_dataset(workspace_name, dataset_name):
                success_count += 1

        log_operation_success("delete datasets",
                              f"Deleted {success_count}/{len(dataset_names)} datasets")

        return success_count == len(dataset_names)


def delete_dataset(
    workspace_name: str,
    dataset_name: str
) -> bool:
    """Delete a specific dataset from a workspace."""
    try:
        global _client
        workspace = _client.workspaces(workspace_name)

        if not workspace:
            log_operation_failure("delete dataset", Exception(f"Workspace {workspace_name} not found"))
            return False

        # Find the dataset in the workspace
        dataset = None
        for ds in workspace.datasets:
            if ds.name == dataset_name:
                dataset = ds
                break

        if not dataset:
            log_operation_failure("delete dataset", Exception(f"Dataset {dataset_name} not found in workspace {workspace_name}"))
            return False

        # Delete all records first
        try:
            records = list(dataset.records)
            # Filter out None records to avoid AttributeError
            records = [r for r in records if r is not None]

            if records:
                dataset.records.delete(records=records)
                log_dataset_operation("deleted records", dataset_name, f"{len(records)} records")
            else:
                log_info(f"No records found in dataset {dataset_name}")
        except AttributeError:
            pass
        except Exception as e:
            log_operation_failure("delete dataset records", e)

        # Delete the dataset
        dataset.delete()
        log_dataset_operation("deleted", dataset_name, f"from workspace {workspace_name}")

        return True

    except Exception as e:
        log_operation_failure("delete dataset", e)
        return False


def list_datasets() -> Dict[str, List[str]]:
    """List all datasets grouped by workspace."""
    global _client

    try:
        workspace_datasets = {}

        for workspace in _client.workspaces:
            workspace_name = workspace.name or "Unknown"
            datasets = [dataset.name for dataset in workspace.datasets if dataset.name]
            workspace_datasets[workspace_name] = datasets

            log_dataset_operation("listed", f"workspace {workspace_name}",
                                  f"Found {len(datasets)} datasets")

        return workspace_datasets

    except Exception as e:
        log_operation_failure("list datasets", e)
        return {}


def update_datasets(
    dataset_updates: List[Dict[str, str]],
    new_settings: Optional[rg.Settings] = None
) -> bool:
    """Update multiple datasets."""
    success_count = 0

    for update_info in dataset_updates:
        workspace_name = update_info.get('workspace', '')
        dataset_name = update_info.get('dataset', '')
        new_workspace = update_info.get('new_workspace')

        if update_dataset(workspace_name, dataset_name, new_settings, new_workspace):
            success_count += 1

    log_operation_success("update datasets",
                          f"Updated {success_count}/{len(dataset_updates)} datasets")

    return success_count == len(dataset_updates)


def update_dataset(
    workspace_name: str,
    dataset_name: str,
    new_settings: Optional[rg.Settings] = None,
    new_workspace: Optional[str] = None
) -> bool:
    """Update a specific dataset's settings or move it to a new workspace."""
    global _client

    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            workspace = _client.workspaces(workspace_name)
            dataset = workspace.datasets(dataset_name)  # type: ignore

        if not dataset:
            log_operation_failure("update dataset",
                                  Exception(f"Dataset {dataset_name} not found in workspace {workspace_name}"))
            return False

        # Update settings if provided
        if new_settings:
            # Note: Argilla may not support direct settings updates; this
            # might require recreating the dataset
            log_operation_success("update dataset settings",
                                  f"Attempted to update {dataset_name}")

        # Move to the new workspace if provided
        if new_workspace:
            # Note: this typically requires recreating the dataset in the new workspace
            log_operation_success("move dataset workspace",
                                  f"Attempted to move {dataset_name} to {new_workspace}")

        return True

    except Exception as e:
        log_operation_failure("update dataset", e)
        return False
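As a usage sketch, the public helpers compose like this (the workspace name is hypothetical; "Phase 1" is the default dataset name from `config.yaml`):

```python
from utils import list_datasets, delete_dataset

# Map of workspace name -> list of dataset names currently in the space.
by_workspace = list_datasets()

# Remove a single dataset from one workspace (workspace name hypothetical).
if "Phase 1" in by_workspace.get("annotator-1", []):
    delete_dataset("annotator-1", "Phase 1")
```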
utils/log_utils.py DELETED
@@ -1,292 +0,0 @@
"""
log_utils.py

Logging and notification utilities for the MERe Workshop annotation pipeline.
Handles error logging, Slack notifications, and webhook data logging.
"""

import logging
import os
import textwrap
from typing import Optional

import requests

from .setup_utils import get_config


# Get config
config = get_config()


def _setup_logging() -> logging.Logger:
    """Set up the logging configuration."""
    global config
    log_config = config.get("logging", {})

    # Configure logging
    logging.basicConfig(
        level=getattr(logging, log_config.get("level", "INFO")),
        format=log_config.get(
            "format", "[%(asctime)s] [%(name)s] [%(levelname)s] - %(message)s"
        ),
    )

    # Configure external library log levels
    external_libs = log_config.get("external_libraries", {})
    for lib_name, log_level in external_libs.items():
        lib_logger = logging.getLogger(lib_name)
        lib_logger.setLevel(getattr(logging, log_level.upper()))

    return logging.getLogger("mere_workshop")


# Get logger
logger = _setup_logging()


def _can_print() -> bool:
    """
    Test whether the print() function may be used.

    Returns True if printing is allowed in the space or locally,
    False otherwise.
    """
    global config

    return config.get("error_handling.log_data_not_slack", False)


def _can_log() -> bool:
    """
    Test whether the log_*() functions may be used.

    Returns True if logging is allowed in the space or locally,
    False otherwise.
    """
    global config

    return (config.get("error_handling.log_data_not_slack", False) and
            config.get("error_handling.force_slack_notifications", False))


def _send_slack_notification(
    message: str,
) -> bool:
    """Send a Slack notification if configured."""
    global config
    global logger

    slack_webhook_url = os.getenv("SLACK_WEBHOOK_URL")
    if not slack_webhook_url:
        if _can_log():
            logger.warning("SLACK_WEBHOOK_URL not configured, skipping notification")
        elif _can_print():
            print("SLACK_WEBHOOK_URL not configured, skipping notification")

        return False

    try:
        payload = {"text": message}
        response = requests.post(
            slack_webhook_url,
            json=payload,
            headers={"Content-Type": "application/json"},
            timeout=10,
        )

        if response.status_code == 200:
            if _can_log():
                logger.info(f"Slack notification sent: {message}")
            elif _can_print():
                print(f"Slack notification sent: {message}")

            return True
        else:
            if _can_log():
                logger.error(f"Failed to send Slack notification. Status code: {response.status_code}")
            elif _can_print():
                print(f"Failed to send Slack notification. Status code: {response.status_code}")

            return False

    except Exception as e:
        if _can_log():
            logger.error(f"Error sending Slack notification: {e}")
        elif _can_print():
            print(f"Error sending Slack notification: {e}")

        return False


def _send_to_slack(
    send_to_slack: bool,
    message: str
) -> bool:
    """Determine whether a log should be sent as a Slack notification."""
    global config
    global logger

    try:
        if (config.get("error_handling.slack_notifications", False) and
                (send_to_slack or
                 config.get("error_handling.force_slack_notifications", False))):
            return _send_slack_notification(message)
        else:
            return True

    except Exception as e:
        logger.error(f"Error sending Slack notification: {e}")
        return False


def log_error(
    error_msg: str,
    exception: Optional[Exception] = None,
    send_to_slack: bool = False,
) -> None:
    """Log errors."""
    global config
    global logger

    if exception:
        # Format the error with an indented description using textwrap
        error_detail = textwrap.indent(str(exception), " ")
        full_msg = (
            f"{error_msg}\n Exception: {type(exception).__name__}\n{error_detail}"
        )
    else:
        full_msg = error_msg

    if _can_log():
        logger.error(full_msg)
        _send_to_slack(send_to_slack, full_msg)
    elif _can_print():
        print(f"[ERROR] {full_msg}")
        _send_to_slack(send_to_slack, full_msg)


def log_warning(
    warning_msg: str,
    send_to_slack: bool = False
) -> None:
    """Log warnings."""
    global config
    global logger

    if _can_log():
        logger.warning(warning_msg)
        _send_to_slack(send_to_slack, warning_msg)
    elif _can_print():
        print(f"[WARNING] {warning_msg}")
        _send_to_slack(send_to_slack, warning_msg)


def log_info(
    info_msg: str,
    send_to_slack: bool = False
) -> None:
    """Log information."""
    global config
    global logger

    if _can_log():
        logger.info(info_msg)
        _send_to_slack(send_to_slack, info_msg)
    elif _can_print():
        print(f"[INFO] {info_msg}")
        _send_to_slack(send_to_slack, info_msg)


def log_operation_success(
    operation: str,
    details: Optional[str] = None,
    send_to_slack: bool = False
) -> None:
    """Log a successful operation."""
    global config

    msg = f"Successfully completed {operation}"
    if details:
        msg += f": {details}"

    log_info(msg)
    _send_to_slack(send_to_slack, msg)


def log_operation_failure(
    operation: str,
    error: Optional[Exception] = None,
    send_to_slack: bool = False,
) -> None:
    """Log a failed operation."""
    global config

    msg = f"Failed to {operation}"

    log_error(msg, error)

    if error:
        error_detail = textwrap.indent(str(error), " ")
        full_msg = (
            f"{msg}\n Exception: {type(error).__name__}\n{error_detail}"
        )
        _send_to_slack(send_to_slack, full_msg)
    else:
        _send_to_slack(send_to_slack, msg)


def log_dataset_operation(
    operation: str,
    dataset_name: str,
    details: Optional[str] = None,
    send_to_slack: bool = False,
) -> None:
    """Log dataset-related operations."""
    global config
    global logger

    msg = f"Dataset {operation} ({dataset_name})"
    if details:
        msg += f": {details}"

    logger.info(msg)
    _send_to_slack(send_to_slack, msg)


def log_user_operation(
    operation: str,
    username: str,
    details: Optional[str] = None,
    send_to_slack: bool = False,
) -> None:
    """Log user-related operations."""
    global config
    global logger

    msg = f"User {operation} ({username})"
    if details:
        msg += f": {details}"

    logger.info(msg)
    _send_to_slack(send_to_slack, msg)


def log_webhook_operation(
    operation: str,
    event: str,
    details: Optional[str] = None,
    send_to_slack: bool = False,
) -> None:
    """Log webhook-related operations."""
    global config
    global logger

    msg = f"Webhook {operation} ({event})"
    if details:
        msg += f": {details}"

    logger.info(msg)
    _send_to_slack(send_to_slack, msg)
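A sketch of how callers use these helpers ("sync labels" is a hypothetical operation name):

```python
from utils import log_operation_failure, log_operation_success

try:
    # ... some pipeline step runs here ...
    log_operation_success("sync labels", "3 records written")
except Exception as e:
    # send_to_slack=True mirrors how setup.py reports terminal failures.
    log_operation_failure("sync labels", e, send_to_slack=True)
```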
utils/phase1_utils.py DELETED
@@ -1,240 +0,0 @@
"""
phase1_utils.py

Helper functions for Phase 1 dataset creation and management in the MERe
Workshop annotation pipeline.
"""

import json
import warnings
from typing import Dict, List, Optional

import argilla as rg

from .setup_utils import (
    get_config,
    get_client,
)
from .log_utils import (
    log_info,
    log_operation_success,
    log_operation_failure,
    log_dataset_operation
)
from .dataset_utils import (
    load_moral_kg_sample,
    _get_workspace_names,
    _format_title_info,
    _check_dataset_exists,
    delete_dataset,
    update_dataset
)


# Get config and client
_config = get_config()
_client = get_client()


def _create_phase1_settings() -> rg.Settings:
    """Create the Phase 1 dataset settings from configuration."""
    global _config
    phase1_config = _config.phase1

    # Build fields from config
    fields = []
    for field_name, field_config in phase1_config.get('fields', {}).items():
        fields.append(rg.TextField(
            name=field_config['name'],
            title=field_config['title'],
            use_markdown=field_config.get('use_markdown', False)
        ))

    # Build metadata from config
    metadata = []
    for meta_name, meta_config in phase1_config.get('metadata', {}).items():
        metadata.append(rg.TermsMetadataProperty(
            name=meta_config['name'],
            title=meta_config['title'],
            visible_for_annotators=meta_config.get('visible_for_annotators', True)
        ))

    # Build questions from config
    questions = []
    for question_name, question_config in phase1_config.get('questions', {}).items():
        if question_config.get('type') == 'TextQuestion':
            questions.append(rg.TextQuestion(
                name=question_config['name'],
                title=question_config['title'],
                description=question_config.get('description', ''),
                required=question_config.get('required', False)
            ))
        else:
            log_operation_failure("add question to Phase 1 dataset",
                                  Exception("Non-TextQuestion types are not implemented in this process."))

    return rg.Settings(
        guidelines=phase1_config.get('guidelines', ''),
        fields=fields,
        metadata=metadata,
        questions=questions
    )


def _create_phase1_dataset(
    workspace_name: str,
    records: List[Dict]
) -> bool:
    """Create the Phase 1 dataset for a specific workspace."""
    global _client

    dataset_name = _config.get('phase1.dataset_name', 'Phase 1')
    dataset = None

    # Check whether the dataset already exists
    if _check_dataset_exists(workspace_name, dataset_name):
        log_dataset_operation("created", dataset_name, f"in workspace {workspace_name} (already exists)")
        # Get the existing dataset for record loading
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                workspace = _client.workspaces(workspace_name)
            if workspace:
                for existing_dataset in workspace.datasets:
                    if existing_dataset.name == dataset_name:
                        dataset = existing_dataset
                        break
        except Exception as e:
            log_operation_failure("get existing dataset", e)
            return False
    else:
        # Create a new dataset
        try:
            dataset = rg.Dataset(
                name=dataset_name,
                workspace=workspace_name,
                settings=_create_phase1_settings(),
                client=_client,
            )
            dataset.create()
            log_dataset_operation("created", dataset_name, f"in workspace {workspace_name}")
        except Exception as e:
            log_operation_failure("create Phase 1 dataset", e)
            return False

    if dataset is None:
        log_operation_failure("create Phase 1 dataset",
                              Exception(f"Dataset {dataset_name} could not be resolved in workspace {workspace_name}"))
        return False

    # Convert records to Argilla format and load them
    try:
        argilla_records = []
        for record in records:
            title_info = _format_title_info(
                record['authors'],
                record['year'],
                record['title']
            ).strip()
            # Parse the map from a JSON string back into a dictionary
            map_data = json.loads(record['map']) if record['map'] else {}
            suggestions = list(map_data.keys())

            argilla_record = rg.Record(
                fields={
                    "title_info": title_info,
                    "text": record['text']
                },
                metadata={
                    "id": record['identifier'],
                    "fields": record['categories']
                },
                suggestions=[
                    rg.Suggestion(
                        question_name="claims",
                        value="\n\n".join(suggestions)
                    )
                ]
            )
            argilla_records.append(argilla_record)

        # Add records to the dataset
        dataset.records.log(argilla_records)
        log_operation_success("load records into dataset", f"Added {len(argilla_records)} records")

        return True

    except Exception as e:
        log_operation_failure("load records into dataset", e)
        return False


def create_phase1_datasets() -> bool:
    """Create Phase 1 datasets for all available workspaces."""
    try:
        # Load the client and get workspaces
        workspace_names = _get_workspace_names()

        if not workspace_names:
            log_operation_failure("create datasets", Exception("No workspaces found"))
            return False

        # Load records from HuggingFace
        records = load_moral_kg_sample()
        if not records:
            log_operation_failure("create datasets", Exception("Failed to load sample records"))
            return False

        # Create datasets for each workspace
        success_count = 0
        failed_count = 0
        for workspace_name in workspace_names:
            if _create_phase1_dataset(workspace_name, records):
                success_count += 1
            else:
                failed_count += 1

        # Use transaction-like logging
        log_info(f"Create Phase 1 datasets: {success_count} / {len(workspace_names)} succeeded, {failed_count} failed.")

        return success_count == len(workspace_names)

    except Exception as e:
        log_operation_failure("create datasets for all workspaces", e)
        return False


def delete_phase1_datasets() -> bool:
    """Delete all Phase 1 datasets from all workspaces."""
    global _config

    dataset_name = _config.get('phase1.dataset_name', 'Phase 1')
    workspace_names = _get_workspace_names()

    success_count = 0
    for workspace_name in workspace_names:
        if delete_dataset(workspace_name, dataset_name):
            success_count += 1

    log_operation_success("delete Phase 1 datasets",
                          f"Deleted {success_count}/{len(workspace_names)} datasets")

    return success_count == len(workspace_names)


def update_phase1_datasets(
    new_settings: Optional[rg.Settings] = None,
    new_workspace: Optional[str] = None
) -> bool:
    """Update all Phase 1 datasets with new settings or move them to a new workspace."""
    global _config

    dataset_name = _config.get('phase1.dataset_name', 'Phase 1')
    workspace_names = _get_workspace_names()

    success_count = 0
    for workspace_name in workspace_names:
        if update_dataset(workspace_name, dataset_name, new_settings, new_workspace):
            success_count += 1

    log_operation_success("update Phase 1 datasets",
                          f"Updated {success_count}/{len(workspace_names)} datasets")

    return success_count == len(workspace_names)
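One detail worth spelling out: each record's `map` arrives as a JSON string whose keys (the extracted claims) are joined into the pre-filled suggestion for the "claims" question. A minimal sketch (the map contents here are hypothetical):

```python
import json

raw_map = '{"Claim A": "llm-extracted", "Claim B": "rule-based"}'  # hypothetical
suggestion = "\n\n".join(json.loads(raw_map).keys())
# suggestion == "Claim A\n\nClaim B", shown to the annotator as a starting point
```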
utils/setup_utils.py DELETED
@@ -1,200 +0,0 @@
"""
setup_utils.py

Initialization utilities for the MERe Workshop annotation pipeline.
Handles setup of clients, configuration loading, and environment validation.
"""

import csv
import os
from pathlib import Path
from typing import Any, Dict

import argilla as rg
from huggingface_hub import HfApi
import rootutils
import yaml


# Setup project root
_root = rootutils.setup_root(__file__, indicator=".git", pythonpath=True)


def validate_env() -> bool:
    """Validate that all required environment variables are set."""
    required_vars = [
        "ARGILLA_API_URL",
        "ARGILLA_API_KEY",
        "HF_TOKEN"]

    missing_vars = [var for var in required_vars if not os.getenv(var)]

    if missing_vars:
        raise EnvironmentError(
            f"Missing required environment variables: {', '.join(missing_vars)}"
        )

    return True


class Config:
    """Configuration manager for the MERe Workshop application."""

    def __init__(
        self,
        config_path: str = "config.yaml"
    ):
        self._config_path = config_path
        self._config = self._load_config()

    def _load_config(self) -> Dict[str, Any] | None:
        """Load configuration from YAML file."""
        if validate_env():
            config_file = _root / self._config_path

            if not config_file.exists():
                raise FileNotFoundError(f"Configuration file not found: {config_file}")

            with open(config_file, "r", encoding="utf-8") as f:
                return yaml.safe_load(f)

    def get(
        self,
        key_path: str,
        default: Any = None
    ) -> Any:
        """Get configuration value using dot notation (e.g., 'datasets.sample')."""
        keys = key_path.split(".")
        value = self._config

        for key in keys:
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                return default

        return value

    @property
    def datasets(self) -> Dict[str, str]:
        """Get dataset configuration."""
        return self.get("datasets", {})

    @property
    def webhook_events(self) -> Dict[str, Any]:
        """Get webhook configuration."""
        return self.get("webhooks.events", {})

    @property
    def phase1(self) -> Dict[str, Any]:
        """Get Phase 1 configuration."""
        return self.get("phase1", {})

    @property
    def users_config(self) -> Dict[str, Any]:
        """Get users configuration."""
        return self.get("users", {})

    @property
    def paths(self) -> Dict[str, str]:
        """Get file paths configuration."""
        return self.get("paths", {})


# Global config instance
_config = Config()

# Global Argilla client instance
_client = None

# Global Hugging Face API instance
_hf_api = None


def get_root() -> Path:
    """Get the project root directory."""
    return _root


def get_config() -> Config:
    """Get the configuration manager."""
    return _config


def get_client() -> rg.Argilla:  # type: ignore
    """Get the Argilla client."""
    global _client

    if _client is not None:
        return _client

    if validate_env():
        try:
            _client = rg.Argilla(
                api_url=os.getenv("ARGILLA_API_URL"),
                api_key=os.getenv("ARGILLA_API_KEY"),
            )
            return _client

        except Exception as e:
            if "ArgillaCredentialsError" in str(e):
                print(
                    "\n HINT: Did you wipe/restart the space? If you did, ",
                    "you need to update your Argilla API key!\n"
                )
            raise


def get_hf_api() -> HfApi:  # type: ignore
    """Get the HuggingFace API client."""
    global _hf_api

    if _hf_api is not None:
        return _hf_api

    if validate_env():
        _hf_api = HfApi(token=os.getenv("HF_TOKEN"))

    return _hf_api


def load_users() -> list[Dict[str, str]] | None:
    """Load users from the CSV file specified in config."""
    config = get_config()
    csv_path = config.get("paths.users_csv", "users.csv")

    full_path = _root / csv_path
    if not full_path.exists():
        raise FileNotFoundError(f"Users CSV file not found: {full_path}")

    users = []
    with open(full_path, "r", newline="", encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            user_data = {key.rstrip(): value.rstrip() for key, value in row.items()}
            users.append(user_data)

    return users
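`load_users` expects `users.csv` to provide the columns consumed by `create_user` in `utils/user_utils.py`; a usage sketch:

```python
from utils import load_users

# Columns used downstream: username, password, first_name, last_name, role.
for user in load_users():
    print(user["username"], user.get("role", "annotator"))
```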
utils/user_utils.py DELETED
@@ -1,208 +0,0 @@
- """
- user_utils.py
- 
- Helper functions for user management in the MERe Workshop annotation pipeline.
- Transformed from create-users.py script to follow proper helper function paradigm.
- """
- 
- from typing import Dict, List, Optional
- 
- import argilla as rg
- 
- from .setup_utils import (
-     get_config,
-     get_client,
-     load_users
- )
- from .log_utils import (
-     log_info,
-     log_operation_success,
-     log_operation_failure,
-     log_user_operation
- )
- 
- # Get config
- _config = get_config()
- 
- # Get client
- _client = get_client()
- 
- def create_user(
-     user_data: Dict[str, str],
- ) -> bool:
-     """Create a single user."""
-     global _config
-     global _client
- 
-     username = user_data['username']
- 
-     # Check if user already exists
-     try:
-         for existing_user in _client.users:
-             if existing_user.username == username:
-                 log_user_operation("created", username, f"role: {existing_user.role} (already exists)")
-                 log_operation_success("create user", f"{username} (already exists)")
-                 return True
-     except Exception:
-         # Continue with creation if check fails
-         pass
- 
-     try:
-         # Create user
-         user = rg.User(
-             username=username,
-             first_name=user_data.get('first_name', ''),
-             last_name=user_data.get('last_name', ''),
-             role=user_data.get('role', _config.get('users.default_role', 'annotator')),
-             password=user_data['password']
-         )
- 
-         created_user = user.create()
-         log_user_operation("created", username, f"role: {user.role}")
- 
-         log_operation_success("create user", username)
-         return True
- 
-     except Exception as e:
-         # Check if user already exists
-         error_str = str(e).lower()
-         if "conflict" in error_str or "already exists" in error_str or "not unique" in error_str:
-             log_user_operation("created", username, "role: annotator (already exists)")
-             return True
-         else:
-             log_operation_failure("create user", e)
-             return False
- 
- 
- def create_users(
-     users_data: Optional[List[Dict[str, str]]] = None
- ) -> bool:
-     """Create all users from the CSV file or provided list."""
-     try:
-         if users_data is None:
-             users_data = load_users()
- 
-         if not users_data:
-             log_operation_failure("create users", Exception("No users found"))
-             return False
- 
-         # Create each user
-         success_count = 0
-         for user_data in users_data:
-             if create_user(user_data):
-                 success_count += 1
- 
-         log_operation_success("create users",
-             f"Created {success_count}/{len(users_data)} users successfully")
- 
-         return success_count == len(users_data)
- 
-     except Exception as e:
-         log_operation_failure("create users", e)
-         return False
- 
- 
- def delete_user(
-     username: str,
-     skip_admin: bool = True
- ) -> bool:
-     """Delete a single user."""
-     global _client
- 
-     try:
-         # Find and delete user
-         users = _client.users
-         user_to_delete = None
-         user_found = False
- 
-         for user in users:
-             if user.username == username:
-                 user_found = True
-                 if skip_admin:
-                     if user.role not in ["owner", "admin"]:
-                         user_to_delete = user
-                         break
-                     else:
-                         log_info(f"SKIPPED OWNER or ADMIN ({user.username})")
-                         # Skipping admin/owner is considered success
-                         return True
-                 else:
-                     user_to_delete = user
-                     break
- 
- 
-         if not user_found:
-             log_operation_failure("delete user", Exception(f"User {username} not found"))
-             return False
- 
-         if not user_to_delete:
-             log_operation_failure("delete user", Exception(f"User {username} could not be deleted"))
-             return False
- 
-         # Delete user
-         user_to_delete.delete()
-         log_user_operation("deleted", username)
- 
-         return True
- 
-     except Exception as e:
-         log_operation_failure("delete user", e)
-         return False
- 
- 
- def delete_users(
-     usernames: Optional[List[str]] = None
- ) -> bool:
-     """Delete all users or specified users."""
-     try:
-         global _client
- 
-         if usernames is None:
-             # Delete all users
-             users = _client.users
-             usernames = [user.username for user in users if user.username]
- 
-         if not usernames:
-             log_operation_success("delete users", "No users to delete")
-             return True
- 
-         # Delete each user
-         success_count = 0
-         for username in usernames:
-             if delete_user(username):
-                 success_count += 1
- 
-         log_operation_success("delete users",
-             f"Deleted {success_count}/{len(usernames)} users")
- 
-         return success_count == len(usernames)
- 
-     except Exception as e:
-         log_operation_failure("delete users", e)
-         return False
- 
- 
- def list_users(
- ) -> List[Dict[str, str]]:
-     """List all users with their details."""
-     try:
-         global _client
-         users = _client.users
-         user_list = []
- 
-         for user in users:
-             user_info = {
-                 'username': user.username or '',
-                 'first_name': user.first_name or '',
-                 'last_name': user.last_name or '',
-                 'role': user.role or '',
-                 'id': str(user.id) if user.id else ''
-             }
-             user_list.append(user_info)
- 
-         log_user_operation("listed all users", f"Found {len(user_list)} users")
-         return user_list
- 
-     except Exception as e:
-         log_operation_failure("list users", e)
-         return []
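Note the idempotent design: `create_user` treats an existing user as success, and `delete_user` skips owners and admins by default. A usage sketch (hypothetical; the username is illustrative):

```python
# Hypothetical usage of the removed helpers.
from utils.user_utils import create_users, delete_user, list_users

create_users()                 # create everyone from users.csv; True only if all succeed
for info in list_users():
    print(info["username"], info["role"])
delete_user("alice")           # returns True without deleting if alice is owner/admin
```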
 
utils/webhook_utils.py DELETED
@@ -1,340 +0,0 @@
- """
- webhook_utils.py
- 
- Helper functions for webhook management in the MERe Workshop annotation pipeline.
- Transformed from create-webhooks.py and related scripts to follow proper helper function paradigm.
- """
- 
- import os
- from typing import List, Optional, Dict
- 
- import argilla as rg
- 
- from .setup_utils import (
-     get_config,
-     get_client
- )
- from .log_utils import (
-     log_operation_success,
-     log_operation_failure,
-     log_webhook_operation
- )
- 
- 
- # Setup config
- _config = get_config()
- 
- # Setup client
- _client = get_client()
- 
- 
- 
- def create_webhook(
-     event: str,
-     description: str,
- ) -> Optional[rg.Webhook]:
-     """Create a webhook for a specific event."""
- 
-     global _client
- 
-     webhook_url = os.getenv("ARGILLA_WEBHOOK_URL")
-     if not webhook_url:
-         log_operation_failure("create webhook",
-             Exception(f"ARGILLA_WEBHOOK_URL environment variable not set for {event}"))
-         return None
- 
-     try:
-         webhook = rg.Webhook(
-             url=webhook_url,
-             events=[event],  # type: ignore
-             description=description
-         )
- 
-         created_webhook = webhook.create()
-         log_webhook_operation("created", event, description)
-         return created_webhook  # type: ignore
- 
-     except Exception as e:
-         log_operation_failure("create webhook", e)
-         return None
- 
- 
- def list_webhook_events(
- ) -> List[str]:
-     """Return list of webhook events from configuration."""
-     global _config
-     return _config.get('webhooks.events', [])
- 
- 
- def create_webhooks(
- ) -> bool:
-     """Create webhooks for all configured events."""
-     try:
-         global _client
-         events = list_webhook_events()
- 
-         if not events:
-             log_operation_failure("create webhooks",
-                 Exception("No webhook events configured"))
-             return False
- 
-         webhook_url = os.getenv("ARGILLA_WEBHOOK_URL")
-         if not webhook_url:
-             log_operation_failure("create webhooks",
-                 Exception("ARGILLA_WEBHOOK_URL environment variable not set"))
-             return False
- 
-         # Create webhooks for each event, recreating if they already exist
-         success_count = 0
-         for event in events:
-             # Check if webhook already exists
-             if webhook_exists(event):
-                 log_webhook_operation("already exists", event, "recreating")
-                 # Delete existing webhook first
-                 for webhook in _client.webhooks:
-                     if webhook.events and event in webhook.events:
-                         webhook.delete()
-                         log_webhook_operation("deleted existing", event)
-                         break
- 
-             description = f"Webhook for {event} events to {webhook_url}"
-             if create_webhook(event, description):
-                 success_count += 1
- 
-         log_operation_success("create webhooks",
-             f"Created {success_count}/{len(events)} webhooks successfully")
- 
-         return success_count == len(events)
- 
-     except Exception as e:
-         log_operation_failure("create webhooks", e)
-         return False
- 
- 
- def list_webhooks(
- ) -> List[Dict[str, str]]:
-     """List all existing webhooks."""
-     try:
-         global _client
-         webhooks = _client.webhooks
-         webhook_list = []
- 
-         for webhook in webhooks:
-             webhook_info = {
-                 'url': webhook.url or '',
-                 'events': ', '.join(webhook.events) if webhook.events else '',
-                 'description': webhook.description or ''
-             }
-             webhook_list.append(webhook_info)
- 
-         log_webhook_operation("listed all webhooks", f"Found {len(webhook_list)} webhooks")
-         return webhook_list
- 
-     except Exception as e:
-         log_operation_failure("list webhooks", e)
-         return []
- 
- 
- def delete_webhook(
-     webhook_url: str,
-     webhook_events: List[str],
- ) -> bool:
-     """Delete a specific webhook by URL and events."""
-     try:
-         global _client
-         # Find webhook by URL and events
-         webhook_to_delete = None
-         for webhook in _client.webhooks:
-             if (webhook.url == webhook_url and
-                 webhook.events and
-                 set(webhook.events) == set(webhook_events)):
-                 webhook_to_delete = webhook
-                 break
- 
-         if not webhook_to_delete:
-             log_operation_failure("delete webhook",
-                 Exception(f"Webhook with URL {webhook_url} and events {webhook_events} not found"))
-             return False
- 
-         # Delete webhook
-         webhook_to_delete.delete()
-         log_webhook_operation("deleted", f"{webhook_url} ({', '.join(webhook_events)})")
- 
-         return True
- 
-     except Exception as e:
-         log_operation_failure("delete webhook", e)
-         return False
- 
- 
- def delete_webhooks(
-     webhook_specs: Optional[List[Dict[str, str]]] = None
- ) -> bool:
-     """Delete all webhooks or specified webhooks."""
-     try:
-         global _client
- 
-         if webhook_specs is None:
-             # Delete all webhooks
-             webhooks = _client.webhooks
-             webhook_specs = []
-             for webhook in webhooks:
-                 if webhook.url and webhook.events:
-                     webhook_specs.append({
-                         'url': webhook.url,
-                         'events': ','.join(webhook.events)
-                     })
- 
-         if not webhook_specs:
-             log_operation_success("delete webhooks", "No webhooks to delete")
-             return True
- 
-         # Delete each webhook
-         success_count = 0
-         for webhook_spec in webhook_specs:
-             webhook_url = webhook_spec.get('url', '')
-             webhook_events = webhook_spec.get('events', '').split(',') if webhook_spec.get('events') else []
- 
-             if delete_webhook(webhook_url, webhook_events):
-                 success_count += 1
- 
-         log_operation_success("delete webhooks",
-             f"Deleted {success_count}/{len(webhook_specs)} webhooks")
- 
-         return success_count == len(webhook_specs)
- 
-     except Exception as e:
-         log_operation_failure("delete webhooks", e,)
-         return False
- 
- 
- def webhook_exists(
-     event: str
- ) -> bool:
-     """Check if a webhook already exists for a specific event."""
-     try:
-         global _client
-         webhooks = _client.webhooks
- 
-         for webhook in webhooks:
-             if webhook.events and event in webhook.events:
-                 log_webhook_operation("found existing", event, f"webhook URL: {webhook.url}")
-                 return True
- 
-         return False
- 
-     except Exception as e:
-         log_operation_failure("check webhook exists", e)
-         return False
- 
- 
- def validate_webhooks(
- ) -> bool:
-     """Validate that webhook configuration is correct."""
-     try:
-         # Check if webhook URL is set
-         webhook_url = os.getenv("ARGILLA_WEBHOOK_URL")
-         if not webhook_url:
-             log_operation_failure("validate webhook config", Exception("ARGILLA_WEBHOOK_URL environment variable not set"))
-             return False
- 
-         # Check if events are configured
-         events = list_webhook_events()
-         if not events:
-             log_operation_failure("validate webhook config", Exception("No webhook events configured"))
-             return False
- 
-         # Check if Argilla client can be created
-         try:
-             get_client()
-         except Exception as e:
-             log_operation_failure("validate webhook config", Exception(f"Cannot create Argilla client: {str(e)}"))
-             return False
- 
-         log_operation_success("validate webhook config", f"Configuration valid for {len(events)} events")
-         return True
- 
-     except Exception as e:
-         log_operation_failure("validate webhook config", e)
-         return False
- 
- 
- def update_webhook(
-     webhook_url: str,
-     webhook_events: List[str],
-     new_url: Optional[str] = None,
-     new_events: Optional[List[str]] = None,
-     new_description: Optional[str] = None,
- ) -> bool:
-     """Update a webhook's properties by recreating it (since Argilla doesn't support direct updates)."""
-     try:
-         global _client
-         # Find webhook
-         webhook = None
-         for w in _client.webhooks:
-             if (w.url == webhook_url and
-                 w.events and
-                 set(w.events) == set(webhook_events)):
-                 webhook = w
-                 break
- 
-         if not webhook:
-             log_operation_failure("update webhook",
-                 Exception(f"Webhook with URL {webhook_url} and events {webhook_events} not found"))
-             return False
- 
-         # Since Argilla doesn't support direct webhook updates, we need to recreate
-         # First delete the existing webhook
-         webhook.delete()
-         log_webhook_operation("deleted for update", f"{webhook_url} ({', '.join(webhook_events)})")
- 
-         # Create new webhook with updated properties
-         final_url = new_url if new_url else webhook_url
-         final_events = new_events if new_events else webhook_events
-         final_description = new_description if new_description else webhook.description
- 
-         for event in final_events:
-             description = final_description or f"Webhook for {event} events to {final_url}"
-             new_webhook = rg.Webhook(
-                 url=final_url,
-                 events=[event],  # type: ignore
-                 description=description
-             )
-             new_webhook.create()
- 
-         updates = []
-         if new_url:
-             updates.append(f"url: {new_url}")
-         if new_events:
-             updates.append(f"events: {', '.join(new_events)}")
-         if new_description:
-             updates.append(f"description: {new_description}")
- 
-         log_operation_success("update webhook", f"{webhook_url} - {', '.join(updates)}")
- 
-         return True
- 
-     except Exception as e:
-         log_operation_failure("update webhook", e)
-         return False
- 
- 
- def update_webhooks(
-     webhook_updates: List[Dict[str, str]]
- ) -> bool:
-     """Update multiple webhooks."""
-     success_count = 0
-     for update_info in webhook_updates:
-         webhook_url = update_info.get('url', '')
-         webhook_events = update_info.get('events', '').split(',') if update_info.get('events') else []
-         new_url = update_info.get('new_url')
-         new_events = update_info.get('new_events', '').split(',') if update_info.get('new_events') else None
-         new_description = update_info.get('new_description')
- 
-         if update_webhook(webhook_url, webhook_events, new_url, new_events, new_description):
-             success_count += 1
- 
-     log_operation_success("update webhooks",
-         f"Updated {success_count}/{len(webhook_updates)} webhooks")
- 
-     return success_count == len(webhook_updates)
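Because Argilla webhooks cannot be edited in place, `update_webhook` deletes and recreates, and `create_webhooks` does the same for duplicates. A sketch of the intended call order (hypothetical):

```python
# Hypothetical usage; assumes ARGILLA_WEBHOOK_URL is exported and
# webhooks.events is populated in config.yaml.
from utils.webhook_utils import validate_webhooks, create_webhooks, list_webhooks

if validate_webhooks():      # env var set, events configured, client reachable
    create_webhooks()        # one webhook per configured event, recreated if present
for hook in list_webhooks():
    print(hook["events"], "->", hook["url"])
```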
 
utils/wipe_utils.py DELETED
@@ -1,164 +0,0 @@
- """
- wipe_utils.py
- 
- Helper functions for wiping/cleaning Argilla space in the MERe Workshop annotation pipeline.
- Transformed from wipe-space.py script to follow proper helper function paradigm.
- """
- 
- from .setup_utils import get_client
- from .dataset_utils import delete_datasets
- from .user_utils import delete_users
- from .webhook_utils import delete_webhooks
- from .workspace_utils import delete_workspaces
- from .log_utils import (
-     log_operation_success,
-     log_operation_failure,
- )
- 
- 
- # Setup client
- _client = get_client()
- 
- 
- def wipe_space(
- ) -> bool:
-     """Completely wipe the Argilla space - datasets, users, workspaces, and webhooks."""
-     try:
-         # Track success of each operation
-         operations = [
-             ("datasets", delete_datasets),
-             ("webhooks", delete_webhooks),
-             ("users", delete_users),
-             ("workspaces", delete_workspaces)
-         ]
- 
-         operations_results = {}
- 
-         # Execute each operation and continue even if one fails
-         for operation_name, operation_func in operations:
-             try:
-                 success = operation_func()
-                 operations_results[operation_name] = success
-                 if success:
-                     log_operation_success(f"wipe {operation_name}", "Operation completed successfully")
-                 else:
-                     log_operation_failure(f"wipe {operation_name}", Exception("Operation completed with some failures"))
-             except Exception as e:
-                 operations_results[operation_name] = False
-                 log_operation_failure(f"wipe {operation_name}", e)
- 
-         # Calculate summary
-         successful_ops = sum(1 for success in operations_results.values() if success)
-         total_ops = len(operations_results)
- 
-         if successful_ops == total_ops:
-             log_operation_success("wipe entire Argilla space", "All components deleted successfully")
-             return True
-         else:
-             failed_ops = [name for name, success in operations_results.items() if not success]
-             log_operation_failure("wipe entire Argilla space",
-                 Exception(f"{total_ops - successful_ops}/{total_ops} operations failed: {', '.join(failed_ops)}"))
-             # Return True if at least some operations succeeded
-             return successful_ops > 0
- 
-     except Exception as e:
-         log_operation_failure("wipe entire Argilla space", e)
-         return False
- 
- 
- def wipe_datasets_only(
- ) -> bool:
-     """Wipe only datasets, keeping users and workspaces."""
-     try:
-         success = delete_datasets()
- 
-         if success:
-             log_operation_success("wipe datasets only", "All datasets deleted successfully")
-         else:
-             log_operation_failure("wipe datasets only", Exception("Some datasets could not be deleted"))
- 
-         return success
- 
-     except Exception as e:
-         log_operation_failure("wipe datasets only", e)
-         return False
- 
- 
- def wipe_users_only(
- ) -> bool:
-     """Wipe only users, keeping datasets and workspaces."""
-     try:
-         success = delete_users()
- 
-         if success:
-             log_operation_success("wipe users only", "All users deleted successfully")
-         else:
-             log_operation_failure("wipe users only", Exception("Some users could not be deleted"))
- 
-         return success
- 
-     except Exception as e:
-         log_operation_failure("wipe users only", e)
-         return False
- 
- 
- def wipe_webhooks_only(
- ) -> bool:
-     """Wipe only webhooks, keeping everything else."""
-     try:
-         success = delete_webhooks()
- 
-         if success:
-             log_operation_success("wipe webhooks only", "All webhooks deleted successfully")
-         else:
-             log_operation_failure("wipe webhooks only", Exception("Some webhooks could not be deleted"))
- 
-         return success
- 
-     except Exception as e:
-         log_operation_failure("wipe webhooks only", e)
-         return False
- 
- 
- def get_status(
- ) -> dict:
-     """Get current status of the Argilla space (counts of datasets, users, etc.)."""
-     try:
-         global _client
- 
-         # Count datasets across all workspaces
-         total_datasets = 0
-         total_records = 0
-         for workspace in _client.workspaces:
-             workspace_datasets = workspace.datasets
-             total_datasets += len(workspace_datasets)
- 
-             for dataset in workspace_datasets:
-                 try:
-                     records = list(dataset.records)
-                     total_records += len(records)
-                 except Exception:
-                     # Skip if can't access records
-                     pass
- 
-         status = {
-             'workspaces': len(_client.workspaces),
-             'users': len(_client.users),
-             'datasets': total_datasets,
-             'records': total_records,
-             'webhooks': len(_client.webhooks)
-         }
- 
-         log_operation_success("get space status", f"Status retrieved: {status}")
-         return status
- 
-     except Exception as e:
-         log_operation_failure("get space status", e)
-         return {
-             'workspaces': 0,
-             'users': 0,
-             'datasets': 0,
-             'records': 0,
-             'webhooks': 0,
-             'error': str(e)
-         }
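The wipe order matters: datasets are removed before workspaces, since `delete_workspace` refuses to drop a workspace that still has datasets linked. A sketch (hypothetical; these calls are destructive):

```python
# Hypothetical and destructive -- check get_status() before and after.
from utils.wipe_utils import get_status, wipe_datasets_only, wipe_space

print(get_status())     # {'workspaces': ..., 'users': ..., 'datasets': ..., ...}
wipe_datasets_only()    # drop datasets but keep users, workspaces, webhooks
# wipe_space()          # or drop everything: datasets, webhooks, users, workspaces
```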
 
utils/workspace_utils.py DELETED
@@ -1,387 +0,0 @@
- """
- workspace_utils.py
- 
- Helper functions for workspace management in the MERe Workshop annotation pipeline.
- Handles workspace creation, deletion, user assignment, and management operations.
- """
- 
- from typing import Dict, List, Optional
- import warnings
- 
- import argilla as rg
- 
- from .setup_utils import (
-     get_client,
-     load_users
- )
- from .log_utils import (
-     log_operation_success,
-     log_operation_failure,
-     log_user_operation
- )
- 
- 
- # Setup client
- _client = get_client()
- 
- 
- def create_workspace(
-     workspace_name: str,
- ) -> bool:
-     """Create a single workspace."""
-     global _client
- 
-     # Check if workspace already exists
-     try:
-         with warnings.catch_warnings():
-             warnings.simplefilter("ignore")
-             existing_workspace = _client.workspaces(workspace_name)
-             if existing_workspace:
-                 log_operation_success("create workspace", f"{workspace_name} (already exists)")
-                 return True
-     except Exception:
-         # Workspace doesn't exist, continue with creation
-         pass
- 
-     try:
-         workspace = rg.Workspace(name=workspace_name)
-         workspace.create()
-         log_operation_success("create workspace", workspace_name)
-         return True
- 
-     except Exception as e:
-         # Check if workspace already exists
-         error_str = str(e).lower()
-         if "conflict" in error_str or "already exists" in error_str or "not unique" in error_str:
-             log_operation_success("create workspace", f"{workspace_name} (already exists)")
-             return True
-         else:
-             log_operation_failure("create workspace", e)
-             return False
- 
- 
- def create_workspaces(
-     workspace_names: List[str]
- ) -> bool:
-     """Create multiple workspaces from a list of workspace names."""
-     global _client
- 
-     success_count = 0
-     for workspace_name in workspace_names:
-         if create_workspace(workspace_name):
-             success_count += 1
- 
-     log_operation_success("create workspaces",
-         f"Created {success_count}/{len(workspace_names)} workspaces")
- 
-     return success_count == len(workspace_names)
- 
- 
- def create_user_workspace(
-     username: str,
-     workspace_name: str
- ) -> bool:
-     """Add a user to a specific workspace."""
-     global _client
- 
- 
-     try:
-         # Find user
-         user = None
-         for u in _client.users:
-             if u.username == username:
-                 user = u
-                 break
- 
-         if not user:
-             log_operation_failure("add user to workspace", Exception(f"User {username} not found"))
-             return False
- 
-         # Find workspace
-         with warnings.catch_warnings():
-             warnings.simplefilter("ignore")
-             workspace = _client.workspaces(workspace_name)
-         if not workspace:
-             log_operation_failure("add user to workspace", Exception(f"Workspace {workspace_name} not found"))
-             return False
- 
-         # Check if user is already in workspace
-         try:
-             workspace_users = list(workspace.users)
-             for existing_user in workspace_users:
-                 if existing_user.username == username:
-                     log_user_operation("added to workspace", username, f"{workspace_name} (already assigned)")
-                     return True
-         except Exception:
-             # Continue if check fails
-             pass
- 
-         # Add user to workspace
-         workspace.add_user(user)  # type: ignore
-         log_user_operation("added to workspace", username, workspace_name)
- 
-         return True
- 
-     except Exception as e:
-         # Check if user already in workspace
-         error_str = str(e).lower()
-         if "conflict" in error_str or "already" in error_str:
-             log_user_operation("added to workspace", username, f"{workspace_name} (already assigned)")
-             return True
-         else:
-             log_operation_failure("add user to workspace", e)
-             return False
- 
- 
- def create_user_workspaces(
-     user_workspace_map: Optional[Dict[str, List[str]]] = None
- ) -> bool:
-     """Create workspaces for users based on mapping or CSV data."""
- 
-     if user_workspace_map is None:
-         # Load from CSV and create user workspaces based on usernames
-         users = load_users()
-         if not users:
-             log_operation_failure("create user workspaces", Exception("No users found in CSV"))
-             return False
- 
-         success_count = 0
-         total_count = 0
- 
-         for user_data in users:
-             username = user_data['username']
-             # Create workspace with username as workspace name
-             total_count += 1
-             if create_workspace(username):
-                 # Add user to their workspace
-                 if create_user_workspace(username, username):
-                     success_count += 1
- 
-         log_operation_success("create user workspaces from CSV",
-             f"Created {success_count}/{total_count} user workspaces")
- 
-         return success_count == total_count
-     else:
-         # Use provided mapping
-         success_count = 0
-         total_count = 0
- 
-         for username, workspace_names in user_workspace_map.items():
-             for workspace_name in workspace_names:
-                 total_count += 1
-                 if create_user_workspace(username, workspace_name):
-                     success_count += 1
- 
-         log_operation_success("create user workspaces from mapping",
-             f"Added users to {success_count}/{total_count} workspaces")
- 
-         return success_count == total_count
- 
- 
- def delete_workspace(
-     workspace_name: str, client: Optional[rg.Argilla] = None
- ) -> bool:
-     """Delete a single workspace."""
-     global _client
- 
-     try:
-         with warnings.catch_warnings():
-             warnings.simplefilter("ignore")
-             workspace = _client.workspaces(workspace_name)
-         if not workspace:
-             log_operation_failure("delete workspace", Exception(f"Workspace {workspace_name} not found"))
-             return False
- 
-         # Check for remaining datasets first
-         try:
-             datasets = list(workspace.datasets)
-             if datasets:
-                 dataset_names = [ds.name for ds in datasets if ds.name]
-                 log_operation_failure("delete workspace",
-                     Exception(f"Workspace {workspace_name} still has datasets: {', '.join(dataset_names)}. Delete datasets first."))
-                 return False
-         except Exception as e:
-             # If we can't check datasets, try to continue
-             log_operation_failure("check workspace datasets", e)
- 
-         # Remove all users from workspace first
-         try:
-             workspace_users = list(workspace.users)
-             for user in workspace_users:
-                 try:
-                     workspace.remove_user(user)
-                     log_user_operation("removed from workspace", user.username or f"User-{user.id}", workspace_name)
-                 except Exception as e:
-                     log_operation_failure("remove user from workspace", e)
-         except Exception as e:
-             # Continue if user removal fails
-             log_operation_failure("remove users from workspace", e)
- 
-         # Delete the workspace
-         workspace.delete()
-         log_operation_success("delete workspace", workspace_name)
- 
-         return True
- 
-     except Exception as e:
-         # Check if it's a dependency error
-         error_str = str(e).lower()
-         if "has some datasets linked" in error_str or "dependency" in error_str:
-             log_operation_failure("delete workspace",
-                 Exception(f"Workspace {workspace_name} cannot be deleted due to remaining dependencies"))
-         else:
-             log_operation_failure("delete workspace", e)
-         return False
- 
- 
- def delete_workspaces(
-     workspace_names: Optional[List[str]] = None
- ) -> bool:
-     """Delete multiple workspaces or all workspaces if none specified."""
-     global _client
-     if workspace_names is None:
-         # Delete all workspaces
-         workspaces = _client.workspaces
-         workspace_names = [ws.name for ws in workspaces if ws.name]
- 
-     success_count = 0
-     for workspace_name in workspace_names:
-         if delete_workspace(workspace_name):
-             success_count += 1
- 
-     log_operation_success("delete workspaces",
-         f"Deleted {success_count}/{len(workspace_names)} workspaces")
- 
-     return success_count == len(workspace_names)
- 
- 
- def delete_user_workspace(
-     username: str,
-     workspace_name: str,
-     delete_if_empty: bool = True
- ) -> bool:
-     """Remove a user from a workspace and optionally delete workspace if empty."""
-     global _client
- 
-     try:
-         # Find user
-         user = None
-         for u in _client.users:
-             if u.username == username:
-                 user = u
-                 break
- 
-         if not user:
-             log_operation_failure("remove user from workspace", Exception(f"User {username} not found"))
-             return False
- 
-         # Find workspace
-         with warnings.catch_warnings():
-             warnings.simplefilter("ignore")
-             workspace = _client.workspaces(workspace_name)
-         if not workspace:
-             log_operation_failure("remove user from workspace", Exception(f"Workspace {workspace_name} not found"))
-             return False
- 
-         # Remove user from workspace
-         workspace.remove_user(user)
-         log_user_operation("removed from workspace", username, workspace_name)
- 
-         # Check if workspace is empty and delete if requested
-         if delete_if_empty:
-             remaining_users = workspace.users
-             if not remaining_users:
-                 workspace.delete()
-                 log_operation_success("delete empty workspace", workspace_name)
-             else:
-                 log_operation_success("workspace not empty", f"{workspace_name} still has {len(remaining_users)} users")
- 
-         return True
- 
-     except Exception as e:
-         log_operation_failure("remove user from workspace", e)
-         return False
- 
- 
- def delete_user_workspaces(usernames: List[str]) -> bool:
-     """Remove users from all their workspaces and delete empty workspaces."""
- 
-     success_count = 0
-     for username in usernames:
-         user_workspaces = list_user_workspaces(username)
-         user_success = True
- 
-         for workspace_name in user_workspaces:
-             if not delete_user_workspace(username, workspace_name, delete_if_empty=True):
-                 user_success = False
- 
-         if user_success:
-             success_count += 1
- 
-     log_operation_success("delete user workspaces",
-         f"Processed {success_count}/{len(usernames)} users")
- 
-     return success_count == len(usernames)
- 
- 
- def list_workspaces(
- ) -> List[Dict[str, str]]:
-     """List all workspaces with their details."""
-     global _client
- 
-     try:
-         workspaces = _client.workspaces
-         workspace_list = []
- 
-         for workspace in workspaces:
-             workspace_info = {
-                 'name': workspace.name or '',
-                 'id': str(workspace.id) if workspace.id else '',
-                 'user_count': str(len(workspace.users))
-             }
-             workspace_list.append(workspace_info)
- 
-         log_operation_success("list workspaces", f"Found {len(workspace_list)} workspaces")
-         return workspace_list
- 
-     except Exception as e:
-         log_operation_failure("list workspaces", e)
-         return []
- 
- 
- def list_user_workspaces(
-     username: str,
- ) -> List[str]:
-     """Get list of workspaces a user has access to."""
-     global _client
- 
-     try:
-         # Find user
-         user = None
-         for u in _client.users:
-             if u.username == username:
-                 user = u
-                 break
- 
-         if not user:
-             log_operation_failure("get user workspaces", Exception(f"User {username} not found"))
-             return []
- 
-         # Get workspaces the user has access to
-         workspaces = []
-         for workspace in _client.workspaces:
-             try:
-                 # Check if user has access to workspace
-                 workspace_users = workspace.users
-                 if any(wu.id == user.id for wu in workspace_users):
-                     workspaces.append(workspace.name or '')
-             except Exception:
-                 # Skip workspaces we can't access
-                 continue
- 
-         log_user_operation("listed workspaces", username, f"Found {len(workspaces)} workspaces")
-         return workspaces
- 
-     except Exception as e:
-         log_operation_failure("get user workspaces", e)
-         return []
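With no mapping supplied, `create_user_workspaces` gives each CSV user a private workspace named after their username. A sketch of both modes (hypothetical; the names are illustrative):

```python
# Hypothetical usage of the removed workspace helpers.
from utils.workspace_utils import create_user_workspaces, list_user_workspaces

create_user_workspaces()                        # one private workspace per users.csv row
create_user_workspaces({"alice": ["shared"]})   # or an explicit user -> workspaces map
print(list_user_workspaces("alice"))            # e.g. ['alice', 'shared']
```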
 
wipe.py DELETED
@@ -1,165 +0,0 @@
- #!/usr/bin/env python3
- 
- """
- wipe.py
- 
- Clean wipe script for the MERe Workshop annotation pipeline.
- Removes users, workspaces, datasets, and webhooks using modular helper functions.
- """
- 
- import sys
- import argparse
- from pathlib import Path
- 
- from utils import (
-     validate_env,
-     log_operation_success,
-     log_operation_failure,
-     wipe_space,
-     wipe_datasets_only,
-     wipe_users_only,
-     wipe_webhooks_only,
-     get_status,
-     log_info,
-     log_warning
- )
- 
- 
- def parse_args():
-     """Parse command line arguments."""
-     parser = argparse.ArgumentParser(
-         description="Wipe MERe Workshop Argilla space",
-         formatter_class=argparse.ArgumentDefaultsHelpFormatter,
-     )
- 
-     parser.add_argument(
-         "-d", "--datasets-only",
-         action="store_true",
-         help="Only wipe datasets, keep users and workspaces",
-     )
- 
-     parser.add_argument(
-         "-u", "--users-only",
-         action="store_true",
-         help="Only wipe users, keep datasets and workspaces",
-     )
- 
-     parser.add_argument(
-         "-w", "--webhooks-only",
-         action="store_true",
-         help="Only wipe webhooks, keep everything else",
-     )
- 
-     parser.add_argument(
-         "-s", "--status-only",
-         action="store_true",
-         help="Only show current space status, do not perform wipe",
-     )
- 
-     parser.add_argument("--force", action="store_true", help="Skip confirmation prompt")
- 
-     return parser.parse_args()
- 
- 
- def show_space_status():
-     """Display current space status."""
-     status = get_status()
- 
-     if "error" in status:
-         log_operation_failure("check space status", status["error"])
-         return False
- 
-     print()
-     log_info("=== Current Argilla Space Status ===")
-     log_info(f"Workspaces: {status['workspaces']}")
-     log_info(f"Users: {status['users']}")
-     log_info(f"Datasets: {status['datasets']}")
-     log_info(f"Records: {status['records']}")
-     log_info(f"Webhooks: {status['webhooks']}")
-     print()
- 
-     return True
- 
- 
- def confirm_wipe(
-     operation_description: str,
-     force: bool = False
- ) -> bool:
-     """Confirm wipe operation with user."""
-     if force:
-         return True
- 
-     log_warning(f"WARNING: This will {operation_description}")
-     log_warning("This action cannot be undone!")
- 
-     log_warning("Are you sure you want to proceed? [y/N]:")
-     response = input().strip().lower()
-     return response in ["y", "yes"]
- 
- 
- def main():
-     """Main wipe function."""
-     args = parse_args()
- 
-     # Validate environment
-     try:
-         validate_env()
-         log_operation_success("wipe validation", "Environment validated")
-     except Exception as e:
-         log_operation_failure("wipe validation", e)
-         return 1
- 
-     # Show current status
-     if not show_space_status():
-         return 1
- 
-     # If status-only mode, exit here
-     if args.status_only:
-         return 0
- 
-     # Determine operation and confirmation message
-     if args.datasets_only:
-         operation = "datasets only"
-         confirmation_msg = "delete ALL DATASETS (keeping users and workspaces)"
-         wipe_function = wipe_datasets_only
-     elif args.users_only:
-         operation = "users only"
-         confirmation_msg = (
-             "delete ALL USERS (keeping datasets, workspaces, and webhooks)"
-         )
-         wipe_function = wipe_users_only
-     elif args.webhooks_only:
-         operation = "webhooks only"
-         confirmation_msg = "delete ALL WEBHOOKS (keeping users and datasets)"
-         wipe_function = wipe_webhooks_only
-     else:
-         operation = "entire space"
-         confirmation_msg = "DELETE EVERYTHING (users, workspaces, datasets, webhooks)"
-         wipe_function = wipe_space
- 
-     # Confirm operation
-     if not confirm_wipe(confirmation_msg, args.force):
-         log_info("Wipe operation cancelled")
-         return 0
- 
-     # Perform wipe operation
-     print()
-     log_info(f"Wiping {operation}...")
-     success = wipe_function()
- 
-     if success:
-         log_operation_success(f"wipe {operation}", "Operation completed successfully")
-     else:
-         log_operation_failure(f"wipe {operation}", Exception("Operation failed"))
-         return 1
- 
-     # Show final status
-     if not show_space_status():
-         return 1
- 
-     log_operation_success("Wipe operation completed", send_to_slack=True)
-     return 0
- 
- 
- if __name__ == "__main__":
-     exit(main())
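The script can also be driven programmatically; a sketch (hypothetical, roughly equivalent to `./wipe.sh --status-only`):

```python
# Hypothetical programmatic invocation of the removed CLI.
import sys
import wipe

sys.argv = ["wipe.py", "--status-only"]   # report counts, change nothing
raise SystemExit(wipe.main())
```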
 
wipe.sh DELETED
@@ -1,16 +0,0 @@
- #!/bin/bash
- 
- # wipe.sh
- #
- # Shell wrapper for the MERe Workshop wipe process.
- 
- set -euo pipefail
- 
- # Get the directory where this script is located
- SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &> /dev/null && pwd)"
- 
- # Change to the script directory
- cd "$SCRIPT_DIR"
- 
- # Run the wipe script
- python wipe.py "$@"