egrace479 committed
Commit 040d081 (verified) · 1 Parent(s): 1e93828

Add metadata file for sample image retrieval and description of it

Files changed (3)
  1. README.md +14 -1
  2. app.py +1 -7
  3. components/metadata.parquet +3 -0
README.md CHANGED
@@ -16,6 +16,19 @@ datasets:
 
  This app is modified from the original [BioCLIP Demo](https://huggingface.co/spaces/imageomics/bioclip-demo) to run inference with [BioCLIP 2](https://huggingface.co/imageomics/bioclip-2) and uses [pybioclip](https://github.com/Imageomics/pybioclip).
 
- Due to space persistent storage limitations, embeddings are fetched from the [TreeOfLife-200M repo](https://huggingface.co/datasets/imageomics/TreeOfLife-200M) and metadata for the images comes from [demo-data](https://huggingface.co/datasets/imageomics/demo-data) (a private Institute dataset repo). The images will be retrieved from an S3 bucket, as with the original BioCLIP demo.
+ Due to space persistent storage limitations, embeddings are fetched from the [TreeOfLife-200M repo](https://huggingface.co/datasets/imageomics/TreeOfLife-200M). The images will be retrieved from an S3 bucket, as with the original BioCLIP demo, using the metadata described below.
 
  Note that if this space is duplicated, the sample image portion **will not work**.
+
+ **components/metadata.parquet:** metadata file for fetching [TreeOfLife-200M](https://huggingface.co/datasets/imageomics/TreeOfLife-200M) sample images (up to 3 available per taxon) from an S3 bucket.
+ - `uuid`: unique identifier for the image within the TreeOfLife-200M dataset.
+ - `eol_page_id`: identifier of the EOL page for the most specific taxon of the image (where available). Note that an image's association with a particular page ID may change with updates to the EOL (or image provider's) hierarchy; however, EOL taxon page IDs are stable. "https://eol.org/pages/" + `eol_page_id` links to the page.
+ - `gbif_id`: identifier used by GBIF for the most specific taxon of the image (where available). "https://gbif.org/species/" + `gbif_id` links to the page.
+ - `kingdom`: kingdom to which the subject of the image belongs (all `Animalia`).
+ - `phylum`: phylum to which the subject of the image belongs.
+ - `class`: class to which the subject of the image belongs.
+ - `order`: order to which the subject of the image belongs.
+ - `family`: family to which the subject of the image belongs.
+ - `genus`: genus to which the subject of the image belongs.
+ - `species`: species to which the subject of the image belongs.
+ - `file_path`: image filepath for fetching the image from the S3 bucket (`<folder>/<uuid>.jpg`, where the folder is the first two characters of the `uuid`).
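
For orientation, here is a minimal sketch of how the fields described above could be combined to build an image URL and taxon-page links. It assumes the metadata file is already at `components/metadata.parquet`; the S3 bucket URL is a placeholder (the real bucket is configured in the Space), and `build_links` is a hypothetical helper, not part of the app:

```python
import polars as pl

METADATA_PATH = "components/metadata.parquet"
S3_BUCKET_URL = "https://<bucket>.s3.amazonaws.com"  # placeholder; the actual bucket is configured in the Space

# Load the metadata and cast the page IDs to integers, as app.py does.
df = pl.read_parquet(METADATA_PATH, low_memory=False)
df = df.with_columns(pl.col(["eol_page_id", "gbif_id"]).cast(pl.Int64))

def build_links(row: dict) -> dict:
    """Assemble the image URL and taxon-page links for one metadata row (hypothetical helper)."""
    return {
        # `file_path` is `<folder>/<uuid>.jpg`, where the folder is the first two characters of the uuid.
        "image_url": f"{S3_BUCKET_URL}/{row['file_path']}",
        "eol_page": f"https://eol.org/pages/{row['eol_page_id']}" if row["eol_page_id"] is not None else None,
        "gbif_page": f"https://gbif.org/species/{row['gbif_id']}" if row["gbif_id"] is not None else None,
    }

# Example: links for the first sample image of a given genus (value chosen for illustration only).
sample = df.filter(pl.col("genus") == "Danaus").head(1).to_dicts()
if sample:
    print(build_links(sample[0]))
```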
app.py CHANGED
@@ -14,7 +14,6 @@ from torchvision import transforms
 
  from components.query import get_sample
  from bioclip import CustomLabelsClassifier
- from huggingface_hub import hf_hub_download
 
  log_format = "[%(asctime)s] [%(levelname)s] [%(name)s] %(message)s"
  logging.basicConfig(level=logging.INFO, format=log_format)
@@ -23,12 +22,7 @@ logger = logging.getLogger()
  hf_token = os.getenv("HF_TOKEN")
 
  # For sample images
- hf_hub_download(repo_id="imageomics/demo-data",
-                 filename="bioclip-2/metadata.parquet",
-                 repo_type="dataset",
-                 local_dir = "components",
-                 token = hf_token)
- METADATA_PATH = "components/bioclip-2/metadata.parquet"
+ METADATA_PATH = "components/metadata.parquet"
  # Read page IDs as int
  metadata_df = pl.read_parquet(METADATA_PATH, low_memory = False)
  metadata_df = metadata_df.with_columns(pl.col(["eol_page_id", "gbif_id"]).cast(pl.Int64))
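
The removed `hf_hub_download` call above shows the pattern the app previously used to pull the metadata from a Hub dataset repo; the README notes that the embeddings themselves are still fetched from the TreeOfLife-200M repo. A minimal sketch of that pattern, with a placeholder filename (the actual embeddings path is not specified in this commit):

```python
from huggingface_hub import hf_hub_download

# Sketch only: fetch a file from the TreeOfLife-200M dataset repo at startup,
# mirroring the hf_hub_download pattern removed from app.py above.
# The filename is a placeholder; the real embeddings path is not given in this commit.
embeddings_path = hf_hub_download(
    repo_id="imageomics/TreeOfLife-200M",
    filename="<path/to/embeddings-file>",  # placeholder
    repo_type="dataset",
)
print(embeddings_path)  # local path to the cached download
```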
components/metadata.parquet ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6af05f1f8f08b0d447b9a4c18680c7de39551a05318f026d30c224a9bbe5283e
+ size 121162891