Mikolaj Czerkawski

mikonvergence

AI & ML interests

None yet

Recent Activity

updated a Space 21 days ago
ESA-philab/README

Organizations

Gradio-Themes-Party, satellite-image-deep-learning, European Space Agency Φ-lab, Major TOM, Hugging Face Discord Community

mikonvergence's activity

reacted to mkluczek's post with 🚀🔥 2 months ago
First Global and Dense Open Embedding Dataset of Earth! 🌍 🤗

Introducing the Major TOM embeddings dataset, created in collaboration with CloudFerro S.A. 🔶 and Φ-lab at the European Space Agency (ESA) 🛰️. Together with @mikonvergence and Jędrzej S. Bojanowski, we present the first open-access dataset of Copernicus embeddings, offering dense, global coverage across the full acquisition areas of the Sentinel-1 and Sentinel-2 sensors.

💡 Highlights:
📊 Data: Over 8 million Sentinel-1 & Sentinel-2 images processed, distilling insights from 9.368 trillion pixels of raw data.
🧠 Models: Foundation models include SigLIP, DINOv2, and SSL4EO.
📦 Scale: 62 TB of raw satellite data processed into 170M+ embeddings.

This project delivers open and free vectorized expansions of the Major-TOM/README datasets, setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for a wide range of applications.
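Once images are reduced to embedding vectors, a typical lightweight use is similarity search. The sketch below ranks vectors by cosine similarity in plain Python; the toy 3-dimensional vectors and the cell IDs are hypothetical stand-ins and do not reflect the actual schema or dimensionality of the Major TOM embedding datasets.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, embeddings, k=3):
    """Return the k (cell_id, score) pairs most similar to the query vector."""
    scored = [(cell_id, cosine(query, vec)) for cell_id, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy embeddings keyed by made-up cell IDs (not real Major TOM data).
embeddings = {
    "cell_A": [0.9, 0.1, 0.0],
    "cell_B": [0.0, 1.0, 0.0],
    "cell_C": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], embeddings, k=2))
```

A brute-force scan like this is only meant to show the idea; at the scale of 170M+ vectors one would normally reach for an approximate nearest-neighbour index instead.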

🤗 Explore the datasets:
Major-TOM/Core-S2L1C-SSL4EO
Major-TOM/Core-S1RTC-SSL4EO
Major-TOM/Core-S2RGB-DINOv2
Major-TOM/Core-S2RGB-SigLIP

📖 Check paper: Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space (2412.05600)
💻 Code notebook: https://github.com/ESA-PhiLab/Major-TOM/blob/main/05-Generate-Major-TOM-Embeddings.ipynb
posted an update 6 months ago
New Release: Major TOM Digital Elevation Model Expansion 🗺️

Dataset: Major-TOM/Core-DEM

Today, together with the European Space Agency (ESA) and Adobe Research, we are releasing a global expansion of Major TOM with GLO-30 DEM data.

You can now instantly access nearly 2M Major TOM samples with elevation data to build your next AI model for EO. 🌍

🔍 Browse the data in our usual viewer app: Major-TOM/MajorTOM-Core-Viewer

Fantastic work championed by Paul Borne--Pons @NewtNewt 🚀
posted an update 11 months ago
Major TOM: Planet Earth is b̶l̶u̶e̶ 5.405 GHz

🚨 EXPANSION RELEASE: Sentinel-1 is now available in MajorTOM-Core!
Major-TOM/Core-S1RTC

🏎 Together with @aliFrancis, we've been racing to release the first official expansion of the Major TOM project.

MajorTOM-Core-S1RTC contains 1,469,955 SAR images paired with Sentinel-2 images from Core-S2.

🔍 We cover more than 65% of the optical (Core-S2) coverage, with an average time shift of 7 days between paired acquisitions.

16 TB of radiometrically calibrated SAR imagery, available in the exact same format as the existing Major-TOM data.
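The 65% figure can be sanity-checked from the two sample counts quoted in these posts, assuming one SAR image per Core-S2 cell. The second half of the sketch illustrates the idea of temporal pairing by picking the acquisition with the smallest time shift; the helper name and the dates are hypothetical, and this is not presented as the actual procedure used to build Core-S1RTC.

```python
from datetime import date

# Counts quoted in the Core-S1RTC and Core-S2 announcements.
S1_PAIRED = 1_469_955
S2_CORE = 2_245_886
coverage = S1_PAIRED / S2_CORE
print(f"S1 pairs cover {coverage:.1%} of Core-S2 cells")

def closest_acquisition(target, candidates):
    """Return the candidate date with the smallest absolute shift (in days) from target."""
    return min(candidates, key=lambda d: abs((d - target).days))

# Hypothetical acquisition dates for a single grid cell.
s2_date = date(2020, 6, 15)
s1_dates = [date(2020, 6, 1), date(2020, 6, 19), date(2020, 7, 2)]
best = closest_acquisition(s2_date, s1_dates)
print(best, abs((best - s2_date).days), "days apart")
```

Averaging these per-cell shifts over all pairs is what a figure like "an average time shift of 7 days" summarizes.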

🗺️ You can explore instantly in our viewing app:
Major-TOM/MajorTOM-Core-Viewer

So, what now?

🧱 Community Growth: our community continues to grow! To coordinate upcoming expansions and use cases of the open data, we will organise a meetup on 23 April; you can register your interest here: https://forms.gle/eBj8JvibJx9b6PLf9

🚂 Open Data for Open Models: the Major-TOM Core dataset currently supports several strands of ongoing research within and outside our lab, and we are looking forward to releasing models that take advantage of that data! https://huggingface.co/Major-TOM

📌 Poster at IGARSS: We will present the Major TOM project as a poster at IGARSS in Athens (July). Come talk to us if you're there! You can access the paper here: Major TOM: Expandable Datasets for Earth Observation (2402.12095)


🌌 Developed at the European Space Agency Φ-lab in partnership with Hugging Face
reacted to osanseviero's post with 🔥 11 months ago
Diaries of Open Source. Part 5!

🤯 Contextual KTO Mistral PairRM: this model combines iterative KTO, the SnorkelAI DPO dataset, AllenAI's PairRM for ranking, and Mistral as the base model. It is a very strong model, with Claude 3-level quality on AlpacaEval 2.0.
Final model: ContextualAI/Contextual_KTO_Mistral_PairRM
Dataset: snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
Leaderboard: https://tatsu-lab.github.io/alpaca_eval/
Base model: mistralai/Mistral-7B-Instruct-v0.2

🤏 tinyBenchmarks: quick and cheap LLM evaluation!
Code: https://github.com/felipemaiapolo/tinyBenchmarks
Paper: tinyBenchmarks: evaluating LLMs with fewer examples (2402.14992)
Data: tinyBenchmarks/tinyMMLU

🎨 Transformers.js 2.16 includes StableLM, speaker verification and diarization, and better chat templating. Try some fun demos!
- Xenova/video-object-detection
- Xenova/cross-encoder-web
- Xenova/the-tokenizer-playground

๐Ÿดโ€โ˜ ๏ธ Abascus Liberated-Qwen1.5-72B, a Qwen 72B-based model that strongly follows system prompts
Model: abacusai/Liberated-Qwen1.5-72B

👀 Design2Code: a benchmark of webpage screenshots to code
Data: SALT-NLP/Design2Code
Project: https://salt-nlp.github.io/Design2Code/
Paper: Design2Code: How Far Are We From Automating Front-End Engineering? (2403.03163)

🌎 Data and models around the world
- One of the biggest Italian datasets: https://hf.co/datasets/manalog/UsenetArchiveIT
- IndicLLMSuite: the largest pre-training and instruction fine-tuning dataset collection across 22 Indic languages: ai4bharat/indicllmsuite-65ee7d225c337fcfa0991707
- Hebrew-Gemma-11B, the best base Hebrew model: yam-peleg/Hebrew-Gemma-11B
- Komodo-7B, a family of LLMs for multiple Indonesian languages: Yellow-AI-NLP/komodo-7b-base

You can find the previous part at https://huggingface.co/posts/osanseviero/127895284909100
reacted to osanseviero's post with 👍 12 months ago
Diaries of Open Source. Part 3! OS goes to the moon!

💻 OpenCodeInterpreter, a family of very powerful code generation models
Models: m-a-p/opencodeinterpreter-65d312f6f88da990a64da456
Paper: OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement (2402.14658)
Demo: m-a-p/OpenCodeInterpreter_demo

🔷🔶 Zephyr 7B Gemma, Gemma fine-tuned with the Zephyr recipe
Model: HuggingFaceH4/zephyr-7b-gemma-v0.1
Demo: HuggingFaceH4/zephyr-7b-gemma-chat
GH Repo: https://github.com/huggingface/alignment-handbook

🪆 The MixedBread folks released a 2D Matryoshka text embedding model, which means you can dynamically change both the embedding size and the number of layers used.
Model: mixedbread-ai/mxbai-embed-2d-large-v1
Release blog post: https://www.mixedbread.ai/blog/mxbai-embed-2d-large-v1
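The "embedding size" half of the Matryoshka idea can be illustrated in a few lines: a model trained this way front-loads information into the leading dimensions, so a shorter embedding is obtained by keeping the first `dim` components and re-normalizing. This is a generic sketch with a hypothetical helper name, not the mixedbread API, and it does not cover the layer-count dimension of the 2D variant.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale the result to unit L2 norm."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Hypothetical 4-dimensional embedding; real models use hundreds of dimensions.
full = [3.0, 4.0, 12.0, 0.5]
short = truncate_and_normalize(full, 2)
print(short)  # [0.6, 0.8]
```

Re-normalizing after truncation keeps cosine similarity and dot-product scores comparable across different truncation sizes.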

🐋 Microsoft released Orca Math, which includes 200K grade-school math word problems
Dataset: microsoft/orca-math-word-problems-200k

🥷 IBM quietly released Merlinite, a cool model trained on Mixtral-generated synthetic data using a novel LAB method: https://huggingface.co/ibm/merlinite-7b

🌚 Moondream2, a small vision language model that runs on-device!
Model: vikhyatk/moondream2
Demo: vikhyatk/moondream2

🏙️ CityDreamer: 3D City Generation
Demo: hzxie/city-dreamer
Repo: https://github.com/hzxie/city-dreamer
Model: hzxie/city-dreamer

🌍 ML in all languages
Sailor, a family of language models for South-East Asian languages: sail/sailor-language-models-65e19a749f978976f1959825
Samvaad dataset, which includes 140k QA pairs in Hindi, Bengali, Marathi, Tamil, Telugu, Oriya, Punjabi, and Gujarati: GenVRadmin/Samvaad-Mixed-Language-2

You can see the previous part at https://huggingface.co/posts/osanseviero/674644082063278
replied to robmarkcole's post 12 months ago
reacted to robmarkcole's post with 🤗 12 months ago
reacted to osanseviero's post with 👍 12 months ago
Diaries of Open Source. Part 2. Open Source is going brrrrr

🚀 The European Space Agency releases MajorTOM, an Earth observation dataset covering about half of the Earth. The dataset has 2.5 trillion pixels! Congrats @aliFrancis and @mikonvergence!
Dataset: Major-TOM/Core-S2L2A
Viewer: Major-TOM/MajorTOM-Core-Viewer

🍞 Re-ranking models by MixedBreadAI: very high quality, Apache 2.0 license, and easy to use!
Models: https://huggingface.co/models?other=reranker&sort=trending&search=mixedbread-ai
Blog: https://www.mixedbread.ai/blog/mxbai-rerank-v1

🧊 StabilityAI and TripoAI release TripoSR, a super-fast MIT-licensed image-to-3D model!
Model: stabilityai/TripoSR
Demo: stabilityai/TripoSR

🤝 Together AI and HazyResearch release Based
Models and datasets: hazyresearch/based-65d77fb76f9c813c8b94339c
GH repo: https://github.com/HazyResearch/based

🌊 LaVague: an open-source pipeline to turn natural language into browser actions! It can run locally with HuggingFaceH4/zephyr-7b-gemma-v0.1
Read more about it at https://huggingface.co/posts/dhuynh95/717319217106504

🏆 Berkeley Function-Calling Leaderboard
Read about it: https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html
Leaderboard: https://gorilla.cs.berkeley.edu/leaderboard.html

🐬 Sailor-Chat: chat models built on top of OpenOrca and the CohereForAI Aya project (@sarahooker). They can be used for South-East Asian languages such as Indonesian, Thai, Vietnamese, Malay, and Lao!
Models: sail/sailor-language-models-65e19a749f978976f1959825
Demo: https://huggingface.co/spaces/sail/Sailor-7B-Chat

🤗 Arabic-OpenHermes-2.5: the OpenHermes dataset translated to Arabic: 2A2I/Arabic-OpenHermes-2.5

See the previous part here https://huggingface.co/posts/osanseviero/622788932781684
reacted to robmarkcole's post with ❤️ 12 months ago
reacted to aliFrancis's post with 🤗 12 months ago
🗺 Major TOM: Expandable Datasets for Earth Observation

🚨 RECORD-BREAKING EO DATASET: the largest ML-ready Sentinel-2 dataset ever! It covers almost every point on Earth captured by the Copernicus Sentinel-2 mission. @mikonvergence and I are thrilled to finally announce the release of Major-TOM/Core-S2L2A and Major-TOM/Core-S2L1C

🌍 About half of the entire planet is covered: 2,245,886 patches of 1068 × 1068 pixels, available at both the L1C and L2A processing levels. At 10 m resolution, that is 256 million km² and over 2.5 trillion pixels. It's all yours with a few lines of code. See the paper linked below 🔽 for more info!
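The headline numbers follow directly from the patch count and size. This short check uses only the figures quoted above (2,245,886 patches, 1068 × 1068 pixels, 10 m pixels) and treats the patches as non-overlapping for the area estimate, which is a simplifying assumption.

```python
PATCHES = 2_245_886  # number of Core-S2 patches
SIDE_PX = 1068       # patch side length in pixels
GSD_M = 10           # ground sampling distance in metres per pixel

pixels = PATCHES * SIDE_PX ** 2
# Each pixel covers GSD_M x GSD_M square metres; 1 km^2 = 1,000,000 m^2.
area_km2 = pixels * GSD_M ** 2 / 1_000_000

print(f"{pixels / 1e12:.2f} trillion pixels over {area_km2 / 1e6:.0f} million km²")
# prints: 2.56 trillion pixels over 256 million km²
```

Since the Earth's total surface area is roughly 510 million km², 256 million km² is indeed about half of the planet.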

🧱 And this is just the beginning. We are currently preparing more datasets from different satellites for the Major TOM org. TOM stands for Terrestrial Observation Metaset: a simple set of rules for building an ecosystem of ML-ready EO datasets that can be seamlessly combined, like Lego bricks.
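The "Lego bricks" property comes from datasets sharing a common spatial grid, so records from different sensors can be matched by cell. The sketch below shows that idea as a plain inner join; the field names (`grid_cell`, `source`) and the cell IDs are hypothetical placeholders, not the actual Major TOM metadata schema or grid-cell naming.

```python
# Hypothetical rows from two datasets built on the same grid; cell IDs are made up.
s2_rows = [
    {"grid_cell": "cell_001", "source": "Sentinel-2"},
    {"grid_cell": "cell_002", "source": "Sentinel-2"},
]
s1_rows = [
    {"grid_cell": "cell_001", "source": "Sentinel-1"},
]

def combine(left, right, key="grid_cell"):
    """Inner-join two lists of records on a shared grid-cell key."""
    index = {row[key]: row for row in right}
    return [(row, index[row[key]]) for row in left if row[key] in index]

pairs = combine(s2_rows, s1_rows)
for s2, s1 in pairs:
    print(s2["grid_cell"], s2["source"], "+", s1["source"])
```

Cells present in only one dataset simply drop out of the join, which mirrors how an expansion can cover only part of the existing footprint.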

🚴‍♀️ Want to take the dataset for a spin? We have a viewer app on Spaces that lets you go anywhere on Earth and shows you the data, if it's available: Major-TOM/MajorTOM-Core-Viewer

📰 Preprint paper: Major TOM: Expandable Datasets for Earth Observation (2402.12095)
💻 Colab example: https://colab.research.google.com/github/ESA-PhiLab/Major-TOM/blob/main/03-Filtering-in-Colab.ipynb

Thank you to the amazing 🤗 Hugging Face team for the support on this one! @osanseviero @lhoestq @BrigitteTousi