Sadem-12 committed on
Commit de409a0 · verified · 1 Parent(s): 4751fac

Update README.md

  1. README.md +59 -46
README.md CHANGED
@@ -10,82 +10,95 @@ pinned: false
10
  license: apache-2.0
11
  short_description: 'Fainal Project , Part 2 '
12
  ---
 
13
 
14
- # AI Study Summary
 
 
 
15
 
16
- This project provides a simple and interactive tool for text summarization in Arabic and English using the mT5 multilingual model. The tool leverages the Hugging Face `transformers` library and Gradio to create an easy-to-use web interface. The goal is to allow users to input a piece of text in either Arabic or English and receive a concise summary of it.
17
 
18
  ## Project Objectives
 
 
 
 
19
 
20
- The objective of this project is to demonstrate how text summarization can be achieved in both Arabic and English using a single multilingual model. It aims to:
21
- - Provide an easy way for users to summarize text in Arabic and English.
22
- - Allow users to interact with the model via a Gradio interface.
23
- - Showcase the capabilities of the mT5 multilingual model for text summarization.
24
 
25
  ## Implemented Pipelines
26
 
27
- The main pipeline of this application is as follows:
28
 
29
- 1. **Text Input**: The user inputs a piece of text into a textbox and selects the language of the text (either Arabic or English) using a dropdown menu.
30
-
31
- 2. **Text Tokenization**: The input text is tokenized using a pre-trained tokenizer from the Hugging Face model `csebuetnlp/mT5_multilingual_XLSum`.
32
-
33
- 3. **Text Summarization**: The tokenized text is passed through the mT5 model, which generates a summary using beam search and length penalties to ensure concise and accurate summarization.
34
 
35
- 4. **Output**: The model outputs a summarized version of the input text, which is displayed to the user on the interface.
 
 
 
 
36
 
37
  ## How to Use the Interface
38
- 1. Go to the [AI Study Summary Space](https://huggingface.co/spaces/your-username/ai-study-summary).
39
- 2. In the textbox, enter the text you would like to summarize.
40
- 3. Choose the language of the text from the dropdown menu (Arabic or English).
41
- 4. Click "Submit" to generate the summary.
42
- 5. Optionally, you can try pre-defined example texts by selecting them from the examples section.
 
 
 
 
43
 
44
  ## Model and Pipeline Choices
45
 
46
- - **Model Choice**:
47
- The model used in this application is `csebuetnlp/mT5_multilingual_XLSum`. It is based on the mT5 (multilingual T5) architecture and is specifically fine-tuned for text summarization tasks in multiple languages, including both Arabic and English. The mT5 model is well-suited for summarization tasks as it is designed for multilingual and cross-lingual understanding, making it a perfect fit for this project.
48
 
49
- - **Pipeline Design**:
50
- The pipeline was chosen to ensure efficient and effective summarization while keeping the process straightforward and easy for users. The use of beam search ensures that the model generates high-quality summaries, while the length penalties ensure that the summaries are concise and not overly verbose.
 
 
 
51
 
52
  ## Bilingual Implementation
53
 
54
- This application is designed to handle both Arabic and English text inputs. The mT5 model supports multiple languages, including Arabic and English, allowing for bilingual text summarization. The following points explain how the bilingual implementation is addressed:
55
- - The model itself, mT5, is pre-trained and fine-tuned on a multilingual corpus, allowing it to handle both Arabic and English with high accuracy.
56
- - The user is prompted to select the language of the input text (either Arabic or English) before generating the summary, ensuring that the correct language model is applied to the input text.
 
57
 
58
- If the application were to be extended to more languages in the future, the process would remain the same—only the language selection and model fine-tuning would need to be adjusted.
59
 
60
  ## Requirements
61
 
62
  To run this project locally, you need Python 3.7 or higher. You also need to install the following Python libraries:
63
 
64
 
65
- `pip install gradio`
66
- `pip install transformers`
67
- `pip install torch`
68
-
 
69
 
70
  ---
71
-
72
- - Gradio: Provides a simple interface for building and sharing machine learning demos.
73
- - Transformers: Hugging Face's library for pre-trained models, in this case, for multilingual text summarization.
74
- - Torch: PyTorch is used as the backend for the model.
 
75
 
76
  ## Example Usage
77
  Input:
78
- - Text: "Artificial intelligence is a branch of computer science that aims to create intelligent machines that work and react like humans."
79
- - Language: English
80
-
81
  Output:
82
- - Summary: "AI is a branch of computer science aimed at creating intelligent machines that mimic human behavior."
83
-
84
- ## License
85
- This project is licensed under the MIT License - see the LICENSE file for details.
86
-
87
- ## Acknowledgements
88
- The summarization model used in this application is based on the mT5 Multilingual XLSum model by Hugging Face and is fine-tuned on the XLSum dataset for summarization in multiple languages.
89
- Gradio is used to build the web interface, making it easy to interact with the model.
90
 
91
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
10
  license: apache-2.0
11
  short_description: 'Fainal Project , Part 2 '
12
  ---
13
+ # AI Study Assistant
14
 
15
+ This project provides an interactive tool that performs multiple tasks related to text analysis. Given a text in either Arabic or English, the tool will:
16
+ - Summarize the text.
17
+ - Generate several questions based on the summary.
18
+ - Create a concept map linking the summary to the generated questions.
19
 
20
+ The interface is powered by Gradio, and the tasks are achieved using advanced pre-trained models from Hugging Face.
21
 
22
  ## Project Objectives
23
+ The goal of this project is to demonstrate a comprehensive tool that can:
24
+ - Summarize text in both Arabic and English.
25
+ - Generate relevant questions based on the summarized text.
26
+ - Create a concept map to visually represent the relationship between the summary and the questions.
27
 
28
+ By providing an easy-to-use interface, the application aims to assist users in analyzing any piece of text efficiently, with the ability to generate summaries, questions, and concept maps for further insights.
 
 
 
29
 
30
  ## Implemented Pipelines
31
 
32
+ The project implements three key pipelines:
33
 
34
+ 1. **Text Summarization**:
35
+ - A pre-trained model (`csebuetnlp/mT5_multilingual_XLSum`) is used to generate a concise summary of the input text.
36
+ - The model tokenizes the input, processes it, and outputs a summary.
 
 
37
 
38
+ 2. **Question Generation**:
39
+ - Using the `valhalla/t5-small-e2e-qg` model, this pipeline generates multiple questions based on the summary of the input text. It uses text-to-text generation for this task, returning several relevant questions to probe deeper into the text.
40
+
41
+ 3. **Concept Map Generation**:
42
+ - The generated summary and questions are then used to create a concept map, visualizing the relationships between the summary and the questions. The map is generated using the `graphviz` library, and the output is displayed as an image.
43
 
44
  ## How to Use the Interface
45
+ 1. Open the [AI Study Assistant Space](https://huggingface.co/spaces/your-username/ai-study-assistant).
46
+ 2. In the input box, enter a piece of text in either **Arabic** or **English**.
47
+ 3. Select the language of the text (Arabic or English) from the dropdown menu.
48
+ 4. Click "Submit" to receive:
49
+ - A **Summary** of the text.
50
+ - A list of **Questions** generated from the summary.
51
+ - A **Concept Map** that visually links the summary and questions.
52
+
53
+ You can also use one of the pre-defined example texts by selecting from the examples section.
54
 
55
  ## Model and Pipeline Choices
56
 
57
+ - **Summarization Model**:
58
+ The model used for text summarization is `csebuetnlp/mT5_multilingual_XLSum`. This model is based on the mT5 (multilingual T5) architecture, specifically fine-tuned for summarization tasks in multiple languages, including both Arabic and English. The mT5 model is capable of handling multilingual input and producing high-quality summaries.
59
 
60
+ - **Question Generation Model**:
61
+ We use the `valhalla/t5-small-e2e-qg` model for question generation. This model is trained to generate meaningful questions from the input text, ensuring that the questions are relevant and informative. It utilizes text-to-text generation for this task, which works effectively for both English and Arabic texts.
62
+
63
+ - **Concept Map Generation**:
64
+ The concept map is generated using the `graphviz` library. This library is widely used for creating visual diagrams and graphs. The concept map helps to visualize the relationship between the summary and the generated questions, making it easier for the user to understand the core ideas and key points.
65
 
66
  ## Bilingual Implementation
67
 
68
+ This project handles **both Arabic and English** text inputs, allowing for bilingual summarization, question generation, and concept map creation. Here's how it's addressed:
69
+ - **Model Choice**: The `csebuetnlp/mT5_multilingual_XLSum` model supports multiple languages, including Arabic and English. This allows the tool to generate accurate summaries in both languages.
70
+ - **Text Preprocessing**: The user is asked to select the language of the input text (Arabic or English), which ensures that the model applies the correct preprocessing steps for tokenization and summarization.
71
+ - **Question Generation**: The `valhalla/t5-small-e2e-qg` model also supports both Arabic and English, ensuring that questions generated from the summary are linguistically appropriate.
72
 
73
+ This bilingual implementation allows users to work with text in either Arabic or English seamlessly, making it suitable for a wide range of users and text types.
74
 
75
  ## Requirements
76
 
77
  To run this project locally, you need Python 3.7 or higher. You also need to install the following Python libraries:
78
 
79
 
80
+ pip install gradio
81
+ pip install transformers
82
+ pip install torch
83
+ pip install graphviz
84
+ pip install pillow
85
 
86
  ---
87
+ Gradio: Provides the interactive interface for the application.
88
+ Transformers: Hugging Face's library for pre-trained models, used for summarization and question generation.
89
+ Torch: PyTorch, the backend used for model inference.
90
+ Graphviz: Used to create and render the concept map.
91
+ Pillow: A Python Imaging Library (PIL) used to handle image output for the concept map.
92
 
93
  ## Example Usage
94
  Input:
95
+ Text: "Artificial intelligence is a branch of computer science that aims to create intelligent machines that work and react like humans."
96
+ Language: English
 
97
  Output:
98
+ Summary: "AI is a branch of computer science aimed at creating intelligent machines that mimic human behavior."
 
 
 
 
 
 
 
99
 
100
+ Questions:
101
+ "What is artificial intelligence?"
102
+ "What is the goal of artificial intelligence?"
103
+ "What is the role of computers in artificial intelligence?"
104
+ Concept Map: A visual map linking the summary to the generated questions.
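
The updated README describes a concept map that links the summary to the generated questions and says it is drawn with `graphviz`. The commit does not show the application code, so the following is only a minimal sketch of how such a map could be described, assuming one summary node with an edge to each question. The function names `dot_escape` and `concept_map_dot` are hypothetical and not taken from the project. The sketch builds Graphviz DOT text with the standard library only, so it runs without Graphviz installed.

```python
from typing import List


def dot_escape(text: str) -> str:
    """Escape backslashes and double quotes for use inside a quoted DOT string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept_map_dot(summary: str, questions: List[str]) -> str:
    """Return DOT source: one box node for the summary, one ellipse per question.

    Assumed layout (not confirmed by the README): every question is linked
    directly to the summary node.
    """
    lines = [
        "digraph concept_map {",
        "    rankdir=LR;",
        f'    summary [shape=box, label="{dot_escape(summary)}"];',
    ]
    for i, question in enumerate(questions, start=1):
        lines.append(f'    q{i} [shape=ellipse, label="{dot_escape(question)}"];')
        lines.append(f"    summary -> q{i};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values copied from the README's "Example Usage" section.
    example_summary = (
        "AI is a branch of computer science aimed at creating "
        "intelligent machines that mimic human behavior."
    )
    example_questions = [
        "What is artificial intelligence?",
        "What is the goal of artificial intelligence?",
        "What is the role of computers in artificial intelligence?",
    ]
    print(concept_map_dot(example_summary, example_questions))
```

The printed DOT text can be saved to a file and rendered to an image with the Graphviz `dot` command-line tool (for example `dot -Tpng map.dot -o map.png`), or handed to the `graphviz` Python package listed in the requirements.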