ziadmostafa committed on
Commit
3771b6c
·
1 Parent(s): 1f15f83

first commit

.gitignore ADDED
@@ -0,0 +1,47 @@
1
+ # Python-related
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+ .Python
7
+ env/
8
+ build/
9
+ develop-eggs/
10
+ dist/
11
+ downloads/
12
+ eggs/
13
+ .eggs/
14
+ lib/
15
+ lib64/
16
+ parts/
17
+ sdist/
18
+ var/
19
+ *.egg-info/
20
+ .installed.cfg
21
+ *.egg
22
+
23
+ # Jupyter Notebook
24
+ .ipynb_checkpoints
25
+
26
+ # Virtual Environment
27
+ venv/
28
+ ENV/
29
+ env/
30
+
31
+ # IDE-related
32
+ .idea/
33
+ .vscode/
34
+ *.swp
35
+ *.swo
36
+
37
+ # OS-related
38
+ .DS_Store
39
+ Thumbs.db
40
+
41
+ # Large data files
42
+ # Uncomment if you don't want to include the dataset
43
+ # *.csv
44
+
45
+ # Logs
46
+ logs/
47
+ *.log
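As a rough illustration of how the wildcard patterns above behave, the sketch below checks bare file names against the file-level patterns using Python's `fnmatch` module. This is only an approximation: Git's own matching has extra rules (directory-only patterns such as `logs/`, negation, path anchoring) that `fnmatch` does not model, and the helper name `is_ignored` is hypothetical.

```python
from fnmatch import fnmatchcase

# File-level patterns copied from the .gitignore above (directory patterns omitted).
FILE_PATTERNS = ["*.py[cod]", "*$py.class", "*.so", "*.egg", "*.swp", "*.swo", "*.log"]


def is_ignored(filename, patterns=FILE_PATTERNS):
    """Return True if the bare file name matches any wildcard pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)
```

Note that `*.py[cod]` needs one extra character after `.py`, so compiled files match while `app.py` itself does not.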
README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
  title: Road Accidents Severity Analysis
3
- emoji: 🏢
4
  colorFrom: pink
5
  colorTo: purple
6
  sdk: streamlit
@@ -10,4 +10,92 @@ pinned: false
10
  short_description: RTA
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
  title: Road Accidents Severity Analysis
3
+ emoji: 🚗
4
  colorFrom: pink
5
  colorTo: purple
6
  sdk: streamlit
 
10
  short_description: RTA
11
  ---
12
 
13
+ # Road Accidents Severity Analysis
14
+
15
+ ![Road Safety](https://img.shields.io/badge/Road-Safety-red)
16
+ ![Data Science](https://img.shields.io/badge/Data-Science-blue)
17
+ ![Machine Learning](https://img.shields.io/badge/Machine-Learning-green)
18
+
19
+ ## 📝 Project Description
20
+
21
+ This project analyzes road traffic accident (RTA) data to identify patterns and factors that contribute to accident severity. Using machine learning models, we predict the severity of accidents based on various factors such as driver characteristics, vehicle conditions, road features, and environmental conditions.
22
+
23
+ The insights from this analysis can help:
24
+ - Identify high-risk scenarios for road accidents
25
+ - Recommend preventive measures to reduce accident severity
26
+ - Support traffic management and road safety policies
27
+ - Raise awareness about factors contributing to severe accidents
28
+
29
+ ## 🔍 Dataset
30
+
31
+ The dataset contains over 12,000 records of road traffic accidents with 32+ features including:
32
+ - Driver information (age, gender, experience, education)
33
+ - Vehicle details (type, service years, defects)
34
+ - Road conditions and features
35
+ - Environmental factors (weather, light conditions)
36
+ - Accident details (collision type, vehicles involved, casualties)
37
+ - Accident severity (target variable)
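The `app.py` added in this commit turns several of these raw fields into numeric features before prediction (a driver risk score, an environmental risk score, weekend and night flags, and a casualty-to-vehicle ratio). A minimal sketch of that step as a pure function is shown below. The constants are the ones in `app.py`; the function name `derive_features` and its argument names are hypothetical, chosen only for illustration.

```python
# Risk weights as they appear in app.py; unlisted values fall back to 0.5.
WEATHER_RISK = {
    "Normal": 0.2, "Raining": 0.7, "Cloudy": 0.4, "Windy": 0.5,
    "Snow": 0.8, "Fog or mist": 0.9, "Raining and Windy": 0.8,
    "Other": 0.5, "Unknown": 0.5,
}
LIGHT_RISK = {
    "Daylight": 0.2, "Darkness - lights lit": 0.5,
    "Darkness - no lighting": 0.9, "Darkness - lights unlit": 0.8,
}
WEEKEND_DAYS = {"Saturday", "Sunday"}


def derive_features(age_value, experience_value, weather, light, day, hour,
                    casualties, vehicles):
    """Compute the derived model inputs from already-mapped raw values."""
    # Younger and less experienced drivers get a higher risk (60 and 15 are
    # the largest values in the app's age and experience mappings).
    age_risk = 1 - age_value / 60
    exp_risk = 1 - experience_value / 15
    weather_risk = WEATHER_RISK.get(weather, 0.5)
    light_risk = LIGHT_RISK.get(light, 0.5)
    return {
        "Casualty_to_vehicle_ratio": casualties / vehicles,
        "Driver_Risk_Score": (age_risk + exp_risk) / 2,
        "Weather_Risk": weather_risk,
        "Light_Risk": light_risk,
        "Environmental_Risk": (weather_risk + light_risk) / 2,
        "Is_Weekend": 1 if day in WEEKEND_DAYS else 0,
        "Is_Night": 1 if (hour >= 18 or hour < 6) else 0,
    }
```

For example, `derive_features(24, 1.5, "Raining", "Daylight", "Saturday", 21, 2, 2)` corresponds to an 18-30 driver with 1-2 years of experience on a rainy Saturday evening. Keeping this logic in one function would also make it easier to reuse the same computation in the training notebook, assuming the notebook derives these features the same way.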
38
+
39
+ ## 🚀 Features
40
+
41
+ - **Comprehensive Data Analysis**: Explore patterns and relationships in road accident data
42
+ - **Interactive Visualizations**: 8+ interactive charts to understand accident factors
43
+ - **Predictive Modeling**: Machine learning models to predict accident severity
44
+ - **User-friendly Interface**: Input accident details to get severity predictions
45
+ - **Feature Importance Analysis**: Understand which factors most influence accident severity
46
+
47
+
48
+ ## 🛠️ Installation & Setup
49
+
50
+ 1. Clone the repository:
51
+
52
+ 2. Install dependencies:
53
+ ```bash
54
+ pip install -r requirements.txt
55
+ ```
56
+
57
+ 3. Run the Jupyter notebook to train models:
58
+ ```bash
59
+ jupyter notebook Road_Accidents_Severity_Analysis.ipynb
60
+ ```
61
+
62
+ 4. Launch the Streamlit app:
63
+ ```bash
64
+ streamlit run app.py
65
+ ```
66
+
67
+
68
+ ## 🔧 Technologies Used
69
+
70
+ - **Data Processing**: Pandas, NumPy
71
+ - **Visualization**: Plotly, Cufflinks
72
+ - **Machine Learning**: Scikit-learn
73
+ - **Web Application**: Streamlit
74
+ - **Other Tools**: Jupyter Notebook, Python
75
+
76
+ ## 📁 Project Structure
77
+
78
+ ```
79
+ road-accidents-severity/
80
+ ├── Road_Accidents_Severity_Analysis.ipynb # Analysis & model training
81
+ ├── app.py # Streamlit application
82
+ ├── RTA Dataset.csv # Dataset
83
+ ├── requirements.txt # Dependencies
84
+ ├── README.md # Project documentation
85
+ ├── best_accident_severity_model.pkl # Trained model
86
+ ├── label_encoders.pkl # Saved encoders
87
+ └── scaler.pkl # Saved scaler
88
+ ```
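The three `.pkl` files in this layout are produced by the notebook and loaded by `app.py` at startup. A small pre-flight check, sketched below under the assumption that all three files sit next to `app.py` as the tree shows, can report which artifacts are missing before the app is launched. The helper name `missing_artifacts` is hypothetical.

```python
from pathlib import Path

# File names taken from the project structure above.
REQUIRED_ARTIFACTS = (
    "best_accident_severity_model.pkl",
    "label_encoders.pkl",
    "scaler.pkl",
)


def missing_artifacts(root="."):
    """Return the names of required artifact files not found under root."""
    root = Path(root)
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).is_file()]
```

An empty list means the prediction tab should be able to load its model; otherwise the notebook needs to be run first.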
89
+
90
+ ## 🔮 Future Improvements
91
+
92
+ - Incorporate geographic data for spatial analysis
93
+ - Implement more advanced models (e.g., XGBoost, neural networks)
94
+ - Add time series analysis to identify temporal patterns
95
+ - Develop a mobile app for on-the-go predictions
96
+ - Include more interactive features in the dashboard
97
+
98
+
99
+ ## 👥 Contributors
100
+
101
+ - [Ziad Mostafa](https://github.com/ziadmostafa1)
RTA Dataset.csv ADDED
The diff for this file is too large to render. See raw diff
 
Road_Accidents_Severity_Analysis.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
app.py ADDED
@@ -0,0 +1,505 @@
1
+ import streamlit as st
2
+ import pandas as pd
3
+ import numpy as np
4
+ import plotly.express as px
5
+ import plotly.graph_objects as go
6
+ import joblib
7
+ from sklearn.preprocessing import LabelEncoder, StandardScaler
8
+ from sklearn.decomposition import PCA
9
+
10
+ # Set page configuration
11
+ st.set_page_config(
12
+ page_title="Road Accidents Severity Analysis",
13
+ page_icon="🚗",
14
+ layout="wide",
15
+ initial_sidebar_state="expanded"
16
+ )
17
+
18
+ # Load the data
19
+ @st.cache_data
20
+ def load_data():
21
+ df = pd.read_csv('RTA Dataset.csv')
22
+ return df
23
+
24
+ # Load the model and preprocessing objects
25
+ @st.cache_resource
26
+ def load_model():
27
+ try:
28
+ model = joblib.load('best_accident_severity_model.pkl')
29
+ label_encoders = joblib.load('label_encoders.pkl')
30
+ scaler = joblib.load('scaler.pkl')
31
+ return model, label_encoders, scaler
32
+ except Exception as e:
33
+ st.warning(f"Model files not found or error loading: {e}")
34
+ return None, None, None
35
+
36
+ # Main function
37
+ def main():
38
+ # Add a header
39
+ st.title("🚗 Road Accidents Severity Analysis Dashboard")
40
+
41
+ # Create tabs
42
+ tab1, tab2, tab3, tab4 = st.tabs(["📊 Data Overview", "📈 Visualizations", "🔍 Feature Analysis", "🤖 Prediction"])
43
+
44
+ # Load data
45
+ df = load_data()
46
+
47
+ # Load model and preprocessing objects
48
+ model, label_encoders, scaler = load_model()
49
+
50
+ # Tab 1: Data Overview
51
+ with tab1:
52
+ st.header("Dataset Overview")
53
+ st.write(f"Dataset Shape: {df.shape}")
54
+
55
+ # Display sample data
56
+ st.subheader("Sample Data")
57
+ st.dataframe(df.head())
58
+
59
+ # Display summary statistics
60
+ st.subheader("Summary Statistics")
61
+ st.dataframe(df.describe())
62
+
63
+ # Display missing values information
64
+ st.subheader("Missing Values")
65
+ missing_values = df.isnull().sum()
66
+ missing_percentage = (missing_values / len(df)) * 100
67
+ missing_df = pd.DataFrame({
68
+ 'Missing Values': missing_values,
69
+ 'Percentage': missing_percentage
70
+ })
71
+ missing_df = missing_df[missing_df['Missing Values'] > 0].sort_values('Percentage', ascending=False)
72
+
73
+ if not missing_df.empty:
74
+ st.dataframe(missing_df)
75
+ else:
76
+ st.write("No missing values in the dataset.")
77
+
78
+ # Tab 2: Visualizations
79
+ with tab2:
80
+ st.header("Data Visualizations")
81
+
82
+ # Create two columns
83
+ col1, col2 = st.columns(2)
84
+
85
+ with col1:
86
+ # Pie chart of accident severity
87
+ st.subheader("Accident Severity Distribution")
88
+ fig1 = px.pie(df, names='Accident_severity', title='Distribution of Accident Severity',
89
+ color_discrete_sequence=px.colors.sequential.RdBu)
90
+ fig1.update_traces(textposition='inside', textinfo='percent+label')
91
+ st.plotly_chart(fig1, use_container_width=True)
92
+
93
+ # Bar chart of accident causes
94
+ st.subheader("Top Causes of Accidents")
95
+ cause_counts = df['Cause_of_accident'].value_counts().reset_index()
96
+ cause_counts.columns = ['Cause', 'Count']
97
+ cause_counts = cause_counts.sort_values('Count', ascending=False).head(10)
98
+
99
+ fig2 = px.bar(cause_counts, x='Count', y='Cause',
100
+ title='Top 10 Causes of Accidents',
101
+ orientation='h',
102
+ color='Count',
103
+ color_continuous_scale=px.colors.sequential.Viridis)
104
+ st.plotly_chart(fig2, use_container_width=True)
105
+
106
+ with col2:
107
+ # Histogram of casualties
108
+ st.subheader("Distribution of Casualties")
109
+ fig3 = px.histogram(df, x='Number_of_casualties',
110
+ title='Distribution of Number of Casualties',
111
+ nbins=30, color_discrete_sequence=['indianred'])
112
+ fig3.update_layout(bargap=0.2)
113
+ st.plotly_chart(fig3, use_container_width=True)
114
+
115
+ # Box plot of vehicles involved by severity
116
+ st.subheader("Vehicles Involved by Accident Severity")
117
+ fig4 = px.box(df, x='Accident_severity', y='Number_of_vehicles_involved',
118
+ title='Number of Vehicles Involved by Accident Severity',
119
+ color='Accident_severity', notched=True)
120
+ st.plotly_chart(fig4, use_container_width=True)
121
+
122
+ # Full width plots
123
+ st.subheader("Vehicle Types in Accidents")
124
+ vehicle_counts = df['Type_of_vehicle'].value_counts().reset_index()
125
+ vehicle_counts.columns = ['Vehicle Type', 'Count']
126
+ vehicle_counts = vehicle_counts.head(8) # Top 8 vehicle types
127
+
128
+ fig5 = px.pie(vehicle_counts, values='Count', names='Vehicle Type',
129
+ title='Distribution of Vehicle Types in Accidents',
130
+ hole=0.4, color_discrete_sequence=px.colors.sequential.Plasma_r)
131
+ fig5.update_traces(textposition='inside', textinfo='percent+label')
132
+ st.plotly_chart(fig5, use_container_width=True)
133
+
134
+ # Relationship between vehicles and casualties
135
+ st.subheader("Relationship Between Vehicles and Casualties")
136
+ fig6 = px.scatter(df, x='Number_of_vehicles_involved', y='Number_of_casualties',
137
+ color='Accident_severity', size='Number_of_casualties',
138
+ title='Relationship Between Vehicles Involved and Casualties',
139
+ opacity=0.7)
140
+ fig6.update_traces(marker=dict(line=dict(width=0.5, color='DarkSlateGrey')))
141
+ st.plotly_chart(fig6, use_container_width=True)
142
+
143
+ # Tab 3: Feature Analysis
144
+ with tab3:
145
+ st.header("Feature Analysis")
146
+
147
+ # Feature exploration
148
+ feature_col1, feature_col2 = st.columns([1, 2])
149
+
150
+ with feature_col1:
151
+ st.subheader("Feature Selection")
152
+
153
+ feature_options = df.columns.tolist()
154
+ selected_feature = st.selectbox("Select a feature to analyze:", feature_options)
155
+
156
+ if selected_feature:
157
+ if df[selected_feature].dtype in ['int64', 'float64']:
158
+ st.write(f"Statistical Summary for {selected_feature}:")
159
+ st.dataframe(df[selected_feature].describe())
160
+ else:
161
+ value_counts = df[selected_feature].value_counts().reset_index()
162
+ value_counts.columns = ['Value', 'Count']
163
+ st.write(f"Value Counts for {selected_feature}:")
164
+ st.dataframe(value_counts)
165
+
166
+ with feature_col2:
167
+ if selected_feature:
168
+ st.subheader(f"Visualization for {selected_feature}")
169
+
170
+ if df[selected_feature].dtype in ['int64', 'float64']:
171
+ # Numerical feature
172
+ fig = px.histogram(df, x=selected_feature, color='Accident_severity',
173
+ title=f'Distribution of {selected_feature} by Accident Severity',
174
+ marginal='box')
175
+ else:
176
+ # Categorical feature
177
+ cat_counts = df.groupby([selected_feature, 'Accident_severity']).size().reset_index(name='Count')
178
+ fig = px.bar(cat_counts, x=selected_feature, y='Count', color='Accident_severity',
179
+ title=f'{selected_feature} vs Accident Severity',
180
+ barmode='group')
181
+
182
+ st.plotly_chart(fig, use_container_width=True)
183
+
184
+ # Feature correlation analysis
185
+ st.subheader("Feature Correlation Analysis")
186
+
187
+ # Get only numerical columns for correlation
188
+ numeric_df = df.select_dtypes(include=['int64', 'float64'])
189
+
190
+ # Calculate and plot correlation matrix
191
+ corr_matrix = numeric_df.corr()
192
+ fig_corr = px.imshow(corr_matrix, text_auto=True, color_continuous_scale='RdBu_r',
193
+ title='Correlation Matrix of Numerical Features')
194
+ st.plotly_chart(fig_corr, use_container_width=True)
195
+
196
+ # Tab 4: Prediction
197
+ with tab4:
198
+ st.header("Accident Severity Prediction")
199
+
200
+ if model is None:
201
+ st.error("Model not loaded. Please run the notebook first to train and save the model.")
202
+ else:
203
+ st.write("Enter the details below to predict accident severity:")
204
+
205
+ # Get all required features from the model
206
+ expected_features = []
207
+ if hasattr(model, 'feature_names_in_'):
208
+ expected_features = list(model.feature_names_in_)
209
+ st.write(f"The model expects {len(expected_features)} features.")
210
+
211
+ # Create layout for input features
212
+ col1, col2, col3 = st.columns(3)
213
+
214
+ # Create a dictionary to store all input values
215
+ input_data = {}
216
+
217
+ with col1:
218
+ # Time-related inputs
219
+ time_options = ["Morning (6AM-12PM)", "Afternoon (12PM-6PM)", "Evening (6PM-12AM)", "Night (12AM-6AM)"]
220
+ selected_time = st.selectbox("Time of Day:", time_options)
221
+
222
+ # Map time selection to hour (middle of the range)
223
+ time_mapping = {
224
+ "Morning (6AM-12PM)": 9,
225
+ "Afternoon (12PM-6PM)": 15,
226
+ "Evening (6PM-12AM)": 21,
227
+ "Night (12AM-6AM)": 3
228
+ }
229
+ input_data["Hour"] = time_mapping[selected_time]
230
+
231
+ # Day of week
232
+ day_options = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
233
+ input_data["Day_of_week"] = st.selectbox("Day of Week:", day_options)
234
+
235
+ # Driver information
236
+ age_band_options = ['Under 18', '18-30', '31-50', 'Over 51']
237
+ input_data["Age_band_of_driver"] = st.selectbox("Driver Age Band:", age_band_options)
238
+
239
+ input_data["Sex_of_driver"] = st.selectbox("Driver Sex:", ['Male', 'Female'])
240
+
241
+ edu_options = ['Above high school', 'Junior high school', 'Elementary school',
242
+ 'High school', 'Writing & reading', 'Illiterate', 'Unknown']
243
+ input_data["Educational_level"] = st.selectbox("Educational Level:", edu_options)
244
+
245
+ relation_options = ['Employee', 'Owner', 'Other', 'Unknown']
246
+ input_data["Vehicle_driver_relation"] = st.selectbox("Vehicle-Driver Relation:", relation_options)
247
+
248
+ exp_options = ['No Licence', 'Below 1yr', '1-2yr', '2-5yr', '5-10yr', 'Above 10yr', 'Unknown']
249
+ input_data["Driving_experience"] = st.selectbox("Driving Experience:", exp_options)
250
+
251
+ # Pre-calculate Experience_Value for the model
252
+ experience_mapping = {
253
+ 'No Licence': 0, 'Below 1yr': 0.5, '1-2yr': 1.5,
254
+ '2-5yr': 3.5, '5-10yr': 7.5, 'Above 10yr': 15,
255
+ 'Unknown': 5 # Default value for unknown
256
+ }
257
+ input_data["Experience_Value"] = experience_mapping[input_data["Driving_experience"]]
258
+
259
+ # Pre-calculate Age_Value for the model
260
+ age_mapping = {
261
+ 'Under 18': 16, '18-30': 24, '31-50': 40, 'Over 51': 60
262
+ }
263
+ input_data["Age_Value"] = age_mapping[input_data["Age_band_of_driver"]]
264
+
265
+ with col2:
266
+ # Vehicle information
267
+ vehicle_options = ['Automobile', 'Lorry (41?100Q)', 'Lorry (11?40Q)', 'Public (12 seats)',
268
+ 'Public (13-45 seats)', 'Public (> 45 seats)', 'Motorcycle', 'Other']
269
+ input_data["Type_of_vehicle"] = st.selectbox("Vehicle Type:", vehicle_options)
270
+
271
+ owner_options = ['Owner', 'Governmental', 'Organization', 'Other']
272
+ input_data["Owner_of_vehicle"] = st.selectbox("Owner of Vehicle:", owner_options)
273
+
274
+ service_options = ['Unknown', '1-2yr', '2-5yr', '5-10yr', 'Above 10yr', 'Below 1yr']
275
+ input_data["Service_year_of_vehicle"] = st.selectbox("Service Year of Vehicle:", service_options)
276
+
277
+ defect_options = ['No defect', 'Defective tire', 'Defective break']
278
+ input_data["Defect_of_vehicle"] = st.selectbox("Vehicle Defect:", defect_options)
279
+
280
+ # Road and environment information
281
+ area_options = ['Other', 'Office areas', 'Residential areas', 'Rural village areas',
282
+ 'Church areas', 'School areas', 'Market areas', 'Hospital areas',
283
+ 'Industrial areas', 'Rural village areasOffice areas', 'Recreational areas',
284
+ 'Outside rural areas', 'Unknown', 'Rural village']
285
+ input_data["Area_accident_occured"] = st.selectbox("Area Accident Occurred:", area_options)
286
+
287
+ lanes_options = ['Two-way (divided with broken lines road marking)', 'Undivided Two way',
288
+ 'One way', 'Double carriageway (median)', 'Two-way (divided with solid lines road marking)',
289
+ 'Unknown', 'Other']
290
+ input_data["Lanes_or_Medians"] = st.selectbox("Lanes or Medians:", lanes_options)
291
+
292
+ with col3:
293
+ # More road information
294
+ road_align_options = ['Tangent road with flat terrain', 'Tangent road with mild grade and flat terrain',
295
+ 'Steep grade downward with mountainous terrain', 'Escarpments',
296
+ 'Tangent road with mountainous terrain and', 'Steep grade upward with mountainous terrain',
297
+ 'Gentle horizontal curve', 'Sharp reverse curve', 'Tangent road with rolling terrain']
298
+ input_data["Road_allignment"] = st.selectbox("Road Alignment:", road_align_options)
299
+
300
+ junction_options = ['Y Shape', 'No junction', 'Other', 'Crossing', 'O Shape', 'Unknown', 'T Shape', 'X Shape']
301
+ input_data["Types_of_Junction"] = st.selectbox("Type of Junction:", junction_options)
302
+
303
+ surface_type_options = ['Asphalt roads', 'Earth roads', 'Gravel roads', 'Other', 'Asphalt roads with some distress']
304
+ input_data["Road_surface_type"] = st.selectbox("Road Surface Type:", surface_type_options)
305
+
306
+ surface_condition_options = ['Dry', 'Wet or damp', 'Snow', 'Flood over 3cm. deep']
307
+ input_data["Road_surface_conditions"] = st.selectbox("Road Surface Conditions:", surface_condition_options)
308
+
309
+ light_options = ['Daylight', 'Darkness - lights lit', 'Darkness - no lighting', 'Darkness - lights unlit']
310
+ input_data["Light_conditions"] = st.selectbox("Light Conditions:", light_options)
311
+
312
+ weather_options = ['Normal', 'Raining', 'Cloudy', 'Other', 'Raining and Windy',
313
+ 'Fog or mist', 'Windy', 'Snow', 'Unknown']
314
+ input_data["Weather_conditions"] = st.selectbox("Weather Conditions:", weather_options)
315
+
316
+ # Additional column for more inputs
317
+ col4, col5, col6 = st.columns(3)
318
+
319
+ with col4:
320
+ # Collision and vehicle information
321
+ collision_options = ['Vehicle with vehicle collision', 'Collision with roadside objects',
322
+ 'Collision with pedestrians', 'Rollover', 'Collision with animals',
323
+ 'Collision with roadside-parked vehicles', 'Fall from vehicles',
324
+ 'Other', 'Unknown', 'With Train']
325
+ input_data["Type_of_collision"] = st.selectbox("Type of Collision:", collision_options)
326
+
327
+ input_data["Number_of_vehicles_involved"] = st.number_input("Number of Vehicles Involved:",
328
+ min_value=1, max_value=10, value=2)
329
+
330
+ input_data["Number_of_casualties"] = st.number_input("Number of Casualties:",
331
+ min_value=1, max_value=10, value=1)
332
+
333
+ movement_options = ['Going straight', 'Moving Backward', 'U-Turn', 'Other', 'Reversing',
334
+ 'Parked', 'Waiting to go', 'Getting off', 'Overtaking', 'Unknown',
335
+ 'Stopping', 'Changing lane to the right', 'Changing lane to the left']
336
+ input_data["Vehicle_movement"] = st.selectbox("Vehicle Movement:", movement_options)
337
+
338
+ with col5:
339
+ # Casualty information
340
+ casualty_class_options = ['Driver or rider', 'na', 'Pedestrian', 'Passenger']
341
+ input_data["Casualty_class"] = st.selectbox("Casualty Class:", casualty_class_options)
342
+
343
+ sex_casualty_options = ['Male', 'na', 'Female']
344
+ input_data["Sex_of_casualty"] = st.selectbox("Sex of Casualty:", sex_casualty_options)
345
+
346
+ age_casualty_options = ['na', '18-30', '31-50', 'Over 51', 'Under 18', '5']
347
+ input_data["Age_band_of_casualty"] = st.selectbox("Age Band of Casualty:", age_casualty_options)
348
+
349
+ casualty_severity_options = ['3', 'na', '2', '1']
350
+ input_data["Casualty_severity"] = st.selectbox("Casualty Severity:", casualty_severity_options)
351
+
352
+
353
+ with col6:
354
+ # Final inputs
355
+ fitness_options = ['Normal', 'With infirmity', 'Alcohol', 'Illness', 'Asleep or Fatigued']
356
+ input_data["Fitness_of_casuality"] = st.selectbox("Fitness of Casualty:", fitness_options)
357
+
358
+ pedestrian_options = ['Not a Pedestrian',
359
+ 'Crossing from nearside - masked by parked or statioNot a Pedestrianry vehicle',
360
+ 'Unknown or other', 'In carriageway, stationary - not crossing',
361
+ 'Walking along in carriageway, back to traffic',
362
+ 'Crossing from nearside', 'Crossing from offside',
363
+ 'Walking along in carriageway, facing traffic',
364
+ 'Playing in carriageway']
365
+ input_data["Pedestrian_movement"] = st.selectbox("Pedestrian Movement:", pedestrian_options)
366
+
367
+ cause_options = ['No distancing', 'Changing lane to the right', 'Driving carelessly',
368
+ 'No priority to vehicle', 'Moving Backward', 'No priority to pedestrian',
369
+ 'Other', 'Overtaking', 'Driving under the influence of drugs',
370
+ 'Driving to the left', 'Getting off the vehicle improperly',
371
+ 'Driving at high speed', 'Overturning', 'Turnover', 'Overspeed',
372
+ 'Overloading', 'Drunk driving', 'Unknown', 'Improper parking',
373
+ 'Driving on the wrong side of the road']
374
+ input_data["Cause_of_accident"] = st.selectbox("Cause of Accident:", cause_options)
375
+
376
+ work_casualty_options = ['Driver', 'Other', 'Unemployed', 'Employee', 'Self-employed', 'Student', 'Unknown']
377
+ input_data["Work_of_casuality"] = st.selectbox("Work of Casualty:", work_casualty_options)
378
+
379
+ # Calculate derived features needed by the model
380
+ # Casualty to vehicle ratio
381
+ input_data["Casualty_to_vehicle_ratio"] = input_data["Number_of_casualties"] / input_data["Number_of_vehicles_involved"]
382
+
383
+ # Driver Risk Score
384
+ normalized_age_risk = 1 - (input_data["Age_Value"] / 60) # Assuming 60 is max age
385
+ normalized_exp_risk = 1 - (input_data["Experience_Value"] / 15) # Assuming 15 is max experience
386
+ input_data["Driver_Risk_Score"] = (normalized_age_risk + normalized_exp_risk) / 2
387
+
388
+ # Environmental risk factors
389
+ weather_risk = {
390
+ 'Normal': 0.2, 'Raining': 0.7, 'Cloudy': 0.4, 'Windy': 0.5,
391
+ 'Snow': 0.8, 'Fog or mist': 0.9, 'Raining and Windy': 0.8,
392
+ 'Other': 0.5, 'Unknown': 0.5
393
+ }
394
+
395
+ light_risk = {
396
+ 'Daylight': 0.2, 'Darkness - lights lit': 0.5,
397
+ 'Darkness - no lighting': 0.9, 'Darkness - lights unlit': 0.8
398
+ }
399
+
400
+ input_data["Weather_Risk"] = weather_risk.get(input_data["Weather_conditions"], 0.5)
401
+ input_data["Light_Risk"] = light_risk.get(input_data["Light_conditions"], 0.5)
402
+ input_data["Environmental_Risk"] = (input_data["Weather_Risk"] + input_data["Light_Risk"]) / 2
403
+
404
+ # Add Is_Weekend (assuming Python's datetime conventions where Monday is 0 and Sunday is 6)
405
+ day_to_num = {
406
+ 'Monday': 0, 'Tuesday': 1, 'Wednesday': 2, 'Thursday': 3,
407
+ 'Friday': 4, 'Saturday': 5, 'Sunday': 6
408
+ }
409
+ day_num = day_to_num.get(input_data["Day_of_week"], 0)
410
+ input_data["Is_Weekend"] = 1 if day_num >= 5 else 0
411
+
412
+ # Add Is_Night
413
+ input_data["Is_Night"] = 1 if (input_data["Hour"] >= 18 or input_data["Hour"] < 6) else 0
414
+
415
+ # Check for missing expected features and add defaults
416
+ for feature in expected_features:
417
+ if feature not in input_data:
418
+ if feature.startswith("Number_"):
419
+ input_data[feature] = 0 # Default value for numerical features
420
+ else:
421
+ input_data[feature] = "Unknown" # Default value for categorical features
422
+
423
+ if st.button("Predict Accident Severity"):
424
+ try:
425
+ # Create a DataFrame from the input data with matching features
426
+ input_df = pd.DataFrame([input_data])
427
+
428
+ # Filter to include only expected features in the right order
429
+ if expected_features:
430
+ # Create a DataFrame with the same structure as what the model expects
431
+ input_df_filtered = pd.DataFrame(columns=expected_features)
432
+
433
+ # Fill in values from our input data
434
+ for feature in expected_features:
435
+ if feature in input_data:
436
+ input_df_filtered[feature] = [input_data[feature]]
437
+ else:
438
+ # Use a default value
439
+ input_df_filtered[feature] = [0]
440
+
441
+ # Use the filtered DataFrame for prediction
442
+ input_df = input_df_filtered
443
+
444
+ # Encode categorical features with silent error handling
445
+ for col in input_df.columns:
446
+ if col in label_encoders and isinstance(input_df[col].iloc[0], str):
447
+ try:
448
+ le = label_encoders[col]
449
+ # Check if the value exists in the label encoder
450
+ if input_df[col].iloc[0] in le.classes_:
451
+ input_df[col] = le.transform(input_df[col])
452
+ else:
453
+ # Unknown value: fall back to the encoder's first class (alphabetically first, not necessarily the most common)
454
+ most_common_class = le.classes_[0]
455
+ input_df[col] = le.transform([most_common_class])
456
+ except Exception as e:
457
+ # Use a fallback value silently
458
+ input_df[col] = 0
459
+
460
+ # Make prediction
461
+ prediction = model.predict(input_df)[0]
462
+
463
+ # Map prediction back to class label
464
+ severity_mapping = {0: 'Slight Injury', 1: 'Serious Injury', 2: 'Fatal Injury'}
465
+ predicted_severity = severity_mapping.get(prediction, str(prediction))
466
+
467
+ # Show prediction with styling based on severity
468
+ severity_color = {
469
+ 'Slight Injury': 'green',
470
+ 'Serious Injury': 'orange',
471
+ 'Fatal Injury': 'red'
472
+ }
473
+
474
+ color = severity_color.get(predicted_severity, 'blue')
475
+
476
+ st.markdown(f"""
477
+ <div style="background-color: {color}; padding: 20px; border-radius: 10px; text-align: center;">
478
+ <h2 style="color: white;">Predicted Accident Severity</h2>
479
+ <h1 style="color: white;">{predicted_severity}</h1>
480
+ </div>
481
+ """, unsafe_allow_html=True)
482
+
483
+ # Show prediction probability if available
484
+ if hasattr(model, 'predict_proba'):
485
+ try:
486
+ probabilities = model.predict_proba(input_df)[0]
487
+ st.subheader("Prediction Confidence")
488
+
489
+ proba_df = pd.DataFrame({
490
+ 'Severity': list(severity_mapping.values())[:len(probabilities)],
491
+ 'Probability': probabilities
492
+ })
493
+
494
+ fig = px.bar(proba_df, x='Severity', y='Probability',
495
+ color='Severity', color_discrete_map=severity_color)
496
+ st.plotly_chart(fig, use_container_width=True)
497
+ except Exception as e:
498
+ st.error(f"Error displaying probabilities: {e}")
499
+
500
+ except Exception as e:
501
+ st.error(f"Prediction error: {e}")
502
+
503
+ # Run the app
504
+ if __name__ == "__main__":
505
+ main()
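One point worth checking in the prediction code above: `severity_mapping` hard-codes 0 as 'Slight Injury' and 2 as 'Fatal Injury'. If the notebook encoded the target with scikit-learn's `LabelEncoder` (an assumption, since the notebook diff is not rendered here), the integer codes follow the sorted order of the labels, which would give a different mapping. The sketch below reproduces that ordering in plain Python. `label_codes` is a hypothetical helper, and the label spellings are the ones used in `app.py`, which may differ from the dataset's.

```python
def label_codes(labels):
    """Assign integer codes in sorted label order, as a sorting label encoder would."""
    return {code: label for code, label in enumerate(sorted(set(labels)))}


# Label spellings taken from severity_mapping in app.py.
severity_labels = ["Slight Injury", "Serious Injury", "Fatal Injury"]
code_to_label = label_codes(severity_labels)
```

Under that assumption, code 0 would correspond to the fatal class rather than the slight one. If `label_encoders.pkl` happens to contain an encoder for `Accident_severity`, reading the mapping from its `classes_` attribute would avoid hard-coding the order.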
best_accident_severity_model.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ea1e3aeb07c95a9d32d9b339509516d01b15d3509af26ca1fd4b1205d7998f36
3
+ size 3025761
huggingface-metadata.yml ADDED
@@ -0,0 +1,3 @@
1
+ sdk_version: 3.0.0
2
+ app_file: app.py
3
+ pinned: false
label_encoders.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7ea390bb333b7f2f616f31fcf67401b5350abd11e086f93b8e5bbde6ed493e0e
3
+ size 30581
requirements.txt ADDED
@@ -0,0 +1,9 @@
1
+ streamlit==1.27.0
2
+ pandas==1.5.3
3
+ numpy==1.24.3
4
+ plotly==5.15.0
5
+ joblib==1.2.0
6
+ scikit-learn==1.2.2
7
+ cufflinks==0.17.3
8
+ matplotlib==3.7.2
9
+ scipy==1.10.1
scaler.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:560f303d719d65a8a8505ddb2d83814fa57acbc4f082b8ba22e0e0548a9206f8
3
+ size 2727