Part 0 - Plotting Using Seaborn - Data Preparation

Import Preliminaries and datasets

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.pylab as plb
import warnings
warnings.filterwarnings('ignore')

test_scores = pd.read_csv("Data/Test scores.csv", parse_dates=['Test taken date'])
test_master = pd.read_csv("Data/Test master.csv")
test_participant = pd.read_csv("Data/Audience summary.csv")
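
The `parse_dates` argument matters later, because the date-based columns rely on the `.dt` accessor. Here is a minimal sketch of its effect, assuming a hypothetical two-row extract held in memory rather than the real `Data/Test scores.csv` file (the names `sample_csv`, `sample_scores` and `is_datetime` are illustrative only):

```python
import io
import pandas as pd

# Hypothetical two-row extract in the same layout as "Test scores.csv".
sample_csv = io.StringIO(
    "Participant identifier,Test Name,Test taken date,Track,Designation,Score\n"
    "37MCTM,If conditional,2018-11-23,Engineering,Lead,18\n"
    "37MCTM,Tenses,2018-11-13,Engineering,Lead,12\n"
)

sample_scores = pd.read_csv(sample_csv, parse_dates=["Test taken date"])

# With parse_dates the column is read as datetime64, so .dt is available on it.
is_datetime = pd.api.types.is_datetime64_any_dtype(sample_scores["Test taken date"])
print(sample_scores.dtypes)
print("datetime column:", is_datetime)
```

Without `parse_dates`, the column would normally be read as plain strings and the `.dt` calls used further down would raise an error.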

We have three datasets:

Test Scores Dataset

This contains the score of each participant in each test they took.

test_scores.head()

	Participant identifier	Test Name	Test taken date	Track	Designation	Score
0	37MCTM	If conditional	2018-11-23	Engineering	Lead	18
1	37MCTM	Determiners and Quantifiers	2018-11-23	Engineering	Lead	28
2	37MCTM	Modals	2018-11-23	Engineering	Lead	22
3	37MCTM	Tenses	2018-11-13	Engineering	Lead	12
4	37MCTM	Pronouns	2018-11-13	Engineering	Lead	15

---

Test Master

This holds the details of each test: number of questions, complexity, and marks per question.

test_master

	Test name	No. of questions	Complexity	Marks per question
0	Articles-New	15	Easy	1
1	Tenses	15	Easy	1
2	Pronouns	15	Easy	1
3	Articles	15	Easy	1
4	Conjuctions	15	Easy	1
5	Adjective & Adverb	15	Easy	1
6	Active and passive voice	15	Medium	2
7	Puctuations	15	Medium	2
8	If conditional	15	Medium	2
9	Determiners and Quantifiers	15	Medium	2
10	Modals	15	Medium	2
11	Prepositions	15	Medium	2
12	Comprehension	10	Difficult	3
13	Confusing words	15	Difficult	3
14	Synonyms & Antonyms	15	Difficult	3
15	Vocabulary	15	Difficult	3
16	Capitalization	15	Difficult	3

---

Test Participants

This is the audience summary: figures for each designation, broken down by track (Engineering, Quality Assurance, Support).

test_participant

	Designation	Engineering	Quality Assurance	Support
0	Associate	1400	250.0	220
1	Lead	1800	400.0	100
2	Manager	300	60.0	70
3	Consultant	200	NaN	10
4	Associate Director and above	600	5.0	32

---

We will derive a few more columns so that the data is easier to analyse and compare across several factors:

Weekday of the test taken date
Week No. of the test taken date
Month of the test taken date
Maximum score obtainable in each test
Percentage of marks obtained by the participants

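# .dt.day_name() replaces the old .dt.weekday_name attribute, which newer pandas no longer provides
test_scores['weekday_name'] = test_scores['Test taken date'].dt.day_name()
test_scores['month'] = test_scores['Test taken date'].dt.month_name()
# .dt.isocalendar().week replaces the old .dt.week; subtracting 42 makes ISO week 43 count as week 1
test_scores['week'] = test_scores['Test taken date'].dt.isocalendar().week.astype(int) - 42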
test_master['maximum_score'] = test_master['No. of questions'] * test_master['Marks per question']
test_scores = pd.merge(test_scores,test_master,left_on="Test Name", right_on="Test name", how = "left")
cols = ['Participant identifier', 'Test Name', 'Track','Designation', 'Score', 
        'weekday_name', 'month', 'week','Complexity', 'maximum_score']
test_scores = test_scores[cols]
test_scores['Percent'] = (test_scores['Score'] / test_scores['maximum_score'] * 100).round(2)

test_scores.head()

	Participant identifier	Test Name	Track	Designation	Score	weekday_name	month	week	Complexity	maximum_score	Percent
0	37MCTM	If conditional	Engineering	Lead	18	Friday	November	5	Medium	30	60.00
1	37MCTM	Determiners and Quantifiers	Engineering	Lead	28	Friday	November	5	Medium	30	93.33
2	37MCTM	Modals	Engineering	Lead	22	Friday	November	5	Medium	30	73.33
3	37MCTM	Tenses	Engineering	Lead	12	Tuesday	November	4	Easy	15	80.00
4	37MCTM	Pronouns	Engineering	Lead	15	Tuesday	November	4	Easy	15	100.00
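
To make the derivation easier to check by hand, here is a small self-contained sketch that rebuilds two of the rows above from scratch. The frames `scores`, `master` and `merged` are illustrative stand-ins, not the full datasets, and the sketch assumes a pandas version that provides `.dt.isocalendar()`.

```python
import pandas as pd

# Two sample rows copied from the tables above; the frame names are illustrative.
scores = pd.DataFrame({
    "Test Name": ["If conditional", "Tenses"],
    "Test taken date": pd.to_datetime(["2018-11-23", "2018-11-13"]),
    "Score": [18, 12],
})
master = pd.DataFrame({
    "Test name": ["If conditional", "Tenses"],
    "No. of questions": [15, 15],
    "Marks per question": [2, 1],
})

dates = scores["Test taken date"]
scores["weekday_name"] = dates.dt.day_name()
scores["month"] = dates.dt.month_name()
# ISO week 43 becomes week 1, matching the offset of 42 used above.
scores["week"] = dates.dt.isocalendar().week.astype(int) - 42

# Maximum score = number of questions x marks per question.
master["maximum_score"] = master["No. of questions"] * master["Marks per question"]

# Left merge keeps every score row, even if a test name has no match in master.
merged = scores.merge(master, left_on="Test Name", right_on="Test name", how="left")
merged["Percent"] = (merged["Score"] / merged["maximum_score"] * 100).round(2)

print(merged[["Test Name", "weekday_name", "week", "maximum_score", "Percent"]])
```

Because the merge is a left join on the test name, any test name that is spelled differently in the two files would end up with a missing `maximum_score` and `Percent`, so a quick `merged["maximum_score"].isna().sum()` is a cheap sanity check.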

---

Now we are ready to visualise this data for better analysis.
The first post in the series is Part 1 - Plotting Using Seaborn - Violin, Box and Line Plot.
