Part 0 - Plotting Using Seaborn - Data Preparation


Import Preliminaries and datasets

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.pylab as plb
import warnings
warnings.filterwarnings('ignore')

test_scores = pd.read_csv("Data/Test scores.csv", parse_dates=['Test taken date'])
test_master = pd.read_csv("Data/Test master.csv")
test_participant = pd.read_csv("Data/Audience summary.csv")

We have three datasets, namely -

Test Scores Dataset

This contains scores of each particpant in the test they appeared.

test_scores.head()
Participant identifier Test Name Test taken date Track Designation Score
0 37MCTM If conditional 2018-11-23 Engineering Lead 18
1 37MCTM Determiners and Quantifiers 2018-11-23 Engineering Lead 28
2 37MCTM Modals 2018-11-23 Engineering Lead 22
3 37MCTM Tenses 2018-11-13 Engineering Lead 12
4 37MCTM Pronouns 2018-11-13 Engineering Lead 15
---

Test Master

This is about the other details associated with each test.

test_master
Test name No. of questions Complexity Marks per question
0 Articles-New 15 Easy 1
1 Tenses 15 Easy 1
2 Pronouns 15 Easy 1
3 Articles 15 Easy 1
4 Conjuctions 15 Easy 1
5 Adjective & Adverb 15 Easy 1
6 Active and passive voice 15 Medium 2
7 Puctuations 15 Medium 2
8 If conditional 15 Medium 2
9 Determiners and Quantifiers 15 Medium 2
10 Modals 15 Medium 2
11 Prepositions 15 Medium 2
12 Comprehension 10 Difficult 3
13 Confusing words 15 Difficult 3
14 Synonyms & Antonyms 15 Difficult 3
15 Vocabulary 15 Difficult 3
16 Capitalization 15 Difficult 3
---

Test Participants

This is abouth the other details associated with the pariticipants.

test_participant
Designation Engineering Quality Assurance Support
0 Associate 1400 250.0 220
1 Lead 1800 400.0 100
2 Manager 300 60.0 70
3 Consultant 200 NaN 10
4 Associate Director and above 600 5.0 32
---

We will create more metrics in the dataset provided so that it would be easy to analyse and compare across multiple factors, like -

  • Weekday
  • Week No.
  • Month of the test taken date
  • Maximum Score can be obtained
  • Percentage of marks obtained by the participants
test_scores['weekday_name']  = test_scores['Test taken date'].dt.weekday_name
test_scores['month']  = test_scores['Test taken date'].dt.month_name() 
test_scores['week']  = test_scores['Test taken date'].dt.week-42 # to get number from 1 
test_master['maximum_score'] = test_master['No. of questions'] * test_master['Marks per question']
test_scores = pd.merge(test_scores,test_master,left_on="Test Name", right_on="Test name", how = "left")
cols = ['Participant identifier', 'Test Name', 'Track','Designation', 'Score', 
        'weekday_name', 'month', 'week','Complexity', 'maximum_score']
test_scores = test_scores[cols]
test_scores['Percent'] = round((test_scores['Score']/test_scores['maximum_score'])*100,2)
test_scores.head()
Participant identifier Test Name Track Designation Score weekday_name month week Complexity maximum_score Percent
0 37MCTM If conditional Engineering Lead 18 Friday November 5 Medium 30 60.00
1 37MCTM Determiners and Quantifiers Engineering Lead 28 Friday November 5 Medium 30 93.33
2 37MCTM Modals Engineering Lead 22 Friday November 5 Medium 30 73.33
3 37MCTM Tenses Engineering Lead 12 Tuesday November 4 Easy 15 80.00
4 37MCTM Pronouns Engineering Lead 15 Tuesday November 4 Easy 15 100.00
---

Now we are ready to visualise this data for better analysis.
The first post in the series is - Part 1 - Plotting Using Seaborn - Violin, Box and Line Plot


Related Posts

Break, Continue and Pass in Python

June 28, 2019

Read More
Business Days with Custom Holidays

May 30, 2019

Read More
Months between two dates

May 30, 2019

Read More