Applied Data Analysis

Jawwad Ahmed Farid
Jan 12, 2023

What happens when you talk to data and data talks back? Spring ’23. IBA Karachi, main campus.

Real data is rarely organized in a form that can be used directly. How do you become one with data?

What is this course about?

This is a course about data. About becoming one with your dataset. This could have been a course about models but in the chicken and egg question of which came first, we will go with data, not models.

Data and models have been of interest to computer scientists and mathematicians for as long as the field has existed.

How do we assess the fit of our models to data? We use models implicitly, explicitly, and intuitively, but how do we know that the results we present are credible or worthy of trust?

With the arrival of regulators and of accounting, banking and actuarial professionals, expectations of how we present our work and how it is used have changed.

As data scientists it is no longer enough to present data and analysis; we also need to identify and interpret insights. We also need to qualify our analysis, estimate likelihoods of occurrence or failure, and identify conditions under which our models will break.

There is now a need for practitioners to be sensitive to how their work is perceived, used, interpreted, and sometimes abused.

We propose that our work as data scientists and analysts only begins once we find insights, conclusions or actionable intelligence. The most important part of our contribution is to validate, qualify and challenge our analysis at the same time.

While the mathematical foundations of the models we study in this course are trivial, their presentation and usage are not.

This course is intended as a bridge between data, data analysis, data models, and reporting and dashboarding frameworks in the fields of insurance, banking, accounting, and financial services regulation.

The course work is centered around three industrial-strength datasets in raw form. Our objective is to document and understand our journey from data elements to insights, from insights to presentations, and from presentations to stakeholders’ questions on our work.

We use Excel as our primary manipulation and analysis tool. Familiarity with Excel, statistics, data modeling and data manipulation techniques is required.

While the course builds on a great deal of statistical theory and applications, this is a practical, application-focused course. Understanding of and comfort with statistics is a prerequisite.

What do we teach in this course?

We focus on four key themes in this course. The themes look at data, data manipulation, analysis, and modeling use cases.

a) Part one of our course work focuses on understanding the business domain and problem we are trying to solve.

b) Part two focuses on statistical inference tools and their applications.

c) Part three focuses on data manipulation and analytical techniques.

d) Part four focuses on reporting, dashboarding and presentation frameworks.

The final grade of the course is based on assessment in each part. Assessment focuses on our ability to apply concepts and frameworks taught in this course to our selected datasets.

The three problem sets

Insurance rate making

How does an insurance company decide on the right level of premium to charge for insuring a motorcycle or a car? This is the ratemaking question. It applies not just to auto insurance but also to health, marine, fire and other forms of insurance products.
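As a rough illustration of the ratemaking question, the sketch below uses the pure premium approach described in the ratemaking readings: average loss per unit of exposure, loaded for expenses and profit. The function names and all the numbers are hypothetical and are not taken from the course datasets.

```python
def pure_premium(total_losses: float, exposures: float) -> float:
    """Average loss cost per unit of exposure (for example, per car-year)."""
    if exposures <= 0:
        raise ValueError("exposures must be positive")
    return total_losses / exposures


def indicated_rate(pure_prem: float,
                   fixed_expense_per_exposure: float,
                   variable_expense_ratio: float,
                   profit_provision: float) -> float:
    """Premium per exposure that covers losses, expenses and target profit.

    Variable expenses and profit are expressed as fractions of premium,
    so they reduce the share of each premium unit available for losses.
    """
    available = 1.0 - variable_expense_ratio - profit_provision
    if available <= 0:
        raise ValueError("expense and profit loadings must sum to less than 1")
    return (pure_prem + fixed_expense_per_exposure) / available


# Hypothetical inputs, for illustration only.
pp = pure_premium(total_losses=1_200_000.0, exposures=10_000.0)
rate = indicated_rate(pp,
                      fixed_expense_per_exposure=20.0,
                      variable_expense_ratio=0.25,
                      profit_provision=0.05)
print(f"pure premium: {pp:.2f}, indicated rate: {rate:.2f}")
```

In practice the loss figure would first be developed and trended, and the rate would vary by rating class; this sketch only shows the final loading step.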

Deposit behavioral analysis

How long will a current account deposit stay with a bank? What is the likely maturity for a deposit when it has a prescribed maturity tenor? And when it doesn’t? Build a data and calculation model using customer account balance data that can help us answer this question.
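One simple starting point, offered here only as a sketch and not as the course's prescribed method, is a run-off curve: the share of a cohort's opening balance that is still with the bank in each later month. The function names and balances below are made up for illustration.

```python
from typing import List, Optional


def retention_curve(balances_by_month: List[float]) -> List[float]:
    """Fraction of the opening balance remaining in each month."""
    opening = balances_by_month[0]
    if opening <= 0:
        raise ValueError("opening balance must be positive")
    return [balance / opening for balance in balances_by_month]


def months_until_below(curve: List[float], threshold: float) -> Optional[int]:
    """First month index at which retention drops below the threshold."""
    for month, fraction in enumerate(curve):
        if fraction < threshold:
            return month
    return None  # the balance never fell below the threshold in the data


# Hypothetical month-end balances for one cohort of accounts.
cohort_balances = [1000.0, 900.0, 780.0, 640.0, 520.0, 450.0]
curve = retention_curve(cohort_balances)
half_life = months_until_below(curve, 0.5)
print(curve)
print("months until less than half remains:", half_life)
```

A fuller model would separate the stable core from volatile balances and test how the curve changes across customer segments and time periods.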

Total Expected Credit Loss — TECL

IFRS 9 is a new accounting standard that requires business organizations to calculate the cost of impaired credit exposure using historical data. Build a credit default and credit loss model to estimate TECL for a given credit portfolio.
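The text above does not prescribe a formula. A common simplification, assumed here, decomposes expected credit loss for each exposure into probability of default (PD), loss given default (LGD) and exposure at default (EAD), and sums across the portfolio. The class, function names and figures below are hypothetical; discounting, staging and forward-looking scenarios are deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Exposure:
    ead: float  # exposure at default, in currency units
    pd: float   # probability of default over the chosen horizon (0 to 1)
    lgd: float  # loss given default, as a fraction of EAD (0 to 1)


def expected_credit_loss(exposure: Exposure) -> float:
    """Expected loss for one exposure under the PD x LGD x EAD simplification."""
    return exposure.pd * exposure.lgd * exposure.ead


def portfolio_ecl(exposures: Iterable[Exposure]) -> float:
    """Sum of expected losses across the portfolio."""
    return sum(expected_credit_loss(e) for e in exposures)


# Hypothetical two-loan portfolio, for illustration only.
portfolio = [
    Exposure(ead=1_000_000.0, pd=0.02, lgd=0.45),
    Exposure(ead=500_000.0, pd=0.10, lgd=0.60),
]
total_ecl = portfolio_ecl(portfolio)
print(f"portfolio expected credit loss: {total_ecl:,.2f}")
```

The modeling work in the problem set lies in estimating PD and LGD from historical default and recovery data, not in this final multiplication.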

For all three problem sets we start work with a master dataset that includes the anonymized information needed to answer the proposed question in a fashion acceptable to both our board members and regulators.

Our proposed answers also look at ancillary questions and challenges to our results likely to be raised by our audience and stakeholders.

Who is this course designed for?

This course has a heavy reading and assignment load, requires mandatory classroom attendance, independent thought, and real individual creative contribution (read: work). While there is room for collaborative learning and cooperative teamwork, final assessment is at an individual level.

This is not an easy course to pass or to score well in. There is no minimum grade but there are minimum expected performance and commitment benchmarks.

Project Design

In lieu of a final exam, the course is structured around an individual project and its multiple submissions. The submissions are focused on specific dimensions of a problem using one of the three datasets. One condition is that your work must be original, visible and viewable in the public domain without a paywall.

All projects are individual projects and are a requirement to pass the course.

Assessment and grading

The grading criteria of the course specify how student output is assessed. Almost all the work done in the course leads up to a final project submission that is created by an individual student.

Prior to that submission the work done using assignments and class projects goes into making their submission more relevant and effective. Assignments help students get to a better final product.

In the beginning, as students get comfortable with their idea and the tools introduced by the course, the quality of submissions is low. As they spend more time with their concept and validate it, their understanding and their ability to rise to the instructor’s expectations improve. Assignments tend to receive low grades at the start and better grades towards the end of the term. The final project is graded for effort, delivery, relevance, contribution and effectiveness.

a) The quality of student submission. Here quality is determined by your ability to apply the tools and frameworks presented in the course to your idea. Submissions that do not clearly show such applications will be graded accordingly. Submissions are judged on relevance, appeal, and real-world impact on the target audience.

b) Class participation — equated to the quality and depth of your contribution to the learning process and not just quantity of meaningless comments. Quality is determined by a comment, question or feedback that improves the learning environment of the class, allows your peers to better understand a point or encourages the instructor to explain the concept in more relevant and applicable terms.

c) A presentation and a white paper for a remote audience of judges and peers. The white paper documents your work, provides an overview of your contribution, insights and recommendations, and lists action items for future work.

d) The final grade is determined by the ability of students to bring all this together in one document that is easy to understand and work with.

Submission deadlines and grade contribution

Your course grade will be determined by the following components and associated weights.

a) 10%. Class Participation

b) 20%. In class submissions

c) 20%. Assignments

d) 50%. 3 x forecasting application submissions

Please note that while your final project represents 50% of your grade, your grade allocation and distribution for the final project is linked to multiple submissions distributed through the duration of the course. If you miss a submission deadline you will lose the points associated with that deadline.

Prescribed Readings

  1. Information Dashboard Design: The Effective Visual Communication of Data, Stephen Few.
  2. Basic Ratemaking, 5th Edition. Werner, Modlin, Willis Towers Watson.
  3. https://www.casact.org/sites/default/files/old/studynotes_werner_modlin_ratemaking.pdf
  4. Something Old, Something New in Classification Ratemaking. https://www.casact.org/sites/default/files/2021-02/pubs_forum_99wforum_wf99031.pdf
  5. Credibility Theory, GLM and Auto Ratemaking. https://www.institutdesactuaires.com/docs/mem/090520123b7c578732a63f686535dcaa.pdf
  6. Better Excel Charts, Jawwad Farid. To be distributed electronically.

Lecture One. The 5 lenses. The BTC Price forecast case

We introduce the 5 different lenses of analysis that we can use, cover the don’ts of presenting charts, plots and data, and review the Bitcoin price forecast case.

The core theme of this lecture is that model parameters slip, and while R-squared and p-values have their uses, the true test of model fit is to actually predict values and see how good or bad the fit is.
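The point about predicting values can be sketched with a small holdout test: fit a line on the early part of a series, then compare its predictions against later points it never saw. The numbers below are made up to show a trend that flattens; they are not the Bitcoin data from the case, and the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, preds):
    """Share of variance in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def mean_abs_error(ys, preds):
    return sum(abs(y - p) for y, p in zip(ys, preds)) / len(ys)


# Made-up series: a steady trend for 8 periods, then it flattens out.
xs = list(range(12))
ys = [1.0, 3.2, 4.9, 7.1, 9.0, 10.8, 13.1, 15.0, 15.5, 15.8, 16.1, 16.2]

train_x, train_y = xs[:8], ys[:8]   # data the model is fitted on
test_x, test_y = xs[8:], ys[8:]     # later data held back for prediction

slope, intercept = fit_line(train_x, train_y)
train_pred = [slope * x + intercept for x in train_x]
test_pred = [slope * x + intercept for x in test_x]

r2_in = r_squared(train_y, train_pred)
mae_in = mean_abs_error(train_y, train_pred)
mae_out = mean_abs_error(test_y, test_pred)

print(f"in-sample R-squared: {r2_in:.4f}")
print(f"in-sample mean absolute error: {mae_in:.3f}")
print(f"out-of-sample mean absolute error: {mae_out:.3f}")
```

On this series the in-sample fit looks excellent while the held-back predictions miss by a wide margin, which is the kind of parameter slippage a summary statistic alone would not reveal.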

Episode One. 5 lenses. 37 minutes

Episode Two. Calibrating regression results for better fit using Excel Solver. 10 minutes

Episode Three. Wrap up and Q&A. 5 minutes

Additional readings and references


Written by Jawwad Ahmed Farid

Serial has been. 5 books. 6 startups. 1 exit. Professor of Practice, IBA, Karachi. Fellow Society of Actuaries. https://financetrainingcourse.com/education/
