Photo by Sven Read on Unsplash

As a follow-up to my previous post, I will be applying transfer learning to the RAVDESS Audio Dataset in hopes of improving the model’s accuracy. To review, transfer learning is a deep learning approach in which a model that has been trained on one task is used as the starting point for training a model on a similar task. This post by DJ Sarkar provides a great guide to understanding transfer learning, with examples.

We will first use the pretrained VGG-16 model as a feature extractor on our dataset: we freeze the convolution blocks of the pretrained model and modify the dense layers. …

Image by Tengyart on Unsplash

Through all the available senses, humans can sense the emotional state of their communication partner. This emotional detection is natural for humans, but it is a very difficult task for computers: although they can easily understand content-based information, accessing the depth behind the content is difficult, and that’s what speech emotion recognition (SER) sets out to do. An SER system classifies audio speech files into emotions such as happy, sad, angry, and neutral. Speech emotion recognition can be used in areas such as the medical field or customer call centers. …

Classification plus ensemble methods to achieve an overall accuracy score of ~92%

Photo by Markus Spiske on Unsplash

As a follow-up to my previous article (found here), I will demonstrate the steps I took to build a classification model using UCI’s Heart Disease Dataset, as well as how I used ensemble methods to achieve a better accuracy score.

Creating a suitable machine learning algorithm that can classify heart disease more accurately would be highly beneficial to health organizations as well as to patients.

Let’s get started!

First I imported the necessary libraries and read in the cleaned .csv file:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from collections import Counter
from sklearn.preprocessing import StandardScaler
# data splitting
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
# data modeling
from sklearn.metrics import confusion_matrix,accuracy_score,roc_curve,roc_auc_score,classification_report,f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from xgboost import XGBClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from mlxtend.classifier import StackingCVClassifier
import xgboost as xgb
import itertools
from sklearn.dummy import DummyClassifier
from sklearn import…

Photo by Christina @ on Unsplash

With the rapid growth of data, the demand for data scientists grows as well. According to Smith Hanley Associates, data scientists are being sought for positions in a variety of fields, such as healthcare, pharmaceuticals, and retail. This is great news for those interested in becoming data scientists, especially those whose jobs were affected by COVID-19 and for whom having a secure job is important.

Exploratory data analysis on UCI’s Heart Disease Dataset

Photo by Bill Oxford on Unsplash

Cardiovascular disease, or heart disease, is the leading cause of death among women and men and among most racial/ethnic groups in the United States. Heart disease describes a range of conditions that affect the heart, including blood vessel diseases such as coronary artery disease. According to the CDC, roughly 1 in 4 deaths each year is due to heart disease. The WHO states that lifestyle is the main reason behind this heart problem. …

Demonstrating the efficiency of pmdarima’s auto_arima() function compared to implementing a traditional ARIMA model.

Photo by Aleksei Zaitcev on Unsplash

What is Time-Series Analysis?

One of the key concepts in data science is time-series analysis, which involves using a statistical model to predict future values of a time series (e.g., financial prices, weather, COVID-19 positive cases/deaths) based on past results. Some components that might be seen in a time-series analysis are:

  1. Trend: Shows the general direction of the time series data over a period of time. Trends can be increasing (upward), decreasing (downward), or horizontal (stationary).
  2. Seasonality: Exhibits a pattern that repeats with respect to timing, magnitude, and direction, such as the increase in ice cream sales during the summer months or the increase in subway riders during colder months. …
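
To make the two components above concrete, here is a minimal sketch using only the standard library. The data is made up for illustration (a straight-line trend plus a repeating 4-step seasonal pattern), and the moving average is just one simple way to separate the two; it is not the method used in the article.

```python
# Hypothetical toy data: a linear trend plus a repeating 4-step seasonal pattern.
PERIOD = 4
seasonal_pattern = [2.0, -1.0, -3.0, 2.0]  # sums to zero over one full period
series = [0.5 * t + seasonal_pattern[t % PERIOD] for t in range(12)]


def moving_average(values, window):
    """Return the mean of each consecutive run of `window` values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Averaging over one full period cancels the seasonal pattern,
# leaving a smoothed estimate of the trend.
trend_estimate = moving_average(series, PERIOD)

# Step between consecutive trend estimates, i.e. the slope of the trend.
steps = [b - a for a, b in zip(trend_estimate, trend_estimate[1:])]
print(steps)
```

Because the seasonal pattern sums to zero over one period, every step in the smoothed series reflects only the underlying upward trend.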

Differences between the findall(), match(), and search() functions in Python’s built-in regular expression module.

Photo by Abigail Lynn on Unsplash

Regular expressions, also known as regex, come in handy in a multitude of text processing scenarios. You can search for patterns of numbers, letters, punctuation, and even whitespace. Regex is fast and helps avoid unnecessary loops in your program to match and extract desired information. Until recently, I felt that regex was very complicated: the syntax looked frustrating, and I thought I would not be able to learn it. Many others share this same feeling.
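
As a quick illustration of how the three functions differ, here is a minimal sketch using Python's built-in re module. The sample string and pattern are made up for this example.

```python
import re

text = "Order 66 shipped on 2020-12-01"  # hypothetical sample string
digits = r"\d+"                          # one or more digits

# findall() returns every non-overlapping match as a list of strings.
all_numbers = re.findall(digits, text)
print(all_numbers)  # ['66', '2020', '12', '01']

# match() only looks at the start of the string, so it finds nothing here
# because the text begins with "Order", not with a digit.
at_start = re.match(digits, text)
print(at_start)  # None

# search() scans the whole string and returns the first match it finds.
first_number = re.search(digits, text)
print(first_number.group())  # 66

# match() does succeed when the pattern fits the beginning of the string.
starts_with_order = re.match(r"Order", text)
print(starts_with_order.group())  # Order
```

In short, findall() collects every match, search() stops at the first match anywhere in the string, and match() succeeds only if the pattern fits at the very beginning.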

Explaining the basics of Python objects and classes using examples

Photo by Kevin Canlas on Unsplash

Python is an object-oriented programming language. Object-oriented programming focuses on dividing a program into objects, whereas procedure-oriented programming focuses on dividing a program into functions. An object is simply a collection of attributes (variables) and methods (functions) that act on that data, and a class is a blueprint for that object. This article by Vipul J does a great job explaining how Python classes can be thought of as blueprints of a house, and objects can be thought of as a particular instance of that house (there can be multiple objects for one class, while they all may differ in number of bedrooms/bathrooms/etc., …
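
The house analogy can be sketched in a few lines. The class name, attributes, and method below are hypothetical and chosen only to mirror the analogy; they are not taken from the referenced article.

```python
class House:
    """Blueprint: every house built from it has bedrooms and bathrooms."""

    def __init__(self, bedrooms, bathrooms):
        # Attributes: data stored on each individual object.
        self.bedrooms = bedrooms
        self.bathrooms = bathrooms

    def total_rooms(self):
        # Method: a function that acts on the object's own data.
        return self.bedrooms + self.bathrooms


# Two objects (instances) built from the same blueprint,
# each with its own attribute values.
cottage = House(bedrooms=2, bathrooms=1)
villa = House(bedrooms=5, bathrooms=4)

print(cottage.total_rooms())  # 3
print(villa.total_rooms())    # 9
```

Both objects share the same class, yet each keeps its own number of bedrooms and bathrooms.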

Follow up to “Topic Modeling and Sentiment Analysis on Amazon Alexa Reviews” analyzing and comparing different Echo models.

Photo by Hello I’m Nik 🎞 on Unsplash

In my previous article, found here, I provided a step-by-step guide on how to perform topic modeling and sentiment analysis using VADER on Amazon Alexa reviews. From my analysis, I realized that there were multiple Alexa devices, which I should have analyzed from the beginning to compare devices and see how the negative and positive feedback differ among models. That insight is more specific and would be more beneficial to Amazon (*insert embarrassed face here*). …

Attempting to break down hypothesis testing using examples and Python’s SciPy library.

Photo by Scott Graham on Unsplash

In statistics and data analysis, hypothesis testing is very important because when we perform experiments, we typically do not have access to all members of a population, so we take samples of measurements to make inferences about the population. These inferences are hypotheses. In essence, a statistical hypothesis test is a method for testing a hypothesis about a parameter in a population using data measured in a sample.

In this article, I will review the steps in hypothesis testing, define key terminology, and use examples to show the different types of hypothesis tests.

Regardless of the type of statistical hypothesis test you are performing, there are five main steps to executing…
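
As a rough illustration of testing a hypothesis about a population parameter from a sample, here is a minimal sketch of a one-sample t-test computed by hand with the standard library. The measurements, the hypothesized mean, and the significance level are all made up for this example. In practice, SciPy's scipy.stats.ttest_1samp should give the same statistic along with a p-value.

```python
import math
import statistics

# Hypothetical sample of measurements and hypothesized population mean.
sample = [5.1, 4.9, 5.3, 5.2, 5.0]
mu0 = 5.0  # null hypothesis: the population mean equals 5.0

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1)

# Test statistic: how many standard errors the sample mean is from mu0.
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu0) / standard_error

# Critical value for a two-sided test at alpha = 0.05 with n - 1 = 4
# degrees of freedom, taken from a standard t table.
T_CRITICAL = 2.776

# Reject the null hypothesis only if the statistic is more extreme
# than the critical value.
reject_null = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), reject_null)
```

Here the test statistic stays inside the critical region's bounds, so this made-up sample gives no reason to reject the null hypothesis.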


Muriel Kosaka

Data Scientist | ML Enthusiast | MA Psychology Student.
