
Music Genre Prediction

“Music is a moral law. It gives soul to the universe, wings to the mind, flight to the imagination, and charm and gaiety to life and to everything.” - Plato

The words of Plato rightly describe the importance of music in the world. As the field of music evolves, a huge number of songs is published every day. Cataloguing such a huge amount of data requires intense manual effort. The aim of this project is to automate the classification of this data so as to ease its organization.

This study explores the application of machine learning algorithms to identify and classify the genre of a given audio file.

Each song is divided into chunks of 5 seconds, 10 seconds and 20 seconds. Twenty-nine features are extracted from each chunk separately. These features are then fed to machine learning models, namely a logistic regression model, a support-vector machine, a k-nearest neighbor (k-NN) model, a decision tree, an ensemble of the above classifiers and a neural network, which are trained to classify the genres Rock, Pop, RnB, Blues and Hip-Hop.

The performance is evaluated using 5-fold cross-validation. We compare the proposed models on data samples of different lengths, and also study the relative importance of the different features and the relationship between the length of a sample and the relevant information it carries, which may help improve performance.


Dataset

The choice of dataset plays an important role in any machine learning project. For this project, we manually downloaded 250 songs from five genres: Pop, Hip-Hop, RnB, Blues and Rock, with fifty songs per genre. Each song is at least one minute long. We use these to form the training dataset. We also downloaded twenty-five songs to form the test dataset. We then split each song into 5-, 10- and 20-second chunks using the Python library pydub. The motivation behind this is to gather as much information as possible from each song. We want to analyze the effect of segmenting the song into smaller chunks, thereby emphasizing local regions of the song, on classification performance. The use of smaller song excerpts also makes the dataset much easier to acquire than if a larger number of full tracks had to be downloaded.
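To make the preprocessing concrete, here is a minimal sketch of the chunking step with pydub; the folder layout and file-naming scheme are illustrative assumptions, not the project's actual code.

```python
import os
from pydub import AudioSegment  # pydub handles loading and slicing audio

CHUNK_LENGTHS_MS = [5_000, 10_000, 20_000]  # 5, 10 and 20 second chunks

def split_song(path, out_dir="chunks"):
    """Split one song into fixed-length chunks and export them as WAV files."""
    song = AudioSegment.from_file(path)
    name = os.path.splitext(os.path.basename(path))[0]
    os.makedirs(out_dir, exist_ok=True)
    for length in CHUNK_LENGTHS_MS:
        # step through the song in non-overlapping windows of `length` milliseconds
        for i, start in enumerate(range(0, len(song) - length + 1, length)):
            chunk = song[start:start + length]
            chunk.export(os.path.join(out_dir, f"{name}_{length // 1000}s_{i}.wav"),
                         format="wav")

# e.g. split_song("songs/track01.mp3")  # hypothetical input path
```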


Feature Extraction

In order to represent the tracks numerically, twenty-nine audio features were extracted from each track after careful analysis of the importance of individual audio features and selection of the best ones. Features can be broadly classified into time-domain and frequency-domain features. The feature extraction was done using the Python library libROSA. The following features were extracted (a minimal extraction sketch follows the list):

  • Zero Crossing Rate

  • Root Mean Square Energy

  • Tempo

  • Mel-Frequency Cepstral Coefficients

  • Chroma Features

  • Spectral Centroid

  • Spectral Roll-off
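
One plausible breakdown of the twenty-nine features is 5 scalar features plus 12 MFCCs plus 12 chroma bins; the sketch below uses these counts and a mean-over-frames aggregation, both of which are assumptions for illustration rather than the project's exact choices.

```python
import numpy as np
import librosa  # libROSA provides both time-domain and frequency-domain features

def extract_features(path):
    """Return a 29-dimensional feature dictionary for one audio chunk."""
    y, sr = librosa.load(path)
    features = {
        "zero_crossing_rate": float(np.mean(librosa.feature.zero_crossing_rate(y))),
        "rms_energy": float(np.mean(librosa.feature.rms(y=y))),
        "tempo": float(librosa.beat.tempo(y=y, sr=sr)[0]),
        "spectral_centroid": float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))),
        "spectral_rolloff": float(np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr))),
    }
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12)    # 12 cepstral coefficients
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)      # 12 pitch-class energies
    features.update({f"mfcc_{i}": float(v) for i, v in enumerate(mfcc.mean(axis=1))})
    features.update({f"chroma_{i}": float(v) for i, v in enumerate(chroma.mean(axis=1))})
    return features
```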

Following are the t-SNE plots of the datasets of samples of length 5 seconds, 10 seconds and 20 seconds respectively.


As seen in the plots, the five classes (genres) are separable to a good extent. There is some overlap between classes, which is due to the fact that a song may contain elements of several genres; for example, a song may belong to both pop and blues, or to both pop and hip-hop.
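
Plots of this kind can be produced with scikit-learn's TSNE; the sketch below assumes X is the (n_samples, 29) feature matrix and y is the list of genre labels (both names are placeholders).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

def plot_tsne(X, y, title="t-SNE of audio features"):
    """Project the standardized feature matrix to 2-D and colour points by genre."""
    X_2d = TSNE(n_components=2, random_state=0).fit_transform(
        StandardScaler().fit_transform(X))
    y = np.asarray(y)
    for genre in np.unique(y):
        pts = X_2d[y == genre]
        plt.scatter(pts[:, 0], pts[:, 1], s=8, label=genre)
    plt.legend()
    plt.title(title)
    plt.show()
```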


Models Applied

Six classifiers are created - a logistic regression classifier, a support-vector machine, a k-nearest neighbor (k-NN) classifier, a decision tree, an ensemble of the above classifiers and a neural network composed of several dense layers. Training and testing performance is evaluated using 5-fold cross-validation. We applied the following neural network architecture, with ReLU as the activation function:
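
As a rough sketch of this setup with scikit-learn (hyperparameters are illustrative defaults, not the project's tuned values; X and y again denote the feature matrix and genre labels):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

base = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("svm", SVC()),
    ("knn", KNeighborsClassifier()),
    ("tree", DecisionTreeClassifier()),
]
classifiers = dict(base, ensemble=VotingClassifier(estimators=base))

for name, model in classifiers.items():
    clf = make_pipeline(StandardScaler(), model)   # scale features, then classify
    scores = cross_val_score(clf, X, y, cv=5)      # 5-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Since the architecture figure is not reproduced here, the dense network below is only a hedged Keras sketch with ReLU activations; the layer sizes are assumptions.

```python
from tensorflow.keras import layers, models

def build_network(n_features=29, n_classes=5):
    """Small fully connected network with ReLU hidden layers and a softmax output."""
    return models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])

net = build_network()
net.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
```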




Results

The table below lists the accuracy of the different models using 5-fold cross-validation. For the neural network, we use the separately formed test dataset to measure its performance.




Conclusion

According to the results given above, the SVM and the ensemble classifier have the best performances, which are comparable to each other, and both perform far better than the other classifiers and the neural network architecture we employed.


Future Work

  • Collecting more data, which may improve the performance of the models used.

  • Expanding the dataset to include more genres and sub-genres, which would increase its capacity to classify a greater variety of songs.

  • Analyzing the performance of different neural network architectures on the data.




Project Authors



Shivani and I spent hours on Google Meet, bouncing ideas off each other and working out the nitty-gritty of the project.


We want to thank our respected professor Dr. Tanmoy Chakraborty and our mentor Shiv Kumar Gehlot Sir for their support.


 
 
 
