Yaman Music Classification

Deep learning audio classification.

Hibatillah's PFP
Yemeni Music Classification

AI for Cultural Preservation

Traditional music is a vital part of Yemen's cultural identity, but much of it remains undocumented and unorganized. Manual classification is slow and requires expert knowledge.

This project was built to solve this. It is an automated tool that uses Deep Learning to listen to audio files and categorize them into three distinct regional styles: Adeni, Hadrami, and Lahji.

Architecture

This project required a seamless handshake between a modern JavaScript frontend and a powerful Python backend. I architected a Monorepo setup to handle both concurrently.

The Engineering Challenge

1. From Audio to Image (Preprocessing)

Machines cannot hear music, but they can see it. The core of our system relies on converting raw audio waves into Mel-Spectrograms, visual representations of sound frequencies over time.

We built the Python pipeline that:

  1. Accepts the user's uploaded file.

  2. Normalizes the audio duration.

  3. Generates the Spectrogram image.

  4. Feeds it into the CNN model for prediction.

2. Improving Accuracy

Initially, our model struggled with a 60% accuracy rate due to a small dataset. Working closely with my teammate (who focused on the model architecture), we implemented Data Augmentation techniques. By adding noise, changing pitch, and stretching time in our training data, we made the model robust against poor recording quality.

This iterative tuning pushed our final accuracy to 93.84%.

User Experience

While the backend was complex, I ensured the frontend remained simple. Users don't need to know about Spectrograms or CNNs. They simply upload a song and get an instant result.