Chapter 5 Results

5.1 Trend of average danceability over the years

The Spotify dataset covers songs released between 1955 and 2020. As time passes, the musical taste of each generation is expected to change too. To determine how music has evolved over the years, and whether that evolution has led to more or less danceable songs, we plotted a line graph of average danceability per year. The graph shows the irregular rise and fall in the danceability of the songs produced each year. Generally, the trend has been increasing, and it is reasonable to conclude that, with each passing year, the songs produced have had a higher danceability factor.

Inference -

Comparing the danceability of the two eras, the 20th and 21st centuries, the 21st century has consistently produced songs with an average danceability between 0.5 and 0.6, whereas in the 20th century the distribution was relatively uneven: the maximum rises beyond 0.8 and the minimum falls below 0.4. It may be that music producers have learned from this trend and now produce music within a sweet-spot range of 0.5 to 0.6.

Highest average danceability achieved - 0.816 in the year 1962

Lowest average danceability achieved - 0.317 in the year 1960
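
The yearly averages behind a line graph like this can be sketched as a simple group-and-average step. The snippet below is only an illustration under stated assumptions: the function name `average_danceability_by_year` and the toy rows are hypothetical and are not taken from the dataset or from the original analysis.

```python
from collections import defaultdict
from statistics import mean


def average_danceability_by_year(tracks):
    """Return {year: mean danceability} for an iterable of (year, danceability) pairs."""
    by_year = defaultdict(list)
    for year, danceability in tracks:
        by_year[year].append(danceability)
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Toy rows invented for illustration only (not real dataset values).
toy_tracks = [(1960, 0.30), (1960, 0.34), (1962, 0.80), (1962, 0.84), (2019, 0.55)]

yearly = average_danceability_by_year(toy_tracks)
highest_year = max(yearly, key=yearly.get)  # year with the largest average
lowest_year = min(yearly, key=yearly.get)   # year with the smallest average
print(highest_year, lowest_year)
```

The same dictionary can be passed to any plotting library to draw the trend line; the highest and lowest years fall out of a `max`/`min` over the averages.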

5.2 Distribution of danceability for different genres

The plot shows the distribution of danceability for different genres. The average danceability is highest for rap, as seen by the highest point on the line. Overall, there are several outliers in the lower range of danceability. Apart from the rock genre, all genres have relatively high danceability values.

5.3 Distribution of various genres produced by Top 6 artists

To analyse the genres produced by the top 6 artists, we plotted a stacked bar chart. The plot can help us identify key factors such as the genre preferences of an artist and the type of songs they have been producing.

Important information to artists/producers -

It can also benefit artists by helping them understand the competition in the music industry. They might want to study the genres produced by various artists and produce one that is less common.

Inference -

From the plot, we can see that the biggest producers of the edm genre are David Guetta, Martin Garrix, and The Chainsmokers. Don Omar produces more songs in the Latin genre than in any other genre, and he is also the leading producer of this genre among the top artists. Similarly, Queen produces rock more than any other genre; they specialise in rock. Drake, on the other hand, has produced music of all genres in somewhat similar proportions (with a higher preference for rap and r&b). There seems to be direct competition between David Guetta and The Chainsmokers, as they produce a comparable number of songs in the edm and pop genres.

Leaders in each genre -

Edm - Martin Garrix

Latin - Don Omar

Pop - The Chainsmokers/David Guetta

r&b - Drake

Rap - Drake

Rock - Queen
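
The counts behind a stacked bar chart of this kind, and the genre leaders read from it, can be sketched as a tally of (artist, genre) pairs. This is a hedged illustration, not the original analysis code: the helper names and the placeholder artists "Artist A" and "Artist B" are hypothetical, and ties are resolved in favour of whichever artist is encountered first.

```python
from collections import Counter


def count_tracks_by_artist_and_genre(tracks):
    """Count tracks for every (artist, genre) pair in an iterable of such pairs."""
    return Counter(tracks)


def leader_per_genre(counts):
    """Return {genre: artist with the most tracks in that genre}."""
    best = {}
    for (artist, genre), n in counts.items():
        # Keep the first artist seen unless a later one has strictly more tracks.
        if genre not in best or n > best[genre][1]:
            best[genre] = (artist, n)
    return {genre: artist for genre, (artist, _) in best.items()}


# Toy rows invented for illustration only.
toy_tracks = [
    ("Artist A", "edm"), ("Artist A", "edm"), ("Artist A", "pop"),
    ("Artist B", "edm"), ("Artist B", "rock"), ("Artist B", "rock"),
]

counts = count_tracks_by_artist_and_genre(toy_tracks)
leaders = leader_per_genre(counts)
print(leaders)
```

Each bar in the chart corresponds to one artist, and each stacked segment to one entry of `counts` for that artist.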

5.4 Trend of average duration of various genres over the years

To understand the trend of various genres over time, it is imperative that we capture the average duration for which they are played in a given year.

Important information to artists/producers -

The trend can convey important information to the artists and producers of a song. It can tell an artist whether it is worth working hard to produce a song of a particular genre. For example, as seen in the trend of the edm genre, its average listening duration is tanking. This can be an indication that the audience prefers other genres over it. An artist might therefore think twice about the choice of a genre before finalizing one.

Inference -

Several inferences can be drawn from the plot above -

Latin - The trend for Latin is hill-shaped, reaching an all-time high average duration of over 350000 ms in the 1980s. The following years have been similar to those of the pop genre (with more drops). On a broader view, the trend has been falling, with a few sharp ups and downs, while staying in the range between 200000 ms and 250000 ms. It is one of the safe genres to produce: despite a falling trend in recent years, it has consistently stayed above the 200000 ms mark.

r&b - The trend for r&b was on the rise until 2010. After 2010 there is a sudden drop in average duration, which continues to fall as the years progress. However, between 1970 and 2010 the trend was consistent and impressive, with an average duration as high as 270000 ms. This consistency makes it a go-to genre for artists and producers.

Rock - The average duration for rock has been consistent throughout the years (after 1970). The genre has maintained an average duration in the range of 220000 ms to 270000 ms. The highest value, over 300000 ms, was reached in 1970. Rock is a great genre choice for artists, as it generates considerable interest among listeners, as reflected in the plot.

Rap - Similar to the trend of pop, the rap genre was quite popular until 2010, and its average duration has been declining ever since. Rap managed to exceed an average duration of 400000 ms, second only to the edm genre. The trend, however, does not bode well for rap, as it continues to sink towards its all-time low of under 200000 ms. To conclude, a versatile artist might be a little sceptical about producing rap and might prefer other genres in their repertoire.

edm - The trend of average duration played per year is not consistent, with irregular rises and drops at different intervals. It can be interpreted that the demand for edm was highest between 1995 and 2010. This can be deemed a successful era for edm artists, as the average duration crossed 300000 ms multiple times and reached a high of over 450000 ms. Although the graph indicates a dip in the latter half of the 2010s, the duration of edm still surpasses that of other genres. To conclude, the trend suggests that the edm genre is the safest to produce and that artists can benefit from it.

Pop - The trend of average duration played per year can be split into two periods - the 20th century and the 21st century. The overall demand for pop was greater in the 20th century, with the maximum average duration crossing 300000 ms and the minimum falling to 150000 ms. The demand in the 21st century has been fairly consistent, with the curve dipping after 2014, indicating lower demand. The maximum average duration during this century touches 250000 ms and the minimum is 200000 ms. To conclude, artists can safely produce more songs of the pop genre, considering that its trend after the year 2000 is fairly consistent.
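
The durations quoted above are in milliseconds, which are hard to read at a glance. As a small worked conversion, the sketch below turns them into minutes and seconds; `ms_to_min_sec` is a hypothetical helper introduced here for illustration only.

```python
def ms_to_min_sec(duration_ms):
    """Convert a duration in milliseconds to a 'minutes:seconds' string."""
    total_seconds = duration_ms // 1000           # drop the fractional second
    minutes, seconds = divmod(total_seconds, 60)  # split into minutes and leftover seconds
    return f"{minutes}:{seconds:02d}"


# Thresholds that appear in the genre descriptions above.
for duration_ms in (200000, 250000, 350000, 450000):
    print(duration_ms, "ms =", ms_to_min_sec(duration_ms))
```

So the 200000 ms mark corresponds to 3:20, 250000 ms to 4:10, 350000 ms to 5:50, and 450000 ms to 7:30.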

5.5 Checking correlation of speechiness and instrumentalness for different genres

Before interpreting the above plot, it is important to understand the meaning of the two words - speechiness and instrumentalness.

As described by tidytuesday -

Instrumentalness conveys whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Spoken words or Rap are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.

Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.

A common assumption about music is that if a song contains a high number of words, its instrumentalness is bound to be low. However, for several genres the relation between speechiness and instrumentalness is a line with a slope of nearly 0. Rap shows a different trend: its fitted line has a negative slope, indicating a negative correlation, i.e., as speechiness increases, instrumentalness decreases.
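
The slope being discussed is that of a straight line fitted to the (speechiness, instrumentalness) points of one genre. A minimal sketch of an ordinary least-squares slope follows, assuming a simple linear fit; the function name and the toy values are hypothetical and do not come from the dataset.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy values invented for illustration only.
flat_slope = least_squares_slope([0.05, 0.10, 0.15, 0.20], [0.1, 0.1, 0.1, 0.1])
falling_slope = least_squares_slope([0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1])
print(flat_slope, falling_slope)
```

A slope near 0 corresponds to the flat lines described for several genres, while a negative slope corresponds to the pattern described for rap.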

Distribution of energy and liveness for different genres

To see the distribution of energy and liveness for various genres, we plotted a scatter plot. The plot can help us understand where the data is clustered the most and how widely it is spread.

These plots can be invaluable when we want to understand which genre yields maximum liveness or energy. This is especially important for music festivals and clubs, where the songs played need to be of high energy and liveness.

The following inferences can be drawn out of this plot -

For edm, the range of liveness is between 0 and 0.3 and the range of energy is between 0.6 and 1

For latin, the range of liveness is between 0.05 and 0.2 and the range of energy is between 0.47 and 0.8

For pop, the range of liveness is between 0.04 and 0.2 and the range of energy is between 0.6 and 0.9

For r&b, the range of liveness is between 0.03 and 0.17 and the range of energy is between 0.2 and 0.86

For rap, the range of liveness is between 0.04 and 0.2 and the range of energy is between 0.3 and 0.9

For rock, the range of liveness is between 0.02 and 0.25 and the range of energy is between 0.4 and 0.95

5.6 Distribution of Top 5 songs based on various variables

It is always exciting to listen to a trending song. But how many of us have identified the qualities that these songs share? More often than not, we fail to do so. It was therefore an exciting task to visualise the top 5 songs and compare them on various grounds. We plotted a grouped bar chart of the top 5 songs played against four features - acousticness, danceability, energy, and speechiness. A general trend can be seen: all of the top 5 songs have low values of acousticness and speechiness and high values of energy and danceability. This says a lot about the kind of songs that are heard most: they are energetic and fun to dance to, which is something the current generation prefers. Artists who come across these visualisations will see which qualities the songs that make it to the top have in common.

Inference -

In this plot, Alive and Forever have the highest energy value, equal to 0.78. While more energy might hint at higher danceability, the visualisation shows otherwise: Breathe and Forever have the highest danceability value, equal to 0.62. Another interesting point is that all of the top 5 songs have a low speechiness value. These songs have a low ratio of words to music - something enjoyed by the current generation of listeners.

Songs with the highest value of song variables -

acousticness - Stay

danceability - Stay and Breathe

energy - Forever and Alive

speechiness - Forever

Songs with the lowest value of song variables -

acousticness - Alive

danceability - Poison

energy - Stay

speechiness - Stay and Alive

5.7 Distribution of Top 3 artists, genres and duration of songs played

To understand the distribution of the music produced by the top 3 artists and its duration in ms, we plotted a mosaic plot. The mosaic plot is split on three variables - the duration of the songs produced by the top 3 artists, the genres of the songs, and the top 3 artists themselves. The information depicted by the mosaic plot can be broken down as follows -

Inference -

Firstly, we split the data based on duration. For Martin Garrix, there are more songs with a duration above 220k ms than songs with a duration below 220k ms.

For Queen and The Chainsmokers, the number of songs with a duration below 220k ms is greater than the number with a duration above 220k ms.

Queen has been a lead producer of songs in the rock genre, The Chainsmokers have been a lead producer in the pop genre, and Martin Garrix has been a lead producer in the edm genre.

This representation can guide the artists about their overall performance and help them gauge their success based on the duration of the songs heard.

5.8 Density distribution of various genres with different audio features

In the graph below we used a ridgeline plot, since each Spotify playlist genre has many values for each of the music characteristics - danceability, energy, loudness and speechiness.

Inference -

For danceability, it can be inferred that the graphs for all the genres are nearly uniformly distributed. Listeners of rap and latin music usually enjoy dancing to the beats, whereas edm and rock music, despite generating the most energy amongst listeners, is not preferred for dancing.

For energy, the distributions are less uniform and more skewed towards the left. The rock and edm genres usually produce high-intensity music, and hence listeners experience higher levels of energy when listening to these genres. r&b has the most uniform graph, showing that its music is a mix of low- and high-intensity songs, i.e., both slow, soft songs and loud, fast ones. Hence we see such a wide distribution in the case of r&b.

For loudness, it can be seen that all the genres produce songs of a similar amplitude. Loudness can usually be related to energy. Genres such as edm and rock are mostly composed of loud music, and loudness is therefore an important characteristic in producing high energy amongst their listeners.

From speechiness it can be inferred that most listeners in this Spotify dataset enjoy music that is more instrumental and has fewer spoken words. Since rap is one of the most popular types of vocal music, its distribution is more evenly spread out in comparison to other genres.

Using this visualisation, artists can adopt the trusted speechiness and loudness values that will keep fans hooked on their songs. Understanding the right blend of different song parameters beyond this graph will ensure higher danceability and energy amongst listeners and eventually make the artists and their songs popular.

5.9 Clustering of various artists on the basis of genres

Here, to depict the top 15 artists in each of the 6 playlist genres, we used a treemap. The different colors demarcate the different genres, whereas the size of each box in the treemap corresponds to the number of an artist's tracks played by listeners. Listeners of edm music clearly show more affection towards the songs produced by Martin Garrix, Dimitri Vegas & Like Mike and Hardwell. Among the other edm artists there is no dominance.

A similar preference can be seen for Queen and Guns N Roses in rock, and for Don Omar, Daddy Yankee and Wisin and Yandel in Latin. For songs that fall in the pop and r&b genres, the listeners have diverse tastes and no band or group of musicians dominates. It can also be observed that many artists/groups that are popular in edm have shown remarkable popularity in pop. For example, David Guetta and Calvin Harris are amongst the few artists to feature in the top 15 artists of more than 1 genre.

Lastly, based on the overall size, it can be seen that edm and rap cover a larger area of the treemap, indicating that these genres are more popular than the other 4 genres, while pop and r&b show lower popularity amongst the listeners in the Spotify dataset.

This visualisation can help artists keep track of other popular artists in different genres. It can also help them understand which artists/bands to look up to, for example for their advertisement and marketing strategies, if they plan on releasing songs outside of their genre.

5.10 Correlation plot of various audio features

The correlation heat map between the unique attributes of the audio features provides comprehensive information on how these features are related to one another. For instance, energy and loudness seem to be positively related. This means that if an artist is producing a song intended to excite or energize listeners, they can keep the loudness on the higher side. Usually, analyzing a combination of unique audio features works out well for the artist and the audience. For instance, a high valence and instrumentalness would work well for danceability. This will also help them assign the song to the correct genre.

Besides, such combinations will also help them to be mindful of eccentric combinations. For example, features that are negatively correlated will not go well together. For instance, a high acousticness can negatively impact the energy of the listener. Similarly, if the song is mostly instrumental, the artist should keep the loudness of the song low such that it has a soothing effect on the listener as they are negatively related.
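
Each cell of a correlation heat map is a pairwise correlation coefficient. The sketch below computes Pearson's r for every pair of features, assuming that Pearson correlation is the measure used (the text does not state this). The function name and the toy feature values are hypothetical and are not taken from the dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy feature columns invented for illustration only.
toy_features = {
    "energy": [0.2, 0.4, 0.6, 0.8],
    "loudness": [-12.0, -9.0, -6.0, -3.0],
    "acousticness": [0.9, 0.6, 0.5, 0.1],
}

# One coefficient per unordered pair of features, as in the cells of a heat map.
correlations = {
    (a, b): pearson(toy_features[a], toy_features[b])
    for a, b in combinations(toy_features, 2)
}
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

In this toy data, energy and loudness move together (r close to 1) while energy and acousticness move in opposite directions (negative r), mirroring the kind of relationships described above.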

5.11 Boxplot distribution of various genres and audio features

How do boxplots and density plots together help us understand genres and their features?

Valence - As observed in the density plot, valence can provide a good separation between EDM and Latin tracks, as there is a considerable difference between their median values and ranges, while all other genres have somewhat similar valence.

Energy - Latin and Pop tracks have similar ranges and median values, so energy might not be a good separator for them, while the remaining 4 genres have a decent separation on energy.

Danceability - As the density plots show, Rock has the lowest danceability score, while Latin tracks are more closely packed around high danceability scores. Rap tracks have a little more variability than Latin tracks but in general also have high danceability scores.

Tempo - It might do a good job of separating EDM tracks from the rest of the genres, as most of the EDM tracks are clustered around 125 BPM while other genres have a more or less similar spread and variability.
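
The median and spread that a boxplot displays can be computed directly from the quartiles. The following is a minimal sketch using the standard library; `box_summary` is a hypothetical helper, the "inclusive" quartile method is one of several conventions, and the toy tempo values are invented for illustration.

```python
from statistics import quantiles


def box_summary(values):
    """Return the quartiles and interquartile range that a boxplot would draw."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Toy tempo values (BPM) invented for illustration only.
toy_tempo = {
    "edm": [120, 124, 125, 126, 128],
    "rock": [80, 100, 120, 140, 160],
}

summaries = {genre: box_summary(values) for genre, values in toy_tempo.items()}
for genre, summary in summaries.items():
    print(genre, summary)
```

A feature separates two genres well when their medians differ and their interquartile ranges overlap little; a tightly clustered genre shows up as a small `iqr`.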

5.12 Distribution of most listened artist based on minutes listened

This graph shows us the most listened artist in the Spotify dataset. Since we had more than a thousand artists, filtering the data by duration helped us segregate the artists whose songs were heard for more than hours. The histogram is used to compare the popularity amongst artists. Martin Garrix was popular in both the pop and edm genres and hence is played for the longest duration compared with his neighbors in the plot.

Using such data, Spotify can work on its marketing strategy to find ways of improving recognition amongst the less prominent artists. This can be done by showing more advertisements or song suggestions for the less popular singers so that listeners listen to them. Such a strategy will make listeners spend more time on the Spotify platform and can potentially increase its revenue.

5.13 Comparative analysis of distribution of audio features between top 10 and last 10 tracks

To compare the audio features of the top 10 songs and the bottom 10 songs, we plotted a Cleveland dot plot. The plot can be useful for understanding the features that contribute to the success of the top 10 songs. It can also give artists an idea of the audio features to aim for in order to produce a hit song.
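
The comparison such a plot encodes can be sketched as the gap between the mean of a feature over the top tracks and over the bottom tracks. This is an illustration under assumptions: the text does not say how tracks are ranked, so ranking by a `track_popularity` field is assumed here, and the function name and toy rows are hypothetical.

```python
from statistics import mean


def top_bottom_feature_gap(tracks, feature, n):
    """Mean of `feature` over the n most popular tracks minus the mean over the n least popular."""
    ranked = sorted(tracks, key=lambda t: t["track_popularity"], reverse=True)
    top, bottom = ranked[:n], ranked[-n:]
    return mean(t[feature] for t in top) - mean(t[feature] for t in bottom)


# Toy rows invented for illustration only.
toy_tracks = [
    {"track_popularity": 90, "energy": 0.8},
    {"track_popularity": 80, "energy": 0.7},
    {"track_popularity": 10, "energy": 0.4},
    {"track_popularity": 5, "energy": 0.3},
]

energy_gap = top_bottom_feature_gap(toy_tracks, "energy", n=2)
print(round(energy_gap, 2))
```

A large positive gap for a feature suggests that it is more pronounced in the top tracks than in the bottom ones; in the plot, this appears as a wide distance between the two dots on that feature's row.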

5.14 Popularity rank distribution of Top 25 artists from 2015 - 2019

The plot depicts the distribution of change of ranks over the years of various artists. Branch color indicates the name of the track artist and the blocks depict their popularity rank over different years.

Important information to artists/producers -

The graph can help artists understand their rank over the years, which is an indication of their progress. It also informs artists about the competition in the industry by showing which artists have been progressing towards higher ranks over the years. Such information can play a vital role in devising strategies or even in finding potential track partners (those who have been performing consistently well).
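
The yearly ranks shown in such a plot can be derived by sorting artists by a popularity score within each year. The sketch below assumes one popularity score per artist per year; the function name, the placeholder artists and the scores are hypothetical and are not taken from the dataset.

```python
def rank_artists_by_year(popularity):
    """Map {year: {artist: score}} to {year: {artist: rank}}, where rank 1 is the most popular."""
    ranks = {}
    for year, scores in popularity.items():
        ordered = sorted(scores, key=scores.get, reverse=True)
        ranks[year] = {artist: position for position, artist in enumerate(ordered, start=1)}
    return ranks


# Toy scores invented for illustration only.
toy_popularity = {
    2015: {"Artist A": 70, "Artist B": 60},
    2019: {"Artist A": 65, "Artist B": 80},
}

ranks = rank_artists_by_year(toy_popularity)
# Positive value = the artist moved up that many places between the two years.
places_gained_by_b = ranks[2015]["Artist B"] - ranks[2019]["Artist B"]
print(ranks, places_gained_by_b)
```

Tracking how an artist's rank changes from one year to the next gives the kind of progress signal described above.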