Logging Spotify data with Spotifyd

I set up Spotify in the terminal the other day, and now I'm interested in logging what I play locally so I can do my own stats (kind of a roll-your-own Spotify Unwrapped). There's no way to get the data on every track you've played from Spotify so that you can manipulate the data yourself (the API will return the last 50 played tracks at most). However, as I'm using Spotifyd, I can use that - here's how.

Install spotifyd

This was covered in the aforementioned post. The key addition here is to add a command to the config file that does whatever logging you want whenever the track changes:

# A command that gets executed in your shell after each song changes.
on_song_change_hook = "~/spotify-now-playing"

This causes the specified event-handling script to be executed on any Spotifyd event (not just change of song, but also play, pause, volume change, preload, end of track, etc.).

Event-handling script

This is simple:

#!/usr/bin/env bash

set -eu
set -o pipefail

DATE="$(/bin/date -Iseconds)"
EVENT="${PLAYER_EVENT:-"user"}"

if [[ "${EVENT}" =~ (change) ]] ; then
    # New song
    METADATA="$(/usr/local/bin/spt playback --format='"%a", "%b", "%t", %v, "%s"')"
    echo "${DATE}, ${EVENT}, ${METADATA}" >> ~/spotifyd.log
elif [[ "${EVENT}" == "endoftrack" ]] ; then
    # Using metadata from `spt playback` on endoftrack is unreliable; can show
    # previous track, next track, or nothing
    echo "${DATE}, ${EVENT}" >> ~/spotifyd.log
fi

exit 0

To find out which events Spotifyd emits, I ran it interactively rather than as a daemon (spotifyd --no-daemon, after first stopping the background service with brew services stop spotifyd if necessary). The output showed that an environment variable called PLAYER_EVENT is defined each time an event occurs, and temporarily setting on_song_change_hook = "echo $PLAYER_EVENT" in the Spotifyd config made it possible to see all the values it takes. I eventually decided I only cared about two of them: change, which seemed to be reliably issued once and only once for every new track, and endoftrack. I also gave EVENT a default value for when PLAYER_EVENT is not set, so that the script can be called directly as well as through Spotifyd without tripping set -u.

When the track changes, the script uses Spotify TUI (spt) to get the metadata of the currently-playing track, and logs it. When the track ends, it logs the endoftrack event but doesn't try to get the metadata. There is a race condition here, so the result isn't reliably the metadata for the track that has just ended: sometimes the spt call is too late and returns the data for the next track, which has already started, and sometimes it returns nothing.

Log file

This gives a log file like this:

2023-12-21T22:06:19+00:00, change, "Agustín Barrios Mangoré, John C. Williams", "The Great Paraguayan", "Villancico de navidad", 89, "▶"
2023-12-21T22:09:36+00:00, endoftrack
2023-12-21T22:09:36+00:00, change, "Underworld", "1992 - 2012", "8 Ball", 89, "▶"
2023-12-21T22:18:34+00:00, endoftrack
2023-12-21T22:18:34+00:00, change, "Underworld", "Beaucoup Fish (Remastered / Super Deluxe)", "Jumbo", 89, "▶"
2023-12-21T22:25:32+00:00, endoftrack
2023-12-21T22:25:32+00:00, change, "Adam F, Fresh, Origin Unknown", "When The Sun Goes Down", "When The Sun Goes Down - Origin Unknown Mix", 89, "▶"
2023-12-21T22:31:42+00:00, endoftrack
2023-12-21T22:31:42+00:00, change, "Danny Breaks, Origin Unknown", "Volumes", "Firin' Line - Origin Unknown Remix", 89, "▶"
2023-12-21T22:37:34+00:00, endoftrack
2023-12-21T22:37:34+00:00, change, "The Fall", "The Real New Fall Formerly 'Country On The Click'", "Mountain Energei", 89, "▶"
2023-12-21T22:40:56+00:00, endoftrack
2023-12-21T22:40:56+00:00, change, "The Fall", "Extricate (Expanded Edition)", "Bill Is Dead", 89, "▶"
2023-12-21T22:45:29+00:00, endoftrack
2023-12-21T22:45:29+00:00, change, "The Fall", "I Am Kurious Oranj", "New Big Prinz", 89, "▶"
2023-12-21T22:48:55+00:00, endoftrack
2023-12-21T22:48:55+00:00, change, "The Fall", "This Nation's Saving Grace", "L.A.", 89, "▶"
2023-12-21T22:53:05+00:00, endoftrack
2023-12-21T22:53:05+00:00, change, "Kloke, Tim Reaper", "Meeting of the Minds, Vol. 1", "Foundation", 89, "▶"
2023-12-21T23:00:25+00:00, endoftrack
2023-12-21T23:00:25+00:00, change, "Dwarde, Tim Reaper", "Meeting of the Minds, Vol. 1", "Inside of Me", 89, "▶"

Analysis

To play with the data, I read it into a Pandas dataframe and filtered for only the change events, which seem to be the one reliable event marking the start of a track:

import pandas as pd
df = pd.read_csv(
    "~/spotifyd.log",
    names=["date", "player_event", "artist", "album", "track", "volume", "flags"],
    skipinitialspace=True,
    quotechar='"',
    parse_dates=["date"],
)
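# .copy() so that adding columns to the filtered frame later doesn't
# trigger a warning about modifying a slice of the original
df = df[df["player_event"] == "change"].copy()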

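The quoting matters because I've quoted the track, album and artist titles in the log file (see the --format option to spt) to protect any commas, so that read_csv doesn't see them as delimiters. quotechar='"' is the Pandas default, so it is spelled out here only for clarity. What is actually required is skipinitialspace=True: the fields are separated by a comma and a space, and a quote is only recognised as a quote at the very start of a field, so without this option the leading space stops the quoting from being honoured.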
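
The effect of skipinitialspace is easy to see outside Pandas. This is a minimal sketch using the standard-library csv module (not read_csv itself, though it treats leading spaces and quotes the same way in this case) on one line copied from the log above:

```python
import csv

# One line copied from the log above.
line = (
    '2023-12-21T22:06:19+00:00, change, '
    '"Agustín Barrios Mangoré, John C. Williams", '
    '"The Great Paraguayan", "Villancico de navidad", 89, "▶"'
)

# Default settings: the space after each comma means the quote is not the
# first character of the field, so it is treated as ordinary text and the
# comma inside the artist name splits the field in two.
naive = next(csv.reader([line]))

# With skipinitialspace=True the leading space is dropped, the quote is
# seen at the start of the field, and the embedded comma is protected.
fields = next(csv.reader([line], skipinitialspace=True))

print(len(naive))   # 8
print(len(fields))  # 7
print(fields[2])    # Agustín Barrios Mangoré, John C. Williams
```

The extra field in the first result is the artist name broken at its comma, which is exactly the misparse the option prevents.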

Next, I filtered out tracks that were only played for a few seconds before the next change:

# Time from each change event to the next one. The last row has no
# successor, so it gets NaT and is dropped by the comparison below.
df["time_until_next_track"] = df["date"].diff().shift(-1)
df = df[df["time_until_next_track"] > pd.Timedelta(seconds=10)]

An endoftrack event is only issued if the track is played to the end, so it can't be used for this purpose, and we have to be content with the time difference between consecutive change events. This is an accurate proxy for how long a track was played, as long as the track was ended by skipping to the next track, which will be the case most of the time; if playback was paused or stopped in between, it overestimates. The most recent track in the log has no following change event, so it is filtered out too. I set the threshold to 10s, which is maybe a bit short, but is long enough to filter out that time I accidentally played Miami by Will Smith instead of Miami by Baxter Dury...
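
To make the logic concrete, here is a small Pandas-free sketch of the same idea using only the standard library. The timestamps are in the log's format, but the middle one is a made-up skip after four seconds, purely for illustration:

```python
from datetime import datetime, timedelta

# Timestamps of three consecutive "change" events; the middle one is a
# made-up skip after 4 seconds, for illustration only.
changes = [
    datetime.fromisoformat("2023-12-21T22:06:19+00:00"),
    datetime.fromisoformat("2023-12-21T22:06:23+00:00"),
    datetime.fromisoformat("2023-12-21T22:09:36+00:00"),
]

threshold = timedelta(seconds=10)

# Pair each event with the one after it; the last event has no successor,
# so it never appears on the left of a pair and is dropped.
kept = [
    start
    for start, following in zip(changes, changes[1:])
    if following - start > threshold
]

print(len(kept))            # 1
print(kept[0].isoformat())  # 2023-12-21T22:06:23+00:00
```

Only the second event survives: the first was skipped after four seconds, and the third has nothing after it to measure against.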

The only useful output I've added so far is to list the top artists according to play count. To do this properly, I first needed to split up cases where multiple artists collaborated on a single track to make sure each was counted. That is, to make sure a track by "Sleaford Mods, Florence Shaw" wasn't recorded as a single artist of that name, but rather was logged against each distinct artist. This is complicated by the possibility that a single artist could have a comma in their name, but since so far I've only thought of a few that actually do, I decided the simplest way to handle this was to hard-wire an explicit list.

import csv

# Dumb hard-wiring for artist names that contain commas
artists_with_commas = [
    "Crosby, Stills & Nash",
    "Crosby, Stills, Nash & Young",
    "Earth, Wind & Fire",
    "Up, Bustle & Out",
]

# Function to split artists, considering names with commas
def split_artists(artist_string):
    for artist_with_commas in artists_with_commas:
        artist_string = artist_string.replace(
            artist_with_commas, f'"{artist_with_commas}"'
        )
    # Use csv.reader rather than split() to respect the quoted substrings that
    # contain delimiters which we don't want to split on. It expects a list and
    # returns an iterator.
    return next(csv.reader([artist_string], skipinitialspace=True))

# Expand the "artist" column into a list of individual artists
df["artists"] = df["artist"].apply(split_artists)

The technique is to wrap each artist name that contains commas in quotes. The standard string .split() method knows nothing about quoting and would still split inside the quotes, so I used csv.reader() instead (see comments).
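
A quick check of the difference, as a self-contained sketch. The hard-wired list is cut down to one entry here, and the input strings are hypothetical examples rather than lines from my log:

```python
import csv

# Hypothetical, shortened list for this example.
artists_with_commas = ["Earth, Wind & Fire"]

def split_artists(artist_string):
    # Quote the known comma-containing names, then let csv.reader split
    # on the remaining (unquoted) commas.
    for name in artists_with_commas:
        artist_string = artist_string.replace(name, f'"{name}"')
    return next(csv.reader([artist_string], skipinitialspace=True))

combined = "Earth, Wind & Fire, The Emotions"
quoted = '"Earth, Wind & Fire", The Emotions'

# str.split() ignores the quotes and cuts the band name apart.
print(quoted.split(", "))
# ['"Earth', 'Wind & Fire"', 'The Emotions']

print(split_artists(combined))
# ['Earth, Wind & Fire', 'The Emotions']
print(split_artists("Sleaford Mods, Florence Shaw"))
# ['Sleaford Mods', 'Florence Shaw']
```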

We now have a new column, artists, which contains a list of individual artists rather than a single concatenated string. The final task is to count the number of times each artist appears, which we can do with the Pandas explode() and value_counts() methods:

# Count the plays for each artist
exploded_df = df.explode("artists")
artist_counts = exploded_df["artists"].value_counts().reset_index()
artist_counts.columns = ["artist", "count"]

The explode("artists") method returns a copy of the dataframe where each row is replicated for each item in the list contained in that row's artists column.
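
If it helps to see what those two methods are doing, this is a rough plain-Python equivalent using collections.Counter. It is not how Pandas implements them, and the rows are hypothetical:

```python
from collections import Counter

# Hypothetical rows: each inner list is one play's "artists" value.
plays = [
    ["Underworld"],
    ["Sleaford Mods", "Florence Shaw"],
    ["Underworld"],
]

# "Explode": flatten so every artist on every play is its own item.
exploded = [artist for artists in plays for artist in artists]

# "value_counts": tally the items, most frequent first.
counts = Counter(exploded).most_common()

print(exploded)
# ['Underworld', 'Sleaford Mods', 'Florence Shaw', 'Underworld']
print(counts)
# [('Underworld', 2), ('Sleaford Mods', 1), ('Florence Shaw', 1)]
```

The collaboration counts once for each of its two artists, which is the point of splitting the names first.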

The final final task is to add a rank:

# Add a numerical rank column starting at 1
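# rank() returns floats, so convert to int to get 1, 2, 3 rather than 1.0, 2.0, 3.0
artist_counts["rank"] = (
    artist_counts["count"].rank(method="min", ascending=False).astype(int)
)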
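
With method="min", artists tied on play count share the best rank available and the following rank is skipped. A plain-Python sketch of that rule, with hypothetical counts:

```python
# Hypothetical play counts, already sorted from most to least played.
counts = [("The Fall", 4), ("Underworld", 2), ("Tim Reaper", 2), ("Kloke", 1)]

# "min" ranking: rank = 1 + number of artists with strictly more plays,
# so tied artists share the best rank and the next rank is skipped.
all_counts = [count for _, count in counts]
ranked = [
    (1 + sum(other > count for other in all_counts), artist, count)
    for artist, count in counts
]

for row in ranked:
    print(row)
# (1, 'The Fall', 4)
# (2, 'Underworld', 2)
# (2, 'Tim Reaper', 2)
# (4, 'Kloke', 1)
```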

To print the output as Markdown (to_markdown() needs the optional tabulate package to be installed):

# Reorder columns
artist_counts = artist_counts[["rank", "artist", "count"]]

print(artist_counts.to_markdown(index=False))

Example output is shown here.