Post

Reshape Pandas Data With Melt

Reshape Pandas Data With Melt

Data tables often come in a format that makes sense to the human who created the table, but that’s difficult for analysis. To make analysis easier we can reshape the data into a more computer-friendly form. Pandas is a python data analysis library and in this post I reshape pandas data with melt.

A Reshape Example

Simple Dataframe

Firstly, let’s create a simple dataframe in pandas:

1
2
3
4
5
6
7
data = {'weekday': ["Monday", "Tuesday", "Wednesday",
         "Thursday", "Friday", "Saturday", "Sunday"],
        'Person 1': [12, 6, 5, 8, 11, 6, 4],
        'Person 2': [10, 6, 11, 5, 8, 9, 12],
        'Person 3': [8, 5, 7, 3, 7, 11, 15]}
df = pd.DataFrame(data, columns=['weekday',
        'Person 1', 'Person 2', 'Person 3'])

This dataframe is similar to how we might have recorded the data by hand or in a spreadsheet, and could represent test scores of three different people on each day of the week.There is a row for each day of the week, and a separate column for each person:

weekdayPerson 1Person 2Person 3
0Monday12108
1Tuesday665
2Wednesday5117
3Thursday853
4Friday1187
5Saturday6911
6Sunday41215

Fine For Humans - Not So Good For Analysis

While this simple dataframe structure is fine for a human reader, if we wanted to create  use ‘groupby’ operations or begin to normalise this dataframe to we would need to do some reshaping.  We could reshape the dataframe to have a column containing person information, and another column to contain that person’s score on a particular day, like this:

weekdayPersonScore

Reshape With Melt

It is of course possible to reshape a data table by hand, by copying and pasting the values from each person’s column into the new ‘person’ column. This would take a a long time even for this small dataframe, and would be prone to errrors. A much better idea is to reshape the dataframe with melt:

1
melted = pd.melt(df, id_vars=["weekday"],var_name="Person", value_name="Score")

Here we have set the variables (columns) that we want to leave unaffected. Variables not included in this list will become rows in a new column (which has the name given by “var_name”). The values belonging to the original rows/columns are found in a new column with a name given by “value_name”, and the output dataframe now has three rows for each day of the week - one for each of person 1, 2, and 3. This new form of the dataframe is now more ‘normalised’ and can have groupby operations applied to it.

weekdayPersonScore
0MondayPerson 112
1TuesdayPerson 16
2WednesdayPerson 15
3ThursdayPerson 18
4FridayPerson 111
5SaturdayPerson 16
6SundayPerson 14
7MondayPerson 210
8TuesdayPerson 26
9WednesdayPerson 211
10ThursdayPerson 25
11FridayPerson 28
12SaturdayPerson 29
13SundayPerson 212
14MondayPerson 38
15TuesdayPerson 35
16WednesdayPerson 37
17ThursdayPerson 33
18FridayPerson 37
19SaturdayPerson 311
20SundayPerson 315

Read more about using melt at stackoverflow

This post is licensed under CC BY 4.0 by the author.