Reshape Pandas Data With Melt

By | October 28, 2016

Data tables often come in a format that makes sense to the human who created the table, but that’s difficult for analysis. To make analysis easier we can reshape the data into a more computer-friendly form. Pandas is a python data analysis library and in this post I reshape pandas data with melt.

A Reshape Example

Simple Dataframe

Firstly, let’s create a simple dataframe in pandas:

data = {'weekday': ["Monday", "Tuesday", "Wednesday", 
         "Thursday", "Friday", "Saturday", "Sunday"],
        'Person 1': [12, 6, 5, 8, 11, 6, 4],
        'Person 2': [10, 6, 11, 5, 8, 9, 12],
        'Person 3': [8, 5, 7, 3, 7, 11, 15]}
df = pd.DataFrame(data, columns=['weekday',
        'Person 1', 'Person 2', 'Person 3'])

This dataframe is similar to how we might have recorded the data by hand or in a spreadsheet, and could represent test scores of three different people on each day of the week.There is a row for each day of the week, and a separate column for each person:

weekday Person 1 Person 2 Person 3
0 Monday 12 10 8
1 Tuesday 6 6 5
2 Wednesday 5 11 7
3 Thursday 8 5 3
4 Friday 11 8 7
5 Saturday 6 9 11
6 Sunday 4 12 15

Fine For Humans – Not So Good For Analysis

While this simple dataframe structure is fine for a human reader, if we wanted to create¬† use ‘groupby‘ operations or begin to normalise this dataframe to we would need to do some reshaping.¬† We could reshape the dataframe to have a column containing person information, and another column to contain that person’s score on a particular day, like this:

weekday Person Score

Reshape With Melt

It is of course possible to reshape a data table by hand, by copying and pasting the values from each person’s column into the new ‘person’ column. This would take a a long time even for this small dataframe, and would be prone to errrors. A much better idea is to reshape the dataframe with melt:

melted = pd.melt(df, id_vars=["weekday"], 
                 var_name="Person", value_name="Score")

Here we have set the variables (columns) that we want to leave unaffected. Variables not included in this list will become rows in a new column (which has the name given by “var_name”). The values belonging to the original rows/columns are found in a new column with a name given by “value_name”, and the output dataframe now has three rows for each day of the week – one for each of person 1, 2, and 3. This new form of the dataframe is now more ‘normalised’ and can have groupby operations applied to it.

weekday Person Score
0 Monday Person 1 12
1 Tuesday Person 1 6
2 Wednesday Person 1 5
3 Thursday Person 1 8
4 Friday Person 1 11
5 Saturday Person 1 6
6 Sunday Person 1 4
7 Monday Person 2 10
8 Tuesday Person 2 6
9 Wednesday Person 2 11
10 Thursday Person 2 5
11 Friday Person 2 8
12 Saturday Person 2 9
13 Sunday Person 2 12
14 Monday Person 3 8
15 Tuesday Person 3 5
16 Wednesday Person 3 7
17 Thursday Person 3 3
18 Friday Person 3 7
19 Saturday Person 3 11
20 Sunday Person 3 15

Read more about using melt at stackoverflow