Reshape Pandas Data With Melt

By | October 28, 2016

Data tables often come in a format that makes sense to the human who created the table, but that’s difficult for analysis. To make analysis easier we can reshape the data into a more computer-friendly form. Pandas is a python data analysis library and in this post I reshape pandas data with melt.

A Reshape Example

Simple Dataframe

Firstly, let’s create a simple dataframe in pandas:

data = {'weekday': ["Monday", "Tuesday", "Wednesday", 
         "Thursday", "Friday", "Saturday", "Sunday"],
        'Person 1': [12, 6, 5, 8, 11, 6, 4],
        'Person 2': [10, 6, 11, 5, 8, 9, 12],
        'Person 3': [8, 5, 7, 3, 7, 11, 15]}
df = pd.DataFrame(data, columns=['weekday',
        'Person 1', 'Person 2', 'Person 3'])

This dataframe is similar to how we might have recorded the data by hand or in a spreadsheet, and could represent test scores of three different people on each day of the week.There is a row for each day of the week, and a separate column for each person:

weekday Person 1 Person 2 Person 3
0 Monday 12 10 8
1 Tuesday 6 6 5
2 Wednesday 5 11 7
3 Thursday 8 5 3
4 Friday 11 8 7
5 Saturday 6 9 11
6 Sunday 4 12 15

Fine For Humans – Not So Good For Analysis

While this simple dataframe structure is fine for a human reader, if we wanted to create  use ‘groupby‘ operations or begin to normalise this dataframe to we would need to do some reshaping.  We could reshape the dataframe to have a column containing person information, and another column to contain that person’s score on a particular day, like this:

weekday Person Score

Reshape With Melt

It is of course possible to reshape a data table by hand, by copying and pasting the values from each person’s column into the new ‘person’ column. This would take a a long time even for this small dataframe, and would be prone to errrors. A much better idea is to reshape the dataframe with melt:

melted = pd.melt(df, id_vars=["weekday"], 
                 var_name="Person", value_name="Score")

Here we have set the variables (columns) that we want to leave unaffected. Variables not included in this list will become rows in a new column (which has the name given by “var_name”). The values belonging to the original rows/columns are found in a new column with a name given by “value_name”, and the output dataframe now has three rows for each day of the week – one for each of person 1, 2, and 3. This new form of the dataframe is now more ‘normalised’ and can have groupby operations applied to it.

weekday Person Score
0 Monday Person 1 12
1 Tuesday Person 1 6
2 Wednesday Person 1 5
3 Thursday Person 1 8
4 Friday Person 1 11
5 Saturday Person 1 6
6 Sunday Person 1 4
7 Monday Person 2 10
8 Tuesday Person 2 6
9 Wednesday Person 2 11
10 Thursday Person 2 5
11 Friday Person 2 8
12 Saturday Person 2 9
13 Sunday Person 2 12
14 Monday Person 3 8
15 Tuesday Person 3 5
16 Wednesday Person 3 7
17 Thursday Person 3 3
18 Friday Person 3 7
19 Saturday Person 3 11
20 Sunday Person 3 15

Read more about using melt at stackoverflow

One thought on “Reshape Pandas Data With Melt

Comments are closed.