The index or the name of the axis. 0 is equivalent to None or 'index'. For Series this parameter is unused and defaults to 0.

skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

*args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns
Series or DataFrame
Return cumulative sum of Series or DataFrame.

See also
core.window.expanding.Expanding.sum: Similar functionality but ignores NaN values.
DataFrame.sum: Return the sum over DataFrame axis.
DataFrame.cummax: Return cumulative maximum over DataFrame axis.
DataFrame.cummin: Return cumulative minimum over DataFrame axis.
DataFrame.cumsum: Return cumulative sum over DataFrame axis.
DataFrame.cumprod: Return cumulative product over DataFrame axis.

Examples

Series

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumsum()
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64

To include NA values in the operation, use skipna=False.

>>> s.cumsum(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the sum in each column. This is equivalent to axis=None or axis='index'.

A Pareto chart is a type of chart that displays the ordered frequencies of categories along with the cumulative frequencies of those categories. This tutorial provides a step-by-step example of how to create a Pareto chart in Python.

Step 1: Create the Data

Suppose we conduct a survey in which we ask 350 different people to identify their favorite cereal brand among brands A, B, C, D, E, and F.
We can create the following pandas DataFrame to hold the results of the survey:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'count': [97, 140, 58, 6, 17, 32]})
df.index = ['B', 'A', 'C', 'F', 'E', 'D']

#sort DataFrame by count descending
df = df.sort_values(by='count', ascending=False)

#add column to display cumulative percentage
df['cumperc'] = df['count'].cumsum()/df['count'].sum()*100

#view DataFrame
df

   count     cumperc
A    140   40.000000
B     97   67.714286
C     58   84.285714
D     32   93.428571
E     17   98.285714
F      6  100.000000

Step 2: Create the Pareto Chart

We can use the following code to create the Pareto chart:

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
#define aesthetics for plot
color1 = 'steelblue'
color2 = 'red'
line_size = 4
#create basic bar plot
fig, ax = plt.subplots()
ax.bar(df.index, df['count'], color=color1)
#add cumulative percentage line to plot
ax2 = ax.twinx()
ax2.plot(df.index, df['cumperc'], color=color2, marker="D", ms=line_size)
ax2.yaxis.set_major_formatter(PercentFormatter())
#specify axis colors
ax.tick_params(axis='y', colors=color1)
ax2.tick_params(axis='y', colors=color2)
#display Pareto chart
plt.show()
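The line on the right-hand axis plots the `cumperc` column from Step 1, so the points it passes through can also be checked numerically. The following is a minimal sketch that rebuilds the survey DataFrame so it runs on its own; the 80% cutoff and the names `threshold`, `n_needed`, and `top_brands` are illustrative assumptions, not part of the original tutorial:

```python
import pandas as pd

# Rebuild the survey data from Step 1 (counts per brand, sorted descending).
df = pd.DataFrame({'count': [97, 140, 58, 6, 17, 32]},
                  index=['B', 'A', 'C', 'F', 'E', 'D'])
df = df.sort_values(by='count', ascending=False)

# Cumulative share of responses, in percent.
df['cumperc'] = df['count'].cumsum() / df['count'].sum() * 100

# Hypothetical cutoff: how many leading brands are needed to reach it?
threshold = 80

# argmax on a boolean array returns the position of the first True,
# i.e. the first brand whose cumulative percentage reaches the cutoff.
n_needed = int((df['cumperc'] >= threshold).to_numpy().argmax()) + 1
top_brands = df.index[:n_needed].tolist()

print(df['cumperc'].round(1))
print(top_brands)  # ['A', 'B', 'C']
```

With the survey counts above, the first three brands are enough to pass the assumed 80% cutoff, which matches where the line crosses that level on the chart.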
The x-axis displays the different brands ordered from highest to lowest frequency. The left-hand y-axis shows the frequency of each brand and the right-hand y-axis shows the cumulative percentage of responses. For example, we can see:

Brand A accounts for 40% of all survey responses.
Brands A and B together account for about 67.7% of all survey responses.
Brands A, B, and C together account for about 84.3% of all survey responses.

And so on.

Step 3: Customize the Pareto Chart (Optional)

You can change the colors of the bars and the size of the markers on the cumulative percentage line to make the Pareto chart look however you’d like. For example, we could make the bars pink and the line purple with slightly larger markers by changing color1, color2, and line_size in the code above.
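A minimal sketch of that customization, reusing the Step 2 code with only the three aesthetic variables changed. The specific values (`'pink'`, `'purple'`, and a `line_size` of 6) are illustrative assumptions; any valid matplotlib color names and marker sizes would work the same way:

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Rebuild the survey data from Step 1 so this block runs on its own.
df = pd.DataFrame({'count': [97, 140, 58, 6, 17, 32]},
                  index=['B', 'A', 'C', 'F', 'E', 'D'])
df = df.sort_values(by='count', ascending=False)
df['cumperc'] = df['count'].cumsum() / df['count'].sum() * 100

# Assumed aesthetics: only these three values differ from Step 2.
color1 = 'pink'
color2 = 'purple'
line_size = 6  # passed as `ms`, which sets the marker size in points

# Bar plot of the raw counts on the left-hand axis.
fig, ax = plt.subplots()
ax.bar(df.index, df['count'], color=color1)

# Cumulative percentage line on a secondary (right-hand) y-axis.
ax2 = ax.twinx()
ax2.plot(df.index, df['cumperc'], color=color2, marker="D", ms=line_size)
ax2.yaxis.set_major_formatter(PercentFormatter())

# Color each y-axis to match the series it measures.
ax.tick_params(axis='y', colors=color1)
ax2.tick_params(axis='y', colors=color2)
```

Calling `plt.show()` afterwards displays the customized chart. Note that `ms` controls the size of the diamond markers; to change the thickness of the line itself, `ax2.plot` also accepts a `linewidth` argument.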