Matplotlib Scatter
Scatter plots are one of the most widely used tools for visualizing the relationship between two numerical variables. matplotlib.pyplot.scatter() draws one marker per observation on a Cartesian plane, placed at that observation's x and y values, so we can see at a glance whether the two variables are correlated, form clusters or spread out. For example:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([12, 45, 7, 32, 89, 54, 23, 67, 14, 91])
y = np.array([99, 31, 72, 56, 19, 88, 43, 61, 35, 77])
plt.scatter(x, y)
plt.title("Basic Scatter Plot")
plt.xlabel("X Values")
plt.ylabel("Y Values")
plt.show()
Explanation: plt.scatter(x, y) places one marker at each (x[i], y[i]) pair, so the ten values in each array produce ten points. plt.title(), plt.xlabel() and plt.ylabel() add a title and axis labels, and plt.show() displays the figure.
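A scatter plot shows a relationship visually. To put a number on how strongly x and y move together, one common companion (not part of plt.scatter() itself) is the Pearson correlation coefficient from np.corrcoef(). The sketch below reuses the data from the basic example and assumes the non-interactive Agg backend so it runs without a display; the coefficient lies between -1 and 1, and values near 0 suggest little linear relationship.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; use plt.show() in an interactive session instead
import matplotlib.pyplot as plt
import numpy as np

x = np.array([12, 45, 7, 32, 89, 54, 23, 67, 14, 91])
y = np.array([99, 31, 72, 56, 19, 88, 43, 61, 35, 77])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r(x, y)
r = np.corrcoef(x, y)[0, 1]

fig, ax = plt.subplots()
points = ax.scatter(x, y)  # one marker per (x[i], y[i]) pair
ax.set_title(f"Basic Scatter Plot (Pearson r = {r:.2f})")
ax.set_xlabel("X Values")
ax.set_ylabel("Y Values")
print(f"Pearson r = {r:.2f}")
```

Showing r in the title keeps the number next to the picture it summarizes; it only measures linear association, so curved or clustered patterns still need the plot itself.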
Syntax
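matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, alpha=None, linewidths=None, edgecolors=None, **kwargs)
Parameters:
Parameter | Description |
---|---|
x, y | Sequences of data coordinates of equal length, one pair per point |
s | Marker size in points² (area); a scalar or one value per point |
c | Marker color: a single color, a list of colors, or numeric values mapped to colors through cmap |
marker | Marker style, e.g. 'o' (circle, the default), 's' (square), '^' (triangle) |
cmap | Colormap used when c holds numeric values, e.g. 'viridis' |
alpha | Transparency, from 0 (fully transparent) to 1 (fully opaque) |
linewidths | Width of the marker edges in points |
edgecolors | Color of the marker edges |
label | Legend label for the dataset (passed through **kwargs) |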
Returns: This function returns a PathCollection object representing the scatter plot points. This object can be used to change the points after they are drawn (for example, their positions, sizes or colors) or to create a colorbar.
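Because plt.scatter() returns a PathCollection, the same object can be modified after it is drawn. The minimal sketch below, with made-up data, uses set_offsets(), set_sizes() and set_array() to move, resize and recolor the points, the usual pattern when updating a plot in a loop or animation. It also shows that get_array() is None when c is a fixed color and holds the numeric values when c is mapped through a colormap. The Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

# Fixed color: no numeric data is attached for colormapping
fixed = ax.scatter([1, 2, 3], [4, 5, 6], c="red")

# Numeric c plus cmap: the values are stored and mapped to colors
values = np.array([0.1, 0.5, 0.9])
mapped = ax.scatter([1, 2, 3], [1, 2, 3], c=values, cmap="viridis")

# Update the mapped collection in place: new positions, sizes and color values
mapped.set_offsets(np.column_stack([[2, 3, 4], [3, 1, 2]]))
mapped.set_sizes([40, 80, 160])
mapped.set_array(np.array([0.9, 0.5, 0.1]))
fig.canvas.draw_idle()  # request a redraw when the figure is shown interactively
```

Changing the offsets does not rescale the axes automatically, so if the new points may fall outside the current view, set the limits explicitly with ax.set_xlim() and ax.set_ylim().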
Examples
Example 1: In this example, we compare the height and weight of two different groups using different colors for each group.
import matplotlib.pyplot as plt
import numpy as np
x1 = np.array([160, 165, 170, 175, 180, 185, 190, 195, 200, 205])
y1 = np.array([55, 58, 60, 62, 64, 66, 68, 70, 72, 74])
x2 = np.array([150, 155, 160, 165, 170, 175, 180, 195, 200, 205])
y2 = np.array([50, 52, 54, 56, 58, 64, 66, 68, 70, 72])
plt.scatter(x1, y1, color='blue', label='Group 1')
plt.scatter(x2, y2, color='red', label='Group 2')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.title('Comparison of Height vs Weight between two groups')
plt.legend()
plt.show()
Explanation: We define NumPy arrays x1, y1 and x2, y2 holding the heights and weights of two groups. Two plt.scatter() calls draw both groups on the same axes, Group 1 in blue and Group 2 in red, and their label arguments supply the entries that plt.legend() displays. The axes are labeled "Height (cm)" and "Weight (kg)".
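With more than a couple of groups, one common pattern is to keep each group's data, color and label together and draw them in a loop, so every scatter call gets a matching legend entry. The sketch below reuses the Example 1 data; the groups dictionary is a hypothetical structure introduced for illustration, and ax.get_legend_handles_labels() is used only to confirm which labels the legend will show. The Agg backend is assumed so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; call plt.show() interactively instead
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical container: label -> (heights, weights, color)
groups = {
    "Group 1": (np.array([160, 165, 170, 175, 180, 185, 190, 195, 200, 205]),
                np.array([55, 58, 60, 62, 64, 66, 68, 70, 72, 74]), "blue"),
    "Group 2": (np.array([150, 155, 160, 165, 170, 175, 180, 195, 200, 205]),
                np.array([50, 52, 54, 56, 58, 64, 66, 68, 70, 72]), "red"),
}

fig, ax = plt.subplots()
for label, (heights, weights, color) in groups.items():
    # One scatter call per group; label feeds the legend
    ax.scatter(heights, weights, color=color, label=label)

ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
ax.set_title("Comparison of Height vs Weight between two groups")
ax.legend()
handles, labels = ax.get_legend_handles_labels()
```

Adding a third group then only means adding one dictionary entry rather than another scatter line.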
Example 2: This example demonstrates how to customize a scatter plot using different marker sizes and colors for each point. Transparency and edge colors are also adjusted.
import matplotlib.pyplot as plt
import numpy as np
x = np.array([3, 12, 9, 20, 5, 18, 22, 11, 27, 16])
y = np.array([95, 55, 63, 77, 89, 50, 41, 70, 58, 83])
a = [20, 50, 100, 200, 500, 1000, 60, 90, 150, 300] # marker size (points²) for each point
b = ['red', 'green', 'blue', 'purple', 'orange', 'black', 'pink', 'brown', 'yellow', 'cyan'] # color for each point
plt.scatter(x, y, s=a, c=b, alpha=0.6, edgecolors='w', linewidths=1)
plt.title("Scatter Plot with Varying Colors and Sizes")
plt.show()
Explanation: NumPy arrays x and y set the point coordinates, a gives each point its own marker size and b its own color. plt.scatter() draws the points at alpha=0.6 with a white ('w') edge 1 point wide around each marker. A title is added before the plot is displayed.
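The per-point lists a and b work because each has exactly one entry per point. Matplotlib expects s and c to be either a single value or one value per point, and a list of colors with any other length is rejected with a ValueError. The sketch below, using shortened hypothetical data and the Agg backend so it runs without a display, shows the accepted cases and the mismatch so the error is easier to recognize.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.array([3, 12, 9, 20, 5])
y = np.array([95, 55, 63, 77, 89])

fig, ax = plt.subplots()

# One size and one color per point: accepted
ok = ax.scatter(x, y, s=[20, 50, 100, 200, 500],
                c=["red", "green", "blue", "purple", "orange"])

# A single color is applied to every point: also accepted
single = ax.scatter(x, y, c="black")

# Two colors for five points: Matplotlib cannot pair them with the points
try:
    ax.scatter(x, y, c=["red", "green"])
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
```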
Example 3: This example shows how to create a bubble plot where the size of each point (bubble) represents a variable's magnitude. Edge color and alpha transparency are also used.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
sizes = [30, 80, 150, 200, 300] # Bubble sizes
plt.scatter(x, y, s=sizes, alpha=0.5, edgecolors='blue', linewidths=2)
plt.title("Bubble Plot Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
Explanation: Lists x and y define the point coordinates, and sizes sets each bubble's area in points². plt.scatter() draws the bubbles at 50% opacity (alpha=0.5) with blue edges 2 points wide (linewidths=2). Axis labels and a title are added before the plot is displayed.
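Because s sets the marker area in points², a bubble's diameter grows with the square root of s. When bubbles stand for a quantity, one common approach is to scale s linearly with that quantity so the area, not the diameter, is proportional to it, and to add a size legend with PathCollection.legend_elements(prop="sizes"). The sketch below assumes a hypothetical values array and a largest bubble area of 300 points², and uses the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
values = np.array([3, 8, 15, 20, 30])  # hypothetical quantity each bubble represents
max_area = 300                         # assumed area (points²) of the largest bubble

# Area proportional to value: s scales linearly, so diameter scales with sqrt(value)
sizes = max_area * values / values.max()

fig, ax = plt.subplots()
bubbles = ax.scatter(x, y, s=sizes, alpha=0.5, edgecolors="blue", linewidths=2)

# Legend entries for the bubble sizes; func converts areas back to data values
handles, labels = bubbles.legend_elements(
    prop="sizes", alpha=0.5, func=lambda s: s * values.max() / max_area
)
ax.legend(handles, labels, title="Value")
ax.set_title("Bubble Plot with Area-Scaled Sizes")
```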
Example 4: In this example, we map numeric values to colors with a colormap and add a colorbar, so that color can represent a third variable.
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randint(50, 150, 100)
y = np.random.randint(50, 150, 100)
colors = np.random.rand(100) # Random float values for color mapping
sizes = 20 * np.random.randint(10, 100, 100)
plt.scatter(x, y, c=colors, s=sizes, cmap='viridis', alpha=0.7)
plt.colorbar(label='Color scale')
plt.title("Scatter Plot with Colormap and Colorbar")
plt.show()
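Explanation: np.random.randint(50, 150, 100) draws 100 random integers from 50 to 149 for each of x and y. The floats in colors (between 0 and 1) are passed as c and mapped through the 'viridis' colormap, while sizes gives each marker a random size. plt.scatter() draws the points with alpha=0.7, and plt.colorbar() adds a bar showing which color corresponds to which value.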
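In Example 4 the color values are random, so the colorbar does not describe anything in the data. In practice c usually carries a real third variable. The sketch below assumes a hypothetical one, each point's distance from the center (100, 100); it seeds NumPy's random generator so the figure is reproducible and fixes the color range with vmin and vmax so the colorbar keeps the same scale across figures. The Agg backend is assumed so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded generator: same points on every run
x = rng.integers(50, 150, 100)
y = rng.integers(50, 150, 100)

# Hypothetical third variable: distance of each point from (100, 100)
distance = np.hypot(x - 100, y - 100)

fig, ax = plt.subplots()
# vmin/vmax pin the color scale instead of deriving it from this data
points = ax.scatter(x, y, c=distance, cmap="viridis", vmin=0, vmax=75, alpha=0.7)
fig.colorbar(points, ax=ax, label="Distance from (100, 100)")
ax.set_title("Color Encoding a Third Variable")
```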
Example 5: This final example illustrates how to change the marker style using the marker parameter. Here, triangle markers are used with magenta color.
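import matplotlib.pyplot as plt
import numpy as np
x = np.random.randint(50, 150, 100)  # random points, as in Example 4
y = np.random.randint(50, 150, 100)
plt.scatter(x, y, marker='^', color='magenta', s=100, alpha=0.7)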
plt.title("Scatter Plot with Triangle Markers")
plt.show()
Explanation: This code plots 100 random points with triangle markers ('^') in magenta, a marker size of 100 points² and alpha=0.7. A title is added before the plot is displayed.
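The marker parameter accepts Matplotlib's marker codes, so other shapes only need a different string. The sketch below draws the same hypothetical points with several common codes ('o' circle, 's' square, '^' triangle, 'D' diamond, '*' star), one column per style, to compare them side by side. The Agg backend is assumed so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np

markers = {"o": "circle", "s": "square", "^": "triangle", "D": "diamond", "*": "star"}
y = np.arange(5)  # hypothetical y positions shared by every column

fig, ax = plt.subplots()
for column, code in enumerate(markers):
    # Each marker style gets its own column at x = column
    ax.scatter(np.full_like(y, column), y, marker=code, s=100, alpha=0.7)

# Label each column with its marker code and name
ax.set_xticks(range(len(markers)))
ax.set_xticklabels([f"'{code}' {name}" for code, name in markers.items()])
ax.set_title("Common Marker Styles")
```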