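Distribution of species across genera. Each point is one bin of genera grouped by how many species they contain. The bins are equally wide in log10(species count), and the height of each point is the number of genera that fall in that bin.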
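The binning used by the `np.histogram` call in the script can be written out as follows. The notation is introduced here for illustration only: $n_g$ is the count for genus $g$ produced by `value_counts()`, $e_k$ are the 101 bin edges, and $C_k$ is the number of genera in bin $k$. As with `np.histogram`, the last bin also includes its right edge.

$$
x_g = \log_{10} n_g, \qquad \Delta = \frac{\max_g x_g - \min_g x_g}{100}, \qquad e_k = \min_g x_g + k\,\Delta \quad (k = 0, \dots, 100)
$$

$$
C_k = \#\{\, g : e_k \le x_g < e_{k+1} \,\}, \qquad \text{point } k = \left(10^{(e_k + e_{k+1})/2},\; C_k\right) = \left(\sqrt{10^{e_k}\,10^{e_{k+1}}},\; C_k\right)
$$

Each point therefore sits at the geometric mean of its bin's lower and upper species counts.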
Data from the World Flora Online Plant List (June 2023), available at https://zenodo.org/record/8079052

To reproduce, download the `classification.csv` file, then run:
```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Read the tab-separated file as UTF-8, replacing any undecodable bytes
with open('classification.csv', 'r', encoding='utf-8', errors='replace') as file:
    df = pd.read_csv(file, delimiter='\t')
# Count records per genus: every row with a 'genus' value is counted,
# whatever its rank or taxonomic status (rows without a genus are dropped)
genus_counts = df['genus'].value_counts()
# Histogram of log10(count per genus) with 100 equal-width bins in log space
counts, bin_edges = np.histogram(np.log10(genus_counts), bins=100)
# Bin centers, still in log10 units
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2.
# Scatter plot, converting bin centers back to species counts
fig, ax = plt.subplots()
ax.scatter(10**bin_centers, counts, s=5)
# Use log scales on both axes
ax.set_yscale('log')
ax.set_xscale('log')
# Axis labels, title and grid
ax.set_xlabel('Number of Species in Genus')
ax.set_ylabel('Number of Genera')
ax.set_title('Distribution of Number of Species per Genus')
ax.grid()
plt.show()
```
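The script counts every row that carries a genus name. If you want to count only accepted species, one possible variant is sketched below. It assumes that `classification.csv` has Darwin Core columns `taxonRank` and `taxonomicStatus` with values such as `species` and `Accepted`. These column names and values are assumptions about the WFO export, not something verified here, so they are compared case-insensitively. The helper name `accepted_species_per_genus` is hypothetical.

```python
import pandas as pd


def accepted_species_per_genus(df: pd.DataFrame) -> pd.Series:
    """Count accepted species-rank names per genus.

    Assumes (not verified) Darwin Core columns 'genus', 'taxonRank' and
    'taxonomicStatus'; the values 'species' and 'accepted' are compared
    case-insensitively.
    """
    rank = df['taxonRank'].astype(str).str.lower()
    status = df['taxonomicStatus'].astype(str).str.lower()
    # Keep only species-rank rows whose status is accepted
    species = df[(rank == 'species') & (status == 'accepted')]
    # One count per genus; rows with a missing genus are dropped
    return species['genus'].value_counts()


# Tiny synthetic table standing in for classification.csv
demo = pd.DataFrame({
    'genus': ['Quercus', 'Quercus', 'Quercus', 'Quercus', 'Rosa', 'Rosa'],
    'taxonRank': ['species', 'species', 'species', 'variety', 'species', 'genus'],
    'taxonomicStatus': ['Accepted', 'Accepted', 'Synonym', 'Accepted',
                        'Accepted', 'Accepted'],
})
result = accepted_species_per_genus(demo)
print(result)  # counts: Quercus 2, Rosa 1
```

The returned Series has the same shape as `genus_counts` in the script, so it can be passed to the same `np.histogram` and plotting code unchanged.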