Summary: in this tutorial, you’ll learn about the two main data structures of Pandas – the Series and DataFrame.
Pandas Series
The Series is the most basic data structure in pandas.
If you’re already familiar with Python data types, a Series looks much like a Python list. The core difference is that a Series lets you use your own labels (such as strings) as the index, instead of restricting you to zero-based integer positions.
A Series object contains two “columns”: the first is the index and the second holds the data. By default, the index is integer-based and starts from zero.
import numpy as np
import pandas as pd
# This is a Series
# with number-based indexes
example = pd.Series([1,2,3,4,5])
example
# Output :
0 1
1 2
2 3
3 4
4 5
dtype: int64
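To make the two “columns” concrete, here is a minimal sketch that pulls the index and the data apart. It assumes the same five-value Series as above; the variable names index_labels and data_values are only illustrative.

```python
import pandas as pd

# Same Series as above, with the default zero-based index
example = pd.Series([1, 2, 3, 4, 5])

# The first "column": the index labels
index_labels = list(example.index)

# The second "column": the data itself, as a plain Python list
data_values = example.tolist()

print(index_labels)
print(data_values)
```

Both parts always have the same length, because each value is paired with exactly one label.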
You can retrieve one or more items from a Series using their index labels.
# Retrieving a single value #
example[3]
# Output
4
# Retrieving multiple values #
example[[2,4]]
# Output
2 3
4 5
dtype: int64
You can also specify your own index by passing the index argument.
import numpy as np
import pandas as pd
# This is another Series
# with character-based indexes
example = pd.Series([1,2,3,4,5],
index=['a', 'b', 'c', 'd', 'e'])
example
# Output :
a 1
b 2
c 3
d 4
e 5
dtype: int64
With a custom index in place, you can access items by those labels.
# Retrieving a single value #
example['c']
# Output
3
# Retrieving multiple values #
example[['c', 'e']]
# Output
c 3
e 5
dtype: int64
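If you want to be explicit about whether you are looking up by label or by position, pandas also offers the .loc and .iloc accessors. The following is a small sketch, assuming the same character-indexed Series as above; the variable names are only illustrative.

```python
import pandas as pd

example = pd.Series([1, 2, 3, 4, 5],
                    index=['a', 'b', 'c', 'd', 'e'])

# Label-based lookup: the item whose index label is 'c'
by_label = example.loc['c']

# Position-based lookup: the third item, counting from zero
by_position = example.iloc[2]

# Several labels at once returns a smaller Series
subset = example.loc[['c', 'e']]

print(by_label, by_position)
print(subset)
```

Here both single lookups return the same value, because the label 'c' sits at position 2.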
You can also perform statistical operations on a Series. A common one is to get the mean of all values in a Series object.
# Get the mean of the values
example.mean()
# Output
3.0
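The mean is only one of several built-in aggregations. As a brief sketch, assuming the same Series as above, a few others can be called in the same way; the variable names are only illustrative.

```python
import pandas as pd

example = pd.Series([1, 2, 3, 4, 5],
                    index=['a', 'b', 'c', 'd', 'e'])

total = example.sum()       # sum of all values
smallest = example.min()    # smallest value
largest = example.max()     # largest value
middle = example.median()   # middle value

print(total, smallest, largest, middle)
```

Each of these methods returns a single number computed from the data, regardless of which labels the index uses.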
While a Series may not seem much more useful than an ordinary list at first, it is the foundation of the next, more powerful pandas data structure: the DataFrame.
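As a quick preview of that relationship, the sketch below wraps a Series in a DataFrame and reads it back out. The column name 'value' is a hypothetical choice made for this illustration only.

```python
import pandas as pd

example = pd.Series([1, 2, 3, 4, 5],
                    index=['a', 'b', 'c', 'd', 'e'])

# Build a one-column DataFrame from the Series
# ('value' is a made-up column name for this sketch)
frame = pd.DataFrame({'value': example})

# Selecting a single column gives back a Series
column = frame['value']

print(type(column).__name__)
print(column)
```

The column keeps the same labels and values as the original Series, which is why it helps to be comfortable with a Series before moving on.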
Summary: a Series is a list-like structure with a customizable index, and it serves as the building block of the DataFrame.
