Pandas provides two basic data structures, called Series and DataFrame.
Pandas is also one of the most important Python libraries for data manipulation and analysis.
Like the NumPy package, pandas is installed with the command “pip install pandas”, run from a terminal or command prompt. The installed package can then be loaded with the import command “import pandas as pd”; pd is the alias through which we access the package's functions for our data.
Series:
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
s = pd.Series(data)
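As a minimal sketch of what "labeled" means here, the example below passes an explicit index along with the data. The values and the labels x, y, z are made up for illustration only.

```python
import pandas as pd

# Values and labels are hypothetical, chosen only for this example.
s = pd.Series([10, 20, 30], index=["x", "y", "z"])

print(s.index.tolist())  # the axis labels, i.e. the index
print(s["y"])            # look up a value by its label
```

When no index is passed, pandas falls back to the default integer labels 0, 1, 2 and so on.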
Now, we will jump into the basic operations on pandas series in python.
Creating a Pandas Series – To create a series from an array, we import the numpy module and use its array() function. Check my Numpy Basics blog to know more about numpy arrays.
import pandas as pd
import numpy as np
# Simple array
data = np.array(['g', 'e', 'e', 'k', 's'])
# Convert the numpy array into a Series object and store it in a variable
series = pd.Series(data)
In the above operation, we created a series from a numpy array. Likewise, we can create a series from a Python list.
# A simple list (named letters so it does not shadow the built-in list)
letters = ['g', 'e', 'e', 'k', 's']
# Create a series from the list
series = pd.Series(letters)
print(series)

Accessing Elements from Series:
The elements of a Series can be accessed by position as well as by index label.
# Creating a simple array
data = np.array(['g', 'e', 'e', 'k', 's', 'f', 'o', 'r', 'g', 'e', 'e', 'k', 's'])
series = pd.Series(data)
# Retrieving the first 5 elements
print(series[:5])
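To make the difference between position and label concrete, here is a small sketch using the .iloc and .loc indexers. The labels a to e are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical labels, used only to show label-based access.
s = pd.Series(["g", "e", "e", "k", "s"], index=["a", "b", "c", "d", "e"])

first = s.iloc[0]             # access by integer position
by_label = s.loc["d"]         # access by index label
pos_slice = s.iloc[1:3]       # position slices exclude the end position
label_slice = s.loc["b":"d"]  # label slices include both endpoints

print(first, by_label)
print(pos_slice.tolist())
print(label_slice.tolist())
```

Note that the two kinds of slice behave differently at the end point, which is an easy thing to trip over.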
Let’s check on some binary operations on series:
# importing pandas module
import pandas as pd
# creating a series and assigning it to the variable data
data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])
# creating a series and assigning it to the variable data1
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])
print(data, "\n\n", data1)
# adding two series using .add
data.add(data1)
# subtracting two series using .sub
data.sub(data1)
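The two series above do not share all of their labels, so it may help to see how pandas lines them up. The sketch below reuses the same values; labels present in only one series produce NaN, and the optional fill_value argument of add() substitutes a default for the missing side.

```python
import pandas as pd

data = pd.Series([5, 2, 3, 7], index=["a", "b", "c", "d"])
data1 = pd.Series([1, 6, 4, 9], index=["a", "b", "d", "e"])

# Values are matched by label, not by position.
aligned = data.add(data1)

# Treat a label missing from one series as 0 instead of producing NaN.
filled = data.add(data1, fill_value=0)

print(aligned)
print(filled)
```

Here 'c' exists only in data and 'e' only in data1, so both come out as NaN in the first result and keep their single known value in the second.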
mul() – Multiplies the caller series element-wise with another series or list-like object of the same length.
div() – Divides the caller series element-wise by another series or list-like object of the same length.
sum() – Returns the sum of the values for the requested axis.
prod() – Returns the product of the values for the requested axis.
mean() – Returns the mean of the values for the requested axis.
pow() – Raises each element of the caller series to the power given by the corresponding element of the passed series and returns the result.
abs() – Returns the absolute numeric value of each element in a Series/DataFrame.
cov() – Computes the covariance of two series.
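The methods listed above can be tried on a pair of small series. This is only a sketch; the numbers in a, b, and c are made up for illustration.

```python
import pandas as pd

# Made-up values for illustration.
a = pd.Series([1, 2, 3, 4])
b = pd.Series([2, 2, 2, 2])
c = pd.Series([2, 4, 6, 8])

product = a.mul(b)    # element-wise a * b
quotient = a.div(b)   # element-wise a / b (caller divided by argument)
powered = a.pow(b)    # element-wise a ** b

print(product.tolist(), quotient.tolist(), powered.tolist())
print(a.sum(), a.prod(), a.mean())

absolute = pd.Series([-1, 2, -3]).abs()
covariance = a.cov(c)  # sample covariance of a and c
print(absolute.tolist(), covariance)
```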
DataFrame:
A pandas data frame consists of three main components: the data, the rows, and the columns.
A pandas data frame can be created by loading data from existing external storage, such as a SQL database or a CSV file.
A pandas data frame can also be created from Python objects such as lists and dictionaries.
One of the ways to create a pandas data frame is shown below:
#import the pandas library
import pandas as pd
# Dictionary of key pair values called data
data = {'Name':['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh'], 'Age': [24, 23, 22, 19, 10]}
# Displaying data gives:
# {'Name': ['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh'], 'Age': [24, 23, 22, 19, 10]}
# Calling the pandas data frame method by passing the dictionary (data) as a parameter
df = pd.DataFrame(data)
df

Operations on Rows and Columns:
Selecting a Column: To select a particular column, we pass the name of the column in square brackets after the data frame.
# Creating the data frame by passing the dictionary (data) as a parameter
df = pd.DataFrame(data)
# Selecting a column
df[['Name']]
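The selection above uses double brackets. As a small sketch with the same data, the example below contrasts that form with single brackets: a single name gives back a Series, while a list of names gives back a DataFrame.

```python
import pandas as pd

data = {"Name": ["Ashika", "Tanu", "Ashwin", "Mohit", "Sourabh"],
        "Age": [24, 23, 22, 19, 10]}
df = pd.DataFrame(data)

as_series = df["Name"]        # single brackets: a Series
as_frame = df[["Name"]]       # a list of names: a one-column DataFrame
both = df[["Name", "Age"]]    # the same form selects several columns

print(type(as_series).__name__, type(as_frame).__name__)
print(both.shape)
```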

Selecting a Row: A pandas data frame provides an indexer called “loc”, which is used to retrieve rows from the data frame by label. Rows can also be selected by integer position using the “iloc” indexer. Both are used with square brackets rather than called like functions.
# Creating the data frame by passing the dictionary (data) as a parameter
df = pd.DataFrame(data)
# Selecting a row
row = df.loc[1]
row
# Output
# Name    Tanu
# Age       23
# Name: 1, dtype: object
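The loc indexer selects by index label, so df.loc[1] works here only because the default row labels are the integers 0, 1, 2, and so on. It is the iloc indexer that accepts only integer positions.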
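To see how loc and iloc differ, it helps to use row labels that are not integers. The sketch below assumes hypothetical row labels r1, r2, r3, which do not appear in the data frame used earlier.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Ashika", "Tanu", "Ashwin"], "Age": [24, 23, 22]},
    index=["r1", "r2", "r3"],  # hypothetical row labels for illustration
)

by_label = df.loc["r2"]     # loc looks the row up by its label
by_position = df.iloc[1]    # iloc looks the row up by its integer position

print(by_label)
print(by_position)
```

Both lookups return the same row here, since "r2" is the label of the row at position 1.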
Working with Missing Data:
Missing data turns up frequently when we work with big data sets. It usually appears as NaN (Not a Number). To detect those values, we can use the “isnull()” method, which checks each cell of a data frame and reports whether it is null.
# importing both pandas and numpy libraries
import pandas as pd
import numpy as np
# Dictionary of key pair values called data
data = {'First name': ['Tanu', np.nan], 'Age': [23, np.nan]}
df = pd.DataFrame(data)
df

# using the isnull() function
df.isnull()

isnull() returns False for each cell that holds a value and True for each cell that is null.
Now that we have found the missing values, the next task is to fill them with 0. This can be done as shown below:
df.fillna(0)
fillna(0) returns a new data frame in which the null values are replaced with 0. The original df stays unchanged unless the result is assigned back to it.
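As a small extension of the steps above, the sketch below counts the missing values per column and fills each column with its own default. The placeholder "unknown" is an assumption made for this example, not something the data requires.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"First name": ["Tanu", np.nan], "Age": [23, np.nan]})

# isnull() gives True/False per cell; summing counts the True values per column.
missing_per_column = df.isnull().sum()

# A dictionary lets each column get its own fill value.
# "unknown" is a hypothetical placeholder chosen for this example.
filled = df.fillna({"First name": "unknown", "Age": 0})

print(missing_per_column)
print(filled)
```

Filling a name column with 0 rarely makes sense, so a per-column default is often the more readable choice.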
We have now reached the end of this blog. Hope you enjoyed reading it!
Take care and do comment below 🙂