Pandas Basics

Pandas provides two basic data structures, called Series and DataFrame.

Pandas is also one of the most important Python libraries for data manipulation and analysis.

Like the NumPy package, pandas is installed with the command “pip install pandas”, run from a terminal or command prompt. The installed package can then be loaded with the import statement “import pandas as pd”, where pd is the alias we use to access everything the package provides.

Series:

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

s = pd.Series(data)
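To make the idea of an index more concrete, here is a minimal sketch. The values and the labels "x", "y", "z" are made up for illustration; the point is only that a Series carries labels alongside its values.

```python
import pandas as pd

# Without an explicit index, pandas labels the values 0, 1, 2, ...
s_default = pd.Series([10, 20, 30])

# With an explicit index, each value gets the label we choose
s_labeled = pd.Series([10, 20, 30], index=["x", "y", "z"])

# A dictionary also works: its keys become the index labels
s_from_dict = pd.Series({"x": 10, "y": 20, "z": 30})

print(s_default.index.tolist())   # [0, 1, 2]
print(s_labeled.index.tolist())   # ['x', 'y', 'z']
print(s_from_dict["y"])           # 20
```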

Now, we will jump into the basic operations on pandas series in python.

Creating a Pandas Series – To create a series from an array, we import the numpy module and use its array() function. Check my Numpy Basics blog to know more about numpy arrays.

import pandas as pd
import numpy as np

#Simple Array
data = np.array(['g', 'e', 'e','k','s'])

#Converting numpy array into series object and storing the variable
series = pd.Series(data)

In the above operation, we created a series from a numpy array. Likewise, we can create a series from a Python list.

# a simple list (named letters so it does not shadow the built-in list)
letters = ['g', 'e', 'e', 'k', 's']

# create a series from a list
series = pd.Series(letters)
print(series)

Accessing Elements from Series:

Series elements can be accessed by position as well as by index label.

# creating simple array
data = np.array(['g', 'e', 'e', 'k', 's', 'f', 'o', 'r', 'g', 'e', 'e', 'k', 's'])
series = pd.Series(data)

# Retrieving the first 5 elements
print(series[:5])
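The difference between access by position and access by label is easier to see when the labels are not plain integers. The sketch below assumes a small Series with made-up labels "a" to "e" and uses the iloc indexer for positions and the loc indexer for labels.

```python
import pandas as pd

# A small series with string labels (labels chosen for illustration)
letters = pd.Series(["g", "e", "e", "k", "s"], index=["a", "b", "c", "d", "e"])

# Access by integer position
first_by_position = letters.iloc[0]

# Access by index label
value_by_label = letters.loc["d"]

# Position slices exclude the end position
first_three = letters.iloc[:3]

# Label slices include both endpoints
label_slice = letters.loc["b":"d"]

print(first_by_position, value_by_label)   # g k
print(first_three.tolist())                # ['g', 'e', 'e']
print(label_slice.tolist())                # ['e', 'e', 'k']
```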

Let’s check on some binary operations on series:

# importing pandas module
import pandas as pd

# creating a series and assigning it to a variable data
data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])

# creating a series and assigning it to a variable data1
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])
print(data, "\n\n", data1)

# adding two series using .add (values are matched by index label)
print(data.add(data1))

# subtracting two series using .sub
print(data.sub(data1))
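One thing worth noticing in this example is that the two series do not share all of their labels. Pandas aligns values by label, so a label present in only one series gives NaN in the result. A minimal sketch of that behavior, reusing the same two series and the fill_value parameter of add():

```python
import pandas as pd

data = pd.Series([5, 2, 3, 7], index=["a", "b", "c", "d"])
data1 = pd.Series([1, 6, 4, 9], index=["a", "b", "d", "e"])

# Labels 'c' and 'e' exist in only one series, so their sums are NaN
total = data.add(data1)

# fill_value=0 treats a missing partner as 0 instead of producing NaN
total_filled = data.add(data1, fill_value=0)

print(total.isna().tolist())     # [False, False, True, False, True]
print(total_filled.tolist())     # [6.0, 8.0, 3.0, 11.0, 9.0]
```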

mul() – Multiplies the caller series element-wise by another series (aligned by index) or by a list-like object of the same length.

div() – Divides the caller series element-wise by another series (aligned by index) or by a list-like object of the same length.

sum() – Returns the sum of the values for the requested axis.

prod() – Returns the product of the values for the requested axis.

mean() – Returns the mean of the values for the requested axis.

pow() – Raises each element of the caller series to the power given by the corresponding element of the passed series and returns the results.

abs() – Returns the absolute numeric value of each element in a Series/DataFrame.

cov() – Computes the covariance between the caller series and another series.
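To see these methods in action, here is a short sketch on two small series. The numbers are made up so the results are easy to check by hand.

```python
import pandas as pd

a = pd.Series([1.0, -2.0, 3.0])
b = pd.Series([2.0, 4.0, 6.0])

product = a.mul(b)       # element-wise a * b
quotient = a.div(b)      # element-wise a / b (caller divided by argument)
squared = a.pow(2)       # each element of a raised to the power 2
absolute = a.abs()       # absolute value of each element

print(product.tolist())   # [2.0, -8.0, 18.0]
print(quotient.tolist())  # [0.5, -0.5, 0.5]
print(squared.tolist())   # [1.0, 4.0, 9.0]
print(absolute.tolist())  # [1.0, 2.0, 3.0]

# Reductions return a single number
print(a.sum())            # 2.0
print(a.prod())           # -6.0
print(a.mean())           # mean of the three values
print(a.cov(b))           # sample covariance of a and b
```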

DataFrame:

A pandas data frame consists of three main components: the data, the rows, and the columns.

A pandas data frame can be created by loading data from existing external storage, such as a SQL database or CSV files.

A pandas data frame can also be created from lists, dictionaries, etc.

One of the ways to create a pandas data frame is shown below:

# import the pandas library
import pandas as pd

# Dictionary of key-value pairs called data
data = {'Name': ['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh'],
        'Age': [24, 23, 22, 19, 10]}

# Output of evaluating data:
# {'Name': ['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh'], 'Age': [24, 23, 22, 19, 10]}

# Creating the data frame by passing the dictionary (data) as a parameter
df = pd.DataFrame(data)
df
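The text above also mentions loading a data frame from a CSV file. Here is a minimal sketch of that, together with building a data frame from a list of dictionaries. The CSV text is made up for illustration and is read through io.StringIO so the example does not need a file on disk; with a real file you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# CSV content held in a string (stand-in for a real file)
csv_text = "Name,Age\nAshika,24\nTanu,23\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# The same table built from a list of dictionaries, one per row
records = [{"Name": "Ashika", "Age": 24}, {"Name": "Tanu", "Age": 23}]
df_from_records = pd.DataFrame(records)

print(df_from_csv.shape)                 # (2, 2)
print(df_from_records.columns.tolist())  # ['Name', 'Age']
```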

Operations on Rows and Columns:

Selecting a Column: To select a particular column, pass the column name inside square brackets on the data frame.

# Calling the pandas data frame method by passing the dictionary (data) as a parameter 
df = pd.DataFrame(data)

# Selecting column 
df[['Name']]
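Note that the double brackets matter here. A rough sketch of the difference, using the same data dictionary as above: a single name returns a Series, while a list of names returns a DataFrame.

```python
import pandas as pd

data = {'Name': ['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh'],
        'Age': [24, 23, 22, 19, 10]}
df = pd.DataFrame(data)

# A single column name returns a Series
name_series = df['Name']

# A list with one name returns a one-column DataFrame
name_frame = df[['Name']]

# A list with several names selects several columns at once
both = df[['Name', 'Age']]

print(type(name_series).__name__)  # Series
print(type(name_frame).__name__)   # DataFrame
print(both.shape)                  # (5, 2)
```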

Selecting a Row: A pandas data frame provides an indexer called “loc”, which retrieves rows by their index label. Rows can also be selected by integer position using “iloc”. Both are used with square brackets rather than being called like functions.

# Calling the pandas data frame method by passing the dictionary (data) as a parameter
  
df = pd.DataFrame(data) 
 
# Selecting a row 
row = df.loc[1] 
row 

# output
Name    Tanu
Age       23
Name: 1, dtype: object

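The loc indexer selects rows by index label. The integer 1 works here only because the default index labels are the integers 0, 1, 2 and so on. To select strictly by integer position, use iloc.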
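The difference between loc and iloc shows up once the index is not the default one. In this sketch the row labels "r1", "r2", "r3" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Ashika", "Tanu", "Ashwin"], "Age": [24, 23, 22]},
    index=["r1", "r2", "r3"],
)

# loc selects by label
by_label = df.loc["r2"]

# iloc selects by integer position (0-based)
by_position = df.iloc[1]

# Both refer to the same row here
print(by_label.equals(by_position))  # True

# loc also accepts a row label and a column label together
print(df.loc["r2", "Age"])           # 23
```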

Working with Missing Data:

Missing data is common when we work with big data sets. It usually appears as NaN (Not a Number). In order to detect those values, we can use the “isnull()” method. This method checks each cell of a data frame for a null value.

# importing both pandas and numpy libraries
import pandas as pd
import numpy as np

# Dictionary of key-value pairs called data
data = {'First name': ['Tanu', np.nan], 'Age': [23, np.nan]}
df = pd.DataFrame(data)
df

# using the isnull() function

df.isnull()

The isnull() method returns a data frame of the same shape, with True where a value is null and False where it is not.
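On a larger data set a grid of True/False values is hard to read, so it often helps to summarize it. A minimal sketch, assuming the same small data frame as above:

```python
import numpy as np
import pandas as pd

data = {'First name': ['Tanu', np.nan], 'Age': [23, np.nan]}
df = pd.DataFrame(data)

# Number of missing values in each column
missing_per_column = df.isnull().sum()

# Total number of missing values in the whole data frame
total_missing = int(missing_per_column.sum())

# Rows that contain at least one missing value
rows_with_missing = df[df.isnull().any(axis=1)]

print(missing_per_column.to_dict())      # {'First name': 1, 'Age': 1}
print(total_missing)                     # 2
print(rows_with_missing.index.tolist())  # [1]
```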

Now that we have found the missing values, the next task is to fill them with 0. This can be done as shown below:

df.fillna(0)

The null values are now replaced with 0 in the returned data frame. To keep the change, assign the result back, for example df = df.fillna(0).
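Filling everything with 0 is not always what we want. For example, 0 is an odd value for a name. Here is a short sketch of two alternatives, assuming the same small data frame: filling each column with its own value by passing a dictionary to fillna(), and dropping incomplete rows with dropna(). The replacement value "unknown" is just a placeholder chosen for illustration.

```python
import numpy as np
import pandas as pd

data = {'First name': ['Tanu', np.nan], 'Age': [23, np.nan]}
df = pd.DataFrame(data)

# fillna returns a new data frame; df itself is left unchanged
filled = df.fillna({'First name': 'unknown', 'Age': 0})

# dropna removes every row that has at least one missing value
dropped = df.dropna()

print(int(df.isnull().sum().sum()))      # 2 (original still has its NaNs)
print(int(filled.isnull().sum().sum()))  # 0
print(filled.loc[1, 'First name'])       # unknown
print(len(dropped))                      # 1
```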

We have now reached the end of this blog. I hope you enjoyed reading it!

Take care and do comment below 🙂
