7. Self Review and Practice with Pandas

Learning Objectives

Be able to:

  • Read an Excel file into a pandas DataFrame

  • Check and correct column data types in a dataframe

  • Drop columns containing no data (i.e., every value is NaN)

  • Drop rows with missing values

  • Add new columns to the dataframe

  • Plot the data

  • Perform a non-linear fit to the data and interpret the results

7.1. Lesson

There is no lesson this week. This is a practice session that draws on previous lessons, especially week 3, “working with data.”

Warning

The fitting function curve_fit in scipy calls your model function with numpy arrays, so define the model with numpy functions rather than functions from the math module, which accept only single numbers. For example, np.e**(-k*t**n) or np.exp(-k*t**n) will work, but math.exp(-k*t**n) will fail inside curve_fit.
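The warning above can be illustrated with a minimal sketch. The data here are synthetic, generated from made-up parameters (k = 0.05, n = 2) purely for illustration; the names `model`, `t_demo`, and `y_demo` are hypothetical and are not part of the exercise data.

```python
import math

import numpy as np
from scipy.optimize import curve_fit


def model(t, k, n):
    """y = 1 - exp(-k * t**n); np.exp works element-wise on arrays."""
    return 1 - np.exp(-k * t**n)


# Synthetic, noise-free data from made-up parameters (illustration only)
t_demo = np.linspace(0.5, 10, 20)
y_demo = model(t_demo, 0.05, 2.0)

# math.exp accepts a single number, so passing an array raises a TypeError
try:
    math.exp(-0.05 * t_demo**2)
except TypeError as err:
    print("math.exp failed:", err)

# p0 is an initial guess for (k, n); curve_fit returns the best-fit
# parameters and their covariance matrix
popt, pcov = curve_fit(model, t_demo, y_demo, p0=[0.1, 1.5])
k_fit, n_fit = popt

# Evaluate the fitted model at a new time by unpacking the fitted parameters
y_at_12 = model(12.0, *popt)
```

The same pattern, `model(t_new, *popt)`, can be used to predict a value at any time once the fit has converged.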

7.2. Exercises

7.2.1. Problem 1

Perform the following:
a. Read in the crystallization data file (07_data.xlsx)
b. Check the type of each column using .info()
c. Drop columns that contain no values (dropna with how="all") - make sure you use the correct axis
d. Convert columns to numeric
e. Drop any row containing NaN (dropna with how="any") - make sure you use the correct axis
f. Add a new column “time, hrs” by converting “time, sec” to hours
g. Plot the time in hours on the x-axis and crystallization fraction on the y-axis. Label your axes.
h. Fit your data (time in hours, crystallization fraction) to the following function: \(y=1-e^{-k \ t^n}\)
i. Replot your data (small black points) and include your fit (red line)
j. Using your fitted function, determine the crystallization fraction after 125 hours.
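
The cleaning steps (b through f) can be sketched on a small made-up table. This is only an illustration under stated assumptions: the values are invented, and the column names "fraction" and "notes" are hypothetical stand-ins. With the real file, one would start from `pd.read_excel("07_data.xlsx")` instead of building the DataFrame by hand.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a messy spreadsheet (values are invented)
raw = pd.DataFrame({
    "time, sec": ["0", "3600", "7200", "bad", "14400"],
    "fraction": ["0.00", "0.10", None, "0.45", "0.70"],  # hypothetical name
    "notes": [np.nan] * 5,                               # column with no data
})

raw.info()  # columns holding text are not reported as numeric

# axis=1 with how="all": drop columns in which every value is missing
df = raw.dropna(axis=1, how="all")

# errors="coerce" turns unparseable entries such as "bad" into NaN
df = df.apply(pd.to_numeric, errors="coerce")

# axis=0 with how="any": drop rows with at least one missing value,
# then renumber the remaining rows
df = df.dropna(axis=0, how="any").reset_index(drop=True)

# 3600 seconds per hour
df["time, hrs"] = df["time, sec"] / 3600
```

Calling `df.info()` again after the conversion is a quick way to confirm that every remaining column is numeric before plotting or fitting.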

7.2.2. Problem 2 (not graded)

Problem 1 is a typical example of what you may encounter in lab, senior design, or the workplace. If you had any trouble with Problem 1, update your cheat sheet so it helps you next time.