4. Plotting and Equation Solving#

Learning Objectives

Be able to:

  • import libraries numpy, sympy, matplotlib;

  • symbolically manipulate an equation using sympy.solve();

  • generate x and y data for a function using numpy.linspace();

  • plot the x, y data using matplotlib.pyplot.plot();

  • add horizontal lines and text to a plot; and

  • use pandas read_csv to open a data file

  • group (aggregate) data using pandas: groupby(), sum(), mean()

You will need the following extensions:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sympy.solvers import solve #for solving equations symbolically
from sympy import symbols #for solving equations symbolically

4.1. Hints for Problem 1#

\[ \mathbf{\text{% crystallinity} = \frac{\rho_c (\rho_s - \rho_a)}{\rho_s (\rho_c - \rho_a)} * 100} \]

For this problem you will need to define a function to find the crystallinity based on the values of density as shown above. Your function should look like:

def crystallinity(density_s, density_a, density_c):
    percent_crys=   # finish this statement
    return percent_crys

crystallinity(0.925, 0.870, 0.998)  # checking your function output

Write a function that takes percent_crystallinity, crystalline_density and amorphous_density and returns the sample density. For example,

def sample_density(percent_crystallinity, density_a, density_c):  
     density_s=   #finish this equation
     return density_s

To find the equation for the sample density, you will want to solve the crystallinity equation for the sample_density. You can do this using sympy after loading the following packages:

from sympy.solvers import solve #for solving equations symbolically
from sympy import symbols #for solving equations symbolically

4.1.1. Example of solving an equation#

Let’s say you wanted to solve the equation \(a x^2 + b x + c = d\). First, we tell python that a,b,c,d,x are symbols.

from sympy.solvers import solve #for solving equations symbolically
from sympy import symbols #for solving equations symbolically
a,b,c,d,x = symbols('a b c d x')

Then we can use the solve command.

Important

The equation that goes into the solve command must equal zero so we rewrite \(a x^2 + b x + c = d\) as \(a x^2 + b x + c - d=0\). Notice that we do not include the “=0” part in the solve function below because that is assumed.

# solve for x
solve(a*x**2+b*x+c-d, x)
[(-b - sqrt(-4*a*c + 4*a*d + b**2))/(2*a),
 (-b + sqrt(-4*a*c + 4*a*d + b**2))/(2*a)]
def x_solution(a,b,c,d):
    return (-b + np.sqrt(-4*a*c + 4*a*d + b**2))/(2*a)

x_solution(2,9,5,10)
0.5
x_solution(2,9,0.4,10)
0.8904617494884413

Let’s find our x solution for different values of the parameter “d”

The np.linspace(min, max, numpts) creates evenly spaced points between two numbers. The number of points you want is given by the third number.

dd=np.linspace(0,5,5) #first pt =0, last point=5, total points=5
print('linspace list dd = ',dd)
[x_solution(2,9,5,d) for d in dd] #list comprehension loop
linspace list dd =  [0.   1.25 2.5  3.75 5.  ]
[-0.6492189406417879,
 -0.4646428928642874,
 -0.2974375810233365,
 -0.14346255670591024,
 0.0]

Let’s use the above method to find x solutions for 20 d values between 0 and 50. Example annotations are also included in the plot.

d=np.linspace(0, 50, 20) #creates 20 evenly spaced d-values from 0 to 50
x_sol=x_solution(2,9,5,d)
plt.plot(d,x_sol,'ro')
plt.xlabel('d')
plt.ylabel('x solution')
plt.annotate('$a x^2 + b x + c = d$', xy=(5, 2.5), color='b') # you can include text on your plot
plt.annotate('$a x^2$', xy=(40, 0.5), color='b') # you can include text on your plot
plt.show()
../_images/79d326451e8108e94901f892ac7f9f8c55b4351280280317fc52ed87c1ec2a5d.png

4.2. Hints for Problem 2#

Last week we calculated the value of \(\left(4\times \frac{\text{pts in circle}}{\text{total num pts}}\right)\) for 1000 random points. This week you will plot the value as a function of the number of random points. That is, you want to write a function that takes a parameter, \(n\), for the total number of points and returns the value \(\left(4\times \frac{\text{pts in circle}}{\text{total num pts}}\right)\).

My code for 1000 random points was:

xcoord=(2)*np.random.random(1000)-1 #1000 random pts from -1 to 1
ycoord=(2)*np.random.random(1000)-1 #1000 random pts from -1 to 1
magnitude=np.sqrt(xcoord**2+ycoord**2) #length from center (0,0) to point
mag_test=magnitude<=1
4*len(xcoord[mag_test])/len(xcoord)

Check what len(xcoord[magnitude<=1]) and len(xcoord) does in my code above.

Since your n value in this problem will vary from 100 to 1,000,000 (i.e. several orders of magnitude) it is better to use geomspace() rather than linspace(). geomspace() gives pts spaced on a log scale which is much better when our values change by orders of magnitude.

Don’t forget that your function above requires an integer to be input so you will want to make sure geomspace() is giving you an integer or round your list to the nearest integer.

xdata=np.geomspace(100,1e6,num=50, dtype=int) #tell it to round to the closest integer dtype=int
print('geomspace: notice the nice sample of values across all magnitudes\n')
print(xdata)
# notice the increasing step size as our numbers get larger
# compare this to linspace
print('\nlinspace: notice this skips from 100 to 20506 for the first point\n')
print(np.linspace(100, 1e6, num=50, dtype=int))
geomspace: notice the nice sample of values across all magnitudes

[    100     120     145     175     212     255     308     372     449
     542     655     790     954    1151    1389    1676    2023    2442
    2947    3556    4291    5179    6250    7543    9102   10985   13257
   15998   19306   23299   28117   33932   40949   49417   59636   71968
   86851  104811  126485  152641  184206  222299  268269  323745  390693
  471486  568986  686648  828642 1000000]

linspace: notice this skips from 100 to 20506 for the first point

[    100   20506   40912   61318   81724  102130  122536  142942  163348
  183755  204161  224567  244973  265379  285785  306191  326597  347004
  367410  387816  408222  428628  449034  469440  489846  510253  530659
  551065  571471  591877  612283  632689  653095  673502  693908  714314
  734720  755126  775532  795938  816344  836751  857157  877563  897969
  918375  938781  959187  979593 1000000]

4.3. Hints for Problem 3#

Import the csv file titanic.csv and (a) determine the percentage of woman that survived and the percentage of men that survived and (b) determine the average ticket price (Fare) for those that survived versus those that did not survive. 0 = did not survive. 1 = survived.

Last week we learned how to use pandas to import an excel file; clean imported data including dropping NAN, filling NAN, renaming columns, dropping rows, sorting, slicing, and adding new columns.

*Now we need to learn how to group (aggregate) data: groupby(), sum(), mean()
Let’s take a look at a data file containing information about the insulation thickness from a wire coat machine versus several parameters.

# This is a direct read of the file "raw_clean_cable_insulation.csv" from a google drive. 
raw=pd.read_csv('https://drive.google.com/uc?id=1zmsrtUgXZhJJzERQXFKkUVoCD1zNbBSr')
raw.head(3) #looking at only first 3 records
Unnamed: 0.1 Unnamed: 0 ExtrTemp LineSpeed WaterTemp Measure City
0 0 0 234.83 55.28 26.67 170.67 Kairo
1 1 1 235.09 54.13 26.97 182.86 Kairo
2 2 2 235.09 54.33 27.46 182.74 Kairo
# dropping first two columns here since I don't need them
df=raw.drop(['Unnamed: 0.1','Unnamed: 0'],axis=1)
df
ExtrTemp LineSpeed WaterTemp Measure City
0 234.83 55.28 26.67 170.67 Kairo
1 235.09 54.13 26.97 182.86 Kairo
2 235.09 54.33 27.46 182.74 Kairo
3 235.05 55.00 26.83 175.49 Kairo
4 235.04 55.26 26.62 172.80 Kairo
... ... ... ... ... ...
9992 234.92 55.09 26.98 173.84 Cairo
9993 235.22 55.91 27.30 171.86 Cairo
9994 235.13 55.43 26.99 173.59 Cairo
9995 235.05 54.73 26.73 177.21 Cairo
9996 235.10 54.57 27.21 180.25 Cairo

9997 rows × 5 columns

# rounding some of the numbers so they group better
# i.e. 234.92 and 235.22 etc should all just be 235.  
# you won't need to do this for the titanic data
df['LineSpeed']=df.loc[:,'LineSpeed'].round(1)
df['ExtrTemp']=df.loc[:,'ExtrTemp'].round(0)
df
ExtrTemp LineSpeed WaterTemp Measure City
0 235.0 55.3 26.67 170.67 Kairo
1 235.0 54.1 26.97 182.86 Kairo
2 235.0 54.3 27.46 182.74 Kairo
3 235.0 55.0 26.83 175.49 Kairo
4 235.0 55.3 26.62 172.80 Kairo
... ... ... ... ... ...
9992 235.0 55.1 26.98 173.84 Cairo
9993 235.0 55.9 27.30 171.86 Cairo
9994 235.0 55.4 26.99 173.59 Cairo
9995 235.0 54.7 26.73 177.21 Cairo
9996 235.0 54.6 27.21 180.25 Cairo

9997 rows × 5 columns

  1. Let’s find the percentage of Measurements made in Alexandria vs the total number of measurements.

  2. Let’s also find the average WaterTemp for all measurements made in Alexandria at an ExtrTemp of 235.0

# groupby collects all records for the different cities and then
# .count() counts how many records there are in each city grouping
dfgroup=df.groupby(['City']).count()
dfgroup
ExtrTemp LineSpeed WaterTemp Measure
City
Alexandria 5520 5520 5520 5520
CAI 17 17 17 17
Cairo 4413 4413 4413 4413
Kairo 47 47 47 47
# using .sum() we can get the total number of records in each column
# by default .sum() sums down columns
dfgroup.sum()
ExtrTemp     9997
LineSpeed    9997
WaterTemp    9997
Measure      9997
dtype: int64
# calculating percent of records in Alexandria vs total of 9997
# using .loc[] to pull from dataframe "dfgroup.loc[row, column]"
dfgroup.loc['Alexandria','Measure']/9997*100 #percent
55.21656496949085
# now we are grouping by two columns
# we first collect the Cities
# then group within each city by the ExtrTemp (extruder temperature)
# here we take the mean or average value in these two categories for each column
dfmean=df.groupby(['City','ExtrTemp']).mean()
dfmean
LineSpeed WaterTemp Measure
City ExtrTemp
Alexandria 234.0 54.894118 27.056471 170.712353
235.0 55.003924 26.999927 175.425584
236.0 55.115152 27.000606 180.169091
CAI 235.0 55.176471 26.965882 174.288235
Cairo 234.0 54.980645 27.035806 169.769032
235.0 54.994089 26.993054 175.444202
236.0 55.014706 27.005294 181.228824
Kairo 235.0 55.093617 26.941915 174.370426
# the mean value of WaterTemp in Alexandria at an ExtrTemp of 235 degrees is
# note row is ('Alexandria',235.0) and the column we want is 'WaterTemp'
dfmean.loc[('Alexandria',235.0), 'WaterTemp']
26.999926645883

4.3.1. Pivot Table#

For a lot more control, we could use pivot_table to specify different aggregation functions for each column of values we want to analyze. You should be able to do what you want with groupby() but the pivot_table is a more manual way to specify many options.

pd.pivot_table(df, values=['Measure', 'WaterTemp','LineSpeed'], index=['City', 'ExtrTemp'],
                    aggfunc={'Measure': np.mean,
                             'WaterTemp': [min, max, np.mean],
                            'LineSpeed':[np.sum,np.mean,np.size]}).style
    LineSpeed Measure WaterTemp
    mean size sum mean max mean min
City ExtrTemp              
Alexandria 234.000000 54.894118 34 1866.400000 170.712353 27.560000 27.056471 26.520000
235.000000 55.003924 5453 299936.400000 175.425584 27.700000 26.999927 26.150000
236.000000 55.115152 33 1818.800000 180.169091 27.440000 27.000606 26.420000
CAI 235.000000 55.176471 17 938.000000 174.288235 27.330000 26.965882 26.440000
Cairo 234.000000 54.980645 31 1704.400000 169.769032 27.410000 27.035806 26.580000
235.000000 54.994089 4348 239114.300000 175.444202 27.660000 26.993054 26.290000
236.000000 55.014706 34 1870.500000 181.228824 27.300000 27.005294 26.700000
Kairo 235.000000 55.093617 47 2589.400000 174.370426 27.460000 26.941915 26.380000

4.4. Exercises#

4.4.1. Problem 1:#

Computations of the Density and Percent Crystallinity of Polyethylene (Callister 10th ed., section 14.11)

The % crystallinity is given by:

\[ \mathbf{\text{% crystallinity} = \frac{\rho_c (\rho_s - \rho_a)}{\rho_s (\rho_c - \rho_a)} * 100} \]

(1) Write a function that calculates the percent crystallinity of a branched polyethylene sample \(\rho_s\) that has a density of \(0.925\ 𝑔\ 𝑐𝑚^{-3}\) . The density of a totally amorphous material \(\rho_a\) is \(0.870 \ 𝑔\ 𝑐𝑚^{-3}\) and the density of a totally crystalline \(\rho_c\) material is \(0.998 \ 𝑔\ 𝑐𝑚^{-3}\) .

(2) Calculate the sample density \(\rho_s\) versus % crystallinity and reproduce the plot below (including the horizontal lines and text). To find the equation for the sample density use the solve commands from sympy to solve the above crystallinity equation for \(\rho_s\). An example of using solve can be found in this lesson.

(3) Export your data to an excel file.

../_images/04_2114_prob_1.png

4.4.2. Problem 2:#

Write a function that finds “x” random points in a square of edge length 2 centered at (0,0). Determines how many of these points are contained in a circle of radius 1 centered at (0,0). Returns the value \(\left(4\times \frac{\text{pts in circle}}{\text{total num pts}}\right)\). Plot the value \(\left(4\times \frac{\text{pts in circle}}{\text{total num pts}}\right)\) versus number of points, x. Your x should vary between 100 and 1e6. Also include a horizontal line at the value of pi on your plot.

4.4.3. Problem 3:#

Import the csv file “titanic.csv” and (a) determine the percentage of woman that survived and the percentage of men that survived and (b) determine the average ticket price (Fare) for those that survived versus those that did not survive. 0 = did not survive. 1 = survived.