# Introduction to Financial Python

## NumPy and Basic Pandas

### NumPy

NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. It also has strong integration with Pandas, which is another powerful tool for manipulating financial data.

Python packages like NumPy and Pandas contain classes and methods which we can use by importing the package:

    import numpy as np

#### Basic NumPy Arrays

A NumPy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. Here we make an array by passing a list of Apple stock prices:

    price_list = [143.73, 145.83, 143.68, 144.02, 143.5, 142.62]
    price_array = np.array(price_list)
    print(price_array, type(price_array))
    [out]: [ 143.73 145.83 143.68 144.02 143.5 142.62] <class 'numpy.ndarray'>

Notice that the type of array is "ndarray" which is a multi-dimensional array. If we pass np.array() a list of lists, it will create a 2-dimensional array.

    Ar = np.array([[1, 3], [2, 4]])
    print(Ar, type(Ar))
    [out]: [[1 3]
            [2 4]] <class 'numpy.ndarray'>

We get the dimensions of an ndarray using the .shape attribute:

    print(Ar.shape)
    [out]: (2, 2)

If we create a 2-dimensional array (i.e. a matrix), each row can be accessed by index:

    print(Ar[0])
    [out]: [1 3]
    print(Ar[1])
    [out]: [2 4]

To access the matrix by column instead, we slice along the second axis:

    print('First column:', Ar[:, 0])
    [out]: First column: [1 2]
    print('Second column:', Ar[:, 1])
    [out]: Second column: [3 4]
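The description of an array as a grid "indexed by a tuple of nonnegative integers" whose values are "all of the same type" can be illustrated with a minimal sketch. The names `top_right`, `bottom_left` and `mixed` are chosen for this example only.

```python
import numpy as np

# Same 2x2 matrix as in the text above.
Ar = np.array([[1, 3], [2, 4]])

# A tuple index (row, column) picks out a single element.
top_right = Ar[0, 1]     # row 0, column 1
bottom_left = Ar[1, 0]   # row 1, column 0
print(top_right, bottom_left)

# Every element shares one dtype; mixing ints and floats upcasts to float.
mixed = np.array([1, 2.5, 3])
print(Ar.dtype, mixed.dtype)
```

The exact integer dtype reported for `Ar` depends on the platform and NumPy version, so it is not shown here.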

#### Array Functions

NumPy has built-in functions that perform calculations on arrays. For example, we can apply the natural logarithm to each element of an array:

    print(np.log(price_array))
    [out]: [4.96793654 4.98244156 4.9675886 4.96995218 4.96633504 4.96018375]
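One common reason to take logarithms of prices is to compute log returns. The sketch below is an illustration rather than part of the original walkthrough; `log_returns` and `total_log_return` are names introduced here.

```python
import numpy as np

price_array = np.array([143.73, 145.83, 143.68, 144.02, 143.5, 142.62])

# Log return between consecutive prices: ln(P_t) - ln(P_{t-1}).
log_returns = np.diff(np.log(price_array))
print(log_returns)

# Log returns are additive: their sum equals the log return over the whole period.
total_log_return = np.log(price_array[-1] / price_array[0])
print(np.isclose(log_returns.sum(), total_log_return))
```

Six prices give five returns, because each return needs a previous price.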

Other functions return a single value:

    print(np.mean(price_array))
    [out]: 143.896666667
    print(np.std(price_array))
    [out]: 0.967379047852
    print(np.sum(price_array))
    [out]: 863.38
    print(np.max(price_array))
    [out]: 145.83

The functions above return the mean, standard deviation, total and maximum value of an array.
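Note that `np.std` computes the population standard deviation by default (it divides by N). A short sketch of how the `ddof` parameter switches to the sample standard deviation (dividing by N - 1), using the same prices:

```python
import numpy as np

price_array = np.array([143.73, 145.83, 143.68, 144.02, 143.5, 142.62])

# Default ddof=0: divide the summed squared deviations by N.
population_std = np.std(price_array)

# ddof=1: divide by N - 1 instead (sample standard deviation).
sample_std = np.std(price_array, ddof=1)

print(population_std, sample_std)
```

The sample figure is the one pandas reports by default, which is why a pandas summary of the same prices shows a slightly larger standard deviation than `np.std(price_array)`.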

### Pandas

Pandas is one of the most powerful tools for dealing with financial data. First we need to import Pandas:

    import pandas as pd

#### Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floats, Python objects, etc.).

We create a Series by calling pd.Series(data), where data can be a dictionary, an array or just a scalar value.

    price = [143.73, 145.83, 143.68, 144.02, 143.5, 142.62]
    s = pd.Series(price)
    print(s)
    [out]:
    0    143.73
    1    145.83
    2    143.68
    3    144.02
    4    143.50
    5    142.62
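The example above passes a list. A minimal sketch of the dictionary and scalar cases follows; the labels `'a'`, `'b'`, `'c'` and the names `s_dict` and `s_scalar` are chosen for illustration.

```python
import pandas as pd

# From a dictionary: the keys become the index labels.
s_dict = pd.Series({'a': 143.73, 'b': 145.83, 'c': 143.68})
print(s_dict)

# From a scalar: the value is repeated for every label in the index,
# so an index has to be supplied to fix the length.
s_scalar = pd.Series(143.73, index=['a', 'b', 'c'])
print(s_scalar)
```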

We can customize the indices of a new Series:

    s = pd.Series(price, index=['a', 'b', 'c', 'd', 'e', 'f'])
    print(s)
    [out]:
    a    143.73
    b    145.83
    c    143.68
    d    144.02
    e    143.50
    f    142.62

Or we can change the indices of an existing Series:

    s.index = [6, 5, 4, 3, 2, 1]
    print(s)
    [out]:
    6    143.73
    5    145.83
    4    143.68
    3    144.02
    2    143.50
    1    142.62

A Series is like a list in that it can be sliced by position:

    print(s[1:])
    print(s[:-2])
    [out]:
    5    145.83
    4    143.68
    3    144.02
    2    143.50
    1    142.62
    dtype: float64
    6    143.73
    5    145.83
    4    143.68
    3    144.02
    dtype: float64

A Series is also like a dictionary: its values can be set or fetched by index label:

    print(s[4])
    s[4] = 0
    print(s)
    [out]:
    143.68
    6    143.73
    5    145.83
    4      0.00
    3    144.02
    2    143.50
    1    142.62
    dtype: float64
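The dictionary analogy extends to membership tests and to `get()`. The sketch below rebuilds the Series with the original prices rather than the overwritten value, so it can run on its own.

```python
import pandas as pd

price = [143.73, 145.83, 143.68, 144.02, 143.5, 142.62]
s = pd.Series(price, index=[6, 5, 4, 3, 2, 1])

# Membership tests look at the index labels, not at the values.
has_label_4 = 4 in s
has_label_7 = 7 in s
print(has_label_4, has_label_7)

# get() returns a default (None unless specified) instead of raising
# KeyError when the label is missing.
found = s.get(4)
missing = s.get(7)
print(found, missing)
```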

A Series can also have a name attribute, which is used as the column label when we build a pandas DataFrame from several Series.

    s = pd.Series(price, name='Apple Prices')
    print(s)
    print(s.name)
    [out]:
    0    143.73
    1    145.83
    2    143.68
    3    144.02
    4    143.50
    5    142.62
    Name: Apple Prices, dtype: float64
    Apple Prices
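To see how the name is used, here is a small sketch that combines two named Series with `pd.concat`. The second Series and its name `'Log Prices'` are introduced for this example only.

```python
import numpy as np
import pandas as pd

price = [143.73, 145.83, 143.68, 144.02, 143.5, 142.62]
s = pd.Series(price, name='Apple Prices')

# A second Series derived from the first, given its own name.
log_s = np.log(s).rename('Log Prices')

# Concatenating along axis=1 places the Series side by side;
# each Series name becomes a column label.
df = pd.concat([s, log_s], axis=1)
print(df.columns.tolist())
```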

We can get a statistical summary of a Series with describe():

    print(s.describe())
    [out]:
    count      6.000000
    mean     143.896667
    std        1.059711
    min      142.620000
    25%      143.545000
    50%      143.705000
    75%      143.947500
    max      145.830000

#### Time Index

Pandas has a built-in function specifically for creating date indices: pd.date_range(). We use it to create a new index for our Series:

    time_index = pd.date_range('2017-01-01', periods=len(s), freq='D')
    print(time_index)
    s.index = time_index
    print(s)
    [out]:
    DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
                   '2017-01-05', '2017-01-06'],
                  dtype='datetime64[ns]', freq='D')
    2017-01-01    143.73
    2017-01-02    145.83
    2017-01-03    143.68
    2017-01-04    144.02
    2017-01-05    143.50
    2017-01-06    142.62
    Freq: D, Name: Apple Prices, dtype: float64
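Daily calendar dates include weekends, when markets are closed. As a hedged variation on the call above, `freq='B'` generates business days instead; the name `business_days` is chosen for this sketch.

```python
import pandas as pd

# freq='B' yields Monday-to-Friday dates. Exchange holidays are NOT
# excluded, so this is only an approximation of a trading calendar.
business_days = pd.date_range('2017-01-01', periods=6, freq='B')
print(business_days)
```

Because 2017-01-01 falls on a Sunday, the generated index starts on the following Monday.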

Series are usually accessed with the iloc[] and loc[] indexers. iloc[] selects elements by integer position, and loc[] selects them by index label.

iloc[] is necessary when the index of a Series consists of integers. Take our previously defined Series as an example:

    s.index = [6, 5, 4, 3, 2, 1]
    print(s)
    print(s[1])
    [out]:
    6    143.73
    5    145.83
    4    143.68
    3    144.02
    2    143.50
    1    142.62
    Name: Apple Prices, dtype: float64
    142.62

If we intended to take the second element of the Series, this would be a mistake: because the index consists of integers, s[1] returns the element labeled 1, not the element at position 1. To access the element we want, we use iloc[]:

    print(s.iloc[1])
    [out]: 145.83
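The contrast between label-based and position-based access can be made explicit with a side-by-side sketch. The names `by_label`, `by_position`, `pos_slice` and `label_slice` are chosen for this example.

```python
import pandas as pd

price = [143.73, 145.83, 143.68, 144.02, 143.5, 142.62]
s = pd.Series(price, index=[6, 5, 4, 3, 2, 1], name='Apple Prices')

by_label = s.loc[1]       # the row whose label is 1 (the last price)
by_position = s.iloc[1]   # the second row, whatever its label
print(by_label, by_position)

# Slices differ too: iloc excludes the end position,
# while loc includes the end label.
pos_slice = s.iloc[1:3]    # positions 1 and 2
label_slice = s.loc[5:3]   # labels 5, 4 and 3
print(pos_slice)
print(label_slice)
```

Spelling out `loc` or `iloc` avoids the ambiguity of plain square brackets on an integer index.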

While working with time series data, we often use time as the index. Pandas provides us with various methods to access the data by time index.

    s.index = time_index
    print(s['2017-01-03'])
    [out]: 143.68

We can even access a range of dates:

    print(s['2017-01-02':'2017-01-05'])
    [out]:
    2017-01-02    145.83
    2017-01-03    143.68
    2017-01-04    144.02
    2017-01-05    143.50
    Freq: D, Name: Apple Prices, dtype: float64
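Date slices are label-based, so both endpoints are included, unlike positional slices. A minimal self-contained sketch using `loc[]`, with `window`, `head` and `tail` as illustrative names:

```python
import pandas as pd

price = [143.73, 145.83, 143.68, 144.02, 143.5, 142.62]
time_index = pd.date_range('2017-01-01', periods=len(price), freq='D')
s = pd.Series(price, index=time_index, name='Apple Prices')

# Both the start and the end date are part of the result.
window = s.loc['2017-01-02':'2017-01-05']
print(len(window))

# Open-ended slices work as well.
head = s.loc[:'2017-01-03']   # everything up to and including Jan 3
tail = s.loc['2017-01-05':]   # Jan 5 onwards
print(head)
print(tail)
```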

Square-bracket indexing on a Series is very flexible. We can place a boolean condition inside the brackets to select only the elements that satisfy it:

    print(s[s < np.mean(s)])
    print(s[(s > np.mean(s)) & (s < np.mean(s) + 1.64*np.std(s))])
    [out]:
    2017-01-01    143.73
    2017-01-03    143.68
    2017-01-05    143.50
    2017-01-06    142.62
    Name: Apple Prices, dtype: float64
    2017-01-04    144.02
    Freq: D, Name: Apple Prices, dtype: float64

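As demonstrated, we can use the element-wise logical operators & (and), | (or) and ~ (not) to combine multiple conditions. Each condition must be wrapped in parentheses, because these operators bind more tightly than comparisons such as < and >.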
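The example above uses only `&`. A short sketch of `|` and `~` on the same data follows; the thresholds 143 and 145 and the names `extremes` and `not_above_mean` are chosen for illustration.

```python
import pandas as pd

price = [143.73, 145.83, 143.68, 144.02, 143.5, 142.62]
time_index = pd.date_range('2017-01-01', periods=len(price), freq='D')
s = pd.Series(price, index=time_index, name='Apple Prices')

mean_price = s.mean()

# | (or): prices below 143 or above 145.
extremes = s[(s < 143) | (s > 145)]
print(extremes)

# ~ (not): prices that are NOT above the mean.
not_above_mean = s[~(s > mean_price)]
print(not_above_mean)
```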
