Python for Finance: Unlocking the Power of Data Analysis
Table of Contents
Are you exploring the intriguing world of python for finance? Whether you aspire to work in the financial industry, or wish to advance your existing career, mastering Python is a key step forward. In today’s fast-paced financial arena, Python programming is a crucial skill that top institutions seek in their professionals. Dive into this article to discover the four major reasons behind Python’s rising popularity in the financial industry, and how it can propel your career to new heights.
Universal Languages & How Python Programming for Finance Fits In
There are not too many universal languages. When it comes to spoken and written languages, English can be considered universal. If you are not a native speaker, English often is the first foreign language that you learn. And there are good reasons to do so. First, most of the published texts, such as news articles or books, are written in English. Secondly, you can get by with English basically around the world. Mathematics is a universal symbolic language that is also understood and used around the world and across many domains.
A deeper look at Python
When it comes to programming languages, Python has reached a similar status in finance and other domains. Python is used in basically every area of the financial industry, be it for financial data science, machine learning, credit ratings, trading, asset management, pricing, risk, or more administrative tasks. In general, the reach of Python is indeed universal in that it is used around the world, across basically all domains, and for basically any task that requires some form of programming and data processing.
Against this background, basically all financial institutions expect from their new hires and existing staff at least a certain level of Python programming. For specific roles, such as quantitative developers or researchers, Python skills are usually required these days. For other roles, such as in trading or risk management financial institutions offer Python training programs to reflect its increasing importance. Therefore, it for sure pays off to learn Python early on and gain relevant experience in using it for financial use cases.
Without a doubt, there are still several other programming languages in use in finance. For example, C++, as a compiled programming language, is still very popular when code execution speed and performance are of the essence. On the other hand, domain-specific languages, such as R for statistical applications, are also widely used.
Why is Python so huge in finance?
But what are the major reasons why Python has become basically omnipresent in finance and has in many instances replaced other, more focused, programming languages? Some of the basic characteristics of Python are shared by several other languages as well, such as being open-source, interpreted, high-level, and dynamically typed. This alone cannot explain its rise in finance.
In what follows, four arguments are discussed for why Python has become so dominant in finance. The first two are presented in the form of what you could call the power packages in Python: NumPy and pandas. The second two are related to important aspects in finance: code performance and efficient input-output (IO) operations. But first a simple use case illustrating the power of Python for financial data analysis.
Is Python for finance controllers and analysts better than Excel?
A financial controller within an organization can absolutely work with both Microsoft Excel and Python at the same time. Both tools have their unique strengths and can be used complementarily.
Excel is widely used in the business world for simpler data analysis, financial modeling, budget vs actuals, and quick calculations. It offers an intuitive interface and allows for the easy manipulation of data, making it accessible for those without a coding background.
On the other hand, Python is a powerful programming language favored by analysts and data scientists for more complex tasks. It’s excellent for large-scale data analysis, automating repetitive tasks, building sophisticated models, and conducting advanced statistical analysis. Python also has the ability to read and write Excel files, which can be useful for integrating the two tools.
A financial analyst might use Excel for quick and simple analyses, data visualization, or when sharing data with others in a user-friendly format. Meanwhile, Python would come into play for more advanced analysis or when dealing with large datasets that Excel cannot efficiently handle. In other words, Python and Excel can be used in tandem, leveraging the strengths of each depending on the task at hand.
How do finance professionals use Python? A simple use case
The returns of the share prices of stocks in financial markets are important measures for many different financial applications. In this context, consider a relatively simple question: “How did the Apple Stock perform in 2021 compared to the S&P 500 stock index?”.
Working with traditional tools used in finance, such as Microsoft Excel, is often a tedious experience with regard to answering such questions. An analyst would need to download the data, for example, in the form of a CSV (comma-separated value) file containing the time series of prices for the financial instruments of interest. The analyst then needs to import the data into an Excel spreadsheet, which usually involves some configuration and several formatting and layout-related steps.
In addition, such data sets often comprise thousands of rows — something not efficient to process within a spreadsheet since it involves constant up- and down-scrolling. Once the data is ready, the analyst needs to find the relevant dates in the spreadsheet, that is the 31. December of 2020 and 2021, and pick the prices at these dates for Apple and the S&P 500. Based on these prices, the returns for 2021 can then be calculated and compared.
Where Python Packages Come In
On the other hand, an analyst who is proficient in using Python can answer such questions — and even more complex ones — in a much more efficient manner. Leveraging powerful Python packages such as pandas (see section Pandas Package) boils down the whole analysis to three relatively simple lines of code:
- Importing the pandas package
- Retrieving the CSV data file
- Calculating the net returns
Executing the three lines of the following Python code shows that Apple has outperformed the S&P 500 stock index in 2021 by about 7 percentage points.
In : <strong>import</strong> pandas <strong>as </strong>pd In : data = pd.read_csv('https://certificate.tpq.io/mlfin.csv', index_col=0, parse_dates=True) In : (data[['AAPL.O', '.SPX']].loc['2021-12-31'] / data[['AAPL.O', '.SPX']].loc['2020-12-31']) - 1 Out: AAPL.O 0.338232 .SPX 0.268927 dtype: float64
Python packages for finance: Which Python is best for finance controllers?
A Python package is essentially a directory of Python scripts, each of which defines functions, methods, and types. These scripts, known as modules, together form a package. The advantage is that these packages can be “imported” into another Python script, providing reusability of code and efficiency in managing large codebases. This aids in maintaining an organized, modular, and more manageable code structure.
In the context of financial analysis, Python packages like pandas, NumPy, and SciPy offer powerful data manipulation and analysis capabilities. As explained below, pandas will enable you efficient handling and manipulation of financial data sets, NumPy will provide you with high-performance mathematical computation abilities.
Moreover, packages like matplotlib and seaborn facilitate data visualization, essential for interpreting financial trends and patterns. Also, SciPy allows for advanced statistical analysis. Hence, for finance controllers, these Python packages can significantly streamline financial data analysis and decision-making processes for organizations.
How to Implement the NumPy Package
Finance to a large extent is applied mathematics. Therefore, a programming language that supports mathematical and numerical operations well has an advantage in finance. The NumPy package (see https://numpy.org) is central to the implementation of financial algorithms in Python. On its webpage, you find the following text:
NumPy offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more.
In particular, NumPy allows you to implement financial algorithms in vectorized fashion. This means that you can avoid, to a large extent, loops on the Python level. Consider the following simple example which implements, in two different ways, the scalar multiplication of a vector in pure Python. Both ways include the looping over the single elements.
In : v = list(range(10)) v Out: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] In : w = list() <strong>for </strong>i <strong>in </strong>v: w.append(2 * i) w Out: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18] In : [2 * i for i in v] Out: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Now compare the implementation with NumPy in the following code. It boils down to a single vectorized operation that resembles the symbolics of scalar multiplication in mathematics. There is no loop on the Python level anymore.
In : <strong>import </strong>numpy <strong>as </strong>np In : v = np.arange(10) v Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In : 2 * v Out: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
In such cases, NumPy relies on highly optimized and performant C code as its computation engine. While the performance advantage is hardly noticeable for small datasets, it can become orders of magnitude on large datasets.
In : N = 10_000_000 In : v = list(range(N)) In : %timeit [2 * i <strong>for </strong>i <strong>in </strong>v] 313 ms ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In : v = np.arange(N) In : %timeit 2 * v 7.99 ms ± 86.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
While NumPy is important in its own right, it also builds the basis for many other important Python packages. One of them is pandas.
Other useful python packages for financial analysis: How to use the Pandas Package
While NumPy has its strengths in efficient numerical operations, the pandas package (see https://pandas.pydata.org) is strong when it comes to data processing and analysis. On its Webpage, you find the following description:
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
Pandas helps you manage the whole life cycle of financial data: retrieval, cleaning, processing, analysis, visualization, and storage. The following Python code illustrates this for end-of-day stock price data. The data is retrieved from a remote source, the data is cleaned, a simple moving average is added, and the result is visualized in figure Stock price data for the Apple Stock and simple moving average (SMA). In this context, the code makes use of the major Python plotting package, matplotlib (see https://matplotlib.org).
In : <strong>import </strong>pandas <strong>as </strong>pd <strong>from </strong>pylab <strong>import </strong>plt plt.style.use('seaborn') %config InlineBackend.figure_format = 'svg' In : data = pd.read_csv('https://certificate.tpq.io/mlfin.csv', index_col=0, parse_dates=True) data = pd.DataFrame(data['AAPL.O']).dropna() In : data['SMA'] = data.rolling(100).mean() In : data.tail() Out: AAPL.O SMA Date 2022-10-26 149.35 151.2670 2022-10-27 144.80 151.2536 2022-10-28 155.74 151.3239 2022-10-31 153.34 151.3777 2022-11-01 150.65 151.4578 In : data.plot(); Out: <Figure size 576x396 <strong>with </strong>1 Axes>
While pandas brings powerful data analysis capabilities, it also comes with most of the advantages of NumPy, such as vectorized operations. The following code creates a small two-dimensional DataFrame object and implements vectorized numerical operations on it.
In : df = pd.DataFrame(np.arange(12).reshape(4, 3)) df Out: 0 1 2 0 0 1 2 1 3 4 5 2 6 7 8 3 9 10 11 In : 2 * df Out: 0 1 2 0 0 2 4 1 6 8 10 2 12 14 16 3 18 20 22 In : df ** 3 - df ** 2 + 1.5 Out: 0 1 2 0 1.5 1.5 5.5 1 19.5 49.5 101.5 2 181.5 295.5 449.5 3 649.5 901.5 1211.5
NumPy and pandas, with their vectorized operations, are already quite fast and in particular faster than pure Python code. However, the vectorization of code is not always the best option or even a feasible option to speed up the execution of Python code. Consider something as simple as the calculation of the sum of all elements of a larger vector. The following Python code implements the summation in pure Python. The memory footprint of the v = range(N) object is negligible with 48 bytes.
In : import sys In : N = 10_000_000 In : v = range(N) In : sum(v) Out: 49999995000000 In : %timeit sum(v) 103 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) In : sys.getsizeof(v) Out: 48
The implementation of the same task with NumPy in the following code speeds up the code execution considerably. However, the memory footprint increases from 48 bytes to 80 megabytes. In many scenarios, this might be prohibitive.
In : v = np.arange(N) In : np.sum(v) Out: 49999995000000 In : %timeit np.sum(v) 3.79 ms ± 7.63 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In : sys.getsizeof(v) Out: 80000112
One option to increase the execution speed while preserving memory efficiency is the use of dynamic compiling techniques. numba is a Python package that allows the dynamic compilation of pure Python code or code that is a mixture of NumPy and pure Python. The following code first implements the summation as a regular Python function. It then dynamically compiles the function with numba. The performance of the numba compiled version is orders of magnitude faster than the Python version. It also preserves the memory efficiency of the Python version.
In : <strong>import </strong>numba In : <strong>def </strong>sum_py(N): s = 0 <strong>for </strong>i <strong>in </strong>range(N): s += i <strong>return </strong>s In : sum_py(N) Out: 49999995000000 In : sum_nb = numba.jit(sum_py) In : sum_nb(N) Out: 49999995000000 In : %timeit sum_nb(N) 87.9 ns ± 0.319 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
Overall, it is safe to say that Python, by applying the right idioms and techniques, is fast enough even for computationally demanding financial algorithms.
Execution speed is of the essence in many financial applications. However, reading and writing large data sets often proves to be a bottleneck as well. Fortunately, there are a few options in terms of storage technologies that allow for really fast IO operations.
Consider the following Python code that generates by the use of NumPy a larger sample data set with pseudo-random numbers. The data set is written to disk as a CSV file. This is a relatively slow operation because CSV files are simple text-based files. Similarly, reading the data back from the CSV file is also relatively slow. Nevertheless, CSV is still a standard data exchange format in the financial industry and elsewhere.
In : <strong>from </strong>numpy.random <strong>import </strong>default_rng In : rng = default_rng() In : N = 10_000_000 In : df = pd.DataFrame(rng.random((N, 2))) In : fn = '/Users/yves/Temp/data/data.csv' In : %time df.to_csv(fn) CPU times: user 18.6 s, sys: 542 ms, total: 19.1 s Wall time: 19.3 s In : %time new = pd.read_csv(fn, index_col=0, parse_dates=True) CPU times: user 5.54 s, sys: 623 ms, total: 6.17 s Wall time: 6.28 s
On the other hand, relying on a binary storage technology such as HDF5 (see https://hdfgroup.org) speeds up things considerably. Both, writing the data to disk and reading the data back from disk, are orders of magnitude faster in this case. With technologies such as HDF5, IO operations are often only bound by the speed of the available hardware.
In : fn = '/Users/yves/Temp/data/data.hd5' In : %time df.to_hdf(fn, 'data') CPU times: user 25.1 ms, sys: 83.8 ms, total: 109 ms Wall time: 327 ms In : %time new = pd.read_hdf(fn) CPU times: user 18.5 ms, sys: 67.1 ms, total: 85.7 ms Wall time: 93.3 ms
If speed is not of the essence, but rather the storage in a more widely used format — such as in an SQL relational database — pandas can also help. The following Python code writes the data set to an SQLite relational database and reads the data back into memory. The speed in this case is much lower as compared to the HDF5-based storage. Overall, the HDF5 option is also the most efficient in terms of file size on disk.
In : <strong>import </strong>sqlite3 <strong>as </strong>sq3 In : fn = '/Users/yves/Temp/data/data.sq3' In : con = sq3.connect(fn) In : %time df.to_sql('data', con) CPU times: user 7.79 s, sys: 1.54 s, total: 9.33 s Wall time: 10.1 s Out: 10000000 In : %time new = pd.read_sql('SELECT * FROM data', con) CPU times: user 6.08 s, sys: 3.59 s, total: 9.67 s Wall time: 11.4 s In : ls -n /Users/yves/Temp/data total 2272264 -rw-r--r-- 1 501 20 464285269 Dec 19 14:30 data.csv -rw-r--r-- 1 501 20 240007240 Dec 19 14:30 data.hd5 -rw-r--r-- 1 501 20 443166720 Dec 19 14:30 data.sq3 In : rm /Users/yves/Temp/data/data.*
Can Python be used with Microsoft PowerBI?
Yes, Python can indeed be used with Power BI, a business analytics tool developed by Microsoft. Power BI provides interactive visualizations and business intelligence capabilities, for instance for planning, budgeting and forecasting. The tool allows for Python integration in a few different ways:
- Python Scripts as a data source: Power BI Desktop supports the use of Python scripts as a data source. You can run a Python script directly in Power BI and the output of that script, typically a dataframe, can be imported directly into Power BI as a dataset.
- Python Visuals: Power BI allows you to create your own visuals using Python. In the visuals pane, you can select Python visuals and write Python code that generates a plot using libraries like matplotlib, seaborn, or plotly. The visual updates live as you manipulate the data or adjust the code.
- Running Python scripts in Power Query: If you’re using Power Query to clean and transform your data, you can also run Python scripts. This is useful for leveraging Python’s powerful data manipulation libraries like pandas within your Power Query workflows.
To use Python in Power BI, you need to install Python on your machine, and then specify the installation path in the Power BI options. You also need to install any Python libraries that you plan to use. Remember to verify if any newer Python versions or more specific functionalities have been added since then.
Get Your Next Belt in Python for Finance
If you are interested in entering the financial industry or accelerating your career in it, there is hardly any better option than to learn Python programming and advance your level from either white to green belt or from green belt to brown belt. Like English as a spoken language and mathematics as a symbolic language, Python can be considered universal in its reach, scope, and applicability. This not only holds true for finance but for other domains as well. It also holds true for such fundamental technologies as machine learning and artificial intelligence. Python is indeed everywhere these days and picking it up might be one of the best options to turbo-charge your career.
Photo by Chris Ried on Unsplash
Building a PWA for Android and iOS: tutorial and live example ?Published: 2017/9/27 | Updated: 2017/9/27 | Miguel Guardo
Mobile web apps (known as Progressive Web Apps or PWA) can be a cheaper and totally viable replacement to native ...
Excel Formulas Not Working: 5 Critical Examples of Checking & Fixing WorkbooksPublished: 2021/3/16 | Updated: 2021/3/16 | Oz du Soleil
When it comes to working with data in Microsoft Excel, especially if it wasn’t your data to begin with, things ...
How to Use Google Sheets Pivot Tables, Column Stats, Explore, & QUERY FunctionPublished: 2021/9/15 | Updated: 2021/9/15 | David Benaim
As a manager, perhaps you’ve wondered how to take your financial reporting to the next level or how to amp ...