UP | HOME

Download historical data using Alpha Vantage

Table of Contents

Introduction

Alpha Vantage provides free historical data for various financial time series. This page is intended as a short introduction to download series of stock price data. Check the company website for full APIdocumentation. In order to use their data retieval interface you need to request a specific key called APIKEY. It can be requested for free from this page.

Tickers

The first problem is to identify the ticker symbol of the desired securities. The ticker symbol is an unique string that identifies each security. A global selection of tickers is available as an Excel file from investexcel.

For the New York stock exchange (NYSE), the updated list of tickers can be automatically downloaded in CSV format using

wget 'http://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download'

I use wget but one can directly past the URL in the adress bar of any browser. Replace nyse with nasdaq to obtain the list of security traded on NASDAQ. The obtained file lists:

  1. the ticker symbol
  2. the name of the company
  3. the price of the last transaction
  4. the market capitalization,
  5. the IPO year
  6. the Sector as defined by the stock exchange,
  7. the Industry, a further division of the Sectors
  8. the link to company page on the stock exchange website

You can take the sample lists I download October 28, 2019 for NYSE and NASDAQ. Another useful list of tickers are those of companies that compose the S&P 500 (as of October 25, 2019).

Obtaining historical data

Once the ticker of the interesting security is know, the data can be downloaded using the Alpha Vantage interface from the base URL https://www.alphavantage.co/query? followed by a list of name=value pairs separated by & that specify what you want to download. The pairs are:

symbol
The ticker of the stock
function
It can be TIME_SERIES_DAILY for daily prices, TIME_SERIES_WEEKLY for weekly prices or TIME_SERIES_MONTHLY for monlty prices. Adding _ADJUSTED to the time series specification provided adjusted prices
apikey
Use your key
datatype
It can be JASON (default) or CSV
outputsize
compact (default) returns the last 100 data point. full returns the entire history, around 20 years.

For instance, to obtain the daily prices of General Eletricks (GE) in CSV format and save them in a file called "GE.csv" use

wget --output-document=GE.csv "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&outputsize=full&symbol=GE&apikey=[APIKEY]&datatype=csv"

where [APIKEY] is your personal API key. I use wget because I find it more convenient to use inside script, but you can just cut and paste the URL inside a web browser. The data in GE.csv contain the following comma separated fields:

  1. Date
  2. Open
  3. High
  4. Low
  5. Close
  6. Volume

Using TIME_SERIES_DAILY_ADJUSTED as a function one gets instead:

  1. Date
  2. Open
  3. High
  4. Low
  5. Close
  6. Adjusted Close
  7. Volume
  8. Dividend Amount
  9. Split Coefficient

However, The 6th column, "Adjusted Close", seems to be problematic, see Caveat below. Also notice that the data retrieved from Alpha Vantage are in inverse temporal order: the last prices come first in the file.

Scripting

One generally wants to download data about several securities at once. For instance, in order to download the price data at daily frequency for all NYSE traded companies in the Pharmaceutical sector we can proceed as follows (the commands below are intended to be issued in a Linux terminal). First, obtain the list of traded comapnies and we save them in the file companies.txt

wget --output-document=companies.txt http://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download;

Next, from the downloaded file, obtain the list of tickers of the Technology sector and save them in the env variable tickers

tickers=$( gawk -F, '{ if($7=="\"Major Pharmaceuticals\""){ gsub(/"/,"",$1); print $1} }' companies.txt )

Finally, cycle over all tickers and download the price data. Presently, you can perform up to 5 requests per minute and 500 requests per day. So the script needs to sleep a bit between the different requests

AVWEB="https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&outputsize=full&apikey=[APIKEY]&datatype=csv"
for ticker in $tickers; do
wget --output-document=${ticker}.csv ${AVWEB}&symbol=${ticker}
sleep 15s
done

The procedure above is replicated in the example file.

Caveat

When using the TIME_SERIES_DAILY_ADJUSTED option the returned quantities should be adjusted by split events. But this does not seem always to be the case. The plot below shows the "close" and adjusted close" price series obtained from Alpha Vantage, together with the same series obtained from CRSP, for Microsoft, one of the most capitalized companies trading on NYSE.

AV_CRSP_MSFT.png

Figure 1: Comparing price time series from Alpha Vantage and CRSP

The time windows has been selected to show the only split occurred to Microsoft shares in the period covered by Alpha Vantage data. As can be seen, the regular price series from the the two databases, "AV close" and "CRSP Price", are the same. You don't see the green line because it is below the yellow one. On the other hand, the behavior of the "adjusted price" from Alpha Vantage is strange. One would expect the adjusted price to track the regular price after the last split event. In fact the CRSP adjusted closing price does exactly that. The Alpha Vantage adjusted closing remains instead below it. With other stocks, however, I observed that the "price" columns 2-5 of the Alpha Vantage data match the corresponding adjusted quantities obtained from the CRSP database. In conclusion, the situation is uncertain: sometimes Alpha Vantage seems to return un-adjusted quantities, sometimes the adjusted ones. In general, I'd advise against using the 6th column of the data obtained with TIME_SERIES_DAILY_ADJUSTED. It is better to build an adjusting factor using the "split coefficient" provided by Alpha Vantage itself (column 9). Using gawk and a simple script it is possible to obtain a transformed dataset

> gawk -F',' -f AV_adjust.awk GE.csv > GEadj.csv

Now the last column of the file GEadj.csv contains a scaling factor that can be used to obtain adjusted quantities, prices, volume and dividend, starting from the un-adjusted ones. Depending on the stock, you have to use this column or not. In general, it is easy to see it by just plotting the time series of the quantity of interest and looking for abrupt and persistent jumps. If they are there, then the quantity is probably un-adjusted and the scaling factor has to be applied. Another possible source of troubles is however the possibility that the database returns a false "split event". This can be checked using, for instance, stocksplithistory website.

S&P 500

Using a simple script once can obtain the price history of the stocks composing the S&P 500 index. You can download an archive of the historical prices as of October, 28 2019 (one security, ticker symbol "GL", is missing). I already reverted the order of the lines and applied the script described in the Caveat section above. Files are named according to the ticker symbol of the company. They contains eight space separated columns:

  1. Date (yyyymmdd)
  2. Open
  3. High
  4. Low
  5. Close
  6. Volume
  7. Dividend Amount
  8. Price adjustment factor

Author: Giulio Bottazzi

Created: 2019-10-30 Wed 13:39

Validate