Getting Started on DoordaStats with Python
What is Spine?
With over 100 datasets available in DoordaStats, it may require significant time and effort to understand and create value from them. So, in order to speed up the process and help you merge data easily, we have created
`r1_stats_spine`, a base dataset which provides a starting point for you.
`r1_stats_spine` consists of basic details like postcode, outputarea, postcode introduction date, latitude, longitude (to help you to map data) and also total people and households.
`r1_stats_spine` is also available on our hosted platform, acquiring that data is as simple as 4 lines of code.
import pandas as pd projectid = 'doorda-production-207313' query = 'SELECT * FROM DoordaStats.r1_stats_spine' spine = pd.io.gbq.read_gbq(query, projectid)
Mix and Match with the Data You Need
For the purpose of this demonstration, we are comparing the electricity and gas consumption of a region.
# Get Electricity Consumption Data projectid = 'doorda-production-207313' query = 'SELECT * FROM DoordaStats.r1_household_electricity_consumption' electricity_consumption = pd.io.gbq.read_gbq(query, projectid)
# Get Gas Consumption Data projectid = 'doorda-production-207313' query = 'SELECT * FROM DoordaStats.r1_household_gas_consumption' gas_consumption=pd.io.gbq.read_gbq(query, projectid)
It's important to note that we do not need all the data from
`r1_household_electricity_consumption` dataset. We only need information on usages:
`electricity_usage`, in addition to the corresponding postcode which serves as a key for joining the dataset.
# Extracting relevant columns electricity_usage = electricity_consumption[['postcode','electricity_usage']] gas_usage = gas_consumption[['postcode','gas_usage']]
Create a final dataset with all the required metrics by using postcode as the key.
spine = spine.merge(electricity_usage, left_on='postcode', right_on='postcode', how='left') spine = spine.merge(gas_usage, left_on='postcode', right_on='postcode', how='left')
Calculating Individual Utility Usages
Let's find out utility usage on an individual level.
# Calculate individual electricity usage individual_electricity_usage = spine['electricity_usage']/spine['total_people'] spine['individual_electricity_usage'] = individual_electricity_usage # Calculate individual gas usage individual_gas_usage = spine['gas_usage']/spine['total_people'] spine['individual_gas_usage'] = individual_gas_usage