Eurostat Data

Eurostat Data with the eurostat package

The eurostat package provides access to well over 9000 datasets from the Eurostat. It may seem a challenging task to find the correct dataset, but you are essentially looking for the code that describes the dataset. We an get a table of contents, namely all of th ecodes contained in the eurostat database.

library(eurostat)
library(fpp2) # for time series decomposition
library(seasonal)
library(tmap) #mapping eurostat data

# Get Eurostat data listing
# Function get_eurostat_toc() downloads a table of contents of eurostat datasets. 
# The values in column ‘code’ should be used to download a selected dataset.
toc <- get_eurostat_toc()

# Check the first 20 rows 
head(toc, 20) %>% 
  kable()
title code type last update of data last table structure change data start data end values
Database by themes data folder NA NA NA NA NA
General and regional statistics general folder NA NA NA NA NA
European and national indicators for short-term analysis euroind folder NA NA NA NA NA
Business and consumer surveys (source: DG ECFIN) ei_bcs folder NA NA NA NA NA
Consumer surveys (source: DG ECFIN) ei_bcs_cs folder NA NA NA NA NA
Consumers - monthly data ei_bsco_m dataset 29.09.2020 29.09.2020 1980M01 2020M09 NA
Consumers - quarterly data ei_bsco_q dataset 29.09.2020 30.07.2020 1990Q1 2020Q3 NA
Business surveys - NACE Rev. 2 activity (source: DG ECFIN) ei_bcs_bs folder NA NA NA NA NA
Industry - monthly data ei_bsin_m_r2 dataset 29.09.2020 29.09.2020 1980M01 2020M09 NA
Industry - quarterly data ei_bsin_q_r2 dataset 29.09.2020 30.07.2020 1980Q1 2020Q3 NA
Construction - monthly data ei_bsbu_m_r2 dataset 29.09.2020 29.09.2020 1980M01 2020M09 NA
Construction - quarterly data ei_bsbu_q_r2 dataset 29.09.2020 30.07.2020 1981Q1 2020Q3 NA
Retail sale - monthly data ei_bsrt_m_r2 dataset 29.09.2020 29.09.2020 1984M01 2020M09 NA
Sentiment indicators - monthly data ei_bssi_m_r2 dataset 29.09.2020 29.09.2020 1980M01 2020M09 NA
Services - monthly data ei_bsse_m_r2 dataset 29.09.2020 29.09.2020 1988M01 2020M09 NA
Services - quarterly data ei_bsse_q_r2 dataset 29.09.2020 30.07.2020 2001Q2 2020Q3 NA
Euro-zone Business Climate Indicator - monthly data ei_bsci_m_r2 dataset 29.09.2020 29.09.2020 1985M01 2020M09 NA
Financial services - monthly data ei_bsfs_m dataset 29.09.2020 29.09.2020 2006M04 2020M09 NA
Financial services - quarterly data ei_bsfs_q dataset 29.09.2020 30.07.2020 2007Q3 2020Q3 NA
Employment expectations indicator ei_bsee_m_r2 dataset 29.09.2020 29.09.2020 1980M01 2020M09 NA

House Price Index (HPI)

The Eurostat House Price Index (HPI) measures price changes of all residential properties purchased by households (flats, detached houses, terraced houses, etc.), both new and existing, independently of their final use and their previous owners. First, we node that the code id for this dataset is teicp270. Once we know the relevant code id, we can download eurostat data using the get_eurostat(id) function.

hpi <- get_eurostat(id="teicp270")
glimpse(hpi)
## Rows: 1,257
## Columns: 5
## $ indic  <chr> "TOTAL", "TOTAL", "TOTAL", "TOTAL", "TOTAL", "TOTAL", "TOTAL...
## $ unit   <chr> "I15_NSA", "I15_NSA", "I15_NSA", "I15_NSA", "I15_NSA", "I15_...
## $ geo    <chr> "AT", "BE", "BG", "CY", "CZ", "DE", "DK", "EA", "EA19", "EE"...
## $ time   <date> 2017-04-01, 2017-04-01, 2017-04-01, 2017-04-01, 2017-04-01,...
## $ values <dbl> 114.2, 104.7, 115.4, 102.7, 119.1, 113.1, 110.5, 107.8, 107....
head(hpi,40) %>% 
  kable()
indic unit geo time values
TOTAL I15_NSA AT 2017-04-01 114.2
TOTAL I15_NSA BE 2017-04-01 104.7
TOTAL I15_NSA BG 2017-04-01 115.4
TOTAL I15_NSA CY 2017-04-01 102.7
TOTAL I15_NSA CZ 2017-04-01 119.1
TOTAL I15_NSA DE 2017-04-01 113.1
TOTAL I15_NSA DK 2017-04-01 110.5
TOTAL I15_NSA EA 2017-04-01 107.8
TOTAL I15_NSA EA19 2017-04-01 107.8
TOTAL I15_NSA EE 2017-04-01 108.4
TOTAL I15_NSA ES 2017-04-01 110.4
TOTAL I15_NSA EU 2017-04-01 109.0
TOTAL I15_NSA EU27_2020 2017-04-01 108.7
TOTAL I15_NSA EU28 2017-04-01 109.0
TOTAL I15_NSA FI 2017-04-01 103.0
TOTAL I15_NSA FR 2017-04-01 103.4
TOTAL I15_NSA HR 2017-04-01 104.5
TOTAL I15_NSA HU 2017-04-01 125.5
TOTAL I15_NSA IE 2017-04-01 115.9
TOTAL I15_NSA IS 2017-04-01 130.1
TOTAL I15_NSA IT 2017-04-01 99.6
TOTAL I15_NSA LT 2017-04-01 114.5
TOTAL I15_NSA LU 2017-04-01 112.2
TOTAL I15_NSA LV 2017-04-01 119.5
TOTAL I15_NSA MT 2017-04-01 108.9
TOTAL I15_NSA NL 2017-04-01 111.4
TOTAL I15_NSA NO 2017-04-01 115.5
TOTAL I15_NSA PL 2017-04-01 105.4
TOTAL I15_NSA PT 2017-04-01 115.5
TOTAL I15_NSA RO 2017-04-01 114.3
TOTAL I15_NSA SE 2017-04-01 116.0
TOTAL I15_NSA SI 2017-04-01 111.4
TOTAL I15_NSA SK 2017-04-01 113.1
TOTAL I15_NSA TR 2017-04-01 124.3
TOTAL I15_NSA UK 2017-04-01 111.2
TOTAL PCH_Q1_NSA AT 2017-04-01 2.4
TOTAL PCH_Q1_NSA BE 2017-04-01 -0.3
TOTAL PCH_Q1_NSA BG 2017-04-01 2.4
TOTAL PCH_Q1_NSA CY 2017-04-01 3.1
TOTAL PCH_Q1_NSA CZ 2017-04-01 2.5

Typically, the downloaded data has codes and abbreviations for all of the variables, but we can use label_eurostat to get a more verbose description.

house_price_index_data <-  hpi %>% 
  label_eurostat()

head(house_price_index_data,40) %>% 
  kable()
indic unit geo time values
Total Index, 2015=100 (NSA) Austria 2017-04-01 114.2
Total Index, 2015=100 (NSA) Belgium 2017-04-01 104.7
Total Index, 2015=100 (NSA) Bulgaria 2017-04-01 115.4
Total Index, 2015=100 (NSA) Cyprus 2017-04-01 102.7
Total Index, 2015=100 (NSA) Czechia 2017-04-01 119.1
Total Index, 2015=100 (NSA) Germany (until 1990 former territory of the FRG) 2017-04-01 113.1
Total Index, 2015=100 (NSA) Denmark 2017-04-01 110.5
Total Index, 2015=100 (NSA) Euro area (EA11-1999, EA12-2001, EA13-2007, EA15-2008, EA16-2009, EA17-2011, EA18-2014, EA19-2015) 2017-04-01 107.8
Total Index, 2015=100 (NSA) Euro area - 19 countries (from 2015) 2017-04-01 107.8
Total Index, 2015=100 (NSA) Estonia 2017-04-01 108.4
Total Index, 2015=100 (NSA) Spain 2017-04-01 110.4
Total Index, 2015=100 (NSA) European Union (EU6-1958, EU9-1973, EU10-1981, EU12-1986, EU15-1995, EU25-2004, EU27-2007, EU28-2013, EU27-2020) 2017-04-01 109.0
Total Index, 2015=100 (NSA) European Union - 27 countries (from 2020) 2017-04-01 108.7
Total Index, 2015=100 (NSA) European Union - 28 countries (2013-2020) 2017-04-01 109.0
Total Index, 2015=100 (NSA) Finland 2017-04-01 103.0
Total Index, 2015=100 (NSA) France 2017-04-01 103.4
Total Index, 2015=100 (NSA) Croatia 2017-04-01 104.5
Total Index, 2015=100 (NSA) Hungary 2017-04-01 125.5
Total Index, 2015=100 (NSA) Ireland 2017-04-01 115.9
Total Index, 2015=100 (NSA) Iceland 2017-04-01 130.1
Total Index, 2015=100 (NSA) Italy 2017-04-01 99.6
Total Index, 2015=100 (NSA) Lithuania 2017-04-01 114.5
Total Index, 2015=100 (NSA) Luxembourg 2017-04-01 112.2
Total Index, 2015=100 (NSA) Latvia 2017-04-01 119.5
Total Index, 2015=100 (NSA) Malta 2017-04-01 108.9
Total Index, 2015=100 (NSA) Netherlands 2017-04-01 111.4
Total Index, 2015=100 (NSA) Norway 2017-04-01 115.5
Total Index, 2015=100 (NSA) Poland 2017-04-01 105.4
Total Index, 2015=100 (NSA) Portugal 2017-04-01 115.5
Total Index, 2015=100 (NSA) Romania 2017-04-01 114.3
Total Index, 2015=100 (NSA) Sweden 2017-04-01 116.0
Total Index, 2015=100 (NSA) Slovenia 2017-04-01 111.4
Total Index, 2015=100 (NSA) Slovakia 2017-04-01 113.1
Total Index, 2015=100 (NSA) Turkey 2017-04-01 124.3
Total Index, 2015=100 (NSA) United Kingdom 2017-04-01 111.2
Total Percentage change q/q-1 (NSA) Austria 2017-04-01 2.4
Total Percentage change q/q-1 (NSA) Belgium 2017-04-01 -0.3
Total Percentage change q/q-1 (NSA) Bulgaria 2017-04-01 2.4
Total Percentage change q/q-1 (NSA) Cyprus 2017-04-01 3.1
Total Percentage change q/q-1 (NSA) Czechia 2017-04-01 2.5

We note that our dataframe contains both the value of the index (unit = I15_NSA), as well as the percentage change (unit = PCH_Q1_NSA). We will select the I15_NSA index, a few countries and the EU-28 index, and plot the evolution of house prices over time.

hpi_data <- hpi %>% 
  
  # choose the UK, France, Poland, Spain, Portugal, Germany, Italy, and the EU28
  filter(geo %in%  c("UK", "FR", "PL", "ES","PT", "DE","IT","EU28") ) %>%  
  
  # choose value of the index (unit =   `I15_NSA`) 
    filter(unit == "I15_NSA")

ggplot(hpi_data, aes(x=time, y=values, group=geo, colour=geo))+
  geom_point()+
  geom_line()+
  theme_bw()+
  labs(
    title= "House price index in the EU (2015 = 100)",
    x = "Time",
    y = "Housing Price Index", 
    caption = "Source: Eurostat, code id = teicp270"
  )

Tourism Seasonality in the Meditteranean

The eurostat database has a dedicated tourism section. I wanted to check monthly nights spent at hotels– the relevant code id = tour_occ_nim in the four Meditteranean countries, Portugal, Spain, Italy, and Greece since 2000.

The code below downloads the data and plots time series plots for all countries.

# create a dataframe tourism_data that contains the eurostat data for
# code id = "tour_occ_nim", namely value of monthly nights spent at hotels
tourism_data <- get_eurostat(id="tour_occ_nim")

med_tourism <-  tourism_data %>%   
  
  # choose Portugal, Spain, Italy, and Greece
  filter(geo %in%  c("PT", "ES", "IT", "EL" ) ) %>%
  
  #use label_eurostat to get verbose descriptions of codes
  label_eurostat() %>% 
  
  # choose number of total hotel accommodations since Jan 1, 2000
  filter (c_resid == "Total", 
          nace_r2 == "Hotels and similar accommodation", 
          unit == "Number",
          time >= "2000-01-01") %>% 
  
  # express values in million of nights
  mutate(values = values/1000000) 

ggplot(med_tourism, aes(x=time, y=values, group=geo, colour=geo))+
  geom_point()+
  geom_line()+
  geom_smooth(se=FALSE)+
  facet_wrap(~geo)+
  theme_bw()+
  labs(title="Hotel stays in the Medditeranean, 2000-present", 
       y= "Millions of nights spent in hotels",
       x = "Year",
       caption = "Source: Eurostat, code = tour_occ_nim")+
  theme(legend.position="none")

All countries exhibit the same seasonal pattern: there is a peak in July-August, and the minimum number is around December-January.

Look at the impact of Covid-19 on all countries!

#first define **ts** (time series ) objects; one for each country  

portugal_tourism <- med_tourism %>% 
  
  #select the country you are interested in, in this case Portugal
  filter (geo == "Portugal") %>% 
  
  #sort by time in ascending order, so  earliest observation is first
  arrange(time) %>%
  
  #we just want to keep the values 
  select(values) %>% 
  
  #time series (ts) starts Jan 2000 and has monthlyfrequency (12 months/yr)
  ts(start=2000, frequency = 12) 



spain_tourism <- med_tourism %>% 
  filter (geo == "Spain") %>% 
  arrange(time) %>% 
  select(values) %>% 
  ts(start=2000, frequency = 12)

italy_tourism <- med_tourism %>% 
  filter (geo == "Italy") %>% 
  arrange(time) %>% 
  select(values) %>%   
  ts(start=2000, frequency = 12)

greece_tourism <- med_tourism %>% 
  filter (geo == "Greece") %>% 
  arrange(time) %>% 
  select(values) %>%   
  ts(start=2000, frequency = 12)


#Season plot for Spain and Greece: the seasonal pattern is consistent since 2000
ggseasonplot(spain_tourism, year.labels=TRUE, year.labels.left=TRUE) +
  labs(
    title = "Seasonal plot: Hotel stays in Spain",
    y = "Millions of nights spent in hotels"
  )+
    theme_bw()

ggseasonplot(greece_tourism, year.labels=TRUE, year.labels.left=TRUE) +
  labs(
    title = "Seasonal plot: Hotel stays in Greece",
    y = "Millions of nights spent in hotels"
  )+
  theme_bw()

An interesting question is which country has the greatest seasonality distortion, namely, how much bigger is the summer peak from the winter bottom. For this we produce a subseries plot, one that emphasises the seasonal patterns and where the data for each season are collected together in separate mini time plots. The horizontal lines indicate the means for each month. This form of plot enables the underlying seasonal pattern to be seen clearly, and also shows the changes in seasonality over time. It is especially useful in identifying changes within particular seasons.

ggsubseriesplot(portugal_tourism)+
  labs(
    title = "Seasonal subseries plot: Hotel stays in Portugal 2000-present",
    subtitle = "Horizontal lines indicate monthly averages",
    y = "Millions of nights spent in hotels", 
    caption = "Source:Eurostat"
  )+
  theme_bw()

ggsubseriesplot(spain_tourism)+
  labs(
    title = "Seasonal subseries plot: Hotel stays in Spain 2000-present",
    subtitle = "Horizontal lines indicate monthly averages",
    y = "Millions of nights spent in hotels", 
    caption = "Source:Eurostat"
  )+
  theme_bw()

ggsubseriesplot(italy_tourism)+
  labs(
    title = "Seasonal subseries plot: Hotel stays in Italy 2000-present",
    subtitle = "Horizontal lines indicate monthly averages",
    y = "Millions of nights spent in hotels", 
        caption = "Source:Eurostat"
  )+
  theme_bw()

ggsubseriesplot(greece_tourism)+
  labs(
    title = "Seasonal subseries plot: Hotel stays in Greece 2000-present",
    subtitle = "Horizontal lines indicate monthly averages",
    y = "Millions of nights spent in hotels", 
    caption = "Source:Eurostat"
  )+
  theme_bw()

Visually, the approximate ratio of max:min averages for each of the four Mediterranean countries is as follows:

  • Portugal 6:2 = 3
  • Spain 39:13 = 3
  • Italy 43:10 = 4.3
  • Greece 13.5:1= 13.5

Disposable income of private households by NUTS 2 regions

Using the eurostat data, we can create maps of, e.g., disposable income at a regional level. NUTS or Nomenclature of Territorial Units for Statistics is a geocode standard for referencing subdvisions (regions, counties, districts, etc.) within a country.

We will work with the Disposable income of private households by NUTS 2 regions database

income_data <- get_eurostat(id="tgs00026") %>% 
  select(geo,time,values) %>% 
  dplyr::mutate(cat = cut_to_classes(values))


income_2016 <- income_data %>% 
  filter(time == "2016-01-01")

# Download geospatial data from GISCO
geodata <- get_eurostat_geospatial(output_class = "sf",
                                   resolution = "60",
                                   nuts_level = 2,
                                   year = 2016) 


map_data <- inner_join(geodata, income_2016)


ggplot(data=map_data) + geom_sf(aes(fill=cat),color="dim grey", size=.1) + 
  scale_fill_brewer(palette = "Accent") +
  guides(fill = guide_legend(reverse=T, title = "euro")) +
  labs(title="Disposable household income in 2016",
       caption="(C) EuroGeographics for the administrative boundaries 
                Map produced in R with a help from Eurostat-package <github.com/ropengov/eurostat/>") +
  theme_light() + theme(legend.position=c(.8,.8)) +
  coord_sf(xlim=c(-12,44), ylim=c(35,70))

Acknowledgments