R tutorial for Spatial Statistics

Spreadsheet Data Manipulation in R

Today I decided to create a new repository on GitHub where I am sharing code to do spreadsheet data manipulation in R.

The first version of the repository and R script is available here: SpreadsheetManipulation_inR

As an example I am using a csv freely available from the IRS, the US Internal Revenue Service.
https://www.irs.gov/statistics/soi-tax-stats-individual-income-tax-statistics-2015-zip-code-data-soi

This spreadsheet has around 170,000 rows and 131 columns.
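Just to show the kind of data we are dealing with, below is a minimal sketch of loading and summarising the csv with dplyr. The filename (15zpallagi.csv) and the STATE and N1 column names come from the 2015 SOI release, so they may need adjusting for other years:


library(dplyr)

irs <- read.csv("15zpallagi.csv", stringsAsFactors = FALSE)
dim(irs) #roughly 170,000 rows by 131 columns

#e.g. total number of returns (N1) by state
irs %>%
  group_by(STATE) %>%
  summarise(Returns = sum(N1, na.rm = TRUE))
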

Please feel free to request new functions to be added or add functions and code yourself directly on GitHub.



Geocoding function

This is a very simple function to perform geocoding using the Google Maps API:


getGeoCode <- function(gcStr, key){
  library("RJSONIO") #Load Library
  gcStr <- gsub(' ', '%20', gcStr) #Encode URL parameters (replace spaces)
  #Open Connection
  connectStr <- paste0('https://maps.googleapis.com/maps/api/geocode/json?address=', gcStr, "&key=", key)
  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)
  #Flatten the received JSON
  data.json <- unlist(data.json)
  if(data.json["status"] == "OK"){
    lat <- data.json["results.geometry.location.lat"]
    lng <- data.json["results.geometry.location.lng"]
    gcodes <- c(lat, lng)
    names(gcodes) <- c("Lat", "Lng")
    return(gcodes)
  }
}

Essentially, users need to get an API key from Google and then use it as an input (string) for the function. The function itself is very simple; it is an adaptation of some code I found online (unfortunately I did not write down where I found the original version, so I have no way to reference the source, sorry!).


geoCodes <- getGeoCode(gcStr="11 via del piano, empoli", key)

To use the function we simply need to provide an address, and it will return its coordinates in WGS84.
It can be used in a mutate call within dplyr and it is reasonably fast.
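Since the function geocodes a single address and returns a two-element vector, one way to use it with dplyr is through rowwise(). A quick sketch, where addresses is a hypothetical data frame with an address column (note that this calls the API twice per row, so it is not the most efficient option):


library(dplyr)

addresses %>%
  rowwise() %>%
  mutate(Lat = getGeoCode(address, key)["Lat"],
         Lng = getGeoCode(address, key)["Lng"]) %>%
  ungroup()
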

The repository is here:
https://github.com/fveronesi/RGeocode.r

Weather Forecast from MET Office

This is another function I wrote to access the MET Office DataPoint API and obtain a five-day weather forecast:



METDataDownload <- function(stationID, product, key){
  library("RJSONIO") #Load Library
  library("plyr")
  library("dplyr")
  library("lubridate")
  connectStr <- paste0("http://datapoint.metoffice.gov.uk/public/data/val/wxfcs/all/json/", stationID, "?res=", product, "&key=", key)

  con <- url(connectStr)
  data.json <- fromJSON(paste(readLines(con), collapse=""))
  close(con)

  #Station details
  LocID <- data.json$SiteRep$DV$Location$`i`
  LocName <- data.json$SiteRep$DV$Location$name
  Country <- data.json$SiteRep$DV$Location$country
  Lat <- data.json$SiteRep$DV$Location$lat
  Lon <- data.json$SiteRep$DV$Location$lon
  Elev <- data.json$SiteRep$DV$Location$elevation

  Details <- data.frame(LocationID = LocID,
                        LocationName = LocName,
                        Country = Country,
                        Lon = Lon,
                        Lat = Lat,
                        Elevation = Elev)

  #Parameter explanations
  param <- do.call("rbind", data.json$SiteRep$Wx$Param)

  #Forecast
  if(product == "daily"){
    dates <- unlist(lapply(data.json$SiteRep$DV$Location$Period, function(x){x$value}))
    DayForecast <- do.call("rbind", lapply(data.json$SiteRep$DV$Location$Period, function(x){x$Rep[[1]]}))
    NightForecast <- do.call("rbind", lapply(data.json$SiteRep$DV$Location$Period, function(x){x$Rep[[2]]}))
    colnames(DayForecast)[ncol(DayForecast)] <- "Type"
    colnames(NightForecast)[ncol(NightForecast)] <- "Type"

    ForecastDF <- plyr::rbind.fill.matrix(DayForecast, NightForecast) %>%
      as_tibble() %>%
      mutate(Date = as.Date(rep(dates, 2))) %>%
      mutate(Gn = as.numeric(Gn),
             Hn = as.numeric(Hn),
             PPd = as.numeric(PPd),
             S = as.numeric(S),
             Dm = as.numeric(Dm),
             FDm = as.numeric(FDm),
             W = as.numeric(W),
             U = as.numeric(U),
             Gm = as.numeric(Gm),
             Hm = as.numeric(Hm),
             PPn = as.numeric(PPn),
             Nm = as.numeric(Nm),
             FNm = as.numeric(FNm))

  } else {
    dates <- unlist(lapply(data.json$SiteRep$DV$Location$Period, function(x){x$value}))
    Forecast <- do.call("rbind", lapply(lapply(data.json$SiteRep$DV$Location$Period, function(x){x$Rep}), function(x){do.call("rbind", x)}))
    colnames(Forecast)[ncol(Forecast)] <- "Hour"

    DateTimes <- seq(ymd_hms(paste0(as.Date(dates[1]), " 00:00:00")),
                     ymd_hms(paste0(as.Date(dates[length(dates)]), " 21:00:00")),
                     "3 hours")

    #Pad with dummy rows when the forecast does not start at midnight,
    #so that the rows align with DateTimes; the dummies are filtered out below
    if(nrow(Forecast) < length(DateTimes)){
      extra_lines <- length(DateTimes) - nrow(Forecast)
      for(i in 1:extra_lines){
        Forecast <- rbind(rep("0", ncol(Forecast)), Forecast)
      }
    }

    ForecastDF <- Forecast %>%
      as_tibble() %>%
      mutate(Hour = DateTimes) %>%
      filter(D != "0") %>%
      mutate(F = as.numeric(F),
             G = as.numeric(G),
             H = as.numeric(H),
             Pp = as.numeric(Pp),
             S = as.numeric(S),
             T = as.numeric(T),
             U = as.numeric(U),
             W = as.numeric(W))
  }

  list(Details, param, ForecastDF)
}


The API key can be obtained for free at this link:
https://www.metoffice.gov.uk/datapoint/api

Once we have an API key, we simply need to provide the station ID and the type of product for which we want the forecast. We can select between two products: daily and 3hourly.

To obtain the station ID we need to use another query and download an XML file with all station names and IDs:



library(xml2)

#key is the same DataPoint API key used above
url <- paste0("http://datapoint.metoffice.gov.uk/public/data/val/wxfcs/all/daily/sitelist?key=", key)
XML_StationList <- read_xml(url)

write_xml(XML_StationList, "StationList.xml")


This will save an XML file, which we can then open with a text editor (e.g. Notepad++) to look up the ID of the station we are interested in.
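Alternatively, the station list can be parsed directly in R with xml2. This is only a sketch, assuming the sitelist follows the usual DataPoint format with Location elements carrying id and name attributes:


locs <- xml_find_all(XML_StationList, ".//Location")

StationList <- data.frame(ID = xml_attr(locs, "id"),
                          Name = xml_attr(locs, "name"),
                          stringsAsFactors = FALSE)
head(StationList)
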

The function can be used as follows:


METDataDownload(stationID=3081, product="daily", key)

It will return a list with 3 elements:

  1. Station info: Name, ID, Lon, Lat, Elevation
  2. Parameter explanation
  3. Weather forecast: tibble format
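For example, the elements can be extracted as follows (key being your own API key):


met <- METDataDownload(stationID = 3081, product = "daily", key = key)

Details  <- met[[1]] #station info
Param    <- met[[2]] #parameter explanation
Forecast <- met[[3]] #forecast tibble
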
I have not tested it much, so if you find any bugs you are welcome to tweak the code on GitHub.

Shiny App to access NOAA data

Now that the US Government shutdown is over, it is time to download NOAA weather daily summaries in bulk and store them somewhere safe so that at the next shutdown we do not need to worry.

Below is the code to download data for a series of years:


NOAA_BulkDownload <- function(Year, Dir){
  URL <- paste0("ftp://ftp.ncdc.noaa.gov/pub/data/gsod/", Year, "/gsod_", Year, ".tar")
  download.file(URL, destfile = paste0(Dir, "/gsod_", Year, ".tar"),
                method = "auto", mode = "wb")

  if(dir.exists(paste0(Dir, "/NOAA Data")) == FALSE){dir.create(paste0(Dir, "/NOAA Data"))}

  untar(paste0(Dir, "/gsod_", Year, ".tar"),
        exdir = paste0(Dir, "/NOAA Data"))
}


An example of how to use this function is below:

Date <- 1980:2019
lapply(Date, NOAA_BulkDownload, Dir="C:/Users/fabio.veronesi/Desktop/New folder")

Theoretically, the process can be parallelized using parLapply, but I have not tested it.
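For reference, this is roughly how the parallel version would look with the parallel package (again, untested):


library(parallel)

cl <- makeCluster(4) #adjust to the number of cores available
clusterExport(cl, "NOAA_BulkDownload")
parLapply(cl, 1980:2019, NOAA_BulkDownload, Dir = "C:/Users/fabio.veronesi/Desktop/New folder")
stopCluster(cl)
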

Once we have all the files in one folder we can create the Shiny app to query these data.
The app has a dashboard look with two tabs: one with a Leaflet map showing the location of the weather stations (markers are shown only at a certain zoom level to decrease loading time and RAM usage).
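The zoom-dependent markers can be sketched with leafletProxy and the map widget's zoom input (this is not the app's actual code; stations here is a hypothetical data frame with lon, lat and name columns):


library(shiny)
library(leaflet)

ui <- fluidPage(leafletOutput("map"))

server <- function(input, output, session){
  output$map <- renderLeaflet({
    leaflet() %>% addTiles() %>% setView(lng = -98, lat = 39, zoom = 4)
  })

  observe({
    zoom <- input$map_zoom #current zoom level of the widget
    if(is.null(zoom)) return()
    proxy <- leafletProxy("map") %>% clearMarkers()
    if(zoom >= 7){ #draw markers only when sufficiently zoomed in
      proxy %>% addCircleMarkers(data = stations, lng = ~lon, lat = ~lat, popup = ~name)
    }
  })
}

#shinyApp(ui, server)
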




The other tab allows the creation of the time series (each file covers only one year, so we need to bind several files together to get the full period we are interested in) and it also does some basic data cleaning, e.g. converting temperature from Fahrenheit to Celsius, or snow depth from inches to millimetres. Finally, from this tab users can view the final product and download a cleaned csv.
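The cleaning boils down to a few unit conversions; a hedged sketch on a hypothetical raw_gsod data frame (the TEMP and SNDP column names follow the GSOD documentation, so check them against the actual files):


library(dplyr)

clean_gsod <- raw_gsod %>%
  mutate(TEMP_C  = (TEMP - 32) * 5/9, #Fahrenheit to Celsius
         SNDP_mm = SNDP * 25.4)       #inches to millimetres
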



The code for ui and server scripts is on my GitHub:
https://github.com/fveronesi/NOAA_ShinyApp