Wednesday, December 13, 2017

Lab 15-Dasymetric Mapping

Lab 15 focused on dasymetric mapping, which uses ancillary data to refine the population density estimates produced by areal weighting. For this week's lab we were provided with three shapefiles for Seminole County, Florida: high school boundary polygons, water polygons, and census tract polygons. The first objective was to estimate the future high school attendance of the 5-14 year old population. The water polygons were the ancillary data provided for this part of the assignment.
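The water-masked allocation can be sketched in a few lines of Python. This is only an illustration, assuming each census tract has already been intersected with the high school zones and the water area erased from each piece; the function name, the input layout, and the numbers are hypothetical and are not the lab's data.

```python
def allocate_population(tract_pop, pieces):
    """Split one tract's population among zones by share of non-water area.

    pieces: list of (zone_name, habitable_area) tuples for a single tract,
    where habitable_area is the piece's area after water has been removed.
    """
    total_area = sum(area for _, area in pieces)
    if total_area == 0:
        # A tract that is entirely water receives no population.
        return {zone: 0.0 for zone, _ in pieces}
    allocation = {}
    for zone, area in pieces:
        share = tract_pop * area / total_area
        allocation[zone] = allocation.get(zone, 0.0) + share
    return allocation


# Made-up tract: 1000 children, 3 units of dry land in School A, 1 in School B.
example = allocate_population(1000, [("School A", 3.0), ("School B", 1.0)])
print(example)
```

Summing these per-tract allocations for each high school zone gives the zone's estimated 5-14 year old population.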

The secondary objective was to use a 30-meter raster of impervious land cover data to improve on the previous analysis. This task was a bit more difficult and involved the Zonal Statistics as Table tool, which summarizes the values of a raster within the zones of another dataset and reports the results in a table. Several Spatial Joins were then used to determine the final outcome. In my analysis, the water polygon method actually allocated a lower percentage of the population aged 5-14 incorrectly than the dasymetric mapping based on impervious data.
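To show what a zonal summary does conceptually, here is a small pure-Python sketch. It is not the ArcGIS tool itself; it assumes the zones and the raster are two equally sized grids, and the function name and cell values are invented for illustration.

```python
from collections import defaultdict


def zonal_stats(zones, values):
    """Return {zone_id: {"count", "sum", "mean"}} for two equally sized grids."""
    cells = defaultdict(list)
    for zone_row, value_row in zip(zones, values):
        for zone_id, value in zip(zone_row, value_row):
            cells[zone_id].append(value)
    return {
        zone_id: {"count": len(v), "sum": sum(v), "mean": sum(v) / len(v)}
        for zone_id, v in cells.items()
    }


# Hypothetical 2 x 3 grids: zone ids and impervious values per cell.
zones = [[1, 1, 2],
         [1, 2, 2]]
impervious = [[10, 20, 0],
              [30, 40, 50]]
table = zonal_stats(zones, impervious)
print(table)
```

The per-zone sums could then serve as weights for allocating population in place of the non-water area.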

Population estimate using the water polygon

Population estimate using the impervious data

Wednesday, December 6, 2017

Lab 14- Spatial Data Aggregation

Lab 14 focused on becoming familiar with the modifiable areal unit problem (MAUP) and using the provided data to assess the delineation of political boundaries. When political boundaries are delineated, districts can be drawn to create a political advantage. This is called gerrymandering. A 2014 congressional district file for the US was provided along with a shapefile containing all of the counties in the continental US. Our task was to identify the worst offenders of gerrymandering using the census data provided. The criteria were 'compactness', which identifies bizarrely shaped legislative districts, and 'community', measured here by the number of counties that were divided by congressional districts.

To identify the worst offenders under the 'compactness' criterion, the area of each district was examined. Districts that were small in area and clustered with other small districts seemed like candidates for gerrymandering. To determine the area of each polygon, the shapefile was projected to NAD 1983, a new field was added, and the Field Calculator was used to calculate the total area in square miles.
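For reference, the area calculation itself can be sketched with the shoelace formula. This is not what the Field Calculator runs internally; it simply assumes a simple polygon whose vertices are planar coordinates in meters, and the function name and test square are hypothetical.

```python
SQ_METERS_PER_SQ_MILE = 1609.344 ** 2  # one international mile is 1609.344 m


def polygon_area_sq_miles(vertices):
    """Shoelace area of a simple polygon given as [(x, y), ...] in meters."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0 / SQ_METERS_PER_SQ_MILE


# A square one mile on each side should come out as one square mile.
side = 1609.344
square = [(0.0, 0.0), (side, 0.0), (side, side), (0.0, side)]
area = polygon_area_sq_miles(square)
print(area)
```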

In order to measure 'community' the number of counties that were divided by a congressional district was determined.
  • A spatial intersect between the ‘Districts’ shapefile and the ‘County’ shapefile was conducted.
  • A summary count of the “counties” and the “districts” from the Intersect results was conducted.
  • The results were joined back to the Intersect results. A feature count greater than 1 represented a community that was divided by a district. The counties with a result greater than 1 were selected and a new shapefile was created containing only these results.
  • The results were summarized by district, with the sum of counties.
  • The summary was joined back to the Intersect results. A new field was added and the ‘Field Calculator’ was used to determine the percentage of counties that were divided by the district.
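The counting logic behind these steps can be sketched outside of ArcGIS. The example below assumes the intersect output has been reduced to (county, district) pairs; the function name and the county and district names are made up.

```python
from collections import defaultdict


def split_county_percentages(pairs):
    """Percentage of each district's counties that are shared with another district.

    pairs: iterable of (county, district) rows from an intersect result.
    """
    districts_per_county = defaultdict(set)
    counties_per_district = defaultdict(set)
    for county, district in pairs:
        districts_per_county[county].add(district)
        counties_per_district[district].add(county)

    # A county that intersects more than one district counts as divided.
    split = {c for c, d in districts_per_county.items() if len(d) > 1}
    return {
        district: 100.0 * len(counties & split) / len(counties)
        for district, counties in counties_per_district.items()
    }


pairs = [("Adams", "D1"), ("Baker", "D1"), ("Baker", "D2"),
         ("Clark", "D2"), ("Dixon", "D3")]
result = split_county_percentages(pairs)
print(result)
```

Here "Baker" is the only divided county, so D1 and D2 each have half of their counties divided and D3 has none.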

Tuesday, November 28, 2017

Lab 13 - Effects of SCALE

Lab 13 examined the effects of scale and resolution by comparing a Light Detection and Ranging (LIDAR) DEM with a Shuttle Radar Topography Mission (SRTM) DEM. Both were resampled from the original DEMs to a 90-meter resolution in order to compare the two elevation modeling techniques. The highest and lowest values were first evaluated to determine the value range of the two DEMs. The LIDAR DEM had a broader range of values, with the highest being 106.05 and the lowest being 4.265. Next the average slope was determined, with the LIDAR DEM having a slope value of 31.212 compared to the SRTM value of 28.737. Below are several images comparing the slope and aspect of the two DEMs.
Comparing Aspect Between LIDAR & SRTM DEMs

Comparing Slope Between LIDAR & SRTM DEMs

Tuesday, November 21, 2017

Lab 12-Geographically Weighted Regression

Geographically Weighted Regression

Week 12 continued with regression models and consisted of comparing an Ordinary Least Squares (OLS) analysis against a Geographically Weighted Regression (GWR) analysis. An OLS regression model is considered a global model: the distance between data is not considered and all coefficients are constant for every location. A GWR model is considered a local model in which the distance between data is considered. The GWR coefficients vary by location, meaning geographic areas with unknown data can be grouped with areas of known data.
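The global versus local distinction can be illustrated with a toy example. The sketch below is not the ArcGIS GWR tool; it assumes a single explanatory variable, a Gaussian distance kernel with a fixed bandwidth, and invented data in which the relationship differs between a western and an eastern cluster. All function names are hypothetical.

```python
import math


def weighted_fit(xs, ys, weights):
    """Weighted least squares for y = a + b*x; returns (intercept, slope)."""
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def ols_fit(xs, ys):
    """Global model: every observation gets the same weight."""
    return weighted_fit(xs, ys, [1.0] * len(xs))


def gwr_fit_at(point, coords, xs, ys, bandwidth):
    """Local model: observations near `point` get more weight (Gaussian kernel)."""
    weights = [math.exp(-0.5 * (math.dist(point, c) / bandwidth) ** 2)
               for c in coords]
    return weighted_fit(xs, ys, weights)


# Invented data: slope of 1 in the west cluster, slope of 3 in the east cluster.
coords = [(0, 0), (0, 1), (0, 2), (100, 0), (100, 1), (100, 2)]
xs = [1, 2, 3, 1, 2, 3]
ys = [1, 2, 3, 3, 6, 9]

global_slope = ols_fit(xs, ys)[1]
west_slope = gwr_fit_at((0, 1), coords, xs, ys, bandwidth=5.0)[1]
east_slope = gwr_fit_at((100, 1), coords, xs, ys, bandwidth=5.0)[1]
print(global_slope, west_slope, east_slope)
```

The global fit reports one compromise slope, while the local fits recover a different slope at each location.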

The data used in lab 12 consisted of instances of auto theft spread throughout a geographical region. The explanatory variables were demographic data for the region. An OLS regression model and a GWR model were compared. The two models had similar Akaike's information criterion (AICc) values as well as similar Z-scores after Spatial Autocorrelation was run. The GWR analysis had a higher Adjusted R-squared value, so this model was considered a better fit for relating the demographic data to auto theft.

GWR Regression with "Fixed" Kernel Type

Monday, November 13, 2017

Lab 11, Multivariate Regression

Multivariate Regression, Diagnostics and Regression in ArcGIS

For lab 11 we continued with regression analysis, but this time working with multiple variables. Multivariate regression is an extension of bivariate regression in which there are two or more explanatory variables in addition to the dependent variable. This week's lab used Excel and ArcGIS. The lab involved performing a linear regression analysis in ArcGIS and using the six ordinary least squares (OLS) checks listed below to determine how well the model performed. In addition, we worked with exploratory regression in ArcGIS to evaluate all possible combinations of explanatory variables and find those that best explain the dependent variable.

When performing a multivariate regression analysis there are six checks that can be used to determine the performance of a model. The first check is to determine the significance of the variables. Some variables are redundant and can be removed. When checking for significance, a variable with a p-value greater than 0.05 is considered insignificant and is not needed for the analysis.

The second check is to determine the strength of the coefficients. A positive coefficient indicates a positive influence on the model and a negative coefficient indicates a negative influence on the model.

The third check determines if any of the explanatory variables are redundant. The Variance Inflation Factor (VIF) helps determine if variables are redundant. A variable with a VIF value greater than 7.5 should be removed.
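As a rough illustration of where a VIF value comes from: for each explanatory variable, VIF = 1 / (1 - R²), where R² comes from regressing that variable on the other explanatory variables. The sketch below assumes the simplest case of exactly two explanatory variables, where that R² equals the squared Pearson correlation between them. The function names and data are hypothetical.

```python
def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def vif_two_variables(a, b):
    """VIF shared by both variables when a model has exactly two of them."""
    r_squared = pearson_r(a, b) ** 2
    return 1.0 / (1.0 - r_squared)


# Made-up explanatory variables.
income = [1, 2, 3, 4, 5]
rent = [2, 4, 5, 4, 5]
vif = vif_two_variables(income, rent)
print(vif, "redundant" if vif > 7.5 else "keep")
```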

The fourth check determines if the residuals are normally distributed. For an unbiased model the residuals are normally distributed. A residual represents the difference between the predicted value and the observed value. The Jarque-Bera diagnostic can be used to determine the distribution of the residuals.
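The textbook form of the Jarque-Bera statistic is JB = n/6 * (S² + (K - 3)² / 4), where S is the skewness and K the kurtosis of the residuals. The sketch below uses that form and the asymptotic chi-square approximation with 2 degrees of freedom for the p-value; whether ArcGIS computes it in exactly this way is an assumption on my part, and the residuals are invented.

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, approximate p-value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb, p)
```

A small p-value would suggest the residuals are not normally distributed. With only a handful of residuals, as in this toy input, the approximation is rough.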

The fifth check is to determine if the correct variables are being used for the model. The Spatial Autocorrelation tool in ArcGIS can be used to indicate if all the key explanatory variables are being used.

The sixth check is to determine the model performance. The Adjusted R-squared and Akaike's information criterion (AICc) are used to help determine how well the explanatory variables are modeling the dependent variable. An adjusted R-squared value closer to 1 is better and the AICc is useful when comparing multiple models. A lower AICc value is better.
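For reference, the standard adjusted R-squared formula is 1 - (1 - R²)(n - 1) / (n - k - 1), where n is the number of observations and k the number of explanatory variables. A minimal sketch with made-up inputs:

```python
def adjusted_r_squared(r_squared, n_obs, n_explanatory):
    """Adjust R-squared for the number of explanatory variables in the model."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_explanatory - 1)


# Hypothetical model: R-squared of 0.8 from 50 observations and 4 variables.
adj = adjusted_r_squared(0.8, 50, 4)
print(round(adj, 4))
```

Adding explanatory variables always lowers the adjusted value unless they improve R² enough to compensate, which is why it is preferred over plain R² when comparing models.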

Tuesday, November 7, 2017

Lab 10

Regression Analysis

For the lab 10 assignment we were introduced to the basics of statistics, correlation, and bivariate regression analysis. Regression analysis uses correlations between known values to predict unknown values. Portion C of the lab focused on using bivariate regression analysis to estimate the missing precipitation data at Station A. The lab provided two stations, both containing precipitation data from the period 1931-2004. Only Station B had complete data for the entire time period. Station A was missing data from 1931-1949.

The first objective was to create a scatterplot of precipitation amounts during the time period of 1950-2004 when both stations provide data.

In order to determine the rainfall for Station A for the years 1931-1949, a regression analysis using a dependent variable and an explanatory variable was performed in Excel. The regression analysis provided the slope and intercept. The formula Y = b*X + a was then used to calculate the rainfall estimates for Station A. In the formula, b = slope value, X = known value, and a = intercept value.
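The same calculation can be sketched in Python. The rainfall numbers below are invented for illustration and are not the lab's station data; the sketch assumes Station B is the explanatory variable (X) and Station A the dependent variable (Y).

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for Y = b*X + a."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return b, a


# Years where both stations have data (made-up values).
station_b = [30.0, 40.0, 50.0]
station_a = [28.0, 36.0, 44.0]
b, a = fit_line(station_b, station_a)

# Estimate Station A for a year where only Station B was recorded.
estimate = b * 45.0 + a
print(b, a, estimate)
```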

Monday, October 30, 2017

Lab 9- Accuracy of DEMs

For the lab 9 assignment we determined the accuracy of a DEM and analyzed the accuracy of three different interpolation methods. A common method used to determine the accuracy of a surface is to compare surveyed elevation values with the interpolated surface. Two accuracy metrics used to describe the differences between the elevation values and the surface are described below.

The first method for describing error is the root mean square error (RMSE), which measures the overall error. The RMSE is always positive, and a smaller value indicates less discrepancy between the "true" value and the DEM. The second method is determining whether bias has been introduced. A bias in data is a systematic pattern in the error. This can be determined using the mean error metric. A value closer to zero indicates less bias.
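Both metrics are simple to compute. The sketch below assumes the error at each point is the DEM value minus the surveyed value (the sign convention is my assumption), and the elevations are made up rather than taken from the lab.

```python
import math


def accuracy_metrics(surveyed, dem):
    """Return (RMSE, mean error) for paired surveyed and DEM elevations."""
    errors = [d - s for s, d in zip(surveyed, dem)]  # DEM minus surveyed
    rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))
    mean_error = sum(errors) / len(errors)
    return rmse, mean_error


surveyed = [10.0, 12.0, 15.0]
dem = [10.5, 11.5, 15.5]
rmse, mean_error = accuracy_metrics(surveyed, dem)
print(rmse, mean_error)
```

In this toy case the RMSE is 0.5 while the mean error is smaller because the positive and negative errors partly cancel, which is why the two metrics answer different questions.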

The following table is a summary of the overall accuracy assessment of the DEM provided for lab 9.  The DEM was a raster file of a high resolution bare earth DEM obtained through LIDAR. The field observations were collected for the elevation at the ground surface. The RMSE and mean error are relatively low which indicates there is very little overall error or bias.