Data & Map Methods

Updating the SCI

The first version of the SCI website (formerly known as the Healthy Development Measurement Tool, HDMT) was launched in March 2007. Since then, the web-based version of the SCI has undergone a number of revisions to improve its applicability and specificity. Indicators have been updated several times to provide the most current data possible. However, because the SCI website contains a large number of indicators, it is not always possible to keep every indicator fully up to date.

SFDPH staff conducts regular, comprehensive updates to the SCI website, primarily focusing on revising indicators, data, and development targets. SFDPH is committed to maintaining this tool and ensuring its continued relevance and utility.

Data Sources

SCI data comes from a variety of sources, including the US Census, the American Community Survey, San Francisco City agencies, State agencies, and others. All data sources are noted on the indicator pages. Every data source has limitations; these are noted in the limitations section of each indicator page.

Demographic data on age, ethnicity, marital status, and housing tenure are from the 2010 US Census. These data were downloaded from the Census using American FactFinder: http://factfinder2.census.gov/main.html.

In 2010 the Census discontinued its Long Form questionnaire that asked additional questions about income, employment, nativity, residential mobility, housing costs, education, and other personal and housing characteristics. Going forward, data in the aforementioned areas will be collected through the American Community Survey. Currently, most of the American Community Survey data on the SCI is from the 2005-2009 5-year estimates. See the American Community Survey section below.

SCI Geographic Units of Analysis:  Tables and Maps

We constructed the SCI indicator tables and maps using a number of different geographic units of analysis. Most maps present data at the Census tract or point level, where the point is the physical location of the feature being examined (e.g., school addresses). In some cases map data is analyzed at the intersection level and the intersection values are interpolated onto a continuous surface (that is, a numerical value is estimated for every location around the intersections using the 12 intersections nearest to that location). Other SCI indicators are based on zip codes or Supervisorial Districts, as that was the lowest geographic level of analysis that the data source offered.
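As a rough illustration of the interpolation step described above, the sketch below estimates a value at an arbitrary location from its 12 nearest intersections. The text does not name the interpolation method, so inverse-distance weighting is an assumption here, as are the function name idw_estimate, the power parameter, and the input layout.

```python
import math

def idw_estimate(point, intersections, k=12, power=2.0):
    """Estimate a value at `point` from the k nearest intersections.

    `intersections` is a list of (x, y, value) tuples in a projected
    coordinate system (e.g. feet). Inverse-distance weighting is an
    assumed method; the source only says the 12 nearest intersections
    are used.
    """
    if not intersections:
        raise ValueError("at least one intersection is required")

    px, py = point
    # Distance from the target location to every intersection.
    dists = [(math.hypot(x - px, y - py), v) for x, y, v in intersections]
    nearest = sorted(dists, key=lambda d: d[0])[:k]

    # A location that sits exactly on an intersection takes its value.
    if nearest[0][0] == 0.0:
        return nearest[0][1]

    # Closer intersections receive larger weights.
    weights = [1.0 / (d ** power) for d, _ in nearest]
    weighted_sum = sum(w * v for w, (_, v) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

Evaluating this function over a regular grid of locations would produce the kind of continuous surface the maps display.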

Whenever possible, table data is presented at the planning neighborhood level. Planning neighborhoods are defined by the San Francisco Planning Department and are used for neighborhood outreach and data presentation. Planning neighborhoods do not represent official neighborhood boundaries, and may differ from boundaries used by realtors, neighborhood associations, or residents. However, the Planning Department list offers a manageable number of neighborhoods for data presentation. When interpreting data, always consider the environmental and demographic variation that exists within and between geographic units at each level of analysis.

For the neighborhood tables, data were aggregated to the neighborhood level in a variety of ways. Each indicator’s analysis methods are generally detailed in the methods section of its indicator page. One key analytical method used to create the neighborhood tables is dasymetric mapping. Dasymetric mapping involves assigning each residential lot to a Census block group and calculating the total residential floor area (in square feet) within the block group. Each lot’s residential floor area is then divided by the total residential floor area in its assigned block group to approximate the percentage of the block group’s residential space that the lot makes up. This percentage is then multiplied by the number of people in the assigned block group to estimate the number of people living on each lot. Once we have population estimates for each lot, we assign each lot to the neighborhood that it falls within and calculate the number of people living in each neighborhood. The same technique is applied to estimate the number of housing units, or the number of people of different ages or ethnicities, on each lot. Dasymetric mapping is used because Census tracts and block groups do not fit neatly into neighborhoods, while residential lots generally do. Additionally, populations are not evenly distributed throughout tracts or block groups, which could lead to incorrect assignments if whole tracts or block groups were assigned to neighborhoods. With people and housing units assigned to lots, it is straightforward to calculate the percent of people or households within a neighborhood that are within a certain distance of a feature.
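The dasymetric steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SCI's actual workflow: the dictionary keys (lot_id, block_group, neighborhood, res_area, dist_to_feature), the function names, and the sample numbers are all hypothetical, and the lot-to-block-group and lot-to-neighborhood assignments are assumed to have been done already in GIS.

```python
from collections import defaultdict

def dasymetric_population(lots, block_group_pop):
    """Allocate block-group population to lots by residential floor area.

    lots: list of dicts with hypothetical keys
          'lot_id', 'block_group', 'neighborhood', 'res_area'
    block_group_pop: dict mapping block group id -> population
    Returns (population per lot, population per neighborhood).
    """
    # Total residential floor area in each block group.
    bg_area = defaultdict(float)
    for lot in lots:
        bg_area[lot["block_group"]] += lot["res_area"]

    lot_pop = {}
    hood_pop = defaultdict(float)
    for lot in lots:
        total = bg_area[lot["block_group"]]
        # Share of the block group's residential space on this lot.
        share = lot["res_area"] / total if total > 0 else 0.0
        pop = share * block_group_pop[lot["block_group"]]
        lot_pop[lot["lot_id"]] = pop
        hood_pop[lot["neighborhood"]] += pop
    return lot_pop, dict(hood_pop)

def percent_within(lots, lot_pop, neighborhood, max_dist):
    """Percent of a neighborhood's estimated population on lots within
    max_dist of a feature ('dist_to_feature' is assumed precomputed)."""
    in_hood = [l for l in lots if l["neighborhood"] == neighborhood]
    total = sum(lot_pop[l["lot_id"]] for l in in_hood)
    near = sum(lot_pop[l["lot_id"]] for l in in_hood
               if l["dist_to_feature"] <= max_dist)
    return 100.0 * near / total if total > 0 else 0.0

# Made-up sample data: block group BG1 straddles two neighborhoods.
sample_lots = [
    {"lot_id": "A", "block_group": "BG1", "neighborhood": "N1",
     "res_area": 3000.0, "dist_to_feature": 200.0},
    {"lot_id": "B", "block_group": "BG1", "neighborhood": "N2",
     "res_area": 1000.0, "dist_to_feature": 900.0},
    {"lot_id": "C", "block_group": "BG2", "neighborhood": "N1",
     "res_area": 500.0, "dist_to_feature": 1500.0},
]
sample_bg_pop = {"BG1": 400, "BG2": 100}

lot_pop, hood_pop = dasymetric_population(sample_lots, sample_bg_pop)
pct_near = percent_within(sample_lots, lot_pop, "N1", max_dist=1320.0)
```

In the sample data, lot A holds three quarters of BG1's residential area and so receives 300 of its 400 people, while lot B in a different neighborhood receives the other 100. Assigning the whole block group to either neighborhood would have misplaced one of those groups.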

Creating SCI Maps

How were the maps created?

The system used for mapping and various analyses was ArcGIS 10.0 software by ESRI (2011). This software integrated the data for mapping and analysis.

What projection and coordinate system were used for the maps?


Projected Coordinate System:
NAD_1983_StatePlane_California_III_FIPS_0403_Feet
Projection: Lambert_Conformal_Conic
False_Easting: 6561666.666667
False_Northing: 1640416.666667
Central_Meridian: -120.500000
Standard_Parallel_1: 37.066667
Standard_Parallel_2: 38.433333
Latitude_Of_Origin: 36.500000
Linear Unit: Foot_US
Geographic Coordinate System:
GCS_North_American_1983
Datum: D_North_American_1983
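
To show how the parameters listed above fit together, the sketch below projects a latitude and longitude into this coordinate system using the standard two-parallel Lambert Conformal Conic formulas. It is illustrative only: the function name to_state_plane_ca3 is hypothetical, the GRS 1980 ellipsoid constants are not in the listing above and are supplied as the ellipsoid that NAD 1983 is defined on, and in practice a GIS package or a projection library (this system is commonly catalogued as EPSG:2227) would normally do this work.

```python
import math

# GRS 1980 ellipsoid (the ellipsoid underlying NAD 1983); these two
# constants are assumptions added here, not part of the listing.
A_METERS = 6378137.0
INV_FLATTENING = 298.257222101
US_FT_PER_M = 3937.0 / 1200.0  # US survey foot

# Parameters copied from the projection listing.
FALSE_EASTING_FT = 6561666.666667
FALSE_NORTHING_FT = 1640416.666667
CENTRAL_MERIDIAN = -120.5
STANDARD_PARALLEL_1 = 37.066667
STANDARD_PARALLEL_2 = 38.433333
LATITUDE_OF_ORIGIN = 36.5

def to_state_plane_ca3(lat_deg, lon_deg):
    """Project NAD83 lat/lon (degrees) to easting/northing in US feet."""
    f = 1.0 / INV_FLATTENING
    e = math.sqrt(2.0 * f - f * f)  # first eccentricity

    def m(phi):
        return math.cos(phi) / math.sqrt(1.0 - (e * math.sin(phi)) ** 2)

    def t(phi):
        es = e * math.sin(phi)
        return (math.tan(math.pi / 4.0 - phi / 2.0)
                / ((1.0 - es) / (1.0 + es)) ** (e / 2.0))

    phi, phi0, phi1, phi2 = map(
        math.radians,
        (lat_deg, LATITUDE_OF_ORIGIN,
         STANDARD_PARALLEL_1, STANDARD_PARALLEL_2))

    # Cone constant and scaling factor from the two standard parallels.
    n = ((math.log(m(phi1)) - math.log(m(phi2)))
         / (math.log(t(phi1)) - math.log(t(phi2))))
    big_f = m(phi1) / (n * t(phi1) ** n)

    a_ft = A_METERS * US_FT_PER_M
    rho = a_ft * big_f * t(phi) ** n    # radius at the input latitude
    rho0 = a_ft * big_f * t(phi0) ** n  # radius at latitude of origin
    theta = n * math.radians(lon_deg - CENTRAL_MERIDIAN)

    easting = FALSE_EASTING_FT + rho * math.sin(theta)
    northing = FALSE_NORTHING_FT + rho0 - rho * math.cos(theta)
    return easting, northing
```

Calling to_state_plane_ca3(36.5, -120.5) returns the false easting and false northing, because that location is the projection origin. Locations in San Francisco lie west and north of the origin, so their eastings are smaller than the false easting and their northings larger than the false northing.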

Census Undercount

According to the 2003 GeoLytics Neighborhood Change Database (NCDB) Data Users’ Guide: "Since its inception in 1790, controversy has surrounded the decennial census's alleged undercount of individuals (Anderson 1988). This is a significant issue because data from the census are so widely used in social science research and are the basis of important political decisions, including the drawing of congressional districts and the allocation of government funding. … No one, not even the Census Bureau, denies that the census misses many people. Also, to a lesser extent, there is some enumeration of fictitious or deceased individuals and double counting. The undercount problem exists for many reasons. For instance, the Census Bureau may miss some housing units when sending out forms or some people who have received forms may not complete and return them. The former case is prevalent among individuals with no stable address (such as the homeless), while the latter is particularly common among illegal immigrants, many of whom wish to remain hidden from the government. While the Census Bureau makes several attempts to locate non-responding households, some are inevitably missed." (pages 4-7 and 4-8)

"Of particular concern is the so-called "differential undercount," which refers to the fact that certain types of individuals and households are more likely to be missed by the census than others. According to one study, the undercount for black persons remained at 5.7 percent in 1990—an improvement from the 8.4 percent mark in 1940, but an increase from 4.5 percent in 1980 (Robinson, et. al. 1991). Men and the young are more likely to be missed than women and the old, and one study estimated that for black males between 20 and 29, the undercount was 10.1 percent in 1990 (Skerry 1992). The number of illegal immigrants, most of whom are of Hispanic origin, is believed to be around 3 million, and the Census Bureau estimates that 30 percent of this population was missed in 1990." (page 4-8)

According to the U.S. Census, "data indicate that populations were undercounted at different rates. In general, Blacks, American Indians and Alaskan Natives, Asians and Pacific Islanders, and Hispanics were missed at higher rates than Whites." For more information about the undercount study conducted in 1990, visit: http://www.census.gov/dmd/www/techdoc1.html 

American Community Survey

In years before the 2010 Census, many of the Census-based indicators in the SCI were based on the Census Summary File 3 (SF3), also known as the long form. SF3 data was collected by surveying 1 in every 6 households, while Summary File 1 (SF1) data, also known as the short form, was and still is collected by surveying 100% of households in the US. After the 2000 Census, however, SF3 was discontinued and replaced by the American Community Survey.

The American Community Survey (ACS) is a continuous sample survey in which 250,000 households are surveyed each month across the country. Because far fewer households are surveyed per year than were surveyed for the SF3, the ACS prepares 5-year estimates for analysis of data at the Census tract level. This leads to less precise estimates of population characteristics than the SF3; the trade-off is that data are updated annually rather than every 10 years.

While data from the SF3 were also statistical estimates like those produced by the ACS, estimates of error (such as confidence intervals or margins of error) were rarely presented with the data. Because ACS data can have a great deal of statistical uncertainty, it is particularly important to account for and report error in analysis and presentation. In our analyses of ACS data, we recalculated 90% margins of error each time data were aggregated by category (e.g., sex) or by geography to the neighborhood level. To aggregate data to the neighborhood level, each Census tract was assigned to the neighborhood that its geographic mean center point fell within. In a few cases, particularly around Chinatown, tracts were manually reassigned to a different neighborhood based on knowledge of the tract’s population distribution and characteristics. However, tracts often do not fall neatly within a single neighborhood and may be split roughly in half between two neighborhoods. This is the rationale for using the dasymetric mapping method with data from SF1. Unfortunately, because ACS data are estimates with error, it would not be prudent to use dasymetric mapping with them. Therefore, we must accept and acknowledge the limitations that come with this neighborhood aggregation methodology.
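
The margin-of-error recalculation mentioned above can be illustrated with the approximation formulas the Census Bureau publishes for ACS users. The text does not state which formulas were used for the SCI, so treating these as the method is an assumption; the function names and sample figures are hypothetical.

```python
import math

def aggregate_estimates(tracts):
    """Combine tract-level ACS estimates into one neighborhood estimate.

    tracts: list of (estimate, margin_of_error) pairs, all at the same
    confidence level (ACS publishes 90% margins of error).
    The MOE of a sum is approximated as the square root of the sum of
    the squared MOEs.
    """
    estimate = sum(est for est, _ in tracts)
    moe = math.sqrt(sum(m * m for _, m in tracts))
    return estimate, moe

def proportion_with_moe(num, num_moe, den, den_moe):
    """Proportion num/den and its approximate MOE, where the numerator
    is a subset of the denominator."""
    p = num / den
    under_root = num_moe ** 2 - (p ** 2) * (den_moe ** 2)
    if under_root < 0:
        # Published fallback: use the ratio formula (plus sign) when
        # the value under the square root would be negative.
        under_root = num_moe ** 2 + (p ** 2) * (den_moe ** 2)
    return p, math.sqrt(under_root) / den

# Made-up example: two tracts assigned to one neighborhood.
hood_est, hood_moe = aggregate_estimates([(100, 30), (200, 40)])
share, share_moe = proportion_with_moe(hood_est, hood_moe, 1000, 100)
```

Because squared margins of error add under the square root, the combined margin (50 here) is smaller than the simple sum of the tract margins (70). Relative uncertainty therefore usually shrinks as tracts are aggregated.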

For more information on ACS data and how to use it visit this site: http://www.census.gov/acs/www/guidance_for_data_users/handbooks/

References

American Community Survey Handbooks for Data Users: http://www.census.gov/acs/www/guidance_for_data_users/handbooks/

GeoLytics Neighborhood Change Database Data Users’ Guide: http://www.geolytics.com/pdf/NCDB-LF-Data-Users-Guide.pdf

Maantay JA, Maroko AR, Herrmann C. Mapping Population Distribution in the Urban Environment: The Cadastral-based Expert Dasymetric System (CEDS). Cartography and Geographic Information Science. 2007;34(2):77-102. http://www.lehman.edu/deannss/geography/publications/Dasymetric_CaGIS_Maantay.pdf