- Data Sources
- SCI Geographic Units of Analysis: Tables and Maps
- Creating SCI Maps
- Census Undercount
- American Community Survey
The measures pages describe the methodologies we use for constructing the indicators from source data.
SCI data come from a variety of sources, including the US Census, the American Community Survey, San Francisco City agencies, State agencies, and others. All data sources are noted on the indicator pages. Every data source has limitations, and these are described in the limitations section of each indicator page.
Demographic data on age, ethnicity, marital status, and housing tenure are from the 2010 US Census. This data was downloaded from the Census using American Fact Finder: http://factfinder2.census.gov/main.html.
In 2010 the Census discontinued its Long Form questionnaire that asked additional questions about income, employment, nativity, residential mobility, housing costs, education, and other personal and housing characteristics. Going forward, data in the aforementioned areas will be collected through the American Community Survey. Currently, most of the American Community Survey data on the SCI is from the 2005-2009 5-year estimates. See the American Community Survey section below.
The SCI indicators represent multiple geographic scales, including census tract, zip code, point, and intersection. In some cases map data are analyzed at the intersection level and the intersection values are interpolated onto a continuous surface. Interpolation here means that a numerical value is estimated for every location around the intersections, using the 12 intersections nearest to that location.
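The interpolation step described above can be sketched in a few lines. The source does not say which interpolation method was used, so this example assumes inverse distance weighting over the 12 nearest intersections. The function name `idw_estimate`, the `power` parameter, and the sample coordinates and values are all hypothetical.

```python
import math


def idw_estimate(point, intersections, k=12, power=2.0):
    """Estimate a value at `point` from the k nearest intersections.

    `intersections` is a list of (x, y, value) tuples in a projected
    coordinate system, so straight-line distances are meaningful.
    Inverse distance weighting is an assumption, not the documented method.
    """
    if not intersections:
        raise ValueError("at least one intersection is required")
    px, py = point
    # Pair each intersection's value with its distance to the query point.
    by_distance = sorted(
        (math.hypot(x - px, y - py), value) for x, y, value in intersections
    )
    nearest = by_distance[:k]
    # A point that coincides with an intersection takes that intersection's value.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    # Closer intersections receive larger weights.
    weights = [1.0 / d ** power for d, _ in nearest]
    weighted_sum = sum(w * v for w, (_, v) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Hypothetical intersections: (x in feet, y in feet, indicator value).
sample_intersections = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_value = idw_estimate((1.0, 0.0), sample_intersections)  # 15.0
```

Repeating the estimate for every cell of a regular grid would produce the continuous surface; a GIS package would normally handle that step.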
Data tables on many pages aggregate the indicators at the "neighborhood" level. There is no standard national convention for neighborhoods, and the conventions used in the SCI vary by city. For example, in San Francisco the Department of City Planning defines neighborhood boundaries for planning purposes. Understandably, an indicator can vary within a neighborhood.
We often used dasymetric mapping to aggregate indicators at the neighborhood level. Dasymetric mapping involves assigning each residential lot to a Census block group and calculating the total number of residential square feet within the block group. Each lot’s residential square footage is then divided by the total residential square footage in its assigned block group, to approximate the percentage of residential space that each lot makes up. This percentage is then multiplied by the number of people within the assigned block group, giving an estimate of the number of people living on each lot. Once we have these estimates, we assign each lot to the neighborhood that it falls within and calculate the number of people living in each neighborhood. The same technique was applied to calculate the number of housing units, or of people of different ages or ethnicities, on each lot.

Dasymetric mapping is used because Census tracts and block groups do not fit neatly into neighborhoods, while residential lots generally do. Additionally, populations are not evenly distributed throughout tracts or block groups, which could lead to incorrect assignments if whole tracts or block groups were assigned to neighborhoods. With people and housing units assigned to lots, it is straightforward to calculate the percent of people or households within a neighborhood that are within a certain distance of a feature.
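The allocation arithmetic described above can be illustrated with a minimal sketch. It assumes each lot record already carries its block group, its neighborhood, and its residential square footage; the function names, field names, and sample figures are hypothetical, and the spatial joins that produce these assignments are left out.

```python
from collections import defaultdict


def allocate_population(lots, block_group_population):
    """Estimate the population of each lot from its share of residential space.

    `lots` is a list of dicts with the hypothetical keys 'lot_id',
    'block_group', 'neighborhood', and 'residential_sqft'.
    """
    # Total residential square footage in each block group.
    sqft_by_block_group = defaultdict(float)
    for lot in lots:
        sqft_by_block_group[lot["block_group"]] += lot["residential_sqft"]

    lot_population = {}
    for lot in lots:
        total = sqft_by_block_group[lot["block_group"]]
        # Share of the block group's residential space that this lot makes up.
        share = lot["residential_sqft"] / total if total > 0 else 0.0
        lot_population[lot["lot_id"]] = (
            share * block_group_population[lot["block_group"]]
        )
    return lot_population


def neighborhood_totals(lots, lot_population):
    """Sum the lot-level estimates within each neighborhood."""
    totals = defaultdict(float)
    for lot in lots:
        totals[lot["neighborhood"]] += lot_population[lot["lot_id"]]
    return dict(totals)


# Hypothetical data: block group BG1 straddles two neighborhoods.
lots = [
    {"lot_id": "A", "block_group": "BG1", "neighborhood": "N1", "residential_sqft": 1000.0},
    {"lot_id": "B", "block_group": "BG1", "neighborhood": "N2", "residential_sqft": 3000.0},
    {"lot_id": "C", "block_group": "BG2", "neighborhood": "N2", "residential_sqft": 500.0},
]
population = {"BG1": 400, "BG2": 100}

by_lot = allocate_population(lots, population)
by_neighborhood = neighborhood_totals(lots, by_lot)
```

In this example, lot A holds a quarter of BG1's residential space and so receives 100 of its 400 residents, while the remaining 300 go with lot B to neighborhood N2. Assigning the whole block group to a single neighborhood would have placed all 400 in one of them.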
Mapping and the associated analyses were carried out in ArcGIS 10.0 software by ESRI (2011), which integrated the data for mapping and analysis. We used the following projected coordinate system:
Projected Coordinate System:
Linear Unit: Foot_US
Geographic Coordinate System:
According to the 2003 GeoLytics Neighborhood Change Database (NCDB) Data Users’ Guide: "Since its inception in 1790, controversy has surrounded the decennial census's alleged undercount of individuals (Anderson 1988). This is a significant issue because data from the census are so widely used in social science research and are the basis of important political decisions, including the drawing of congressional districts and the allocation of government funding… No one, not even the Census Bureau, denies that the census misses many people. Also, to a lesser extent, there is some enumeration of fictitious or deceased individuals and double counting. The undercount problem exists for many reasons. For instance, the Census Bureau may miss some housing units when sending out forms or some people who have received forms may not complete and return them. The former case is prevalent among individuals with no stable address (such as the homeless), while the latter is particularly common among illegal immigrants, many of whom wish to remain hidden from the government. While the Census Bureau makes several attempts to locate non-responding households, some are inevitably missed." (pages 4-7 and 4-8)
"Of particular concern is the so-called "differential undercount," which refers to the fact that certain types of individuals and households are more likely to be missed by the census than others. According to one study, the undercount for black persons remained at 5.7 percent in 1990—an improvement from the 8.4 percent mark in 1940, but an increase from 4.5 percent in 1980 (Robinson, et. al. 1991). Men and the young are more likely to be missed than women and the old, and one study estimated that for black males between 20 and 29, the undercount was 10.1 percent in 1990 (Skerry 1992). The number of illegal immigrants, most of whom are of Hispanic origin, is believed to be around 3 million, and the Census Bureau estimates that 30 percent of this population was missed in 1990." (page 4-8)
According to the U.S. Census, "data indicate that populations were undercounted at different rates. In general, Blacks, American Indians and Alaskan Natives, Asians and Pacific Islanders, and Hispanics were missed at higher rates than Whites." For more information about the undercount study conducted in 1990, visit: http://www.census.gov/dmd/www/techdoc1.html
American Community Survey
In years before the 2010 Census, many of the Census-based indicators in the SCI were based on the Census Summary File 3 (SF3), also known as the long form. SF3 data were collected by surveying 1 in every 6 households, while Summary File 1 (SF1) data, also known as the short form, were and still are collected by surveying 100% of households in the US. After the 2000 Census, however, SF3 was discontinued and the American Community Survey took its place.
The American Community Survey (ACS) is a continuous sample survey in which 250,000 households are surveyed each month across the country. Because far fewer households are surveyed per year than were surveyed for the SF3, the ACS prepares 5-year estimates for analysis of data at the Census tract level. The smaller sample leads to less precise estimates of population characteristics than the SF3 provided; the trade-off is that data are updated annually rather than every 10 years.
While data from the SF3 were also statistical estimates like those produced by the ACS, estimates of error (such as confidence intervals or margins of error) were rarely presented with the data. Because ACS data can carry a great deal of statistical uncertainty, it is particularly important to account for and report error in analysis and presentation. In our analyses of ACS data we recalculated 90% margins of error each time data were aggregated by category (e.g., sex) or by geography to the neighborhood level. To aggregate data to the neighborhood level, each Census tract was assigned to the neighborhood that its geographic mean center point fell within. In a few cases, particularly around Chinatown, tracts were manually reassigned to a different neighborhood based on knowledge of that tract’s population distribution and characteristics. Often, however, tracts do not fall neatly within a neighborhood and may be halfway in one neighborhood and halfway in another. This is the rationale for using the dasymetric mapping method with data from SF1. Unfortunately, because ACS data are estimates with error, it would not be prudent to use dasymetric mapping with them. We therefore accept and acknowledge the limitations that come with this neighborhood aggregation methodology.
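As an illustration of recalculating a margin of error during aggregation, the sketch below sums tract estimates within each neighborhood and combines their 90% margins of error as the square root of the sum of squares. The source does not state which formula was used; this is the approximation the Census Bureau publishes for sums of ACS estimates, and it ignores any covariance between tracts. The function name and the sample figures are hypothetical.

```python
import math
from collections import defaultdict


def aggregate_tracts(tracts):
    """Aggregate tract-level ACS estimates to neighborhoods.

    `tracts` is a list of (neighborhood, estimate, moe_90) tuples, where each
    tract has already been assigned to a neighborhood (for example by its
    mean center point). Returns {neighborhood: (estimate, moe_90)}.
    """
    estimate_sum = defaultdict(float)
    moe_squared_sum = defaultdict(float)
    for neighborhood, estimate, moe in tracts:
        estimate_sum[neighborhood] += estimate
        # Assumed approximation: MOE of a sum = sqrt(sum of squared MOEs).
        moe_squared_sum[neighborhood] += moe ** 2
    return {
        name: (estimate_sum[name], math.sqrt(moe_squared_sum[name]))
        for name in estimate_sum
    }


# Hypothetical tract estimates with 90% margins of error.
sample_tracts = [
    ("Neighborhood A", 1200, 300),
    ("Neighborhood A", 800, 400),
    ("Neighborhood B", 500, 120),
]
aggregated = aggregate_tracts(sample_tracts)
```

Here the two tracts in Neighborhood A combine to an estimate of 2,000 with a margin of error of 500, smaller than the 700 that simply adding the two margins would give.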
For more information on ACS data and how to use it visit this site: http://www.census.gov/acs/www/guidance_for_data_users/handbooks/
American Community Survey Handbooks for Data Users: http://www.census.gov/acs/www/guidance_for_data_users/handbooks/
GeoLytics Neighborhood Change Database Data Users’ Guide: http://www.geolytics.com/pdf/NCDB-LF-Data-Users-Guide.pdf
Maantay JA, Maroko AR, Herrmann C. Mapping Population Distribution in the Urban Environment: The Cadastral-based Expert Dasymetric System (CEDS). Cartography and Geographic Information Science. 2007;34(2):77-102. http://www.lehman.edu/deannss/geography/publications/Dasymetric_CaGIS_Maantay.pdf