Category Archives: Census

Some urban researchers are careless…and wrong

I have read a number of scholarly articles using census Urbanized Area data from 2000 or later in which the authors described those areas as consisting of territory with a population density of 1,000 or more persons per square mile. That is incorrect. The density threshold for adding blocks or other small areas to an Urbanized Area (or Urban Cluster) is 500 persons per square mile. I’m not into naming and shaming and won’t. But come on! If you can’t even describe the data you are using accurately, why should anyone trust anything else you are saying?

I know where the error comes from. Starting with the 2000 census, the Census Bureau dramatically changed how they defined the notion of “urban” and Urbanized Areas (for the most part greatly improving the definition). Under the old definition, it was the case that a small area had to have a population density of at least 1,000 persons per square mile to be included in an Urbanized Area. An excellent summary of how the census definition of “urban” has evolved can be found here.

I assume that a researcher making this error had read earlier articles that described Urbanized Areas as consisting of areas with densities of 1,000 or more (either correctly, if referring to pre–2000 Urbanized Areas or incorrectly, if referring to the later areas). I expect this would be the source, not the census definition of the earlier Urbanized Areas, for if these authors were too careless and lazy to look up the definition for their current work, they likely would not have done so in the past either.

The current Urbanized Area density minimum plays a key role in the definition of urban areas for my urban patterns research. And of course I continue to read newly published articles that deal with urban patterns, including those using Urbanized Area data. The first few times I read articles referring to the 1,000-person-per-square-mile cutoff for 2000 or 2010 Urbanized Areas, I panicked. Did I make a mistake in understanding the definition and get it wrong? (It is a complex definition.) Each of those times I went back and re-read the formal notices on urban area criteria for 2000 and 2010 in the Federal Register. After having assured myself several times that I was correct, I no longer have to repeat this.

Technical note

The 2000 and 2010 urban area criteria do make use of a population density minimum of 1,000 persons per square mile in the first stage of the delineation process. An urban area core is defined that includes small areas with population densities of 1,000 or more. Then additional areas are added with densities of 500 persons per square mile and above. The existence of an initial urban area core meeting the higher density threshold will not be an issue for Urbanized Areas.
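The two-stage logic described above can be sketched in a few lines. This is only a toy illustration, not the Census Bureau's procedure: the block names, densities, and adjacency lists are made up, all core blocks are treated as one area, and the real criteria contain many further provisions that are ignored here.

```python
from collections import deque

CORE_MIN = 1000  # persons per square mile, initial core (from the criteria)
ADD_MIN = 500    # persons per square mile, territory added to the core

def delineate(density, neighbors):
    """Toy two-stage delineation: returns the set of block ids in the area.

    density   -- hypothetical dict of block id -> persons per square mile
    neighbors -- hypothetical dict of block id -> list of adjacent block ids
    """
    # Stage 1: the core is every block at or above the higher threshold.
    area = {b for b, d in density.items() if d >= CORE_MIN}
    queue = deque(area)
    # Stage 2: grow outward, adding adjacent blocks at or above 500.
    while queue:
        block = queue.popleft()
        for nb in neighbors.get(block, ()):
            if nb not in area and density[nb] >= ADD_MIN:
                area.add(nb)
                queue.append(nb)
    return area

# Made-up chain of blocks A-B-C-D-E-F running outward from the center.
density = {"A": 2500, "B": 1200, "C": 700, "D": 550, "E": 300, "F": 800}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C", "E"], "E": ["D", "F"], "F": ["E"]}
urban = delineate(density, neighbors)
```

In this made-up example, blocks C and D fall below 1,000 but are still included because they meet the 500 threshold and adjoin the area. Describing the result as "territory with a density of 1,000 or more" would misstate it.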

On the choice of Combined Statistical Areas

Last year, I wrote a post discussing why I chose to use the larger Combined Statistical Areas (CSAs) for my urban patterns research rather than the commonly used Metropolitan Statistical Areas (MSAs). I followed this up with a second post giving examples of how the sharing of transportation infrastructure–commuter rail and airports–could be an indicator of the integration of areas that should be considered together as a single, larger metropolitan area.

This decision to use the CSAs is of such fundamental importance to my research that I felt it deserved more extended, formal treatment. I prepared the paper “On the Choice of Combined Statistical Areas” that provides greater background, covers the topics addressed in those blog posts in more detail, and addresses some other implications of the choice of CSAs over MSAs. It also shows how the CSAs are comparable in extent to MSAs as they had been defined earlier for the 2000 census. This last topic was also addressed in an earlier post.

The paper is posted on the Research page of the website and can also be downloaded here.

Data for studying urban patterns over time

I want to study urban spatial structure over an extended period of time. Here are my data requirements:

Data for population or housing units that can show the level of urban development.

Data for small areas that enable the definition of the extent of urban development and the examination of distributions within the urban areas.

Data for multiple points in time, as many as possible.

Data for the same small areas at each point in time, to allow examination of changes in those areas over time.

My dataset begins with a unique resource, the Neighborhood Change Database created by the Urban Institute and Geolytics. This dataset includes census tract data from the 1970 through 2000 censuses, with the data for the years from 1970 through 1990 normalized for the 2000 census tract boundaries. So that’s 4 points in time. The block data from the 2010 census for population and housing units can be aggregated to the year 2000 tract boundaries, giving another year.
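The aggregation of 2010 block data to 2000 tract boundaries can be sketched as follows. This assumes a block-to-tract lookup already exists; the identifiers and counts below are invented, and a block that straddles a 2000 tract boundary would need to be apportioned, which this sketch does not attempt.

```python
from collections import defaultdict

def aggregate_to_tracts(block_units, block_to_tract):
    """Sum block-level housing unit counts up to tracts.

    block_units    -- hypothetical dict of 2010 block id -> housing units
    block_to_tract -- hypothetical dict of 2010 block id -> 2000 tract id
    """
    totals = defaultdict(int)
    for block, units in block_units.items():
        totals[block_to_tract[block]] += units
    return dict(totals)

# Invented example: four 2010 blocks falling in two 2000 tracts.
block_units = {"b1": 40, "b2": 25, "b3": 10, "b4": 60}
block_to_tract = {"b1": "t100", "b2": "t100", "b3": "t200", "b4": "t200"}
tract_units = aggregate_to_tracts(block_units, block_to_tract)
```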

While many studies use population and population densities to study urban patterns, I have chosen to use housing units (as have others). They are more fixed and I think better represent the pattern of urban development. (The Census Bureau uses a minimum population density threshold to define urban areas. It is literally possible for an area to go from rural to urban from one census to the next without any new housing being developed. All it would take is an increase in population, for example, some babies being born.)

Using housing units also provides the opportunity to extend the data back in time. The census and the Neighborhood Change Database include the distribution of housing units by the year in which they were built. One can use this information for 1970 to estimate numbers of housing units that existed in earlier years. There are errors, as this approach cannot take into account changes to the stock of the older units that have occurred in the interim. I did an analysis that considered the extent of the error and concluded that it was reasonable to estimate housing units for the tracts back two decades, to 1950 but not further. This is discussed in a note Year-built Estimates Analysis on the Research page.
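The basic idea can be shown with simple subtraction. This is only the simplest reading of the approach, with invented numbers and with the year-built counts grouped by decade for convenience; the procedure actually used is the one documented in the note. It ignores units that were demolished or converted before 1970, which is the source of the error mentioned above.

```python
def estimate_earlier_units(total_1970, built_1960s, built_1950s):
    """Back-cast housing units for one tract from 1970 year-built counts.

    total_1970  -- housing units counted in 1970
    built_1960s -- of those, units reported as built 1960 or later
    built_1950s -- of those, units reported as built in the 1950s
    """
    est_1960 = total_1970 - built_1960s  # units that already existed in 1960
    est_1950 = est_1960 - built_1950s    # units that already existed in 1950
    return {1970: total_1970, 1960: est_1960, 1950: est_1950}

# Invented tract: 1,000 units in 1970, 300 built in the 1960s, 250 in the 1950s.
estimates = estimate_earlier_units(1000, 300, 250)
```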

A remaining question was which areas to examine and what their extent should be. As I noted in an earlier post, I believe Combined Statistical Areas (CSAs) better represent the extent of metropolitan areas than Metropolitan Statistical Areas (MSAs). I am choosing to examine urban patterns within the 59 CSAs (or MSAs, for areas not included in a CSA) that had populations over 1,000,000 in 2010.

Documentation of the urban patterns dataset is provided in a note Urban Patterns Dataset Description on the Research page.

Problems with the urban and metropolitan area definitions

The previous post described how the changes to the metropolitan area definition resulted in a number of large Metropolitan Statistical Areas (MSAs), as delineated for the 2000 census, being split into 2 or more MSAs in 2003. This raised the obvious question: what was it about the new definition that produced these changes? The answer proved to be complex and bizarre, and the situation was only made worse by a horrible decision made by the Census Bureau with the 2010 Urbanized Area (UA) definition.

A major change made in the MSA definition first used in 2003 was to begin the delineation with the Urbanized Areas, including all counties with substantial portions of a UA in the MSA. After that, commuting to those central counties was used to add outlying counties to the MSA. The definition included provisions for merging adjacent MSAs using the same commuting criterion, but this made it highly unlikely that adjacent large MSAs could ever be merged. So as a result, the general extent of MSAs was determined by the extent of the UAs. If an area of contiguous urban settlement were split into 2 or more UAs, this would likely produce multiple MSAs.

So now we have to go to the UA definition. The new UA definition for the 2000 census provided for the splitting of large urban agglomerations into multiple UAs using the MSA (and CMSA and PMSA) boundaries as delineated for the census. So the extent of the MSAs depended on the UAs, and the extent of the UAs depended on the MSAs! It is a circular definition!

So what happens with the UAs for 2010? The Census proposed to maintain the status quo, keep the largest UAs with populations over a million the same, but not split smaller urban agglomerations, so contiguous UAs would be merged. This generated opposition from those in areas that would be merged, as this could affect the receipt of federal funding. The Census response was to surrender and make the decision that the set of UAs that were delineated in 2000 would be frozen! Each 2000 UA would continue to be a UA in 2010! And since the MSAs continued to be based on the UAs, they would be largely frozen as well.

Since the beginning of the UA and MSA definitions in the mid-twentieth century, the areas had been allowed to evolve, with areas being combined as formerly separate areas grew together and were more reasonably considered a single entity. For example, Dallas and Fort Worth started as separate UAs and MSAs but for decades have been considered to be a combined area. What the Census Bureau did with the 2010 UA definition was to say that the delineation of urban and metropolitan America would be frozen as it was in 2000 and would not be allowed to further evolve. This was a truly horrible decision. And the Census Bureau knew it, as was clear from the misleading obfuscation of what they were doing in the Federal Register notice for the 2010 UA definition.

Far more detail is provided in a research note discussing the problems with the urban and metropolitan area definitions that is posted on the Research page and can also be downloaded here.

The effect of the changed definition of Metropolitan Statistical Areas

In an earlier post I explained why I chose to use the larger Combined Statistical Areas (CSAs) for my urban patterns research rather than the more common and familiar Metropolitan Statistical Areas (MSAs). I felt that in some cases the MSAs did not encompass what I felt was the whole metropolitan area. Exhibit 1 was the New York MSA, which did not include any of the Connecticut suburbs.

Before this, I had no occasion to systematically look at the extent of all of the large MSAs. But my recollection was that the MSAs were not always this limited. For example, the New York MSA used to include areas in Connecticut, and Raleigh and Durham had been a single MSA. I decided to start digging to find what had happened. It turns out that the Office of Management and Budget (OMB) made major changes to the MSA definition in 2000, which was first used to delineate new MSAs in 2003. This is also when the CSAs were introduced.

I decided to do a systematic comparison of the last MSAs delineated under the old definition, which were used in reporting the 2000 census, and the areas delineated in 2003 using the new standards. I looked at the 49 MSAs (and CMSAs, which were nothing more or less than MSAs for which subdivisions had been delineated) with populations over a million in the 2000 census. For a majority of the 2000 MSAs, the 2003 MSAs produced with the new definition were similar, varying only in the outlying counties included. But 18 of the 2000 MSAs were split into 2 or more MSAs in 2003, in one instance, into 6 different MSAs. These included New York and Raleigh-Durham.

For those areas where CSAs had been delineated in 2003, I compared their extent to the 2000 MSAs. In nearly all cases, the CSAs were quite comparable to the 2000 MSAs. They included the multiple new MSAs produced by the splitting of the older areas. No wonder I found the CSAs more reasonable than the MSAs. OMB thought those larger areas better represented the extent of metropolitan areas up through 2000!

A research note providing the complete results of these comparisons is posted on the Research page and can also be downloaded here.

Problems using Urbanized Areas for comparisons over time

Since 1950, the Census has defined Urbanized Areas (UAs) as the contiguous, built-up areas of large urban settlements. These are areas with populations of 50,000 or more (though population of what has varied), including adjacent small areas (census tracts or blocks) that meet a minimum density threshold. The population within this area is then literally defined as urban, as opposed to the population outside, unless located in another urban area.

UAs have an advantage over Metropolitan Statistical Areas (MSAs) for some purposes because they include only the territory that is (more-or-less) built up and that can be considered unambiguously urban, making distinctions literally at the block level. MSAs, being composed of entire counties, can include widely varying amounts of land with no urban characteristics and little or no connection to the urban area. Because of this, UAs are especially appropriate for the examination of patterns of urban settlement, including urban sprawl. It makes sense, for example, to calculate densities for UAs, while it does not make sense for MSAs (as discussed here).

Because UAs include only the contiguous developed urban territory, they have very irregular boundaries and are generally much smaller than their MSAs. Here is the example of the Indianapolis UA within the counties making up the Indianapolis MSA:

Indianapolis Urbanized Area (in red) within the Indianapolis Metropolitan Statistical Area

While UAs are very useful for examining the patterns of urban areas at a single point in time, significant difficulties arise when looking at the changes in urban patterns over time, especially over extended periods of time. Two major problems stem from the irregular boundaries created using blocks as the smallest building blocks and from the changes in the UA definition over time.

UAs are delineated for each decennial census, and a variety of data from that census are reported for each of those areas. So it is possible to compare the population, land area, and density, and many other characteristics of a UA in 2010 with a UA in 2000. Of course, by definition, these represent different urban extents. The Census does not report the 2010 statistics for the 2000 (or earlier) UAs, nor does it go back to the 2000 Census data and report that information for the newly delineated 2010 UAs. So while you can make comparisons of the entire UAs at two points in time, you cannot directly consider the population, land area, or other characteristics of the territory that has been added to a UA between 2000 and 2010. You can say that the population of the entire UA has increased by some amount over the decade, but you cannot say how much of that increase occurred from the addition of the new territory as opposed to an increase (or decrease) in the population in the original 2000 UA.

A workaround is possible for recent censuses. You could take the block data from the 2010 census and use it to estimate values for population and other variables for the 2000 UA and vice versa. (Because block data would be required, you would only be able to use the limited set of variables reported at that level.) There would be some minor problems resulting from changes in block boundaries. And (sometimes related to the changes in block boundaries), some areas that were within the 2000 UA will not be in the 2010 UA. But these problems are relatively small and this procedure would be workable.
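The accounting behind this workaround can be sketched as below. It assumes each 2010 block has already been flagged as inside or outside the 2000 and 2010 UA boundaries (the hard part, which requires the geographic boundary files); the figures are invented and the function name is just for illustration.

```python
def decompose_change(pop_2000_ua, blocks_2010):
    """Split a UA's population change into its components.

    pop_2000_ua -- population of the 2000 UA as reported in 2000
    blocks_2010 -- list of (population_2010, in_2000_ua, in_2010_ua) tuples,
                   one per 2010 block (flags are assumed to be precomputed)
    """
    both = sum(p for p, in00, in10 in blocks_2010 if in00 and in10)
    added = sum(p for p, in00, in10 in blocks_2010 if in10 and not in00)
    dropped = sum(p for p, in00, in10 in blocks_2010 if in00 and not in10)
    pop_2010_ua = both + added
    # 2010 population living within the original 2000 boundary
    pop_2010_in_2000_ua = both + dropped
    return {
        "total_change": pop_2010_ua - pop_2000_ua,
        "change_within_2000_ua": pop_2010_in_2000_ua - pop_2000_ua,
        "population_in_added_territory": added,
        "population_in_dropped_territory": dropped,
    }

# Invented figures for a single UA.
blocks = [
    (60_000, True, True),
    (45_000, True, True),
    (2_000, True, False),    # in the 2000 UA but not the 2010 UA
    (18_000, False, True),   # territory added by 2010
    (5_000, False, False),   # outside both
]
result = decompose_change(100_000, blocks)
```

The total change equals the change within the 2000 UA, plus the population in added territory, minus the population in dropped territory. Here that is 7,000 + 18,000 - 2,000 = 23,000.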

Making the estimates using block data will be possible only for those recent censuses for which machine-readable block data and the associated geographic boundary files for both blocks and UAs are available. No problem for 2000 and 2010. I think it might be possible for 1990. Doubtful whether one could go back further. (And don’t consider doing this without machine-readable data. My dissertation involved my doing this type of estimation, by hand, for data from multiple points in time for traffic analysis zones, census tracts, and wards (not blocks), for one urban area. That was several months of my life.)

But even if one would be content with comparing the totals for the UAs from each census over time, changes in the definition of UAs make this highly problematic. From the 1950 through the 1990 Censuses, the definitions evolved each decade but were fairly consistent, allowing for reasonable comparisons. But a major change in the definition of UAs was made for the 2000 Census. The change was made to create a cleaner, more consistent definition that made use of the data processing capabilities that had become available. The new definition was a major improvement, but it creates problems for making comparisons with the earlier UAs.

While the 2000 UA definition was different from the earlier one in many ways, two changes are of greatest importance. First, the minimum population density required for an area to be added to the UA was reduced from 1,000 to 500 persons per square mile. It is obvious that this is a major change. (I have seen a number of articles, post–2000, still referring to the 1,000-persons-per-square-mile value as the threshold for inclusion in the UA. Either the authors have not read the new standards or have not read far enough. The new definition requires that an urban area start with a core with a population density of at least 1,000 persons per square mile, but then goes on to require only the 500-persons-per-square-mile density for adding adjacent areas. And for UAs, with populations exceeding 50,000, there will hardly be an issue of their having a core meeting the higher density threshold.)

The second major change from 1990 to 2000 is that the earlier UAs included the entire areas of incorporated places and Census Defined Places. The new definition uses only blocks and either block groups (in 2000) or census tracts (in 2010) and totally ignores the boundaries of these larger areas. For incorporated places that are entirely developed and urban, where all of the subareas meet the minimum density threshold, this makes no difference. But for cities that have annexed significant areas of nonurban land in advance of development, these areas were included in the UAs in 1990 and before while they are no longer included in 2000 and 2010. A comparison of the areas of UAs between 1990 and 2000 shows the effect. Many UAs grew in both population and land area over the decade. But a number of areas, while growing in population, showed significant drops in land area. This almost certainly was the result of the nondeveloped land within incorporated places being excluded in 2000.

At least the UA definition has remained largely consistent from 2000 to 2010. Even then, little changes in Census procedures can produce strange anomalies. I was comparing the extent of 2000 and 2010 UAs and was surprised to find that the land area of the San Diego UA declined dramatically despite a major population increase. I compared the areas on a map and found what happened. A significant portion of Camp Pendleton, a very large Marine base at the northern tip of the UA, was included in 2000, despite the fact that much of that area was hardly urban. (It would not do to have military exercises, including artillery practice and tank operations, within a real urban area!) In 2010, only a very small portion of the base was included in the UA, presumably the areas where the Marines were housed. It appears that this change resulted from an agreement between the Department of Defense and the Census Bureau on changes in how census subdivisions like tracts and blocks could be delineated on military installations. (Don’t tell me that DOD forbade such subdivision earlier because that would provide intelligence on the location of personnel on military bases! Probably!)

In case anyone actually wants to read the official definitions of the UAs, they are published in the Federal Register for 2000 and 2010 and are available on the Census website here. A history of the UA definitions up through 1990 can be found here.

The problem with Metropolitan Divisions

In addition to defining Metropolitan Statistical Areas (MSAs), the Office of Management and Budget defines subareas of some of the largest MSAs, which are called Metropolitan Divisions. While I suppose there may be some demand for data on smaller parts of large metropolitan areas, the Metropolitan Divisions that have been delineated are an inconsistent mix of very different types of areas and probably should not be used for any analytical purposes. (Before 2003, a somewhat different subdivision called Primary Metropolitan Statistical Areas was used, which will be discussed further below.)

Currently, 9 MSAs are subdivided into Metropolitan Divisions. To be a candidate for subdivision, the MSA must have a population of at least 2,500,000. Metropolitan Divisions are combinations of the MSA counties using a tortured definition involving the percentage of workers living in a county who also work in the county, the ratio of the number of workers working in the county and the number living in the county, and the percentage employment interchange between the counties.

The problem is that the Metropolitan Divisions that result from this are very different types of areas. Some are portions of MSAs where two or three metropolitan areas, each with their own large city, have merged into a single MSA. The Dallas and Fort Worth Metropolitan Divisions are the subareas of their MSA, as are the Miami, Fort Lauderdale, and West Palm Beach Metropolitan Divisions. Other Metropolitan Divisions are centered around cities that are distinctly smaller than and subsidiary to the largest city in their MSA. Examples include Newark, New Jersey in the New York MSA and Camden, New Jersey in the Philadelphia MSA. These are also cities that have been included in the larger city’s metropolitan area probably for as long as we have had a concept of metropolitan areas. And finally, some Metropolitan Divisions are so devoid of significant cities that they are named using the county names. Examples, again in New York and Philadelphia, respectively, are the Nassau and Suffolk Counties Metropolitan Division and the Bucks, Montgomery, and Chester Counties Metropolitan Division.

At least this is an improvement over the somewhat analogous subdivision of metropolitan areas prior to the current system being instituted in 2003. That earlier system was more problematic in two ways. First, it subdivided more metropolitan areas, 18 in 2000 versus the current 9. But more problematic was the nomenclature used. The subdivisions were called Primary Metropolitan Statistical Areas (PMSAs), while the MSAs of which they were a part were renamed Consolidated Metropolitan Statistical Areas (CMSAs). The PMSAs were as heterogeneous a mix of areas as are the Metropolitan Divisions (Dallas, Fort Worth, and Nassau-Suffolk were PMSAs as well). Terming the areas “Primary” was misleading, suggesting that these were somehow the foundational metropolitan areas. This, in turn, has led some to believe it was appropriate to compare the PMSAs to the (non-CMSA) MSAs. This was not correct. The MSAs were delineated first. Then, some MSAs with populations greater than 1,000,000 were subdivided into PMSAs. It was only after subdivision that those MSAs were given the (confusing) name Consolidated MSAs. The CMSAs were not the product of any consolidation of PMSAs. Rather, the PMSAs were the product of the subdivision of the CMSAs. At least the current Metropolitan Division nomenclature gets that right.

A good example illustrating why it would be inappropriate to use Metropolitan Divisions or PMSAs would be the measurement, comparison, and analysis of urban sprawl for entire metropolitan areas, where one might be using MSAs for those areas that have not been subdivided. It would not necessarily be unreasonable to compare measures of sprawl for the Dallas and Fort Worth Metropolitan Divisions with other MSAs. But now consider the Nassau-Suffolk and Bucks-Montgomery-Chester Metropolitan Divisions. These are both portions of the suburban parts of their MSAs. These are the locations of the first two Levittowns, for goodness sake. As suburban areas, one would expect these Metropolitan Divisions to have higher levels of sprawl. But the effect is not only on their levels of sprawl. By taking these (and other) suburban areas out of the New York and Philadelphia MSAs and then computing levels of sprawl for the New York and Philadelphia Metropolitan Divisions, artificially lower levels of sprawl will be observed for those areas. I am quite confident that the level of sprawl for any MSA can be reduced if one excludes some of the sprawling suburban areas from the computation.
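A little arithmetic makes the point. The sketch below uses overall density as a crude stand-in for a sprawl measure (lower density meaning more sprawl), which is an assumption made only for illustration, and the population and land area figures are invented rather than taken from any actual MSA.

```python
def density(population, land_area_sq_mi):
    """Persons per square mile."""
    return population / land_area_sq_mi

# Invented figures for an MSA split into two Metropolitan Divisions.
central = {"population": 8_000_000, "area": 1_000}   # division with the main city
suburban = {"population": 2_000_000, "area": 1_000}  # suburban division

whole_msa = density(central["population"] + suburban["population"],
                    central["area"] + suburban["area"])
central_only = density(central["population"], central["area"])
suburban_only = density(suburban["population"], suburban["area"])
```

With these made-up numbers the whole MSA has 5,000 persons per square mile, the central division alone has 8,000, and the suburban division alone has 2,000. Dropping the suburban division makes the remainder look considerably less sprawling than the metropolitan area as a whole.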

One final point. Some may argue that the use of Metropolitan Divisions or PMSAs along with the (non-subdivided) MSAs may be appropriate in some circumstances. (Though an argument that they are better because they are more homogeneous than the non-subdivided MSA or CMSA is rather silly, because MSAs are inherently heterogeneous.) However, if any work uses both the PMSAs or Metropolitan Divisions and the CMSAs or MSAs for the subdivided MSAs within a single study, I think one has to be very suspicious of the motivations.

A note on names

The current metropolitan area definition seems to favor including as many cities as possible in the naming of MSAs and Metropolitan Divisions. I have included only the names of the major cities necessary for identifying the various areas to which I am referring.