Author Archives: John Ottensmann

Defining exurban areas

For the urban patterns research, in addition to delineating the urban areas for each year, I wanted to delineate exurban areas beyond the urban areas that could reasonably be considered to be parts of the metropolitan area related to the urban core. Unlike the census Urbanized Areas, however, there is no accepted standard definition for exurban areas. Fortunately, a thorough review of past studies of exurban areas and how they were defined has been provided by Berube and others (Finding Exurbia, Brookings, 2006).

A minimum population or housing unit density–obviously much lower than the urban density threshold–was the most common criterion used in defining exurban areas. Other factors were also considered, especially commuting to the urban area. Data are not available over the entire period of the urban patterns dataset to allow the use of commuting. However, the maximum extent of the exurban area would be limited to the area of the Combined Statistical Area (CSA) or Metropolitan Statistical Area (MSA), which at a minimum guarantees interaction with the urban area for 2010 for the counties as a whole, if not individual tracts.

I decided to define exurban areas as the sets of contiguous tracts that were adjacent to the urban areas and had housing unit densities greater than some value. The minimum density levels used to define exurban areas in various studies varied widely, from 40 acres per housing unit down to about 10 acres per unit. (For studies using the lowest densities, the extent of the exurban areas was most often limited by the commuting criterion rather than density.) I approached the problem by mapping the tracts meeting different minima in 2010 to make a judgment as to what looked reasonable.

The very low minimum density thresholds of 30 or 40 acres per unit frequently resulted in all or most of the CSA or MSA being considered exurban, with the tracts meeting these levels extending far beyond those areas, especially in the eastern U.S. On the other hand, a density minimum of 10 acres per unit produced much smaller exurban areas than seemed reasonable and consistent with personal observation.

The choice came down to thresholds of either 15 acres per unit or 20 acres per unit. The resulting exurban areas generally looked appropriate for most areas. The final choice of 15 acres per unit came down to a number of specific situations where the lower density level of 20 acres per unit produced areas that seemed too large. I’ll give two examples: The exurban area for Indianapolis in 2010 would have extended south at least halfway to Louisville, through areas I would never consider exurban. And the Portland exurban area would have encompassed a large portion of the Willamette Valley.

A further check reinforced my decision on the minimum density for exurban areas of 15 acres per unit, which is one-fifth of the urban density threshold. For CSAs or MSAs adjacent to other CSAs or MSAs, it was not uncommon for both exurban areas to extend to the common boundary. But for areas not adjacent to others, the extent of contiguous exurban density tracts was generally either confined within the boundaries of the CSA or MSA or extended beyond the boundary at only one or two points, with a string of exurban density tracts along a highway. (This is much like census Urbanized Areas, which frequently have such tendrils of urban development extending outward.) So the density threshold for exurban areas seems consistent with the areas of significant metropolitan interaction as indicated by the CSA and MSA boundaries.
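The definition above amounts to a search outward from the urban area across adjacent tracts. The following Python is a minimal sketch of that idea, not the procedure actually used to build the dataset. The function name, the dictionary layout, the inclusive treatment of the threshold, and the tiny example tracts are all assumptions made for illustration.

```python
from collections import deque

# Threshold from the text: at most 15 acres per housing unit.
EXURBAN_MAX_ACRES_PER_UNIT = 15.0


def exurban_tracts(tracts, neighbors, urban, within=None,
                   max_acres_per_unit=EXURBAN_MAX_ACRES_PER_UNIT):
    """Return the tracts that meet the density minimum and are connected
    to the urban area through a chain of adjacent qualifying tracts.

    tracts:    dict of tract id -> (land acres, housing units)  [hypothetical layout]
    neighbors: dict of tract id -> iterable of adjacent tract ids
    urban:     set of tract ids already classified as urban
    within:    optional set of tract ids inside the CSA or MSA
    """
    def qualifies(tract_id):
        if within is not None and tract_id not in within:
            return False
        acres, units = tracts[tract_id]
        # Assumption: a tract exactly at the threshold counts as exurban.
        return units > 0 and acres / units <= max_acres_per_unit

    exurban = set()
    queue = deque(urban)  # start the search from every urban tract
    while queue:
        current = queue.popleft()
        for nb in neighbors.get(current, ()):
            if nb in urban or nb in exurban:
                continue
            if qualifies(nb):
                exurban.add(nb)
                queue.append(nb)
    return exurban


# Invented example: U is urban; D is dense enough but cut off by sparse tract C.
example_tracts = {
    "U": (100, 500),
    "A": (1000, 100),   # 10 acres per unit
    "B": (1000, 80),    # 12.5 acres per unit
    "C": (1000, 20),    # 50 acres per unit, too sparse
    "D": (1000, 200),   # 5 acres per unit, but not contiguous
}
example_neighbors = {
    "U": ["A"], "A": ["U", "B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"],
}
print(sorted(exurban_tracts(example_tracts, example_neighbors, {"U"})))  # ['A', 'B']
```

Tract D shows why contiguity matters in this sketch: it meets the density minimum but is never reached, because the only path to it runs through a tract that does not.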

This process of defining the exurban areas is treated in greater detail in the paper, “Defining Exurban Areas for the Analysis of Urban Patterns Over Time,” which can be downloaded here.

On the sharing of data from research

The National Academies recently released a report addressing integrity in scientific research, including the social and behavioral sciences. One of the recommendations is that after publication researchers share with others the data on which an article is based. This supports research transparency and should lead to greater reproducibility of research. I think sharing data is generally a good thing, and I have done so. But I feel that the authors of the report have failed to address some important issues related to such data sharing. This is obviously a topic much broader than the subject of this blog. But at two points in my comments I will give examples that relate directly to things discussed here.

In discussing the recommendation on data sharing, the report points favorably to the policies of some journals requiring that authors make the data for an article available to others on request. But further discussion in the report strongly implies that data should be made available in a repository from which anyone can download it. The difference is significant because in making the data available online, the person(s) who created the data then lose all control over how it might be used.

But first, a simple, practical issue. Making data available for others to use entails a significant amount of work relating to formatting, documentation, and so forth. I am very careful about documenting my data as I do research, but that original documentation is completely meaningful only to me. Just sharing data with co-authors requires some additional effort. Sharing it publicly would require more. I suspect that the majority of datasets from articles would never be used by others. So it is inefficient to put the work in for every dataset to get it into a form in which it can be shared. It makes much more sense to put in this effort when someone makes the request to use the data. At that point, I am happy to do so.

The authors of the report (many from the natural sciences) seem to most often view datasets as the products of experiments, to be reported in a paper, which then is the end of the story. Indeed, they actually see as a problem “the temptation to publish multiple papers on just one experiment or dataset.” (p. 17) They fail to realize that for certain types of research, datasets are developed, often with a great deal of effort, to support the investigation of multiple research questions. Those creating the data have a reasonable expectation of being able to carry out their research without having it preempted by others using their data.

My urban patterns dataset with data on housing units by census tract for 59 large urban areas from 1950 to 2010 is an example of this. I spent at least a year-and-a-half building the dataset. I have a long list of research questions I intend to address using this dataset. The papers currently on the Research page represent just a start. I feel that it is reasonable that I should be able to be the first to use this data to address these questions. I certainly would not have put in the effort I did in creating the dataset only for one or two papers. This does not mean that I would be unwilling to share the data with others before I have completed this program. I’m finished with all of the questions I have intended to address relating to the negative exponential model. If someone wants to do more, I could be willing to share the data. Or if someone wants to combine my data with some other dataset, sharing could be appropriate. But that’s why I believe I need to have control over the sharing.

I was surprised that the authors of the report failed to address reputational risks that could be associated with data sharing (and by this, I am not including risks associated with others finding out about problems with the original research). Putting data on an archive for anyone to use can result in uses that can negatively impact the reputation of the data creator.

The first (and least significant) reputational risk comes from someone taking the data and producing and publishing a very crappy piece of work. While most such efforts are justifiably ignored, occasionally they will achieve notoriety for their sheer absence of quality. Assuming the author of the crappy research appropriately cites the creator of the data, the creator will forever be linked with the work. While everyone should understand that the data creator is not responsible, just being associated would not be the most pleasant thing.

For certain types of data, the reputational risk can be much greater. For example, suppose researchers post data dealing with a social problem that includes information on race. A white supremacist could obtain the data, improperly manipulate it, and falsely claim that the results supported their racist views. And they might well prominently note that the creators of the data were respected researchers at a major university. Such a nightmare scenario is why researchers have a legitimate interest in controlling the sharing of their data.

For researchers working in a field involving contentious positions with extremely strong partisans on both sides, risks can extend to the use of the data by others in that field. Getting back to the subject of this blog, urban sprawl and its effects represent just such a field. A study is published indicating that sprawl or compact cities do or do not have some effect, and those whose position has not been supported can be vociferous in their attacks and arguments against it. This has happened–in both directions. I have no doubt that if the data from such a study were made freely available for download, someone whose position had not been supported might reanalyze the data, making the assumptions necessary to reach the opposite conclusion, in an attempt to discredit the original study and its author.

On the choice of Combined Statistical Areas

Last year, I wrote a post discussing why I chose to use the larger Combined Statistical Areas (CSAs) for my urban patterns research rather than the commonly used Metropolitan Statistical Areas (MSAs). I followed this up with a second post giving examples of how the sharing of transportation infrastructure–commuter rail and airports–could be an indicator of the integration of areas that should be considered together as a single, larger metropolitan area.

This decision to use the CSAs is of such fundamental importance to my research that I felt it deserved more extended, formal treatment. I prepared the paper “On the Choice of Combined Statistical Areas” that provides greater background, covers the topics addressed in those blog posts in more detail, and addresses some other implications of the choice of CSAs over MSAs. It also shows how the CSAs are comparable in extent to MSAs as they had been defined earlier for the 2000 census. This last topic was also addressed in an earlier post.

The paper is posted on the Research page of the website and can also be downloaded here.

Scattered, leapfrog development vs. low-density development

Two residential development patterns are most often associated with urban sprawl. Scattered or leapfrog development refers to the building of new residences, either separately or in a subdivision, at some distance from existing built-up areas. Low-density development refers to the construction of individual houses on larger lots. It is possible, of course, for scattered development to also be done on larger lots, though this is not the distinguishing feature of such leapfrog patterns.

In looking at densities in areas larger than the actual residential lots, scattered development will also be low-density because of the vacant land that has been skipped over. But I think most people would agree that scattered, leapfrog development and low-density development are two distinctive types of residential development and sprawl.

But how different are the two? Here is a thought experiment: Imagine an undeveloped area of land one mile square at the edge of an urban area. Now consider two ways in which this land might be developed. The first will be very low-density development. The area is divided into 64 10-acre lots and a house is built on each, completely developing the area. (For simplicity, we will be ignoring the need for land for roads to provide access.) Let’s assume that the owners only landscape small areas surrounding their residences, leaving the remainder of their lots undisturbed.

Now consider the second alternative of extreme scattered development. Sixty-four houses are constructed on 1-acre lots that are fairly evenly distributed across the mile-square area. In this case, only 10 percent of the land has been developed, but the developed area is 10 times as dense as in the previous case. Now suppose that these 64 houses are exactly the same as the very low-density houses and are located in exactly the same places. There would literally be no way to distinguish the scattered development from the very low-density development based on any physical characteristics of the developments. The only way to tell whether the development is very low-density or scattered is by looking at the land records.

Of course that difference in ownership matters–to a degree. The owners of the homes on the scattered 1-acre lots have no control over the undeveloped 90 percent of the land, which could be developed at any time. In the case of the very low-density development, each owner exercises control over 10 acres surrounding the residence by virtue of ownership. One might assume that these very large lots were acquired because the owners wanted the space and the control. (Though it is possible that land use regulations and/or choices made by the prior owner or developer of the area limited options available to the purchasers of these lots.) It is very likely that you are not going to see the purchasers of these large lots soon subdividing their land for higher-density development.

But the operative word here is “soon.” Over time, as demand increases and conditions change, further subdivision and development in the very low-density area becomes an increasing possibility. I currently live in an area that was developed from the late 1960s through the early 1980s with lots around a half acre. There are several lots in the neighborhood where the owners have built a second, substantial house on the rear portion of the lot (more than just an accessory unit or “granny flat”).

So the very low-density developed area perhaps is not that completely different from the area with the scattered development.
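The arithmetic behind the thought experiment can be checked with a few lines of Python. This is only an illustrative sketch: the function and variable names are made up here, and land for roads is ignored, as in the text.

```python
ACRES_PER_SQUARE_MILE = 640  # one mile square


def development_summary(houses, lot_acres, area_acres=ACRES_PER_SQUARE_MILE):
    """Summarize a development of `houses` lots, each of `lot_acres` acres."""
    developed_acres = houses * lot_acres
    return {
        "developed_share": developed_acres / area_acres,   # fraction of land in lots
        "net_units_per_acre": houses / developed_acres,    # density on the lots only
        "gross_units_per_acre": houses / area_acres,       # density over the whole area
    }


low_density = development_summary(houses=64, lot_acres=10)
scattered = development_summary(houses=64, lot_acres=1)

for name, summary in (("very low-density", low_density), ("scattered", scattered)):
    print(name, summary)
```

Both cases put 64 houses on 640 acres, so the gross density over the square mile is the same. Only the share of land inside lots, and therefore the net density on those lots, differs by a factor of 10.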

The negative exponential model and the size of cities

Researchers have long noted the tendency for densities to decline as a negative exponential function of distance from the center. They have looked at declines in the density gradient over time as a measure of decentralization in urban areas. They have noted the relationships of the estimated parameters of the model–the density gradient and the density at the center–to a variety of characteristics of urban areas, including, naturally, the size of the area. The consistent finding has been that the gradients tend to be smaller for larger urban areas, while the central densities tend to be larger.
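For readers who want to see the model concretely: it is usually written as D(r) = D0 * exp(-b * r), where D0 is the density at the center and b is the density gradient. One common way to estimate the two parameters is a least-squares fit of the logarithm of density on distance. The Python below is a minimal sketch of that approach and is not necessarily the estimation method used in the research described here. The function name and the synthetic tract values are assumptions for illustration only.

```python
import math


def fit_negative_exponential(distances, densities):
    """Estimate (central density D0, gradient b) by ordinary least squares
    on ln(density) = ln(D0) - b * distance. Densities must be positive."""
    logs = [math.log(d) for d in densities]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic data generated from the model itself (not real tract data):
# central density 8.0, gradient 0.25 per unit of distance.
true_d0, true_b = 8.0, 0.25
example_distances = [float(r) for r in range(1, 11)]
example_densities = [true_d0 * math.exp(-true_b * r) for r in example_distances]

est_d0, est_b = fit_negative_exponential(example_distances, example_densities)
print(est_d0, est_b)  # very close to 8.0 and 0.25, since the data follow the model exactly
```

With real tract data the points would scatter around the fitted line, and choices such as weighting tracts by area can change the estimates.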

Consider the relationships among the three–the gradient, the central density, and the size of the urban area. If density declines with distance following the negative exponential model, these three values must necessarily be mathematically related. But what affects what? It seems reasonable to believe that the size of the urban area is primarily affected by factors other than the parameters of the negative exponential model.

But what about the model parameters? Housing is long lasting and once established, the patterns in developed areas can remain remarkably stable for many decades. The density of urban development was much higher before widespread use of the automobile. And it turns out that the central densities are very strongly related to the sizes of urban areas in 1910. So it may not be unreasonable to conclude that, at least to some extent, the density gradient is determined by the central density and the size of the urban area.

Solving for the mathematical relationship between the gradient, central density, and size yields a somewhat complex expression. However, a simplified approximation can be used. This approximation has the density gradient being directly proportional to the square root of the central density and inversely proportional to the square root of the size of the urban area.
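One way to see where an approximation of this form can come from is the standard integral of the model over a circular area. The symbols are assumptions introduced here: D_0 is the central density, b the density gradient, r the distance from the center, and N the size of the area in housing units. The sketch assumes a full circle with density continuing to decline indefinitely, and it may differ in detail from the derivation in the paper.

```latex
% Negative exponential model
D(r) = D_0 \, e^{-b r}

% Total size, integrating over rings of width dr out to infinity
N = \int_0^{\infty} 2 \pi r \, D_0 \, e^{-b r} \, dr = \frac{2 \pi D_0}{b^2}

% Solving for the gradient
b = \sqrt{\frac{2 \pi D_0}{N}} \;\propto\; \frac{\sqrt{D_0}}{\sqrt{N}}

% If the area instead ends at a finite radius R, the relationship is
N = \frac{2 \pi D_0}{b^2} \left[ 1 - (1 + b R) \, e^{-b R} \right]
```

The finite-radius form cannot be solved for b in closed form, while the infinite-extent form gives the simple square-root proportionality directly.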

As described in an earlier post and in a paper, I had used my urban patterns data to estimate the parameters of the negative exponential model for large urban areas in the United States from 1950 to 2010. It was straightforward to test for conformity with the expected relationships among the density gradient, central density, and the size of the urban area. The gradient was indeed approximately inversely proportional to the square root of the size of the area, as expected. And the gradient did increase with the central density, though the proportionality was closer to the density itself than to its square root. This may result from the fact that the census tract densities in my data (and used by most other researchers) are measures of gross density including nonresidential uses, streets, and vacant land and are therefore lower than the net residential densities within the residential areas alone.

More information on this analysis, including the mathematical derivation of the relationship among the three values, is in the paper “Negative Exponential Model Parameters and the Size of Large Urban Areas in the U.S., 1950–2010,” which can be downloaded here.

Transportation and “catalog” retailing

The last post discussed the role of urban transportation improvements leading to the development of the department store in the late nineteenth century. And of course the role of the automobile in shaping retail developments in the twentieth century is obvious. This got me thinking about the role of transportation (and communication) improvements in the evolution of retailing where the customer orders goods from a remote vendor and those goods are delivered to the customer.

We tend to think of modern developments such as e-commerce as novel. However, I’m going to start again in the late nineteenth century. But first, a brief excursion into the pros and cons of this type of purchase from the standpoint of the consumer. The major advantages are the selection of goods available, including the ability to purchase things that are not available in local retail establishments, and the convenience of being able to purchase the merchandise without having to travel to a store. The major cons are the inability to physically view the items to be purchased and the delay associated with the need for delivery, which removes the instant gratification associated with physical purchase. Both introduce some uncertainties into the transaction. I am not mentioning price. The vendor saves money by not having brick-and-mortar stores, but this will be offset at least to some extent by the costs of shipping. This could go either way.

The mail-order catalog business emerged in the late nineteenth century with the major vendors being Sears, Roebuck and Montgomery Ward. The retailers made available a variety of merchandise to residents of rural areas that they otherwise would have been unable to acquire. The development of the railroads along with express freight services and parcel post to deliver the merchandise was undoubtedly a prerequisite. On the communications side, regular reliable mail service had been available for some time. But this mail-order business also required the development of printing technology that enabled production of the catalogs at a reasonable cost. (I don’t know just when this threshold may have been crossed, but I seriously doubt that Ben Franklin could have printed large numbers of Sears catalogs economically.) Just as transportation improvements enabled the rise of the general mail-order catalog, widespread use of the automobile made physical stores accessible to rural residents and led to its decline.

Another wave of remote shopping expanded in the second half of the twentieth century with the growth of specialized catalog shopping with telephone ordering, ranging from clothing (Lands End, L.L. Bean, etc.) to gourmet foods (Dean & Deluca). The attraction to the consumer was access to a wider selection and to specialized goods they could not purchase locally. Some improvements to delivery services helped. UPS did far better than the very long delivery times the post office provided, especially in the past. The role of improvements in communication should not be discounted. Toll-free 800 numbers made the calls free. I imagine customers would not have relished the idea of paying the expensive long-distance charges of the past to make purchases. Again, I don’t know at what point the costs of high-quality color printing for catalogs became reasonable, especially since they send out huge numbers. But it does seem that I saw a lot less color printing in the mid-twentieth century. These catalog retailers also innovated to minimize the risks associated with remote purchasing, offering no-questions-asked returns if something didn’t fit or even if you just didn’t like it.

We finally get to today’s e-commerce. It is noteworthy that Amazon started with books, which have two features favoring this model. First, for any given author and title, all books are the same. There is not the guessing that would be involved in choosing among several green sweaters. And second, with books, the breadth of selection is everything. No brick-and-mortar bookstore can possibly approach the inventory of an online retailer. Amazon and the other online retailers also adopted the policies of the catalog retailers (now, of course, also online) with easy returns and high levels of customer service. Zappos has no problem with your ordering multiple pairs of shoes in order to pick the one pair you want and send the others back.

Obviously the World Wide Web was the innovation on the communications side, making both obtaining information on available items and ordering quick and easy. And on the transportation side, the expansion of e-commerce is driving improvements in delivery services, with 2-day and even 1-day delivery becoming commonplace without excessive charges. This, of course, reduces the penalty of having to wait for delivery. Indeed, considering the likelihood of a lag between wanting to purchase an item and having the time to go out to a store, online ordering may be quicker.

Given the rapid developments in e-commerce and speedy delivery, we may be seeing only the first stages in the effects on physical retailers and therefore our urban areas.

Transportation and economies of scale in retailing

The automobile may be blamed for the evolution of big-box retailers, but the effect of improvements to intraurban transportation on retailing began much earlier. This can be seen clearly with the development of the department store in the latter part of the nineteenth century.

In the walking city of the early nineteenth century, most urban residents could only move around on foot. This necessarily limited the distances they could travel and the amounts of goods they could carry. Stores tended to be small and rather limited.

Transportation improvements–horsecars, cable cars, electric streetcars, and more–dramatically increased mobility in urban areas. Cities greatly expanded as residents took advantage of the greater ease of travel. Going to the developing central business districts several miles away became feasible.

A larger number of potential customers could travel to a store located in the downtown area, creating a greater market. This allowed the emergence of the modern department store carrying a far larger range of goods with greater selections. More volume provided greater economies of scale to the store in the sale of its merchandise. But these were also economies of scale from the perspective of their customers, who benefited from the convenience, wider selection, and lower prices.

Coming to shop at the department stores via public transportation did have one limitation, however. Customers purchasing large numbers of items or very large items could find it difficult or impossible to carry their purchases back home with them. The stores recognized this problem and offered delivery of merchandise purchased in the various departments of the store.

This evolution depended solely on the transportation improvements made in the late nineteenth century. It had nothing to do with the automobile. Indeed, at least some department stores continued to assume that significant numbers of their customers would come to their downtown stores using public transportation at least into the 1950s. When I was growing up and shopping at the large downtown department stores in Milwaukee during that decade, the stores were still offering their delivery services. Of course now, the assumption more often is that customers will be arriving by automobile and can take all but the largest items home themselves.