On the pursuit of efficiency in different types of retailing

Many forms of retailing have undergone transformations aimed at increasing efficiency, especially by increasing store size and reducing customer service. As I thought about this, I realized that these changes happened at very different times for different types of retailing. I decided it would be interesting to look at the differences in timing.

For general merchandise, the department store was a very early innovation, appearing in the mid-19th century with large stores offering very broad ranges of goods. But it took a century before the self-service discount department store emerged, offering very limited assistance within the store and having customers pay up front. I looked up some dates and found an amazing coincidence: Walmart, Kmart, and Target were all founded in 1962.

For groceries, the order of innovation was reversed. Piggly Wiggly was established as the first self-service grocery in 1916. But like other grocers at the time, it did not carry items such as meat and produce. The first “true” supermarket, King Kullen, did not arrive until 1930. (This was established by research conducted by the Food Marketing Institute and the Smithsonian!)

Looking at more specialized forms of retailing, Toys “R” Us was early, having been founded in 1957. Home Depot opened its first two home improvement superstores in 1979. Lowe's, its largest competitor, was founded decades earlier but did not adopt the big-box format until competition from Home Depot forced it to.

The coincidence of several large-format retailers being founded at about the same time was not limited to discount department stores. Office Depot and Staples were both established in 1984, with OfficeMax coming only two years later. Best Buy was originally formed as a specialty audio retailer. It changed to its current name, expanded into other types of consumer electronics, and opened its first superstore in 1983. Circuit City followed a similar evolution, changing its name and expanding to become a general consumer electronics retailer a year later. PetSmart was founded in 1985. Petco started much earlier as a mail-order business; I could not find when it first opened physical stores.

I find it fascinating that big-box stores in so many categories started around the same time in the 1980s. Why then? The efficiency of large self-service stores had been established far earlier with supermarkets and discount department stores. Toys “R” Us started as a specialized “category killer” retailer over two decades earlier, and the example of Home Depot had been around for a number of years. Then, suddenly, retailers in very different categories moved in that direction at about the same time.

Economies of scale in retailing

In the previous post, I discussed how economies of scale, enabled by improvements to transportation, led to the development of segregated land uses in the 19th century. Now I’d like to focus on the various economies of scale in retailing and their implications.

The obvious benefits of larger stores and the ability to take advantage of economies of scale accrue to the retailer, of course. The larger store will likely require fewer employees per volume of goods sold. There may also be other efficiencies, for example in managing inventory. A wider range of goods can be stocked, making the store more appealing to customers.

Economies of scale can also benefit customers. Wider selection is in most instances a plus. I don’t know how my daughter could have constructed her science fair project, a closed-circuit wind tunnel, without Lowe's and Menards. And to the extent that larger stores and the attendant economies of scale reduce costs and result in lower prices, customers benefit.

Larger stores selling more goods require a larger market, and as a result they will be more widely spaced. It was possible to have neighborhood hardware stores, but there cannot be a comparable neighborhood Home Depot.

Beyond these internal economies of scale, external economies of scale, or agglomeration economies, are also very important for retail land use patterns. Some types of retailers choose to cluster with other similar outlets because customers are attracted by the opportunity to comparison shop. Other businesses locate near larger retailers to take advantage of the customer traffic they generate. This can happen within a development, whether a large shopping center anchored by department stores or a smaller neighborhood center anchored by a supermarket. Or it can take place simply by locating in the vicinity of the larger, traffic-generating businesses, whether in central business districts or outlying retail areas. In either case, the result is larger areas of segregated land uses.

Segregation of land uses is not new

A great deal of (negative) attention has been devoted to the segregation of land uses in newly developed suburban areas in recent decades. The critique is that the development of exclusively residential neighborhoods and the segregation of commercial activities reduce opportunities for walking, requiring increased automobile use. This is sometimes portrayed as a recent phenomenon, brought on by the widespread use of the automobile.

Some perspective is in order, however. Land use segregation is hardly a product of the latter part of the 20th century. The original cause was not the use of the automobile (though transportation was critical). Rather, the initial separation of land uses in American cities dates to the 19th century.

The pre-industrial walking city at the start of the 19th century had very limited separation of different land uses. Given that interaction was limited by reasonable walking distance, different activities just could not be located that far apart.

As the industrial city emerged in the 19th century, this changed as enterprises sought to capture the advantages of economies of scale, a shift made possible by improvements in transportation within the city. First the omnibus, then the horsecar, and then electric streetcars and mass transit increased the distances people could travel to work and shop. Factories increased in size and formed increasingly large industrial areas. Larger enterprises required management by concentrations of office workers. The department store emerged to provide a previously unseen variety of goods to shoppers from throughout the urban area. The offices, department stores, and related retail formed the new central business districts, another area of largely segregated land uses.

Of course not all types of establishments saw these increases in scale in the 19th century. For grocery stores, the changes came later. But this was the start of increasing sizes of enterprises, made possible by improvements in transportation, forming areas of segregated land use.

Some urban researchers are careless…and wrong

I have read a number of scholarly articles in which the authors, using census Urbanized Area data from 2000 or later, described those areas as consisting of territory with a population density of 1,000 persons per square mile or more. That is incorrect. The density threshold for adding blocks or other small areas to an Urbanized Area (or Urban Cluster) is 500 persons per square mile. I’m not into naming and shaming and won’t. But come on! If you can’t even describe the data you are using accurately, why should anyone trust anything else you are saying?

I know where the error comes from. Starting with the 2000 census, the Census Bureau dramatically changed how it defined “urban” and Urbanized Areas (for the most part greatly improving the definition). Under the old definition, a small area did have to have a population density of at least 1,000 persons per square mile to be included in an Urbanized Area. An excellent summary of how the census definition of “urban” has evolved can be found here.

I assume that a researcher making this error had read earlier articles that described Urbanized Areas as consisting of areas with densities of 1,000 or more (correctly, if referring to pre-2000 Urbanized Areas, or incorrectly, if referring to the later areas). I expect that this, rather than the census definition of the earlier Urbanized Areas, was the source, for if these authors were too careless and lazy to look up the definition for their current work, they likely would not have done so in the past either.

The current Urbanized Area density minimum plays a key role in the definition of urban areas for my urban patterns research. And of course I continue to read newly published articles dealing with urban patterns, including those using Urbanized Area data. The first few times I read articles referring to a 1,000-person-per-square-mile cutoff for 2000 or 2010 Urbanized Areas, I panicked. Had I misunderstood the definition and gotten it wrong? (It is a complex definition.) Each time, I went back and reread the formal notices on the urban area criteria for 2000 and 2010 in the Federal Register. Having assured myself several times that I was correct, I no longer feel the need to repeat this exercise.

Technical note

The 2000 and 2010 urban area criteria do make use of a population density minimum of 1,000 persons per square mile in the first stage of the delineation process. An initial urban area core is defined that includes small areas with population densities of 1,000 or more. Then additional areas with densities of 500 persons per square mile and above are added. Every Urbanized Area will contain an initial core meeting the higher threshold, so that threshold is not a constraint in practice; it is the 500-person threshold that determines which additional territory is included.
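To make the two-stage idea concrete, here is a deliberately simplified Python sketch. It is not the Census Bureau's procedure: it assumes each small area is described by a hypothetical (population, land area in square miles) pair plus a hypothetical list of adjacent areas, and it omits the many other rules in the actual criteria (minimum core populations, hops and jumps across low-density territory, the 50,000-person Urbanized Area minimum, and so on). The helper names `density` and `delineate` are illustrative only.

```python
from collections import deque

# Thresholds in persons per square mile, as described for the 2000/2010 criteria.
CORE_DENSITY = 1000      # stage 1: small areas that seed the initial core
ADDITION_DENSITY = 500   # stage 2: adjacent small areas added to the core


def density(population, land_sq_mi):
    """Population density in persons per square mile."""
    return population / land_sq_mi


def delineate(areas, neighbors):
    """Simplified two-stage delineation (hypothetical helper, not Census code).

    areas: dict mapping area name -> (population, land area in sq mi)
    neighbors: dict mapping area name -> list of adjacent area names
    Returns the set of area names included in the urban area.
    """
    # Stage 1: every small area at or above the core threshold joins the core.
    core = {name for name, (pop, land) in areas.items()
            if density(pop, land) >= CORE_DENSITY}
    urban = set(core)
    # Stage 2: grow breadth-first into adjacent areas meeting the lower threshold.
    queue = deque(core)
    while queue:
        current = queue.popleft()
        for nb in neighbors.get(current, []):
            if nb in urban:
                continue
            pop, land = areas[nb]
            if density(pop, land) >= ADDITION_DENSITY:
                urban.add(nb)
                queue.append(nb)
    return urban


# Hypothetical example: five 2-square-mile areas arranged in a chain A-B-C-D-E.
areas = {
    "A": (5000, 2.0),  # 2,500 per sq mi
    "B": (1500, 2.0),  # 750 per sq mi
    "C": (1200, 2.0),  # 600 per sq mi
    "D": (300, 2.0),   # 150 per sq mi
    "E": (1400, 2.0),  # 700 per sq mi
}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}

naive = {n for n, (p, s) in areas.items() if density(p, s) >= CORE_DENSITY}
result = delineate(areas, neighbors)
print(sorted(naive))   # ['A']
print(sorted(result))  # ['A', 'B', 'C']
```

In this toy example, areas B (750 persons per square mile) and C (600) are included because they adjoin the core, even though a flat 1,000-person cutoff would exclude them. Area E (700) is left out only because the low-density area D separates it from the core.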

On the sharing of data from research

The National Academies recently released a report addressing integrity in scientific research, including the social and behavioral sciences. One of its recommendations is that after publication, researchers share with others the data on which an article is based. This supports research transparency and should lead to greater reproducibility of research. I think sharing data is generally a good thing, and I have done so. But I feel that the authors of the report have failed to address some important issues related to such data sharing. This is obviously a topic much broader than the subject of this blog. But at two points in my comments I will give examples that relate directly to things discussed here.

In discussing the recommendation on data sharing, the report points favorably to the policies of some journals requiring that authors make the data for an article available to others on request. But further discussion in the report strongly implies that data should be made available in a repository from which anyone can download it. The difference is significant because once the data are available online, the person(s) who created them lose all control over how they might be used.

But first, a simple, practical issue. Making data available for others to use entails a significant amount of work relating to formatting, documentation, and so forth. I am very careful about documenting my data as I do research, but that original documentation is completely meaningful only to me. Just sharing data with co-authors requires some additional effort. Sharing it publicly would require more. I suspect that the majority of datasets from articles would never be used by others. So it is inefficient to put the work in for every dataset to get it into a form in which it can be shared. It makes much more sense to put in this effort when someone makes the request to use the data. At that point, I am happy to do so.

The authors of the report (many from the natural sciences) seem most often to view datasets as the products of experiments, to be reported in a paper, which is then the end of the story. Indeed, they actually see as a problem “the temptation to publish multiple papers on just one experiment or dataset.” (p. 17) They fail to realize that for certain types of research, datasets are developed, often with a great deal of effort, to support the investigation of multiple research questions. Those creating the data have a reasonable expectation of being able to carry out their research without having it preempted by others using their data.

My urban patterns dataset, with data on housing units by census tract for 59 large urban areas from 1950 to 2010, is an example of this. I spent at least a year and a half building the dataset. I have a long list of research questions I intend to address using it, and the papers currently on the Research page represent just a start. I feel that it is reasonable that I should be the first to use this data to address these questions. I certainly would not have put in the effort I did in creating the dataset only for one or two papers. This does not mean that I would be unwilling to share the data with others before I have completed this program. I’m finished with all of the questions I intended to address relating to the negative exponential model; if someone wants to do more, I could be willing to share the data. Or if someone wants to combine my data with some other dataset, sharing could be appropriate. But that’s why I believe I need to have control over the sharing.

I was surprised that the authors of the report failed to address reputational risks that could be associated with data sharing (and by this, I am not including risks associated with others finding out about problems with the original research). Putting data on an archive for anyone to use can result in uses that can negatively impact the reputation of the data creator.

The first (and least significant) reputational risk comes from someone taking the data and producing and publishing a very crappy piece of work. While most such efforts are justifiably ignored, occasionally they achieve notoriety for their sheer absence of quality. Assuming the author of the crappy research appropriately cites the creator of the data, the creator will forever be linked with the work. While everyone should understand that the data creator is not responsible, just being associated would not be the most pleasant thing.

For certain types of data, the reputational risk can be much greater. For example, suppose researchers post data dealing with a social problem that includes information on race. A white supremacist could obtain the data, improperly manipulate it, and falsely claim that the results supported their racist views. And they might well prominently note that the creators of the data were respected researchers at a major university. Such a nightmare scenario is why researchers have a legitimate interest in controlling the sharing of their data.

For researchers working in a field involving contentious positions with extremely strong partisans on both sides, risks can extend to the use of the data by others in that field. Getting back to the subject of this blog, urban sprawl and its effects constitute just such a field. When a study is published indicating that sprawl or compact cities do or do not have some effect, those whose position has not been supported can be vociferous in their attacks and arguments against it. This has happened, in both directions. I have no doubt that if the data from such a study were made freely available for download, someone whose position had not been supported might reanalyze the data, making the assumptions necessary to reach the opposite conclusion, in an attempt to discredit the original study and its author.

On the choice of Combined Statistical Areas

Last year, I wrote a post discussing why I chose to use the larger Combined Statistical Areas (CSAs) for my urban patterns research rather than the commonly used Metropolitan Statistical Areas (MSAs). I followed this up with a second post giving examples of how the sharing of transportation infrastructure–commuter rail and airports–could be an indicator of the integration of areas that should be considered together as a single, larger metropolitan area.

This decision to use the CSAs is of such fundamental importance to my research that I felt it deserved more extended, formal treatment. I prepared the paper “On the Choice of Combined Statistical Areas,” which provides greater background, covers the topics addressed in those blog posts in more detail, and addresses some other implications of the choice of CSAs over MSAs. It also shows how the CSAs are comparable in extent to MSAs as they had been defined earlier for the 2000 census. This last topic was also addressed in an earlier post.

The paper is posted on the Research page of the website and can also be downloaded here.

Scattered, leapfrog development vs. low-density development

Two residential development patterns are most often associated with urban sprawl. Scattered or leapfrog development refers to the building of new residences, either separately or in a subdivision, at some distance from existing built-up areas. Low-density development refers to the construction of individual houses on larger lots. It is possible, of course, for scattered development to also be done on larger lots, though this is not the distinguishing feature of such leapfrog patterns.

In looking at densities in areas larger than the actual residential lots, scattered development will also be low-density because of the vacant land that has been skipped over. But I think most people would agree that scattered, leapfrog development and low-density development are two distinctive types of residential development and sprawl.

But how different are the two? Here is a thought experiment: Imagine an undeveloped area of land one mile square (640 acres) at the edge of an urban area. Now consider two ways in which this land might be developed. The first will be very low-density development. The area is divided into sixty-four 10-acre lots and a house is built on each, completely developing the area. (For simplicity, we will ignore the need for land for roads to provide access.) Let’s assume that the owners landscape only small areas surrounding their residences, leaving the remainder of their lots undisturbed.

Now consider the second alternative, extreme scattered development. Sixty-four houses are constructed on 1-acre lots that are fairly evenly distributed across the mile-square area. In this case, only 10 percent of the land (64 of 640 acres) has been developed, but the developed area is 10 times as dense as in the previous case. Now suppose that these 64 houses are exactly the same as the very low-density houses and are located in exactly the same places. There would literally be no way to distinguish the scattered development from the very low-density development based on any physical characteristics of the developments. The only way to tell whether the development is very low-density or scattered is by looking at the land records.
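The arithmetic of the thought experiment can be checked with a few lines of Python. This sketch distinguishes gross density (houses per acre over the whole square mile) from net density (houses per acre of residential lot); the function names are illustrative, not from any standard library, and, as in the text, land for roads is ignored.

```python
ACRES_PER_SQ_MILE = 640
HOUSES = 64


def gross_density(houses, total_acres):
    """Houses per acre over the entire tract, including undeveloped land."""
    return houses / total_acres


def net_density(houses, lot_acres):
    """Houses per acre counting only the land in residential lots."""
    return houses / (houses * lot_acres)


def developed_share(houses, lot_acres, total_acres):
    """Fraction of the tract occupied by residential lots."""
    return houses * lot_acres / total_acres


# Compare the two development patterns on the same square mile.
for label, lot_acres in [("very low-density", 10), ("scattered", 1)]:
    print(label,
          gross_density(HOUSES, ACRES_PER_SQ_MILE),
          net_density(HOUSES, lot_acres),
          developed_share(HOUSES, lot_acres, ACRES_PER_SQ_MILE))
# very low-density 0.1 0.1 1.0
# scattered 0.1 1.0 0.1
```

Both patterns put the same 64 houses on the same 640 acres, so their gross densities are identical (0.1 houses per acre). They differ only in net density and in how much of the land is held in residential lots, which is exactly the ownership distinction at issue.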

Of course that difference in ownership matters–to a degree. The owners of the homes on the scattered 1-acre lots have no control over the undeveloped 90 percent of the land, which could be developed at any time. In the case of the very low-density development, each owner exercises control over 10 acres surrounding the residence by virtue of ownership. One might assume that these very large lots were acquired because the owners wanted the space and the control. (Though it is possible that land use regulations and/or choices made by the prior owner or developer of the area limited options available to the purchasers of these lots.) It is very likely that you are not going to see the purchasers of these large lots soon subdividing their land for higher-density development.

But the operative word here is “soon.” Over time, as demand increases and conditions change, further subdivision and development in the very low-density area becomes an increasing possibility. I currently live in an area that was developed from the late 1960s through the early 1980s with lots of around half an acre. There are several lots in the neighborhood where the owners have built a second, substantial house on the rear portion of the lot (more than just an accessory unit or “granny flat”).

So the very low-density area is perhaps not so different from the area with scattered development after all.