Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania¶

Card, D., & Krueger, A. B. (1993)¶

On April 1, 1992, New Jersey's minimum wage rose from $4.25 to $5.05 per hour. Card and Krueger (1993) evaluate the impact of the law by surveying 410 fast-food restaurants in New Jersey and eastern Pennsylvania before and after the rise. Comparisons of employment growth at stores in New Jersey and Pennsylvania (where the minimum wage was constant) provide simple estimates of the effect of the higher minimum wage. They also compare employment changes at stores in New Jersey that were initially paying high wages (above $5) to the changes at lower-wage stores. They find no indication that the rise in the minimum wage reduced employment. (Adapted from the abstract.)

The objective of this notebook is to replicate the findings of the paper above and practise difference-in-differences (DiD) along the way :)
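As a quick reminder of what a DiD estimate is, here is a minimal sketch on made-up numbers. The frame `toy`, its column names and all of its values are hypothetical; they are not taken from the paper or from the dataset used below. The idea is simply to take the before/after change in each state and then subtract the control-state change from the treated-state change.

```python
import pandas as pd

# Hypothetical mean employment per store (NOT the paper's numbers)
toy = pd.DataFrame(
    {
        "state": ["NJ", "NJ", "PA", "PA"],
        "period": ["before", "after", "before", "after"],
        "mean_employment": [20.0, 21.0, 23.0, 21.5],
    }
)

# One row per state, one column per period
means = toy.pivot(index="state", columns="period", values="mean_employment")

# Change within each state, then the difference between those changes
change = means["after"] - means["before"]
did = change["NJ"] - change["PA"]
print(f"DiD estimate: {did:+.2f}")
```

With these toy values the treated state changes by +1.0 and the control state by -1.5, so the estimate is the gap between the two changes rather than either change on its own.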

In [1]:

import pandas as pd
import numpy as np

Data¶

Download and Preprocessing¶

In [2]:

# Needs to be done only once

import os, io, re
import zipfile
import requests
from io import StringIO

In [3]:

# # Create data folder
# os.makedirs("data/original", exist_ok=True)
# zip_path = os.path.join("data/original", "njmin.zip")
# codebook_path = "data/original/codebook"
# data_path = "data/original/public.dat"

# # Download from zip from source and unzip
# url = "http://davidcard.berkeley.edu/data_sets/njmin.zip"
# r = requests.get(url)
# with open(zip_path, "wb") as f:
#     f.write(r.content)

# with zipfile.ZipFile(zip_path, "r") as zf:
#     zf.extractall("data/original")

# z = zipfile.ZipFile(io.BytesIO(r.content))

# # Read the codebook ( legacy encoding )
# with open(codebook_path, "r", encoding="latin1") as f:
#     codebook_text = f.read()
    
# pattern = re.compile(r"^([A-Z0-9_]+)\s+(\d+)\s+(\d+)\s+\d+\.\d+", re.MULTILINE)

# colspecs = []
# names = []

# for match in pattern.finditer(codebook_text):
#     varname = match.group(1).lower()
#     start = int(match.group(2)) - 1  # convert to 0-based index for pandas
#     end = int(match.group(3))
#     names.append(varname)
#     colspecs.append((start, end))
    
# # Read the data file     
# df = pd.read_fwf(data_path, colspecs=colspecs, names=names)
# df = df.apply(pd.to_numeric, errors="coerce")
# df.to_csv("data/card-and-kruegar-1993.csv", index=False)

# df.sample(5)
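To make the commented-out preprocessing easier to follow, here is a small sketch of the same idea on in-memory text: parse column positions out of codebook lines with the regular expression used above, then hand them to `pd.read_fwf`. The three codebook lines are the ones quoted later in this notebook; the two data rows in `data_text` are invented for illustration and do not come from `public.dat`.

```python
import io
import re

import pandas as pd

# Codebook lines as quoted in this notebook: name, start, end, format, label
codebook_text = """\
CHAIN           5        5     1.0   chain 1=bk; 2=kfc; 3=roys; 4=wendys
CO_OWNED        7        7     1.0   1 if company owned
STATE           9        9     1.0   1 if NJ; 0 if Pa
"""

# Hypothetical fixed-width rows (made up, not real survey records)
data_text = "  1 1 0 1\n  2 4 1 0\n"

pattern = re.compile(r"^([A-Z0-9_]+)\s+(\d+)\s+(\d+)\s+\d+\.\d+", re.MULTILINE)

names, colspecs = [], []
for match in pattern.finditer(codebook_text):
    names.append(match.group(1).lower())
    start = int(match.group(2)) - 1  # codebook is 1-based, pandas is 0-based
    end = int(match.group(3))        # the end stays as-is: colspecs are half-open
    colspecs.append((start, end))

toy_fwf = pd.read_fwf(
    io.StringIO(data_text), colspecs=colspecs, names=names, header=None
)
print(toy_fwf)
```

The only subtle step is the index conversion: the codebook gives inclusive 1-based positions, while `colspecs` expects half-open 0-based intervals, so only the start is shifted.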

In [3]:

df = pd.read_csv('data/card-and-kruegar-1993.csv')
df.sample(5)

Out[3]:

	sheet	chain	co_owned	state	southj	centralj	northj	...	firstin2	special2	meals2	open2r	hrsopen2	psoda2	pfry2	pentree2	nregs2	nregs112
239	132	3	1	1	0	1	0	...	0.25	0.0	2.0	7.0	14.0	NaN	NaN	NaN	2.0	2.0
158	356	4	0	1	1	0	0	...	0.15	1.0	1.0	10.5	11.5	1.00	0.85	1.05	2.0	2.0
93	315	1	0	1	1	0	0	...	0.08	0.0	2.0	7.0	16.0	1.05	0.84	0.87	3.0	2.0
303	235	1	0	1	0	0	1	...	0.50	1.0	2.0	8.0	15.0	1.11	1.11	1.03	4.0	2.0
94	335	1	1	1	1	0	0	...	0.13	0.0	1.0	6.0	17.0	1.05	0.94	0.94	5.0	2.0

5 rows × 46 columns

Codebook decoded¶

In [4]:

# STATE           9        9     1.0   1 if NJ; 0 if Pa 
df['state'] = df['state'].apply(lambda x: 'NJ' if x else 'PA')

In [5]:

#"CHAIN           5        5     1.0   chain 1=bk; 2=kfc; 3=roys; 4=wendys"
decode_chains = {
    1: "Burger King",
    2: "KFC",
    3: "Roy Rogers",
    4: "Wendy's"
}
df['chain'] = df['chain'].map(decode_chains)

In [6]:

#CO_OWNED        7        7     1.0   1 if company owned
df.rename(columns = {'co_owned':'is_company_owned'}, inplace=True)
df['is_company_owned'] = df['is_company_owned'].apply(lambda x: True if x else False)
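One thing to keep in mind with truthiness-based decoding (`'NJ' if x else 'PA'`, `True if x else False`) is how it treats missing values. The sketch below uses a hypothetical series `codes` with one `NaN` to compare it with dictionary-based decoding via `Series.map`. Whether the real `state` or `co_owned` columns contain missing values is not checked here, so treat this as an illustration of the difference and not as a claim about the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical codes, including a missing value
codes = pd.Series([1, 0, np.nan])

# Truthiness-based decoding: NaN is truthy, so it ends up labelled "NJ"
by_truthiness = codes.apply(lambda x: "NJ" if x else "PA")

# Dictionary-based decoding: values without a key (including NaN) stay missing
by_mapping = codes.map({1: "NJ", 0: "PA"})

print(by_truthiness.tolist())
print(by_mapping.tolist())
```

If a column is known to be complete, both approaches give the same result; the mapping version just fails more visibly when an unexpected code shows up.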

Table 2: Means of Key Variables¶

Distribution of Store Types (percentages)¶

In [7]:

# Count stores by chain × state
store_distrib = df.groupby(["chain", "state"]).size().unstack(fill_value=0)
store_distrib

Out[7]:

state	NJ	PA
chain
Burger King	136	35
KFC	68	12
Roy Rogers	82	17
Wendy's	45	15

In [8]:

# Compute column-wise percentages
store_distrib = store_distrib.div(store_distrib.sum(axis=0), axis=1) * 100
# Reorder (not relabel) the columns: unstack() returns them as NJ, PA
store_distrib = store_distrib[["PA", "NJ"]]

# Sort by chain for tidy comparison
store_distrib = store_distrib.loc[["Burger King", "KFC", "Roy Rogers", "Wendy's"]]
store_distrib

Out[8]:

state	PA	NJ
chain
Burger King	44.303797	41.087613
KFC	15.189873	20.543807
Roy Rogers	21.518987	24.773414
Wendy's	18.987342	13.595166

In [9]:

store_distrib.loc['Company owned'] = df.groupby("state")["is_company_owned"].mean() * 100
store_distrib.round(1)

Out[9]:

state	PA	NJ
chain
Burger King	44.3	41.1
KFC	15.2	20.5
Roy Rogers	21.5	24.8
Wendy's	19.0	13.6
Company owned	35.4	34.1
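As a cross-check on the chain percentages, the same shares can be computed in one step with `pd.crosstab(..., normalize="columns")`. The sketch below is self-contained: it rebuilds one row per store from the chain × state counts shown in Out[7] instead of reading the CSV, and the names `counts`, `stores` and `shares` are introduced here only for illustration. Because the columns keep the labels pandas derives from `state`, no manual relabelling is involved.

```python
import pandas as pd

# Store counts copied from the chain × state table in Out[7]
counts = {
    ("Burger King", "NJ"): 136, ("Burger King", "PA"): 35,
    ("KFC", "NJ"): 68, ("KFC", "PA"): 12,
    ("Roy Rogers", "NJ"): 82, ("Roy Rogers", "PA"): 17,
    ("Wendy's", "NJ"): 45, ("Wendy's", "PA"): 15,
}

# Rebuild one row per store so the example runs without the CSV
stores = pd.DataFrame(
    [(chain, state) for (chain, state), n in counts.items() for _ in range(n)],
    columns=["chain", "state"],
)

# normalize="columns" divides each column by its own total
shares = pd.crosstab(stores["chain"], stores["state"], normalize="columns") * 100
print(shares.round(1))
```

On the real data the equivalent call would be `pd.crosstab(df["chain"], df["state"], normalize="columns") * 100`, assuming `df` has already been decoded as in the cells above.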

Log¶

Oct 20, 2025: Completed Section 1 of Table 2. The main challenge today was understanding how to work with this type of data: a fixed-width file whose column positions are described in a separate codebook. Also came across a legacy encoding (latin1) for the first time today :)