Extract Column from Tables | Chenyu's Script

Problem description

I have some census data, which are 20 excel files for 20 year respectively (1980 - 1999). Each of the files contains county names and corresponding census data, e.g., GRDP, population, education.

The aim is to get GDP data from each year, and save them in one excel file.

I used Python 3.7 pandas packge to solve this problem.

Code

Function 1: get GPD column from a excel file

# get GRDP and name from a certain year

def getGRDP(year):
    # read data from excel
    file_name = "Jiangsu_" + str(year) + ".xls"
    data = pd.read_excel(file_name)
    
    # form column names
    name = "_" + str(year) + "$.name"
    gdp = "_" + str(year) + "$.GRDP"
    
    # get name and GRDP column
    data = data.loc[:, [name, gdp]][:60]
    
    return data

Function 2: join a column to an existing table

# join a table to the table of 1999

def join2_table(year, result): 
    
    targetGDP = getGRDP(year)
    
    # get index name
    index_name = "_"+ str(year) + "$.name"
        
    result = result.join(targetGDP.set_index(index_name))
    
    return result

Global variable

# initialize
result = getGRDP(1999)

# set key for the table
result.set_index("_1999$.name")

Add all the GPD column to the global variable

for i in range(1998, 1980, -1):
    data = getGRDP(i)
    index_name = '_' + str(i) + "$.name"
    result = result.join(data.set_index(index_name))