Io File Read And Write

1. READING THE FILE

We will learn how to read the three most commonly used file format - csv, xls(xlsx) and json

1.1 Reading a CSV file

CSV is a comma separated key-value pairs, where key is the column name and value its corresponding column values
pd.read_csv() constructor is used to read the csv file
First (and only required) keyword argument is the location of the file. The returned output is the DataFrame with integer indexes and comma-separated column names as column labels for the DataFrame
However, when we specify the keyword argument index_col, we can specify which columns we want to use as row labels

In the example below, we are going to read the file which is available here. I dropped some columns in the file to make it presentable because our main purpose is to read/write files rather than working with actual content of the file. In addition, following short terms are used column labels:

Number of firms = No
Average Unlevered Beta = AU Beta
Average Levered Beta = AL Beta

import numpy as np
import pandas as pd 

# location of csv file we are going to read
csv_location = "data/totalbeta.csv"

csv_df = pd.read_csv(csv_location)

# .head() shows only the first five rows in output
csv_df.head()

Industry

No.

AU Beta

AL Beta

Advertising

0.93

1.44

Aerospace/Defense

1.08

1.23

Air Transport

0.84

1.44

Apparel

0.83

1.06

Auto & Truck

0.53

1.10

Let use the kwarg index_col to use specified column value as row label:

# Assigning column "Industry Name" as row labels
# Index positon of "industry Name" is 0 

pd.read_csv(csv_location, index_col=0).head()

No.

AU Beta

AL Beta

Industry

---

Advertising

0.93

1.44

Aerospace/Defense

1.08

1.23

Air Transport

0.84

1.44

Apparel

0.83

1.06

Auto & Truck

0.53

1.10

1.2. Reading an Excel File

We can read all spreadsheets in an excel workbook or individual sheets inside the workbook
pd.read_excel constructor is used to read an excel file
We can provide keyword argument, sheet_name to read a specific sheet in workbook — it can be either integer index or exact sheet name as string. sheet_name=None returns all sheets
We can provide keyword argument, index_col to set the column name as row label (just like we do in pd.read_csv)

xls_location  = "data/totalbeta.xls"

xls_df = pd.read_excel(xls_location)
xls_df.head()

Industry

No.

AU Beta

AL Beta

Advertising

0.934953

1.439612

Aerospace/Defense

1.078522

1.231584

Air Transport

0.843673

1.435348

Apparel

0.829641

1.055097

Auto & Truck

0.525735

1.095074

Let’s fetch all sheet names inside the workbook using kwarg sheet_name=None and .keys method:

# reading by sheet_name
# first, let fetch all the sheet names in the workbook

pd.read_excel(xls_location, sheet_name=None).keys()

odict_keys(['US', 'Global', 'Europe', 'Emerging'])

Let’s store each sheet content inside the variable

# storing each sheet in its own dataframe

us_sheet = pd.read_excel(xls_location, sheet_name='US')
global_sheet = pd.read_excel(xls_location, sheet_name='Global')
europe_sheet = pd.read_excel(xls_location, sheet_name='Europe')
emerging_sheet = pd.read_excel(xls_location, sheet_name='Emerging')

# read the first sheet by integer indexing

pd.read_excel(xls_location, sheet_name=0).head()

Industry

No.

AU Beta

AL Beta

Advertising

0.934953

1.439612

Aerospace/Defense

1.078522

1.231584

Air Transport

0.843673

1.435348

Apparel

0.829641

1.055097

Auto & Truck

0.525735

1.095074

# read the first sheet by its sheet name

pd.read_excel(xls_location, sheet_name='US').head()

Industry

No.

AU Beta

AL Beta

Advertising

0.934953

1.439612

Aerospace/Defense

1.078522

1.231584

Air Transport

0.843673

1.435348

Apparel

0.829641

1.055097

Auto & Truck

0.525735

1.095074

# reading more than one sheets

excel_file = pd.read_excel(xls_location, sheet_name=[0,-1])
excel_file.keys()

odict_keys([0, -1])

excel_file[-1].head()

Industry Name

No.

AU Beta

AL Beta

Advertising

103

1.463880

1.495675

Aerospace/Defense

1.099578

1.175447

Air Transport

0.590283

1.111339

Apparel

907

0.635054

0.749096

Auto & Truck

1.127485

1.379411

1.3. Reading a JSON file

Think of json file as a python dictionary with key-value pair
pd.read_json() constructor is used to read json file
When no keyword argument is used: outer keys are used as column labels and inner keys are used as row labels
When we use orient='index' keyword argument, outer keys are used as row labels and inner keys are used as column labels

json_location = "data/totalbeta.json"

json_df = pd.read_json(json_location)
json_df.head()

Industry

No.

AU Beta

AL Beta

Advertising

0.93

1.44

Aerospace/Defense

1.08

1.23

Air Transport

0.84

1.44

Apparel

0.83

1.06

Auto & Truck

0.53

1.10

2. Writing to File

We will write in three most commonly used formats - csv, xls(xlsx) and json

2.1. Writing to a CSV file

.to_csv() constructor is used to write in csv format
The first keyword argument is the name of file
By default, to.csv() uses the index labels as first column in the csv file, which can be non-meaningful if just integer, we can provide index=False keyword argument to not write the row label into a csv file
for the purpose of demonstration, we will use the data we have stored above under variables csv_df, excel_df and json_df

# writing
csv_df.to_csv('data/tocsv_file.csv') 

# reading the written file
pd.read_csv("data/tocsv_file.csv").head()

Unnamed: 0

Industry

No.

AU Beta

AL Beta

Advertising

0.93

1.44

Aerospace/Defense

1.08

1.23

Air Transport

0.84

1.44

Apparel

0.83

1.06

Auto & Truck

0.53

1.10

We can see that the index label is used as first column, which is meaningless in our case, let use index=False to not use the index as first column

# setting index=False

# writing
csv_df.to_csv('data/tocsv_index_false.csv', index=False) 

# reading the written file
pd.read_csv("data/tocsv_index_false.csv").head()

Industry

No.

AU Beta

AL Beta

Advertising

0.93

1.44

Aerospace/Defense

1.08

1.23

Air Transport

0.84

1.44

Apparel

0.83

1.06

Auto & Truck

0.53

1.10

2.2. Writing to Excel File

.to_excel() constructor is used to write single sheet to the excel workbook
To write multiple sheets in excel workbook, first we need to load the excel file using pd.ExcelWriter, then use it as first argument in the .to_excel()
default label for sheet is Sheet1, Sheet2 unless we provide it explicitly using sheet_name keyword argument
to avoid writing row label as first column, use index=False

# writing
xls_df.to_excel('data/toexcel_file.xls', index=False)

# reading the written file
pd.read_excel('data/toexcel_file.xls').head()

Industry

No.

AU Beta

AL Beta

Advertising

0.934953

1.439612

Aerospace/Defense

1.078522

1.231584

Air Transport

0.843673

1.435348

Apparel

0.829641

1.055097

Auto & Truck

0.525735

1.095074

# writing individual sheets to excel workbook

with pd.ExcelWriter('data/toexcel_file_multiple_sheets.xls') as writer:
    us_sheet.to_excel(writer, index=False, sheet_name='usa')
    europe_sheet.to_excel(writer, index=False, sheet_name='europe')
    global_sheet.to_excel(writer, index=False, sheet_name='global')
    emerging_sheet.to_excel(writer, index=False, sheet_name='emerging')
    
# reading the written file

xls_df_new = pd.read_excel('data/toexcel_file_multiple_sheets.xls', sheet_name=None)
xls_df_new.keys()

odict_keys(['usa', 'europe', 'global', 'emerging'])

xls_df_new['usa'].head()

Industry

No.

AU Beta

AL Beta

Advertising

0.934953

1.439612

Aerospace/Defense

1.078522

1.231584

Air Transport

0.843673

1.435348

Apparel

0.829641

1.055097

Auto & Truck

0.525735

1.095074

2.3. Write to JSON File

.to_json() constructor is used
Meaning and application of orient='index' in .to_json will remain the same as explained in pd.read_json()

# writing
json_df.to_json('data/tojson_file.json')

# reading the written file
pd.read_json('data/tojson_file.json').head()

Industry

No.

AU Beta

AL Beta

Advertising

0.93

1.44

Aerospace/Defense

1.08

1.23

Air Transport

0.84

1.44

Apparel

0.83

1.06

Auto & Truck

0.53

1.10

PreviousIndexing Pandas Series And Dataframe NextUniversal Functions In Pandas

Last updated 2 years ago

Was this helpful?