Io File Read And Write
1. READING THE FILE
We will learn how to read the three most commonly used file format - csv, xls(xlsx) and json
1.1 Reading a CSV file
CSV is a comma separated key-value pairs, where
key
is the column name andvalue
its corresponding column valuespd.read_csv()
constructor is used to read thecsv
fileFirst (and only required) keyword argument is the location of the file. The returned output is the DataFrame with integer indexes and comma-separated column names as column labels for the DataFrame
However, when we specify the keyword argument
index_col
, we can specify which columns we want to use as row labels
In the example below, we are going to read the file which is available here. I dropped some columns in the file to make it presentable because our main purpose is to read/write files rather than working with actual content of the file. In addition, following short terms are used column labels:
Number of firms = No
Average Unlevered Beta = AU Beta
Average Levered Beta = AL Beta
import numpy as np
import pandas as pd
# location of csv file we are going to read
csv_location = "data/totalbeta.csv"
csv_df = pd.read_csv(csv_location)
# .head() shows only the first five rows in output
csv_df.head()
0
Advertising
47
0.93
1.44
1
Aerospace/Defense
77
1.08
1.23
2
Air Transport
18
0.84
1.44
3
Apparel
51
0.83
1.06
4
Auto & Truck
13
0.53
1.10
Let use the kwarg index_col
to use specified column value as row label:
# Assigning column "Industry Name" as row labels
# Index positon of "industry Name" is 0
pd.read_csv(csv_location, index_col=0).head()
Industry
---
---
---
---
Advertising
47
0.93
1.44
Aerospace/Defense
77
1.08
1.23
Air Transport
18
0.84
1.44
Apparel
51
0.83
1.06
Auto & Truck
13
0.53
1.10
1.2. Reading an Excel File
We can read all spreadsheets in an excel workbook or individual sheets inside the workbook
pd.read_excel
constructor is used to read an excel fileWe can provide keyword argument,
sheet_name
to read a specific sheet in workbook — it can be either integer index or exact sheet name as string.sheet_name=None
returns all sheetsWe can provide keyword argument,
index_col
to set the column name as row label (just like we do inpd.read_csv
)
xls_location = "data/totalbeta.xls"
xls_df = pd.read_excel(xls_location)
xls_df.head()
0
Advertising
47
0.934953
1.439612
1
Aerospace/Defense
77
1.078522
1.231584
2
Air Transport
18
0.843673
1.435348
3
Apparel
51
0.829641
1.055097
4
Auto & Truck
13
0.525735
1.095074
Let’s fetch all sheet names inside the workbook using kwarg
sheet_name=None
and.keys
method:
# reading by sheet_name
# first, let fetch all the sheet names in the workbook
pd.read_excel(xls_location, sheet_name=None).keys()
odict_keys(['US', 'Global', 'Europe', 'Emerging'])
Let’s store each sheet content inside the variable
# storing each sheet in its own dataframe
us_sheet = pd.read_excel(xls_location, sheet_name='US')
global_sheet = pd.read_excel(xls_location, sheet_name='Global')
europe_sheet = pd.read_excel(xls_location, sheet_name='Europe')
emerging_sheet = pd.read_excel(xls_location, sheet_name='Emerging')
# read the first sheet by integer indexing
pd.read_excel(xls_location, sheet_name=0).head()
0
Advertising
47
0.934953
1.439612
1
Aerospace/Defense
77
1.078522
1.231584
2
Air Transport
18
0.843673
1.435348
3
Apparel
51
0.829641
1.055097
4
Auto & Truck
13
0.525735
1.095074
# read the first sheet by its sheet name
pd.read_excel(xls_location, sheet_name='US').head()
0
Advertising
47
0.934953
1.439612
1
Aerospace/Defense
77
1.078522
1.231584
2
Air Transport
18
0.843673
1.435348
3
Apparel
51
0.829641
1.055097
4
Auto & Truck
13
0.525735
1.095074
# reading more than one sheets
excel_file = pd.read_excel(xls_location, sheet_name=[0,-1])
excel_file.keys()
odict_keys([0, -1])
excel_file[-1].head()
0
Advertising
103
1.463880
1.495675
1
Aerospace/Defense
89
1.099578
1.175447
2
Air Transport
86
0.590283
1.111339
3
Apparel
907
0.635054
0.749096
4
Auto & Truck
83
1.127485
1.379411
1.3. Reading a JSON file
Think of json file as a python dictionary with key-value pair
pd.read_json()
constructor is used to read json fileWhen no keyword argument is used: outer keys are used as column labels and inner keys are used as row labels
When we use
orient='index'
keyword argument, outer keys are used as row labels and inner keys are used as column labels
json_location = "data/totalbeta.json"
json_df = pd.read_json(json_location)
json_df.head()
0
Advertising
47
0.93
1.44
1
Aerospace/Defense
77
1.08
1.23
2
Air Transport
18
0.84
1.44
3
Apparel
51
0.83
1.06
4
Auto & Truck
13
0.53
1.10
2. Writing to File
We will write in three most commonly used formats - csv, xls(xlsx) and json
2.1. Writing to a CSV file
.to_csv()
constructor is used to write incsv
formatThe first keyword argument is the name of file
By default,
to.csv()
uses the index labels as first column in the csv file, which can be non-meaningful if just integer, we can provideindex=False
keyword argument to not write the row label into a csv filefor the purpose of demonstration, we will use the data we have stored above under variables
csv_df
,excel_df
andjson_df
# writing
csv_df.to_csv('data/tocsv_file.csv')
# reading the written file
pd.read_csv("data/tocsv_file.csv").head()
0
0
Advertising
47
0.93
1.44
1
1
Aerospace/Defense
77
1.08
1.23
2
2
Air Transport
18
0.84
1.44
3
3
Apparel
51
0.83
1.06
4
4
Auto & Truck
13
0.53
1.10
We can see that the index label is used as first column, which is meaningless in our case, let use index=False
to not use the index as first column
# setting index=False
# writing
csv_df.to_csv('data/tocsv_index_false.csv', index=False)
# reading the written file
pd.read_csv("data/tocsv_index_false.csv").head()
0
Advertising
47
0.93
1.44
1
Aerospace/Defense
77
1.08
1.23
2
Air Transport
18
0.84
1.44
3
Apparel
51
0.83
1.06
4
Auto & Truck
13
0.53
1.10
2.2. Writing to Excel File
.to_excel()
constructor is used to write single sheet to the excel workbookTo write multiple sheets in excel workbook, first we need to load the excel file using
pd.ExcelWriter
, then use it as first argument in the.to_excel()
default label for sheet is
Sheet1
,Sheet2
unless we provide it explicitly usingsheet_name
keyword argumentto avoid writing row label as first column, use
index=False
# writing
xls_df.to_excel('data/toexcel_file.xls', index=False)
# reading the written file
pd.read_excel('data/toexcel_file.xls').head()
0
Advertising
47
0.934953
1.439612
1
Aerospace/Defense
77
1.078522
1.231584
2
Air Transport
18
0.843673
1.435348
3
Apparel
51
0.829641
1.055097
4
Auto & Truck
13
0.525735
1.095074
# writing individual sheets to excel workbook
with pd.ExcelWriter('data/toexcel_file_multiple_sheets.xls') as writer:
us_sheet.to_excel(writer, index=False, sheet_name='usa')
europe_sheet.to_excel(writer, index=False, sheet_name='europe')
global_sheet.to_excel(writer, index=False, sheet_name='global')
emerging_sheet.to_excel(writer, index=False, sheet_name='emerging')
# reading the written file
xls_df_new = pd.read_excel('data/toexcel_file_multiple_sheets.xls', sheet_name=None)
xls_df_new.keys()
odict_keys(['usa', 'europe', 'global', 'emerging'])
xls_df_new['usa'].head()
0
Advertising
47
0.934953
1.439612
1
Aerospace/Defense
77
1.078522
1.231584
2
Air Transport
18
0.843673
1.435348
3
Apparel
51
0.829641
1.055097
4
Auto & Truck
13
0.525735
1.095074
2.3. Write to JSON File
.to_json()
constructor is usedMeaning and application of
orient='index'
in.to_json
will remain the same as explained inpd.read_json()
# writing
json_df.to_json('data/tojson_file.json')
# reading the written file
pd.read_json('data/tojson_file.json').head()
0
Advertising
47
0.93
1.44
1
Aerospace/Defense
77
1.08
1.23
2
Air Transport
18
0.84
1.44
3
Apparel
51
0.83
1.06
4
Auto & Truck
13
0.53
1.10
Last updated
Was this helpful?