Combining Datasets In Pandas
A: COMBINING DATASETS: CONCAT AND APPEND
import numpy as np
import pandas as pd

Function to construct DataFrame

def make_df(columns, indices):
    data = {c: [f"{c}{i}" for i in indices]
            for c in columns}
    return pd.DataFrame(data, indices)

# understanding the components of function
data = {c: [f"{c}{i}" for i in range(3)]
        for c in 'ABC'}
data

{'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2'], 'C': ['C0', 'C1', 'C2']}

# example to run function
example_df = make_df('ABC', [1, 2, 3])
print(example_df)

1. SIMPLE CONCATENATION WITH pd.concat
1.1. Concatenating Series
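A minimal sketch of concatenating two Series with pd.concat. The names `ser1` and `ser2` and their values are made-up sample data, not taken from the original.

```python
import pandas as pd

# Illustrative sample data (assumed for this sketch)
ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
ser2 = pd.Series(['D', 'E', 'F'], index=[4, 5, 6])

# Stack the two Series end to end; the original index labels are kept
combined = pd.concat([ser1, ser2])
print(combined)
```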
1.2. Concatenating DataFrame
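The same call works on DataFrames. The sketch below repeats the make_df helper from the top of this section so that it runs on its own; the frames `df1`, `df2`, and `df3` are illustrative.

```python
import pandas as pd

def make_df(columns, indices):
    """Helper from the top of this section, repeated so the block is self-contained."""
    data = {c: [f"{c}{i}" for i in indices] for c in columns}
    return pd.DataFrame(data, indices)

df1 = make_df('AB', [1, 2])
df2 = make_df('AB', [3, 4])
df3 = make_df('CD', [1, 2])

# Default axis=0: rows of df2 are stacked under df1
rows = pd.concat([df1, df2])

# axis=1: frames are placed side by side, aligned on the index
cols = pd.concat([df1, df3], axis=1)

print(rows)
print(cols)
```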
1.3. Duplicate Indices
a) Handling duplicate indices through verify_integrity
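A small sketch of how verify_integrity behaves, assuming two made-up frames `x` and `y` that share the index labels 0 and 1. By default pd.concat keeps the duplicate labels; with verify_integrity=True it raises a ValueError instead.

```python
import pandas as pd

# Illustrative frames with overlapping index labels (assumed sample data)
x = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
y = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[0, 1])

# Default behaviour: duplicate labels are preserved
dup = pd.concat([x, y])
print(dup)

# verify_integrity=True turns overlapping labels into an error
error_message = None
try:
    pd.concat([x, y], verify_integrity=True)
except ValueError as e:
    error_message = str(e)
print("ValueError:", error_message)
```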
b) Using keys Argument
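Another way to deal with repeated labels is to tag each source with a key. The sketch below uses made-up frames `x` and `y`; the keys 'x' and 'y' are arbitrary labels chosen for illustration. The result carries a two-level (hierarchical) index.

```python
import pandas as pd

# Illustrative frames with overlapping index labels (assumed sample data)
x = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
y = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[0, 1])

# keys adds an outer index level identifying the source of each row
keyed = pd.concat([x, y], keys=['x', 'y'])
print(keyed)

# Rows from one source can be selected by its key
print(keyed.loc['y'])
```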
1.4. Concatenation with "Join"
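A sketch of the join parameter of pd.concat, assuming two made-up frames whose columns only partly overlap. The default join='outer' keeps the union of the columns and fills gaps with NaN; join='inner' keeps only the shared columns.

```python
import pandas as pd

# Illustrative frames sharing only columns B and C (assumed sample data)
df5 = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2'], 'C': ['C1', 'C2']},
                   index=[1, 2])
df6 = pd.DataFrame({'B': ['B3', 'B4'], 'C': ['C3', 'C4'], 'D': ['D3', 'D4']},
                   index=[3, 4])

# Union of columns; missing entries become NaN
outer = pd.concat([df5, df6])

# Intersection of columns only
inner = pd.concat([df5, df6], join='inner')

print(outer)
print(inner)
```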
1.5. The append() method
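Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions a call such as df1.append(df2) fails. The sketch below, using made-up frames, shows the pd.concat call that gives the same stacked result; like append, it returns a new object and leaves the inputs unchanged.

```python
import pandas as pd

# Illustrative frames (assumed sample data)
df1 = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2']}, index=[1, 2])
df2 = pd.DataFrame({'A': ['A3', 'A4'], 'B': ['B3', 'B4']}, index=[3, 4])

# Older pandas (before 2.0): df1.append(df2)
# Equivalent on any version:
appended = pd.concat([df1, df2])
print(appended)
```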
B: COMBINING DATASETS: MERGE AND JOIN
1. CATEGORIES OF MERGE
1.1. One-to-one Merge
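A minimal one-to-one merge, assuming two made-up frames in which every employee appears exactly once on each side. pd.merge finds the shared column ('employee') and uses it as the key.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})

# Each key matches exactly one row on each side
merged = pd.merge(df1, df2)
print(merged)
```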
1.2. Many-to-one Merge
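In a many-to-one merge, one side has repeated key values. In this sketch (made-up data), several employees share a group, so the matching supervisor row is repeated for each of them.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
employees = pd.DataFrame({
    'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
    'group': ['Accounting', 'Engineering', 'Engineering', 'HR'],
})
supervisors = pd.DataFrame({
    'group': ['Accounting', 'Engineering', 'HR'],
    'supervisor': ['Carly', 'Guido', 'Steve'],
})

# 'group' repeats on the left, so supervisor values are duplicated as needed
merged = pd.merge(employees, supervisors)
print(merged)
```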
1.3. Many-to-many Merge
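When the key repeats on both sides, every left row is paired with every matching right row. A sketch with made-up data: two Engineering employees and two Engineering skills give four Engineering rows.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
employees = pd.DataFrame({
    'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
    'group': ['Accounting', 'Engineering', 'Engineering', 'HR'],
})
skills = pd.DataFrame({
    'group': ['Accounting', 'Accounting', 'Engineering', 'Engineering', 'HR', 'HR'],
    'skills': ['math', 'spreadsheets', 'coding', 'linux', 'spreadsheets', 'organization'],
})

# 'group' repeats on both sides, so matching rows are combined pairwise
merged = pd.merge(employees, skills)
print(merged)
```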
2. SPECIFICATION OF MERGE KEY
2.1. The on Keyword Argument
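The on argument names the key column explicitly; it requires that column to exist in both frames. A sketch with made-up data:

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})

# Name the key column instead of relying on automatic detection
merged = pd.merge(df1, df2, on='employee')
print(merged)
```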
2.2. The left_on and right_on Keyword Argument
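When the key column has a different name in each frame, left_on and right_on name them separately. In this sketch (made-up data) the key is 'employee' on the left and 'name' on the right; both columns appear in the result.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df3 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'salary': [70000, 80000, 120000, 90000]})

# Match df1['employee'] against df3['name']
merged = pd.merge(df1, df3, left_on='employee', right_on='name')
print(merged)
```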
2.3. drop method
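A merge on differently named keys leaves a redundant column behind, which DataFrame.drop can remove. A sketch with made-up data:

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df3 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'salary': [70000, 80000, 120000, 90000]})

merged = pd.merge(df1, df3, left_on='employee', right_on='name')

# 'name' duplicates 'employee', so drop it (axis=1 means columns)
cleaned = merged.drop('name', axis=1)
print(cleaned)
```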
2.4. The left_index and right_index
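To merge on the index rather than a column, set left_index and right_index to True. The sketch below uses made-up frames indexed by employee (`df1a` and `df2a` are illustrative names).

```python
import pandas as pd

# Illustrative data, indexed by employee (assumed for this sketch)
df1a = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                     'group': ['Accounting', 'Engineering', 'Engineering', 'HR']}
                    ).set_index('employee')
df2a = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                     'hire_date': [2004, 2008, 2012, 2014]}
                    ).set_index('employee')

# Use the index of both frames as the merge key
merged = pd.merge(df1a, df2a, left_index=True, right_index=True)
print(merged)
```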
2.5. Join method
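DataFrame.join is a shortcut that merges on the index by default. A sketch with made-up frames indexed by employee; note that join defaults to how='left', whereas pd.merge defaults to how='inner'.

```python
import pandas as pd

# Illustrative data, indexed by employee (assumed for this sketch)
df1a = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                     'group': ['Accounting', 'Engineering', 'Engineering', 'HR']}
                    ).set_index('employee')
df2a = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                     'hire_date': [2004, 2008, 2012, 2014]}
                    ).set_index('employee')

# Index-on-index join
joined = df1a.join(df2a)
print(joined)
```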
2.6. Mix Merge with Indices and Columns
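Index and column keys can be mixed, for example left_index=True together with right_on. A sketch with made-up data, where the left frame is indexed by employee and the right frame has a 'name' column:

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df1a = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                     'group': ['Accounting', 'Engineering', 'Engineering', 'HR']}
                    ).set_index('employee')
df3 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'salary': [70000, 80000, 120000, 90000]})

# Match the left index against the right 'name' column
merged = pd.merge(df1a, df3, left_index=True, right_on='name')
print(merged)
```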
3. SPECIFYING how TO MERGE
3.1. how='inner'
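A sketch of an inner merge with made-up data in which only 'Mary' appears in both frames. how='inner' is the default for pd.merge and keeps only keys found on both sides.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df6 = pd.DataFrame({'name': ['Peter', 'Paul', 'Mary'],
                    'food': ['fish', 'beans', 'bread']})
df7 = pd.DataFrame({'name': ['Mary', 'Joseph'],
                    'drink': ['wine', 'beer']})

# Keep only the names present in both frames
merged = pd.merge(df6, df7, how='inner')
print(merged)
```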
3.2. how='outer'
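An outer merge keeps the union of the keys and fills unmatched entries with NaN. A sketch with made-up data:

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df6 = pd.DataFrame({'name': ['Peter', 'Paul', 'Mary'],
                    'food': ['fish', 'beans', 'bread']})
df7 = pd.DataFrame({'name': ['Mary', 'Joseph'],
                    'drink': ['wine', 'beer']})

# Keep every name from either frame
merged = pd.merge(df6, df7, how='outer')
print(merged)
```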
3.3. how='left'
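A left merge keeps every row of the left frame and fills missing right-hand values with NaN. A sketch with made-up data:

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df6 = pd.DataFrame({'name': ['Peter', 'Paul', 'Mary'],
                    'food': ['fish', 'beans', 'bread']})
df7 = pd.DataFrame({'name': ['Mary', 'Joseph'],
                    'drink': ['wine', 'beer']})

# Keep all names from df6; 'Joseph' (right only) is dropped
merged = pd.merge(df6, df7, how='left')
print(merged)
```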
3.4. how='right'
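A right merge is the mirror image: every row of the right frame is kept. A sketch with made-up data:

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df6 = pd.DataFrame({'name': ['Peter', 'Paul', 'Mary'],
                    'food': ['fish', 'beans', 'bread']})
df7 = pd.DataFrame({'name': ['Mary', 'Joseph'],
                    'drink': ['wine', 'beer']})

# Keep all names from df7; 'Peter' and 'Paul' (left only) are dropped
merged = pd.merge(df6, df7, how='right')
print(merged)
```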
4. OVERLAPPING COLUMN NAMES
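When both frames have a non-key column with the same name, pd.merge appends the suffixes _x and _y by default; the suffixes argument overrides them. A sketch with made-up data (the suffixes '_L' and '_R' are arbitrary choices for illustration):

```python
import pandas as pd

# Illustrative data: both frames have a 'rank' column (assumed for this sketch)
df8 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'rank': [1, 2, 3, 4]})
df9 = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'rank': [3, 1, 4, 2]})

# Default suffixes: rank_x (left) and rank_y (right)
default = pd.merge(df8, df9, on='name')

# Custom suffixes
custom = pd.merge(df8, df9, on='name', suffixes=('_L', '_R'))

print(default)
print(custom)
```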