Several Python packages can import CSV files, but I prefer pandas. Here is an example of how to import a CSV file into a variable in Python using pandas.
Note: This example is based on a Python 3.10.x virtual environment in Visual Studio Code on macOS.
Install pandas
I have already explained how to install a Python package in a virtual environment in Visual Studio Code. Here is the pip command to install pandas.
pip install pandas
[Image: Installing pandas package in a Python environment]
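To confirm that the installation worked, one option is to import the package and print its version from the same virtual environment. This is only a quick sanity check; the variable name installed_version is my own choice for illustration.

```python
import pandas as pd

# If this import succeeds, pandas is available in the active environment
installed_version = pd.__version__
print(installed_version)
```

If the import fails with ModuleNotFoundError, the terminal is most likely using a different interpreter than the virtual environment where pandas was installed.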
Syntax
Here is the syntax for importing a CSV file into a variable.
# Import the pandas module (pd is the conventional alias)
import pandas as pd
# Use read_csv in pandas to read the csv file and assign it to a variable
df = pd.read_csv('Path of the source csv file')
Here, read_csv is a pandas function that reads a CSV file and returns its contents as a DataFrame, a two-dimensional labeled table of rows and columns.
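To see what read_csv returns without needing a file on disk, here is a minimal sketch that reads from an in-memory string instead. The rows are made-up sample values that only borrow the column names used later in this post; they are not taken from the real costs-price.csv file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk
csv_text = """hlpi_name,year,income,expenditure
All households,2019,1000,900
All households,2020,1100,950
Beneficiary,2020,600,580
"""

# read_csv accepts a file path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)  # DataFrame
print(df.shape)           # (3, 4): three rows, four columns
print(list(df.columns))   # column labels taken from the header row
```

The first line of the CSV becomes the column labels, and every following line becomes one row of the DataFrame.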
Examples
Import a CSV file to a variable
Here is a simple example of how to import a CSV file called costs-price.csv from a folder called samples using the pandas.read_csv function. The returned DataFrame is assigned to a variable and then printed to the terminal.
import pandas as pd
# Read the CSV file into a DataFrame
csv_data = pd.read_csv('samples/costs-price.csv')
# Print the DataFrame to the terminal
print(csv_data)
[Image: import a csv file to a variable in Python]
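Printing a whole DataFrame can be noisy for larger files, so it is often enough to look at the first few rows and the column types. The sketch below assumes made-up sample rows in place of the real file; head() and dtypes are standard pandas members.

```python
import io

import pandas as pd

# Hypothetical sample data; the real file would be read with its path instead
csv_text = """hlpi_name,year,income,expenditure
All households,2019,1000,900
All households,2020,1100,950
Beneficiary,2020,600,580
"""

csv_data = pd.read_csv(io.StringIO(csv_text))

# Show only the first two rows instead of the whole table
first_rows = csv_data.head(2)
print(first_rows)

# Show the data type pandas inferred for each column
print(csv_data.dtypes)
```

Checking dtypes early is useful because read_csv infers types from the data, and a numeric column containing stray text would be read as text.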
Segregating a few columns from the imported data
The previous example simply printed the imported data. If you want to work with only a few selected columns, you can pass the imported data to the pandas.DataFrame() constructor along with a columns list. Let's modify the above example to keep only the columns hlpi_name, year, income and expenditure.
import pandas as pd
csv_data = pd.read_csv('samples/costs-price.csv')
# Build a new DataFrame that keeps only the listed columns
data_partial = pd.DataFrame(
    csv_data, columns=['hlpi_name', 'year', 'income', 'expenditure'])
print(data_partial)
[Image: Segregating few columns from the imported data]
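As an alternative to the DataFrame constructor, the same selection is commonly written by indexing with a list of column names, or by asking read_csv to load only those columns through its usecols parameter. The sketch below uses made-up sample rows, and the extra column series_ref is hypothetical, added only so there is something to leave out.

```python
import io

import pandas as pd

# Hypothetical sample data with one extra column (series_ref) to drop
csv_text = """hlpi_name,year,income,expenditure,series_ref
All households,2019,1000,900,A1
All households,2020,1100,950,A2
Beneficiary,2020,600,580,B1
"""

wanted = ['hlpi_name', 'year', 'income', 'expenditure']

# Option 1: read everything, then select columns with a list of names
csv_data = pd.read_csv(io.StringIO(csv_text))
selected = csv_data[wanted]

# Option 2: load only the wanted columns while reading the file
loaded = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

print(selected)
print(loaded)
```

Selecting with a list of names raises a KeyError if a column name is misspelled, which can make typos easier to spot. The usecols option avoids loading unneeded columns at all, which may help with large files.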
Filtering rows based on column value
Furthermore, if you want to filter records based on the values in a column, you can use the pandas.DataFrame.loc[] property with a boolean condition. In our example, let us display only the records for the year 2020.
import pandas as pd
csv_data = pd.read_csv('samples/costs-price.csv')
data_partial = pd.DataFrame(
    csv_data, columns=['hlpi_name', 'year', 'income', 'expenditure'])
# Keep only the rows where the year column equals 2020
data_2020 = data_partial.loc[data_partial['year'] == 2020]
print(data_2020)
[Image: Filtering rows based on column value]
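The comparison inside loc[] produces a boolean Series, one True or False per row, and loc[] keeps the rows marked True. The following sketch, again on made-up sample rows, shows that mask on its own and then combines two conditions with the & operator. The income threshold of 1000 is an arbitrary value chosen for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file
csv_text = """hlpi_name,year,income,expenditure
All households,2019,1000,900
All households,2020,1100,950
Beneficiary,2020,600,580
"""

csv_data = pd.read_csv(io.StringIO(csv_text))

# The comparison yields one boolean per row
is_2020 = csv_data['year'] == 2020
print(is_2020.tolist())  # [False, True, True]

# loc[] keeps only the rows where the mask is True
data_2020 = csv_data.loc[is_2020]

# Combine conditions with &; each condition needs its own parentheses
high_income_2020 = csv_data.loc[
    (csv_data['year'] == 2020) & (csv_data['income'] > 1000)]
print(high_income_2020)
```

The parentheses around each condition matter because & binds more tightly than == and > in Python.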
Reference
- More about the pandas package at GitHub.
- More about pandas.DataFrame at Pandas Documentation.