How to load a dataset in Python

Python can be used to read data from a variety of places, including databases and files. Two file types that are often used are .txt and .csv. You can import and export files using built-in Python functionality or Python's CSV library. We’ll go through both options!

Loading data means transferring data from files to code or vice versa.

Load Data With Built-In Python Functions

To read from and write to a file, you can use the built-in open() function. Its two main parameters are the file name and the mode.

File name: the path to the file that you want to read from or write to.

Mode: how you want to open the file. The main options are:

  • Read: "r" (the default)

  • Write: "w" (creates the file, or overwrites it if it already exists)

  • Append: "a" (adds to the end of the file)

  • Read and write: "r+" (the file must already exist)
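To see how these four modes behave, here is a minimal sketch that works on a throwaway file. The file name modes_demo.txt is hypothetical, and the script creates it in a temporary folder so it does not touch your own files.

```python
import os
import tempfile

# Work in a fresh temporary folder; "modes_demo.txt" is a hypothetical file name
folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt")

# "w": create the file (or empty it if it already exists) and write to it
with open(path, "w") as f:
    f.write("first line\n")

# "a": keep the existing content and add to the end of the file
with open(path, "a") as f:
    f.write("second line\n")

# "r": read only (this is also the default mode)
with open(path, "r") as f:
    original = f.read()
print(original)

# "r+": read and write an existing file without emptying it first
with open(path, "r+") as f:
    before = f.readline()  # read the first line
    f.seek(0)              # jump back to the start of the file
    f.write("FIRST")       # overwrite the first five characters in place

with open(path) as f:
    lines = f.readlines()
print(lines)  # ['FIRST line\n', 'second line\n']
```

Note the difference between "w" and "r+": both can write, but "w" empties the file first, while "r+" leaves the existing content in place and writes at the current position.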

To create a new file called “hello.txt” and write “Hello, world!” to it, use the following code:

file = open("hello.txt", "w")
file.write("Hello, world!")
file.close()

You can also use the with statement to read a file line by line. It closes the file automatically when the block ends, so you don't need to call close():

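with open("file.txt") as f:
    for line in f:
        # Do something with each line; end="" avoids an extra blank line,
        # because each line already ends with a newline character
        print(line, end="")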

This will print out the input file line by line.
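The with statement is essentially a safer version of the open()/close() pattern used for hello.txt: it closes the file even if an error happens inside the block. The sketch below shows roughly equivalent code written with try/finally. It recreates hello.txt in the current folder so it can run on its own.

```python
# Recreate hello.txt so this sketch runs on its own
with open("hello.txt", "w") as file:
    file.write("Hello, world!")

# Roughly what the with statement does for you: close the file no matter what
f = open("hello.txt")
try:
    text = f.read()
finally:
    f.close()

# The same read written with a with statement
with open("hello.txt") as g:
    text_again = g.read()

print(text == text_again)  # True
print(f.closed, g.closed)  # True True
```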

The CSV Library

While the built-in open() function can read from and write to both .txt and .csv files, you can also use Python's CSV library (the csv module) to read from and write to CSV files. It adds CSV-specific functionality, such as handling quoted fields and custom delimiters.

Python's official CSV documentation says that the CSV library "allows programmers to say, 'write this data in the format preferred by Excel,' or 'read data from this file which was generated by Excel,' without knowing the precise details of the CSV format used by Excel."

When using the CSV library, you still open the file with open(), but then you pass the file object to the csv.reader() or csv.writer() function to read from or write to it.
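One concrete piece of that extra functionality is quoting. Splitting a line by hand with split(",") breaks a field that itself contains a comma, while csv.reader follows the usual CSV quoting rules, and csv.writer adds quotes back when needed. This sketch uses io.StringIO as a stand-in for an open file, so nothing is written to disk; the sample row is made up.

```python
import csv
import io

# A made-up CSV line where one field contains a comma inside quotes
line = 'Jane Doe,"Engineer, Data",Green\n'

# Splitting by hand cuts the quoted field in two
naive = line.strip().split(",")

# csv.reader accepts any iterable of lines, such as a file object or StringIO
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['Jane Doe', '"Engineer', ' Data"', 'Green']
print(parsed)  # ['Jane Doe', 'Engineer, Data', 'Green']

# csv.writer quotes the field again because it contains the delimiter
out = io.StringIO()
csv.writer(out).writerow(parsed)
print(repr(out.getvalue()))  # 'Jane Doe,"Engineer, Data",Green\r\n'
```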

Read External Files

Let's start with reading external files. Suppose you have a CSV file named favorite_colors.csv that looks like the following:

name,occupation,favorite_color
Jacob Smith,Software Engineer,Purple
Nora Scheffer,Digital Strategist,Blue
Emily Adams,Marketing Manager,Orange
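If you want to follow along, one way to create favorite_colors.csv with exactly these contents is to write it from Python. This assumes you run it in the same folder as the reading examples that follow.

```python
# Write the sample file so the reading examples have something to read
sample = """name,occupation,favorite_color
Jacob Smith,Software Engineer,Purple
Nora Scheffer,Digital Strategist,Blue
Emily Adams,Marketing Manager,Orange
"""

with open("favorite_colors.csv", "w") as f:
    f.write(sample)
```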

The csv.reader() function takes all the text in a CSV file, parses it line by line, and converts each row into a list of strings. You can choose the delimiter used to break up each row; the most common one, and the default, is a comma. The code snippet below reads the CSV file and prints each row.

import csv

# newline='' is how the csv documentation recommends opening CSV files
with open('favorite_colors.csv', newline='') as file:
    reader = csv.reader(file, delimiter=',')
    for row in reader:
        print(row)

The output will be the following:

['name', 'occupation', 'favorite_color']
['Jacob Smith', 'Software Engineer', 'Purple']
['Nora Scheffer', 'Digital Strategist', 'Blue']
['Emily Adams', 'Marketing Manager', 'Orange']
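If you want to keep using csv.reader, a common pattern is to pull the header off with next() before looping, so it isn't mixed in with the data rows. The sketch below also passes delimiter=";" to read a semicolon-separated variant. The sample text is hypothetical and held in io.StringIO instead of a file.

```python
import csv
import io

# Hypothetical semicolon-separated data, wrapped so it behaves like an open file
data = io.StringIO(
    "name;occupation;favorite_color\n"
    "Jacob Smith;Software Engineer;Purple\n"
)

reader = csv.reader(data, delimiter=";")
header = next(reader)  # first row: the column names
rows = list(reader)    # remaining rows: the data

print(header)  # ['name', 'occupation', 'favorite_color']
print(rows)    # [['Jacob Smith', 'Software Engineer', 'Purple']]
```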

While this approach can be helpful sometimes, it treats the header row the same as any other. A more useful way to read CSVs while using the header to identify the columns is csv.DictReader. It treats the first line as the header and returns each remaining row as a dictionary, with the column names as keys and that row's entries as values.

The code below shows how to use DictReader:

import csv

with open('favorite_colors.csv') as file:
    reader = csv.DictReader(file, delimiter=',')
    for row in reader:
        print(row['name'] + " works as a " + row['occupation'] + " and their favorite color is " + row['favorite_color'])

The output for this will be:

Jacob Smith works as a Software Engineer and their favorite color is Purple
Nora Scheffer works as a Digital Strategist and their favorite color is Blue
Emily Adams works as a Marketing Manager and their favorite color is Orange

Much more useful, right?
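Because each row is a dictionary, you can look values up by column name, and the reader's fieldnames attribute holds the header it found. The sketch below reads the same three rows from an in-memory copy (io.StringIO stands in for favorite_colors.csv) and builds a lookup of favorite colors by name.

```python
import csv
import io

# In-memory copy of favorite_colors.csv, standing in for the real file
data = io.StringIO(
    "name,occupation,favorite_color\n"
    "Jacob Smith,Software Engineer,Purple\n"
    "Nora Scheffer,Digital Strategist,Blue\n"
    "Emily Adams,Marketing Manager,Orange\n"
)

reader = csv.DictReader(data)
# Map each person's name to their favorite color
colors = {row["name"]: row["favorite_color"] for row in reader}

print(reader.fieldnames)        # ['name', 'occupation', 'favorite_color']
print(colors["Nora Scheffer"])  # Blue
```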

Write to External Files

To understand writing to external files, let’s go back to our web scraping example. We’ve already written the code to extract and transform the data from the UK government services and information website, and we have all the titles and descriptions saved as lists of strings. Now we can use the csv.writer() function and the writer's writerow() method to write the data into a CSV file.

import csv

# titles and descriptions are the lists of strings built in the previous chapter
# Create a list for the headers
headers = ["title", "description"]

# Open a new file called 'data.csv' to write to
with open('data.csv', 'w', newline='') as csvfile:
    # Create a writer object with that file
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerow(headers)
    # Loop through each element in the titles and descriptions lists
    for i in range(len(titles)):
        # Create a new row with the title and description at that point in the loop
        row = [titles[i], descriptions[i]]
        writer.writerow(row)
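The same kind of file can also be written with csv.DictWriter, which takes the column names once and then accepts each row as a dictionary, mirroring DictReader. Here zip() pairs each title with its description instead of indexing. The two lists are placeholder values standing in for the scraped titles and descriptions, and the file name data_dict.csv is hypothetical.

```python
import csv

# Placeholder data standing in for the scraped lists
titles = ["Title one", "Title two"]
descriptions = ["Description one", "Description two"]

with open("data_dict.csv", "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["title", "description"])
    writer.writeheader()  # writes the header row: title,description
    for title, description in zip(titles, descriptions):
        writer.writerow({"title": title, "description": description})

# Read the file back to check what was written
with open("data_dict.csv", newline="") as csvfile:
    rows = list(csv.DictReader(csvfile))

print(rows)
```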

And there you have it! Your very own file populated with data scraped from the web. Follow along with the screencast below to go through each line.

Print out your variables to help you keep track of what your code does at each line!

https://vimeo.com/483572506

Now download the code by clicking here and run it on your own in your editor. Take the time to understand what each line does, and feel free to revisit the screencast if needed.

You may have noticed that some instructions in this code repeat. Try and separate some of this functionality into functions on your own. Once you’ve given it a go, check out this file to compare how I’ve done it, but there is no right or wrong answer.

Level-Up: Create, Read, and Write to Files

Time for some practice! 😁

https://api.next.tech/api/v1/publishable_key/2A9CAA3419124E3E8C3F5AFCE5306292?content_id=0f7a256b-78c6-4586-bce1-ae47896651d2 

Level-Up, Bonus Round: Work With CSV Files

Here's a chance to get more comfortable with CSV Files. 😁

https://api.next.tech/api/v1/publishable_key/2A9CAA3419124E3E8C3F5AFCE5306292?content_id=f5158d8b-0360-4159-a7c6-91b223f4f2a1 

Let’s Recap!

  • You load data by reading from or writing to a file.

  • You can read and write to files using Python’s built-in open() function.

  • The reader() and DictReader() tools from Python's CSV library make it even easier to work with CSV files in your Python code.

  • The main file modes are "r" for read, "w" for write, and "a" for append.

Awesome! You’ve learned how to web scrape by extracting, transforming, and loading data from the web. Next, we’ll delve into the ethical concerns and challenges with web scraping.

Extract Data From the Web Using Python Libraries

  1. Import Python Libraries
  2. Extract and Transform Data With Web Scraping
  3. Load Data With Python
  4. Meet the Challenges to Web Scraping

  • Quiz: Extract Data From the Web Using Python Libraries

Teachers

Will Alexander

Scottish developer, teacher and musician based in Paris.

Raye Schiller

Raye Schiller is a backend software engineer based in New York City and has an MEng. in Computer Science from Cornell University 🙏💻

How do I import a dataset in Python?

Steps to import a CSV file into Python using pandas:

  1. Capture the file path. Firstly, capture the full path where your CSV file is stored. ...
  2. Apply the Python code. ...
  3. Run the code. ...

Optionally, select a subset of columns.

How do I load a dataset in Python using pandas?

Pandas Read CSV:

  • Load the CSV into a DataFrame: import pandas as pd, then df = pd.read_csv('data.csv') ...
  • Print the DataFrame without the to_string() method: import pandas as pd ...
  • Check the maximum number of returned rows: import pandas as pd ...
  • Increase the maximum number of rows to display the entire DataFrame: import pandas as pd.

How do I import a dataset?

Importing data into a dataset:

  1. If needed, select your dataset from the list on the Datasets page to open its Import tab.
  2. Choose the import source for your data: BigQuery, Cloud Storage, or your local computer. Provide the information required. ...
  3. Click Import to start the import process.

How do you use a dataset in Python?

Using Pandas and Python to Explore Your Dataset:

  • Setting Up Your Environment
  • Using the Pandas Python Library
  • Getting to Know Your Data: Displaying Data Types ...
  • Getting to Know Pandas' Data Structures ...
  • Accessing Series Elements ...
  • Accessing DataFrame Elements ...
  • Querying Your Dataset
  • Grouping and Aggregating Your Data