Files I/O#

Files#


👉 Files are named locations on disk that store related information. They are used to store data permanently in non-volatile memory (e.g. a hard disk).

👉 Since Random Access Memory (RAM) is volatile (it loses its data when the computer is turned off), we use files to store data permanently for future use.

👉 When we want to read from or write to a file, we need to open it first. When we are done, it needs to be closed so that the resources tied to the file are freed.

Hence, in Python, a file operation takes place in the following order:

  1. Open a file

  2. Read or write (perform operation)

  3. Close the file

Opening Files#

👉 Python has a built-in open() function to open a file. This function returns a file object, also called a handle, which is used to read or modify the file.

>>> f = open("test.txt")  # open file in current directory
>>> f = open("C:/Python99/README.txt")   # specifying full path

👉 We can specify the mode while opening a file. In mode, we specify whether we want to read (r), write (w) or append (a) to the file.

👉 We can also specify whether we want to open the file in text mode or binary mode.

👉 The default is reading in text mode. In this mode, we get strings when reading from the file.

👉 Binary mode returns bytes, and this is the mode to use when dealing with non-text files like images or executable files.
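The difference between the two modes can be seen by reading the same file twice. The following is a minimal sketch; the file name demo.txt and the temporary directory are assumptions made only for this illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")

    # Text mode ('r'): read() returns a str.
    with open(path, "r", encoding="utf-8") as f:
        text_data = f.read()

    # Binary mode ('rb'): read() returns bytes.
    with open(path, "rb") as f:
        binary_data = f.read()

print(type(text_data), text_data)      # <class 'str'> hello
print(type(binary_data), binary_data)  # <class 'bytes'> b'hello'
```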

| Mode | Description |
| --- | --- |
| r | Read - Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode. |
| t | Text - Opens in text mode (default). |
| b | Binary - Opens in binary mode (e.g. images). |
| x | Create - Opens a file for exclusive creation. If the file already exists, the operation fails. |
| rb | Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file. |
| r+ | Opens a file for both reading and writing. The file pointer is placed at the beginning of the file. |
| rb+ | Opens a file for both reading and writing in binary format. The file pointer is placed at the beginning of the file. |
| w | Write - Opens a file for writing only. Overwrites the file if it exists. If the file does not exist, creates a new file for writing. |
| wb | Opens a file for writing only in binary format. Overwrites the file if it exists. If the file does not exist, creates a new file for writing. |
| w+ | Opens a file for both writing and reading. Overwrites the file if it exists. If the file does not exist, creates a new file for reading and writing. |
| wb+ | Opens a file for both writing and reading in binary format. Overwrites the file if it exists. If the file does not exist, creates a new file for reading and writing. |
| a | Append - Opens a file for appending. The file pointer is at the end of the file if the file exists. If the file does not exist, creates a new file for writing. |
| ab | Opens a file for appending in binary format. The file pointer is at the end of the file if the file exists. If the file does not exist, creates a new file for writing. |
| a+ | Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. If the file does not exist, creates a new file for reading and writing. |
| ab+ | Opens a file for both appending and reading in binary format. The file pointer is at the end of the file if the file exists. If the file does not exist, creates a new file for reading and writing. |
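As a rough illustration of how the w, a and x modes differ, the sketch below writes to the same file several times. The file name modes.txt is a hypothetical name chosen for this example, and the file lives in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:  # creates the file
        f.write("first\n")

    with open(path, "w", encoding="utf-8") as f:  # 'w' discards the old content
        f.write("second\n")

    with open(path, "a", encoding="utf-8") as f:  # 'a' adds to the end
        f.write("third\n")

    with open(path, "r", encoding="utf-8") as f:
        contents = f.read()

    # 'x' refuses to open a file that already exists.
    try:
        with open(path, "x", encoding="utf-8") as f:
            pass
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

print(repr(contents))    # 'second\nthird\n'
print(exclusive_failed)  # True
```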

f = open("test.txt", 'w')  # write in text mode
print(f)
<_io.TextIOWrapper name='test.txt' mode='w' encoding='UTF-8'>
f.close()
f = open("test.txt")   # equivalent to 'r' or 'rt'
print(f)               # the encoding shown depends on the platform, e.g. 'cp1252' on Windows
<_io.TextIOWrapper name='test.txt' mode='r' encoding='UTF-8'>
f.close()
f = open("logo.png", 'wb+')  # write and read in binary mode; empties logo.png if it exists
f.close()

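The default text encoding is platform dependent (for example, it is often cp1252 on Windows and UTF-8 on Linux), so code that relies on it can behave differently across platforms. Hence, when working with files in text mode, it is highly recommended to specify the encoding.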

f = open("test.txt", mode='r', encoding='utf-8')
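To see why the encoding matters, the sketch below writes a non-ASCII word as UTF-8 and then reads it back twice, once with the matching encoding and once with a deliberately wrong one. The file name and the choice of latin-1 as the mismatched encoding are assumptions made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "encoded.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("café")

    # Matching encoding: the text comes back unchanged.
    with open(path, "r", encoding="utf-8") as f:
        right_text = f.read()

    # Mismatched encoding: the two UTF-8 bytes of 'é' are read as two characters.
    with open(path, "r", encoding="latin-1") as f:
        wrong_text = f.read()

print(right_text)  # café
print(wrong_text)  # cafÃ©
```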

Closing files#

👉 When we are done performing operations on the file, we need to properly close it.

👉 Closing a file frees up the resources that were tied to the file. It is done using the close() method of the file object.

👉 Python has a garbage collector to clean up unreferenced objects, but we must not rely on it to close the file.

f = open("test.txt", encoding = 'utf-8')
# perform file operations
f.close()

This method is not entirely safe. If an exception occurs when we are performing some operation with the file, the code exits without closing the file.

To avoid this, we can use a try...finally block.

f = open("test.txt", encoding = 'utf-8')  # opened before try, so f always exists in finally
try:
    # perform file operations
    pass
finally:
    f.close()

This way, we are guaranteeing that the file is properly closed even if an exception is raised that causes program flow to stop.

The best way to close a file is by using the with statement. This ensures that the file is closed when the block inside the with statement is exited.

We don't need to explicitly call the close() method. It is done internally.

with open("test.txt", encoding = 'utf-8') as f:
    # perform file operations
    pass
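The claim that the with statement closes the file even when something goes wrong can be checked through the closed attribute of the file object. This is a small sketch; the ValueError is raised on purpose to simulate a failure, and the file is created in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.txt")

    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write("partial data")
            raise ValueError("simulated failure")  # deliberately raised for the demo
    except ValueError:
        pass

    # The with block was left through an exception, yet the file is closed.
    closed_after_error = f.closed

print(closed_after_error)  # True
```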

Writing to files#

👉 To write to a file in Python, we need to open it in write (w), append (a) or exclusive creation (x) mode.

👉 We need to be careful with the w mode, as it overwrites the file if it already exists, erasing all previous data.

👉 Writing a string or a sequence of bytes (for binary files) is done using the write() method. This method returns the number of characters written to the file (or the number of bytes in binary mode).

with open("test_1.txt",'w',encoding = 'utf-8') as f:
    f.write("my first file\n")
    f.write("This file\n\n")
    f.write("contains three lines\n")
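The return value of write() and the related writelines() method can be sketched as follows. The file path and the list named extra_lines are assumptions for this example. Note that writelines() does not add newline characters, so each item carries its own.

```python
import os
import tempfile

extra_lines = ["alpha\n", "beta\n"]  # hypothetical sample data

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test_1.txt")

    with open(path, "w", encoding="utf-8") as f:
        count = f.write("my first file\n")  # returns the number of characters written
        f.writelines(extra_lines)           # writes each string as-is

    with open(path, "r", encoding="utf-8") as f:
        written = f.read()

print(count)          # 14
print(repr(written))  # 'my first file\nalpha\nbeta\n'
```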

Reading files#

👉 To read a file in Python, we must open it in a mode that allows reading, such as r.

👉 We can use the read(size) method to read up to size characters (or bytes in binary mode). If the size parameter is not specified, it reads and returns everything up to the end of the file.

f = open("test_1.txt",'r',encoding = 'utf-8')
txt = f.read()  # read all the characters in the file
print(type(txt))
print(txt)
f.close()
<class 'str'>
my first file
This file

contains three lines

Alternatively, we can use the readline() method to read individual lines of a file. It reads up to and including the next newline character. The readlines() method, used here, returns a list of all the remaining lines.

with open("test_1.txt",'r',encoding = 'utf-8') as f:
    txt = f.readlines()
    print(txt)
['my first file\n', 'This file\n', '\n', 'contains three lines\n']
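The three reading styles can be combined on one file object, since each call continues from where the previous one stopped. The sketch below recreates the sample file in a temporary directory (an assumption made so the example is self-contained) and then uses read(size), readline() and a for loop in turn.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test_1.txt")

    with open(path, "w", encoding="utf-8") as f:
        f.write("my first file\nThis file\n\ncontains three lines\n")

    with open(path, "r", encoding="utf-8") as f:
        first_two = f.read(2)         # the first two characters
        rest_of_line = f.readline()   # the remainder of the first line, newline included
        # Iterating over the file object yields the remaining lines one at a time.
        remaining = [line.rstrip("\n") for line in f]

print(repr(first_two))     # 'my'
print(repr(rest_of_line))  # ' first file\n'
print(remaining)           # ['This file', '', 'contains three lines']
```

Looping over the file object is usually preferable to readlines() for large files, because it does not load every line into memory at once.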

Here is the complete list of methods in text mode with a brief description:

| Method | Description |
| --- | --- |
| close() | Closes an opened file. It has no effect if the file is already closed. |
| detach() | Separates the underlying binary buffer from the TextIOBase and returns it. |
| fileno() | Returns an integer number (file descriptor) of the file. |
| flush() | Flushes the write buffer of the file stream. |
| isatty() | Returns True if the file stream is interactive. |
| read(n) | Reads at most n characters from the file. Reads till end of file if n is negative or None. |
| readable() | Returns True if the file stream can be read from. |
| readline(n=-1) | Reads and returns one line from the file. Reads in at most n characters if specified. |
| readlines(n=-1) | Reads and returns a list of lines from the file. Reads in at most n bytes/characters if specified. |
| seek(offset, whence=SEEK_SET) | Changes the file position to offset, interpreted relative to whence (start, current, end). |
| seekable() | Returns True if the file stream supports random access. |
| tell() | Returns the current file location. |
| truncate(size=None) | Resizes the file stream to size bytes. If size is not specified, resizes to current location… |
| writable() | Returns True if the file stream can be written to. |
| write(s) | Writes the string s to the file and returns the number of characters written… |
| writelines(lines) | Writes a list of lines to the file… |
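A few of these methods can be tried together on a file opened in w+ mode. This is only a sketch: the file name is hypothetical, the content is plain ASCII, and the example only seeks to position 0, which keeps the text-mode position handling simple.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "methods.txt")  # hypothetical file name

    with open(path, "w+", encoding="utf-8") as f:
        # A regular file opened with 'w+' can be read, written and repositioned.
        capabilities = (f.readable(), f.writable(), f.seekable())

        f.write("hello world")
        f.seek(0)                       # go back to the start of the file
        position_after_rewind = f.tell()
        first_word = f.read(5)

        f.seek(0)
        f.truncate(5)                   # keep only the first 5 bytes
        f.seek(0)
        after_truncate = f.read()

print(capabilities)           # (True, True, True)
print(position_after_rewind)  # 0
print(first_word)             # hello
print(after_truncate)         # hello
```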

File types#


Text files#

Plain text files, commonly saved with the .txt extension, were covered in the previous sections.

JSON files#

JSON stands for JavaScript Object Notation. It is a text format whose syntax closely resembles a JavaScript object literal or a Python dictionary, so a JSON document can be thought of as a stringified dictionary.

# dictionary
person_dct= {
    "name":"Anukool",
    "country":"England",
    "city":"London",
    "skills":["Python", "ML","AI"]
}
# JSON: a string form of a dictionary (JSON strings must use double quotes)
person_json = '{"name": "Anukool", "country": "England", "city": "London", "skills": ["Python", "ML", "AI"]}'

# we use triple quotes and multiple lines to make it more readable
person_json = '''{
    "name":"Anukool",
    "country":"England",
    "city":"London",
    "skills":["Python", "ML","AI"]
}'''

To convert a JSON string into a dictionary, we use the json.loads() method.

import json
# JSON
person_json = '''{
    "name":"Anukool",
    "country":"England",
    "city":"London",
    "skills":["Python", "ML","AI"]
}'''
# let's change JSON to dictionary
person_dct = json.loads(person_json)
print(type(person_dct))
print(person_dct)
print(person_dct['name'])
<class 'dict'>
{'name': 'Anukool', 'country': 'England', 'city': 'London', 'skills': ['Python', 'ML', 'AI']}
Anukool
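Beyond strings and lists, json.loads() also maps the other JSON value types onto Python types. The sketch below uses a made-up JSON string (raw_json) to show this, and also shows that a string written with single quotes is rejected, because JSON requires double-quoted strings.

```python
import json

# Hypothetical sample document covering several JSON value types.
raw_json = '{"active": true, "score": 9.5, "nickname": null, "tags": ["a", "b"]}'
parsed = json.loads(raw_json)

print(parsed["active"])    # True  (JSON true becomes Python True)
print(parsed["nickname"])  # None  (JSON null becomes Python None)
print(type(parsed["tags"]))  # <class 'list'>

# Single-quoted "JSON" is not valid JSON and raises an error.
try:
    json.loads("{'name': 'Anukool'}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False

print(single_quotes_ok)  # False
```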

To convert the dictionary into JSON, we use the json.dumps() method.

import json
# python dictionary
person = {
    "name":"Anukool",
    "country":"England",
    "city":"London",
    "skills":["Python", "ML","AI"]
}
# let's convert it to  json
person_json = json.dumps(person, indent=4) # indent could be 2, 4, 8. It beautifies the json
print(type(person_json))
print(person_json)

# when printed, it shows no surrounding quotes, but it is actually a string
# JSON has no separate type in Python; it is stored as a str
<class 'str'>
{
    "name": "Anukool",
    "country": "England",
    "city": "London",
    "skills": [
        "Python",
        "ML",
        "AI"
    ]
}

You can save it as a JSON file using the json.dump() method.

import json
# python dictionary
person = {
    "name":"Anukool",
    "country":"England",
    "city":"London",
    "skills":["Python", "ML","AI"]
}
with open('json_example.json', 'w', encoding='utf-8') as f:
    json.dump(person, f, ensure_ascii=False, indent=4)
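The saved file can be read back with json.load(), the file-based counterpart of json.loads(). The sketch below writes the dictionary to a temporary directory (an assumption, to avoid leaving files behind) and checks that the round trip returns the same data.

```python
import json
import os
import tempfile

person = {
    "name": "Anukool",
    "country": "England",
    "city": "London",
    "skills": ["Python", "ML", "AI"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "json_example.json")

    # Write the dictionary to disk as JSON.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(person, f, ensure_ascii=False, indent=4)

    # Read it back: json.load() parses JSON from an open file object.
    with open(path, "r", encoding="utf-8") as f:
        loaded_person = json.load(f)

print(loaded_person == person)  # True
```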

File management#

👉 If there are a large number of files to handle in our Python program, we can arrange our code within different directories to make things more manageable.

👉 A directory or folder is a collection of files and subdirectories. Python has the os module, which provides many useful methods to work with directories (and files as well).

getcwd()#

We can get the present working directory using the getcwd() method of the os module.

import os
print(os.getcwd())

chdir()#

We can change the current working directory using the chdir() method of the os module.

import os

os.chdir(r"C:\Users\Anukool\xyz") 
  
print("Directory changed") 

print(os.getcwd())
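Because chdir() changes the working directory for the whole process, it is often worth remembering the original directory and restoring it afterwards. The following sketch does this with a temporary directory standing in for the target folder, which is an assumption made so the example runs on any machine.

```python
import os
import tempfile

original_dir = os.getcwd()  # remember where we started

with tempfile.TemporaryDirectory() as tmp_dir:
    try:
        os.chdir(tmp_dir)
        current_dir = os.getcwd()  # now inside the temporary directory
    finally:
        os.chdir(original_dir)     # always switch back, even on error

restored_dir = os.getcwd()
print(restored_dir == original_dir)  # True
```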

mkdir() & listdir()#

👉 We can create a new directory using the mkdir() method of the os module.

👉 The listdir() method returns a list of all files and sub-directories inside a directory.

import os
os.mkdir('python_study')
print("Directory created") 

print(os.listdir())  # print the list, since a script does not display it automatically

rmdir()#

We can remove a directory using the rmdir() method of the os module.

import os
os.rmdir('python_study')

The os module supports many more functions that make it easier to perform various system-level operations.
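As a short sketch of a few of these functions working together, the example below creates nested directories, shows that rmdir() refuses to delete a non-empty directory, and then removes the whole tree. The directory names are hypothetical, everything happens inside a temporary directory, and shutil.rmtree() comes from the standard shutil module rather than os.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    study_dir = os.path.join(base_dir, "python_study")
    os.mkdir(study_dir)
    after_mkdir = os.listdir(base_dir)

    # makedirs() creates intermediate directories as needed.
    nested_dir = os.path.join(study_dir, "notes", "week1")  # hypothetical names
    os.makedirs(nested_dir, exist_ok=True)
    nested_exists = os.path.isdir(nested_dir)

    # rmdir() only removes empty directories.
    try:
        os.rmdir(study_dir)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # shutil.rmtree() removes a directory together with its contents.
    shutil.rmtree(study_dir)
    after_cleanup = os.listdir(base_dir)

print(after_mkdir)    # ['python_study']
print(nested_exists)  # True
print(rmdir_failed)   # True
print(after_cleanup)  # []
```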