Creating and Writing to Different Types of Files in Python

In this blog, we will create and write several types of files in Python: a text file, a Word file, a CSV file, a TSV file, an Excel sheet, and a JSON file.

Writing to a text file

First, define the lines of text that we want to write to the text file.

In [1]:
lines = [
    'Alice in Wonderland (2010 film)',
    'Alice in Wonderland is a 2010 American dark fantasy period film directed by Tim Burton from a screenplay written by Linda Woolverton.',
    'The film stars Mia Wasikowska in the title role, with Johnny Depp, Anne Hathaway, Helena Bonham Carter, Crispin Glover, and Matt Lucas, and features the voices of Alan Rickman, Stephen Fry, Michael Sheen, and Timothy Spall.',
    'Alice in Wonderland was produced by Walt Disney Pictures and shot in the United Kingdom and the United States. ',
    'The film premiered in London at the Odeon Leicester Square on February 25, 2010.'
]

Write to the text file

In [3]:
# The with block closes the file automatically when it ends.
with open('alice-in-wonderland.txt', 'w') as f:
    for line in lines:
        f.write(line)
        f.write('\n')

View after writing text file
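One way to check the result without leaving Python is to read the file back. The snippet below is a minimal sketch: the helper name read_lines, the sample data, and the temporary file path are assumptions made for illustration and are not part of the original notebook.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of a text file without their trailing newlines."""
    with open(path, 'r', encoding='utf-8') as f:
        return [line.rstrip('\n') for line in f]


# Hypothetical sample data and file location, used only for this illustration.
sample_lines = ['first line', 'second line']
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

# Write the sample the same way as above: one line, then a newline.
with open(sample_path, 'w', encoding='utf-8') as f:
    for line in sample_lines:
        f.write(line + '\n')

read_back = read_lines(sample_path)
print(read_back)
```

Calling read_lines('alice-in-wonderland.txt') after running the cell above should give back the five strings stored in lines.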

Writing to a Word file

python-docx

python-docx is a Python library for creating and updating Microsoft Word (.docx) files.

Installation via pip

pip install python-docx

Import module

In [5]:
from docx import Document

Write to a Word file

In [10]:
document = Document()

document.add_heading('Alice in Wonderland (2010 film)', 0)
document.add_paragraph('Alice in Wonderland is a 2010 American dark fantasy period film directed by Tim Burton from a screenplay written by Linda Woolverton. \n')
document.add_paragraph('The film stars Mia Wasikowska in the title role, with Johnny Depp, Anne Hathaway, Helena Bonham Carter, Crispin Glover, and Matt Lucas, and features the voices of Alan Rickman, Stephen Fry, Michael Sheen, and Timothy Spall.')

document.add_paragraph('Alice in Wonderland was produced by Walt Disney Pictures and shot in the United Kingdom and the United States. ')
document.add_paragraph('The film premiered in London at the Odeon Leicester Square on February 25, 2010.')

document.save('alice-in-wonderland.docx')

View after creating and writing a Word file

Writing to a CSV file

csv — CSV File Reading and Writing

The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.

The csv module's reader and writer objects read and write sequences. Programmers can also read and write data in dictionary form using the DictReader and DictWriter classes.
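The dictionary-based classes mentioned above can be sketched as follows. This is a small illustration rather than part of the original notebook: the sample records and the in-memory io.StringIO buffer (used in place of a real file) are assumptions chosen to keep the example self-contained.

```python
import csv
import io

# Hypothetical sample records; each dict maps a column name to a value.
fieldnames = ['id', 'firstname', 'lastname', 'age']
people = [
    {'id': '1', 'firstname': 'James', 'lastname': 'Moore', 'age': '10'},
    {'id': '2', 'firstname': 'Robert', 'lastname': 'Donald', 'age': '15'},
]

# An in-memory text buffer stands in for a file opened with newline=''.
buffer = io.StringIO(newline='')

# DictWriter needs the column order up front via fieldnames.
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(people)

# Rewind and read the same data back as dictionaries.
buffer.seek(0)
rows_read = list(csv.DictReader(buffer))
print(rows_read)
```

DictReader takes the column names from the first row, so the header written by writeheader() becomes the keys of every row that is read back. All values come back as strings.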

csv.writer(csvfile, dialect='excel', **fmtparams)

Return a writer object responsible for converting the user's data into delimited strings on the given file-like object.

csvfile can be any object with a write() method. If csvfile is a file object, it should be opened with newline=''; otherwise, on platforms that use \r\n line endings, an extra \r is added when writing. An optional dialect parameter can be given which is used to define a set of parameters specific to a particular CSV dialect.

Import csv module

In [1]:
import csv

Declare header and row data

In [2]:
csv_header_name = ['id', 'firstname', 'lastname', 'age']
In [3]:
each_row = [
    ['1', 'James', 'Moore', '10'],
    ['2', 'Robert', 'Donald', '15'],
    ['3', 'John', 'Jennifer', '12'],
    ['4', 'Michael', 'Patricia', '18'],
    ['5', 'Mary', 'Donald', '14']
]

Declare csv file name

In [4]:
csvFileName = 'person.csv'

Write csv file

In [ ]:
with open(csvFileName, 'w', newline='') as csvfile:

    # csv writer to write to the csv file
    csv_writer = csv.writer(csvfile)

    # write the header row
    csv_writer.writerow(csv_header_name)

    # write the data rows
    csv_writer.writerows(each_row)

# The with block closes the file automatically, so no explicit close() is needed.

View csv file after writing
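The file can also be inspected from Python by reading it back with csv.reader. The following is a minimal sketch that writes a small demo file to a temporary directory so that it does not touch person.csv; the path and the two sample rows are assumptions for illustration.

```python
import csv
import os
import tempfile

# Hypothetical demo file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

# Write a header and two rows, as in the section above.
with open(demo_path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'firstname'])
    writer.writerows([['1', 'James'], ['2', 'Robert']])

# Read everything back; each row is returned as a list of strings.
with open(demo_path, newline='') as f:
    rows = list(csv.reader(f))

header, records = rows[0], rows[1:]
print(header)
print(records)
```

Replacing demo_path with csvFileName in the reading part should list the header followed by the five person rows.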

Writing to a TSV File

To write a TSV file, we again use Python's csv module, this time with a tab character as the delimiter.

In [11]:
import csv

Declare file name

In [ ]:
tsvFileName = 'person.tsv'

Create header and row data

In [ ]:
tsv_header_name = ['id', 'firstname', 'lastname', 'age']
In [ ]:
each_row = [
    ['1', 'James', 'Moore', '10'],
    ['2', 'Robert', 'Donald', '15'],
    ['3', 'John', 'Jennifer', '12'],
    ['4', 'Michael', 'Patricia', '18'],
    ['5', 'Mary', 'Donald', '14']
]

Write tsv file

In [ ]:
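# newline='' prevents extra blank lines on platforms that use \r\n line endings.
with open(tsvFileName, 'w', newline='') as tsvfile:

    # csv writer configured with a tab delimiter to write the tsv file
    tsv_writer = csv.writer(tsvfile, delimiter='\t')

    # write the header row
    tsv_writer.writerow(tsv_header_name)

    # write the data rows
    tsv_writer.writerows(each_row)

# The with block closes the file automatically, so no explicit close() is needed.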

View tsv file after writing
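As an alternative to passing delimiter='\t', the csv module ships a built-in 'excel-tab' dialect that produces the same tab-separated layout. The sketch below writes to an in-memory buffer and reads the text back; the buffer and the sample rows are assumptions for illustration.

```python
import csv
import io

# In-memory buffer standing in for a file opened with newline=''.
buffer = io.StringIO(newline='')

# 'excel-tab' is the Excel dialect with a tab as the delimiter.
writer = csv.writer(buffer, dialect='excel-tab')
writer.writerow(['id', 'firstname'])
writer.writerow(['1', 'James'])

tsv_text = buffer.getvalue()
print(repr(tsv_text))

# Read the text back with the same dialect.
rows = list(csv.reader(io.StringIO(tsv_text, newline=''), dialect='excel-tab'))
print(rows)
```

The repr() output makes the tab characters and the \r\n line terminator visible, which is handy when checking a TSV file that looks fine in an editor but fails to parse elsewhere.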

Creating Excel files with XlsxWriter

XlsxWriter

XlsxWriter is a Python module that can be used to write text, numbers, formulas and hyperlinks to multiple worksheets in an Excel 2007+ XLSX file. It supports features such as formatting and many more, including:

1. 100% compatible Excel XLSX files.
2. Full formatting.
3. Merged cells.
4. Defined names.
5. Charts.
6. Autofilters.
7. Data validation and drop down lists.
8. Conditional formatting.
9. Worksheet PNG/JPEG/GIF/BMP/WMF/EMF images.
10. Rich multi-format strings.
11. Cell comments.
12. Textboxes.
13. Integration with Pandas.
14. Memory optimization mode for writing large files.

It supports Python 3.4+ and PyPy3 and uses standard libraries only.

Installation via pip

pip install XlsxWriter

Import XlsxWriter module

In [1]:
import xlsxwriter

Create the person data and write the xlsx sheet

In [13]:
persons = [
    {'id': 1, 'firstname': "James", 'lastname': 'Moore', 'age': 10},
    {'id': 2, 'firstname': "Robert", 'lastname': 'Donald', 'age': 15},
    {'id': 3, 'firstname': "John", 'lastname': 'Jennifer', 'age': 12},
    {'id': 4, 'firstname': "Michael", 'lastname': 'Patricia', 'age': 18},
    {'id': 5, 'firstname': "Mary", 'lastname': 'Donald', 'age': 14}
]
In [14]:
workbook = xlsxwriter.Workbook('person.xlsx')
worksheet = workbook.add_worksheet()


#write headers
worksheet.write('A1', 'id')
worksheet.write('B1', 'firstname')
worksheet.write('C1', 'lastname')
worksheet.write('D1', 'age')


# Start from the first cell below the headers.
row = 1
col = 0

#insert person data
for person in persons:
    worksheet.write(row, col,     person['id'])
    worksheet.write(row, col + 1, person['firstname'])
    worksheet.write(row, col + 2, person['lastname'])
    worksheet.write(row, col + 3, person['age'])
    row += 1

workbook.close()

View Excel sheet after writing

Open person.xlsx in Excel to view the result.

Add styling to the Excel sheet

worksheet.set_column()

set_column(first_col, last_col, width, cell_format, options)

Set properties for one or more columns of cells.
Parameters:

    first_col (int) – First column (zero-indexed).
    last_col (int) – Last column (zero-indexed). Can be same as first_col.
    width (float) – The width of the column(s), in character units.
    cell_format (Format) – Optional Format object.
    options (dict) – Optional parameters: hidden, level, collapsed.

Returns:

    0: Success.
    -1: Column is out of worksheet bounds.

The set_column() method can be used to change the default properties of a single column or a range of columns:

worksheet.set_column(1, 3, 30)  # Width of columns B:D set to 30.

If set_column() is applied to a single column the value of first_col and last_col should be the same:

worksheet.set_column(1, 1, 30)  # Width of column B set to 30.

workbook.add_format()

add_format([properties])

Create a new Format object to format cells in worksheets.

Parameters: properties (dictionary) – An optional dictionary of format properties.
Return type: A format object.

In [15]:
workbook = xlsxwriter.Workbook('person.xlsx')
worksheet = workbook.add_worksheet()


# Widen the firstname and lastname column to make the text clearer.
worksheet.set_column('B:C', 20)
 
# Add a bold format to use to highlight cells.
header_cell_format = workbook.add_format({'bold': True, 'font_color': 'red'})


#write headers
worksheet.write('A1', 'id', header_cell_format)
worksheet.write('B1', 'firstname', header_cell_format)
worksheet.write('C1', 'lastname', header_cell_format)
worksheet.write('D1', 'age', header_cell_format)


# Start from the first cell below the headers.
row = 1
col = 0

#insert person data
for person in persons:
    worksheet.write(row, col,     person['id'])
    worksheet.write(row, col + 1, person['firstname'])
    worksheet.write(row, col + 2, person['lastname'])
    worksheet.write(row, col + 3, person['age'])
    row += 1

workbook.close()

View Excel sheet after adding styling

Writing to a JSON file

json — JSON encoder and decoder

JSON (JavaScript Object Notation) is a lightweight data interchange format inspired by JavaScript object literal syntax. The json module exposes an API familiar to users of the standard library marshal and pickle modules.

json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

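Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object) using the json module's Python-to-JSON conversion table (for example, a dict becomes a JSON object, a list or tuple becomes an array, and None becomes null).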

If skipkeys is true (default: False), then dict keys that are not of a basic type (str, int, float, bool, None) will be skipped instead of raising a TypeError.
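The effect of skipkeys can be seen in a small sketch. It uses json.dumps, which takes the same keyword arguments as json.dump but returns a string; the record with a tuple key is a made-up example.

```python
import json

# Hypothetical record: the tuple key is not a basic type for JSON.
record = {'name': 'James', ('x', 'y'): 'tuple key'}

# With the default skipkeys=False, the tuple key raises a TypeError.
try:
    json.dumps(record)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# With skipkeys=True, the offending key is dropped silently.
skipped = json.dumps(record, skipkeys=True)

print(raised_type_error)
print(skipped)
```

Because skipped entries disappear without warning, this option is probably best reserved for cases where losing those keys is acceptable.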

The json module always produces str objects, not bytes objects. Therefore, fp.write() must support str input.

If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.
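A short sketch of the difference, again using json.dumps so the result is easy to print. The name used here is a made-up value that contains one non-ASCII character.

```python
import json

# Hypothetical record with a non-ASCII character (U+00EB).
person = {'firstname': 'Zo\u00eb'}

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(person)

# ensure_ascii=False: the character is written as-is.
as_is = json.dumps(person, ensure_ascii=False)

print(escaped)
print(as_is)
```

When writing the as-is form to disk, it is safer to open the file with an explicit encoding such as encoding='utf-8', since the platform default may not be able to represent every character.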

If check_circular is false (default: True), then the circular reference check for container types will be skipped and a circular reference will result in a RecursionError (or worse).

If allow_nan is false (default: True), then it will be a ValueError to serialize out of range float values (nan, inf, -inf) in strict compliance of the JSON specification. If allow_nan is true, their JavaScript equivalents (NaN, Infinity, -Infinity) will be used.
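The two behaviours can be compared directly. In this sketch the measurement dictionary is a made-up example, and json.dumps is used in place of json.dump for brevity.

```python
import json

# Hypothetical record holding a NaN value.
measurement = {'value': float('nan')}

# Default (allow_nan=True): NaN is written using its JavaScript spelling.
lenient = json.dumps(measurement)

# allow_nan=False: serialization fails with a ValueError instead.
try:
    json.dumps(measurement, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = exc

print(lenient)
print(type(strict_error).__name__)
```

Note that the lenient output is not valid under the JSON specification, so strict parsers in other languages may reject it.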

If indent is a non-negative integer or string, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0, negative, or "" will only insert newlines. None (the default) selects the most compact representation. Using a positive integer indent indents that many spaces per level. If indent is a string (such as "\t"), that string is used to indent each level.
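To make the indent options concrete, here is a small sketch that serializes the same made-up record three ways with json.dumps.

```python
import json

# Hypothetical record used only to show the layout differences.
person = {'id': 1, 'firstname': 'James'}

compact = json.dumps(person)               # indent=None (the default)
pretty = json.dumps(person, indent=2)      # two spaces per level
tabbed = json.dumps(person, indent='\t')   # one tab per level

print(compact)
print(pretty)
print(tabbed)
```

Passing indent=2 (or similar) to json.dump in the same way tends to make the written file much easier to read and to diff, at the cost of a slightly larger file.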

Import json module

In [13]:
import json

Create person data in json format

In [14]:
data = {
    'persons' : [
        {
            'id' : 1,
            'firstname' : 'James',
            'lastname' : 'Moore',
            'age': 10
        },
        {
            'id' : 2,
            'firstname' : 'Robert',
            'lastname' : 'Donald',
            'age': 15
        },
        {
            'id' : 3,
            'firstname' : 'John',
            'lastname' : 'Jennifer',
            'age': 12
        },
        {
            'id' : 4,
            'firstname' : 'Michael',
            'lastname' : 'Patricia',
            'age': 18
        },
        {
            'id' : 5,
            'firstname' : 'Mary',
            'lastname' : 'Donald',
            'age': 14
        },
    ]
}

Create json file

In [15]:
with open('person.json', 'w') as jsonfile:
    json.dump(data, jsonfile)

View json file after writing
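The written file can be checked by loading it back with json.load and comparing it to the original object. This is a minimal sketch that uses a small made-up dictionary and a temporary file path so that person.json is left alone.

```python
import json
import os
import tempfile

# Hypothetical data and file location, used only for this illustration.
demo_data = {'persons': [{'id': 1, 'firstname': 'James', 'age': 10}]}
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.json')

# Write the data, pretty-printed for readability.
with open(demo_path, 'w', encoding='utf-8') as f:
    json.dump(demo_data, f, indent=2)

# Load it back and confirm that nothing changed on the round trip.
with open(demo_path, encoding='utf-8') as f:
    loaded = json.load(f)

print(loaded == demo_data)
```

The same round trip should work for person.json by swapping in that file name and comparing the result with data.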

Thanks for reading.
