When the number of lines of code in a program or project grows substantially, it becomes hard to keep track of the changes being made. Work becomes counterproductive, and time, money, and effort are wasted. Being able to pinpoint exactly what changed therefore becomes a necessity. Python provides a built-in library named difflib for this kind of problem.

What is difflib?

The difflib library of Python contains functions and classes for computing the differences (deltas) between sequences or files. It is most often used to compare sequences of strings. You may have come across Git, a version control system: it keeps track of changes to one or more files over time, and changes can be rolled back to a previous state, for example after an error.

[Figure: Git changes, with removed lines shown in red and added lines in green]

Python's difflib module works in a similar way. In this article, we will look at Python's built-in difflib module, its relevance, how it works, its classes and functions, and some examples.

Importing difflib

    import difflib

Differ class

difflib's Differ class compares sequences of lines of text and produces differences (deltas) that a person can easily read. Each line of a Differ delta begins with a two-character code:

    Code    Meaning
    '- '    line unique to sequence 1
    '+ '    line unique to sequence 2
    '  '    line common to both sequences
    '? '    line not present in either input sequence

Table: Differ class codes. Lines starting with '? ' are hints that point at the characters that differ within a changed line.

Syntax & Parameters

Syntax: difflib.Differ(linejunk=None, charjunk=None)

Parameters: linejunk and charjunk are optional filter functions for lines and characters that should be ignored. Both default to None.
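compare(firstString, secondString): the compare method compares two sequences of lines and returns their differences (deltas) as a generator. Each line in the sequences should end with a newline character ('\n').

    from difflib import Differ
    import sys

    differ_inst = Differ()

    # splitlines(keepends=True) keeps the trailing newline on every line
    string1 = """This is a random string.
    Lets call it string 1.
    This is so random
    """.splitlines(keepends=True)

    string2 = """This is a random string.
    Lets call it string 2.
    This is so random.
    Or maybe not, or is it.
    """.splitlines(keepends=True)

    # compare() returns a generator of delta lines
    deltas = list(differ_inst.compare(string1, string2))
    sys.stdout.writelines(deltas)

Let's break down the code above: Differ() creates the comparison object, splitlines(keepends=True) turns each string into a list of lines that keep their trailing newline, compare() generates the delta lines, and sys.stdout.writelines() prints them.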
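Because every delta line starts with one of the two-character codes, a Differ delta is easy to post-process. As a minimal sketch (the sample lists and variable names here are illustrative, not taken from the example above), the following groups delta lines by their prefix:

```python
from difflib import Differ

# Hypothetical sample data; each line ends with a newline, as compare() expects.
old_lines = ["apple\n", "banana\n", "cherry\n"]
new_lines = ["apple\n", "blueberry\n", "cherry\n", "date\n"]

delta = list(Differ().compare(old_lines, new_lines))

# Every delta line is a two-character code followed by the original text.
removed = [line[2:] for line in delta if line.startswith("- ")]
added = [line[2:] for line in delta if line.startswith("+ ")]
common = [line[2:] for line in delta if line.startswith("  ")]

print("removed:", removed)
print("added:  ", added)
print("common: ", common)
```

Slicing off the first two characters recovers the original line, so the same approach can feed a summary report or a simple change counter.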
In the output of the compare example, '- ', '+ ', and '  ' mark lines found only in string1, lines found only in string2, and lines common to both, respectively.

SequenceMatcher class

SequenceMatcher compares two sequences and measures how similar they are. We will demonstrate it using some examples.

Syntax: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
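As a rough sketch of how these constructor arguments are used (the strings are made up for illustration), isjunk takes a one-argument function that returns True for elements to ignore, while a and b are the two sequences to compare:

```python
from difflib import SequenceMatcher

first = "private Thread currentThread;"
second = "private volatile Thread currentThread;"

# isjunk=None: no element is treated as junk.
plain = SequenceMatcher(None, first, second)

# isjunk as a function: treat blanks as junk so matches do not start on spaces.
ignore_blanks = SequenceMatcher(lambda ch: ch == " ", first, second)

print(plain.ratio())
print(ignore_blanks.ratio())

# The sequences can also be supplied after construction.
matcher = SequenceMatcher()
matcher.set_seqs(first, second)
print(matcher.ratio())
```

Setting the sequences later with set_seqs is convenient when one matcher object is reused for several comparisons.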
Let's discuss some popular SequenceMatcher methods:

ratio

It returns the similarity of the two sequences as a floating-point value in the range 0 to 1 (both inclusive).

    from difflib import SequenceMatcher

    string_one = 'He is right'
    string_two = 'He was right'

    seq_ratio = SequenceMatcher(a=string_one, b=string_two)
    print(f"SequenceMatcher ratio: {seq_ratio.ratio()}")

Difference between difflib's ratio, quick_ratio, and real_quick_ratio?

All three methods return a similarity score based on the proportion of matching elements, but with different levels of approximation. quick_ratio() and real_quick_ratio() are cheaper to compute and return upper bounds on ratio(), so their results may differ from ratio() but are never smaller than it.

    from difflib import SequenceMatcher

    s = SequenceMatcher(None, "abcd", "bcde")
    print(s.ratio())             # 0.75
    print(s.quick_ratio())       # 0.75
    print(s.real_quick_ratio())  # 1.0

difflib's ratio vs Levenshtein
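When comparing difflib's ratio with other similarity measures, it helps to recall how difflib defines it: 2.0 * M / T, where M is the number of matched elements and T is the total number of elements in both sequences. The sketch below recomputes that value by hand from get_matching_blocks(); the helper name manual_ratio is hypothetical and only serves the illustration.

```python
from difflib import SequenceMatcher


def manual_ratio(a, b):
    """Recompute SequenceMatcher.ratio() as 2.0 * M / T (illustrative helper)."""
    matcher = SequenceMatcher(None, a, b)
    # Sum the sizes of all matching blocks; the final dummy block has size 0.
    matches = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    return 2.0 * matches / total if total else 1.0


matcher = SequenceMatcher(None, "abcd", "bcde")
print(manual_ratio("abcd", "bcde"))  # 0.75: "bcd" matches, so 2 * 3 / 8
print(matcher.ratio())               # same value from the library
```

Because the score is built from matching blocks rather than from an edit distance, it need not agree with scores from edit-distance-based libraries.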
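Levenshtein is a third-party package, not part of the standard library, and it also provides a ratio function. The two can be compared side by side on the same pair of strings:

    import Levenshtein
    import difflib

    print(Levenshtein.ratio('united states of america', 'america'))
    print(difflib.SequenceMatcher(None, 'united states of america', 'america').ratio())

find_longest_match

It compares the two sequences and returns the longest matching block, that is, the longest run of consecutive elements found in both.

Syntax: find_longest_match(alo=0, ahi=None, blo=0, bhi=None)

Parameters:

- alo, ahi: start and end indices of the range to search in the first sequence.
- blo, bhi: start and end indices of the range to search in the second sequence.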
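As a small sketch of how the index arguments of find_longest_match restrict the search (the strings are illustrative), the calls below look for the longest match between a[alo:ahi] and b[blo:bhi]:

```python
from difflib import SequenceMatcher

a = " abcd"
b = "abcd abcd"
matcher = SequenceMatcher(None, a, b)

# Search the full extent of both sequences.
full = matcher.find_longest_match(0, len(a), 0, len(b))
print(full)  # Match(a=0, b=4, size=5)

# Restrict the search in b to its first four characters only.
partial = matcher.find_longest_match(0, len(a), 0, 4)
print(partial)  # Match(a=1, b=0, size=4)
print(repr(a[partial.a:partial.a + partial.size]))
```

The arguments are passed positionally here, which works whether or not the installed Python version provides default values for them.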
It takes the starting and ending indices of the two sequences and returns a Match object (a, b, size) describing the longest matching block.

    import difflib

    list_one = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    list_two = [1, 3, 4, 5, 6, 8, 9, 10, 11]

    match_seq = difflib.SequenceMatcher(a=list_one, b=list_two)
    match = match_seq.find_longest_match(alo=0, ahi=len(list_one), blo=0, bhi=len(list_two))

    print(f"Match object:{match}")
    # match.a is the start index in list_one, match.size the length of the block
    print(f"Matching sequence list_one: {list_one[match.a:match.a+match.size]}")

get_matching_blocks

This SequenceMatcher method returns all the matching blocks in both sequences.

    import difflib

    list_one = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    list_two = [1, 3, 4, 5, 6, 8, 9, 10, 11]

    match_seq = difflib.SequenceMatcher(a=list_one, b=list_two)

    # The last block returned is always a dummy block of size 0
    for match in match_seq.get_matching_blocks():
        print(f"Match object:{match}")
        print(f"Matching sequence list_one: {list_one[match.a:match.a+match.size]}")
        print(f"Matching sequence list_two: {list_two[match.b:match.b+match.size]}")
        print()

difflib's functions

context_diff method

difflib.context_diff compares two sequences of lines and returns a delta in context format. It is a generator that yields the delta (difference) lines one at a time. In context format, the output shows which lines have changed by prefixing them with '! '.

    import difflib

    string_one = """Lorem ipsum dolor sit amet.
    Pellentesque at leo neque.
    Aenean sit amet tempor sem, eu tristique sapien.
    Ut id quam at mauris volutpat fringilla sit amet et enim.
    Morbi faucibus maximus massa, in commodo erat luctus ut.""".splitlines(keepends=True)

    string_two = """Lorem ipsum dolor sit amet.
    Pellentesque at leo nequed rattla bisawed.
    Aenean sit amet tempor sem, eu tristique sapien.
    Cras consequat ornare arcu, ac dapibus elit tincidunt non.
    Fusce massa diam, tristique pellentesque ultricies eu, auctor nec ipsum.""".splitlines(keepends=True)

    difference = difflib.context_diff(string_one, string_two)
    for item in difference:
        # each delta line already carries its own newline
        print(item, end='')

Let's break down the above code: each multi-line string is split into a list of individual lines by splitlines(keepends=True), context_diff() returns a generator over the delta lines, and the loop prints each line with end='' because the lines already end with a newline.
In the output, '! ' marks the lines that differ between the two sequences.

unified_diff method

difflib.unified_diff compares two sequences of lines and returns a delta in unified format. In unified format, the output shows each line that was either removed from or added to the first sequence.

    import difflib

    string_one = """Lorem ipsum dolor sit amet.
    Pellentesque at leo neque.
    Aenean sit amet tempor sem, eu tristique sapien.
    Ut id quam at mauris volutpat fringilla sit amet et enim.
    Morbi faucibus maximus massa, in commodo erat luctus ut.""".splitlines(keepends=True)

    string_two = """Lorem ipsum dolor sit amet.
    Pellentesque at leo nequed rattla bisawed.
    Aenean sit amet tempor sem, eu tristique sapien.
    Cras consequat ornare arcu, ac dapibus elit tincidunt non.
    Fusce massa diam, tristique pellentesque ultricies eu, auctor nec ipsum.""".splitlines(keepends=True)

    difference = difflib.unified_diff(string_one, string_two)
    for item in difference:
        print(item, end='')

The code above works like the context_diff example. The only change is that the returned generator yields a unified diff instead of a context diff. In the output, '-' marks the lines removed from the first sequence, and '+' marks the lines added to it.

get_close_matches method

The get_close_matches function of difflib takes a word and a list of words to match against. It returns a list of the closest matches to the word.

Syntax & Parameters

Syntax: get_close_matches(word, possibilities, n=3, cutoff=0.6)

Parameters:

- word: the string to find close matches for.
- possibilities: the list of candidate strings to match against.
- n: the maximum number of close matches to return (default 3).
- cutoff: the minimum similarity, between 0 and 1, that a candidate must reach (default 0.6).
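A minimal sketch of get_close_matches and its parameters in use (the word list is illustrative): n caps the number of results, and cutoff, a float between 0 and 1, is the minimum similarity a candidate must reach.

```python
from difflib import get_close_matches

candidates = ["ape", "apple", "peach", "puppy"]

# Defaults (n=3, cutoff=0.6); results are ordered best match first.
default_matches = get_close_matches("appel", candidates)
print(default_matches)  # ['apple', 'ape']

# Keep only the single best match.
best_match = get_close_matches("appel", candidates, n=1)
print(best_match)  # ['apple']

# A stricter cutoff can leave no candidates at all.
strict_matches = get_close_matches("appel", candidates, cutoff=0.9)
print(strict_matches)  # []
```

A typical use is suggesting corrections for a mistyped command or option name.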
get_close_matches takes the arguments above and returns a list of close matches, with the best match first.

ndiff method

difflib's ndiff also compares two sequences of lines and returns a Differ-style delta (difference) as a generator.

HtmlDiff

The HtmlDiff class of the difflib module compares two sequences of lines and returns the delta in HTML format, suitable for saving to an HTML file.
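As a hedged sketch of ndiff and HtmlDiff (the sample lines and the "old"/"new" labels are assumptions made for illustration), ndiff yields a Differ-style delta, while HtmlDiff.make_file returns a complete HTML page as a string:

```python
import difflib

# Illustrative input; each sequence is a list of lines ending in a newline.
old_lines = ["one\n", "two\n", "three\n"]
new_lines = ["one\n", "too\n", "three\n", "four\n"]

# ndiff returns a generator of Differ-style delta lines.
delta = list(difflib.ndiff(old_lines, new_lines))
print("".join(delta), end="")

# HtmlDiff builds a side-by-side comparison table inside an HTML document.
html = difflib.HtmlDiff().make_file(
    old_lines, new_lines, fromdesc="old", todesc="new"
)
print(len(html), "characters of HTML generated")
```

The returned string can be written to a file of your choice and opened in a browser to view the comparison.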
FAQs on Python difflib

How do you find close matches in Python?

Python has a built-in module named difflib, which provides the function get_close_matches. It returns a list of the words closest to the target word.

What is difflib in Python?

The difflib library of Python contains functions and classes for computing the deltas of sequences or files.

How to ignore whitespace in difflib?

To ignore whitespace in difflib, create a function that removes whitespace from a string, for instance one that returns "".join(text.split()). Then pass each string through that function and compare the results using difflib.

Conclusion

In this article, we covered an exciting and valuable module named difflib and looked at its classes and functions through examples. We covered the Differ, SequenceMatcher, and HtmlDiff classes and their methods, as well as the functions ndiff, context_diff, unified_diff, and get_close_matches.