[personal profile] snarp
I need a piece of software that will thwart evil.

I have about a hundred text files, each containing a list of about eighty numbered questions, roughly 4000 words in length. Most of them seem to be identical text-wise, but I've turned up one that has some very minor differences.

I need a way to search through all these files and figure out if there are any others with different text. I really don't want to have to skim 1600 pages of this stuff - I'm afraid I'll miss something, and I've got more important things to be doing. I guess conceivably I could put a script together myself, but I've only recently started playing with regular expressions, and it might take longer than doing the project by hand.

The thwarting evil part comes in here: We were sent these files by a horrible law firm representing a horrible corporation, who inserted these discrepancies precisely to waste our time. The horrible corporation needs representation because they recently -ed over several hundred desperately poor people, a number of whom are now homeless. There will come a reckoning. But first I need, like, a Python script or something.

Date: 2010-11-08 12:55 am (UTC)
From: [personal profile] vector
There should be a variety of things that let you diff two files, if the files are themselves almost identical. Let's see... try looking at diff? That wiki page has a section on "free file comparison tools" that might have what you want. You may or may not then still need a really basic script to iterate through the files and compare them all.
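
A rough sketch of that "really basic script", in Python since the post asks for it. It uses the standard library's difflib to compare every file in a folder against one file you trust. The function name, the "*.txt" pattern, and the UTF-8 encoding are assumptions for illustration, not anything known about the actual files.

```python
import difflib
from pathlib import Path


def diff_against_reference(reference, folder, pattern="*.txt"):
    """Return {file name: unified diff lines} for files that differ from the reference."""
    ref_path = Path(reference)
    # Assumption: the files are UTF-8 text; change the encoding if they are not.
    ref_lines = ref_path.read_text(encoding="utf-8").splitlines()
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        if path.resolve() == ref_path.resolve():
            continue  # do not compare the reference with itself
        lines = path.read_text(encoding="utf-8").splitlines()
        delta = list(difflib.unified_diff(
            ref_lines, lines,
            fromfile=ref_path.name, tofile=path.name, lineterm="",
        ))
        if delta:  # unified_diff yields nothing when the two files match
            results[path.name] = delta
    return results
```

Printing each returned list line by line gives a conventional diff for every file that does not match the reference, so only the changed passages need reading.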

Date: 2010-11-08 10:00 am (UTC)
From: [personal profile] azurelunatic
A rather low-tech way one might start, without scripting, though it assumes someone is doing the grunt work:

If they're numbered, and each number should be the same, one ouuuuught to be able to put them in a table, with one column for each file, each question or line being one cell. I believe that the spreadsheet should be able to evaluate equality of text strings, in addition to just numbers. Fill down. Set a conditional so that equal displays as green, unequal displays as red. Scan down for red.
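
For anyone who would rather not paste a hundred columns by hand, the same table idea can be sketched in Python. This assumes each question starts on its own line with a number followed by "." or ")", which is a guess about the file layout; the function names are made up for illustration. Each "cell" is compared against the most common version of that question, and the mismatches play the role of the red cells.

```python
import re
from collections import Counter

# Assumption: a question begins at the start of a line with "12." or "12)".
QUESTION_START = re.compile(r"^\s*(\d+)[.)]\s+", re.MULTILINE)


def split_questions(text):
    """Split one file's text into {question number: question text}."""
    matches = list(QUESTION_START.finditer(text))
    questions = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse whitespace so differences in line wrapping are ignored.
        questions[int(m.group(1))] = " ".join(text[m.end():end].split())
    return questions


def odd_cells(files):
    """files maps file name to full text. Return (question number, file name)
    pairs whose text differs from the most common version of that question."""
    table = {name: split_questions(text) for name, text in files.items()}
    numbers = sorted({n for qs in table.values() for n in qs})
    flagged = []
    for n in numbers:
        versions = Counter(qs.get(n) for qs in table.values())
        majority = versions.most_common(1)[0][0]
        flagged.extend(
            (n, name) for name, qs in sorted(table.items()) if qs.get(n) != majority
        )
    return flagged
```

A question missing from a file counts as a mismatch too. If two versions of a question are exactly tied, which one counts as the majority is arbitrary, so a tie is worth checking by eye.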

Date: 2010-11-08 08:55 pm (UTC)
From: [personal profile] nicki
Could you feed them into one of those programs that checks for plagiarism in term papers? My understanding is that they highlight identical text in one color and original text in another, so you could just identify the "original" text and discard the rest?

uncouth

Date: 2010-11-08 09:57 pm (UTC)
From: [identity profile] lacrimawanders.livejournal.com
Step 3: Cockslap the asshats.* I have a service called "Slaps Across America" that can help with that part.

* Kidding. Seriously.

Re: uncouth

Date: 2010-11-10 01:33 am (UTC)
From: [identity profile] lacrimawanders.livejournal.com
Well, I wouldn't actually /cockslap/ them, per se.

Date: 2010-11-10 09:40 am (UTC)
From: [identity profile] alexey-rom.livejournal.com
If identical files are exactly identical, calculate their hash with something like http://www.fourmilab.ch/md5/. Any difference will lead to very different hashes, so you'll see the different files immediately. Then, to find what the actual differences are, use any diff software.
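
The hashing step can also be done with Python's built-in hashlib instead of a separate MD5 tool. This is a minimal sketch; the function name and the "*.txt" pattern are assumptions. It groups file names by digest, so the big group is the common text and the small groups are the ones to diff.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_hash(folder, pattern="*.txt"):
    """Map each MD5 digest to the list of file names that share it."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).glob(pattern)):
        # Hash the raw bytes: any difference at all changes the digest.
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        groups[digest].append(path.name)
    return dict(groups)
```

Because this compares raw bytes, a stray trailing space or a different line-ending style will also put a file in its own group, even when the visible text is the same.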

Creative Commons



The contents of this blog and all comments I make are licensed under a Creative Commons Attribution-Noncommercial-Share Alike License. I hope that name is long enough. I could add some stuff. It could also be a Bring Me A Sandwich License.

If you desire to thank me for the pretend internet magnanimity I show by sharing my important and serious thoughts with you, I accept pretend internet dollars (Bitcoins): 19BqFnAHNpSq8N2A1pafEGSqLv4B6ScstB