Reading from Zip archives in Ruby
The war isn't going anywhere for now, so every couple of days I have to go through the following steps to update the Russian losses tracker:
- download the zip from Kaggle
- unzip it with the unall utility
- run the update_csv script
- verify that the data looks right with git diff, as occasionally there's a typo which makes losses go backwards (always corrected in the next update)
- delete the archive folder
The part that I'd like to get rid of is unzipping. We know exactly the path inside the zip, so why bother?
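The git diff check from the list above could also be partly automated. Here's a minimal sketch, not part of the actual script: the backward_steps helper, the "personnel" column name, and the sample numbers are all made up for illustration, and it assumes rows are in chronological order with cumulative totals.

```ruby
require "csv"

# Returns [previous_row, current_row] pairs where the cumulative value in
# `column` went down. The column name is an assumption about the CSV header.
def backward_steps(csv_text, column)
  rows = CSV.parse(csv_text, headers: true)
  rows.each_cons(2).select do |previous, current|
    current[column].to_i < previous[column].to_i
  end
end

# Invented sample data, just to show the shape of the check
sample = <<~CSV
  date,personnel
  2022-01-01,100
  2022-01-02,120
  2022-01-03,110
  2022-01-04,130
CSV

backward_steps(sample, "personnel").each do |previous, current|
  puts "#{current["date"]}: #{previous["personnel"]} -> #{current["personnel"]}"
end
```

This only flags suspicious rows. Looking at the diff is still the real check.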
Old script
The script is very straightforward. The methods updated_equipment and updated_personnel will need replacing to get data from a zip.
#!/usr/bin/env ruby

require "pathname"

class UpdateCSV
  def initialize(archive_path)
    @archive_path = Pathname(archive_path)
  end

  # Fresh CSV contents, read from the unzipped archive folder
  def updated_equipment
    @updated_equipment ||= (@archive_path + "russia_losses_equipment.csv").read
  end

  def updated_personnel
    @updated_personnel ||= (@archive_path + "russia_losses_personnel.csv").read
  end

  # CSV files tracked in the git repository
  def csv_files
    @csv_files ||= `git ls-files`.lines.map(&:chomp).grep(/\.csv\z/)
  end

  # Overwrite each tracked CSV with its updated version
  def call
    csv_files.each do |path|
      case path
      when /russia_losses_equipment/
        Pathname(path).write(updated_equipment)
      when /russia_losses_personnel/
        Pathname(path).write(updated_personnel)
      else
        puts "Unknown CSV file: #{path}"
      end
    end
  end
end

unless ARGV[0]
  STDERR.puts "Usage: #{$0} path_to_updated_archive"
  exit 1
end

UpdateCSV.new(ARGV[0]).call
Gem rubyzip vs zip
There's a bit of a problem with Zip for Ruby, as there are two gems: rubyzip and zip. rubyzip is the correct one; zip is an obsolete fork that somehow got a better name.
To use either of them, you need to require "zip", which will load whichever one is installed. This is a leftover mess from the early days of RubyGems, and such things don't really happen anymore. Just don't gem install zip, and you'll be good.
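If you're not sure which of the two ended up on your machine, RubyGems can tell you. This is a small sketch using Gem::Specification.find_all_by_name; the installed_zip_gems helper name is made up for this example.

```ruby
# Lists installed versions of each gem that answers to `require "zip"`.
# An empty list means that gem is not installed.
def installed_zip_gems
  %w[rubyzip zip].to_h do |name|
    versions = Gem::Specification.find_all_by_name(name).map { |spec| spec.version.to_s }
    [name, versions]
  end
end

installed_zip_gems.each do |name, versions|
  puts "#{name}: #{versions.empty? ? "not installed" : versions.join(", ")}"
end
```

If zip shows up with any versions, gem uninstall zip should get you back to a sane state.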
Abstract archive handling
There's going to be some shared code between updated_equipment and updated_personnel, so let's move it into a new method, read_file:
def updated_equipment
  @updated_equipment ||= read_file("russia_losses_equipment.csv")
end

def updated_personnel
  @updated_personnel ||= read_file("russia_losses_personnel.csv")
end
Read file from either directory or archive
And now the read_file method:
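def read_file(path)
  if @archive_path.directory?
    # Already unzipped: read straight from the folder
    (@archive_path + path).read
  else
    # Block form closes the archive once the entry has been read
    Zip::File.open(@archive_path.to_s) { |zip| zip.read(path) }
  end
end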
There are a lot more things we could be doing with zips, but just reading is all we need.
And that's it! It saves me one of the steps.
Full code
Here's the full script:
#!/usr/bin/env ruby

require "pathname"
require "zip" # provided by the rubyzip gem

class UpdateCSV
  def initialize(archive_path)
    @archive_path = Pathname(archive_path)
  end

  # Read a file from either an unzipped folder or the zip archive itself
  def read_file(path)
    if @archive_path.directory?
      (@archive_path + path).read
    else
      # Block form closes the archive once the entry has been read
      Zip::File.open(@archive_path.to_s) { |zip| zip.read(path) }
    end
  end

  def updated_equipment
    @updated_equipment ||= read_file("russia_losses_equipment.csv")
  end

  def updated_personnel
    @updated_personnel ||= read_file("russia_losses_personnel.csv")
  end

  # CSV files tracked in the git repository
  def csv_files
    @csv_files ||= `git ls-files`.lines.map(&:chomp).grep(/\.csv\z/)
  end

  # Overwrite each tracked CSV with its updated version
  def call
    csv_files.each do |path|
      case path
      when /russia_losses_equipment/
        Pathname(path).write(updated_equipment)
      when /russia_losses_personnel/
        Pathname(path).write(updated_personnel)
      else
        puts "Unknown CSV file: #{path}"
      end
    end
  end
end

unless ARGV[0]
  STDERR.puts "Usage: #{$0} path_to_updated_archive"
  exit 1
end

UpdateCSV.new(ARGV[0]).call