4/1/2013
Larry W. Cashdollar
@_larry0
User-supplied input is not sanitized for shell metacharacters before it is passed to the shell. If a user is tricked into extracting text from a file whose name contains shell metacharacters, an attacker can execute arbitrary commands remotely.
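The effect can be reproduced with a minimal sketch. It assumes a POSIX shell; `unsafe_run` and the file name are hypothetical, and `echo` stands in for the real conversion command so that the payload is harmless.

```ruby
# Hypothetical stand-in for the gem's helper: the whole string is handed
# to /bin/sh by the backticks.
def unsafe_run(command)
  `#{command}`
end

# Hypothetical attacker-chosen file name. The payload is a harmless echo.
pdf_name = "report.pdf; echo INJECTED"

# `echo` stands in for the real conversion command.
demo_output = unsafe_run("echo converting #{pdf_name} 2>&1")
puts demo_output
```

The shell treats the `;` in the file name as a command separator, so the second `echo` runs as its own command and the output contains a separate `INJECTED` line.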
https://rubygems.org/gems/karteek-docsplit
./karteek-docsplit-0.5.4/lib/docsplit/text_extractor.rb
Lines 59-76:

    def extract_from_ocr(pdf, pages)
      tempdir = Dir.mktmpdir
      base_path = File.join(@output, @pdf_name)
      if pages
        pages.each do |page|
          tiff = "#{tempdir}/#{@pdf_name}#{page}.tif"
          file = "#{base_path}#{page}"
          # pdf, tiff and file are interpolated into the command string unescaped
          run "MAGICK_TMPDIR=#{tempdir} OMP_NUM_THREADS=2 gm convert -despeckle +adjoin #{MEMORY_ARGS} #{OCR_FLAGS} #{pdf}[#{page - 1}] #{tiff} 2>&1"
          run "tesseract #{tiff} #{file} -l eng 2>&1"
          clean_text(file + '.txt') if @clean_ocr
          FileUtils.remove_entry_secure tiff
        end
      else
        tiff = "#{tempdir}/#{@pdf_name}.tif"
        # Same pattern: the file name reaches the shell without sanitization
        run "MAGICK_TMPDIR=#{tempdir} OMP_NUM_THREADS=2 gm convert -despeckle #{MEMORY_ARGS} #{OCR_FLAGS} #{pdf} #{tiff} 2>&1"
        run "tesseract #{tiff} #{base_path} -l eng 2>&1"
        clean_text(base_path + '.txt') if @clean_ocr
      end
run is defined as follows (lines 94-98):

    def run(command)
      # Backticks pass the entire string to the shell for interpretation
      result = `#{command}`
      raise ExtractionFailed, result if $? != 0
      result
    end
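One possible way to avoid this class of problem, sketched here under assumptions and not taken from the gem or from any published fix, is to pass the command as an argument vector so that no shell parses it, or to escape each interpolated value with `Shellwords.escape`. The helper name `run_argv` and the file name are hypothetical, `echo` again stands in for `gm` and `tesseract`, and a plain `RuntimeError` replaces the gem's `ExtractionFailed`.

```ruby
require 'open3'
require 'shellwords'

# Hypothetical replacement for run: takes an argument vector, so no shell
# is involved. capture2e merges stderr into stdout, like the original 2>&1.
def run_argv(*argv)
  output, status = Open3.capture2e(*argv)
  raise "command failed: #{output}" unless status.success?
  output
end

# Hypothetical attacker-chosen file name.
hostile_name = "report.pdf; echo INJECTED"

# Each element reaches the program as one literal argument.
argv_output = run_argv("echo", "converting", hostile_name)

# If a shell string cannot be avoided, escape every interpolated value.
escaped_output = `echo converting #{Shellwords.escape(hostile_name)} 2>&1`

puts argv_output
puts escaped_output
```

In both cases the file name is printed back as literal text and the embedded `echo INJECTED` is not executed. `Open3.capture2e` also accepts an environment hash as its first argument, which could carry settings such as `MAGICK_TMPDIR` without going through the shell.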