1 Introduction
2 Types
3 Content Types
3.1 TIFF (image/tiff)
3.2 JPEG/Exif (image/jpeg)
3.3 Ogg Vorbis (audio/ogg)
4 Files and Scanning
5 Test Files
5.1 Current Test Files
5.2 Contributing Test Files
6 Known Issues
7 History
8 Legal
Version: 1:0

mediafile: Media File Metadata Utilities

Neil Van Dyke

 (require (planet neil/mediafile:1:0))

1 Introduction

Note: This package is in alpha-testing. Please see the “Contributing Test Files” subsection below. Thanks.
The mediafile package provides utilities for dealing with collections of media files (still image, audio, video) and the metadata properties of those files. Currently, this package provides procedures for extracting metadata from a few popular media file formats, and procedures for maintaining a database of media files currently in various filesystem directory trees. This functionality is useful for media-player applications, and for managing collections of media files.
Currently, this package is implemented in pure Racket code; it neither links new native code into the Racket process nor runs external programs.

2 Types


(mediafile-type? x) → boolean?

  x : any/c
Predicate for whether or not x is a mediafile-type.
A valid type is either a symbol naming a MIME content-type, or a list of symbols in which the last symbol is the MIME content-type and the one or more preceding symbols are encodings atop the content-type. For example, file "foo.tif" might have type 'image/tiff, and file "foo.tif.gz" might have type '(gzip image/tiff).
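The rule above can be sketched as a small predicate plus two accessors. This is only an illustration written from the prose description, not the package's implementation; the names sketch-mediafile-type?, sketch-type-content-type, and sketch-type-encodings are hypothetical.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical stand-in for mediafile-type?, written only from the
;; prose description of a valid type.
(define (sketch-mediafile-type? x)
  (cond [(symbol? x) #t]
        ;; The list form needs at least one encoding plus the
        ;; content-type, and every element must be a symbol.
        [(list? x) (and (>= (length x) 2) (andmap symbol? x))]
        [else #f]))

;; The content-type is the symbol itself, or the last symbol of the list.
(define (sketch-type-content-type type)
  (if (symbol? type) type (last type)))

;; The encodings are every symbol before the content-type.
(define (sketch-type-encodings type)
  (if (symbol? type) '() (drop-right type 1)))
```

With these definitions, '(gzip image/tiff) has content-type 'image/tiff and encodings '(gzip), while a bare 'image/tiff has no encodings.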


(mediafile-props? x) → boolean?

  x : any/c
Predicate for whether or not x is a mediafile-props, which is used to represent properties of a media file.
A props is an alist of alists of symbols to datums. In other words, following this contract:
(listof (cons/c any/c
                (listof (cons/c symbol?
                                any/c))))
The top-level alist is for “parts”, such as for distinguishing multiple media objects in a single container file. The car of each top-level pair can be any datum; it will often be a number giving the position of the part in the container, unless there is a better unique key. A special car is #f, which denotes properties of the entire container file.
The cdr of each top-level pair is the second-level alist, which holds symbol-to-datum pairs specific to the part. The names of the symbols are often specific to the type of either the file or the part. The datum values corresponding to the names in the part can be of any type; an application wishing to do more with a value than display it in raw form must have a priori knowledge of the type, such as that 'exif:metering-mode typically has values like 'center-weighted-average and 'spot, and what those values mean for the application.
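As an illustration of this two-level layout, here is a hand-written props value and a lookup helper. The helper sketch-props-ref and the example: property names are invented for this sketch; only 'exif:metering-mode and its value 'spot come from the text above.

```racket
#lang racket/base

;; An illustrative props value: container-wide properties are keyed
;; by #f, and one part is keyed by its sequence number.
(define example-props
  '((#f . ((example:part-count . 1)))
    (0  . ((exif:metering-mode . spot)
           (example:width . 640)))))

;; Looks up one property of one part. Returns default when either the
;; part or the property is absent.
(define (sketch-props-ref props part-key name [default #f])
  (define part (assoc part-key props))            ; part keys can be any datum
  (define prop (and part (assq name (cdr part)))) ; property names are symbols
  (if prop (cdr prop) default))
```

For example, (sketch-props-ref example-props 0 'exif:metering-mode) looks in part 0, while passing #f as the part key looks in the container-wide properties.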


(struct mediafile (path type identity size mtime props)
  #:extra-constructor-name make-mediafile)
  path : path?
  type : mediafile-type?
  identity : any/c
  size : any/c
  mtime : any/c
  props : mediafile-props?
Struct representing a mediafile. The identity, size, and mtime values are intended to help determine whether a file has been modified since it was last scanned for properties.
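One way an application might use the size and mtime fields is sketched below. It relies on assumptions that this document does not state: that size is a byte count and that mtime is a modification time in seconds. The struct sketch-mediafile is a local stand-in with the same field layout, not the package's own struct, and the identity field is left out of the comparison because its representation is not described here.

```racket
#lang racket/base

;; Local stand-in that mirrors the documented field layout.
(struct sketch-mediafile (path type identity size mtime props))

;; Assumption: size is a byte count and mtime is in seconds.
;; Returns #t when the recorded values match the supplied current ones.
(define (sketch-unchanged? mf current-size current-mtime)
  (and (equal? (sketch-mediafile-size mf) current-size)
       (equal? (sketch-mediafile-mtime mf) current-mtime)))

;; Convenience wrapper that reads the current values from the filesystem.
;; A file that no longer exists counts as changed.
(define (sketch-unchanged-on-disk? mf)
  (define p (sketch-mediafile-path mf))
  (and (file-exists? p)
       (sketch-unchanged? mf
                          (file-size p)
                          (file-or-directory-modify-seconds p))))
```

Comparing size and mtime is cheap but only a heuristic: a file rewritten with the same length within the same second would look unchanged.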

3 Content Types

This package currently supports a few different MIME content-types, listed in the following subsections, along with lists of references that were used in the implementation for each content-type.

3.1 TIFF (image/tiff)

3.2 JPEG/Exif (image/jpeg)

3.3 Ogg Vorbis (audio/ogg)

4 Files and Scanning

This section lists procedures for maintaining a database of "mediafile" objects corresponding to files in filesystem directory trees.


(path->mediafile path 
  [#:canonicalize-path? canonicalize-path? 
  #:old-mediafile old-mediafile 
  #:type-mandatory? type-mandatory? 
  #:props-mandatory? props-mandatory? 
  #:exception? exception?]) 
  path : path-string?
  canonicalize-path? : boolean? = #true
  old-mediafile : (or/c #f mediafile?) = #f
  type-mandatory? : boolean? = #false
  props-mandatory? : boolean? = #false
  exception? : boolean? = #true
Yields a mediafile, given a path to the file. If #:old-mediafile is given, then that value will be returned if the file does not seem to have changed since that mediafile was created, which potentially saves the cost of scanning for properties.
If there is a problem creating a mediafile, then the behavior depends on #:exception? – if true, then an exception is raised; if false, then this procedure returns #false rather than a mediafile. The #:type-mandatory? and #:props-mandatory? arguments specify what should be considered a “problem” for this purpose.
The #:canonicalize-path? argument specifies whether or not to store a canonicalized path in the mediafile, rather than the path argument verbatim. Most applications will want to have a canonicalized path, which is the default behavior.


(scan-mediafiles start-path-or-paths 
  [#:canonicalize-paths? canonicalize-paths? 
  #:type-mandatory? type-mandatory? 
  #:props-mandatory? props-mandatory? 
  #:old-hash old-hash 
  #:remove-other-paths? remove-other-paths?]) 
  start-path-or-paths : (or/c path-string? (listof path-string?))
  canonicalize-paths? : boolean? = #true
  type-mandatory? : boolean? = #false
  props-mandatory? : boolean? = #false
  old-hash : immutable-hash? = #f
  remove-other-paths? : boolean? = #true
Scans filesystems recursively, beneath the paths given as start-path-or-paths, and returns a hash of paths to mediafile objects.
If #:old-hash is provided, then this hash is used as a starting point for the hash that will ultimately be returned, such as for updating from a previous run of scan-mediafiles. In that case, #:remove-other-paths? determines whether paths in the old hash that are not within the scope of start-path-or-paths should be removed before returning the new hash.
The #:canonicalize-paths?, #:type-mandatory?, and #:props-mandatory? arguments are passed to path->mediafile.
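The interaction between #:old-hash and #:remove-other-paths? can be sketched with plain immutable hashes. This is not the package's code: sketch-merge-scan, sketch-beneath?, and the use of strings for paths and symbols for mediafile objects are all hypothetical simplifications. It also assumes that old entries within the scanned scope are replaced wholesale by the fresh scan, which this document does not spell out.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical scope test: is path beneath the directory prefix?
(define (sketch-beneath? path dir-prefix)
  (string-prefix? path dir-prefix))

;; old-hash  : immutable hash of path -> record from a previous scan
;; scanned   : immutable hash of path -> record from the current scan
;; in-scope? : tells whether a path lies beneath a start path
(define (sketch-merge-scan old-hash scanned in-scope?
                           #:remove-other-paths? [remove-other? #t])
  ;; Keep an old entry only when it is out of scope and the caller
  ;; asked to keep such entries. In-scope old entries are dropped
  ;; because the fresh scan is treated as authoritative for them.
  (define kept
    (for/hash ([(path rec) (in-hash old-hash)]
               #:unless (or (in-scope? path) remove-other?))
      (values path rec)))
  ;; Overlay the fresh results on whatever was kept.
  (for/fold ([result kept]) ([(path rec) (in-hash scanned)])
    (hash-set result path rec)))

(define example-old   (hash "/music/a.ogg" 'old-a "/photos/x.jpg" 'old-x))
(define example-fresh (hash "/music/a.ogg" 'new-a "/music/b.ogg" 'new-b))
(define (example-in-scope? p) (sketch-beneath? p "/music/"))
```

Calling sketch-merge-scan on these example values with the default drops the "/photos/x.jpg" entry, while passing #:remove-other-paths? #f carries it over unchanged alongside the fresh "/music/" entries.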

5 Test Files

This package contains some files that are used for test data. Contributions of particular kinds of additional files are welcome.

5.1 Current Test Files

The following directory structure exists in the source code distribution for this package.
  • "test-files/"
    • "exif-org/" – JPEG/Exif and other files, from http://exif.org/samples.html, courtesy of John Hawkins.

    • "public-domain/" – Files known to be in the legal public domain, for testing with a breadth of file creators (e.g., different camera models) and situations (e.g., different Ogg container layouts).
      • "jpeg/" – JPEG/Exif and JPEG/JFIF files from the public domain, especially verbatim as saved by particular camera models.

5.2 Contributing Test Files

If you’d like to contribute a JPEG file from a particular camera model, that would be very welcome. Here’s how:
  1. Set camera to capture the image in a relatively small file size. This means setting camera to low resolution, high compression, low quality, etc. (The small size is to make including files with the package more practical.)

  2. Choose a photographic subject (e.g., stop sign, cloud, thumbtack, light switch) that:
    • Does not contain any trademarks or copyrighted material (no brand names, logos, book pages, etc.).

    • Does not contain anything personally-identifiable, such as faces.

    • Is G-rated. (No showing off Racket programmer abs.)

    • Is not too complicated, so should compress well.

  3. Take photo with camera.

  4. Do not edit the photo in any way at all – it must be byte-for-byte identical to how the camera first wrote it to your memory card.

  5. Email the photo to: neil@neilvandyke.org
    In the text of the email, please state “This image is in the public domain.” Note that you are legally giving up all copyright to this image, to make including it in a regression test suite more practical.

6 Known Issues

7 History

8 Legal

Copyright 2012 Neil Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See http://www.gnu.org/licenses/ for details. For other licenses and consulting, please contact the author. Test files from exif.org (used with permission) and/or that are in the public domain might also be included with this software, and no copyright on those test files is claimed by the author of this software.