ArsDigita Archives
 
 
   
 

File-Storage Design Specification

by Rob Storrs

I. Essentials

II. Introduction

We have our own file-storage application because we want all users to be able to collaboratively maintain a set of documents. Specifically, users can save their files to our server so that they may:
  • Organize files in a hierarchical directory structure
  • Upload through Web forms, via the file-upload feature of Web browsers (potentially SSL-encrypted)
  • Grab files that are served bit-for-bit by the server, without any risk that a cracker-uploaded file will be executed as code
  • Retrieve historical versions of a file

We want something that is relatively secure, and can be extended and maintained by any ArsDigita programmer, i.e., something that requires only AOLserver Tcl and Oracle skills.

III. Historical Considerations

File-storage was created to provide a mechanism for non-technical users to collaborate on a wide range of documents, with minimum sysadmin overhead. Specifically, it allowed clients to exchange frequently changing design documents (often MS Word, Adobe PDF, or other proprietary desktop file formats) without getting bogged down sifting through multiple versions.

IV. Competitive Analysis

Why is a file-storage application useful?

If you simply give everyone FTP access to a Web-accessible directory, you are running some big security risks. FTP is insecure and passwords are transmitted in the clear. A cracker might sniff a password, upload .pl and .adp pages, then grab those URLs from a Web browser. The cracker is now executing arbitrary code on your server with all the privileges that you've given your Web server.

The file-storage module is not a web-based file system, and cannot be fairly compared against such systems. The role of file-storage is to provide a simple web location where users can share a versioned document. It offers little functionality for aggregate file administration (e.g., selecting all files of a given type, or searching through specified file types).

V. Design Tradeoffs

File vs. Folder Permissions?

A folder is treated as a type of file. Files are owned by a single user, but may contain versions created by authors other than the owner.

Permissions were given only to files and not to folders in order to simplify both the code and the user interface, i.e., to avoid questions like "Why can't any of the people in my group see my files?" answered by "Did you notice that someone changed the permissions of the parent of the parent of the parent folder of this file?" However, the system is easy to extend to allow folders to have their own permissions.
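
The consequence of file-only permissions can be sketched as follows. This is an illustrative Python model, not the module's Tcl code: the FileRecord class, the can_read function, and the exact rule (owner, public flag, or group membership) are assumptions made for the example. The point it shows is only that the check looks at the file's own record and never walks up through parent folders.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class FileRecord:
    """Hypothetical stand-in for one file or folder row."""
    file_id: int
    owner_id: int
    parent_id: Optional[int] = None   # enclosing folder, if any
    public_p: bool = False
    group_id: Optional[int] = None


def can_read(user_id: int, user_groups: Set[int], f: FileRecord) -> bool:
    """Decide from the file's own record only; parent_id is never consulted."""
    if f.owner_id == user_id or f.public_p:
        return True
    return f.group_id is not None and f.group_id in user_groups


# A private folder containing a file shared with (assumed) group 7.
folder = FileRecord(file_id=1, owner_id=10)
report = FileRecord(file_id=2, owner_id=10, parent_id=1, group_id=7)
```

Under this model a member of group 7 can read report even though the enclosing folder is private, which is exactly the kind of surprise-free behavior the design aims for.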

Full Text Indexing

Full Text Indexing of files within the file-storage system is available if you're running Oracle 8i (8.1.5 or later). You would need to build an Intermedia text index (ConText) on the contents of file versions. Intermedia incorporates very smart filtering software so that it can grab the text from within HTML, PDF, Word, Excel, etc. documents. It is also smart enough to ignore JPEGs and other pure binary formats.

Steps to using Intermedia:

  • install Intermedia (Oracle dbadmin hell)
  • get Intermedia's optional "INSO filtering" system to work. Here's what jsc@arsdigita.com had to say about his experience doing this...
    I got the INSO stuff working. The major holdup was that you have to
    configure listener.ora to have $ORACLE_HOME/ctx/lib in
    LD_LIBRARY_PATH. The docs mumble something about editing listener.ora,
    but a careful perusal of anything having to do with networking setup
    didn't turn up any examples. The networking assistant program has a
    field for "Environment", but when you try to put anything in there, the
    program hits a null pointer exception when you go to save it and doesn't
    write anything. I "solved" this eventually by just symlinking all the
    .so files in ctx/lib into $ORACLE_HOME/lib, which is already in the
    LD_LIBRARY_PATH for the listener.
  • To keep the interMedia index up to date as documents are added or updated, either synchronize the index manually (using alter index indexname rebuild online parameters ('sync')) or run the ctxsrv process, which updates all interMedia indices periodically (ctxsrv -user ctxsys/ctxpassword). If using ctxsrv, the shell which starts it must have $ORACLE_HOME/ctx/lib as part of LD_LIBRARY_PATH.
  • uncomment the create index fs_versions_content_idx statement in file-storage.sql (and then feed it to Oracle)
  • set UseIntermediaP=1 in your ad.ini file
  • restart AOLserver (so that it reads the new parameter setting)
Warning: Intermedia is a tricky product for users. The default mode is exact phrase matching, which means that the more a user types the fewer search results will be returned (a violation of the user interface guidelines in developers). So you might be letting yourself in for some education of users...
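
The effect described in the warning can be illustrated with a small Python sketch. It does not use Intermedia at all; phrase_match, search, and the sample documents are hypothetical and only model the idea of exact phrase matching, where every additional word narrows the result set.

```python
from typing import List


def phrase_match(query: str, document: str) -> bool:
    """True only if the query words occur in the document as one contiguous phrase."""
    q = query.lower().split()
    d = document.lower().split()
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def search(query: str, documents: List[str]) -> List[str]:
    """Return the documents that contain the query as an exact phrase."""
    return [doc for doc in documents if phrase_match(query, doc)]


# Made-up document contents for the illustration.
DOCUMENTS = [
    "quarterly design review notes",
    "design notes for the upload form",
    "review of the quarterly budget",
]

one_word = search("design", DOCUMENTS)                  # two documents match
two_words = search("design review", DOCUMENTS)          # one document matches
three_words = search("design review form", DOCUMENTS)   # nothing matches
```

A user who expects extra words to widen the search (as with an "any word" match) will see the opposite here.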

Deletion of Files

Only an administrator can actually delete a file from storage within the database, thereby freeing up disk space. A user-level file deletion only hides the file from view (by changing the deleted_p flag). From the user's perspective, the file has been deleted from the system. As a result, users may be less respectful of storage requirements than they would be if the system were fully explained to them.

This arrangement lets administrators retrieve files that users inadvertently deleted, but it also means that recovering actual disk space requires administrative involvement.
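
The two kinds of deletion can be sketched with an in-memory SQLite database in Python. This is a model of the idea, not the module's Oracle schema: the deleted_p flag comes from the text above, while the table name fs_files, the other columns, and all function names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fs_files (            -- assumed table and column names
        file_id    INTEGER PRIMARY KEY,
        file_title TEXT NOT NULL,
        deleted_p  TEXT NOT NULL DEFAULT 'f' CHECK (deleted_p IN ('t', 'f'))
    )""")
conn.executemany(
    "INSERT INTO fs_files (file_id, file_title) VALUES (?, ?)",
    [(1, "spec.doc"), (2, "budget.xls")],
)


def user_delete(file_id: int) -> None:
    """User-level delete: hide the row, keep the data."""
    conn.execute("UPDATE fs_files SET deleted_p = 't' WHERE file_id = ?", (file_id,))


def admin_restore(file_id: int) -> None:
    """Administrator recovers an inadvertently deleted file."""
    conn.execute("UPDATE fs_files SET deleted_p = 'f' WHERE file_id = ?", (file_id,))


def admin_delete(file_id: int) -> None:
    """Administrator delete: remove the row, actually freeing storage."""
    conn.execute("DELETE FROM fs_files WHERE file_id = ?", (file_id,))


def visible_titles() -> list:
    """What ordinary users see: only rows that are not flagged as deleted."""
    rows = conn.execute(
        "SELECT file_title FROM fs_files WHERE deleted_p = 'f' ORDER BY file_id")
    return [r[0] for r in rows]


def stored_count() -> int:
    """How many rows still occupy storage, hidden or not."""
    return conn.execute("SELECT COUNT(*) FROM fs_files").fetchone()[0]
```

After user_delete(1), visible_titles() no longer lists the file but stored_count() is unchanged; only admin_delete(1) reduces the stored count.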

VI. Data Model Discussion

The file-storage system is built around a data model consisting of two tables, one for files and a second for versions. A folder is treated as a type of file. Files are owned by a single user, but may contain versions created by authors other than the owner.

Indices on the file ids are required for the CONNECT BY queries used for ordering the files for display.

The view fs_files_tree simplifies the ability to "walk the tree" in Oracle.
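
For readers unfamiliar with CONNECT BY, the following Python sketch shows the kind of result such a query produces: each row comes back in depth-first order together with its depth (Oracle exposes this as the LEVEL pseudo-column). The row layout (file_id, parent_id, title), the sample data, and the choice to sort siblings by title are assumptions for illustration, not the actual definition of fs_files_tree.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple[int, Optional[int], str]   # (file_id, parent_id, title)


def walk_tree(rows: Iterable[Row]) -> Iterator[Tuple[int, str]]:
    """Yield (depth, title) pairs, each folder followed by its contents."""
    children = defaultdict(list)
    for file_id, parent_id, title in rows:
        children[parent_id].append((title, file_id))

    def visit(parent_id: Optional[int], depth: int) -> Iterator[Tuple[int, str]]:
        # Siblings are sorted by title here; that ordering is this sketch's choice.
        for title, file_id in sorted(children[parent_id]):
            yield depth, title
            yield from visit(file_id, depth + 1)

    yield from visit(None, 0)   # top-level entries have no parent


# Made-up sample rows: two folders, three files.
ROWS = [
    (1, None, "designs"),
    (2, 1, "homepage.psd"),
    (3, 1, "archive"),
    (4, 3, "old-logo.gif"),
    (5, None, "readme.txt"),
]

listing = list(walk_tree(ROWS))
```

The depth value is what a display page would use to indent each row under its folder.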

VII. Legal Transactions

    /file-storage/
  • Create a folder
  • Upload a file
  • "Delete" a file (actually hides it)
  • Upload a newer version of a file
  • Download a version of a file
    /admin/file-storage/
  • View system usage
  • Delete files
  • Edit files

VIII. API

PL/SQL procedures

none

Tcl procedures

fs_check_edit_p

fs_check_edit_p user_id version_id [ group_id ]
Returns 1 if the user has permission to edit the version of the file; 0 otherwise.

Parameters:
user_id
version_id
group_id (optional)

fs_check_read_p

fs_check_read_p user_id version_id [ group_id ]
Returns 1 if the user can read the version of the file; 0 otherwise.

Parameters:
user_id
version_id
group_id (optional)

fs_check_write_p

fs_check_write_p user_id version_id [ group_id ]
Returns 1 if the user can write the file; 0 otherwise.

Parameters:
user_id
version_id
group_id (optional)

fs_date_picture

fs_date_picture
Returns date picture to use with Oracle's TO_CHAR function. Pulls it from ad.ini parameters file.
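
To show what a date picture such as the default MM/DD/YY HH24:MI produces, here is a rough Python sketch that translates a small subset of Oracle format elements into strftime codes. The function name and the element table are assumptions for illustration; Oracle's TO_CHAR supports many more elements than the six handled here.

```python
import re
from datetime import datetime

# Only the handful of Oracle format elements needed for the example picture.
ORACLE_TO_STRFTIME = {
    "HH24": "%H",
    "YYYY": "%Y",
    "MM": "%m",
    "DD": "%d",
    "YY": "%y",
    "MI": "%M",
}

# Longer elements come first so that YYYY is not read as two YY elements.
_ELEMENT = re.compile("HH24|YYYY|MM|DD|YY|MI")


def picture_to_strftime(picture: str) -> str:
    """Translate an Oracle date picture (subset) into a strftime format string."""
    return _ELEMENT.sub(lambda m: ORACLE_TO_STRFTIME[m.group(0)], picture)


fmt = picture_to_strftime("MM/DD/YY HH24:MI")
stamp = datetime(2000, 3, 7, 14, 5).strftime(fmt)
```

With the default picture, 7 March 2000 at 2:05 pm is rendered as 03/07/00 14:05.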

fs_folder_box

fs_folder_box user_id topmost_option
Returns the folder box. Arguments: user_id, the user who is logged in; topmost_option, the option that should occur on top.

Parameters:
user_id
topmost_option

fs_folder_def_selection

fs_folder_def_selection user_id [ group_id ] [ public_p ] [ file_id ] \
    [ folder_default ]
Write out the SELECT box that allows the user to move a file to another folder, or - if folder_default is provided - create a new folder.

Parameters:
user_id
group_id (optional)
public_p (optional)
file_id (optional)
folder_default (optional)

fs_folder_selection

fs_folder_selection user_id [ group_id ] [ public_p ] [ file_id ]
Write out the SELECT box that allows the user to move a file to another folder.

Parameters:
user_id
group_id (optional)
public_p (optional)
file_id (optional)

fs_guess_source

fs_guess_source public_p owner_id group_id local_user_id
Given some information about a file, tries to guess in which subtree the file belongs. Mainly used by one-file.tcl.

Parameters:
public_p
owner_id
group_id
local_user_id

fs_header_row_for_files

fs_header_row_for_files [ -title title ] [ -author_p author_p ]
Returns a table header row containing column names appropriate for a listing of files alone (i.e., not versions of files): Name, Size, Type, Modified. If you set author_p to 1, you'll additionally get an author column.

Switches:
-title (optional)
-author_p (defaults to "0")

fs_order_files

fs_order_files [ user_id ] [ group_id ] [ public_p ]
Set the ordering and depth for the files so that they may be displayed quickly.

Parameters:
user_id (optional)
group_id (optional)
public_p (optional)

fs_pretty_file_type

fs_pretty_file_type mime_type
Takes a MIME type and returns a string to be displayed for that type.

Parameters:
mime_type

fs_row_for_one_file

fs_row_for_one_file [ -n_pixels_in n_pixels_in ] [ -file_id file_id ] \
    [ -folder_p folder_p ] [ -client_file_name client_file_name ] \
    [ -n_kbytes n_kbytes ] [ -n_bytes n_bytes ] \
    [ -file_title file_title ] [ -file_type file_type ] [ -url url ] \
    [ -creation_date creation_date ] [ -version_id version_id ] \
    [ -links links ] [ -author_p author_p ] [ -owner_id owner_id ] \
    [ -owner_name owner_name ] [ -user_url user_url ] \
    [ -export_url_vars export_url_vars ] [ -folder_url folder_url ] \
    [ -file_url file_url ]
Returns one row of an HTML table displaying all the information about a file. Set links to 0 if you want this file to be output without links to manage it (to display the folder you're currently in).

A little explanation is in order here. The first bunch of arguments are all standard stuff we want to know about the file. The n_pixels_in argument is the number of pixels by which you want this line indented.

Then there's the 'links' argument. It's used for one-folder, which likes to show the current folder first, without the hyperlinks. So if you don't want links from an entry (only works for folders) set this to 0.

Then there's author. If you want the author shown, set author_p and provide us with owner_id and owner_name, and you'll get the link. If you want the link to go somewhere different than /shared/community-member, you'll want to set user_url to the page you want to link to (user_id will be appended).

Set export_url_vars to the vars you want exported when a file or folder link is clicked. It should be a query string fragment. If you're unhappy with the default urls 'one-folder' or 'one-file' (say, you're implementing admin pages where they're named differently), change them here. The export_url_vars will be appended.

Switches:
-n_pixels_in (defaults to "0")
-file_id (optional)
-folder_p (defaults to "f")
-client_file_name (optional)
-n_kbytes (optional)
-n_bytes (optional)
-file_title (optional)
-file_type (optional)
-url (optional)
-creation_date (optional)
-version_id (optional)
-links (defaults to "1")
-author_p (defaults to "0")
-owner_id (defaults to "0")
-owner_name (optional)
-user_url (defaults to "/shared/community-member")
-export_url_vars (optional)
-folder_url (defaults to "one-folder")
-file_url (defaults to "one-file")

fs_user_contributions

fs_user_contributions user_id purpose
For site admin only; returns statistics and a link to a details page.

Parameters:
user_id
purpose

IX. User Interface

The user interface attempts to replicate the file system metaphors familiar to most computer users, with folders containing files. Adding files and folders are hyperlinked options, and a web form is used to handle the search function.

Users can navigate to any specified document tree using a select box. Files and folders available within a document tree are presented with size, type, and modification date, alongside hyperlinks to the appropriate actions for a given file.

X. Configuration/Parameters

Configuration of the system is kept to a minimum.
     ; for the ACS File-Storage System
     [ns/server/yourserver/acs/fs]
     SystemName=File Storage System
     SystemOwner=file-administrator@yourserver.com
     DefaultPrivacyP=f
     ; do you want to maintain a public tree for site wide documents
     PublicDocumentTreeP=1
     MaxNumberOfBytes=2000000
     DatePicture=MM/DD/YY HH24:MI
     HeaderColor=#cccccc
     FileInfoDisplayFontTag=<font face=arial,helvetica size=-1>
     UseIntermediaP=0
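
As a rough illustration of how these parameters might be consumed, the Python sketch below parses the same section with the standard configparser module. AOLserver reads ad.ini itself, so this is only a model: the is_true helper, the upload_allowed function, and the reading of MaxNumberOfBytes as a per-upload size limit are assumptions made for the example.

```python
import configparser

SAMPLE = """\
; for the ACS File-Storage System
[ns/server/yourserver/acs/fs]
SystemName=File Storage System
SystemOwner=file-administrator@yourserver.com
DefaultPrivacyP=f
; do you want to maintain a public tree for site wide documents
PublicDocumentTreeP=1
MaxNumberOfBytes=2000000
DatePicture=MM/DD/YY HH24:MI
HeaderColor=#cccccc
FileInfoDisplayFontTag=<font face=arial,helvetica size=-1>
UseIntermediaP=0
"""

parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str          # keep parameter names case-sensitive
parser.read_string(SAMPLE)
fs = parser["ns/server/yourserver/acs/fs"]


def is_true(value: str) -> bool:
    """Treat '1' and 't' as true; the sample uses both styles of flag."""
    return value.strip().lower() in ("1", "t")


def upload_allowed(n_bytes: int) -> bool:
    """Assumed meaning: reject uploads larger than MaxNumberOfBytes."""
    return n_bytes <= fs.getint("MaxNumberOfBytes")
```

Note that the flags are not uniform: DefaultPrivacyP uses t/f while PublicDocumentTreeP and UseIntermediaP use 1/0, so a reader of these values has to accept both conventions.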

XI. Acceptance Tests

  • Go to /file-storage/ and upload a file
  • Create a folder and move the file into it
  • Change the properties of a file
  • Upload another version of the same file
  • Delete a file from the system
  • Delete a folder

XII. Future Improvements/Areas of Likely Change

  • Currently the administration section needs considerable work. Instead of trying to clean /admin/file-storage/ up, we should build a better /file-storage/admin or even allow administrators to do more within /file-storage/.

  • Ticket Tracker style column sorting. We want the ability to sort the contents of each folder by name, author, size, type and last modified. In addition, the folders should be able to sort among themselves by name. You should use something very similar to the procedure ad_table. The procedure that you use will be slightly different because the files will be sorted on a per folder basis instead of on a per table basis.

  • Better organization of the folder tree: make the interface more like a Windows-style interface. Add a + icon next to the folder if the folder is open and all of the files in the folder can be seen. Add a - icon when the folder is closed and can be expanded. Clicking on the + sends the user back to the same page with the contents of the folder hidden and the - icon in place of the +. Clicking on the - sends the user back to the same page with a + in place of the - and all of the files in the folder shown. Clicking on the folder icon or name should act just as it does now.

  • Nifty JavaScript version

  • File viewer: allowing users to view multiple file formats within their browser.

  • Email alerts on a folder, so that a user could get an alert whenever a new document is posted within a document tree or a specified folder.

XIII. Authors


rstorrs@arsdigita.com