File-Storage Design Specification

by Rob Storrs

I. Essentials

II. Introduction

We have our own file-storage application because we want all users to be able to collaboratively maintain a set of documents. Specifically, users can save their files to our server so that they may:

We want something that is relatively secure, and can be extended and maintained by any ArsDigita programmer, i.e., something that requires only AOLserver Tcl and Oracle skills.

III. Historical Considerations

File-storage was created to provide a mechanism for non-technical users to collaborate on a wide range of documents, with minimum sysadmin overhead. Specifically, it allowed clients to exchange frequently changing design documents (often MS Word, Adobe PDF, or other proprietary desktop file formats) without getting bogged down sifting through multiple versions.

IV. Competitive Analysis

Why is a file-storage application useful?

If you simply give everyone FTP access to a Web-accessible directory, you are running some big security risks. FTP is insecure and passwords are transmitted in the clear. A cracker might sniff a password, upload .pl and .adp pages, then grab those URLs from a Web browser. The cracker is now executing arbitrary code on your server with all the privileges that you've given your Web server.

The file-storage module is not a web-based file system, and cannot be fairly compared against such systems. The role of file-storage is to provide a simple web location where users can share a versioned document. It offers little functionality for aggregate file administration (e.g., selecting all files of a given type, or searching through specified file types).

V. Design Tradeoffs

File vs. Folder Permissions?

A folder is treated as a type of file. Files are owned by a single user, but may contain versions created by authors other than the owner.

Permissions were given only to files and not to folders in order to simplify both the code and the user interface, i.e., to avoid questions like "Why can't any of the people in my group see my files?" answered by "Did you notice that someone changed the permissions of the parent of the parent of the parent folder of this file?" However, the system is easy to extend to allow folders to have their own permissions.

Full Text Indexing

Full Text Indexing of files within the file-storage system is available if you're running Oracle 8i (8.1.5 or later). You would need to build an Intermedia text index (ConText) on the contents of file versions. Intermedia incorporates very smart filtering software so that it can grab the text from within HTML, PDF, Word, Excel, etc. documents. It is also smart enough to ignore JPEGs and other pure binary formats.

Steps to using Intermedia:

Warning: Intermedia is a tricky product for users. The default mode is exact phrase matching, which means that the more a user types the fewer search results will be returned (a violation of the user interface guidelines in developers). So you might be letting yourself in for some education of users...

Deletion of Files

Only an administrator can actually delete a file from storage within the database, thereby freeing up disk space. A user-level file deletion really only hides the file from view (by changing the deleted_p flag). From the user's perspective, the file has been deleted from the system. As a result, users may be less respectful of storage requirements than they would be if the system were fully explained to them.

This arrangement lets administrators retrieve files that users inadvertently deleted, but it also means that recovering actual disk space requires administrative involvement.
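The two levels of deletion can be sketched as follows. This is a minimal illustration in Python with an in-memory SQLite database, not the module's own AOLserver Tcl and Oracle code. Only the deleted_p flag comes from the text above; the table name fs_files_demo, its other columns, and every function name are assumptions made for the example.

```python
import sqlite3


def make_demo_db():
    """Build a hypothetical, simplified files table with two rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fs_files_demo ("
        " file_id INTEGER PRIMARY KEY,"
        " file_title TEXT NOT NULL,"
        " deleted_p TEXT NOT NULL DEFAULT 'f')"
    )
    conn.executemany(
        "INSERT INTO fs_files_demo (file_id, file_title) VALUES (?, ?)",
        [(1, "spec.doc"), (2, "budget.xls")],
    )
    return conn


def user_delete(conn, file_id):
    """User-level delete: set the flag so the file is hidden; the row stays."""
    conn.execute(
        "UPDATE fs_files_demo SET deleted_p = 't' WHERE file_id = ?", (file_id,)
    )


def admin_undelete(conn, file_id):
    """Administrator recovery: clear the flag so the file is visible again."""
    conn.execute(
        "UPDATE fs_files_demo SET deleted_p = 'f' WHERE file_id = ?", (file_id,)
    )


def admin_purge(conn):
    """Administrator delete: remove flagged rows and return how many went."""
    cur = conn.execute("DELETE FROM fs_files_demo WHERE deleted_p = 't'")
    return cur.rowcount


def visible_titles(conn):
    """What an ordinary user sees: only rows whose flag is not set."""
    rows = conn.execute(
        "SELECT file_title FROM fs_files_demo"
        " WHERE deleted_p = 'f' ORDER BY file_id"
    )
    return [row[0] for row in rows]


def stored_count(conn):
    """How many rows still occupy storage, hidden or not."""
    return conn.execute("SELECT COUNT(*) FROM fs_files_demo").fetchone()[0]
```

After user_delete the file disappears from visible_titles while stored_count is unchanged; only admin_purge reduces the number of stored rows.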

VI. Data Model Discussion

The file-storage system is built around a data model consisting of two tables, one for files and a second for versions. A folder is treated as a type of file. Files are owned by a single user, but may contain versions created by authors other than the owner.

Indices on the file ids are required for the CONNECT BY queries used for ordering the files for display.

The view fs_files_tree makes it simpler to "walk the tree" in Oracle.
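For readers unfamiliar with hierarchical queries, the ordering that a tree walk produces can be approximated in a few lines of Python. This is a sketch only: it assumes each file row carries a parent_id pointing at its containing folder (the column name is hypothetical), and it sorts siblings by title, which may not match the ordering the real queries use. Depth here starts at 0, whereas Oracle's LEVEL pseudocolumn starts at 1.

```python
from collections import defaultdict


def walk_tree(files):
    """Return (depth, file_id, title) tuples in depth-first display order.

    files is an iterable of (file_id, parent_id, title) tuples, where
    parent_id is None for top-level entries. Folders are just files that
    happen to have children, matching the data model described above.
    """
    # Group rows by parent so each folder's contents can be found directly.
    children = defaultdict(list)
    for file_id, parent_id, title in files:
        children[parent_id].append((title, file_id))

    ordered = []

    def visit(parent_id, depth):
        # Siblings are listed alphabetically (an assumption), each followed
        # immediately by its own descendants.
        for title, file_id in sorted(children[parent_id]):
            ordered.append((depth, file_id, title))
            visit(file_id, depth + 1)

    visit(None, 0)
    return ordered
```

A display page could then indent each row in proportion to its depth.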

VII. Legal Transactions

VIII. API

PL/SQL procedures

none

Tcl procedures

fs_check_edit_p

fs_check_edit_p user_id version_id [ group_id ]
Returns 1 if the user has permission to edit the version of the file; 0 otherwise.

Parameters:
user_id
version_id
group_id (optional)

fs_check_read_p

fs_check_read_p user_id version_id [ group_id ]
Returns 1 if the user can read the version of the file; 0 otherwise.

Parameters:
user_id
version_id
group_id (optional)

fs_check_write_p

fs_check_write_p user_id version_id [ group_id ]
Returns 1 if the user can write the file; 0 otherwise.

Parameters:
user_id
version_id
group_id (optional)

fs_date_picture

fs_date_picture
Returns date picture to use with Oracle's TO_CHAR function. Pulls it from ad.ini parameters file.

fs_folder_box

fs_folder_box user_id topmost_option
Returns the folder box.

Parameters:
user_id: the user who is logged in
topmost_option: the option that should occur on top

fs_folder_def_selection

fs_folder_def_selection user_id [ group_id ] [ public_p ] [ file_id ] \
    [ folder_default ]
Write out the SELECT box that allows the user to move a file to another folder, or - if folder_default is provided - create a new folder.

Parameters:
user_id
group_id (optional)
public_p (optional)
file_id (optional)
folder_default (optional)

fs_folder_selection

fs_folder_selection user_id [ group_id ] [ public_p ] [ file_id ]
Write out the SELECT box that allows the user to move a file to another folder.

Parameters:
user_id
group_id (optional)
public_p (optional)
file_id (optional)

fs_guess_source

fs_guess_source public_p owner_id group_id local_user_id
Given some information about a file, tries to guess in which subtree the file belongs. Mainly used by one-file.tcl.

Parameters:
public_p
owner_id
group_id
local_user_id

fs_header_row_for_files

fs_header_row_for_files [ -title title ] [ -author_p author_p ]
Returns a table header row containing column names appropriate for a listing of files alone (i.e., not versions of files): Name, Size, Type, Modified. If you set author_p to 1, you'll additionally get an author column.

Switches:
-title (optional)
-author_p (defaults to "0")

fs_order_files

fs_order_files [ user_id ] [ group_id ] [ public_p ]
Set the ordering and depth for the files so that they may be displayed quickly.

Parameters:
user_id (optional)
group_id (optional)
public_p (optional)

fs_pretty_file_type

fs_pretty_file_type mime_type
Takes a MIME type and returns a string to be displayed for that type.

Parameters:
mime_type

fs_row_for_one_file

fs_row_for_one_file [ -n_pixels_in n_pixels_in ] [ -file_id file_id ] \
    [ -folder_p folder_p ] [ -client_file_name client_file_name ] \
    [ -n_kbytes n_kbytes ] [ -n_bytes n_bytes ] \
    [ -file_title file_title ] [ -file_type file_type ] [ -url url ] \
    [ -creation_date creation_date ] [ -version_id version_id ] \
    [ -links links ] [ -author_p author_p ] [ -owner_id owner_id ] \
    [ -owner_name owner_name ] [ -user_url user_url ] \
    [ -export_url_vars export_url_vars ] [ -folder_url folder_url ] \
    [ -file_url file_url ]
Returns one row of an HTML table displaying all the information about a file. Set links to 0 if you want this file to be output without links to manage it (to display the folder you're currently in).

A little explanation is in order here. The first bunch of arguments are all standard stuff we want to know about the file. n_pixels_in is the number of pixels by which you want this row indented.

Then there's the 'links' argument. It's used for one-folder, which likes to show the current folder first, without the hyperlinks. So if you don't want links from an entry (only works for folders) set this to 0.

Then there's author. If you want the author shown, set author_p and provide us with owner_id and owner_name, and you'll get the link. If you want the link to go somewhere different than /shared/community-member, you'll want to set user_url to the page you want to link to (user_id will be appended).

Set export_url_vars to the vars you want exported when a file or folder link is clicked. It should be a query string fragment. If you're unhappy with the default urls 'one-folder' or 'one-file' (say, you're implementing admin pages where they're named differently), change them here. The export_url_vars will be appended.

Switches:
-n_pixels_in (defaults to "0")
-file_id (optional)
-folder_p (defaults to "f")
-client_file_name (optional)
-n_kbytes (optional)
-n_bytes (optional)
-file_title (optional)
-file_type (optional)
-url (optional)
-creation_date (optional)
-version_id (optional)
-links (defaults to "1")
-author_p (defaults to "0")
-owner_id (defaults to "0")
-owner_name (optional)
-user_url (defaults to "/shared/community-member")
-export_url_vars (optional)
-folder_url (defaults to "one-folder")
-file_url (defaults to "one-file")

fs_user_contributions

fs_user_contributions user_id purpose
For site admin only; returns statistics and a link to a details page.

Parameters:
user_id
purpose

IX. User Interface

The user interface attempts to replicate the file system metaphors familiar to most computer users, with folders containing files. Adding a file or a folder is a hyperlinked option, and a web form handles the search function.

Users can navigate to any specified document tree using a select box. Files and folders available within a document tree are presented with size, type, and modification date, alongside hyperlinks to the appropriate actions for a given file.

X. Configuration/Parameters

Configuration of the system is kept to a minimum.
     ; for the ACS File-Storage System
     [ns/server/yourserver/acs/fs]
     SystemName=File Storage System
     SystemOwner=file-administrator@yourserver.com
     DefaultPrivacyP=f
     ; do you want to maintain a public tree for site wide documents
     PublicDocumentTreeP=1
     MaxNumberOfBytes=2000000
     DatePicture=MM/DD/YY HH24:MI
     HeaderColor=#cccccc
     FileInfoDisplayFontTag=<font face=arial,helvetica size=-1>
     UseIntermediaP=0

XI. Acceptance Tests

XII. Future Improvements/Areas of Likely Change

XIII. Authors


rstorrs@arsdigita.com
