Webware Document Management System

This would be interrelated with nearly everything else.

Basic Ideas

The main object would be a document. It contains the actual text (or file), some information on how to render the text and, most importantly, some attribute information. An attribute would be a basic piece of meta-information about the document.

Here is a simple example. We want to store documents about software. Basic attributes would be:

Some of these attributes are related to each other while others are not. For example, if I have a product, I already have the vendor (Word --> Microsoft). This leads to a hierarchical structure:

operating system

vendor
     |---product
     |---module
     |---version

information category

Each of these attributes can have several instances:

* operating system: Win98, Linux, ...
* vendor: Microsoft, RedHat, ...
* ...

All of this should be configurable via the web interface. This is exactly where I've seen some Document Management Systems fail: the attributes are hardcoded or, even if you can change something, it is not possible to introduce a hierarchy of attributes.
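As a starting point for discussion, here is a minimal sketch of how such a configurable attribute hierarchy might be modelled. The class name `Attribute` and its methods are made up for illustration; nothing here is existing Webware code, and a real system would presumably store this in a database and edit it through the web interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Attribute:
    """One configurable attribute, e.g. 'vendor' or 'product' (hypothetical class)."""
    name: str
    parent: Optional["Attribute"] = None
    # Maps each allowed value to the parent attribute's value it belongs to
    # (None for values of a top-level attribute).
    values: Dict[str, Optional[str]] = field(default_factory=dict)

    def add_value(self, value: str, parent_value: Optional[str] = None) -> None:
        """Register an allowed value, checking it against the parent attribute."""
        if self.parent is None:
            if parent_value is not None:
                raise ValueError(f"{self.name} has no parent attribute")
        elif parent_value not in self.parent.values:
            raise ValueError(f"unknown {self.parent.name}: {parent_value!r}")
        self.values[value] = parent_value

    def resolve(self, value: str) -> Dict[str, str]:
        """Return {attribute name: value} for this value and all its ancestors."""
        result: Dict[str, str] = {}
        attr: Optional["Attribute"] = self
        current: Optional[str] = value
        while attr is not None:
            if current not in attr.values:
                raise KeyError(f"unknown {attr.name}: {current!r}")
            result[attr.name] = current
            current = attr.values[current]
            attr = attr.parent
        return result


# Example: knowing the product already gives us the vendor (Word --> Microsoft).
vendor = Attribute("vendor")
vendor.add_value("Microsoft")
product = Attribute("product", parent=vendor)
product.add_value("Word", "Microsoft")
word_attributes = product.resolve("Word")
```

With this layout a document only needs to store its most specific attribute value; the ancestors can be looked up when needed.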

Usage

This might be enough for a start and some discussion. Please feel free to add your ideas.

-- StephanDiehl - 23 Nov 2001

Small point on the "exported on the fly". Currently the best way to go from XML-style documents to PDF is to use FO documents and process them with FOP (a Java tool). [ Purely my opinion RL ]

http://xml.apache.org/fop/

I haven't found an XSL-FO option for reportlab; they do support some XML-style tags inline in their flowables (specifically their Paragraph one). I'm looking into it to see how best to extend that and, if possible, tie it into a Cheetah-template-like structure and push out some PDF.

Second small point: "documents can converted". That implies using XML as the basis of the document structure, with XSLT transformation via 4Suite or some such. I would recommend possibly distributing the load for something like this via XmlRpc, and using a Java / Xalan engine for the transformation. I'm looking into the same for the XML/XSLT to PDF translation using FOP and XmlRpc.

It could allow for a more scalable system architecture, even though it would probably slow the responsiveness of the system.

Having said all of the above, I find working with Cheetah / Webware / Python substantially faster (for me!) in development. I came from Taglibs / JSP (Tomcat / WebLogic / JRun) / XML (XSP via Cocoon) / Java. My personal preference is to stick with Cheetah / Webware template applications spitting out reportlab PDF. They can't (currently) support your request of XML to PDF or XML to HTML, but I find it easier to work with Cheetah templates than with XML / XSLT transformations. Personal preference only.

-- RayLeyva - 21 Nov 2001

---

The reason I'd like to use XML is to be independent of any application, so that I can create the documents outside the management system and do whatever I want with them. I've played around with 4Suite. It's quite easy to use but seems to be a little slow. A non-Python XSLT processor might give better results in that area. Another way to produce PDF is going via LaTeX. It looks quite promising but is really not usable for "on the fly" processing.

Anyway, being able to preprocess the files before sending them to the browser is central. Another example that comes to mind is the usage of Wiki pages (see WebwareWiki): structured text --> HTML via a wiki parser.

-- StephanDiehl - 22 Nov 2001

Couldn't we just use Cheetah to generate templated XML exports? That's what I was thinking of using to create the XML for the Xalan-based transformations. Just a thought.

-- RayLeyva - 22 Nov 2001

If it works, why not? I wouldn't want to force anybody to use something specific anyway. There should be some configurable filetype handler. So, in the case of XML, the admin should decide whether to use an XSLT processor, some CSS stylesheet, Cheetah, or whatever works best for the situation.

-- StephanDiehl - 22 Nov 2001
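The configurable filetype handler suggested above could look roughly like the following sketch. The names `register_handler`, `render` and `plain_text_to_html` are hypothetical, and the plain-text handler is only a stand-in; an XSLT, CSS or Cheetah based handler would be registered the same way by whoever administers the system.

```python
from html import escape
from typing import Callable, Dict

# A handler takes the stored document source and returns what is sent to the browser.
Handler = Callable[[str], str]

_handlers: Dict[str, Handler] = {}


def register_handler(filetype: str, handler: Handler) -> None:
    """Let the admin choose which handler processes a given file type."""
    _handlers[filetype.lower()] = handler


def render(filetype: str, source: str) -> str:
    """Preprocess a document with the handler configured for its file type."""
    try:
        handler = _handlers[filetype.lower()]
    except KeyError:
        raise LookupError(f"no handler configured for {filetype!r}") from None
    return handler(source)


def plain_text_to_html(source: str) -> str:
    """Stand-in handler: show plain text as escaped, preformatted HTML."""
    return "<pre>" + escape(source) + "</pre>"


register_handler("txt", plain_text_to_html)
page = render("txt", "a < b")
```

Because handlers are looked up by name at render time, swapping the XML handler from one processor to another would be a configuration change rather than a code change.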

Perhaps we can use the term "property" or "attribute" rather than "category". The word "category" normally means a group the document is "in". In other words, it's the value of a certain property, not the property itself. The category may also be expressed above the document; e.g., the directory the document is in. This encourages users to think of the document as being in a certain hierarchical location. But putting the category name in a property allows for easier sorting/searching (sort by any property, search for all documents matching a certain property value). Which is better depends on the application. But in any case, using the term category for property is confusing.

-- MikeOrr - 22 Nov 2001
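To illustrate the sorting/searching point above: if each document carries its properties as simple name/value pairs, both operations become one-liners. This is only a sketch; the document titles are invented, and `find` and `sort_by` are hypothetical helper names.

```python
from typing import Dict, List

Document = Dict[str, str]

# Invented sample documents; property names follow the software example on this page.
documents: List[Document] = [
    {"title": "Installing Word", "vendor": "Microsoft", "operating system": "Win98"},
    {"title": "Kernel upgrade notes", "vendor": "RedHat", "operating system": "Linux"},
    {"title": "Word macro FAQ", "vendor": "Microsoft", "operating system": "Win98"},
]


def find(docs: List[Document], **criteria: str) -> List[Document]:
    """Return all documents whose properties match every given value."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def sort_by(docs: List[Document], prop: str) -> List[Document]:
    """Sort documents by any property; documents lacking it sort first."""
    return sorted(docs, key=lambda d: d.get(prop, ""))


microsoft_docs = find(documents, vendor="Microsoft")
by_title = sort_by(documents, "title")
```

A directory-style layout would make the first query a tree walk instead, which is the trade-off described above.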

Agreed. This makes a lot of sense. I wasn't too happy about "category", but couldn't think of anything better. From now on I'll use attribute.

-- StephanDiehl - 23 Nov 2001

Here are a bunch of wishlist items for a content management system (CMS): WebwareDocumentManagementSystemMSO1.

-- MikeOrr - 15 Apr 2002

I think RDF would probably give a good framework for document classification.

The backend storage has to be in native format -- not XML. If you demand that everything be converted to XML, you'll lose information in the process and not be able to retrieve it later -- especially annoying if you try to retrieve the original content and get something slightly different back.

To do this, you might rank every sort of conversion and find the shortest path from the original document format to the target document format, probably caching various formats while you're at it.

-- IanBicking - 15 Apr 2002
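The "rank every conversion and find the shortest path" idea above maps onto a standard shortest-path search. Here is a rough sketch using Dijkstra's algorithm; the format names and cost numbers are invented for illustration, and caching of intermediate formats is left out.

```python
import heapq
from typing import Dict, List, Optional, Tuple


def cheapest_conversion(
    converters: Dict[Tuple[str, str], int], source: str, target: str
) -> Optional[Tuple[int, List[str]]]:
    """Find the lowest-cost chain of conversions from source to target.

    converters maps (from_format, to_format) to a non-negative rank/cost.
    Returns (total_cost, [formats along the way]) or None if no chain exists.
    """
    graph: Dict[str, List[Tuple[str, int]]] = {}
    for (src, dst), cost in converters.items():
        graph.setdefault(src, []).append((dst, cost))

    queue: List[Tuple[int, str, List[str]]] = [(0, source, [source])]
    seen = set()
    while queue:
        cost, fmt, path = heapq.heappop(queue)
        if fmt == target:
            return cost, path
        if fmt in seen:
            continue
        seen.add(fmt)
        for nxt, step in graph.get(fmt, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return None


# Hypothetical ranking: lower numbers mean a cheaper or more faithful conversion.
converters = {
    ("wiki", "html"): 1,
    ("wiki", "xml"): 2,
    ("xml", "fo"): 2,
    ("fo", "pdf"): 3,
    ("xml", "latex"): 2,
    ("latex", "pdf"): 5,
    ("html", "pdf"): 9,
}
best = cheapest_conversion(converters, "wiki", "pdf")
```

Since the original stays in its native format, a request for the original format is a path of length zero and returns the stored file untouched.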

About XML transformations, performance and so on: 4Suite isn't the only option for XSLT - there's a wrapper around Xalan as well as libxslt (and libxml2), although those packages may not be compliant with the XML-SIG API.

-- PaulBoddie - 16 Apr 2002

We're using the Apache stuff in a separate process (i.e., os.system(...)) to generate PDFs from XML that we build using Cheetah. It's not the fastest in the world, but it's functional. I'm planning on updating it and bypassing the XML step, having the XSLT engine call directly into our object model, which saves the XML export and the subsequent XML parse steps. I'll report back on my experiences when I finish that :)

(PS: I haven't read closely enough to determine how much actual forward motion there's been on the Webware CMS. Anyway, in case it's relevant, we (HFD) are "most likely" going to release our content management product (built on Webware) under an Open Source license by the end of 2003. I say "most likely" because we're all generally agreed that it's what we want to do, but we have to iron out the details, and I don't want to make a concrete statement without having concrete under me :) (PPS: Chuck, Geoff, the Monkey's grown up a lot since you last saw him :) ))

-- TrippLilley - 11 Nov 2002