lost password?

home
•  xaraya
•  rails +
•  django
•  webdev
•  xamp
•  musings

rss
Tag this page
   

ยป Blogs that link here
last modified: Jan 29, 2008
(first posted: Jan 27, 2008)
(1978 Reads)
keywords: ActiveRecord DynamicRecord
Permalink

Choosing a Schema for Dynamic Records

I want users to be able to design their own forms, and collect data submitted on those forms. An important decision is the design of how these custom dynamic records are stored and accessed.

I am developing a DynamicRecord module for Rail's ActiveRecord which lets users dynamically expand at runtime an ActiveRecord model's set of attributes. A goal is to make this feature completely transparent, so the dynamic model looks and quacks like a conventional ActiveRecord model. Other than aesthetics and my own coding preference, this is important so the DynamicRecord models can utilize ActiveRecord features (like validations), and use other Rails plug-ins.

This article reviews the various ways I considered for storing this dynamic data.

Custom Tables for Each Form

One approach might be to simply create a new database table for each unique form on the site. And then I could use meta-programming to dynamically create a Rails model (subclass of ActiveRecord::Base) on the fly based on what the current user is doing. (Cool idea). That'd mean these dynamic records would build completely on top of Rail's ActiveRecord. I could add search indexes which are form-specific, as needed. And other goodies that come with AR.

What if I have thousands of users, there might be thousands of distinct tables? I'm told that's not inherently a problem for modern database systems. But it could be a pain in the butt from an administration point of view.

Aside from that, users would be creating, altering, and dropping database tables on the fly. SQL gurus (on #mysql and #sql) warned me that letting users do this is an indication of really bad schema design. For one, databases are not optimized for these operations, so performance would suffer.

A more serious problem is I'd need to give my app these permissions on database tables, which poses a significant security risk. And this approach is not very scalable (e.g. multi-server configurations).

A compromise would be, after users define or modify their form definition, then generate the custom table via a Rails migration, perhaps even as a background process. That starts to sound a bit kludgey to me and compromises my users' experience.

Entity-Attribute-Value

Another approach is to think of the custom table and 'pivot' it, so that column values are stored as rows in a values table. For example, if the custom table would have looked like this:

                   foo        bar          baz
row 1:            'aaa1'     'bbb1'       'ccc1'
row 2:            'aaa2'     'bbb2'       'ccc2'

Then instead, lets store each value in a values table, along with the attribute (column) name and row_id:

                row          attribute     value
value 1:        1            foo           'aaa1'
value 2:        1            bar           'bbb1'
value 3:        1            baz           'ccc1'
value 4:        2            foo           'aaa2'
value 5:        2            bar           'bbb2'
etc

Now that's the ticket! And I'll keep track of each form's schema in a separate DynamicAttributes model. In Rails it might look something like this

DynamicModel
  has_many :dynamic_attributes
  has_many :dynamic_records
 
DynamicAttribute
  belongs_to :dynamic_model  
  has_many :dynamic_values
 
DynamicRecord
  belongs_to :dynamic_model
  has_many :dynamic_values
 
DynamicValue
  belongs_to :dynamic_record
  belongs_to :dynamic_attribute
  # => value.

So that's what I implemented... up to a point. I succeeded in mapping dynamic attributes to dynamic values, creating new dynamic records, and retrieving them (DynamicRecord.find).

The values table can get big. If my site has 1,000 forms, with 10 questions, with 1,000 responses each, that's a ten million row table. Better add indexes for queries by row_id, and by attribute. (But it cannot be indexed on a specific attribute, like indexing a column of a static table does).

While adding support for find conditions, the code started to get pretty ugly, especially the SQL being generated (watching my log file).

Consider DynamicRecord.find(:condition => { :foo => 'aaa', :bar => 'bbb' }). With our schema, the query becomes "find all the dynamic_values WHERE (attribute=foo AND value='aaa') AND (attribute=bar AND value='bbb') which itself is impossible because attribute cant be 2 things at the same time. We want the intersection of these conditions (nor the 'OR' union of them). This requires generating SQL sub-queries, or doing special joins.

The sub-query approach looked something like this: http://pastie.caboo.se/143508 Then I learned that many database implementations do not optimize subqueries very well (google "sql subquery optimize").

Or, use joins instead, I was told (by gnari on #mysql),

"join parts to things twice , each time with a different alias. For things that have parts with properties 'x' and 'y': select things.* from things as t join parts as p1 on t.id=p1.tid join parts as p2 on t.id=p2.tid where p1.property='x' and p2.property='y' .

Since Rails' find does support custom :join options, it is possible to use ActiveRecord find to query the DynamicValues table for a set of conditions. Or even, inject all the SQL I need using :finder_sql in the association. Good.

But wait. For the model we're extending, can we support all the options to its find method? That was the original goal - make the extended model behave like a standard ActiveRecord one.

Mapping the find options to multiple :join's and special :condition's quickly gets ugly and hairy (no offense to people who are that). For example, complex conditions passed to find as SQL strings, and with_scope, and :order, etc. I feared I'd end up writing a whole custom find method for DynamicRecord, not a path I planned to go down.

Rails conventions aside, I then found out this idea actually has a acronym (EAV) , its own wikipedia article (http://en.wikipedia.org/wiki/Entity-Attribute-Value_model ), and people on IRC asking "why do newbies always gravitate towards EAV???" If you're interested, read the article. Basically it says EAV is inefficient, not relational, hard to administrate, doesn't scale, and really only makes sense in data warehousing applications where a conventional table would end up with thousands of columns with only a few values used per row.

Fine.

Dynamic Rows

The third approach is kind of obvious but always seemed overly simplistic and wasteful. Create a table with a preset number of reusable columns, like an empty spreadsheet with fixed width.

                   string_0   string_1    string_2
row 1:            'aaa1'     'bbb1'       'ccc1'
row 2:            'aaa2'     'bbb2'       'ccc2'

Then define DynamicAttributes to map attribute names to actual column names ('foo' => 'string_0', 'bar' => 'string_1', 'baz' => 'string_2')

The main trade off is that you provide a limited number of columns. So if the user tries to build a really big form, he'll run out of columns. But, I ask myself, if the rows table provides, say, 100 columns, how many forms have more than 100 questions? That's not such a big trade off, at least in my application. If it's a recurring problem I can always migrate to more columns.

A few of the columns can be indexed ones (although the user will need to tag which form questions belong in an indexed column, hmm...)

What if most forms (and data submitted to these forms) are small, and only use a few of the columns, there'll be wasted space. That led to another IRC quip, "Why are newbies always afraid of wide database tables???" Memory is cheap, and queries on wide tables are as fast as anything.

It turns out this approach offers some wonderful advantages. I can use the full ActiveRecord finder's code, all the way down to the generated SQL. Provided I scope finds with :include => "dynamic_rows", then in find_by_sql, I can process the SQL string substituting (e.g. gsub) the dynamic attribute names with "dynamic_rows.#{column_name}"

Voila!

Now DynamicRecord supports any of the find condition formats (hash, string, array), as well as sort :order, :group, even :select. It also means that we can support any SQL the app passes, say via find_by_sql, and the app can use attribute names as though they're in the model's static table; we'll map them to the dynamic columns.

So, presently the architecture of my DynamicRecord roughly looks like this:

-----------------------------------------
thin DynamicRecord layer
-----------------------------------------

ActiveRecord base layer

----------------------------------------

thin intercept of generated SQL

-----------------------------------------

ActiveRecord connection layer

-----------------------------------------

 

 

Choosing a Schema for Dynamic Records

Posted by: linoj on March 05, 2008 04:31 PM
note to self: if reconsidering the EAV method, take a look at this plugin to simplify the SQL generation http://agilewebdevelopment.com/plugins/multi_statement_query

#

Choosing a Schema for Dynamic Records

Posted by: nerb on May 19, 2008 11:56 AM
Wondering how you've done with this approach in the last two months? I was thinking of your option 3 as well, as I'd like to implement this in an app that would allow a site administrator flexibility in naming the "columns" of their app. So i'd input 15 "column1", column2.... for them to rename in their site section of the app. I'm going to now read the rest of your blog for more updates? I take it, however, you did go with option 3, yes?

#

Choosing a Schema for Dynamic Records

Posted by: epitron on August 08, 2008 06:03 AM
Hey linoj! You're not on IRC, so I'll post this here. :) It seems that the EAV model is something people are getting more interested in lately as flexible databases are being needed for all the crazy semantic web stuff people want. It turns out that it's possible to write an efficient EAV system -- the problem seems to be that everyone was trying to do it using RDBMSes, which happen to have a really complex pipeline for every query (due to all the locking and transactions and query optimization and other misc things). An EAV store can do with a much, much simpler data access pipeline. :) Daniel Abadi (http://cs-www.cs.yale.edu/homes/dna/) has totally dug into the current 30-year old RDBMS technology and figured out some really interesting things. This paper is awesome -- http://cs-www.cs.yale.edu/homes/dna/papers/oltpperf-sigmod08.pdf. He shows that 95% of the work that an OLTP database does is wasted (locking/latching/logging/buffer-management), and if you remove all that stuff, and deal with the concurrency and consistency and problems by making the database single-threaded and replicated, you get a ridiculous boost in performance. :) Here's a great diagram: http://epi.ponzo.net/images/OLTP_overhead.png Another interesting development is the Freebase.com database called "graphd" (http://blog.freebase.com/2008/04/09/a-brief-tour-of-graphd/). It's got good performance (freebase is freakin huge), and the way you query it is neat -- instead of asking for rows, you pass it a (possibly-nested) hash of attribute=>value pairs. If the value is filled in, it finds that, and if a value is nil, that's one of the fields in your result. You basically leave out pieces, and it fills them in. :D You can see this in action by playing with their query editor: http://www.freebase.com/tools/queryeditor (a nice set of example queries are at the bottom.) I can see people moving to databases that are similar to what CouchDB et al. are doing -- you store all your data in a homogenous form, but have "views" of the data which are kinda like dynamic schema, and generate index structures for performance.

#

Choosing a Schema for Dynamic Records

Posted by: epitron on August 08, 2008 06:04 AM
Goddamn enter-stripping bastard quote system!

#

Properly formatted comment reply.

Posted by: epitron on August 08, 2008 06:06 AM
I put my comment up on my wiki for ya. :) http://wiki.ill-logic.com/EAV_Databases_comment

#

Choosing a Schema for Dynamic Records

Posted by: linoj on August 08, 2008 10:46 AM
epitron, I also recently found Amazon's SimpleDB to be interesting (tho I haven't looked to deeply into how it'd integrate with a rails app) http://www.amazon.com/SimpleDB-AWS-Service-Pricing/b?node=342335011

#

Post a new comment

: This is not spam

Name :