(first posted: Jan 27, 2008)
(1978 Reads)
keywords: ActiveRecord DynamicRecord
Permalink
Choosing a Schema for Dynamic Records
I am developing a DynamicRecord module for Rail's ActiveRecord which lets users dynamically expand at runtime an ActiveRecord model's set of attributes. A goal is to make this feature completely transparent, so the dynamic model looks and quacks like a conventional ActiveRecord model. Other than aesthetics and my own coding preference, this is important so the DynamicRecord models can utilize ActiveRecord features (like validations), and use other Rails plug-ins.
This article reviews the various ways I considered for storing this dynamic data.
Custom Tables for Each Form
One approach might be to simply create a new database table for each unique form on the site. And then I could use meta-programming to dynamically create a Rails model (subclass of ActiveRecord::Base) on the fly based on what the current user is doing. (Cool idea). That'd mean these dynamic records would build completely on top of Rail's ActiveRecord. I could add search indexes which are form-specific, as needed. And other goodies that come with AR.
What if I have thousands of users, there might be thousands of distinct tables? I'm told that's not inherently a problem for modern database systems. But it could be a pain in the butt from an administration point of view.
Aside from that, users would be creating, altering, and dropping database tables on the fly. SQL gurus (on #mysql and #sql) warned me that letting users do this is an indication of really bad schema design. For one, databases are not optimized for these operations, so performance would suffer.
A more serious problem is I'd need to give my app these permissions on database tables, which poses a significant security risk. And this approach is not very scalable (e.g. multi-server configurations).
A compromise would be, after users define or modify their form definition, then generate the custom table via a Rails migration, perhaps even as a background process. That starts to sound a bit kludgey to me and compromises my users' experience.
Entity-Attribute-Value
Another approach is to think of the custom table and 'pivot' it, so that column values are stored as rows in a values table. For example, if the custom table would have looked like this:
foo bar baz
row 1: 'aaa1' 'bbb1' 'ccc1'
row 2: 'aaa2' 'bbb2' 'ccc2'
Then instead, lets store each value in a values table, along with the attribute (column) name and row_id:
row attribute value
value 1: 1 foo 'aaa1'
value 2: 1 bar 'bbb1'
value 3: 1 baz 'ccc1'
value 4: 2 foo 'aaa2'
value 5: 2 bar 'bbb2'
etc
Now that's the ticket! And I'll keep track of each form's schema in a separate DynamicAttributes model. In Rails it might look something like this
DynamicModel
has_many :dynamic_attributes
has_many :dynamic_records
DynamicAttribute
belongs_to :dynamic_model
has_many :dynamic_values
DynamicRecord
belongs_to :dynamic_model
has_many :dynamic_values
DynamicValue
belongs_to :dynamic_record
belongs_to :dynamic_attribute
# => value.
So that's what I implemented... up to a point. I succeeded in mapping dynamic attributes to dynamic values, creating new dynamic records, and retrieving them (DynamicRecord.find).
The values table can get big. If my site has 1,000 forms, with 10 questions, with 1,000 responses each, that's a ten million row table. Better add indexes for queries by row_id, and by attribute. (But it cannot be indexed on a specific attribute, like indexing a column of a static table does).
While adding support for find conditions, the code started to get pretty ugly, especially the SQL being generated (watching my log file).
Consider DynamicRecord.find(:condition => { :foo => 'aaa', :bar => 'bbb' }). With our schema, the query becomes "find all the dynamic_values WHERE (attribute=foo AND value='aaa') AND (attribute=bar AND value='bbb') which itself is impossible because attribute cant be 2 things at the same time. We want the intersection of these conditions (nor the 'OR' union of them). This requires generating SQL sub-queries, or doing special joins.
The sub-query approach looked something like this: http://pastie.caboo.se/143508 Then I learned that many database implementations do not optimize subqueries very well (google "sql subquery optimize").
Or, use joins instead, I was told (by gnari on #mysql),
"join parts to things twice , each time with a different alias. For things that have parts with properties 'x' and 'y': select things.* from things as t join parts as p1 on t.id=p1.tid join parts as p2 on t.id=p2.tid where p1.property='x' and p2.property='y' .
Since Rails' find does support custom :join options, it is possible to use ActiveRecord find to query the DynamicValues table for a set of conditions. Or even, inject all the SQL I need using :finder_sql in the association. Good.
But wait. For the model we're extending, can we support all the options to its find method? That was the original goal - make the extended model behave like a standard ActiveRecord one.
Mapping the find options to multiple :join's and special :condition's quickly gets ugly and hairy (no offense to people who are that). For example, complex conditions passed to find as SQL strings, and with_scope, and :order, etc. I feared I'd end up writing a whole custom find method for DynamicRecord, not a path I planned to go down.
Rails conventions aside, I then found out this idea actually has a acronym (EAV) , its own wikipedia article (http://en.wikipedia.org/wiki/Entity-Attribute-Value_model ), and people on IRC asking "why do newbies always gravitate towards EAV???" If you're interested, read the article. Basically it says EAV is inefficient, not relational, hard to administrate, doesn't scale, and really only makes sense in data warehousing applications where a conventional table would end up with thousands of columns with only a few values used per row.
Fine.
Dynamic Rows
The third approach is kind of obvious but always seemed overly simplistic and wasteful. Create a table with a preset number of reusable columns, like an empty spreadsheet with fixed width.
string_0 string_1 string_2
row 1: 'aaa1' 'bbb1' 'ccc1'
row 2: 'aaa2' 'bbb2' 'ccc2'
Then define DynamicAttributes to map attribute names to actual column names ('foo' => 'string_0', 'bar' => 'string_1', 'baz' => 'string_2')
The main trade off is that you provide a limited number of columns. So if the user tries to build a really big form, he'll run out of columns. But, I ask myself, if the rows table provides, say, 100 columns, how many forms have more than 100 questions? That's not such a big trade off, at least in my application. If it's a recurring problem I can always migrate to more columns.
A few of the columns can be indexed ones (although the user will need to tag which form questions belong in an indexed column, hmm...)
What if most forms (and data submitted to these forms) are small, and only use a few of the columns, there'll be wasted space. That led to another IRC quip, "Why are newbies always afraid of wide database tables???" Memory is cheap, and queries on wide tables are as fast as anything.
It turns out this approach offers some wonderful advantages. I can use the full ActiveRecord finder's code, all the way down to the generated SQL. Provided I scope finds with :include => "dynamic_rows", then in find_by_sql, I can process the SQL string substituting (e.g. gsub) the dynamic attribute names with "dynamic_rows.#{column_name}"
Voila!
Now DynamicRecord supports any of the find condition formats (hash, string, array), as well as sort :order, :group, even :select. It also means that we can support any SQL the app passes, say via find_by_sql, and the app can use attribute names as though they're in the model's static table; we'll map them to the dynamic columns.
So, presently the architecture of my DynamicRecord roughly looks like this:
-----------------------------------------
thin DynamicRecord layer
-----------------------------------------
ActiveRecord base layer
----------------------------------------
thin intercept of generated SQL
-----------------------------------------
ActiveRecord connection layer




Choosing a Schema for Dynamic Records
Posted by: linoj on March 05, 2008 04:31 PM#