Minimize Bandwith in One-to-Many Joins

Summary: when a one-to-many JOIN produces excess “repetition” in a result set, use a UNION.

Example: Articles and Comments

Suppose you have a Web site with articles and comments. The database for this might have one table containing the articles, and another table for the comments on those articles, with a relationship between these tables, linking each comment to its article.

Articles table

create table articles
( id          integer      not null  primary key
, title       varchar(100) not null
, article     text         not null 
)

Each of these columns is NOT NULL because either this is required (for the primary key) or expected by the application (an article must have both a title and article text to be accepted).

The last column, article, could be any large string datatype, like VARCHAR(5000) or TEXT, which, depending on the database system, might hold up to 2 gigabytes. The exact datatype doesn’t really matter, just that the datalength of the column is very large, because it needs to hold the article text of any article.

Comments table

create table comments
( id          integer      not null  primary key
, article_id  integer      not null  foreign key
, title       varchar(100)  null
, comment     text         not null

The comments table has a similar structure, with an additional column being a foreign key to link each comment to the article it’s commenting on. The title is NULL because it is optional for comments. Finally, there is a large string datatype column for the comment text, which you might think need not be as large as the collumn for the article text, but just to be safe, it is usually TEXT as well.

This is a classical one-to-many relationship. Each article can have one or more comments, and each comment belongs to only one article.

Technically, it’s a one-to-zero-or-many relationship, because an article must be able to exist without any comments, which is the situation that pertains immediately after the article is first added, before the first comment on it is created.

Query to retrieve an article and its comments

To retrieve an article, its title and article text, along with all its comments, you can use the following query:

select articles.id
     , articles.title
     , articles.article
     , comments.id
     , comments.title
     , comments.comment
  from articles 
left outer
  join comments 
    on comments_article_id = articles.id
 where articles.id = 937
order
    by articles.id
     , comments.id

The query retrieves a specific article by id, plus its associated comments, if any, and it’s a left outer join because the article might not have comments. The result set will look something like this:

aid  atitle         article   cid  ctitle         comment
937  Article Title  [TEXT]   1013  Comment Title  [TEXT]
937  Article Title  [TEXT]   1024  Comment Title  [TEXT]
937  Article Title  [TEXT]   1037  Comment Title  [TEXT]
937  Article Title  [TEXT]   1042  Comment Title  [TEXT]

As the design was the classical one-to-many relationship, so the result set is the classical one-to-many result set. With one minor item of note: the article text is repeated in the result set four times, and it could be 2 gigabytes. The query works, but it wastes bandwidth.

So is the answer to write two queries, one for the article and one for its comments? Since there is overhead associated with any query, and two queries are more than one query, it might be interesting to see whether there is a tradeoff.

But there is another way to combine two queries into one, without the “repetition” of data in the result set that you see with a one-to-many join.

The UNION query

The following UNION query consists of two SELECTs combined with the UNION ALL operator:

select 'article' as rowtype
     , articles.id
     , articles.title
     , articles.article
  from articles 
 where articles.id = 937
union all
select 'comment' as rowtype
     , comments.id
     , comments.title
     , comments.comment
  from articles 
inner
  join comments 
    on comments.article_id = articles.id
 where articles.id = 937
order
    by rowtype

While there are two SELECTs, there is only one ORDER BY, which sorts the combined results of both SELECTs into one sorted final result set.

The two SELECTs return the article and its comments, respectively. Each SELECT is driven by an identical WHERE clause, to select a specific article or its comments. The second SELECT’s join must be an inner join instead of a left outer join, because the first SELECT retrieves the article, while the second query retrieves only related comments.

Results


rowtype   id  title          article  
article  937  Article Title  [TEXT]  
comment 1013  Comment Title  [TEXT]
comment 1037  Comment Title  [TEXT]
comment 1042  Comment Title  [TEXT]

Optimal bandwidth is achieved, because there is no repetition of any large column value. The rowtype can be a single-byte column if desired; its values are assigned as literals in the SELECTs.

In the second SELECT, is it necessary to do the join at all? Couldn’t the second SELECT simply retrieve from the comments table using a similar WHERE condition on articles_id instead? Yes, but it’s better to leave the join in place, so that, if desired, instead of retrieving a specific article by id, you wanted to return all articles that meet a certain condition, such as a keyword in their title. This requires using the same WHERE clause in each SELECT.

Why does this work so well?

You cannot take this approach with every one-to-many relationship that you meet. In order for you to write this type of UNION query, both SELECTs must return a union compatible result set. This means the same number of columns, with each pair of columns having compatible datatypes (dates with dates, numbers with numbers, strings with strings). Articles and comments are well suited for this type of UNION because the tables share such a similar design.

 

About eagle081183

Passionate, Loyal
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s