java.lang.ClassNotFoundException: org.springframework.web.context.ContextLoaderListener

I had a similar problem when running a spring web application in an Eclipse managed tomcat. I solved this problem by adding maven dependencies in the project’s web deployment assembly.

  1. Open the project’s properties (e.g., right-click on the project’s name in the project explorer and select “Properties”).
  2. Select “Deployment Assembly”.
  3. Click the “Add…” button on the right margin.
  4. Select “Java Build Path Entries” from the menu of Directive Type and click “Next”.
  5. Select “Maven Dependencies” from the Java Build Path Entries menu and click “Finish”.

You should see “Maven Dependencies” added to the Web Deployment Assembly definition.


Posted in Integration, Problem solving, Programming, Uncategorized | Leave a comment

Basic Strategy for Algorithmic Problem Solving

The strategy consists of five big steps:

  1. Read and comprehend the problem statement.
  2. Select theoretical concepts that may be applied.
  3. Qualitative description of the problem.
  4. Formalization of a solution strategy.
  5. Test and description of the solution.

Each step has attached a questionnaire, which contain questions that will lead you toward the solution of the problem or, if needed, to step back and review your work.

This document is based on the paper: Cabral, Luis G. et al. “Solucion de Problemas”. Contactos Vol II, No. 8. Oct-Dic 1985. pp.42-51. UAM-I, ciencias basicas e ingenieria, Mexico.

General Problem Solving Strategy

Guiding-questionnaires to be used with the General Strategy for algorithm creation

Guide 1

  1. Do you understand every word used within the problem statement?
  2. What computational elements are relevant to the problem?
  3. What non-computational elements are relevant to the problem: Mathematics, Physics, Geography, etc.
  4. Use your own words to describe the problem. If needed, make a drawing depicting the situation stating clearly relevant objects and times.
  5. Have you solved any similar problem? If so, take advantage of that experience and its information.
  6. What data or resources are provided within the statement?
  7. What data or results are requested within the statement?
  8. Check answers 6 and 7 and decide if they are consistent with your answers 2 and 3.

Guide 2

  1. Identify all theoretical (and empirical) concepts related with the problem.
  2. Select a structure able to simplify data handling: arrays, records, files, local variables, global variables, linked lists, etc.
  3. Identify the kind of problem(s) according with its (their) structure: sequential, selection, iterative.
  4. Identify available algorithmic elements and select: what you need: well-defined instructions, already known algorithms, etc.
  5. Is it possible to simplify the problem by dividing it into simpler cases and selecting a different approach for each one? Is it possible eliminate redundant or unnecessary data?

Guide 3

  1. Do you know any hand-written way to solve the problem? If so, propose several examples and solve them “by hand”, then attempt to create a generalization. In order to do that, carefully think on each step performed and watch what actions are common to every example.
  2. Make a list of variable elements, specifying their magnitude and measurement units. Associate them proper symbols or names but take care of avoid their repetition
  3. Which principles or relationships apply to the problem?
  4. Write down the selected relationships but using your own variables (symbols or names). If needed, describe equations with words.
  5. Are all variables in use? Are there as many relationships as unknown variables?
  6. Are you using all the information available in the problem statement? If not, select just the important.

Guide 4

  1. Describe your solution qualitatively (you can start by making a narration.)
  2. Make some predictions regarding the expected result based only upon the description you made. Do not assume anything that is not in your description.
  3. Make the required relationships and check that the result comes from the selected variables. (Keep in mind the measurement units.)
  4. Substitute values (with their corresponding signs and units) at the end of your development of relationships.
  5. Transform your description into an algorithm (pseudocode or flowchart). Remember, the algorithm must ask unknown values, show main results and store (in variables) the results of relationships and formulas.

Guide 5

  1. Manually compute the result (i.e. perform a hand-trace.) If needed, draw plots that describe the behavior of the variables.
  2. Follow strictly each step of the algorithm and look at the results. (Someone else can perform this step.)
  3. Are all your predictions from 4.2 accomplished? Measurement units are preserved?
  4. Do the units make sense?
  5. Is reasonable the order of magnitude of results?
  6. Does it work for boundary values?
  7. Do every variable has an initial value?
  8. Interpret the result to write down an explanation of it (how it was produced) and assign units.
Posted in Education and Training, Problem solving, Uncategorized | Leave a comment

All About TransactionScope

1. Introduction


TransactionScope is a very special and important class in the .NET Framework. Supporting transactions from a code block is the main responsibility of this class. We often use this class for managing local as well as distributed transactions from our code. Use of TransactionScope is very simple and straightforward. It is very reliable and easy to use. For this reason it is very popular among .NET developers. In this article, I explain transaction related theory with code sample, and show various scenarios where we can use TransactionScope with various options for managing real life transactions.

2. Background

Transaction management is very, very important to any business application. Each and every large scale development framework provides a component for managing transactions. .NET Framework is a large development framework and it also provides its own transaction management component. Before the launch of .NET Framework 2.0 we used SqlTransaction to manage transactions. From version 2 .NET Framework has the TransactionScope class. This class is available in the System.Transactions assembly. This class provides a transactional framework with the help of which any .NET developer can write transactional code without knowing much details. For this reason it is very popular among .NET developers and they widely use it for managing transactions. But the story is not finished yet. I will say the story has only just started.

In the real world any one you will find exceptional scenarios, exceptional issues where only a knowledge of how to use TransactionScope is not good enough. To resolve transactional issues like deadlocks, timeouts, etc., you must know each and every concept directly/indirectly related to a transaction. There is no alternative. So the concepts of a transaction and its related components need to be clear.

3. How to Use TransactionScope

Use of TransactionScope in a .NET application is very, very simple. Any one can use it by following these steps:

  1. Add a System.Transactions assembly reference to the project.
  2. Create a transactional scope/area with the help of the TransactionScope class starting with a usingstatement.
  3. Writing code which needs to have transactional support.
  4. Execute the TransactionScope.Complete method to commit and finish a transaction.

Really, as simple as that. But in a real life project only that knowledge is not sufficient. You need more transaction related knowledge, otherwise you can not handle transaction related issues. So first of all, we should be clear about the transactional concept.

4. Transaction

What is a transaction? You can find the definition of a transaction from various sources like Wikipedia, other websites, books, articles, blogs. In a very short, we can say, a series/number of works treated as a whole, either completed fully or not at all.

Example: Transfer money from Bank Account-A to Account-B

Series of (actually two) tasks/processes:

  1. Withdraw amount from Account-A
  2. Deposit that amount to Account-B

We understand that transfer of money (from Account-A to Account-B) consists of two individual processes. Transferring money will only be accurate and successful if both the processes are individually successful. If that is not happening, suppose process-1 succeeds but process-2 fails, then the money will be deducted from Account-A but not deposited to Account-B. If that happens, it will be very bad and no one will accept it.

5. Business Transaction

Business transactions are interactions between Customer/Supplier/StackHolders and other parties who are involved in doing business. In this article I am not going to present anything regarding business transactions.

6. Database Transaction

In software development, when we say transaction by default we guess that it is a database transaction. In a database transaction we can say, a series of data manipulation statements (insert/update/delete) execute as a whole. All statements either successfully execute, or will fail each and every statement, so that the database is in consistent mode. Database transactions actually represent a database state change in an accurate way.

7. Local Transaction

Local Transaction

A transaction where a series of data manipulation statements execute as a whole on a single data source/database. It is actually a single phase transaction handled by a database directly. To manage local transactions, System.Transactions has a Lightweight Transaction Manager (LTM). It acts like a gateway. All transactions are started by System.Transactions are handled directly by this component. If it finds the transaction nature is distributed based on some predefined rules it has a fallback transaction to the MSDTC distributed transaction.

8. Distributed Transaction

Distributed Transaction

A transaction which works with multiple data sources is called a distributed transaction. If a transaction fails then the affected data sources will be rolled back. In System.Transactions, MSDTC (Microsoft Distributed Transaction Coordinator) manages distributed transactions. It implements a two phase commit protocol. A distributed transaction is much slower than a local transaction. The transaction object automatically escalates a local transaction to a distributed transaction when it understands that a distributed transaction is needed. The developer can not do anything here.

9. Distributed Transaction System Architecture

We know that in a distributed transaction, several sites are involved. Each site has two components:

  1. Transaction Manager
  2. Transaction Coordinator

1. Transaction Manager: Maintains a log and uses that log if recovery is needed. It controls the whole transaction by initiating and completing, and managing the durability and atomicity of a transaction. It also coordinates transactions across one or more resources. There are two types of transaction managers.

  1. Local Transaction Manager: Coordinates transaction over a single resource only.
  2. Gloabl Transaction Manager: Coordinates transaction over multiple resources.

2. Transaction Coordinator: Starting the execution of transactions that originate at the site. Distributes subtransactions at appropriate sites so that they can execute in those sites. Coordinates each transaction of each site. As a result the transaction is committed or rolled back to all sites.

10. Connection Transaction

Transaction which is tied directly with a database connection (SqlConnection) is called Connection Transaction. SqlTransaction (IDbTransaction) is an example of a connection transaction. In .NET Framework 1.0/1.1 we use SqlTransaction.

string connString = ConfigurationManager.ConnectionStrings["db"].ConnectionString;
using (var conn = new SqlConnection(connString))
    using (IDbTransaction tran = conn.BeginTransaction())
            // transactional code...
            using (SqlCommand cmd = conn.CreateCommand())
                cmd.CommandText = "INSERT INTO Data(Code) VALUES('A-100');";
                cmd.Transaction = tran as SqlTransaction;
        catch(Exception ex)

11. Ambient Transaction

A transaction which automatically identifies a code block that needs to support a transaction without explicitly mentioning any transaction related things. An ambient transaction is not tied just to a database, any transaction aware provider can be used. TransactionScope implements an ambient transaction. If you see the use of TransactionScope, you will not find transaction related anything sent to any method or setting any property. A code block is automatically attached with the transaction if that code is in any TransactionScope. A WCF transaction is another example of a transaction aware provider. Any one can write a transaction aware provider like the WCF implementation.

12. Transaction Properties

There are four important properties for a transaction. We call them ACID properties. They are:

    1. A-Atomic
    2. C-Consistent
    3. I-Isolation
    4. D-Durable
  1. Atomic: If all parts of the transaction individually succeed then data will be committed and the database will be changed. If any single part of a transaction fails then all parts of the transaction will fail and the database will remain unchanged. Part of the transaction might fail for various reasons like business rule violation, power failure, system crash, hardware failure, etc.
  2. Consistent: Transaction will change the database from one valid state to another valid state following various database rules like various data integrity constraints (primary/unique key, check/not null constraint, referential integrity with valid reference, cascading rules ) etc.
  3. Isolation: One transaction will be hidden from another transaction. In another way we can say, one a transaction will not affect an other transaction if both work concurrently.
  4. Durability: After a transaction is successfully completed (committed to the database), changed data will not be lost in any situation like system failure, database crash, hardware failure, power failure etc.

13. Transaction Isolation Level

Now I will start explaining a very important thing directly related to transactions, and that is transaction isolation level. Why is it so important? First of all, I previously explained that isolation is an important transaction property. It describes each transaction is isolated from another and do not affect other concurrently executed transactions. How does a transaction management system achieve that important feature?

A Transaction Management System introduces a locking mechanism. With the help of this mechanism one transaction is isolated from another. The locking policy behaves differently based on the Isolation level set for each transaction. There are four very important isolation levels in .NET transaction scope. These are:

    1. Serializable
    2. Repeatable Read
    3. Read Committed
    4. Read UnComitted

Before I start explaining isolation levels, I need to explain data reading mechanizm inside a transaction. This data reading mechanism is very important to understand isolation levels properly.

  • Dirty Read: One transaction reads changed data of anohter tranaction but that data is still not committed. You may take decission/action based on that data. A problem will arise when data is rolled-back later. If rollback happens then your decision/action will be wrong and it produces a bug in your application.
  • Non Repeatable Read: A transaction reads the same data from same table multiple times. A problem will arise when for each read, data is different.
  • Phantom Read: Suppose a transaction will read a table first and it finds 100 rows. A problem will arise when the same tranaction goes for another read and it finds 101 rows. The extra row is called a phantom row.

Now I will start explaining in short the important isolation levels:

  1. Serializable: Highest level of isolation. It locks data exclusively when read and write occurs. It acquires range locks so that phantom rows are not created.
  2. Repeatable Read: Second highest level of isolation. Same as serializable except it does not acquire range locks so phantom rows may be created.
  3. Read Committed: It allow shared locks and read only committed data. That means never read changed data that are in the middle of any transaction.
  4. Read Un-Committed: It is the lowest level of Isolation. It allows dirty read.

Now I will start explaining TransactionScope and its usage pattern:

14. TranactionScope Default Properties

It is very important to know about the default properties of the TransactionScope object. Why? Because many times we create and use this object without configuring anything.

Three very important properties are:

  1. IsolationLevel
  2. Timeout
  3. TransactionScopeOptions

We create and use TransactionScope as follows:

using (var scope = new TransactionScope())
    //transctional code…

Here the TransactionScope object is created with the default constructor. We did not define any value for IsolationLevelTimeout, and TransactionScopeOptions. So it gets default values for all three properties. So now we need to know what the default property values of these properties.

Property Default Value Available Options
IsolationLevel Serializable Serializable, Read Committed, Read Un Committed, Repeatable Read
Timeout 1 Minute Maximum 10 Minutes
TransactionScopeOption Required Required, Required New, Suppress
  1. Isolation Level: It defines the locking mechanism and policy to read data inside another transaction.
  2. Timeout: How much time object will wait for a transaction to be completed. Never confuse it with the SqlCommand Timeout property. SqlCommand Timeout defines how much time the SqlCommand object will wait for a database operation (select/insert/update/delete) to be completed.
  3. TransactionScopeOption: It is an enumeration. There are three options available in this enumeration:
No Option Description
1 Required It is default value for TransactionScope. If any already exists any transaction then it will join with that transaciton otherwise create new one.
2 RequiredNew When select this option a new transaction is always created. This transaction is independent with its outer transaction.
3 Suppress When select this option, no transaction will be created. Even if it already

How to know the default values of these properties?

The System.Transactions assembly has two classes:

  1. Transaction
  2. TransactionManager

These classes will provide default values. Inside TransactionScope, if you run the following code, you will know the default values:

using (var scope = new System.Transactions.TransactionScope())
    IsolationLevel isolationLevel = Transaction.Current.IsolationLevel;
    TimeSpan defaultTimeout = TransactionManager.DefaultTimeout;
    TimeSpan maximumTimeout = TransactionManager.MaximumTimeout;

Is it possible to override the default property values?

Yes, you can. Suppose you want the default value to be 30 seconds and the maximum timeout value to be 20 minutes. If that is the requirement then you can do it using your web config.

    <defaultSettings timeout="30"/>
    <machineSettings maxTimeout="1200"/>
For the machineSettings value, you need to update your machine.config in your server.
<section name="machineSettings" type="System.Transactions.Configuration.MachineSettingsSection,
Custom=null"allowdefinition="MachineOnly"allowexedefinition="MachineToApplication" />

15. Transaction Isolation Level Selection

You need to have a proper knowledge when you use isolation levels. The following table will give you a very basic idea so that you can understand the basics and select the appropriate isolation level for your transaction scope.

Isolation Level Suggestion
Serializable It locks data exclusively at the time of read/write operations. For that reason, many times it may create a deadlock, and as a result you may get a timeout exception. You can use this isolation level for a highly secured transactional application like a financial application.
Repeatable Read Same as Serializable except allows phantom rows. May use in a financial application or a heavily transactional application but need to know where phantom row creational scenarios are not there.
Read Committed Most of the applications you can use it. SQL Server default isolation level is this.
Read Un-Committed Applications with these have no need to support concurrent transactions.

Now I will explain with scenarios, how we can use TransactionScope:

16. Requirement-1

Create a transaction in which isolation level will be read committed and transaction timeout will be 5 minutes.


var option = new TransactionOptions();
option.IsolationLevel = IsolationLevel.ReadCommitted;
option.Timeout = TimeSpan.FromMinutes(5);
using (var scope = new TransactionScope(TransactionScopeOption.Required, option))
    ExcuteSQL("CREATE TABLE MyNewTable(Id int);");                                        

First off, create TransactionOptions and set ReadCommitted and 5 minutes to its IsolationLevel and Timeout property, respectively.

Second, create a transactional block by creating a TransactionScope object with its parameterized constructor. In this constructor you will pass a TransactionOptions object which you created early and the TransactionScopeOption.Required value.

One important note, many times we are confused when using a DDL statement (Data Definition Language) in a transaction and a question arises, will it support DDL transaction? The answer is yes. You can use a DDL statement like create/alter/ drop statement in the transaction. You can even use a Truncate statement inside the transaction.

17. Requirement-2

We need to create a transaction where a database operation will be in my local database and another will be in a remote database.


using (var scope = new TransactionScope())

There is no difference between a local or remote/distributed transaction implementation code in transactions. Previously I said that TransactionScope implements ambient type transaction. This means, it automatically marks code blocks that need to support a transaction, local or remote. But you may find an error when working with distributed transactions. The error message will be like:

The partner transaction manager has disabled its support for remote/network transaction.

If you find that type of exception, you need to configure security settings, both your local and remote servers, for MSDTC, and make sure services are running.

To find the MSDTC configuration interface, you will go to:

ControlPanel > AdministritiveTools >ComponentServices > DistributedTransactionCoordinator > LocalDTC

Some options for the Security tab are described bellow:

Property Name Description
Network DTC Access If not selected, MSDTC will not allow any remote transaction
Allow Remote Clients If it is checked, MSDTC will allow to coordinate remote clients for transaction.
Allow Remote Administration Allow remote computers to access and configure these settings.
Allow Inbound Allow computers to flow transaction to local computers. This option is needed where MSDTC is hosted for a resource manager like SQL Server.
Allow Outbound Allow computers to flow transaction to remote computers. It is needed for a client computer where transaction is initiated.
Mutual Authentication Local and Remote computers communicate with encrypted messages. They establish a secured connection with the Windows Domain Account for message communication.
Incoming Calling Authentication Required If mutual authentication cannot be established but the incoming caller is authenticated then communication will be allowed. It supports only Windows 2003/XP ServicePack-2.
No Authentication Required It allows any non-authenticated non-encrypted communication.
Enable XA Transaction Allows different operating systems to communicate with MSDTC with XA Starndard.
DTC Logon Account DTC Service running account. Default account is Network Service.

18. Distributed Transaction Performance

Distributed transactions are slower than local transactions. A two phase commit protocol is used for managing distributed transactions. A two phase commit protocol is nothing but an algorithm by which a distributed transaction is performed. There are three commit protocols that are mostly used:

  1. Auto Commit: Transaction is committed automatically if all SQL statements are executed successfully or rolled-back if any of them fails to execute.
  2. Two Phase Commit: Transaction waits before final commit for messages from all other parties involved in transaction. It locks resources before commit or rollback. For this reason it is called a blocking protocol. In terms of performance it is the reason it is much slower. It is a widely used protocol for managing distributed transactions.
  3. Three Phase Commit: Transaction is finally committed if all nodes are agreed. It is a non-blocking protocol. In terms of performance it is faster than the two phase commit protocol. This protocol is complicated and more expensive but avoids some drawbacks in the two phase commit protocol.

19. Requirement-3

I want to create a transaction inside another transaction.


string connectionString = ConfigurationManager.ConnectionStrings["db"].ConnectionString;
var option = new TransactionOptions
     IsolationLevel = IsolationLevel.ReadCommitted,
     Timeout = TimeSpan.FromSeconds(60)
using (var scopeOuter = new TransactionScope(TransactionScopeOption.Required, option))
    using (var conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = conn.CreateCommand())
            cmd.CommandText="INSERT INTO Data(Code, FirstName)VALUES('A-100','Mr.A')";
    using (var scopeInner = new TransactionScope(TransactionScopeOption.Required, option))
        using (var conn = new SqlConnection(connectionString))
            using (SqlCommand cmd = conn.CreateCommand())
                cmd.CommandText="INSERT INTO Data(Code, FirstName) VALUES('B-100','Mr.B')";

No problems in creating a transaction inside anohter (nested) transaction. You should define the behaviour or the inner transaction. This behaviour is dependent on the value of TransactionScopeOption. If you select Required as TransactionScopeOption, it will join its outer transaction. That means if the outer transaction is committed then the inner transaction will commit if the outer transaction is rolled back, then the inner transcation will be rolled back. If you select the RequiredNew value of TrasnactionScopeOption, a new transaction will be created and this transaction will independently be committed or rolled back. You must be clear about those concepts before working with nested transactions using TransactionScope.

20. Requirement-4

I want to call rollback explicitly from a transaction.


using (var scope = new TransactionScope())
    //either 1 of following lines will use
    //if you comment the following line transaction will
    //automatically be rolled back

If you do not call the TransactionScope.Complete() method then the transaction will automatically be rolled back. If you need to explicitly call rollback for some scenarios, then you have two options:

  1. Executing Transaction.Current.Rollback() will rollback the current transaction.
  2. Executing TransactionScope.Dispose() will also rollback the current transaction.

Just one thing: remember that if you explicitly call Transaction.Rollback() or TranactionScope.Dispose() then you should not call the TransactionScope.Complete() method. If you do so then you will get an ObjectDisposeException.

“Cannot access a disposed object. Object name ‘TransactionScope'”

21. Requirement-5

I want to create a file/folder dynamically inside a transaction scope. If my transaction is rolled back then I want that created file/folder to be removed automatically, like a database row.


string newDirectory = @"D:\TestDirectory";
string connectionString = ConfigurationManager.ConnectionStrings["db"].ConnectionString;
using (var scope = new TransactionScope())
    using (var conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = conn.CreateCommand())
            cmd.CommandText = "Insert into data(Code) values ('A001');";

TranactionScope is not limited for only databases. It will support other data sources like FileSystem, MSMQ, etc. But you need more work to support TranactionScope. First of all what I show in the above code block will not work. Why? Because that directory creation and file creation will not be marked for transaction by default. Then what do we need to do?

public interface IEnlistmentNotification
    void Commit(Enlistment enlistment);       
    void InDoubt(Enlistment enlistment);      
    void Prepare(PreparingEnlistment preparingEnlistment);        
    void Rollback(Enlistment enlistment);

The System.Transactions namespace has an interface named IEnlistmentNotification. If I want my component/service to be transaction aware then I need to implement that interface. The following code will show a very simple and straightforward way to implement this:

public class DirectoryCreator : IEnlistmentNotification
    public string _directoryName; 
    private bool _isCommitSucceed = false;
    public DirectoryCreator(string directoryName)
        _directoryName = directoryName;
        Transaction.Current.EnlistVolatile(this, EnlistmentOptions.None);
    public void Commit(Enlistment enlistment)
        _isCommitSucceed = true;
    public void InDoubt(Enlistment enlistment)
    public void Prepare(PreparingEnlistment preparingEnlistment)
    public void Rollback(Enlistment enlistment)
        if (_isCommitSucceed))

The above class will create a directory (folder) and this component is transaction aware. We can use this class with any TranactionScope and if TranactionScope is committed the directory will be created, otherwise it will be deleted (if already created). I show here just the diretory creation, if you want you can create a class/component for file creation. Now, how to use this class in the transactions scope?

string newDirectory = @"D:\TestDirectory";
string connectionString = ConfigurationManager.ConnectionStrings["db"].ConnectionString;
using (var scope = new TransactionScope())
    using (var conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = conn.CreateCommand())
            cmd.CommandText = "Insert into data(Code) values ('A001');";
    var creator = new DirectoryCreator(newDirectory);

Now, it will work!

Transactional NTFS(TxF) .NET is a open source project. You can use this library for creating/writing/coping file/directory inside transactionscope and it will support transaction automatically.

  • You first need to download component from
  • Add that component as reference to your project.
  • Use component api to your transactional block.

Txf API usage code sample:

using (var ts = new System.Transactions.TransactionScope())
    using (var conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = conn.CreateCommand())
            cmd.CommandText = "Insert into data(Code) values ('A001');";
    TxF.Directory.CreateDirectory("d:\\xyz", true);
    TxF.File.CreateFile("D:\\abc.txt", File.CreationDisposition.OpensFileOrCreate);

TxF component supports:

  • Create/Delete Directory
  • Create/Delete File
  • Read/Write File
  • Copy File

22. Points of Interest

Transaction Management is actually a huge subject. It is a very complex subject too, specially distributed transaction. I tried my level best to present it as simple as possible. If you want to get all transaction related knowledge then you should study more on that. I suggest you read research papers on transactions, specially distributed transactions.

You can also explore more regarding transaction aware service/components. I showed here a very simple way to implement them. But in real life you may face difficult scenarios. So you need to prepare for that. In the near future Microsoft may add transaction aware components like dictionary/filesystem/directory service, etc. If it that happens then developers’ life will be more easier.

Sample Source Code

I have attached a source code sample with this article. I wrote this source code with the help of Visual Studio 2012 with .NET Framework 4.5. I added a Unit Test project so that you can debug/test the code and properly understand it.


Posted in ASP.NET MVC, C#, Database, Uncategorized | Leave a comment

Moq – Mock Database


Moq is a very useful framework which easily mocks service calls and methods for your unit testing.

This article helps you to understand Moq with respect to mocking a database (i.e. writing unit test cases for your repository project).

Here I have used Microsoft Enterprise Library objects (to make it easy to understand) you can very well extend it to any other framework, util or ADO.NET methods. I will also try to cover some advanced concepts used in Moq like anonymous methods, Callback() and Queueing.


I have been using Moq since last, almost an year and found that many people struggle or find difficult to mock databases. Many of us use Dev instance of database and make our test cases call the actual SQL Instance.

Using the code

First things first – Your repository should have a constructor (or a public property)  through which you can pass the mocked database object from the unit test.

Below is sample of such constructor:-

public MyRepository(Databse Db)
  this.database = Db;

Below is sample of an “ExecuteScalar” method (it returns number of employees in a certain location).

using (DbCommand cmd = database.GetStoredProcCommand(SPCREATEPRODUCTLIST))           
    this.database.AddParameter(cmd, "@Location", DbType.String, 0, 
      ParameterDirection.Input, true, 0, 0, "Location", DataRowVersion.Default, location);
    object result = database.ExecuteScalar(cmd);

This is how you can mock a scalar method:

private static Mock<Database> MockExecuteScalar(object returnValue)
   Mock<DbProviderFactory> mockedDBFactory = new Mock<DbProviderFactory>();
   Mock<Database> mockedDB = new Mock<Database>("MockedDB", mockedDBFactory.Object);
   mockedDB.Setup(x => x.ExecuteScalar(It.IsAny<DbCommand>())).Returns(returnValue);
   return mockedDB;

(You can read more about Enterprise library and its implementations at ).

This is quite straight forward, this method mocks the “ExecuteScalar” method (since this method is mentioned as virtual in Database class you are able to mock it. You can only mock Interfaces easily, while mocking a class you can only mock virtual properties and  methods)

Below is how you will call this in your unit test case:

Database mockedDB = MockExecuteScalar("5").Object;
MyRepository target = new MyRepository(mockedDB);
var result = target.GetEmployeeCount("London");

In the same way you can mock “ExecuteNonQuery” implementations:

private static Mock<Database> MockExecuteNonQuery(object returnValue)
   Mock<DbProviderFactory> mockedDBFactory = new Mock<DbProviderFactory>();
   Mock<Database> mockedDB = new Mock<Database>("MockedDB", mockedDBFactory.Object);
   mockedDB.Setup(x => x.ExecuteNonQuery(It.IsAny<DbCommand>())).Returns(1);         
   return mockedDB;

Now, let’s move on to “ExecuteReader” implementations. ExecuteReader can be a collection of rows and we loop the DataReader stream till the end of data. So here there are two functions to mock.

  1. ExecuteReader() – To get the actual data
  2. Read() – To return true till we get the desired data

Below is example of a typical implementation using “ExecuteReader“:

using (DbCommand cmd = database.GetStoredProcCommand("GetEmployeeDetails", parameters))
    using (IDataReader dr = database.ExecuteReader(cmd))
        while (dr.Read())
           listofEmployeeDetails.Add(new Employee
                EmployeeId = dr["EmpID"].ToString();
                EmployeeName = dr["EmployeeName"].toString();
                Location = dr["Location"].toString(); 

First let’s see a simple example where  we will mock “ExecuteReader” to return a single row of data from our MockedDatabase:

Step 1: Mock “Read” method

Before mocking read method I would like to brief you about anonymous methods in Moq functions andCallback() method.


We have already seen the .Returns() method which returns response for a mocked function call. If you want to execute your custom logic after the control comes back from Return() you can use Callback().

This will look something like below:

mockedObject.Setup(x=>x.myMethod(It.IsAny<string>())).Returns("Hello").Callback(//custom logic goes here);

Anonymous Methods

Anonymous methods come handy you are calling a mocked method multiple times and want to change the return value  dynamically.

Below is an example:

string returnValue = "Hello"  

When we call “myMethod” for the very first time, the return value will be “Hello” from second time onward it will return “World”. You can put any conditions or your custom implementation inside this anonymous method to suit your needs.

Now in this scenario we want “ExecuteReader” method to read one row of data. So in that casedataReader.Read() method should return true 1st time only.

So, we can mock .Read() method like:

var mockedDataReader = new Mock<IDataReader>();
bool readFlag = true;
mockedDataReader.Setup(x => x.Read()).Returns(() => readFlag).Callback(() => readFlag = false);

Step 2: Mock ExecuteReader

Before we mock “ExecuteReader” method we need to setup the response data. So when I call dr[“EmpID”]

I get my desired mocked value. We can achieve this like below:-

mockedDataReader.Setup(x => x["EmpID"]).Returns("43527");  
mockedDataReader.Setup(x => x["EmployeeName"]).Returns("Smith");  
mockedDataReader.Setup(x => x["Location"]).Returns("London");

Now we will mock the “ExecuteReader” method, which will return our mocked object.

Mock<DbProviderFactory> mockedDBFactory = new Mock<DbProviderFactory>();
Mock<Database> mockedDB = new Mock<Database>("MockedDB", mockedDBFactory.Object);
mockedDB.Setup(x => x.ExecuteReader(It.IsAny<DbCommand>())).Returns(mockedDataReader.Object);

The above way is same like “ExecuteScalar” and “ExecuteNonQuery” but here we are returning our customDataReader object. Below is how the complete method will look like.

private static Mock<Database> MockExecuteReader(Dictionary<string, object> returnValues)
    var mockedDataReader = new Mock<IDataReader>();
    bool readFlag = true;
    mockedDataReader.Setup(x => x.Read()).Returns(() => readFlag).Callback(() => readFlag = false);
    foreach (KeyValuePair<string, object> keyVal in returnValues)
        mockedDataReader.Setup(x => x[keyVal.Key]).Returns(keyVal.Value);
    Mock<DbProviderFactory> mockedDBFactory = new Mock<DbProviderFactory>();
    Mock<Database> mockedDB = new Mock<Database>("MockedDB", mockedDBFactory.Object);
    mockedDB.Setup(x => x.ExecuteReader(It.IsAny<DbCommand>())).Returns(mockedDataReader.Object);
    return mockedDB;

There might be cases where you want to select multiple rows from database.

Before I start explaining about mocking multiple rows, let me explain one tricky thing in Return() function.

Let say I mocked a method, which I am calling multiple times in my code.


The code above might look OK at first glance. But it will give “Third” as output every time.

Here anonymous functions come real handy but we need to ensure that we get output in certain order. We can achieve it by using Queue. The code will look something like this:-

Queue<object> responseQueue = new Queue<object>();

If you observe the Returns() method will now invoke an anonymous method which will dequeue the values one by one.

For returning multiple rows we will need something similar where we need to Dequeue() each row one by one. The completed method will look like below:-

private static Mock<Database> MockExecuteReader(List<Dictionary<string, object>> returnValues)
  var mockedDataReader = new Mock<IDataReader>();
  int count = 0;
  Queue<object> responseQueue = new Queue<object>();
  mockedDataReader.Setup(x => x.Read()).Returns(() => count<returnValues.Count).Callback(() => count++);
  returnValues.ForEach(rows =>
   foreach (KeyValuePair<string, object> keyVal in rows)
       mockedDataReader.Setup(x => x[keyVal.Key]).Returns(()=>responseQueue.Dequeue());
  Mock<DbProviderFactory> mockedDBFactory = new Mock<DbProviderFactory>();
  Mock<Database> mockedDB = new Mock<Database>("MockedDB", mockedDBFactory.Object);
  mockedDB.Setup(x => x.ExecuteReader(It.IsAny<DbCommand>())).Returns(mockedDataReader.Object);
  return mockedDB;

If you observe the mocking of Read(), it is based on the length of the no. of mocked datarows you want to return (length of the List<>).

On each Callback() a local variable count is incremented so that when it exceeds the number of datarows Read()method will return false.

You can implement the techniques of anonymous methods, Callback method and Queuing in all your unit tests. While mocking Repositories you can use these generic methods to Mock your databases.

Posted in ASP.NET MVC, C#, Database, Uncategorized | Leave a comment

Populating a SelectList from a DataTable


public static SelectList ToSelectList(this DataTable table, string valueField, string textField)
    List<SelectListItem> list = new List<SelectListItem>();

    foreach (DataRow row in table.Rows)
        list.Add(new SelectListItem() 
            Text = row[textField].ToString(), 
            Value = row[valueField].ToString()

    return new SelectList(list, "Value", "Text");

public static System.Web.Mvc.SelectList DT2SelectList(DataTable dt, string valueField, string textField){            
        if (dt == null || valueField == null || valueField.Trim().Length == 0
            || textField == null || textField.Trim().Length ==0)
            return null;

        var list = new List<Object>();

        for (int i = 0; i < dt.Rows.Count; i++)
                value = dt.Rows[i][valueField].ToString(),
                text = dt.Rows[i][textField].ToString()
        return new System.Web.Mvc.SelectList(list.AsEnumerable(), "value", "text");
Posted in ASP.NET MVC, C#, Uncategorized | Leave a comment

Time-series data: Why (and how) to use a relational database instead of NoSQL


These days, time-series data applications (e.g., data center / server / microservice / container monitoring, sensor / IoT analytics, financial data analysis, etc.) are proliferating.

As a result, time-series databases are in fashion (here are 33 of them). Most of these renounce the trappings of a traditional relational database and adopt what is generally known as a NoSQL model. Usage patterns are similar: a recent survey showed that developers preferred NoSQL to relational databases for time-series data by over 2:1.

Relational databases include: MySQL, MariaDB Server, PostgreSQL. NoSQL databases include: Elastic, InfluxDB, MongoDB, Cassandra, Couchbase, Graphite, Prometheus, ClickHouse, OpenTSDB, DalmatinerDB, KairosDB, RiakTS. Source:

Typically, the reason for adopting NoSQL time-series databases comes down to scale. While relational databases have many useful features that most NoSQL databases do not (robust secondary index support; complex predicates; a rich query language; JOINs, etc), they are difficult to scale.

And because time-series data piles up very quickly, many developers believe relational databases are ill-suited for it.

We take a different, somewhat heretical stance: relational databases can be quite powerful for time-series data. One just needs to solve the scaling problem. That is what we do in TimescaleDB.

When we announced TimescaleDB two weeks ago, we received a lot of positive feedback from the community. But we also heard from skeptics, who found it hard to believe that one should (or could) build a scalable time-series database on a relational database (in our case, PostgreSQL).

There are two separate ways to think about scaling: scaling up so that a single machine can store more data, and scaling out so that data can be stored across multiple machines.

Why are both important? The most common approach to scaling out across a cluster of N servers is to partition, or shard, a dataset into N partitions. If each server is limited in its throughput or performance (i.e., unable to scale up), then the overall cluster throughput is greatly reduced.

This post discusses scaling up. (A scaling-out post will be published on a later date.)

In particular, this post explains:

  • Why relational databases do not normally scale up well
  • How LSM trees (typically used in NoSQL databases) do not adequately solve the needs of many time-series applications
  • How time-series data is unique, how one can leverage those differences to overcome the scaling problem, and some performance results

Our motivations are twofold: for anyone facing similar problems, to share what we’ve learned; and for those considering using TimescaleDB for time-series data (including the skeptics!), to explain some of our design decisions.

Why databases do not normally scale up well: Swapping in/out of memory is expensive

A common problem with scaling database performance on a single machine is the significant cost/performance trade-off between memory and disk. While memory is faster than disk, it is much more expensive: about 20x costlier than solid-state storage like Flash, 100x more expensive than hard drives. Eventually, our entire dataset will not fit in memory, which is why we’ll need to write our data and indexes to disk.

This is an old, common problem for relational databases. Under most relational databases, a table is stored as a collection of fixed-size pages of data (e.g., 8KB pages in PostgreSQL), on top of which the system builds data structures (such as B-trees) to index the data. With an index, a query can quickly find a row with a specified ID (e.g., bank account number) without scanning the entire table or “walking” the table in some sorted order.

Now, if the working set of data and indexes is small, we can keep it in memory.

But if the data is sufficiently large that we can’t fit all (similarly fixed-size) pages of our B-tree in memory, then updating a random part of the tree can involve significant disk I/O as we read pages from disk into memory, modify in memory, and then write back out to disk (when evicted to make room for other B-tree pages). And a relational database like PostgreSQL keeps a B-tree (or other data structure) for each table index, in order for values in that index to be found efficiently. So, the problem compounds as you index more columns.

In fact, because the database only accesses the disk in page-sized boundaries, even seemingly small updates can cause these swaps to occur: To change one cell, the database may need to swap out an existing 8KB page and write it back to disk, then read in the new page before modifying it.

But why not use smaller- or variable-sized pages? There are two good reasons: minimizing disk fragmentation, and (in case of a spinning hard disk) minimizing the overhead of the “seek time” (usually 5–10ms) required in physically moving the disk head to a new location.

What about solid-state drives (SSDs)? While solutions like NAND Flash drives eliminate any physical “seek” time, they can only be read from or written to at the page-level granularity (today, typically 8KB). So, even to update a single byte, the SSD firmware needs to read an 8KB page from disk to its buffer cache, modify the page, then write the updated 8KB page back to a new disk block.

The cost of swapping in and out of memory can be seen in this performance graph from PostgreSQL, where insert throughput plunges with table size and increases in variance (depending on whether requests hit in memory or require (potentially multiple) fetches from disk).

Insert throughput as a function of table size for PostgreSQL 9.6.2, running with 10 workers on a Azure standard DS4 v2 (8 core) machine with SSD-based (premium LRS) storage. Clients insert individual rows into the database (each of which has 12 columns: a timestamp, an indexed randomly-chosen primary id, and 10 additional numerical metrics). The PostgreSQL rate starts over 15K inserts/second, but then begins to drop significantly after 50M rows and begins to experience very high variance (including periods of only 100s of inserts/sec).

Enter NoSQL databases with Log-Structured Merge Trees (and new problems)

About a decade ago, we started seeing a number of “NoSQL” storage systems address this problem via Log-structured merge (LSM) trees, which reduce the cost of making small writes by only performing larger append-only writes to disk.

Rather than performing “in-place” writes (where a small change to an existing page requires reading/writing that entire page from/to disk), LSM trees queue up several new updates (including deletes!) into pages and write them as a single batch to disk. In particular, all writes in an LSM tree are performed to a sorted table maintained in memory, which is then flushed to disk as an immutable batch when of sufficient size (as a “sorted string table”, or SSTable). This reduces the cost of making small writes.

In an LSM tree, all updates are first written a sorted table in memory, and then flushed to disk as an immutable batch, stored as an SSTable, which is often indexed in memory.

This architecture — which has been adopted by many “NoSQL” databases like LevelDB, Google BigTable, Cassandra, MongoDB (WiredTiger), and InfluxDB — may seem great at first. Yet it introduces other tradeoffs: higher memory requirements and poor secondary index support.

Higher-memory requirements: Unlike in a B-tree, in an LSM tree there is no single ordering: no global index to give us a sorted order over all keys. Consequently, looking up a value for a key gets more complex: first, check the memory table for the latest version of the key; otherwise, look to (potentially many) on-disk tables to find the latest value associated with that key. To avoid excessive disk I/O (and if the values themselves are large, such as the webpage content stored in Google’s BigTable), indexes for all SSTables may be kept entirely in memory, which in turn increases memory requirements.

Poor secondary index support: Given that they lack any global sorted order, LSM trees do not naturally support secondary indexes. Various systems have added some additional support, such as by duplicating the data in a different order. Or, they emulate support for richer predicates by building their primary key as the concatenation of multiple values. Yet this approach comes with the cost of requiring a larger scan among these keys at query time, thus supporting only items with a limited cardinality (e.g., discrete values, not numeric ones).

There is a better approach to this problem. Let’s start by better understanding time-series data.

Time-series data is different

Let’s take a step back, and look at the original problem that relational databases were designed to solve. Starting from IBM’s seminal System R in the mid-1970s, relational databases were employed for what became known as online transaction processing (OLTP).

Under OLTP, operations are often transactional updates to various rows in a database. For example, think of a bank transfer: a user debits money from one account and credits another. This corresponds to updates to two rows (or even just two cells) of a database table. Because bank transfers can occur between any two accounts, the two rows that are modified are somewhat randomly distributed over the table.

Time-series data arises from many different settings: industrial machines; transportation and logistics; DevOps, datacenter, and server monitoring;, and financial applications.

Now let’s consider a few examples of time-series workloads:

  • DevOps/server/container monitoring. The system typically collects metrics about different servers or containers: CPU usage, free/used memory, network tx/rx, disk IOPS, etc. Each set of metrics is associated with a timestamp, unique server name/ID, and a set of tags that describe an attribute of what is being collected.
  • IoT sensor data. Each IoT device may report multiple sensor readings for each time period. As an example, for environmental and air quality monitoring this could include: temperature, humidity, barometric pressure, sound levels, measurements of nitrogen dioxide, carbon monoxide, particulate matter, etc. Each set of readings is associated with a timestamp and unique device ID, and may contain other metadata.
  • Financial data. Financial tick data may include streams with a timestamp, the name of the security, and its current price and/or price change. Another type of financial data is payment transactions, which would include a unique account ID, timestamp, transaction amount, as well as any other metadata. (Note that this data is different than the OLTP example above: here we are recording every transaction, while the OLTP system was just reflecting the current state of the system.)
  • Fleet/asset management. Data may include a vehicle/asset ID, timestamp, GPS coordinates at that timestamp, and any metadata.

In all of these examples, the datasets are a stream of measurements that involve inserting “new data” into the database, typically to the latest time interval. While it’s possible for data to arrive much later than when it was generated/timestamped, either due to network/system delays or because of corrections to update existing data, this is typically the exception, not the norm.

In other words, these two workloads have very different characteristics:

OLTP Writes

  • Primarily UPDATES
  • Randomly distributed (over the set of primary keys)
  • Often transactions across multiple primary keys

Time-series Writes

  • Primarily INSERTs
  • Primarily to a recent time interval
  • Primarily associated with both a timestamp and a separate primary key (e.g., server ID, device ID, security/account ID, vehicle/asset ID, etc.)

Why does this matter? As we will see, one can take advantage of these characteristics to solve the scaling-up problem on a relational database.

A new way: Adaptive time/space chunking

When previous approaches tried to avoid small writes to disk, they were trying to address the broader OLTP problem of UPDATEs to random locations. But as we just established, time-series workloads are different: writes are primarily INSERTS (not UPDATES), to a recent time interval (not a random location). In other words, time-series workloads are append only.

This is interesting: it means that, if data is sorted by time, we would always be writing towards the “end” of our dataset. Organizing data by time would also allow us to keep the actual working set of database pages rather small, and maintain them in memory. And reads, which we have spent less time discussing, could also benefit: if many read queries are to recent intervals (e.g., for real-time dashboarding), then this data would be already cached in memory.

At first glance, it may seem like indexing on time would give us efficient writes and reads for free. But once we want any other indexes (e.g., another primary key like server/device ID, or any secondary indexes), then this naive approach would revert us back to making random inserts into our B-tree for that index.

There is another way, which we call, “adaptive time/space chunking”. This is what we use in TimescaleDB.

TimescaleDB stores each chunk in an internal database table, so indexes only grow with the size of each chunk, not the entire hypertable. As inserts are largely to the more recent interval, that one remains in memory, avoiding expensive swaps to disk.

Instead of just indexing by time, TimescaleDB builds distinct tables by splitting data according to two dimensions: the time interval and a primary key (e.g., server/device/asset ID). We refer to these as chunks to differentiate them from partitions, which are typically defined by splitting the primary key space. Because each of these chunks are stored as a database table itself, and the query planner is aware of the chunk’s ranges (in time and keyspace), the query planner can immediately tell to which chunk(s) an operation’s data belongs. (This applies both for inserting rows, as well as for pruning the set of chunks that need to be touched when executing queries.)

The key benefit of this approach is that now all of our indexes are built only across these much smaller chunks (tables), rather than a single table representing the entire dataset. So if we size these chunks properly, we can fit the latest tables (and their B-trees) completely in memory, and avoid this swap-to-disk problem, while maintaining support for multiple indexes.

Approaches to implementing chunking

The two intuitive approaches to design this time/space chunking each have significant limitations:

Approach #1: Fixed-duration intervals

Under this approach, all chunks can have fixed, identical time intervals, e.g., 1 day. This works well if the volume of data collected per interval does not change. However, as services become popular, their infrastructure correspondingly expands, leading to more servers and more monitoring data. Similarly, successful IoT products will deploy ever more numbers of devices. And once we start writing too much data to each chunk, we’re regularly swapping to disk (and will find ourselves back at square one). On the flip side, choosing too-small intervals to start with leads to other performance downsides, e.g., having to touch many tables at query time.

Each chunk has a fixed duration in time. Yet if the data volume per time increases, then eventually chunk size becomes too large to fit in memory.

Approach #2: Fixed-sized chunks

With this approach, all chunks have fixed target sizes, e.g., 1GB. A chunk is written to until it reaches its maximum size, at which point it becomes “closed” and its time interval constraints become fixed. Later data falling within the chunk’s “closed” interval will still be written to the chunk, however, in order to preserve the correctness of the chunk’s time constraints.

A key challenge is that the time interval of the chunk depends on the order of data. Consider if data (even a single datapoint) arrives “early” by hours or even days, potentially due to a non-synchronized clock, or because of varying delays in systems with intermittent connectivity. This early datapoint will stretch out the time interval of the “open” chunk, while subsequent on-time data can drive the chunk over its target size. The insert logic for this approach is also more complex and expensive, driving down throughput for large batch writes (such as large COPY operations), as the database needs to make sure it inserts data in temporal order to determine when a new chunk should be created (even in the middle of an operation). Other problems exist for fixed- or max-size chunks as well, including time intervals that may not align well with data retention policies (“delete data after 30 days”).

Each chunk’s time interval is fixed only once its maximum size has been reached. Yet if data arrives early, this creates a large interval for the chunk, and the chunk eventually becomes too large to fit in memory.

TimescaleDB takes a third approach that couples the strengths of both approaches.

Approach #3: Adaptive intervals (our current design)

Chunks are created with a fixed interval, but the interval adapts from chunk-to-chunk based on changes in data volumes in order to hit maximum target sizes.

By avoiding open-ended intervals, this approach ensures that data arriving early doesn’t create too-long time intervals that will subsequently lead to over-large chunks. Further, like static intervals, it more naturally supports retention policies specified on time, e.g., “delete data after 30 days”. Given TimescaleDB’s time-based chunking, such policies are implemented by simply dropping chunks (tables) in the database. This means that individual files in the underlying file system can simply be deleted, rather than needing to delete individual rows, which requires erasing/invalidating portions of the underlying file. Such an approach therefore avoids fragmentation in the underlying database files, which in turn avoids the need for vacuuming. And this vacuuming can be prohibitively expensive in very large tables.

Still, this approach ensures that chunks are sized appropriately so that the latest ones can be maintained in memory, even as data volumes may change.

Partitioning by primary key then takes each time interval and further splits it into a number of smaller chunks, which all share the same time interval but are disjoint in terms of their primary keyspace. This enables better parallelization both on servers with multiple disks — for both inserts and queries — — as well as multiple servers. More on these issues in a later post.

If the data volume per time increases, then chunk interval decreases to maintain right-sized chunks.
If data arrives early, then data is stored into a “future” chunk to maintain right-sized chunks.

Result: 15x improvement in insert rate

Keeping chunks at the right size is how we achieve our INSERT results that surpass vanilla PostgreSQL, that Ajay already showed in his earlier post.

Insert throughput of TimescaleDB vs. PostgreSQL, using the same workload as described earlier. Unlike vanilla PostgreSQL, TimescaleDB maintains a constant insert rate (of about 14.4K inserts/second, or 144K metrics/second, with very low variance), independent of dataset size.

This consistent insert throughput also persists when writing large batches of rows in single operations to TimescaleDB (instead of row-by-row). Such batched inserts are common practice for databases employed in more high-scale production environments, e.g., when ingesting data from a distributed queue like Kafka. In such scenarios, a single Timescale server can ingest 130K rows (or 1.3M metrics) per second, approximately 15x that of vanilla PostgreSQL once the table has reached a couple 100M rows.

Insert throughput of TimescaleDB vs. PostgreSQL when performing INSERTs of 10,000-row batches.


A relational database can be quite powerful for time-series data. Yet, the costs of swapping in/out of memory significantly impacts their performance. But NoSQL approaches that implement Log Structured Merge Trees have only shifted the problem, introducing higher memory requirements and poor secondary index support.

By recognizing that time-series data is different, we are able to organize data in a new way: adaptive time/space chunking. This minimizes swapping to disk by keeping the working data set small enough to fit inside memory, while allowing us to maintain robust primary and secondary index support (and the full feature set of PostgreSQL). And as a result, we are able to scale upPostgreSQL significantly, resulting in a 15x improvement on insert rates.

But what about performance comparisons to NoSQL databases? That post is coming soon.

In the meantime, you can download the latest version of TimescaleDB, released under the permissive Apache 2 license, on GitHub.

Posted in Business Model, Integration, Problem solving, Programming | Leave a comment

How to architect Online Payment Processing System for an online store?

Source from

So if you’ve decided to develop an On-line Payments system for your e-commerce I really advise you to read this short article.

What aspects should I consider before building Payment Processing System?

  1. PCI DSS for Credit Card (CC) Processing, which stands for Payment Card Industry Data Security Standard that has 12 rules that enforce some level of security to protect Credit Card information, but can be applied for any Personal Identifiable Information. In order to process credit cards you may be subjected to PCI DSS Audit and certification, which may imply big costs or personal liabilities.
  2. Security and Encryption. This aspect is closely related to PCI DSS, which is enforces multiple processes into your software development process. However, you don’t have to process credit cards to worry about security, but security should be of a high stake for your team and every team member, because it is very hard to gain trust and super easy to get it lost.
  3. Geography. This is very important subject on which will depend the list of methods of payment you should accept, which localisations and cultures should be supported, where your servers should be and how fast they should perform.
  4. Traffic and Scalability. Software Requirements, as well as Payments Processing, differs depending on the scale of your system. If you do a few sales per day you can 100% outsource Payment Processing to a Payment Service Provider (aka Stripe), but if you must process millions or billions per year than your system architecture and amount of partners will differ, as a result, complexity will skyrocket comparably to the basic case.
  5. MCC codes Or the Industry you are in. Depending on the industry your business is in effect on your architecture can be dramatic, system design and legal implications. If you do Poker, Gambling or Adult (18+) payment processing you will see pretty big difference and risks comparably to a e-commerce shop, as well as required knowledge and legal restrictions and regulations.
  6. Backup payment processors. If you need to handle retries and backup Payment Service Providers you may be forced to have different architecture and security restrictions.
  7. Operations and customer support. Thinking about Payment architecture you should not forget about your Customer Happiness team as well as Fraud and Business analysts who need to consume and reconfigure system “during the flight”.
  8. Business acquisition and merges. If your business has parent-child relationships with other companies and they have built a payment processing platform that can be reused you can save a lot of time and money. From another side, if you a CTO of an umbrella company you may need to work on a processing system that may be a Gateway to other companies in the group.
  9. Cloud vs On-Premise. This point comes more like “To be or not to be” in the current days and before saying that cloud is not security or reliable I would advise to check this page. And reliability mostly depends on the Engineering that rules the system.
  10. Analytics. If you want to be profitable and improve over time you will have to have analytics that can influence system architecture dramatically to help your team answer many questions. Another important point here is the duration of your waiting period before you need your answers (minutes vs days) which may have difference between OLAP or Data Streaming approaches.
  11. Fraud (External or Internal). Yes, the fraud can be internal as well, so developing payment processing system you always need to think about internal and external breaches, data trading or similar issues. Here you can employ already standard rule-base systems and extend them with machine learning systems and manual ad-hoc reviews.
  12. Mobile. Whether you need to support Native Mobile Application or not, mobile may have addition architectural and deployment implications. For example, you have a native App, but you cannot control when the user will update it, so you may be forced to support a huge set of versions and API, but knowing this upfront may help you a lot. And don’t forget that in the AppStores you cannot use your Payment system to pay for In-App purchases (and you may not want this) which has 30% revenue share, but if you sell good or services which are consumed outside of an App you may willingly go for off-App Payment Processing, which brings additional complexity.

There are may be more aspects that you need to consider while architecting payment processing system, but I believe I covered the most important ones.

PCI DSS requirements scope

Let me continue with Payment Processing architecture evolution (one of the possible branches and I would be happy to hear in comments more versions).

Architectural Evolution of Payment Processing Systems.

It is wise to begin with the most simple pragmatic solution, but always have future scalability and growth in mind (develop isolated in-proc services which can be further be decoupled/extracted).

Don’t do it at all or outsource as much as possible.

  1. Don’t do it at all or outsource as much as possible. I would go for this solution if it satisfies business and technical requirements. Why? Because Payments domain is complex and regulated to some extend, which can be very time consuming and not always rewarding. Outsourcing can be as simple as just redirecting a user to your payment partner to make a transaction where your customer will see branded payment method selection page that returns to your side after the transaction has been completed or cancelled. In this case you need to handle redirection and a bit of Payment State to not allow duplicates and receive notifications properly. Examples here can be Stripe, Adyen, GlobalCollect etc, however, normally Payment Processing companies charge you additional fees only for that branded page on top of processing and other fees. So if you just start your business 100% Payment Processing outsourcing is the best option OR I would say is the ONLY liable option.
  2. Process everything yourself, but outsource Credit Cards. When you grow you may need/want/desire more control over payment processing, but you still don’t want or are not ready to bother with PCI DSS certification (even Level 3–4 for merchants), so you may choose to make your payment method selection page and store all Payment Accounts/PII (Personal Identifiable Information), but outsource Credit Card entry forms and processing. For this end previous solutions, you probably will begin with monolith system and store payment accounts in your local database, which is part of your main database. So you will face high coupling, but ease of deployment and, to some extend, development. However, even having system deployed as a monolith I would recommend start thinking about decoupling, testability, scalability from the very beginning, because it is just plain easier to do this when you start.
  3. SOA and employ batching. This case this is natural evolution from the previous step where you see some parts of your system needed to be scaled/secured independently. For instance, you want to pay more attention to security of Payment Accounts and user data, so you extract Payment Account and User Account services where you may encrypt data and stored in separated databases OR put a HAProxy for those services separately and deploy them in a separated backend farm behind DMZ. Additional benefit can be employing of background processing in batches (or not) to offload front-end a bit, however, here you can have another issues which you might not have faced before — asynchronous payment processing and it’s complexity. But if you can please start with Asynchronous Payment Processing, it does imply that your checkout process is asynchronous, but it gives you huge benefits. For instance, Amazon and PokerStars do this, so they have time to deal with scalability and other issues.
  4. Utilize queues to make your system more reliable and resistant to change and load. As part of this step you can create software design where all communication, or most of it, between components done via queues (where async is permitted). For example, you may have queues for internal Fraud scoring, external Fraud scoring (aka MaxMind or ThreadMetrics), storage, notifications of other systems, a queue for every Payment Processor and many more moving pieces.
  5. Microservices and related parts. This is very trendy now, but it has a lot of benefits, as well as disadvantages, especially for Payments. If it is done well it can be very beneficial and cost efficient, however, to do it well you need to consider all aspects of the system, such as: deployment, monitoring, service discovery, reliability, security, administration, automation etc (I don’t say that all systems above should forget about these aspects, but when it was internal communication within those architectures here it may be out-of-proc communication and networks issues). This article is not about Mircoservices, but they can be used and make sense only at a big scale Merchants or Payment Service Providers. Contrary, even at a medium size business if the software development team has expertise managing Mircoservices.
  6. Multi Availability Zone (AZ) deployments. When you process payments across Globe you need to think better about latency and reliability which can be reached with Multi AZ installations. I put this as a separated point only to highlight the complexity, but most probably it will be employed from the monolith architecture already to have higher up-time and improve deployment (Red-Green-Blue). There are many issues come from multiple AZ, such as replication, split-brain and synchronisation, data gathering and analysis, traffic routing and monitoring, and of course security.

Microservices will work perfectly for Payments domain, but add a lot of additional complexity.

In relation to other aspects of your system, like mobile, you will need to consider mobile clients like another UI for your system, however, there is a difference that you don’t control client versions and lifecycle, so you may need to support multiple versions of your Payments API.

In summary, consider Online-Payment Processing System as a software system that has higher security and reliability requirements, because literally every call has direct financial impact on the bottom line of your business.

Thank you for reading and I look forward to reading your feedback.

Posted in ASP.NET MVC, Business Model, Integration, Knowledge, Technology, Web 2.0 API vs RSS Reader | Leave a comment