Populating a SelectList from a DataTable

From https://stackoverflow.com/questions/1439374/populating-a-selectlist-from-a-datatable

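using System.Collections.Generic;
using System.Data;
using System.Web.Mvc;

// Extension methods must be declared in a static class; the class name here is arbitrary.
public static class DataTableExtensions
{
    // Builds a SelectList from two columns of a DataTable.
    public static SelectList ToSelectList(this DataTable table, string valueField, string textField)
    {
        List<SelectListItem> list = new List<SelectListItem>();

        foreach (DataRow row in table.Rows)
        {
            list.Add(new SelectListItem()
            {
                Text = row[textField].ToString(),
                Value = row[valueField].ToString()
            });
        }

        // "Value" and "Text" name the SelectListItem properties to bind to.
        return new SelectList(list, "Value", "Text");
    }
}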


// Alternative: a plain static helper (not an extension method) that returns
// null when the table or either column name is missing.
public static System.Web.Mvc.SelectList DT2SelectList(DataTable dt, string valueField, string textField)
{
    if (dt == null || string.IsNullOrWhiteSpace(valueField) || string.IsNullOrWhiteSpace(textField))
        return null;

    var list = new List<object>();

    for (int i = 0; i < dt.Rows.Count; i++)
    {
        list.Add(new
        {
            value = dt.Rows[i][valueField].ToString(),
            text = dt.Rows[i][textField].ToString()
        });
    }

    // SelectList reads the "value" and "text" properties of each anonymous object.
    return new System.Web.Mvc.SelectList(list, "value", "text");
}

Time-series data: Why (and how) to use a relational database instead of NoSQL

From https://blog.timescale.com/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c

These days, time-series data applications (e.g., data center / server / microservice / container monitoring, sensor / IoT analytics, financial data analysis, etc.) are proliferating.

As a result, time-series databases are in fashion (here are 33 of them). Most of these renounce the trappings of a traditional relational database and adopt what is generally known as a NoSQL model. Usage patterns are similar: a recent survey showed that developers preferred NoSQL to relational databases for time-series data by over 2:1.

Relational databases include: MySQL, MariaDB Server, PostgreSQL. NoSQL databases include: Elastic, InfluxDB, MongoDB, Cassandra, Couchbase, Graphite, Prometheus, ClickHouse, OpenTSDB, DalmatinerDB, KairosDB, RiakTS. Source: https://www.percona.com/blog/2017/02/10/percona-blog-poll-database-engine-using-store-time-series-data/

Typically, the reason for adopting NoSQL time-series databases comes down to scale. While relational databases have many useful features that most NoSQL databases do not (robust secondary index support; complex predicates; a rich query language; JOINs, etc), they are difficult to scale.

And because time-series data piles up very quickly, many developers believe relational databases are ill-suited for it.

We take a different, somewhat heretical stance: relational databases can be quite powerful for time-series data. One just needs to solve the scaling problem. That is what we do in TimescaleDB.

When we announced TimescaleDB two weeks ago, we received a lot of positive feedback from the community. But we also heard from skeptics, who found it hard to believe that one should (or could) build a scalable time-series database on a relational database (in our case, PostgreSQL).

There are two separate ways to think about scaling: scaling up so that a single machine can store more data, and scaling out so that data can be stored across multiple machines.

Why are both important? The most common approach to scaling out across a cluster of N servers is to partition, or shard, a dataset into N partitions. If each server is limited in its throughput or performance (i.e., unable to scale up), then the overall cluster throughput is greatly reduced.

This post discusses scaling up. (A scaling-out post will be published at a later date.)

In particular, this post explains:

  • Why relational databases do not normally scale up well
  • Why LSM trees (typically used in NoSQL databases) do not adequately meet the needs of many time-series applications
  • How time-series data is unique, how one can leverage those differences to overcome the scaling problem, and some performance results

Our motivations are twofold: for anyone facing similar problems, to share what we’ve learned; and for those considering using TimescaleDB for time-series data (including the skeptics!), to explain some of our design decisions.


Why databases do not normally scale up well: Swapping in/out of memory is expensive

A common problem with scaling database performance on a single machine is the significant cost/performance trade-off between memory and disk. While memory is faster than disk, it is much more expensive: about 20x costlier than solid-state storage like Flash, 100x more expensive than hard drives. Eventually, our entire dataset will not fit in memory, which is why we’ll need to write our data and indexes to disk.

This is an old, common problem for relational databases. Under most relational databases, a table is stored as a collection of fixed-size pages of data (e.g., 8KB pages in PostgreSQL), on top of which the system builds data structures (such as B-trees) to index the data. With an index, a query can quickly find a row with a specified ID (e.g., bank account number) without scanning the entire table or “walking” the table in some sorted order.

Now, if the working set of data and indexes is small, we can keep it in memory.

But if the data is sufficiently large that we can’t fit all (similarly fixed-size) pages of our B-tree in memory, then updating a random part of the tree can involve significant disk I/O as we read pages from disk into memory, modify in memory, and then write back out to disk (when evicted to make room for other B-tree pages). And a relational database like PostgreSQL keeps a B-tree (or other data structure) for each table index, in order for values in that index to be found efficiently. So, the problem compounds as you index more columns.

In fact, because the database only accesses the disk in page-sized boundaries, even seemingly small updates can cause these swaps to occur: To change one cell, the database may need to swap out an existing 8KB page and write it back to disk, then read in the new page before modifying it.
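The effect can be sketched with a small simulation. This is only an illustrative model, not how PostgreSQL manages its buffer cache: it assumes each key maps to a fixed page (`ROWS_PER_PAGE` is a made-up constant), keeps a small LRU cache of pages, and counts how often an insert needs a page that is not in memory. Randomly distributed keys miss almost every time, while keys that always land at the "end" of the dataset miss only once per page.

```python
from collections import OrderedDict
import random

ROWS_PER_PAGE = 100  # assumption: how many index entries share one page


def count_page_faults(keys, cache_pages):
    """Count inserts whose target page is not already in an LRU page cache."""
    cache = OrderedDict()  # page number -> True, ordered oldest to newest
    faults = 0
    for key in keys:
        page = key // ROWS_PER_PAGE
        if page in cache:
            cache.move_to_end(page)  # mark as most recently used
            continue
        faults += 1  # the page would have to be read from disk
        cache[page] = True
        if len(cache) > cache_pages:
            cache.popitem(last=False)  # evict the least recently used page
    return faults


N = 100_000
rng = random.Random(0)
random_faults = count_page_faults((rng.randrange(N) for _ in range(N)), cache_pages=50)
sequential_faults = count_page_faults(range(N), cache_pages=50)
print(random_faults, sequential_faults)
```

With 1,000 pages in total and room for only 50 in the cache, the sequential workload faults exactly once per page, while the random workload faults on the large majority of its inserts.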

But why not use smaller- or variable-sized pages? There are two good reasons: minimizing disk fragmentation, and (in case of a spinning hard disk) minimizing the overhead of the “seek time” (usually 5–10ms) required in physically moving the disk head to a new location.

What about solid-state drives (SSDs)? While solutions like NAND Flash drives eliminate any physical “seek” time, they can only be read from or written to at the page-level granularity (today, typically 8KB). So, even to update a single byte, the SSD firmware needs to read an 8KB page from disk to its buffer cache, modify the page, then write the updated 8KB page back to a new disk block.

The cost of swapping in and out of memory can be seen in this performance graph from PostgreSQL, where insert throughput plunges with table size and increases in variance (depending on whether requests hit in memory or require (potentially multiple) fetches from disk).

Insert throughput as a function of table size for PostgreSQL 9.6.2, running with 10 workers on an Azure standard DS4 v2 (8 core) machine with SSD-based (premium LRS) storage. Clients insert individual rows into the database (each of which has 12 columns: a timestamp, an indexed randomly-chosen primary id, and 10 additional numerical metrics). The PostgreSQL rate starts over 15K inserts/second, but then begins to drop significantly after 50M rows and begins to experience very high variance (including periods of only 100s of inserts/sec).


Enter NoSQL databases with Log-Structured Merge Trees (and new problems)

About a decade ago, we started seeing a number of “NoSQL” storage systems address this problem via Log-structured merge (LSM) trees, which reduce the cost of making small writes by only performing larger append-only writes to disk.

Rather than performing “in-place” writes (where a small change to an existing page requires reading/writing that entire page from/to disk), LSM trees queue up several new updates (including deletes!) into pages and write them as a single batch to disk. In particular, all writes in an LSM tree are performed to a sorted table maintained in memory, which is then flushed to disk as an immutable batch when of sufficient size (as a “sorted string table”, or SSTable). This reduces the cost of making small writes.

In an LSM tree, all updates are first written to a sorted table in memory, and then flushed to disk as an immutable batch, stored as an SSTable, which is often indexed in memory.
(Source: https://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/)
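To make the write and read paths concrete, here is a toy sketch of the idea in Python. It is not the implementation of any of the systems named in this post: the class name `ToyLSM`, the `memtable_limit` parameter, and the tombstone sentinel are all assumptions for illustration, and "disk" is just a list of immutable sorted tuples. It does show the two properties discussed here: writes only ever touch the in-memory table, and a lookup may have to consult several flushed tables, newest first.

```python
import bisect

TOMBSTONE = object()  # marker written in place of a value to record a delete


class ToyLSM:
    def __init__(self, memtable_limit=3):
        self.memtable = {}   # in-memory table receiving all writes
        self.sstables = []   # immutable sorted batches, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)  # deletes are just another queued update

    def flush(self):
        """Write the memtable out as one sorted, immutable batch."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        """Check memory first, then each flushed table from newest to oldest."""
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))  # binary search on the key
            if i < len(table) and table[i][0] == key:
                value = table[i][1]
                return None if value is TOMBSTONE else value
        return None
```

Note that nothing here gives a single sorted order over all keys: each flushed table is sorted only internally, which is why `get` may visit many tables.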

This architecture — which has been adopted by many “NoSQL” databases like LevelDB, Google BigTable, Cassandra, MongoDB (WiredTiger), and InfluxDB — may seem great at first. Yet it introduces other tradeoffs: higher memory requirements and poor secondary index support.

Higher-memory requirements: Unlike in a B-tree, in an LSM tree there is no single ordering: no global index to give us a sorted order over all keys. Consequently, looking up a value for a key gets more complex: first, check the memory table for the latest version of the key; otherwise, look to (potentially many) on-disk tables to find the latest value associated with that key. To avoid excessive disk I/O (and if the values themselves are large, such as the webpage content stored in Google’s BigTable), indexes for all SSTables may be kept entirely in memory, which in turn increases memory requirements.

Poor secondary index support: Given that they lack any global sorted order, LSM trees do not naturally support secondary indexes. Various systems have added some additional support, such as by duplicating the data in a different order. Or, they emulate support for richer predicates by building their primary key as the concatenation of multiple values. Yet this approach comes with the cost of requiring a larger scan among these keys at query time, thus supporting only items with a limited cardinality (e.g., discrete values, not numeric ones).

There is a better approach to this problem. Let’s start by better understanding time-series data.


Time-series data is different

Let’s take a step back, and look at the original problem that relational databases were designed to solve. Starting from IBM’s seminal System R in the mid-1970s, relational databases were employed for what became known as online transaction processing (OLTP).

Under OLTP, operations are often transactional updates to various rows in a database. For example, think of a bank transfer: a user debits money from one account and credits another. This corresponds to updates to two rows (or even just two cells) of a database table. Because bank transfers can occur between any two accounts, the two rows that are modified are somewhat randomly distributed over the table.

Time-series data arises in many different settings: industrial machines; transportation and logistics; DevOps, datacenter, and server monitoring; and financial applications.

Now let’s consider a few examples of time-series workloads:

  • DevOps/server/container monitoring. The system typically collects metrics about different servers or containers: CPU usage, free/used memory, network tx/rx, disk IOPS, etc. Each set of metrics is associated with a timestamp, unique server name/ID, and a set of tags that describe an attribute of what is being collected.
  • IoT sensor data. Each IoT device may report multiple sensor readings for each time period. As an example, for environmental and air quality monitoring this could include: temperature, humidity, barometric pressure, sound levels, measurements of nitrogen dioxide, carbon monoxide, particulate matter, etc. Each set of readings is associated with a timestamp and unique device ID, and may contain other metadata.
  • Financial data. Financial tick data may include streams with a timestamp, the name of the security, and its current price and/or price change. Another type of financial data is payment transactions, which would include a unique account ID, timestamp, transaction amount, as well as any other metadata. (Note that this data is different than the OLTP example above: here we are recording every transaction, while the OLTP system was just reflecting the current state of the system.)
  • Fleet/asset management. Data may include a vehicle/asset ID, timestamp, GPS coordinates at that timestamp, and any metadata.

In all of these examples, the datasets are a stream of measurements that involve inserting “new data” into the database, typically to the latest time interval. While it’s possible for data to arrive much later than when it was generated/timestamped, either due to network/system delays or because of corrections to update existing data, this is typically the exception, not the norm.

In other words, these two workloads have very different characteristics:

OLTP Writes

  • Primarily UPDATEs
  • Randomly distributed (over the set of primary keys)
  • Often transactions across multiple primary keys

Time-series Writes

  • Primarily INSERTs
  • Primarily to a recent time interval
  • Primarily associated with both a timestamp and a separate primary key (e.g., server ID, device ID, security/account ID, vehicle/asset ID, etc.)

Why does this matter? As we will see, one can take advantage of these characteristics to solve the scaling-up problem on a relational database.


A new way: Adaptive time/space chunking

When previous approaches tried to avoid small writes to disk, they were trying to address the broader OLTP problem of UPDATEs to random locations. But as we just established, time-series workloads are different: writes are primarily INSERTs (not UPDATEs), to a recent time interval (not a random location). In other words, time-series workloads are largely append-only.

This is interesting: it means that, if data is sorted by time, we would always be writing towards the “end” of our dataset. Organizing data by time would also allow us to keep the actual working set of database pages rather small, and maintain them in memory. And reads, which we have spent less time discussing, could also benefit: if many read queries are to recent intervals (e.g., for real-time dashboarding), then this data would be already cached in memory.

At first glance, it may seem like indexing on time would give us efficient writes and reads for free. But once we want any other indexes (e.g., another primary key like server/device ID, or any secondary indexes), then this naive approach would revert us back to making random inserts into our B-tree for that index.

There is another way, which we call “adaptive time/space chunking”. This is what we use in TimescaleDB.

TimescaleDB stores each chunk in an internal database table, so indexes only grow with the size of each chunk, not the entire hypertable. As inserts are largely to the more recent interval, that one remains in memory, avoiding expensive swaps to disk.

Instead of just indexing by time, TimescaleDB builds distinct tables by splitting data according to two dimensions: the time interval and a primary key (e.g., server/device/asset ID). We refer to these as chunks to differentiate them from partitions, which are typically defined by splitting the primary key space. Because each of these chunks is stored as a database table itself, and the query planner is aware of the chunk’s ranges (in time and keyspace), the query planner can immediately tell to which chunk(s) an operation’s data belongs. (This applies both for inserting rows, as well as for pruning the set of chunks that need to be touched when executing queries.)

The key benefit of this approach is that now all of our indexes are built only across these much smaller chunks (tables), rather than a single table representing the entire dataset. So if we size these chunks properly, we can fit the latest tables (and their B-trees) completely in memory, and avoid this swap-to-disk problem, while maintaining support for multiple indexes.
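The routing and pruning described above can be sketched in a few lines of Python. This is a toy model, not TimescaleDB's code: the class `ToyHypertable`, its fixed `interval_seconds`, and the use of `zlib.crc32` to pick a space partition are all assumptions made for illustration. Each chunk is a separate small list (standing in for a separate table with its own indexes), and a query only visits the chunks whose time bucket and partition can contain matching rows.

```python
import zlib
from collections import defaultdict


class ToyHypertable:
    """Routes rows to small per-chunk tables keyed by (time bucket, space partition)."""

    def __init__(self, interval_seconds, space_partitions):
        self.interval_seconds = interval_seconds
        self.space_partitions = space_partitions
        self.chunks = defaultdict(list)  # (bucket, partition) -> rows

    def _partition(self, device_id):
        # crc32 is used only because it is stable across runs (an assumption).
        return zlib.crc32(device_id.encode("utf-8")) % self.space_partitions

    def chunk_key(self, ts, device_id):
        return (ts // self.interval_seconds, self._partition(device_id))

    def insert(self, ts, device_id, value):
        self.chunks[self.chunk_key(ts, device_id)].append((ts, device_id, value))

    def query(self, start_ts, end_ts, device_id):
        """Return rows for device_id with start_ts <= ts < end_ts, visiting only matching chunks."""
        partition = self._partition(device_id)
        first = start_ts // self.interval_seconds
        last = (end_ts - 1) // self.interval_seconds
        rows = []
        for bucket in range(first, last + 1):
            for row in self.chunks.get((bucket, partition), []):
                if start_ts <= row[0] < end_ts and row[1] == device_id:
                    rows.append(row)
        return rows
```

For example, with one-hour chunks, inserts timestamped within the current hour all land in the same few chunks, and a query over the last hour never touches older ones.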

Approaches to implementing chunking

The two intuitive approaches to design this time/space chunking each have significant limitations:

Approach #1: Fixed-duration intervals

Under this approach, all chunks can have fixed, identical time intervals, e.g., 1 day. This works well if the volume of data collected per interval does not change. However, as services become popular, their infrastructure correspondingly expands, leading to more servers and more monitoring data. Similarly, successful IoT products will deploy ever more devices. And once we start writing too much data to each chunk, we’re regularly swapping to disk (and will find ourselves back at square one). On the flip side, choosing too-small intervals to start with leads to other performance downsides, e.g., having to touch many tables at query time.

Each chunk has a fixed duration in time. Yet if the data volume per time increases, then eventually chunk size becomes too large to fit in memory.

Approach #2: Fixed-sized chunks

With this approach, all chunks have fixed target sizes, e.g., 1GB. A chunk is written to until it reaches its maximum size, at which point it becomes “closed” and its time interval constraints become fixed. Later data falling within the chunk’s “closed” interval will still be written to the chunk, however, in order to preserve the correctness of the chunk’s time constraints.

A key challenge is that the time interval of the chunk depends on the order of data. Consider if data (even a single datapoint) arrives “early” by hours or even days, potentially due to a non-synchronized clock, or because of varying delays in systems with intermittent connectivity. This early datapoint will stretch out the time interval of the “open” chunk, while subsequent on-time data can drive the chunk over its target size. The insert logic for this approach is also more complex and expensive, driving down throughput for large batch writes (such as large COPY operations), as the database needs to make sure it inserts data in temporal order to determine when a new chunk should be created (even in the middle of an operation). Other problems exist for fixed- or max-size chunks as well, including time intervals that may not align well with data retention policies (“delete data after 30 days”).

Each chunk’s time interval is fixed only once its maximum size has been reached. Yet if data arrives early, this creates a large interval for the chunk, and the chunk eventually becomes too large to fit in memory.

TimescaleDB takes a third approach that couples the strengths of both approaches.

Approach #3: Adaptive intervals (our current design)

Chunks are created with a fixed interval, but the interval adapts from chunk-to-chunk based on changes in data volumes in order to hit maximum target sizes.

By avoiding open-ended intervals, this approach ensures that data arriving early doesn’t create too-long time intervals that will subsequently lead to over-large chunks. Further, like static intervals, it more naturally supports retention policies specified on time, e.g., “delete data after 30 days”. Given TimescaleDB’s time-based chunking, such policies are implemented by simply dropping chunks (tables) in the database. This means that individual files in the underlying file system can simply be deleted, rather than needing to delete individual rows, which requires erasing/invalidating portions of the underlying file. Such an approach therefore avoids fragmentation in the underlying database files, which in turn avoids the need for vacuuming. And this vacuuming can be prohibitively expensive in very large tables.
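As a rough illustration of why time-aligned chunks make retention cheap, the sketch below models chunks as a dictionary keyed by their `(start, end)` time range and drops whole chunks whose range has fully expired. The function name and the data layout are hypothetical; in the real system the equivalent step is dropping tables, not manipulating a Python dictionary. The point is that no individual row is ever examined.

```python
def drop_expired_chunks(chunks, now, retention_seconds):
    """Split chunks into (kept, dropped_ranges) without looking at any rows.

    chunks maps a (start_ts, end_ts) range to that chunk's rows; a chunk is
    dropped only when its entire range is older than the retention cutoff.
    """
    cutoff = now - retention_seconds
    kept = {rng: rows for rng, rows in chunks.items() if rng[1] > cutoff}
    dropped = sorted(rng for rng in chunks if rng[1] <= cutoff)
    return kept, dropped


DAY = 24 * 3600
chunks = {
    (0 * DAY, 1 * DAY): ["old rows"],
    (1 * DAY, 2 * DAY): ["older rows"],
    (30 * DAY, 31 * DAY): ["recent rows"],
}
kept, dropped = drop_expired_chunks(chunks, now=31 * DAY, retention_seconds=30 * DAY)
print(dropped)
```

With "delete data after 30 days" evaluated at day 31, only the chunk covering day 0 lies entirely before the cutoff, so it alone is dropped.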

Still, this approach ensures that chunks are sized appropriately so that the latest ones can be maintained in memory, even as data volumes may change.

Partitioning by primary key then takes each time interval and further splits it into a number of smaller chunks, which all share the same time interval but are disjoint in terms of their primary keyspace. This enables better parallelization, both on servers with multiple disks (for both inserts and queries) and across multiple servers. More on these issues in a later post.

If the data volume per time increases, then chunk interval decreases to maintain right-sized chunks.
If data arrives early, then data is stored into a “future” chunk to maintain right-sized chunks.
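One simple way to picture the adaptation is a proportional rule: if the previous chunk ended up twice the target size, halve the interval for the next one. The sketch below is only that, a guess at a plausible rule for illustration; the function name, the clamping bounds, and the proportional formula itself are assumptions, not TimescaleDB's actual algorithm.

```python
def next_chunk_interval(prev_interval_s, prev_chunk_bytes, target_bytes,
                        min_interval_s=60, max_interval_s=7 * 24 * 3600):
    """Scale the next chunk's time interval so its expected size hits the target.

    Assumes the data rate stays roughly what it was during the previous chunk.
    """
    if prev_chunk_bytes <= 0:
        return prev_interval_s  # nothing observed, keep the current interval
    scaled = prev_interval_s * target_bytes / prev_chunk_bytes
    return int(min(max(scaled, min_interval_s), max_interval_s))


GB = 1_000_000_000
# A 1-day chunk grew to 2 GB against a 1 GB target: next chunk covers half a day.
print(next_chunk_interval(86_400, 2 * GB, 1 * GB))
```

Because each chunk still gets a fixed interval at creation time, a datapoint that arrives early simply falls into a later chunk's interval instead of stretching the current one.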

Result: 15x improvement in insert rate

Keeping chunks at the right size is how we achieve INSERT results that surpass vanilla PostgreSQL, as Ajay already showed in his earlier post.

Insert throughput of TimescaleDB vs. PostgreSQL, using the same workload as described earlier. Unlike vanilla PostgreSQL, TimescaleDB maintains a constant insert rate (of about 14.4K inserts/second, or 144K metrics/second, with very low variance), independent of dataset size.

This consistent insert throughput also persists when writing large batches of rows in single operations to TimescaleDB (instead of row-by-row). Such batched inserts are common practice for databases employed in more high-scale production environments, e.g., when ingesting data from a distributed queue like Kafka. In such scenarios, a single Timescale server can ingest 130K rows (or 1.3M metrics) per second, approximately 15x that of vanilla PostgreSQL once the table has reached a couple hundred million rows.

Insert throughput of TimescaleDB vs. PostgreSQL when performing INSERTs of 10,000-row batches.

Summary

A relational database can be quite powerful for time-series data. Yet the cost of swapping in and out of memory significantly impacts its performance. And NoSQL approaches that implement Log-Structured Merge Trees have only shifted the problem, introducing higher memory requirements and poor secondary index support.

By recognizing that time-series data is different, we are able to organize data in a new way: adaptive time/space chunking. This minimizes swapping to disk by keeping the working data set small enough to fit inside memory, while allowing us to maintain robust primary and secondary index support (and the full feature set of PostgreSQL). And as a result, we are able to scale up PostgreSQL significantly, resulting in a 15x improvement on insert rates.

But what about performance comparisons to NoSQL databases? That post is coming soon.

In the meantime, you can download the latest version of TimescaleDB, released under the permissive Apache 2 license, on GitHub.


How to architect Online Payment Processing System for an online store?

From https://medium.com/@distributedleo/how-to-architect-online-payment-processing-system-for-an-online-store-6dc84350a39

If you have decided to develop an online payment system for your e-commerce business, I strongly advise you to read this short article.

What aspects should I consider before building a Payment Processing System?

  1. PCI DSS for Credit Card (CC) processing. PCI DSS stands for Payment Card Industry Data Security Standard. It has 12 requirements that enforce a baseline level of security to protect credit card information, and the same principles can be applied to any Personally Identifiable Information. In order to process credit cards you may be subject to a PCI DSS audit and certification, which may imply significant costs or personal liability.
  2. Security and Encryption. This aspect is closely related to PCI DSS, which enforces multiple processes in your software development process. However, you do not have to process credit cards to worry about security: it should be a high priority for your team and every team member, because trust is very hard to gain and very easy to lose.
  3. Geography. This is a very important subject, because it determines which methods of payment you should accept, which localisations and cultures you should support, where your servers should be located, and how fast they should perform.
  4. Traffic and Scalability. Software requirements, as well as payment processing, differ depending on the scale of your system. If you make a few sales per day, you can outsource 100% of payment processing to a Payment Service Provider (e.g., Stripe). But if you must process millions or billions per year, then your system architecture and number of partners will differ and, as a result, complexity will skyrocket compared to the basic case.
  5. MCC codes, or the industry you are in. Depending on the industry your business is in, the effect on your architecture, system design, and legal obligations can be dramatic. If you process payments for poker, gambling, or adult (18+) content, you will see considerably different risks compared to an e-commerce shop, as well as different required knowledge, legal restrictions, and regulations.
  6. Backup payment processors. If you need to handle retries and fall back to backup Payment Service Providers, you may be forced to adopt a different architecture and different security restrictions.
  7. Operations and customer support. When thinking about payment architecture, do not forget your Customer Happiness team, as well as the Fraud and Business analysts who need to use and reconfigure the system “in flight”.
  8. Business acquisitions and mergers. If your business has parent-child relationships with other companies, and they have built a payment processing platform that can be reused, you can save a lot of time and money. On the other hand, if you are the CTO of an umbrella company, you may need to work on a processing system that acts as a gateway for other companies in the group.
  9. Cloud vs On-Premise. These days this point is more of a “to be or not to be” question. Before saying that the cloud is not secure or reliable, I would advise you to check this page. Reliability mostly depends on the engineering behind the system.
  10. Analytics. If you want to be profitable and improve over time, you will need analytics that help your team answer many questions, and this can influence the system architecture dramatically. Another important point is how long you can wait for your answers (minutes vs days), which may make the difference between OLAP and data streaming approaches.
  11. Fraud (External or Internal). Yes, fraud can be internal as well, so when developing a payment processing system you always need to think about internal and external breaches, data trading, and similar issues. Here you can employ standard rule-based systems and extend them with machine learning systems and manual ad-hoc reviews.
  12. Mobile. Whether or not you need to support a native mobile application, mobile may have additional architectural and deployment implications. For example, if you have a native app, you cannot control when users update it, so you may be forced to support a large set of app and API versions; knowing this upfront helps a lot. Also, do not forget that in the app stores you cannot use your own payment system for in-app purchases, which are subject to a 30% revenue share (and you may not want to). If you sell goods or services that are consumed outside of the app, however, you may deliberately choose off-app payment processing, which brings additional complexity.
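To give a feel for what the backup payment processors point (item 6) means in code, here is a minimal sketch in Python. Everything in it is hypothetical: the `ProcessorError` exception, the `(name, charge_function)` pairs, and the retry count are stand-ins, not the API of any real Payment Service Provider. It tries each processor in order, retries a limited number of times, and passes the same idempotency key on every attempt so that a processor which honours such keys can refuse to charge twice.

```python
class ProcessorError(Exception):
    """Raised by a (hypothetical) processor client when a charge attempt fails."""


def charge_with_failover(processors, amount_cents, idempotency_key, retries_per_processor=2):
    """Try each (name, charge) pair in order; return (name, receipt) on first success."""
    failures = []
    for name, charge in processors:
        for attempt in range(1, retries_per_processor + 1):
            try:
                receipt = charge(amount_cents, idempotency_key)
                return name, receipt
            except ProcessorError as exc:
                # Remember what went wrong, then retry or move to the next processor.
                failures.append((name, attempt, str(exc)))
    raise ProcessorError(f"all processors failed: {failures}")
```

Even this small sketch shows why the architecture changes: you now need to know whether a failed attempt really failed (a timeout may hide a successful charge), and every extra processor widens the surface you have to secure.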

There may be more aspects that you need to consider while architecting a payment processing system, but I believe I have covered the most important ones.

PCI DSS requirements scope

Let me continue with the evolution of Payment Processing architecture (this is one of the possible branches, and I would be happy to hear about other versions in the comments).

Architectural Evolution of Payment Processing Systems.

It is wise to begin with the simplest pragmatic solution, but always keep future scalability and growth in mind (develop isolated in-proc services which can later be decoupled/extracted).


  1. Don’t do it at all or outsource as much as possible. I would go for this solution if it satisfies business and technical requirements. Why? Because Payments domain is complex and regulated to some extend, which can be very time consuming and not always rewarding. Outsourcing can be as simple as just redirecting a user to your payment partner to make a transaction where your customer will see branded payment method selection page that returns to your side after the transaction has been completed or cancelled. In this case you need to handle redirection and a bit of Payment State to not allow duplicates and receive notifications properly. Examples here can be Stripe, Adyen, GlobalCollect etc, however, normally Payment Processing companies charge you additional fees only for that branded page on top of processing and other fees. So if you just start your business 100% Payment Processing outsourcing is the best option OR I would say is the ONLY liable option.
  2. Process everything yourself, but outsource Credit Cards. When you grow you may need/want/desire more control over payment processing, but you still don’t want or are not ready to bother with PCI DSS certification (even Level 3–4 for merchants), so you may choose to make your payment method selection page and store all Payment Accounts/PII (Personal Identifiable Information), but outsource Credit Card entry forms and processing. For this end previous solutions, you probably will begin with monolith system and store payment accounts in your local database, which is part of your main database. So you will face high coupling, but ease of deployment and, to some extend, development. However, even having system deployed as a monolith I would recommend start thinking about decoupling, testability, scalability from the very beginning, because it is just plain easier to do this when you start.
  3. SOA and batching. This is a natural evolution from the previous step, once you see that some parts of your system need to be scaled or secured independently. For instance, you may want to pay more attention to the security of payment accounts and user data, so you extract Payment Account and User Account services, where you can encrypt the data and store it in separate databases, or put HAProxy in front of those services and deploy them in a separate backend farm behind a DMZ. An additional benefit can be background processing, in batches or not, to offload the front end a bit. However, you may then face issues you have not seen before: asynchronous payment processing and its complexity. If you can, though, please start with asynchronous payment processing. It does imply that your checkout process is asynchronous, but it gives you huge benefits. For instance, Amazon and PokerStars do this, so they have time to deal with scalability and other issues.
  4. Utilize queues to make your system more reliable and more resistant to change and load. As part of this step you can create a software design in which all, or most, communication between components goes through queues (wherever async is permitted). For example, you may have queues for internal fraud scoring, external fraud scoring (e.g., MaxMind or ThreatMetrix), storage, notifications to other systems, a queue for every payment processor, and many more moving pieces.
  5. Microservices and related parts. This is very trendy now, and it has a lot of benefits as well as disadvantages, especially for payments. Done well, it can be very beneficial and cost efficient; however, to do it well you need to consider all aspects of the system, such as deployment, monitoring, service discovery, reliability, security, administration, and automation. (I am not saying that the architectures above can ignore these aspects, but where they relied on internal communication, here you may have out-of-process communication and network issues.) This article is not about microservices, but they generally make sense only for large-scale merchants or payment service providers. That said, they can also work at a medium-sized business if the software development team has expertise in managing microservices.
  6. Multi-Availability Zone (AZ) deployments. When you process payments across the globe you need to think harder about latency and reliability, which can be addressed with multi-AZ installations. I list this as a separate point only to highlight the complexity; most probably it will already be employed from the monolith stage onward to get higher uptime and improve deployments (Red-Green-Blue). Many issues come with multiple AZs, such as replication, split-brain and synchronisation, data gathering and analysis, traffic routing and monitoring, and of course security.
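
As a rough illustration of the queue-based, asynchronous flow described in steps 3 and 4, here is a minimal in-process sketch. Everything in it is an assumption made for illustration: the `Payment` type, the `fraud_score` rule, and the two worker functions are hypothetical, and Python's standard `queue.Queue` stands in for a real message broker.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Payment:
    payment_id: str
    amount_cents: int


def fraud_score(payment: Payment) -> float:
    # Hypothetical placeholder rule standing in for an internal or
    # external fraud scoring service.
    return 0.9 if payment.amount_cents > 100_000 else 0.1


def fraud_worker(inbox: queue.Queue, approved: queue.Queue, rejected: list) -> None:
    """Consume payments, score them, and forward the acceptable ones."""
    while True:
        payment = inbox.get()
        if payment is None:          # sentinel: no more work
            approved.put(None)       # propagate shutdown downstream
            break
        if fraud_score(payment) < 0.5:
            approved.put(payment)
        else:
            rejected.append(payment.payment_id)


def processor_worker(inbox: queue.Queue, settled: list) -> None:
    """Stand-in for the queue that feeds a single payment processor."""
    while True:
        payment = inbox.get()
        if payment is None:
            break
        settled.append(payment.payment_id)


def run_pipeline(payments):
    """Push payments through fraud scoring and then the processor queue."""
    fraud_q, processor_q = queue.Queue(), queue.Queue()
    settled, rejected = [], []
    workers = [
        threading.Thread(target=fraud_worker, args=(fraud_q, processor_q, rejected)),
        threading.Thread(target=processor_worker, args=(processor_q, settled)),
    ]
    for worker in workers:
        worker.start()
    for payment in payments:
        fraud_q.put(payment)         # the checkout returns immediately after this
    fraud_q.put(None)
    for worker in workers:
        worker.join()
    return settled, rejected
```

The point of the sketch is only that the front end enqueues work and returns, while each stage can be scaled or replaced independently; a production system would add persistence, retries, and idempotency on top.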

Microservices can work very well for the payments domain, but they add a lot of additional complexity.

For other aspects of your system, such as mobile, treat mobile clients as just another UI for your system. The difference is that you don’t control client versions and lifecycle, so you may need to support multiple versions of your Payments API.

In summary, treat an online payment processing system as a software system with higher security and reliability requirements, because literally every call has a direct financial impact on the bottom line of your business.
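
One possible way to keep several API versions alive side by side is to dispatch on a version identifier. The sketch below is only an assumption about how that could look: the version names, request fields, and handler functions are hypothetical and are not taken from any particular payments API.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def charge_v1(request: dict) -> dict:
    # Hypothetical v1 contract: amount is a plain number in major units.
    return {"status": "accepted", "amount_cents": round(request["amount"] * 100)}


def charge_v2(request: dict) -> dict:
    # Hypothetical v2 contract: integer minor units plus an explicit currency.
    return {
        "status": "accepted",
        "amount_cents": request["amount_cents"],
        "currency": request["currency"],
    }


# Old versions stay registered for as long as old clients are in the field.
HANDLERS: Dict[str, Handler] = {"v1": charge_v1, "v2": charge_v2}


def handle_charge(version: str, request: dict) -> dict:
    """Route a charge request to the handler for the client's API version."""
    handler = HANDLERS.get(version)
    if handler is None:
        return {"status": "error", "reason": f"unsupported API version: {version}"}
    return handler(request)
```

Because mobile clients upgrade on their own schedule, removing an entry from the hypothetical `HANDLERS` table is a product decision, not just a code cleanup.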

Thank you for reading and I look forward to reading your feedback.

Posted in ASP.NET MVC, Business Model, Integration, Knowledge, Technology, Web 2.0 API vs RSS Reader | Leave a comment

Deep Learning in Computer Vision

In recent years, Deep Learning has become a dominant Machine Learning tool for a wide variety of domains. One of its biggest successes has been in Computer Vision, where performance on problems such as object and action recognition has improved dramatically. In this course, we will be reading up on various Computer Vision problems and the state-of-the-art techniques involving different neural architectures, and brainstorming about promising new directions.

Please sign up here in the beginning of class.

This class is a graduate seminar course in computer vision. The class will cover a diverse set of topics in Computer Vision and various Neural Network architectures. It will be an interactive course where we will discuss interesting topics on demand and latest research buzz. The goal of the class is to learn about different domains of vision, understand, identify and analyze the main challenges, what works and what doesn’t, as well as to identify interesting new directions for future research.

Prerequisites: Courses in computer vision and/or machine learning (e.g., CSC320, CSC420, CSC411) are highly recommended (otherwise you will need some additional reading), and basic programming skills are required for projects.


  • Time and Location

    Winter 2016

    Day: Tuesday
    Time: 9am-11am
    Room: ES B149 (Earth Science Building at 5 Bancroft Avenue)

    Instructor

    Sanja Fidler

    Email: fidler@cs dot toronto dot edu
    Homepage: http://www.cs.toronto.edu/~fidler
    Office hours: by appointment (send email)

When emailing me, please put CSC2523 in the subject line.

Forum

This class uses piazza. On this webpage, we will post announcements and assignments. The students will also be able to post questions and discussions in a forum style manner, either to their instructors or to their peers.


We will have an invited speaker for this course:

  • Raquel Urtasun
    Assistant Professor, University of Toronto
    Talk title: Deep Structured Models

as well as several invited lectures / tutorials:

  • Yuri Burda, Postdoctoral Fellow, University of Toronto:    Lecture on Variational Autoencoders
  • Ryan Kiros, PhD student, University of Toronto:    Lecture on Recurrent Neural Networks and Neural Language Models
  • Jimmy Ba, PhD student, University of Toronto:    Lecture on Neural Programming
  • Yukun Zhu, MSc student, University of Toronto:    Lecture on Convolutional Neural Networks
  • Elman Mansimov, Research Assistant, University of Toronto:    Lecture on Image Generation with Neural Networks
  • Emilio Parisotto, MSc student, University of Toronto:    Lecture on Deep Reinforcement Learning
  • Renjie Liao, PhD student, University of Toronto:    Lecture on Highway and Residual Networks
  • Urban Jezernik, PhD student, University of Ljubljana:    Lecture on Music Generation


Each student will need to write two paper reviews each week, present once or twice in class (depending on enrollment), participate in class discussions, and complete a project (done individually or in pairs).

 

The final grade will consist of the following:

  • Participation (attendance, participation in discussions, reviews): 15%
  • Presentation (presentation of papers in class): 25%
  • Project (proposal, final report): 60%


The first class will present a short overview of neural network architectures; the details will be covered when reading on particular topics. Readings will touch on a diverse set of topics in Computer Vision. The course will be interactive: we will add interesting topics on demand, along with the latest research buzz.

 


Date Topic Reading / Material Speaker Slides
Jan 12 Admin & Introduction(s) Sanja Fidler admin
Convolutional Neural Networks
Jan 19 Convolutional Neural Nets(tutorial) Resources: Stanford’s cs231 class, VGG’s Practical CNN Tutorial
Code: CNN Tutorial for TensorFlow, Tutorial for Caffe, CNN Tutorial for Theano
Yukun Zhu
(invited)
[pdf]
Image Segmentation Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs   [PDF] [code]
L-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L Yuille
Shenlong Wang [pdf]
[code]
Jan 26 Very Deep Networks Highway Networks  [PDF] [code]
Rupesh Kumar Srivastava, Klaus Greff, Jurgen Schmidhuber

Deep Residual Learning for Image Recognition  [PDF]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Renjie Liao
(invited)
[pdf]
Object Detection Rich feature hierarchies for accurate object detection and semantic segmentation   [PDF] [code]
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks   [PDF] [code (Matlab)] [code (Python)]
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun

Kaustav Kundu [pdf]
Feb 2 Stereo
Siamese Networks
Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches  [PDF] [code]
Jure Žbontar, Yann LeCun

Learning to Compare Image Patches via Convolutional Neural Networks  [PDF] [code]
Sergey Zagoruyko, Nikos Komodakis

Wenjie Luo [pdf]
Depth from Single Image Designing Deep Networks for Surface Normal Estimation   [PDF]
Xiaolong Wang, David Fouhey, Abhinav Gupta
Mian Wei [pptx]  [pdf]
Feb 9 Image Generation Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks   [PDF]
Alec Radford, Luke Metz, Soumith Chintala

Generating Images from Captions with Attention   [PDF]
Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov

Elman Mansimov
(invited)
[pdf]
Domain Adaptation, Zero-shot Learning Simultaneous Deep Transfer Across Domains and Tasks   [PDF]
Eric Tzeng, Judy Hoffman, Trevor Darrell

Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions   [PDF]
Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov

Lluis Castrejon [pdf]
Recurrent Neural Networks
Feb 23 RNNs and Neural Language Models Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models   [PDF] [code]
Ryan Kiros, Ruslan Salakhutdinov, Richard Zemel

Skip-Thought Vectors   [PDF] [code]
Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler

Jamie Kiros
(invited)
Mar 1 Modeling Words Efficient Estimation of Word Representations in Vector Space  [PDF] [code]
Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
Eleni Triantafillou [pdf]
Describing Videos Sequence to Sequence -- Video to Text   [PDF]
Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko
Erin Grant [pdf]
Image-based QA Ask Your Neurons: A Neural-based Approach to Answering Questions about Images   [PDF]
Mateusz Malinowski, Marcus Rohrbach, Mario Fritz
Yunpeng Li [pdf]
Mar 8 Variational Autoencoders Auto-Encoding Variational Bayes   [PDF]
Diederik P Kingma, Max Welling

Tutorial: Bayesian Reasoning and Deep Learning   [PDF]
Shakir Mohamed

Yura Burda
(invited)
[pdf]
Text-based QA End-To-End Memory Networks   [PDF]
Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus
Marina Samuel [pdf]
Neural Reasoning Recursive Neural Networks Can Learn Logical Semantics   [PDF]
Samuel R. Bowman, Christopher Potts, Christopher D. Manning
Rodrigo Toro Icarte [pdf]
Mar 15 Neural Programming Neural GPUs Learn Algorithms   [PDF]
Lukasz Kaiser, Ilya Sutskever

Neural Programmer-Interpreters   [PDF]
Scott Reed, Nando de Freitas

Neural Programmer: Inducing Latent Programs with Gradient Descent   [PDF]
Arvind Neelakantan, Quoc V. Le, Ilya Sutskever

Jimmy Ba
(invited)
Conversation Models A Neural Conversational Model   [PDF]
Oriol Vinyals, Quoc Le
Caner Berkay Antmen [pdf]
Sentiment Analysis Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank   [PDF]
Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts
Zhicong Lu [pdf]
Mar 22 Video Representations Unsupervised Learning of Video Representations using LSTMs  [PDF]
Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov
Kamyar Ghasemipour [pdf]
CNN Visualization Explaining and Harnessing Adversarial Examples   [PDF]
Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy
Neill Patterson [pdf]
Mar 29 Direction Following (Robotics) Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences   [PDF]
Hongyuan Mei, Mohit Bansal, Matthew R. Walter
Alan Yusheng Wu [pdf]
Visual Attention Recurrent Models of Visual Attention   [PDF]
Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu
Matthew Shepherd [pdf]
Music A First Look at Music Composition using LSTM Recurrent Neural Networks   [PDF]
Douglas Eck, Jurgen Schmidhuber

Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network   [PDF]
Andrew J.R. Simpson, Gerard Roma, Mark D. Plumbley

Charu Jaiswal [pdf]
Music generation Overview of music generation Urban Jezernik
(invited)
Pose and Attributes PANDA: Pose Aligned Networks for Deep Attribute Modeling  [PDF]
Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev
Sidharth Sahdev [pptx]
Image Style A Neural Algorithm of Artistic Style   [PDF]  [code]
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
Nancy Iskander [pdf]
Apr 5 Human gaze Where Are They Looking?   [PDF]
Adria Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba
Abraham Escalante [pdf]
Instance Segmentation Monocular Object Instance Segmentation and Depth Ordering with CNNs   [PDF]
Ziyu Zhang, Alex Schwing, Sanja Fidler, Raquel Urtasun

Instance-Level Segmentation with Deep Densely Connected MRFs  [PDF]
Ziyu Zhang, Sanja Fidler, Raquel Urtasun

Min Bai [pdf]
Scene Understanding Attend, Infer, Repeat: Fast Scene Understanding with Generative Models   [PDF]
S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, Koray Kavukcuoglu, Geoffrey E. Hinton
Namdar Homayounfar [pdf]
Reinforcement Learning Playing Atari with Deep Reinforcement Learning   [PDF]
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
Jonathan Chung [pdf]
Medical Imaging Classifying and Segmenting Microscopy Images Using Convolutional Multiple Instance Learning   [PDF]
Oren Z. Kraus, Lei Jimmy Ba, Brendan Frey
Alex Lu [pptx]
Humor We Are Humor Beings: Understanding and Predicting Visual Humor   [PDF]
Arjun Chandrasekaran, Ashwin K Vijayakumar, Stanislaw Antol, Mohit Bansal, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
Shuai Wang [pdf]


Tutorials, related courses:

  •   Introduction to Neural Networks, CSC321 course at University of Toronto
  •   Course on Convolutional Neural Networks, CS231n course at Stanford University
  •   Course on Probabilistic Graphical Models, CSC412 course at University of Toronto, advanced machine learning course

 

Software:

  •   Caffe: Deep learning for image classification
  •   TensorFlow: Open Source Software Library for Machine Intelligence (good software for deep learning)
  •   Theano: Deep learning library
  •   mxnet: Deep Learning library
  •   Torch: Scientific computing framework with wide support for machine learning algorithms
  •   LIBSVM: A Library for Support Vector Machines (Matlab, Python)
  •   scikit-learn: Machine learning in Python

 

Popular datasets:

  •   ImageNet: Large-scale object dataset
  •   Microsoft COCO: Large-scale image recognition, segmentation, and captioning dataset
  •   MNIST: handwritten digits
  •   PASCAL VOC: Object recognition dataset
  •   KITTI: Autonomous driving dataset
  •   NYUv2: Indoor RGB-D dataset
  •   LSUN: Large-scale Scene Understanding challenge
  •   VQA: Visual question answering dataset
  •   Madlibs: Visual Madlibs (question answering)
  •   Flickr30K: Image captioning dataset
  •   Flickr30K Entities: Flickr30K with phrase-to-region correspondences
  •   MovieDescription: a dataset for automatic description of movie clips
  •   Action datasets: a list of action recognition datasets
  •   MPI Sintel Dataset: optical flow dataset
  •   BookCorpus: a corpus of 11,000 books

 


Main conferences:

  •   NIPS (Neural Information Processing Systems)
  •   ICML (International Conference on Machine Learning)
  •   ICLR (International Conference on Learning Representations)
  •   AISTATS (International Conference on Artificial Intelligence and Statistics)
  •   CVPR (IEEE Conference on Computer Vision and Pattern Recognition)
  •   ICCV (International Conference on Computer Vision)
  •   ECCV (European Conference on Computer Vision)
  •   ACL (Association for Computational Linguistics)
  •   EMNLP (Conference on Empirical Methods in Natural Language Processing)

 

Posted in Business Model, Problem solving, Technology | Leave a comment

Microkernel Architecture Pattern & Applying it to Software Systems

Architectural patterns have always been interesting from a designer’s perspective: MVC, pipe-and-filter, layered, 3-tier, n-tier, and so on. But one very basic architectural concept comes from the practice of civil engineers:

“Have a common building block with minimal facilities as a base, with modular and customizable components to suit the customer’s needs. This gives flexibility to the whole town or architectural plan and also helps the designer save cost.”

The same concept was employed in the late 1970s in the area of OS research. The idea was quite simple: have a very minimal kernel in order to provide cross-platform support. Below is the Microkernel architectural style (or pattern), which represents the idea.

Microkernel

Description

The Microkernel architectural pattern applies to software systems that must be able to adapt to changing system requirements. It separates a minimal functional core from extended functionality and customer-specific parts. The microkernel also serves as a socket for plugging in these extensions and coordinating their collaboration.

Context and Problem

The pattern may be applied in the context of complex software systems serving as a platform for other software applications. Such complex systems usually should be extensible and adaptable to emerging technologies, capable of coping with a range of standards and technologies. They also need to possess high performance and scalability qualities; as a result, low memory consumption and low processing demands are required. Taken together, the above requirements are difficult to achieve.

Solution, Consequences and Liabilities

The most important core services of the system should be encapsulated in a microkernel component. The microkernel maintains the system resources and allows other components to interact with each other as well as to access the functionality of the microkernel. It encapsulates a significant part of the system-specific dependencies, such as hardware-dependent aspects. The size of the microkernel should be kept to a minimum; therefore, only part of the core functionality can be included in it, and the rest is deferred to separate internal servers.

[Figure: microkernel architecture]

Internal servers extend the functionality of the microkernel. They can, for example, handle graphics and storage media. Internal servers can run in their own processes, or they can be shared Dynamic Link Libraries (DLLs) loaded inside the kernel.
External servers provide more complex functionality; they are built on top of the core services provided by the microkernel. Different external servers may be needed in the system to provide the functionality for specific application domains. These servers run in separate processes and use the communication facilities provided by the microkernel to receive requests from their clients and to return the results.
The role of the adapter is to provide a transparent interface for clients to communicate with external servers. The adapter hides system dependencies, such as communication facilities, from the client, and thus improves the scalability and changeability of the system. It also enables the servers and clients to be distributed over a network.
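
The roles described above can be sketched in a few lines of code. This is a simplified, single-process illustration under stated assumptions, not how a real microkernel is implemented: the class and function names (`Microkernel`, `storage_server`, `ReportServer`, `Adapter`) are hypothetical, and a plain dictionary lookup stands in for inter-process communication.

```python
from typing import Callable, Dict


class Microkernel:
    """Minimal core: it only registers services and routes requests to them."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, service: Callable[[str], str]) -> None:
        self._services[name] = service

    def send(self, name: str, request: str) -> str:
        # Stand-in for the kernel's communication facility.
        if name not in self._services:
            raise LookupError(f"no service registered as {name!r}")
        return self._services[name](request)


def storage_server(request: str) -> str:
    # Internal server: core functionality kept out of the kernel itself.
    return f"stored:{request}"


class ReportServer:
    """External server: domain functionality built on top of core services."""

    def __init__(self, kernel: Microkernel) -> None:
        self._kernel = kernel

    def __call__(self, request: str) -> str:
        receipt = self._kernel.send("storage", request.upper())
        return f"report({receipt})"


class Adapter:
    """Hides the kernel's communication facility from the client."""

    def __init__(self, kernel: Microkernel, server_name: str) -> None:
        self._kernel = kernel
        self._server_name = server_name

    def request(self, payload: str) -> str:
        return self._kernel.send(self._server_name, payload)


kernel = Microkernel()
kernel.register("storage", storage_server)
kernel.register("report", ReportServer(kernel))
client_view = Adapter(kernel, "report")
```

A client that holds only `client_view` never sees the kernel or the servers directly, which is the property the adapter is meant to provide.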

The benefits of the pattern include:

  1. Good portability, since only the microkernel needs to be modified when porting the system to a new environment.
  2. High flexibility and extensibility, as modifications or extensions can be made by modifying or extending internal servers.
  3. Separation of low-level mechanisms (provided by the microkernel and internal servers) from higher-level policies (provided by external servers), which improves the maintainability and changeability of the system.

The pattern also has some liabilities:

  1. A microkernel system requires much more inter-process communication within one application execution because of the calls to internal and external servers.
  2. The design and implementation of a microkernel-based system is far more complex than that of a monolithic system.

Known Use

Symbian OS for mobile phones has the microkernel as its core architectural pattern. The Symbian OS microkernel contains a scheduler, memory management, and device drivers, but other services such as networking, telephony, and file system support are placed in the OS Services Layer or the Base Services Layer.
The iPhone OS kernel also has its roots in Mach, an early microkernel implementation developed at CMU in the 1980s, which forms the core of iPhone OS’s predecessors, Mac OS X and NeXTSTEP.
Below is an illustrative example of a microkernel architecture from the Hydra operating system, which was developed at Carnegie Mellon University (CMU). The purpose was to provide a very basic, minimal kernel that excluded even drivers. The schematic diagram is self-explanatory.

[Figure: Hydra OS overview diagram]

Recent Development

Singularity, Microsoft’s experimental next-generation research operating system, has also adopted the microkernel concept. If you are interested, more details about the concept can be found here.

[Figure: Singularity architecture]

http://viralpatel.net/blogs/microkernel-architecture-pattern-apply-software-systems/

Posted in ASP.NET MVC, Integration, Programming, Software architecture, Technology | Leave a comment

What’s the difference between Architectural Patterns and Architectural Styles?

An Architectural Pattern is a way of solving a recurring architectural problem. MVC, for instance, solves the problem of separating the UI from the model. Sensor-Controller-Actuator is a pattern that will help you with the problem of actuating in the face of several input senses.

An Architectural Style, on the other hand, is just a name given to a recurrent architectural design. Contrary to a pattern, it doesn’t exist to “solve” a problem.

Pipe&filter doesn’t solve any specific problem, it’s just a way of organizing your code. Client/server, Main program & subroutine and Abstract Data Types / OO, the same.

Also, a single architecture can contain several architectural styles, and each architectural style can make use of several architectural patterns.

 

Frankly, I have always considered both these terms to be synonymous! And layman (relatively speaking) literature definitely treats them as such. See MSDN or Wikipedia.

However, your question intrigued me a bit, so I did a bit more digging, and frankly… I couldn’t find much except for a reference to A Practical Guide to Enterprise Architecture (The Coad Series), from which I quote:

An architectural style (Base et al. 1997) and an architectural pattern 
(Buschmann et al. 1996) are essentially synonymous. 

Based on some more googling, this is what I think might be one possible way to differentiate the two:

  • An architectural style is a conceptual way of how the system will be created / will work
  • An architectural pattern describes a solution for implementing a style at the level of subsystems or modules and their relationships.

How an architectural pattern differs from a design pattern (e.g., Adapter, Observer) is basically the level of granularity at which they are applied. (I know this isn’t part of the question, but it’s related, I think.)

Source:

https://stackoverflow.com/questions/3958316/whats-the-difference-between-architectural-patterns-and-architectural-styles

Posted in Business Model, C#, Problem solving, Software architecture, Technology, Uncategorized | Leave a comment

AWS Certified Solutions Architect

AWS Certified Solutions Architect – Associate Level is intended for individuals who work, or want to work, as a Solutions Architect. This certification validates a candidate’s ability to:

  • Identify and gather requirements in order to define a solution based on architectural understanding and best practices.
  • Provide best-practice architectural guidance to developers and system administrators throughout the life cycle of the project.

The knowledge and skills required at this level include the areas below. The expected level of knowledge is defined by the following major components:

AWS Knowledge

  • Hands-on experience with AWS compute, networking, storage, and database services.
  • Professional experience architecting large-scale distributed systems.
  • Understanding of the concepts of Elasticity and Scalability.
  • Understanding of the networking technologies that relate to AWS.
  • Good understanding of all the security features and tools that AWS provides and how they relate to traditional services.
  • Very strong understanding of how to interact with AWS (AWS SDK, AWS API, Command Line Interface, AWS CloudFormation).
  • Hands-on experience with AWS deployment and management services.

General IT Knowledge

  • Excellent understanding of multi-tier architectures: web servers (Apache, Nginx, IIS), caching, application servers, load balancers.
  • RDBMS (MySQL, Oracle, SQL Server), NoSQL
  • Knowledge of message queuing and Enterprise Service Bus (ESB).
  • Familiarity with loose coupling and stateless systems.
  • Understanding of the different consistency models in distributed systems.
  • Experience with CDNs and performance concepts.
  • Networking experience with route tables, access control lists, firewalls, NAT, HTTP, DNS, IP, and the OSI model.
  • Knowledge of RESTful Web Services, XML, JSON.
  • Familiarity with the software development life cycle.
  • Work experience with information and application security, including public key encryption, SSH, access credentials, and X.509 certificates.

The following training courses, or other equivalent methods, will help considerably in preparing for the exam:

  • Architecting on AWS (aws.amazon.com/training/architect)
  • In-depth knowledge of, or training in, at least one high-level programming language.
  • AWS Cloud Computing Whitepapers (aws.amazon.com/whitepapers)
    • Overview of Amazon Web Services
    • Overview of Security Processes
    • AWS Risk & Compliance Whitepaper
    • Storage Options in the Cloud
    • Architecting for the AWS Cloud: Best Practices
  • Experience deploying hybrid systems with on-premises and AWS components.
  • Use of the AWS Architecture Center website (aws.amazon.com/architecture)

Note: This blueprint includes the key content areas, test objectives, and example content. The example topics and concepts are included only to clarify the test objectives; they should not be construed as a comprehensive list of all the content on this exam.

The table below lists the weighting of each knowledge domain in the exam.
Domain % of Examination
1.0 Designing highly available, cost effective, fault tolerant, scalable systems 60%
2.0 Implementation/Deployment 10%
3.0 Data Security 20%
4.0 Troubleshooting 10%
TOTAL 100%

Response Limits

The candidate selects, from four (4) or more response options, the option(s) considered best to complete the question. Skipped or incorrect answers are treated as not having demonstrated the required knowledge or skill.

The exam formats used are:

  • Multiple-choice: the candidate selects the one option that best answers the question or completes the statement. The options may be embedded in a graphic so that the candidate “points and clicks” on a choice.
  • Multiple-response: the candidate selects more than one option to answer the question or complete the statement.
  • Sample Directions: read the question or statement and, from the response options, select only the one option that represents the best answer.

Content Limits

1.     Domain 1.0: Designing highly available, cost efficient, fault tolerant, scalable systems

1.1   Identify and assess cloud computing architecture, such as the fundamental components and effective designs.

Content includes:

  • How to design cloud services
  • Planning and design
  • Monitoring
  • Familiarity with:
    • Best practices
    • Developing client specifications, including pricing/cost (e.g., On-Demand vs. Reserved vs. Spot; RTO and RPO DR design)
    • Architectural decisions (high availability vs. cost, Amazon Relational Database Service (RDS) vs. installing your own database on Amazon Elastic Compute Cloud (EC2)).
    • Integrating with existing development environments and building a scalable architecture.
    • Elasticity and scalability.

2.     Domain 2.0: Implementation/Deployment

2.1   Identify the appropriate techniques and methods using Amazon EC2, Amazon S3, Elastic Beanstalk, CloudFormation, Amazon Virtual Private Cloud (VPC), and AWS Identity and Access Management (IAM) to code and implement a cloud solution.

Content includes:

  • Configuring an Amazon Machine Image (AMI)
  • Operating and extending service management in a private cloud
  • Appropriate configuration in private and public clouds
  • Launching instances across multiple geographical regions.

3.     Domain 3.0: Data Security

3.1   Recognize and implement secure procedures for optimal cloud deployment and maintenance.

Content includes:

  • Cloud Security Best Practices
    • How to build and use a threat model
    • How to build and use a data flow diagram for risk management
      • Use cases
      • Abuse Cases (Negative use cases)
  • Security Architecture with AWS
    • Shared Security Responsibility Model
    • AWS Platform Compliance
    • AWS security attributes (customer workloads down to physical layer)
    • Security Services
    • AWS Identity and Access Management (IAM)
    • Amazon Virtual Private Cloud (VPC)
    • CIA and AAA models, ingress vs. egress filtering, and which AWS services and features fit
    • “Core” Amazon EC2 and S3 security feature sets
    • Incorporating common conventional security products (Firewall, IDS:HIDS/NIDS, SIEM, VPN)
    • Design Pattern
    • DDOS mitigation
    • Encryption solutions
    • Complex access controls (building sophisticated security groups, ACLs, etc.)
    • Amazon CloudWatch for the security architect

3.2   Recognize critical disaster recovery techniques and how to implement them.

Content includes:

  • Disaster Recovery
    • Recovery time objective
    • Recovery point objective
    • Amazon Elastic Block Store
  • AWS Import/Export
  • AWS Storage Gateway
  • Amazon Route53
  • Testing the recovered data

4.     Domain 4.0: Troubleshooting

Content includes:

  • General troubleshooting information and questions

http://awslagi.com/noi-dung-thi-aws/

Posted in Integration, Java, Software architecture | Leave a comment