Thursday, March 25, 2021

CAP Theorem and Distributed Database Management Systems

In the past, when we wanted to store more data or increase our processing power, the common options were to scale vertically (get more powerful machines) or to further optimize the existing code base. However, with the advances in parallel processing and distributed systems, it is now more common to scale horizontally, that is, to have more machines do the same task in parallel. We can already see many data processing tools in the Apache project, such as Spark, Hadoop, Kafka, ZooKeeper and Storm. However, in order to pick the right tool, a basic understanding of the CAP Theorem is necessary. The CAP Theorem states that a distributed database system can guarantee only 2 of these 3 properties at the same time: Consistency, Availability and Partition Tolerance.

The CAP Theorem is very important in the Big Data world, especially when we need to make trade-offs between the three based on our unique use case. In this post, I will try to explain each of these concepts and the reasons for the trade-off. I will avoid using specific examples, as DBMSs are rapidly evolving.

Partition Tolerance

This condition states that the system continues to run even when any number of messages are dropped or delayed by the network between nodes. A system that is partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data records are sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages. When dealing with modern distributed systems, Partition Tolerance is not an option. It’s a necessity. Hence, we have to trade between Consistency and Availability.

High Consistency

This condition states that all nodes see the same data at the same time. Simply put, a read operation returns the value of the most recent write operation, so all nodes return the same data. A system has consistency if a transaction starts with the system in a consistent state and ends with the system in a consistent state. In this model, a system can (and does) shift into an inconsistent state during a transaction, but the entire transaction gets rolled back if there is an error at any stage in the process. In the image, we have 2 different records (“Bulbasaur” and “Pikachu”) written at different timestamps. The output on the third partition is “Pikachu”, the latest input. However, the nodes need time to synchronize, and while they do so they cannot serve requests, so availability suffers.

High Availability

This condition states that every request gets a response, whether it reports success or failure. Achieving availability in a distributed system requires that the system remains operational 100% of the time. Every client gets a response, regardless of the state of any individual node in the system. This metric is trivial to measure: either you can submit read/write commands, or you cannot. Hence, the databases are time independent, as the nodes need to be available online at all times. This means that, unlike the previous example, we do not know if “Pikachu” or “Bulbasaur” was added first. The output could be either one. This is why high availability is a poor fit when analyzing streaming data at high frequency.
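To make the trade-off concrete, here is a minimal sketch in Python. It is a toy model, not how any real database works: the names `TinyStore`, `Replica` and `Unavailable` are made up for this illustration, there are only two replicas of a single value, and the "CP" mode simply refuses every request during a partition (real systems are usually more subtle, for example by letting a majority side keep serving).

```python
from dataclasses import dataclass, field
from typing import List, Optional


class Unavailable(Exception):
    """Raised when the store refuses a request to protect consistency."""


@dataclass
class Replica:
    name: str
    value: Optional[str] = None
    version: int = 0


@dataclass
class TinyStore:
    """Two replicas of one value. `mode` is "CP" or "AP" (hypothetical labels)."""
    mode: str
    partitioned: bool = False
    replicas: List[Replica] = field(
        default_factory=lambda: [Replica("a"), Replica("b")]
    )

    def write(self, replica_index: int, value: str) -> None:
        if self.partitioned and self.mode == "CP":
            # The peer is unreachable: refuse rather than let replicas diverge.
            raise Unavailable("write rejected during partition")
        target = self.replicas[replica_index]
        target.version += 1
        target.value = value
        if not self.partitioned:
            # Healthy network: copy the new value to every replica.
            for other in self.replicas:
                other.value, other.version = target.value, target.version

    def read(self, replica_index: int) -> Optional[str]:
        if self.partitioned and self.mode == "CP":
            raise Unavailable("read rejected during partition")
        return self.replicas[replica_index].value


# AP mode: always answers, but the two replicas can disagree.
ap = TinyStore(mode="AP")
ap.write(0, "Bulbasaur")
ap.partitioned = True
ap.write(0, "Pikachu")
print(ap.read(0), ap.read(1))  # Pikachu Bulbasaur
```

In "CP" mode the same second write would raise `Unavailable` instead: the store stays consistent by giving up availability while the partition lasts.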

Conclusion

Distributed systems allow us to achieve a level of computing power and availability that was simply not available in the past. Our systems have higher performance, lower latency, and near 100% up-time in data centers that span the entire globe. Best of all, the systems of today run on commodity hardware that is easily obtainable and configurable at affordable costs. However, there is a price. Distributed systems are more complex than their single-network counterparts. Understanding the complexity incurred in distributed systems, making the appropriate trade-offs for the task at hand (CAP), and selecting the right tool for the job are all necessary with horizontal scaling.


https://towardsdatascience.com/cap-theorem-and-distributed-database-management-systems-5c2be977950e

Sunday, March 21, 2021

Derivative Rules

 

Common Functions                              Function      Derivative
Constant                                      c             0
Line                                          x             1
                                              ax            a
Square                                        x^2           2x
Square Root                                   √x            (½)x^(−½)
Exponential                                   e^x           e^x
                                              a^x           ln(a) a^x
Logarithms                                    ln(x)         1/x
                                              log_a(x)      1 / (x ln(a))
Trigonometry (x is in radians)                sin(x)        cos(x)
                                              cos(x)        −sin(x)
                                              tan(x)        sec^2(x)
Inverse Trigonometry                          sin^-1(x)     1/√(1−x^2)
                                              cos^-1(x)     −1/√(1−x^2)
                                              tan^-1(x)     1/(1+x^2)

Rules                                         Function      Derivative
Multiplication by constant                    cf            cf’
Power Rule                                    x^n           n x^(n−1)
Sum Rule                                      f + g         f’ + g’
Difference Rule                               f - g         f’ − g’
Product Rule                                  fg            f g’ + f’ g
Quotient Rule                                 f/g           (f’ g − g’ f)/g^2
Reciprocal Rule                               1/f           −f’/f^2

Chain Rule (as "Composition of Functions")    f º g         (f’ º g) × g’
Chain Rule (using ’)                          f(g(x))       f’(g(x)) g’(x)
Chain Rule (using d/dx)                       dy/dx = (dy/du)(du/dx)
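A quick way to sanity-check rules like these is to compare each claimed derivative with a numerical approximation. The sketch below uses only Python's standard `math` module and a central difference; the helper names (`numeric_derivative`, `TABLE`, `max_error`), the step size and the test point x = 0.5 are arbitrary choices for this illustration.

```python
import math


def numeric_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# (label, function, derivative claimed by the table)
TABLE = [
    ("x^2", lambda x: x**2, lambda x: 2 * x),
    ("sqrt(x)", math.sqrt, lambda x: 0.5 * x**-0.5),
    ("e^x", math.exp, math.exp),
    ("2^x", lambda x: 2**x, lambda x: math.log(2) * 2**x),
    ("ln(x)", math.log, lambda x: 1 / x),
    ("sin(x)", math.sin, math.cos),
    ("tan(x)", math.tan, lambda x: 1 / math.cos(x) ** 2),
    ("asin(x)", math.asin, lambda x: 1 / math.sqrt(1 - x**2)),
    ("atan(x)", math.atan, lambda x: 1 / (1 + x**2)),
    # Chain rule with f = sin and g = x^2: (f' o g) * g'
    ("sin(x^2)", lambda x: math.sin(x**2), lambda x: math.cos(x**2) * 2 * x),
]


def max_error(x=0.5):
    """Largest gap between the numerical and the tabulated derivative at x."""
    return max(abs(numeric_derivative(f, x) - df(x)) for _, f, df in TABLE)


print(max_error() < 1e-6)
```

The check is only approximate (floating-point error limits how small `h` can usefully be), so it can catch a wrong rule but cannot prove a right one.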

Wednesday, March 17, 2021

The statistical machine learning framework.


The Data Generating Process (DGP) generates observable training data from the unobservable environmental probability distribution P(e). The learning machine observes the training data and uses its beliefs about the structure of the environmental probability distribution in order to construct a best-approximating distribution P of the environmental distribution P(e). The resulting best-approximating distribution of the environmental probability distribution supports decisions and behaviors by the learning machine for the purpose of improving its success when interacting with an environment characterized by uncertainty.
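As a rough illustration of this framework, the sketch below assumes the simplest possible environment: a biased coin. The hidden bias `p_e`, the function names and the sample size are all assumptions made for this example. The learner "believes" the data is Bernoulli, so its best-approximating distribution is summarized by one estimated parameter, which it then uses to decide.

```python
import random


def data_generating_process(p_e, n, rng):
    """Draw n observations from Bernoulli(p_e); p_e is hidden from the learner."""
    return [1 if rng.random() < p_e else 0 for _ in range(n)]


def fit_bernoulli(data):
    """Maximum-likelihood estimate of the Bernoulli parameter (the sample mean)."""
    return sum(data) / len(data)


def decide(p_hat):
    """Use the fitted distribution to act: predict the more likely outcome."""
    return 1 if p_hat > 0.5 else 0


rng = random.Random(0)  # fixed seed so the run is repeatable
training_data = data_generating_process(p_e=0.3, n=10_000, rng=rng)
p_hat = fit_bernoulli(training_data)
print(p_hat, decide(p_hat))
```

The learner never sees `p_e` itself, only the training data; how close `p_hat` gets depends on the amount of data and on whether the Bernoulli belief is actually right for the environment.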

 

The dangers of anthropomorphizing companies on the Internet

 An anthropomorphized personality tricks your brain and makes you fit it into a pre-established system of thought (schema). It blends into the backdrop of your day-to-day life and becomes part of your everyday thinking without you questioning it.

 So, no, the meme is not going to make you buy from iFood. But the issue is not the effectiveness of the ad itself (ad, because the funny tweet is an ad); the issue is the complete omnipresence of companies in every aspect and moment of our lives, and how complex the mechanization of advertising has become.

https://comunidadedoestagio.com/blog/os-perigos-da-antropomorfizacao-de-empresas-na-internet


Saturday, March 13, 2021

Architecture Versus Design

 


Laws of Software Architecture

Everything in software architecture is a trade-off.

-First Law of Software Architecture


Why is more important than how.

-Second Law of Software Architecture

Invalidating Axioms About Software Architecture

We also address the critically important issue of trade-off analysis. As a software developer, it’s easy to become enamored with a particular technology or approach. But architects must always soberly assess the good, bad, and ugly of every choice, and virtually nothing in the real world offers convenient binary choices—everything is a trade-off. 



Wednesday, March 10, 2021

Longest word in the Portuguese language

Pneumoultramicroscopicossilicovulcanoconiótico 

adjective Related to the disease that attacks the lungs, caused by inhaling volcanic ash. Refers to pneumoultramicroscopicossilicovulcanoconiose (the disease). masculine noun A person who has this disease.

Sunday, March 07, 2021

The MVC architecture


 

Layered architecture


 

Main program/subprogram architecture


 

Data-flow architecture


 

Data-centered architecture


 

The relationship between effort and delivery time


 

Layers of the SCM process


 

Testing strategy


 

User experience design elements


 

Architecture Decision Description Template


 

Factors affecting a GSD team


 

A layered behavioral model for software engineering


 

Recommended Software Process Steps

  1. Requirements engineering 
    1. Gather user stories from all stakeholders.
    2. Have stakeholders describe acceptance criteria for user stories.
  2. Preliminary architectural design
    1. Make use of paper prototypes and models.
    2. Assess alternatives using nonfunctional requirements.
    3. Document architecture design decisions.
  3. Estimate required project resources
    1. Use historic data to estimate time to complete each user story.
    2. Organize the user stories into sprints.
    3. Determine the number of sprints needed to complete the product.
    4. Revise the time estimates as user stories are added or deleted.
  4. Construct first prototype 
    1. Select subset of user stories most important to stakeholders.
    2. Create paper prototype as part of the design process.
    3. Design a user interface prototype with inputs and outputs.
    4. Engineer the algorithms needed for first prototypes.
    5. Prototype with deployment in mind.
  5. Evaluate prototype
    1. Create test cases while prototype is being designed.
    2. Test prototype using appropriate users.
    3. Capture stakeholder feedback for use in revision process.
  6. Go, no-go decision
    1. Determine the quality of the current prototype.
    2. Revise time and cost estimates for completing development.
    3. Determine the risk of failing to meet stakeholder expectations.
    4. Get commitment to continue development.
  7. Evolve system
    1. Define new prototype scope.
    2. Construct new prototype.
    3. Evaluate new prototype and include regression testing.
    4. Assess risks associated with continuing evolution.
  8. Release prototype
    1. Perform acceptance testing.
    2. Document defects identified.
    3. Share quality risks with management.
  9. Maintain software
    1. Understand code before making changes.
    2. Test software after making changes.
    3. Document changes.
    4. Communicate known defects and risks to all stakeholders.
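
Step 3 of the list above (estimating resources) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: estimates are the mean of past actual hours for similar stories, stories are placed into sprints in priority order, and every sprint has the same capacity. The function names, story names and numbers are made up.

```python
from statistics import mean


def estimate_hours(past_actual_hours):
    """Estimate a story from historic data: mean hours of similar past stories."""
    return mean(past_actual_hours)


def plan_sprints(stories, capacity_hours):
    """Place (name, hours) stories, in priority order, into fixed-size sprints."""
    sprints = [[]]
    remaining = capacity_hours
    for name, hours in stories:
        if hours > capacity_hours:
            raise ValueError(f"story {name!r} does not fit in a single sprint")
        if hours > remaining:
            # Current sprint is full: open a new one.
            sprints.append([])
            remaining = capacity_hours
        sprints[-1].append(name)
        remaining -= hours
    return [sprint for sprint in sprints if sprint]


stories = [("login", 16), ("search", 24), ("checkout", 30), ("profile", 8)]
plan = plan_sprints(stories, capacity_hours=40)
print(len(plan), plan)  # 2 [['login', 'search'], ['checkout', 'profile']]
```

When stories are added or deleted, rerunning `plan_sprints` with the revised list gives the updated sprint count.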

Distribution of maintenance effort


SWEBOK refers to the Software Engineering Body of Knowledge, which can be accessed using the following link: https://www.computer.org/web/swebok/v3.

 

What Is Agility?

The underlying ideas that guide agile development led to the development of agile methods designed to overcome perceived and actual weaknesses in conventional software engineering. Agile development can provide important benefits, but it may not be applicable to all projects, all products, all people, and all situations. It is also not antithetical to solid software engineering practice and can be applied as an overriding philosophy for all software work.

In the modern economy, it is often difficult or impossible to predict how a computer-based system (e.g., a mobile application) will evolve as time passes. Market conditions change rapidly, end-user needs evolve, and new competitive threats emerge without warning. In many situations, you won’t be able to define requirements fully before the project begins. You must be agile enough to respond to a fluid business environment.

Fluidity implies change, and change is expensive—particularly if it is uncontrolled or poorly managed. One of the most compelling characteristics of the agile approach is its ability to reduce the costs of change through the software process. In a thought-provoking book on agile software development, Alistair Cockburn [Coc02] argues that the prescriptive process models have a major failing: they forget the frailties of the people who build computer software.

Software engineers are not robots. They exhibit great variation in working styles and significant differences in skill level, creativity, orderliness, consistency, and spontaneity. Some communicate well in written form, others do not. If process models are to work, they must provide a realistic mechanism for encouraging the discipline that is necessary, or they must be characterized in a manner that shows “tolerance” for the people who do software engineering work.

Dimensions of the design model


 

Modularity and software cost


 

Effectiveness of communication modes


 

Wednesday, March 03, 2021