Thursday, December 22, 2016

Brain Tricks – This is How Your Brain Works

17 mental triggers

Mental Trigger #1 – Scarcity
  When we perceive that something may run out, we make impulsive decisions.

Mental Trigger #2 – Urgency
  A form of scarcity: time is running out.

Mental Trigger #3 – Authority
  When we recognize someone's authority, we accept the information they give us without thinking.

Mental Trigger #4 – Reciprocity
  When someone offers us a coffee, we feel obliged to return the favor.

Mental Trigger #5 – Social Proof
  We tend to think like the group we belong to.

Mental Trigger #6 – Because
  We need to understand the reasons for doing something.

Mental Trigger #7 – Anticipation
  We enjoy being curious about what is coming next.

Mental Trigger #8 – Novelty
  New things give us pleasure and create new synapses.

Mental Trigger #9 – Pain vs. Pleasure
  We are more inclined to avoid pain than to seek pleasure.

Mental Trigger #10 – Indifference
  When we really want something, we may disdain it to make it look unimportant.

Mental Trigger #11 – Commitment and Consistency
  Our culture values congruence between what we say and what we do.

Mental Trigger #12 – Paradox of Choice
  The more options we have, the harder it is to decide.

Mental Trigger #13 – Story
  We like information to be told in sequence over time, with some narrative flair.

Mental Trigger #14 – Simplicity
  We prefer simpler things: the 5 steps, the easiest path, etc.

Mental Trigger #15 – Reference
  We only decide by comparison with something else (hence asking for three quotes).

Mental Trigger #16 – Curiosity
  If part of the information about something we want is missing, we will try to find it out.

Mental Trigger #17 – Common Enemy
  Point to a problem that both you and the other person have gone through; this makes you both feel connected.

The change curve


Wednesday, November 30, 2016

Steps to developing a usable algorithm.


  • Model the problem.
  • Find an algorithm to solve it. 
  • Fast enough? Fits in memory?
  • If not, figure out why.
  • Find a way to address the problem. 
  • Iterate until satisfied.
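To make the loop concrete, here is one pass through it in Python, using dynamic connectivity (union-find) as the modeled problem; the choice of problem is only an illustration, not something this post specifies. The first model, `QuickFind`, is simple, but each `union` scans the whole array, so n unions cost on the order of n² steps. Diagnosing why (a flat array forces a full relabel) suggests the next iteration, `WeightedQuickUnion`, which links the smaller tree under the larger one and compresses paths. Both class names are hypothetical.

```python
class QuickFind:
    """First model: id[i] names the component of site i."""

    def __init__(self, n: int) -> None:
        self.id = list(range(n))

    def connected(self, p: int, q: int) -> bool:
        return self.id[p] == self.id[q]  # O(1) query

    def union(self, p: int, q: int) -> None:
        pid, qid = self.id[p], self.id[q]
        if pid == qid:
            return
        # O(n) per union: this full scan is the bottleneck to diagnose.
        for i, component in enumerate(self.id):
            if component == pid:
                self.id[i] = qid


class WeightedQuickUnion:
    """Next iteration: forest of trees, weighted linking, path compression."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, p: int) -> int:
        root = p
        while root != self.parent[root]:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while p != root:
            nxt = self.parent[p]
            self.parent[p] = root
            p = nxt
        return root

    def connected(self, p: int, q: int) -> bool:
        return self.find(p) == self.find(q)

    def union(self, p: int, q: int) -> None:
        rp, rq = self.find(p), self.find(q)
        if rp == rq:
            return
        if self.size[rp] < self.size[rq]:
            rp, rq = rq, rp  # keep rp as the root of the larger tree
        self.parent[rq] = rp
        self.size[rp] += self.size[rq]


pairs = [(4, 3), (3, 8), (6, 5), (9, 4), (2, 1)]
for uf in (QuickFind(10), WeightedQuickUnion(10)):
    for p, q in pairs:
        uf.union(p, q)
    print(type(uf).__name__, uf.connected(8, 9), uf.connected(5, 0))  # True False
```

Both versions give the same answers; only the cost changes. With weighting and path compression, a sequence of operations runs in nearly constant amortized time per operation, which is usually where the iteration stops.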

Thursday, November 24, 2016

Checklist for Recognizing and Measuring Data Quality

Accuracy. The value stored in the system for a data element is the right value for that occurrence of the data element. If you have a customer name and an address stored in a record, then the address is the correct address for the customer with that name. If you find the quantity ordered as 1000 units in the record for order number 12345678, then that quantity is the accurate quantity for that order.

Domain Integrity. The data value of an attribute falls in the range of allowable, defined values. The common example is the allowable values being “male” and “female” for the gender data element.
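A domain-integrity check reduces to set membership. A minimal sketch, assuming records are Python dicts; `domain_violations` and the sample customers are hypothetical.

```python
ALLOWED_GENDER = {"male", "female"}  # allowable values from the example above


def domain_violations(records, field, allowed):
    """Return (index, value) for each record whose field falls outside the domain."""
    return [(i, r.get(field)) for i, r in enumerate(records) if r.get(field) not in allowed]


customers = [{"name": "Ann", "gender": "female"}, {"name": "Bob", "gender": "M"}]
print(domain_violations(customers, "gender", ALLOWED_GENDER))  # [(1, 'M')]
```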

Data Type. The value for a data attribute is actually stored as the data type defined for that attribute. When the data type of the store name field is defined as “text,” all instances of that field contain the store name shown in textual format and not numeric codes.
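A sketch of the store-name check, assuming dict records with a hypothetical `store_name` key: the value must be a string, and a string made only of digits is treated as a numeric code that slipped in.

```python
def store_name_violations(records):
    """Indices where store_name is not text, or is text holding only a numeric code."""
    bad = []
    for i, r in enumerate(records):
        name = r.get("store_name")
        if not isinstance(name, str) or name.strip().isdigit():
            bad.append(i)
    return bad


stores = [{"store_name": "Downtown"}, {"store_name": 1042}, {"store_name": "0117"}]
print(store_name_violations(stores))  # [1, 2]
```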

Consistency. The form and content of a data field are the same across multiple source systems. If the product code for product ABC in one system is 1234, then the code for this product is 1234 in every source system.
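To check consistency, one can pivot each product's code by source system and flag products whose codes disagree. The system names `orders` and `billing`, the helper `inconsistent_codes`, and the data are hypothetical.

```python
def inconsistent_codes(sources):
    """sources: {system: {product: code}}. Return products whose code differs across systems."""
    codes = {}  # product -> {system: code}
    for system, mapping in sources.items():
        for product, code in mapping.items():
            codes.setdefault(product, {})[system] = code
    return {p: by_system for p, by_system in codes.items() if len(set(by_system.values())) > 1}


sources = {
    "orders": {"ABC": "1234", "XYZ": "5678"},
    "billing": {"ABC": "1234", "XYZ": "5679"},
}
print(inconsistent_codes(sources))  # {'XYZ': {'orders': '5678', 'billing': '5679'}}
```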

Redundancy. The same data must not be stored in more than one place in a system. If, for reasons of efficiency, a data element is intentionally stored in more than one place in a system, then the redundancy must be clearly identified and verified.

Completeness. There are no missing values for a given attribute in the system. For example, in a customer file, there must be a valid value for the “state” field for every customer. In the file for order details, every detail record for an order must be completely filled.
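A minimal completeness report, assuming dict records and a hypothetical `missing_values` helper. It flags only missing or empty required fields; whether a value that is present is valid is a separate domain-integrity question.

```python
def missing_values(records, required_fields):
    """Map each record index to the required fields that are missing or blank."""
    report = {}
    for i, r in enumerate(records):
        missing = [f for f in required_fields if r.get(f) in (None, "")]
        if missing:
            report[i] = missing
    return report


customers = [{"id": 1, "state": "TX"}, {"id": 2, "state": ""}, {"id": 3}]
print(missing_values(customers, ["id", "state"]))  # {1: ['state'], 2: ['state']}
```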

Duplication. Duplication of records in a system is completely resolved. If the product file is known to have duplicate records, then all the duplicate records for each product are identified and a cross-reference created.
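One way to build the cross-reference: normalize a matching key, keep the first record seen for each key as the survivor, and map every later duplicate to it. The normalization rule (lowercase, collapse whitespace) and the helper names are assumptions; real record matching is usually fuzzier.

```python
def normalize_name(record):
    """Hypothetical matching key: lowercase and collapse runs of whitespace."""
    return " ".join(record["name"].lower().split())


def cross_reference(records, key):
    """Map each duplicate record id to the id of the first record with the same key."""
    survivor = {}  # key -> id of the first record seen
    xref = {}      # duplicate id -> surviving id
    for record in records:
        k = key(record)
        if k in survivor:
            xref[record["id"]] = survivor[k]
        else:
            survivor[k] = record["id"]
    return xref


products = [
    {"id": "P1", "name": "Widget 10mm"},
    {"id": "P2", "name": "widget  10MM "},
    {"id": "P3", "name": "Gadget"},
]
print(cross_reference(products, normalize_name))  # {'P2': 'P1'}
```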

Conformance to Business Rules. The values of each data item adhere to prescribed business rules. In an auction system, the hammer or sale price cannot be less than the reserve price. In a bank loan system, the loan balance must always be positive or zero.
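Business rules map naturally onto named predicates evaluated per record. A sketch with the two rules from the text; field names such as `hammer_price`, `reserve_price`, and `balance` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Rule = Tuple[str, Callable[[Record], bool]]

AUCTION_RULES: List[Rule] = [
    ("hammer price must not be below reserve price",
     lambda lot: lot["hammer_price"] >= lot["reserve_price"]),
]
LOAN_RULES: List[Rule] = [
    ("loan balance must be positive or zero", lambda loan: loan["balance"] >= 0),
]


def rule_violations(records: List[Record], rules: List[Rule]) -> List[Tuple[int, str]]:
    """Return (record index, rule description) for every rule a record fails."""
    return [
        (i, description)
        for i, record in enumerate(records)
        for description, holds in rules
        if not holds(record)
    ]


lots = [{"hammer_price": 500, "reserve_price": 400},
        {"hammer_price": 300, "reserve_price": 350}]
loans = [{"balance": 1200.0}, {"balance": -5.0}]
print(rule_violations(lots, AUCTION_RULES))  # [(1, 'hammer price must not be below reserve price')]
print(rule_violations(loans, LOAN_RULES))    # [(1, 'loan balance must be positive or zero')]
```

Keeping each rule as data (a description plus a predicate) makes the rule set easy to review with the business and to report on by name.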

Structural Definiteness. Wherever a data item can naturally be structured into individual components, the item must contain this well-defined structure. For example, an individual’s name naturally divides into first name, middle initial, and last name. Values for names of individuals must be stored as first name, middle initial, and last name. This characteristic of data quality simplifies enforcement of standards and reduces missing values.
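A naive sketch of enforcing the first name / middle initial / last name structure, assuming free-text names in "First [Middle] Last" order. Many real names do not fit this pattern, so unparseable values are rejected rather than guessed. `PersonName` and `split_name` are hypothetical.

```python
from typing import NamedTuple, Optional


class PersonName(NamedTuple):
    first: str
    middle_initial: Optional[str]
    last: str


def split_name(full: str) -> PersonName:
    """Naive split assuming 'First [Middle] Last' order."""
    parts = full.split()
    if len(parts) == 2:
        return PersonName(parts[0], None, parts[1])
    if len(parts) == 3:
        return PersonName(parts[0], parts[1][0].upper(), parts[2])
    raise ValueError(f"cannot structure name: {full!r}")


print(split_name("John Quincy Adams"))  # PersonName(first='John', middle_initial='Q', last='Adams')
print(split_name("Jane Doe"))           # PersonName(first='Jane', middle_initial=None, last='Doe')
```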

Data Anomaly. A field must be used only for the purpose for which it is defined. If the field Address-3 is defined for any possible third line of address for long addresses, then this field must be used only for recording the third line of address. It must not be used for entering a phone or fax number for the customer.
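Misuse of a field can often be caught with a pattern check. A sketch that flags phone- or fax-like values in a hypothetical `address_3` field; the regular expression is a rough heuristic (at least seven digits, only phone punctuation), not a phone-number validator.

```python
import re

# Heuristic: optional '+', then only digits, spaces, parentheses, dots, or hyphens.
PHONE_LIKE = re.compile(r"^\+?[\d\s().-]{7,}$")


def misused_address3(records):
    """Indices of records whose address_3 looks like a phone or fax number."""
    flagged = []
    for i, r in enumerate(records):
        value = (r.get("address_3") or "").strip()
        if value and PHONE_LIKE.match(value) and sum(ch.isdigit() for ch in value) >= 7:
            flagged.append(i)
    return flagged


customers = [
    {"address_3": "Building C, Suite 400"},
    {"address_3": "(555) 123-4567"},
    {"address_3": None},
]
print(misused_address3(customers))  # [1]
```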

Clarity. A data element may possess all the other characteristics of quality data, but if the users do not understand its meaning clearly, then the data element is of no value to the users. Proper naming conventions help to make the data elements well understood by the users.

Timely. The users determine the timeliness of the data. If the users expect customer dimension data not to be older than one day, the changes to customer data in the source systems must be applied to the data warehouse daily.
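Given a one-day expectation, staleness is a timestamp comparison. A sketch assuming each warehouse row carries a hypothetical `loaded_at` load timestamp.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=1)  # the users' expectation from the example above


def stale_rows(rows, now):
    """Customer ids whose last load is older than MAX_AGE."""
    return [r["customer_id"] for r in rows if now - r["loaded_at"] > MAX_AGE]


now = datetime(2016, 11, 24, 12, 0)
rows = [
    {"customer_id": 1, "loaded_at": datetime(2016, 11, 24, 2, 0)},   # 10 hours old
    {"customer_id": 2, "loaded_at": datetime(2016, 11, 22, 23, 0)},  # 37 hours old
]
print(stale_rows(rows, now))  # [2]
```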

Usefulness. Every data element in the data warehouse must satisfy some requirements of the collection of users. A data element may be accurate and of high quality, but if it is of no value to the users, then it is totally unnecessary for that data element to be in the data warehouse.

Adherence to Data Integrity Rules. The data stored in the relational databases of the source systems must adhere to entity integrity and referential integrity rules. Any table that permits null as the primary key does not have entity integrity. Referential integrity forces the establishment of the parent–child relationships correctly. In a customer-to-order relationship, referential integrity ensures the existence of a customer for every order in the database.
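Both rules can be enforced declaratively in a relational database. A sketch using Python's built-in `sqlite3` with an in-memory database; the `customer` and `orders` tables and the `try_insert` helper are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and it allows NULL in a non-INTEGER primary key unless `NOT NULL` is declared, which makes it a handy example of a table that would otherwise lack entity integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
# Entity integrity: the primary key may never be NULL.
conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
# Referential integrity: every order must point at an existing customer.
conn.execute("""CREATE TABLE orders (
    order_id TEXT PRIMARY KEY NOT NULL,
    customer_id TEXT NOT NULL REFERENCES customer(customer_id))""")
conn.execute("INSERT INTO customer VALUES ('C1', 'Ann')")
conn.execute("INSERT INTO orders VALUES ('O1', 'C1')")


def try_insert(sql: str) -> str:
    """Run an INSERT and report whether the integrity constraints accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert("INSERT INTO customer VALUES (NULL, 'Bob')"))  # violates entity integrity
print(try_insert("INSERT INTO orders VALUES ('O2', 'C9')"))     # no customer C9: violates referential integrity
```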

Saturday, October 15, 2016

Pattern Recognition


A flow chart of the process of classifier design


Probabilistic distance measures


Dissimilarity measures for numeric variables (between x and y)
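The figure's table did not survive, so as a hedged stand-in, here are four measures commonly listed for numeric variables (Euclidean, city-block, Chebyshev, and Minkowski), written for two equal-length vectors x and y. This is not necessarily the exact list the figure showed.

```python
import math
from typing import Sequence


def _check(x: Sequence[float], y: Sequence[float]) -> None:
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of variables")


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    _check(x, y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block(x: Sequence[float], y: Sequence[float]) -> float:
    _check(x, y)
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest coordinate difference (the limit of Minkowski as m grows)."""
    _check(x, y)
    return max(abs(a - b) for a, b in zip(x, y))


def minkowski(x: Sequence[float], y: Sequence[float], m: float) -> float:
    """m = 1 gives city-block, m = 2 gives Euclidean."""
    _check(x, y)
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1 / m)


x, y = (0.0, 0.0), (3.0, 4.0)
print(euclidean(x, y), city_block(x, y), chebyshev(x, y), minkowski(x, y, 3))
```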


Linear regression


Statistical Pattern Recognition


Stages in a pattern recognition problem

1. Formulation of the problem: gaining a clear understanding of the aims of the investigation and planning the remaining stages.

2. Data collection: making measurements on appropriate variables and recording details of the data collection procedure (ground truth).

3. Initial examination of the data: checking the data, calculating summary statistics and producing plots in order to get a feel for the structure.

4. Feature selection or feature extraction: selecting variables from the measured set that are appropriate for the task. These new variables may be obtained by a linear or nonlinear transformation of the original set (feature extraction). To some extent, the division of feature extraction and classification is artificial.

5. Unsupervised pattern classification or clustering. This may be viewed as exploratory data analysis and it may provide a successful conclusion to a study. On the other hand, it may be a means of preprocessing the data for a supervised classification procedure.

6. Apply discrimination or regression procedures as appropriate. The classifier is designed using a training set of exemplar patterns.

7. Assessment of results. This may involve applying the trained classifier to an independent test set of labelled patterns.

8. Interpretation.

Monday, October 03, 2016

Tree replication problem. The same subtree can appear at different branches.

This makes the decision tree more complex than necessary and perhaps more difficult to interpret. Such a situation can arise from decision tree implementations that rely on a single attribute test condition at each internal node. Since most of the decision tree algorithms use a divide-and-conquer partitioning strategy, the same test condition can be applied to different parts of the attribute space, thus leading to the subtree replication problem.


Comparison among the impurity measures for binary classification problems
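For a binary problem where p is the fraction of records belonging to one class at a node, the three usual impurity measures are entropy, the Gini index, and classification error. This sketch computes them so the comparison curves can be reproduced; all three are 0 for a pure node (p = 0 or 1) and largest at p = 0.5.

```python
import math


def entropy(p: float) -> float:
    """-p log2 p - (1 - p) log2(1 - p), with 0 log 0 taken as 0."""
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)


def gini(p: float) -> float:
    return 1 - p ** 2 - (1 - p) ** 2


def misclassification_error(p: float) -> float:
    return 1 - max(p, 1 - p)


for p in (0.0, 0.1, 0.3, 0.5):
    print(f"p={p:.1f} entropy={entropy(p):.3f} gini={gini(p):.3f} "
          f"error={misclassification_error(p):.3f}")
```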


Test condition for continuous attributes
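For the binary form of the test, A < v, a common approach is to sort the attribute values, try the midpoint between each pair of adjacent distinct values as v, and keep the v with the lowest weighted Gini index. A sketch with illustrative data. `best_threshold` is hypothetical and recounts classes for every candidate (quadratic time), whereas a production version would update the counts incrementally in one pass over the sorted values.

```python
def gini_of_counts(counts):
    """Gini index 1 - sum(p_i^2) for a list of class counts."""
    n = sum(counts)
    return 0.0 if n == 0 else 1 - sum((c / n) ** 2 for c in counts)


def best_threshold(values, labels):
    """Return (v, weighted_gini) for the best binary test A < v."""
    classes = sorted(set(labels))
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_v, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (
            len(left) * gini_of_counts([left.count(c) for c in classes])
            + len(right) * gini_of_counts([right.count(c) for c in classes])
        ) / n
        if score < best_score:
            best_v, best_score = (lo + hi) / 2, score
    return best_v, best_score


income = [60, 70, 75, 85, 90, 95, 100, 120, 125, 220]
label = ["No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "No"]
print(best_threshold(income, label))  # (97.5, 0.3)
```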


Test conditions for nominal attributes


Classifying an unlabeled vertebrate. The dashed lines represent the outcomes of applying various attribute test conditions on the unlabeled vertebrate. The vertebrate is eventually assigned to the Non-mammal class.


Training and test error rates


Example of a decision tree and its decision boundaries for a two-dimensional data set


Sunday, September 04, 2016

The typical workflow of a data mining project


Every probability distribution tells a story.


Classes of activity.


Schematic of a process.


Life cycle of a product


Manufacturing strategy and lead time.


Extending ERP along the value chain


MRP II Planning Hierarchy


The development of ERP


Push versus pull: the gravity analogy


A simple model of control


Different Views of the Same Reality