Monday, January 2, 2023

Identity and Access Management (IAM)

Identity and access management (IAM) is a framework of business processes, policies and technologies that facilitates the management of electronic or digital identities. With an IAM framework in place, information technology (IT) managers can control user access to critical information within their organizations.

IAM technologies include single sign-on systems, two-factor authentication, multifactor authentication and privileged access management. These technologies also provide the ability to securely store identity and profile data, as well as data governance functions to ensure that only data that is necessary and relevant is shared.
IAM systems can be deployed on premises, provided by a third-party vendor through a cloud-based subscription model or deployed in a hybrid model.
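To make the two-factor and multifactor authentication mentioned above a little more concrete, here is a minimal Java sketch of how a time-based one-time password can be generated, following the general TOTP approach (RFC 6238). The shared secret, the 30-second window and the 6-digit output are illustrative assumptions, not the implementation of any particular IAM product.

  import javax.crypto.Mac;
  import javax.crypto.spec.SecretKeySpec;

  public class TotpSketch {
      // Generates a 6-digit time-based one-time password from a shared secret.
      // HMAC-SHA1 and the 30-second step follow the common TOTP scheme (RFC 6238).
      static String totp(byte[] sharedSecret, long unixTimeSeconds) throws Exception {
          long counter = unixTimeSeconds / 30;               // current 30-second window
          byte[] msg = new byte[8];
          for (int i = 7; i >= 0; i--) {                     // counter as an 8-byte big-endian value
              msg[i] = (byte) (counter & 0xFF);
              counter >>= 8;
          }
          Mac mac = Mac.getInstance("HmacSHA1");
          mac.init(new SecretKeySpec(sharedSecret, "HmacSHA1"));
          byte[] hash = mac.doFinal(msg);
          int offset = hash[hash.length - 1] & 0x0F;         // dynamic truncation
          int binary = ((hash[offset] & 0x7F) << 24)
                     | ((hash[offset + 1] & 0xFF) << 16)
                     | ((hash[offset + 2] & 0xFF) << 8)
                     |  (hash[offset + 3] & 0xFF);
          return String.format("%06d", binary % 1_000_000);
      }

      public static void main(String[] args) throws Exception {
          byte[] secret = "example-shared-secret".getBytes(); // illustrative secret only
          System.out.println(totp(secret, System.currentTimeMillis() / 1000L));
      }
  }

The same code run by the server and by the user's authenticator app (sharing the same secret) will produce matching codes within the same 30-second window, which is what lets the server verify the second factor.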

IAM encompasses the following components:

  • How individuals are identified in a system (understand the difference between identity management and authentication);
  • How roles are identified in a system and how they are assigned to individuals;
  • Adding, removing and updating individuals and their roles in a system;
  • Assigning levels of access to individuals or groups of individuals; and
  • Protecting the sensitive data within the system and securing the system itself.

IAM systems should capture and record user login information, manage the enterprise database of user identities, and orchestrate the assignment and removal of access privileges.
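To make the role and access-level components above concrete, here is a minimal, hypothetical Java sketch of role-based access assignment: roles map to permissions, users are assigned roles, and an access check consults both. The class, role and permission names are illustrative assumptions, not part of any specific IAM product.

  import java.util.EnumSet;
  import java.util.HashMap;
  import java.util.Map;
  import java.util.Set;

  public class RbacSketch {
      enum Permission { READ_RECORDS, WRITE_RECORDS, MANAGE_USERS } // illustrative permissions

      // Role -> permissions and user -> roles; a miniature stand-in for the enterprise identity database.
      static final Map<String, Set<Permission>> rolePermissions = new HashMap<>();
      static final Map<String, Set<String>> userRoles = new HashMap<>();

      static boolean isAllowed(String user, Permission permission) {
          for (String role : userRoles.getOrDefault(user, Set.of())) {
              if (rolePermissions.getOrDefault(role, EnumSet.noneOf(Permission.class)).contains(permission)) {
                  return true;
              }
          }
          return false;
      }

      public static void main(String[] args) {
          rolePermissions.put("analyst", EnumSet.of(Permission.READ_RECORDS));
          rolePermissions.put("admin", EnumSet.allOf(Permission.class));
          userRoles.put("alice", Set.of("analyst"));                         // assigning a role to an individual
          System.out.println(isAllowed("alice", Permission.READ_RECORDS));   // true
          System.out.println(isAllowed("alice", Permission.MANAGE_USERS));   // false
      }
  }

Adding, removing and updating individuals then amounts to editing these mappings, which is exactly the kind of assignment and removal of privileges an IAM system automates.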

Benefits of IAM

  • Access privileges are granted according to policy, and all individuals and services are properly authenticated, authorized and audited.
  • Companies that properly manage identities have greater control of user access, which reduces the risk of internal and external data breaches.
  • Automating IAM systems allows businesses to operate more efficiently by decreasing the effort, time and money that would be required to manually manage access to their networks.
  • In terms of security, the use of an IAM framework can make it easier to enforce policies around user authentication, validation and privileges, and address issues regarding privilege creep.
  • IAM systems help companies better comply with government regulations by allowing them to show corporate information is not being misused. Companies can also demonstrate that any data needed for auditing can be made available on demand.
  • Companies can gain competitive advantages by implementing IAM tools and following related best practices. For example, IAM technologies allow the business to give users outside the organization -- like customers, partners, contractors and suppliers -- access to its network across mobile applications, on-premises applications and SaaS without compromising security. This enables better collaboration, enhanced productivity, increased efficiency and reduced operating costs.

Types of digital authentication

With IAM, enterprises can implement a range of digital authentication methods to prove digital identity and authorize access to corporate resources.
  • Unique passwords. The most common type of digital authentication is the unique password. To make passwords more secure, some organizations require longer or more complex passwords that combine letters, symbols and numbers. Unless users can consolidate their passwords behind a single sign-on entry point, they typically find remembering many unique passwords onerous. (A sketch of how such passwords can be stored server-side follows this list.)
  • Pre-shared key (PSK). PSK is another type of digital authentication where the password is shared among users authorized to access the same resources -- think of a branch office Wi-Fi password. This type of authentication is less secure than individual passwords. A concern with shared passwords like PSK is that frequently changing them can be cumbersome.
  • Behavioral authentication. When dealing with highly sensitive information and systems, organizations can use behavioral authentication to get far more granular and analyze keystroke dynamics or mouse-use characteristics. By applying artificial intelligence, a trend in IAM systems, organizations can quickly recognize if user or machine behavior falls outside of the norm and can automatically lock down systems.
  • Biometrics. Modern IAM systems use biometrics for more precise authentication. For instance, they collect a range of biometric characteristics, including fingerprints, irises, faces, palms, gaits, voices and, in some cases, DNA. Biometrics and behavior-based analytics have been found to be more effective than passwords. One danger in relying heavily on biometrics is that if a company's biometric data is breached, recovery is difficult, as users can't swap out facial features or fingerprints the way they can passwords or other non-biometric information. Another critical technical challenge of biometrics is that it can be expensive to implement at scale, with software, hardware and training costs to consider.
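Following up on the unique-passwords item above: a related server-side concern is how those passwords are stored and verified. Below is a minimal Java sketch using the JDK's built-in PBKDF2 support; the iteration count, salt size and class name are illustrative assumptions, not a recommendation for any particular product.

  import java.security.SecureRandom;
  import java.security.spec.KeySpec;
  import java.util.Arrays;
  import javax.crypto.SecretKeyFactory;
  import javax.crypto.spec.PBEKeySpec;

  public class PasswordHashSketch {
      // Derives a salted PBKDF2 hash so the plaintext password never needs to be stored.
      static byte[] hash(char[] password, byte[] salt) throws Exception {
          KeySpec spec = new PBEKeySpec(password, salt, 100_000, 256); // iteration count is illustrative
          return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
      }

      public static void main(String[] args) throws Exception {
          byte[] salt = new byte[16];
          new SecureRandom().nextBytes(salt);                           // unique salt per user
          byte[] stored = hash("correct horse battery staple".toCharArray(), salt);

          // Verification: re-hash the attempted password with the same salt and compare.
          // (A production system would use a constant-time comparison.)
          byte[] attempt = hash("correct horse battery staple".toCharArray(), salt);
          System.out.println(Arrays.equals(stored, attempt));           // true
      }
  }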

Sunday, January 1, 2023

Kafka Connect

Kafka Connect is a free, open-source component of Apache Kafka® that works as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems. The information on this page is specific to Kafka Connect for Confluent Platform.

Confluent Cloud offers pre-built, fully managed Kafka connectors that make it easy to instantly connect to popular data sources and sinks. With a simple GUI-based configuration and elastic scaling with no infrastructure to manage, Confluent Cloud connectors make moving data in and out of Kafka an effortless task, giving you more time to focus on application development.

You can deploy Kafka Connect as a standalone process that runs jobs on a single machine (for example, log collection), or as a distributed, scalable, fault-tolerant service supporting an entire organization. Kafka Connect provides a low barrier to entry and low operational overhead. You can start small with a standalone environment for development and testing, and then scale up to a full production environment to support the data pipeline of a large organization.


Benefits of Kafka Connect:

  • Data-centric pipeline: Connect uses meaningful data abstractions to pull or push data to Kafka.
  • Flexibility and scalability: Connect runs with streaming and batch-oriented systems on a single node (standalone) or scaled to an organization-wide service (distributed).
  • Reusability and extensibility: Connect lets you leverage existing connectors or extend them to fit your needs, reducing time to production.


Types of connectors:

  1. Source connector: Source connectors ingest entire databases and stream table updates to Kafka topics. Source connectors can also collect metrics from all your application servers and store the data in Kafka topics, making the data available for stream processing with low latency. (A hedged registration sketch follows this list.)
  2. Sink connector: Sink connectors deliver data from Kafka topics to secondary indexes, such as Elasticsearch, or batch systems such as Hadoop for offline analysis.
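To illustrate how a source connector is typically set up in a distributed Connect deployment, here is a minimal Java sketch that registers the FileStreamSource connector (which ships with Apache Kafka) through the Connect REST API. The worker URL, connector name, file path and topic are assumptions for illustration.

  import java.net.URI;
  import java.net.http.HttpClient;
  import java.net.http.HttpRequest;
  import java.net.http.HttpResponse;

  public class RegisterConnectorSketch {
      public static void main(String[] args) throws Exception {
          // Connector definition: implementation class, parallelism, source file and target topic.
          String body = """
              {
                "name": "demo-file-source",
                "config": {
                  "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                  "tasks.max": "1",
                  "file": "/tmp/demo-input.txt",
                  "topic": "demo-topic"
                }
              }
              """;

          // POST the definition to a Connect worker's REST endpoint (the URL is an assumption).
          HttpRequest request = HttpRequest.newBuilder()
                  .uri(URI.create("http://localhost:8083/connectors"))
                  .header("Content-Type", "application/json")
                  .POST(HttpRequest.BodyPublishers.ofString(body))
                  .build();

          HttpResponse<String> response =
                  HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
          System.out.println(response.statusCode() + " " + response.body());
      }
  }

A sink connector is registered the same way, with a different connector.class and the topics it should read from.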

Scale-out versus Scale-up

Introduction

Scaling up and scaling out are two IT strategies that both increase the processing power and storage capacity of systems. The difference is in how engineers achieve this type of growth and system improvement. The terms "scale up" and "scale out" refer to the way in which the infrastructure is grown.

While scaling out involves adding more discrete units to a system in order to add capacity, scaling up involves expanding existing units by adding more resources to them.

One of the easiest ways to describe both of these methods is that scaling out generally means building horizontally, while scaling up means building vertically. 


Scale Out

Scaling out takes the existing infrastructure, and replicates it to work in parallel. This has the effect of increasing infrastructure capacity roughly linearly. Data centers often scale out using pods. Build a compute pod, spin up applications to use it, then scale out by building another pod to add capacity. Actual application performance may not be linear, as application architectures must be written to work effectively in a scale-out environment.


Scale Up

Scaling up is taking what you’ve got, and replacing it with something more powerful. From a networking perspective, this could be taking a 1GbE switch, and replacing it with a 10GbE switch. Same number of switchports, but the bandwidth has been scaled up via bigger pipes. The 1GbE bottleneck has been relieved by the 10GbE replacement.

Scaling up is a viable scaling solution until it is impossible to scale up individual components any larger. 


Kafka

What is event streaming?

Event streaming is the practice of capturing data in real-time from event sources like databases, sensors, mobile devices, cloud services, and software applications in the form of streams of events; storing these event streams durably for later retrieval; manipulating, processing, and reacting to the event streams in real-time as well as retrospectively; and routing the event streams to different destination technologies as needed. Event streaming thus ensures a continuous flow and interpretation of data so that the right information is at the right place, at the right time.


What can I use event streaming for?

Event streaming is applied to a wide variety of use cases across a plethora of industries and organizations. Its many examples include:


  • To process payments and financial transactions in real-time, such as in stock exchanges, banks, and insurance companies.
  • To track and monitor cars, trucks, fleets, and shipments in real-time, such as in logistics and the automotive industry.
  • To continuously capture and analyze sensor data from IoT devices or other equipment, such as in factories and wind parks.
  • To collect and immediately react to customer interactions and orders, such as in retail, the hotel and travel industry, and mobile applications.
  • To monitor patients in hospital care and predict changes in condition to ensure timely treatment in emergencies.
  • To connect, store, and make available data produced by different divisions of a company.
  • To serve as the foundation for data platforms, event-driven architectures, and microservices.



Apache Kafka® is an event streaming platform. What does that mean?

Kafka combines three key capabilities so you can implement your use cases for event streaming end-to-end with a single battle-tested solution:


  1. To publish (write) and subscribe to (read) streams of events, including continuous import/export of your data from other systems.
  2. To store streams of events durably and reliably for as long as you want.
  3. To process streams of events as they occur or retrospectively.


And all this functionality is provided in a distributed, highly scalable, elastic, fault-tolerant, and secure manner. Kafka can be deployed on bare-metal hardware, virtual machines, and containers, and on-premises as well as in the cloud. You can choose between self-managing your Kafka environments and using fully managed services offered by a variety of vendors.
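As a small illustration of the first capability (publishing events), here is a minimal Java producer sketch using Kafka's standard client library. The broker address, topic name, key and value are placeholders for illustration.

  import java.util.Properties;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import org.apache.kafka.common.serialization.StringSerializer;

  public class ProducerSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");           // placeholder broker address
          props.put("key.serializer", StringSerializer.class.getName());
          props.put("value.serializer", StringSerializer.class.getName());

          // Publish (write) a single event to a topic; the brokers store it durably.
          try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
              producer.send(new ProducerRecord<>("payments", "order-42", "amount=19.99"));
          }
      }
  }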


How does Kafka work in a nutshell?

Kafka is a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol. It can be deployed on bare-metal hardware, virtual machines, and containers in on-premises as well as cloud environments.


Servers: Kafka is run as a cluster of one or more servers that can span multiple datacenters or cloud regions. Some of these servers form the storage layer, called the brokers. Other servers run Kafka Connect to continuously import and export data as event streams to integrate Kafka with your existing systems such as relational databases as well as other Kafka clusters. To let you implement mission-critical use cases, a Kafka cluster is highly scalable and fault-tolerant: if any of its servers fails, the other servers will take over its work to ensure continuous operations without any data loss.


Clients: They allow you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner even in the case of network problems or machine failures. Kafka ships with some such clients included, which are augmented by dozens of clients provided by the Kafka community: clients are available for Java and Scala including the higher-level Kafka Streams library, for Go, Python, C/C++, and many other programming languages as well as REST APIs.
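To round out the producer sketch above, here is a matching minimal Java consumer using the standard client. The broker address, group id and topic are placeholders, and a real application would typically poll in a long-running loop rather than once.

  import java.time.Duration;
  import java.util.List;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.serialization.StringDeserializer;

  public class ConsumerSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");            // placeholder broker address
          props.put("group.id", "payments-readers");                   // consumers in a group share the work
          props.put("auto.offset.reset", "earliest");
          props.put("key.deserializer", StringDeserializer.class.getName());
          props.put("value.deserializer", StringDeserializer.class.getName());

          // Subscribe (read) to the topic and process whatever events arrive in one poll.
          try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              consumer.subscribe(List.of("payments"));
              ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
              for (ConsumerRecord<String, String> record : records) {
                  System.out.println(record.key() + " -> " + record.value());
              }
          }
      }
  }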