Uncategorized

Embedding and Vector DB

What is Embedding?

An embedding is a numerical representation of an object or entity, such as a word, sentence, document, image, or audio clip, as a point in a continuous vector space.

Essentially, embeddings enable machine learning models to find similar objects. Given a photo or a document, a machine learning model that uses embeddings could find a similar photo or document. Since embeddings make it possible for computers to understand the relationships between words and other objects, they are foundational for AI.

Word   Embedding
cat    [0.2, 0.4, -0.1, 0.8, 0.5]
dog    [0.3, 0.6, -0.2, 0.7, 0.4]

The embeddings are designed such that words with similar meanings or contexts have embeddings that are close together in the embedding space. In this example, “cat” and “dog” might have embeddings that are relatively close to each other because they often appear in similar contexts, such as “pets” or “animals”.
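To make "close together" concrete, here is a minimal sketch that compares the two example vectors using cosine similarity. The third vector, labeled car, is a made-up value added purely for contrast; it does not come from any real model.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cat = [0.2, 0.4, -0.1, 0.8, 0.5]
dog = [0.3, 0.6, -0.2, 0.7, 0.4]
car = [-0.7, 0.1, 0.9, -0.3, 0.2]  # hypothetical vector, invented for this example

print(round(cosine_similarity(cat, dog), 3))  # close to 1: very similar direction
print(round(cosine_similarity(cat, car), 3))  # negative: pointing away from "cat"
```

A value near 1 means the two vectors point in almost the same direction, which is how "similar meaning" shows up numerically.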

What is a vector in machine learning?

In mathematics, a vector is an array of numbers that defines a point in a multi-dimensional space. In more practical terms, a vector is a list of numbers, such as {1989, 22, 9, 180}. Each number indicates where the object is along a specified dimension.

In machine learning, the use of vectors makes it possible to search for similar objects. A vector-searching algorithm simply has to find two vectors that are close together in a vector database.
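The search described above can be sketched as a brute-force scan: compute the distance from the query to every stored vector and keep the closest ones. The store dictionary, its ids, and its vectors are hypothetical values for illustration. Real vector databases typically avoid a full scan by using an index, so treat this only as a picture of the idea.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, store, n_results=1):
    """Return the n_results (id, distance) pairs closest to query, nearest first."""
    scored = [(item_id, euclidean_distance(query, vec)) for item_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n_results]

# Hypothetical in-memory "database": ids mapped to made-up 4-dimensional vectors.
store = {
    "a": [1989, 22, 9, 180],
    "b": [1990, 21, 9, 178],
    "c": [1950, 40, 2, 160],
}

# The query differs from "a" by 1 in the last dimension, so "a" comes back first.
print(nearest([1989, 22, 9, 181], store, n_results=2))
```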

To store these vector embeddings we need a database. Below are a few databases that can store embeddings.

A few vector databases

  • Pinecone
  • Chroma DB (open source, free)
  • Deep Lake
  • SingleStore

Dot Product:

The dot product is a mathematical operation that multiplies the corresponding components of two vectors and sums the results. The result is larger when the vectors point in similar directions, so it can serve as a similarity score. In the context of embeddings, such as word embeddings or sentence embeddings, the dot product helps in searching for similar items by quantifying the similarity between their embedded representations.
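A small sketch of the operation follows. It also shows one caveat worth knowing: the raw dot product grows with vector length, so vectors are often scaled to unit length before comparing them. The vectors a, b, and c are made-up values for illustration.

```python
import math

def dot(a, b):
    """Multiply corresponding components and sum the results."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to length 1 so that only its direction matters."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice as long
c = [3.0, -1.0, 0.0]  # hypothetical vector pointing elsewhere

print(dot(a, b))  # 28.0
print(dot(a, c))  # 1.0
# After normalization, vectors with the same direction score approximately 1.
print(dot(normalize(a), normalize(b)))
```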

Tools used for Demo

Hugging Face:

all-MiniLM-L6-v2

This is a sentence-transformers model: It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.

Sign Up/Login: Go to the Hugging Face website (https://huggingface.co/) and sign up for an account if you don’t already have one. If you have an account, log in using your credentials.

Access Account Settings: Once you are logged in, click on your profile icon at the top right corner of the page. Then, select “Settings” from the dropdown menu.

Generate API Key: On the “Settings” page, navigate to the “Access Tokens” section (Hugging Face's name for API keys). Here, you should see an option to create a new token.

Chroma DB:

Chroma is a database for building AI applications with embeddings. It comes with everything you need to get started built in, and runs on your machine.

Install:

pip install chromadb

Create the Chroma DB client:

chroma_client = chromadb.Client(Settings(chroma_db_impl="duckdb+parquet", persist_directory="db/"))

This Settings-based form is the legacy configuration used by chromadb releases before 0.4; newer releases provide chromadb.PersistentClient(path="db/") instead.

DuckDB: DuckDB is an in-process analytical database engine designed for fast query processing and efficient data storage.

Parquet: Parquet is a columnar storage file format commonly used in big data processing frameworks like Apache Hadoop and Apache Spark. Parquet is designed for efficient storage and processing of large datasets, especially in distributed environments. It provides features such as efficient compression, encoding, and column pruning, making it suitable for analytics workloads.

persist_directory: Directory in which Chroma stores its data on disk.

Create collection: A collection is similar to a table in a SQL database.

collection = chroma_client.create_collection(name="my_collection")

Query Collection :

Chroma collections can be queried in a variety of ways, using the .query method.

You can query by a set of query_embeddings. The query returns the n_results closest matches to each query embedding, ordered from nearest to farthest.

Distance:

Distance is the metric used to measure how dissimilar two embeddings are when a query runs against a collection. The smaller the distance, the more similar the items.
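As a rough illustration of what such a distance can look like, the sketch below implements three common choices in plain Python. To my understanding these match the formulas Chroma documents for its l2, ip, and cosine spaces, with l2 (squared Euclidean) as the default, but treat that mapping as an assumption and check the documentation for your version. The example vectors are made up, and the is_anomaly helper with its 0.5 default simply mirrors the threshold used in the demo script in this post.

```python
import math

def squared_l2(a, b):
    """Sum of squared component differences (0 means identical)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_product_distance(a, b):
    """1 minus the dot product (meaningful mainly for unit-length vectors)."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus the cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def is_anomaly(distance, threshold=0.5):
    """Flag a query whose nearest stored neighbour is farther than the threshold."""
    return distance > threshold

stored = [0.6, 0.8]      # hypothetical unit-length embedding already in the collection
same = [0.6, 0.8]        # identical to the stored one
different = [0.8, -0.6]  # perpendicular to the stored one

for name, query in (("same", same), ("different", different)):
    d = squared_l2(stored, query)
    print(name, round(d, 3), round(cosine_distance(stored, query), 3), is_anomaly(d))
```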

# Import the ChromaDB client, its settings class, and the bundled embedding helpers
import chromadb
from chromadb.config import Settings
from chromadb.utils import embedding_functions

# Initialize the Hugging Face embedding function with the model name and API key
hf_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    api_key="<<<api Key>>>"
)

# Create a ChromaDB client that persists its data under db/
# (this Settings form is the legacy configuration of chromadb releases before 0.4)
client = chromadb.Client(Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory="db/"
))

# Get or create a collection named "goodlog"
coll = client.get_or_create_collection("goodlog")


def read_log_file(filename, addToColl):
    """Read a log file line by line and hand each line to process_log_line."""
    indexx = 0
    try:
        with open(filename, 'r') as file:
            for line in file:
                # The 1-based line number doubles as the document id
                indexx += 1
                process_log_line(line.strip(), indexx, addToColl)
    except FileNotFoundError:
        print(f"Error: Log file '{filename}' not found.")


def process_log_line(log_line, linenumber, addToColl):
    """Store the line (addToColl == 1) or check it for anomalies (addToColl == 2)."""
    if addToColl == 1:
        # Embed the log line and store it with its line number as the id
        logline = hf_ef([log_line])
        coll.add(embeddings=logline, documents=[log_line], ids=[str(linenumber)])
    elif addToColl == 2:
        # Embed the log line and fetch its single nearest neighbour from the collection
        query_vector = hf_ef([log_line])
        res = coll.query(
            query_embeddings=query_vector,
            n_results=1,
            include=['distances'],
        )
        # Report the line as an anomaly if the nearest stored line is farther than 0.5
        if round(res['distances'][0][0], 2) > 0.5:
            print(log_line + " ** this is Anomaly")
            print(res['distances'][0][0])
    else:
        return log_line


# Main program entry point
if __name__ == "__main__":
    # Uncomment the line below to add good log embeddings to the collection
    # read_log_file("./random_log_file.txt", 1)

    # Read the log file and search for anomalies
    read_log_file("./random_log_file_error.txt", 2)

Azure, Azure Service Fabric, Docker

Publish a Docker image to a specific node in an Azure Service Fabric cluster

Update ServiceManifest.xml as below:

<ServiceTypes>
  <!-- This is the name of your ServiceType.
       The UseImplicitHost attribute indicates this is a guest service. -->
  <StatelessServiceType ServiceTypeName="<ServiceType>" UseImplicitHost="true">
    <PlacementConstraints>(NodeTypeName==<NodeName>)</PlacementConstraints>
  </StatelessServiceType>
</ServiceTypes>

PlacementConstraints determines which nodes the service is deployed to. Replace <NodeName> with the node type name from your cluster.

Uncategorized

CIM namespace root/Microsoft/Windows/DesiredStateConfiguration is invalid

Even when this error occurs, winrm quickconfig may still report “WinRM service is already running on this machine.” and winmgmt /verifyrepository may still report “WMI repository is consistent”.

To check whether WMI is set up properly, use the steps below:

  • Open wmimgmt.msc from Run.
  • WMI Control will open.
  • Right-click WMI Control and select Properties.
  • In the ideal scenario everything looks good, with the proper WMI version and so on.

Or

Run Get-CimClass. It should return all classes and their names. If it instead returns “CIM namespace root/Microsoft/Windows/DesiredStateConfiguration is invalid”, the WMI setup is broken.

Follow the steps below to fix the issue:

  1. Disable and stop the winmgmt service
  2. Remove or rename C:\Windows\System32\wbem\repository
  3. Enable and start the winmgmt service
  4. Open a CMD prompt as Administrator
  5. In the CMD prompt, navigate to C:\Windows\System32\wbem\
  6. Run the following command from that directory:

for /f "delims=" %s in ('dir /b *.mof') do mofcomp "%s"

Note: This will take a minute or so to complete.

  7. Now run the command:

for /f "delims=" %s in ('dir /b en-us\*.mfl') do mofcomp en-us\%s

 

Azure, Powershell

Using Azure Key Vault for storing secret passwords

Instead of storing passwords in web.config files or in a database, keep them in Azure Key Vault, which is a more secure place for secrets. In this post I explain how to store and retrieve secrets (passwords) in Azure Key Vault using PowerShell and C#.

Storing passwords in Key Vault as secret

Create Key vault in Azure

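Use New-AzureRmKeyVault to create the vault, then add secrets with Set-AzureKeyVaultSecret. The -SecretValue parameter expects a SecureString, so convert the plain-text password first.

New-AzureRmKeyVault -VaultName 'TestKeyVault' -ResourceGroupName 'TestResourceGroup' -Location 'East US'

$secretValue = ConvertTo-SecureString 'P@ssW0rd' -AsPlainText -Force
$secret = Set-AzureKeyVaultSecret -VaultName 'TestKeyVault' -Name 'Password' -SecretValue $secretValue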

Once your secret is ready, you can start using it in your code or in any deployment scripts.

You can get the secret using PowerShell cmdlets or the REST API.

Using the Key Vault name:

The Get-AzureKeyVaultSecret cmdlet returns the current version of your secret.

$value = Get-AzureKeyVaultSecret -VaultName 'TestKeyVault' -Name 'Password'

Using the Key Vault URL:

To retrieve the secret by calling its URL:

# Ask the vault, without credentials, which authority issues tokens for it
function Get-OAuth2Uri
(
    [string]$url
)
{
    # The unauthenticated call fails; the error response carries a WWW-Authenticate header
    $response = try { Invoke-RestMethod -Method GET -Uri $url -Headers @{} } catch { $_.Exception.Response }
    $authHeader = $response.Headers['www-authenticate']
    $endpoint = [regex]::match($authHeader, 'authorization="(.*?)"').Groups[1].Value

    return "$endpoint/oauth2/token"
}

# Request an access token with the client-credentials grant
function Get-AccessToken
(
    [string]$url,
    [string]$aadClientId,
    [string]$aadClientSecret
)
{
    $oauth2Uri = Get-OAuth2Uri -url $url

    $body = 'grant_type=client_credentials'
    $body += '&client_id=' + $aadClientId
    $body += '&client_secret=' + [Uri]::EscapeDataString($aadClientSecret)
    $body += '&resource=' + [Uri]::EscapeDataString("https://vault.azure.net")

    $response = Invoke-RestMethod -Method POST -Uri $oauth2Uri -Headers @{} -Body $body

    return $response.access_token
}

# Read the secret value from the vault URL using a bearer token
function Get-Secret
(
    [string] $aadClientId, [string] $aadClientSecret, [string] $url
)
{
    $accessToken = Get-AccessToken -url $url -aadClientId $aadClientId -aadClientSecret $aadClientSecret
    $headers = @{ 'Authorization' = "Bearer $accessToken" }

    $queryUrl = $url + '?api-version=2016-10-01'

    $keyResponse = Invoke-RestMethod -Method GET -Uri $queryUrl -Headers $headers

    return $keyResponse.value
}

Get-Secret -aadClientId "{ClientId}" -aadClientSecret "{AADSecretID}" -url "https://TestKeyVault.vault.azure.net/secrets/Password"
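The two string-handling steps in the script above, pulling the token endpoint out of the WWW-Authenticate header and building the form-encoded request body, can be tried without a vault. The following is a minimal Python sketch of just those steps. The function names, the sample header, and the client id and secret are hypothetical values invented for illustration, and no network call is made.

```python
import re
from urllib.parse import urlencode

def token_endpoint(www_authenticate):
    """Extract the authorization URL from the header and append the token path."""
    match = re.search(r'authorization="(.*?)"', www_authenticate)
    if match is None:
        raise ValueError("no authorization endpoint found in header")
    return match.group(1) + "/oauth2/token"

def token_request_body(client_id, client_secret):
    """Build the form-encoded body for the client-credentials grant."""
    return urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": "https://vault.azure.net",
    })

# Hypothetical header value, shaped like the challenge the script expects.
header = 'Bearer authorization="https://login.example.test/tenant-id", resource="https://vault.azure.net"'

print(token_endpoint(header))
print(token_request_body("my-client-id", "s3cr&t"))
```

Note how the ampersand in the sample secret is percent-encoded, which is the reason the PowerShell version escapes the client secret before appending it to the body.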

Using C#

To read a Key Vault secret as plain text in C# code, use the approach below.

To use Key Vault in C#, install the “Microsoft.Azure.KeyVault” NuGet package with the command below:

Install-Package Microsoft.Azure.KeyVault -Version 2.0.6

using System;
using System.Threading.Tasks;
using System.Web.Configuration;
using Microsoft.Azure.KeyVault;
using Microsoft.IdentityModel.Clients.ActiveDirectory; // ADAL, provides AuthenticationContext

// Callback used by KeyVaultClient to obtain an access token for the vault
public static async Task<string> GetToken(string authority, string resource, string scope)
{
    var authContext = new AuthenticationContext(authority);
    ClientCredential clientCred = new ClientCredential(
        WebConfigurationManager.AppSettings["ClientId"],
        WebConfigurationManager.AppSettings["ClientSecret"]);
    AuthenticationResult result = await authContext.AcquireTokenAsync(resource, clientCred);

    if (result == null)
        throw new InvalidOperationException("Failed to obtain the JWT token");

    return result.AccessToken;
}

// Fetch the secret at the given URL and return its plain-text value
private async Task<string> GetKeyVaultValue(string keyVaultUrl)
{
    var kv = new KeyVaultClient(GetToken);

    var sec = await kv.GetSecretAsync(keyVaultUrl);
    return sec.Value;
}

public async Task<string> Get(string keyVaultUrl)
{
    return await GetKeyVaultValue(keyVaultUrl);
}

 

Uncategorized

Updating IIS configuration from the command line using appcmd.exe

Update using the command below (try it in PowerShell):

E.g.: & $Env:WinDir\system32\inetsrv\appcmd.exe set config "Default Web Site/SiteName" /section:anonymousAuthentication /enabled:true

On error:

ERROR ( message:Can not set attribute "enabled" to value "true".. Reason: This configuration section cannot be used at this path. This happens when the section is locked at a parent level. Locking is either by default (overrideModeDefault="Deny"), or set explicitly by a location tag with overrideMode="Deny" or the legacy allowOverride="false".. )

Unlock/override using the command below

Unlocking security/authentication/anonymousAuthentication

E.g.: & $Env:WinDir\system32\inetsrv\appcmd.exe unlock config -section:security/authentication/anonymousAuthentication

List the settings of a section using the command below

E.g.: & $Env:WinDir\system32\inetsrv\appcmd.exe list config "Default Web Site/SiteName" /section:anonymousAuthentication
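If these commands need to be issued from a script, one option is to assemble them as argument lists first. The sketch below is a hypothetical Python helper that only builds the three command lines shown in this post; the function names are my own, and it uses no appcmd options beyond the ones already shown here.

```python
import ntpath
import os

def appcmd_path(windir=None):
    """Locate appcmd.exe under the Windows directory (WinDir, as in the commands above)."""
    windir = windir or os.environ.get("WinDir", r"C:\Windows")
    return ntpath.join(windir, "system32", "inetsrv", "appcmd.exe")

def set_config(site_path, section, **attributes):
    """Argument list for: appcmd set config <site> /section:<section> /<name>:<value>."""
    args = [appcmd_path(), "set", "config", site_path, f"/section:{section}"]
    args += [f"/{name}:{value}" for name, value in attributes.items()]
    return args

def unlock_config(section):
    """Argument list for: appcmd unlock config -section:<section>."""
    return [appcmd_path(), "unlock", "config", f"-section:{section}"]

def list_config(site_path, section):
    """Argument list for: appcmd list config <site> /section:<section>."""
    return [appcmd_path(), "list", "config", site_path, f"/section:{section}"]

print(set_config("Default Web Site/SiteName", "anonymousAuthentication", enabled="true"))
print(unlock_config("security/authentication/anonymousAuthentication"))
```

On a Windows host with IIS installed, each list could then be passed to subprocess.run(args, check=True) from an elevated prompt. Keeping the arguments as a list avoids quoting problems with the space in the site name.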

IIS applicationHost.config file location

"C:\Windows\System32\inetsrv\config\applicationHost.config"

Uncategorized

Create a self-signed certificate using SelfSSL

Download SelfSSL:

Download the IIS 6.0 Resource Kit Tools (includes the SelfSSL utility) from Microsoft.

Note: SelfSSL can be installed on Windows XP or Windows Server 2003.

SelfSSL will be installed in "C:\Program Files\IIS Resources\SelfSSL".

Use the command below to create the certificate:

selfssl /N:CN=MY.domain.com /V:365

Note: Ignore the error "Error opening metabase: 0x80040154".

Open MMC.exe and add the Certificates snap-in

Go to Start > Run (or press Windows Key + R) and enter “mmc”. You may receive a UAC prompt; accept it and an empty Management Console will open.

In the console, go to File > Add/Remove Snap-in.

Add Certificates from the left side.

You can now see the recently created certificate.

Uncategorized

Making the Windows 10 Start Menu Work Again

  1. Open Windows PowerShell as Administrator
    1. Right-click the Start button
    2. Choose Command Prompt (Admin)
    3. Type powershell in the black window and press Enter
  2. Paste the following command into the Administrator: Windows PowerShell window and press Enter:
    Get-AppXPackage -AllUsers | Foreach {Add-AppxPackage -DisableDevelopmentMode -Register "$($_.InstallLocation)\AppXManifest.xml"}