---
title: Data and ML checks
description: ML monitoring "hello world"
---

import CloudSignup from '/snippets/cloud_signup.mdx';
import CreateProject from '/snippets/create_project.mdx';

Need help? Ask on [Discord](https://discord.com/invite/xZjKRaNp8b).

## 1. Set up your environment

This quickstart shows both the local open-source and cloud workflows. You will run a simple evaluation in Python and explore the results in Evidently Cloud.
### 1.1. Set up Evidently Cloud

<CloudSignup />

### 1.2. Installation and imports

Install the Evidently Python library:

```python
!pip install evidently
```

Components to run the evals:

```python
import pandas as pd
from sklearn import datasets

from evidently.future.datasets import Dataset
from evidently.future.datasets import DataDefinition

from evidently.future.report import Report
from evidently.future.metrics import *
from evidently.future.presets import *
from evidently.future.tests import *
```

Components to connect with Evidently Cloud:

```python
from evidently.ui.workspace.cloud import CloudWorkspace
```
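
A minimal sketch of what the connection looks like (the token value is a placeholder; the Project creation step below walks through this in full):

```python
# Connect to Evidently Cloud (replace the token with your own API key)
ws = CloudWorkspace(
    token="YOUR_API_TOKEN",  # placeholder: use your Evidently Cloud API token
    url="https://app.evidently.cloud",
)
```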

### 1.3. Create a Project

<CreateProject />
If you prefer to work fully locally without Evidently Cloud, the same installation and imports apply (you can skip the `CloudWorkspace` import), and you can preview the results directly in your Python environment, as shown in step 4.

## 2. Prepare a toy dataset

Let's import a toy dataset with tabular data:

```python
adult_data = datasets.fetch_openml(name="adult", version=2, as_frame="auto")
adult = adult_data.frame
```

If OpenML is not available, you can download the same dataset from here:

```python
url = "https://github.com/evidentlyai/evidently/blob/main/test_data/adults.parquet?raw=true"
adult = pd.read_parquet(url, engine="pyarrow")
```

Let's split the data into two parts and introduce some artificial drift for demo purposes. The production data will include people with education levels unseen in the reference dataset:

```python
adult_ref = adult[~adult.education.isin(["Some-college", "HS-grad", "Bachelors"])]
adult_prod = adult[adult.education.isin(["Some-college", "HS-grad", "Bachelors"])]
```
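
As a quick check, you can confirm that the three held-out education levels appear only in the production split:

```python
# Sanity check: the held-out education levels should appear only in adult_prod
print(sorted(adult_ref["education"].unique()))
print(sorted(adult_prod["education"].unique()))
```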

Map the column types:

```python
schema = DataDefinition(
    numerical_columns=["education-num", "age", "capital-gain", "hours-per-week", "capital-loss"],
    categorical_columns=["education", "occupation", "native-country", "workclass", "marital-status"],
)
```

Create Evidently Datasets to work with:

```python
eval_data_1 = Dataset.from_pandas(
    pd.DataFrame(adult_prod),
    data_definition=schema
)
eval_data_2 = Dataset.from_pandas(
    pd.DataFrame(adult_ref),
    data_definition=schema
)
```

## 3. Get a Report

Let's get a summary of all columns in the dataset, and run auto-generated Tests to check data quality and core statistics between the two datasets:

```python
report = Report([
    DataSummaryPreset()
],
include_tests=True)
my_eval = report.run(eval_data_1, eval_data_2)
```
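
Since the split deliberately introduced drift, you can optionally also run a drift check. A minimal sketch using the `DataDriftPreset` from the same presets module (an optional extra, not part of the core quickstart):

```python
# Optional: compare column distributions between the two datasets
drift_report = Report([
    DataDriftPreset()
])
drift_eval = drift_report.run(eval_data_1, eval_data_2)
```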

## 4. Explore the results

**Upload the Report** with summary results:
```python
ws.add_run(project.id, my_eval, include_data=False)
```

**View the Report**. Go to [Evidently Cloud](https://app.evidently.cloud/), open your Project, navigate to "Reports" in the left menu, and open the Report. You will see the summary with scores and Test results.

**Get a Dashboard**. As you run repeated evals, you may want to track the results over time. Go to the "Dashboard" tab in the left menu and enter "Edit" mode. Add a new tab, and select the "Columns" template.

You'll see a set of panels that show column stats. Each has a single data point. As you log ongoing evaluation results, you can track trends and set up alerts.

**Preview locally**. To view the Report in an interactive Python environment like Jupyter notebook or Colab, run:
```python
my_eval
```

This will show the summary Report. In a separate tab, you'll see the pass/fail results for all Tests.

You can also view the results as a JSON or Python dictionary:

```python
# my_eval.json()
# my_eval.dict()
```

Or save and open an HTML file externally:

```python
# my_eval.save_html("file.html")
```