[MAINTENANCE] Rename GE to GX across codebase (GREAT-1352) (great…
cdkini authored Dec 6, 2022
1 parent 23f0242 commit 7e92319
Showing 215 changed files with 799 additions and 817 deletions.
4 changes: 2 additions & 2 deletions SLACK_GUIDELINES.md
@@ -4,7 +4,7 @@
We cannot stress enough that we want this to be a safe, comfortable and inclusive environment. Please read our [code of conduct](https://github.com/great-expectations/great_expectations/blob/develop/CODE_OF_CONDUCT.md) if you need more information on this guideline.

## Keep timezones in mind and be respectful of peoples’ time.
-People on Slack are distributed and might be in a very different time zone from you, so don't use @channel @here (this is reserved for admins anyways). Before you @-mention someone, think about what timezone they are in and if you are likely to disturb them. You can check someone's timezone in their profile. As of today, the core GE team is based solely in the United States but the community is world wide.
+People on Slack are distributed and might be in a very different time zone from you, so don't use @channel @here (this is reserved for admins anyways). Before you @-mention someone, think about what timezone they are in and if you are likely to disturb them. You can check someone's timezone in their profile. As of today, the core GX team is based solely in the United States but the community is world wide.

If you post in off hours be patient, Someone will get back to you once the sun comes up.

@@ -13,7 +13,7 @@ If you post in off hours be patient, Someone will get back to you once the sun c
- Do your best to try and solve the problem first as your efforts will help us more easily answer the question.
- [Read "How to write a good question in Slack"](https://github.com/great-expectations/great_expectations/discussions/4951)
- Head over to our [Documentation](https://docs.greatexpectations.io/en/latest/)
-- Checkout [GitHub Discussions](https://github.com/great-expectations/great_expectations/discussions) this is where we want most of our problem solving, discussion, updates, etc to go because it helps keep a more visible record for GE users.
+- Checkout [GitHub Discussions](https://github.com/great-expectations/great_expectations/discussions) this is where we want most of our problem solving, discussion, updates, etc to go because it helps keep a more visible record for GX users.

#### Asking your question in Slack

22 changes: 11 additions & 11 deletions azure-pipelines-dev.yml
@@ -72,7 +72,7 @@ stages:
tests/integration/fixtures/**
tests/test_sets/**
-[GEChanged]
+[GXChanged]
great_expectations/**/*.py
pyproject.toml
setup.cfg
@@ -89,7 +89,7 @@

jobs:
- job: lint
-condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GEChanged'], true)
+condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GXChanged'], true)
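For readers unfamiliar with this wiring, a hypothetical sketch of how the `scope_check` stage could publish the flag follows. Only the stage, job, and step names are taken from the condition expression above; the detection logic itself is an assumption, not part of this diff:

```yaml
# Hypothetical sketch: the changes job exposes GXChanged as an output
# variable that downstream job conditions can read.
- stage: scope_check
  jobs:
    - job: changes
      steps:
        - script: |
            # Set to true when the watched GX paths were touched
            echo "##vso[task.setvariable variable=GXChanged;isOutput=true]true"
          name: CheckChanges
```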
steps:
- task: UsePythonVersion@0
inputs:
@@ -156,10 +156,10 @@ stages:

- script: |
pip install .
-displayName: 'Install GE and required dependencies (i.e. not sqlalchemy)'
+displayName: 'Install GX and required dependencies (i.e. not sqlalchemy)'
- script: |
python -c "import great_expectations as gx; print('Successfully imported GE Version:', gx.__version__)"
python -c "import great_expectations as gx; print('Successfully imported GX Version:', gx.__version__)"
displayName: 'Import Great Expectations'
- stage: required
@@ -170,7 +170,7 @@ stages:
jobs:
# Runs pytest without any additional flags
- job: minimal
-condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GEChanged'], true)
+condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GXChanged'], true)
strategy:
# This matrix is intended to split up our sizeable test suite into two distinct components.
# By splitting up slow tests from the remainder of the suite, we can parallelize test runs
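As a purely hypothetical illustration of such a split (the matrix leg names and pytest markers below are assumptions, not taken from this diff), a two-way matrix might look like:

```yaml
# Hypothetical sketch of a matrix that runs slow tests in parallel
# with the rest of the suite.
strategy:
  matrix:
    standard:
      pytest_args: '-m "not slow"'  # everything except slow-marked tests
    slow:
      pytest_args: '-m "slow"'      # only the slow-marked tests
steps:
  - script: pytest $(pytest_args)
    displayName: 'Run pytest subset'
```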
@@ -248,7 +248,7 @@ stages:

# Runs pytest with Spark and Postgres enabled
- job: comprehensive
-condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GEChanged'], true)
+condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GXChanged'], true)
strategy:
# This matrix is intended to split up our sizeable test suite into two distinct components.
# By splitting up slow tests from the remainder of the suite, we can parallelize test runs
@@ -323,7 +323,7 @@ stages:

jobs:
- job: test_usage_stats_messages
-condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GEChanged'], true)
+condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GXChanged'], true)
variables:
python.version: '3.8'

@@ -359,7 +359,7 @@ stages:

jobs:
- job: mysql
-condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GEChanged'], true)
+condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GXChanged'], true)

services:
mysql: mysql
@@ -416,7 +416,7 @@ stages:
GE_USAGE_STATISTICS_URL: ${{ variables.GE_USAGE_STATISTICS_URL }}
- job: mssql
-condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GEChanged'], true)
+condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GXChanged'], true)

services:
mssql: mssql
@@ -463,7 +463,7 @@ stages:
GE_USAGE_STATISTICS_URL: ${{ variables.GE_USAGE_STATISTICS_URL }}
- job: trino
-condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GEChanged'], true)
+condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GXChanged'], true)

services:
trino: trino
@@ -522,7 +522,7 @@ stages:

jobs:
- job: test_cli
-condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GEChanged'], true)
+condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GXChanged'], true)

services:
postgres: postgres
4 changes: 2 additions & 2 deletions azure-pipelines.yml
@@ -466,10 +466,10 @@ stages:

- script: |
pip install .
-displayName: 'Install GE and required dependencies (i.e. not sqlalchemy)'
+displayName: 'Install GX and required dependencies (i.e. not sqlalchemy)'
- script: |
-python -c "import great_expectations as gx; print('Successfully imported GE Version:', gx.__version__)"
+python -c "import great_expectations as gx; print('Successfully imported GX Version:', gx.__version__)"
displayName: 'Import Great Expectations'
- stage: db_integration
4 changes: 2 additions & 2 deletions azure/user-install-matrix.yml
@@ -20,7 +20,7 @@ jobs:
- script: |
great_expectations --version
great_expectations -y init --no-usage-stats
-python -c "import great_expectations as gx; print('Successfully imported GE Version:', gx.__version__)"
+python -c "import great_expectations as gx; print('Successfully imported GX Version:', gx.__version__)"
displayName: 'Confirm installation'
- job:
@@ -47,5 +47,5 @@ jobs:
source activate ge_dev
great_expectations --version
great_expectations -y init --no-usage-stats
-python -c "import great_expectations as gx; print('Successfully imported GE Version:', gx.__version__)"
+python -c "import great_expectations as gx; print('Successfully imported GX Version:', gx.__version__)"
displayName: 'Confirm installation'
38 changes: 19 additions & 19 deletions contrib/capitalone_dataprofiler_expectations/README.md
@@ -38,7 +38,7 @@ If you have suggestions or find a bug, [please open an issue](https://github.com

If you want to install the ml dependencies without generating reports use `DataProfiler[ml]`

If the ML requirements are too strict (say, you don't want to install tensorflow), you can install a slimmer package with `DataProfiler[reports]`. The slimmer package disables the default sensitive data detection / entity recognition (labler)

Install from pypi: `pip install DataProfiler`

@@ -47,7 +47,7 @@ Install from pypi: `pip install DataProfiler`

# What is a Data Profile?

In the case of this library, a data profile is a dictionary containing statistics and predictions about the underlying dataset. There are "global statistics" or `global_stats`, which contain dataset level data and there are "column/row level statistics" or `data_stats` (each column is a new key-value entry).

The format for a structured profile is below:

@@ -57,7 +57,7 @@ The format for a structured profile is below:
"column_count": int,
"row_count": int,
"row_has_null_ratio": float,
"row_is_null_ratio": float,
"row_is_null_ratio": float,
"unique_row_ratio": float,
"duplicate_row_count": int,
"file_type": string,
@@ -84,11 +84,11 @@ The format for a structured profile is below:
"null_types_index": {
string: list[int]
},
"data_type_representation": dict[string, float],
"data_type_representation": dict[string, float],
"min": [null, float, str],
"max": [null, float, str],
"mode": float,
"median": float,
"median": float,
"median_absolute_deviation": float,
"sum": float,
"mean": float,
@@ -98,15 +98,15 @@ The format for a structured profile is below:
"kurtosis": float,
"num_zeros": int,
"num_negatives": int,
"histogram": {
"histogram": {
"bin_counts": list[int],
"bin_edges": list[float],
},
"quantiles": {
int: float
},
"vocab": list[char],
"avg_predictions": dict[string, float],
"avg_predictions": dict[string, float],
"data_label_representation": dict[string, float],
"categories": list[str],
"unique_count": int,
@@ -122,7 +122,7 @@ The format for a structured profile is below:
'std': float,
'sample_size': int,
'margin_of_error': float,
'confidence_level': float
},
"times": dict[string, float],
"format": string
@@ -180,7 +180,7 @@ The format for an unstructured profile is below:
* `duplicate_row_count` - the number of rows that occur more than once in the input dataset
* `file_type` - the format of the file containing the input dataset (ex: .csv)
* `encoding` - the encoding of the file containing the input dataset (ex: UTF-8)
* `correlation_matrix` - matrix of shape `column_count` x `column_count` containing the correlation coefficients between each column in the dataset
* `chi2_matrix` - matrix of shape `column_count` x `column_count` containing the chi-square statistics between each column in the dataset
* `profile_schema` - a description of the format of the input dataset labeling each column and its index in the dataset
* `string` - the label of the column in question and its index in the profile schema
@@ -289,7 +289,7 @@ The format for an unstructured profile is below:
* BAN (bank account number, 10-18 digits)
* CREDIT_CARD
* EMAIL_ADDRESS
* UUID
* HASH_OR_KEY (md5, sha1, sha256, random hash, etc.)
* IPV4
* IPV6
@@ -328,7 +328,7 @@ Along with other attributtes the `Data class` enables data to be accessed via a

```python
# Load a csv file, return a CSVData object
csv_data = Data('your_file.csv')

# Print the first 10 rows of the csv file
print(csv_data.data.head(10))
@@ -346,10 +346,10 @@ print(parquet_data.data.head(10))
json_data = Data('https://github.com/capitalone/DataProfiler/blob/main/dataprofiler/tests/data/json/iris-utf-8.json')
```

If the file type is not automatically identified (rare), you can specify them
specifically, see section [Specifying a Filetype or Delimiter](#specifying-a-filetype-or-delimiter).
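As a brief sketch of that fallback (the `data_type` argument is an assumption based on the referenced section, not shown in this diff):

```python
from dataprofiler import Data

# Assumed usage: force CSV parsing when the extension is ambiguous.
csv_data = Data('your_file.data', data_type='csv')
print(csv_data.data.head(10))
```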

### Profile a File

Example uses a CSV file for example, but CSV, JSON, Avro, Parquet or Text should also work.

@@ -358,7 +358,7 @@ import json
from dataprofiler import Data, Profiler

# Load file (CSV should be automatically identified)
data = Data("your_file.csv")

# Profile the dataset
profile = Profiler(data)
@@ -395,7 +395,7 @@ Note that if the data you update the profile with contains integer indices that

### Merging Profiles

If you have two files with the same schema (but different data), it is possible to merge the two profiles together via an addition operator.

This also enables profiles to be determined in a distributed manner.
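A minimal sketch of that addition-based merge (the file names are placeholders):

```python
from dataprofiler import Data, Profiler

# Profile two files that share a schema, then merge with the addition operator.
profile1 = Profiler(Data("partition_one.csv"))
profile2 = Profiler(Data("partition_two.csv"))
merged_profile = profile1 + profile2
```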

@@ -422,8 +422,8 @@ Note that if merged profiles had overlapping integer indices, when null rows are

### Profiler Differences
For finding the change between profiles with the same schema we can utilize the
profile's `diff` function. The diff will provide overall file and sampling
differences as well as detailed differences of the data's statistics. For
example, numerical columns have a t-test applied to evaluate similarity.
More information is described in the Profiler section of the [Github Pages](
https://capitalone.github.io/DataProfiler/).
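A short sketch of that workflow, assuming `profile1` and `profile2` were built as in the merging example above:

```python
# diff() reports overall and per-statistic differences between two profiles.
diff_report = profile1.diff(profile2)
print(diff_report)
```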
@@ -463,7 +463,7 @@ print(json.dumps(report["data_stats"][0], indent=4))
```

### Unstructured profiler
In addition to the structured profiler, DataProfiler provides unstructured profiling for the TextData object or string. The unstructured profiler also works with list[string], pd.Series(string) or pd.DataFrame(string) given profiler_type option specified as `unstructured`. Below is an example of the unstructured profiler with a text file.
```python
import dataprofiler as dp
import json
@@ -500,4 +500,4 @@ Authors: Anh Truong, Austin Walters, Jeremy Goodsitt
The AAAI-21 Workshop on Knowledge Discovery from Unstructured Data in Financial Services
```

-GE Integration Author: Taylor Turner ([taylorfturner](https://github.com/taylorfturner))
+GX Integration Author: Taylor Turner ([taylorfturner](https://github.com/taylorfturner))
12 changes: 6 additions & 6 deletions docker/Dockerfile
@@ -2,19 +2,19 @@ ARG PYTHON_DOCKER_TAG

FROM python:${PYTHON_DOCKER_TAG}

-ARG GE_EXTRA_DEPS="spark,sqlalchemy,redshift,s3,gcp,snowflake"
+ARG GX_EXTRA_DEPS="spark,sqlalchemy,redshift,s3,gcp,snowflake"

ENV PYTHONIOENCODING utf-8
ENV LANG C.UTF-8
ENV HOME /root
ENV PATH /usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:${HOME}/.local/bin
-# Path where the root of the GE project will be expected
-ENV GE_HOME /usr/app/great_expectations
+# Path where the root of the GX project will be expected
+ENV GX_HOME /usr/app/great_expectations

LABEL maintainer="great-expectations"
LABEL org.opencontainers.image.title="Great Expectations"
LABEL org.opencontainers.image.description="Great Expectations. Always know what to expect from your data."
-LABEL org.opencontainers.image.version=${GE_VERSION}
+LABEL org.opencontainers.image.version=${GX_VERSION}
LABEL org.opencontainers.image.created=${CREATED}
LABEL org.opencontainers.image.url="https://github.com/great-expectations/great_expectations"
LABEL org.opencontainers.image.documentation="https://github.com/great-expectations/great_expectations"
@@ -29,10 +29,10 @@ COPY . /tmp/great_expectations_install

RUN mkdir -p /usr/app ${HOME} && \
cd /tmp/great_expectations_install && \
-pip install .[${GE_EXTRA_DEPS}] && \
+pip install .[${GX_EXTRA_DEPS}] && \
rm -rf /tmp/great_expectations_install

-WORKDIR ${GE_HOME}
+WORKDIR ${GX_HOME}

ENTRYPOINT ["great_expectations"]
CMD ["--help"]
@@ -11,7 +11,7 @@ deployment, with configurations and methods for all supporting components.

The DataContext is configured via a yml file stored in a directory called great_expectations; this configuration
file as well as managed Expectation Suites should be stored in version control. There are other ways to create a
-Data Context that may be better suited for your particular deployment e.g. ephemerally or backed by GE Cloud
+Data Context that may be better suited for your particular deployment e.g. ephemerally or backed by GX Cloud
(coming soon). Please refer to our documentation for more details.

You can Validate data or generate Expectations using Execution Engines including:
2 changes: 1 addition & 1 deletion docs/contributing/style_guides/docs_style.md
@@ -30,7 +30,7 @@ This style guide will be enforced for all incoming PRs. However, certain legacy
:::


-* The **project name “Great Expectations” is always spaced and capitalized.** Good: “Great Expectations”. Bad: “great_expectations”, “great expectations”, “GE.”
+* The **project name “Great Expectations” is always spaced and capitalized.** Good: “Great Expectations”. Bad: “great_expectations”, “great expectations”, “GX.”

* **We refer to ourselves in the first person plural.** Good: “we”, “our”. Bad: “I”. This helps us avoid awkward passive sentences. Occasionally, we refer to ourselves as “the Great Expectations team” (or community) for clarity.

@@ -99,7 +99,7 @@ def file_task(

@workflow
def file_wf(
-dataset: CSVFile = "https://raw.githubusercontent.com/superconductive/ge_tutorials/main/data/yellow_tripdata_sample_2019-01.csv",
+dataset: CSVFile = "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv",
) -> int:
return file_task(dataset=dataset)

@@ -156,7 +156,7 @@ def to_df(dataset: str) -> pd.DataFrame:
def schema_wf() -> int:
return schema_task(
dataframe=to_df(
dataset="https://raw.githubusercontent.com/superconductive/ge_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
dataset="https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
)
)
