Commit 6b2481c

Merge pull request #35 from mdreves/master
Project import generated by Copybara.
2 parents 429fd63 + 73760b2

105 files changed: +6374, -2715 lines


README.md

Lines changed: 8 additions & 7 deletions

```diff
@@ -55,13 +55,14 @@ The following table is the TFMA package versions that are compatible with each
 other. This is determined by our testing framework, but other *untested*
 combinations may also work.
 
-|tensorflow-model-analysis |tensorflow |apache-beam[gcp]|
-|---------------------------|--------------|----------------|
-|GitHub master |1.9 |2.6.0 |
-|0.9.2 |1.9 |2.6.0 |
-|0.9.1 |1.10 |2.6.0 |
-|0.9.0 |1.9 |2.5.0 |
-|0.6.0 |1.6 |2.4.0 |
+|tensorflow-model-analysis |tensorflow |apache-beam[gcp]|
+|---------------------------|--------------------|----------------|
+|GitHub master |1.11 |2.8.0 |
+|0.11.0 |1.11 |2.8.0 |
+|0.9.2 |1.9 |2.6.0 |
+|0.9.1 |1.10 |2.6.0 |
+|0.9.0 |1.9 |2.5.0 |
+|0.6.0 |1.6 |2.4.0 |
 
 ## Questions
 
```
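For instance, pinning an environment to the 0.11.0 row of the table above could look like this requirements fragment (the exact pins are an illustration, not an officially published requirements file):

```
tensorflow-model-analysis==0.11.0
tensorflow>=1.11,<1.12
apache-beam[gcp]==2.8.0
```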
RELEASE.md

Lines changed: 76 additions & 1 deletion

```diff
@@ -1,3 +1,77 @@
+# Release 0.11.0
+
+## Major Features and Improvements
+
+* We now support unsupervised models which have `model_fn`s that do not take a
+  `labels` argument.
+* Improved performance by using `make_callable` instead of repeated
+  `session.run` calls.
+* Improved performance by better choice of default "combine" batch size.
+* We now support passing in custom extractors in the model_eval_lib API.
+* Added support for models which have multiple examples per raw input (e.g.
+  input is a compressed example which expands to multiple examples when parsed
+  by the model). For such models, you must specify an `example_ref` parameter
+  to your `EvalInputReceiver`. This 1-D integer Tensor should be batch aligned
+  with features, predictions and labels and each element in it is an index in
+  the raw input tensor to identify which input each feature / prediction /
+  label came from. See
+  `eval_saved_model/example_trainers/fake_multi_examples_per_input_estimator.py`
+  for an example.
+* Added support for metrics with string `value_op`s.
+* Added support for metrics whose `value_op`s return multidimensional arrays.
+* We now support including your serving graph in the EvalSavedModel. You can
+  do this by passing a `serving_input_receiver_fn` to `export_eval_savedmodel`
+  or any of the TFMA Exporters.
+
+## Bug fixes and other changes
+
+* Depends on `apache-beam[gcp]>=2.8,<3`.
+* Depends on `tensorflow-transform>=0.11,<1`.
+* Requires pre-installed TensorFlow >=1.11,<2.
+* Factored out utility functions for adding sliceable "meta-features" to FPL.
+* Added public API docs.
+* Add an extractor to add sliceable "meta-features" to FPL.
+* Potentially improved performance by fanning out large slices.
+* Add support for assets_extra in `tfma.exporter.FinalExporter`.
+* Add a light-weight library that includes only the export-related modules for
+  TFMA for use in your Trainer. See docstring in
+  `tensorflow_model_analysis/export_only/__init__.py` for usage.
+* Update `EvalInputReceiver` so the TFMA collections written to the graph only
+  contain the results of the last call if multiple calls to `EvalInputReceiver`
+  are made.
+* We now finalize the graph after it's loaded and post-export metrics are
+  added, potentially improving performance.
+* Fix a bug in post-export PrecisionRecallAtK where labels with only 1
+  dimension were not correctly handled.
+* Fix an issue where we were not correctly wrapping SparseTensors for
+  `features` and `labels` in `tf.identity`, which could cause TFMA to
+  encounter TensorFlow issue #17568 if there were control dependencies on
+  these `features` or `labels`.
+* We now correctly preserve `dtypes` when splitting and concatenating
+  SparseTensors internally. The failure to do so previously could result in
+  unexpectedly large memory usage if string values were involved due to the
+  inefficient pickling of NumPy string arrays with a large number of elements.
+
+## Breaking changes
+
+* Requires pre-installed TensorFlow >=1.11,<2.
+* We now require that `EvalInputReceiver`, `export_eval_savedmodel`,
+  `make_export_strategy`, `make_final_exporter`, `FinalExporter` and
+  `LatestExporter` be called with keyword arguments only.
+* Removed `check_metric_compatibility` from `EvalSavedModel`.
+* We now enforce that the `receiver_tensors` dictionary for
+  `EvalInputReceiver` contains exactly one key named `examples`.
+* Post-export metrics have now been moved up one level to
+  `tfma.post_export_metrics`. They should now be accessed via
+  `tfma.post_export_metrics.auc` instead of
+  `tfma.post_export_metrics.post_export_metrics.auc` as they were before.
+* Separated extraction from evaluation. `EvaluateAndWriteResults` is now
+  called `ExtractEvaluateAndWriteResults`.
+* Added `EvalSharedModel` type to encapsulate `model_path` and
+  `add_metrics_callbacks` along with a handle to a shared model instance.
+
+## Deprecations
+
 # Release 0.9.2
 
 ## Major Features and Improvements
@@ -21,7 +95,6 @@
 ## Bug fixes and other changes
 
 * Depends on `apache-beam[gcp]>=2.6,<3`.
-* Requires pre-installed TensorFlow >=1.10,<2.
 * Updated ExampleCount to use the batch dimension as the example count. It
   also now tries a few fallbacks if none of the standard keys are found in the
   predictions dictionary: the first key in sorted order in the predictions
@@ -34,6 +107,8 @@
 
 ## Breaking changes
 
+* Requires pre-installed TensorFlow >=1.10,<2.
+
 ## Deprecations
 
 # Release 0.9.0
```
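The keyword-arguments-only and `receiver_tensors` breaking changes above can be sketched in plain Python. The function below is a hypothetical stand-in, not TFMA's real `EvalInputReceiver`; it only illustrates the new calling contract:

```python
# Illustration only: a stub modeling the 0.11.0 calling contract. This is
# NOT TFMA's real EvalInputReceiver.
def eval_input_receiver(*, features, labels, receiver_tensors):
    """Keyword-only arguments, as TFMA 0.11.0 now requires."""
    # 0.11.0 also enforces that receiver_tensors has exactly one key,
    # named 'examples'.
    if set(receiver_tensors) != {'examples'}:
        raise ValueError(
            "receiver_tensors must contain exactly one key named 'examples'")
    return {'features': features, 'labels': labels,
            'receiver_tensors': receiver_tensors}

# Keyword arguments work as before:
receiver = eval_input_receiver(
    features={'age': 1.0}, labels={'tip': 0},
    receiver_tensors={'examples': 'placeholder'})

# Positional calls now fail with TypeError because of the bare `*`:
try:
    eval_input_receiver({'age': 1.0}, {'tip': 0}, {'examples': 'placeholder'})
except TypeError:
    print('positional call rejected')
```

The bare `*` in the signature is how Python enforces keyword-only parameters, which is consistent with the "called with keyword arguments only" requirement listed under breaking changes.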

examples/chicago_taxi/chicago_taxi_client.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -48,7 +48,7 @@ def _do_local_inference(host, port, serialized_examples):
       shape=[len(serialized_examples)],
       dtype=tf.string)
   # The name of the input tensor is 'examples' based on
-  # https://github.com/tensorflow/tensorflow/blob/r1.9/tensorflow/python/estimator/export/export.py#L290
+  # https://github.com/tensorflow/tensorflow/blob/r1.11/tensorflow/python/estimator/export/export.py#L306
   request.inputs['examples'].CopyFrom(tfproto)
   print(stub.Predict(request, _LOCAL_INFERENCE_TIMEOUT_SECONDS))
```

examples/chicago_taxi/chicago_taxi_tfma_local_playground.ipynb

Lines changed: 18 additions & 14 deletions

```diff
@@ -429,7 +429,7 @@
 "\n",
 "  run_experiment(hparams)\n",
 "\n",
-"print('Done.')"
+"print('Done')"
 ]
 },
 {
@@ -445,7 +445,7 @@
 "    num_layers=4,\n",
 "    first_layer_size=100,\n",
 "    scale_factor=0.7)\n",
-"print('Done.')"
+"print('Done')"
 ]
 },
 {
@@ -496,6 +496,9 @@
 "  \"\"\"\n",
 "  eval_model_base_dir = os.path.join(get_tf_output_dir(tf_run_id), EVAL_MODEL_DIR)\n",
 "  eval_model_dir = os.path.join(eval_model_base_dir, next(os.walk(eval_model_base_dir))[1][0])\n",
+"  eval_shared_model = tfma.default_eval_shared_model(\n",
+"      eval_saved_model_path=eval_model_dir,\n",
+"      add_metrics_callbacks=add_metrics_callbacks)\n",
 "  schema = taxi.read_schema(schema_file)\n",
 "  \n",
 "  print(eval_model_dir)\n",
@@ -518,12 +521,13 @@
 "      raw_data\n",
 "      | 'ToSerializedTFExample' >> beam.Map(coder.encode))\n",
 "\n",
-"  _ = raw_data | 'EvaluateAndWriteResults' >> tfma.EvaluateAndWriteResults(\n",
-"      eval_saved_model_path=eval_model_dir,\n",
-"      slice_spec=slice_spec,\n",
-"      output_path=get_tfma_output_dir(tfma_run_id),\n",
-"      add_metrics_callbacks=add_metrics_callbacks,\n",
-"      display_only_data_location=input_csv)\n",
+"  _ = (raw_data\n",
+"       | 'ExtractEvaluateAndWriteResults' >>\n",
+"       tfma.ExtractEvaluateAndWriteResults(\n",
+"           eval_shared_model=eval_shared_model,\n",
+"           slice_spec=slice_spec,\n",
+"           output_path=get_tfma_output_dir(tfma_run_id),\n",
+"           display_only_data_location=input_csv))\n",
 "\n",
 "  return tfma.load_eval_result(output_path=get_tfma_output_dir(tfma_run_id))\n",
 "  \n",
@@ -593,7 +597,7 @@
 "    tfma_run_id=1,\n",
 "    slice_spec=ALL_SPECS,\n",
 "    schema_file=get_schema_file())\n",
-"print('Done.')\n"
+"print('Done')\n"
 ]
 },
 {
@@ -667,12 +671,12 @@
 "    add_metrics_callbacks=[\n",
 "        # calibration_plot_and_prediction_histogram computes calibration plot and prediction\n",
 "        # distribution at different thresholds.\n",
-"        tfma.post_export_metrics.post_export_metrics.calibration_plot_and_prediction_histogram(),\n",
+"        tfma.post_export_metrics.calibration_plot_and_prediction_histogram(),\n",
 "        # auc_plots enables precision-recall curve and ROC visualization at different thresholds.\n",
-"        tfma.post_export_metrics.post_export_metrics.auc_plots()\n",
+"        tfma.post_export_metrics.auc_plots()\n",
 "    ])\n",
 "\n",
-"print('Done.')"
+"print('Done')"
 ]
 },
 {
@@ -823,7 +827,7 @@
 "    first_layer_size=240,\n",
 "    scale_factor=0.5)\n",
 "\n",
-"print('Done.')"
+"print('Done')"
 ]
 },
 {
@@ -845,7 +849,7 @@
 "    tfma_run_id=3,\n",
 "    slice_spec=ALL_SPECS,\n",
 "    schema_file=get_schema_file())\n",
-"print('Done.')"
+"print('Done')"
 ]
 },
 {
```
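The `post_export_metrics` rename adopted in the notebook cells above is a pure namespace flattening. A toy model of the before/after, using `SimpleNamespace` stand-ins rather than the real `tfma` module (the `auc_plots` function below is a hypothetical placeholder):

```python
from types import SimpleNamespace

# Hypothetical stand-in for a metric factory function; not real TFMA code.
def auc_plots():
    return 'auc_plots_metric'

# Before 0.11.0 the factories lived one module level deeper:
old_tfma = SimpleNamespace(
    post_export_metrics=SimpleNamespace(
        post_export_metrics=SimpleNamespace(auc_plots=auc_plots)))

# From 0.11.0 they are exposed directly on tfma.post_export_metrics:
new_tfma = SimpleNamespace(
    post_export_metrics=SimpleNamespace(auc_plots=auc_plots))

# Same callable, one fewer attribute hop:
assert (old_tfma.post_export_metrics.post_export_metrics.auc_plots()
        == new_tfma.post_export_metrics.auc_plots())
```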

examples/chicago_taxi/process_tfma.py

Lines changed: 13 additions & 15 deletions

```diff
@@ -25,12 +25,8 @@
 import tensorflow as tf
 
 import tensorflow_model_analysis as tfma
-from tensorflow_model_analysis.eval_saved_model.post_export_metrics import post_export_metrics
-
 from trainer import taxi
 
-from tensorflow_model_analysis.slicer import slicer
-
 
 def process_tfma(eval_result_dir,
                  schema_file,
@@ -53,7 +49,6 @@ def process_tfma(eval_result_dir,
       None.
     eval_model_dir: A directory where the eval model is located.
     max_eval_rows: Number of rows to query from BigQuery.
-
     pipeline_args: additional DataflowRunner or DirectRunner args passed to the
       beam pipeline.
 
@@ -66,12 +61,19 @@ def process_tfma(eval_result_dir,
         'one of --input_csv or --big_query_table should be provided.')
 
   slice_spec = [
-      slicer.SingleSliceSpec(),
-      slicer.SingleSliceSpec(columns=['trip_start_hour'])
+      tfma.SingleSliceSpec(),
+      tfma.SingleSliceSpec(columns=['trip_start_hour'])
  ]
 
   schema = taxi.read_schema(schema_file)
 
+  eval_shared_model = tfma.default_eval_shared_model(
+      eval_saved_model_path=eval_model_dir,
+      add_metrics_callbacks=[
+          tfma.post_export_metrics.calibration_plot_and_prediction_histogram(),
+          tfma.post_export_metrics.auc_plots()
+      ])
+
   with beam.Pipeline(argv=pipeline_args) as pipeline:
     if input_csv:
       csv_coder = taxi.make_csv_coder(schema)
@@ -97,13 +99,10 @@ def process_tfma(eval_result_dir,
     _ = (
         raw_data
         | 'ToSerializedTFExample' >> beam.Map(coder.encode)
-        | 'EvaluateAndWriteResults' >> tfma.EvaluateAndWriteResults(
-            eval_saved_model_path=eval_model_dir,
+        |
+        'ExtractEvaluateAndWriteResults' >> tfma.ExtractEvaluateAndWriteResults(
+            eval_shared_model=eval_shared_model,
             slice_spec=slice_spec,
-            add_metrics_callbacks=[
-                post_export_metrics.calibration_plot_and_prediction_histogram(),
-                post_export_metrics.auc_plots()
-            ],
             output_path=eval_result_dir))
 
 
@@ -130,8 +129,7 @@ def main():
       default=None,
       type=int)
   parser.add_argument(
-      '--schema_file',
-      help='File holding the schema for the input data')
+      '--schema_file', help='File holding the schema for the input data')
 
   known_args, pipeline_args = parser.parse_known_args()
 
```

examples/chicago_taxi/setup.py

Lines changed: 7 additions & 7 deletions

```diff
@@ -14,22 +14,22 @@
 """Setup dependencies for local and cloud deployment."""
 import setuptools
 
-TF_VERSION = '1.9.0'
+TF_VERSION = '1.11.0'
 
 if __name__ == '__main__':
   setuptools.setup(
       name='tfx_chicago_taxi',
-      version='0.9.2',
+      version='0.11.0',
       packages=setuptools.find_packages(),
       install_requires=[
-          'apache-beam[gcp]==2.6.0',
+          'apache-beam[gcp]==2.8.0',
           'jupyter==1.0',
           'numpy==1.13.3',
           'protobuf==3.6.0',
-          'tensorflow=='+TF_VERSION,
-          'tensorflow-data-validation==0.9.0',
+          'tensorflow==' + TF_VERSION,
+          'tensorflow-data-validation==0.11.0',
           'tensorflow-metadata==0.9.0',
-          'tensorflow-model-analysis==0.9.2',
-          'tensorflow-serving-api==1.9.0',
+          'tensorflow-model-analysis==0.11.0',
+          'tensorflow-serving-api==1.11.0',
           'tensorflow-transform==0.11.0',
       ])
```

examples/chicago_taxi/start_model_server_mlengine.sh

Lines changed: 3 additions & 1 deletion

```diff
@@ -31,7 +31,9 @@ gsutil ls $WORKING_DIR/serving_model_dir/export/chicago-taxi/
 MODEL_BINARIES=$(gsutil ls $WORKING_DIR/serving_model_dir/export/chicago-taxi/ \
   | sort | grep '\/[0-9]*\/$' | tail -n1)
 
+TF_VERSION=1.10
+
 gcloud ml-engine versions create v1 \
   --model chicago_taxi \
   --origin $MODEL_BINARIES \
-  --runtime-version 1.6
+  --runtime-version $TF_VERSION
```

examples/chicago_taxi/train_mlengine.sh

Lines changed: 3 additions & 1 deletion

```diff
@@ -61,10 +61,12 @@ EVAL_FILE=$TFT_OUTPUT_PATH/train_transformed-*
 TRAIN_STEPS=100000
 EVAL_STEPS=1000
 
+TF_VERSION=1.10
+
 gcloud ml-engine jobs submit training $JOB_ID \
   --stream-logs \
   --job-dir $MODEL_DIR \
-  --runtime-version 1.9 \
+  --runtime-version $TF_VERSION \
   --module-name trainer.task \
   --package-path trainer/ \
   --region us-central1 \
```

g3doc/get_started.md

Lines changed: 3 additions & 1 deletion

````diff
@@ -110,13 +110,15 @@ results. The results can be loaded for visualization using
 
 ```python
 # To run the pipeline.
+eval_shared_model = tfma.default_eval_shared_model(
+    model_path='/path/to/eval/saved/model')
 with beam.Pipeline(runner=...) as p:
   _ = (p
        # You can change the source as appropriate, e.g. read from BigQuery.
        | 'ReadData' >> beam.io.ReadFromTFRecord(data_location)
        | 'ExtractEvaluateAndWriteResults' >>
        tfma.ExtractEvaluateAndWriteResults(
-           eval_saved_model_path='/path/to/eval/saved/model',
+           eval_shared_model=eval_shared_model,
            output_path='/path/to/output',
            display_only_data_location=data_location))
 
````

g3doc/index.md

Lines changed: 2 additions & 1 deletion

```diff
@@ -58,7 +58,8 @@ combinations may also work.
 
 |tensorflow-model-analysis |tensorflow |apache-beam[gcp]|
 |---------------------------|--------------|----------------|
-|GitHub master |1.9 |2.6.0 |
+|GitHub master |1.11 |2.8.0 |
+|0.11.0 |1.11 |2.8.0 |
 |0.9.2 |1.9 |2.6.0 |
 |0.9.1 |1.10 |2.6.0 |
 |0.9.0 |1.9 |2.5.0 |
```
