User Guidelines

1. The purpose of Open Data Pakistan (ODP) is to create a culture of sharing data in an open format for the public good.

2. Only registered and approved organizations are able to access the platform as organization administrators to share data.

3. Please contact Open Data Pakistan to register as an organization through the connect page.

4. Organization members can:

View the organization’s private datasets.

5. Organization editors can do everything a member can, plus:

Add new datasets to the organization
Edit or delete any of the organization’s datasets
Make datasets public or private.

6. Organization administrators can do everything an editor can, plus:

Add users to the organization, and choose whether to make the new user a member, editor or admin
Change the role of any user in the organization, including other admin users
Remove members, editors or other admins from the organization
Edit the organization itself (for example: change the organization’s title, description or image)
Delete the organization
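The cumulative roles in points 4 to 6 can be sketched as a small permission table. This is only an illustration, not the portal's actual implementation: the permission names (such as `add_dataset`) and the helper functions are hypothetical labels for the actions listed above.

```python
from enum import IntEnum


class Role(IntEnum):
    """Organization roles, ordered so that a higher role includes the lower ones."""
    MEMBER = 1
    EDITOR = 2
    ADMIN = 3


# Permissions introduced at each role. The names are hypothetical labels
# for the actions listed in points 4 to 6, not identifiers used by the portal.
OWN_PERMISSIONS = {
    Role.MEMBER: {"view_private_datasets"},
    Role.EDITOR: {
        "add_dataset",
        "edit_dataset",
        "delete_dataset",
        "set_dataset_visibility",
    },
    Role.ADMIN: {
        "add_user",
        "change_user_role",
        "remove_user",
        "edit_organization",
        "delete_organization",
    },
}


def permissions_for(role):
    """Return every permission a role holds, including inherited ones."""
    allowed = set()
    for candidate in Role:
        if candidate <= role:
            allowed |= OWN_PERMISSIONS[candidate]
    return allowed


def can(role, action):
    """Check whether a role may perform an action."""
    return action in permissions_for(role)
```

For example, `can(Role.EDITOR, "view_private_datasets")` is true because editors inherit the member permission, while `can(Role.EDITOR, "add_user")` is false because managing users is reserved for administrators.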

7. Guidelines for sharing sensitive or private information and protected datasets

a) As part of the publishing process, data can be classified as per the following[1]:

Level 1 - Public. Data available for public access or use.
Level 2 - Internal Use. Routine operating information for internal use; it is not proactively shared with the public. Use of Level 2 data is intended for employees or a closed group in private mode. Certain data may be made available to external parties upon their request.
Level 3 - Sensitive. Data regulated by legal regulations and privacy laws, or agreements such as contracts with non-disclosure agreements or other terms and conditions.
Level 4 - Protected. Data that requires notifications to affected parties in case of a security breach.
Level 5 - Restricted. Data with high impact and threat to human life or risk of an epidemic or catastrophic loss of major assets. This data must be verified by leadership or the concerned authorities for its classification.
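A publisher might record the classification level alongside each dataset. The sketch below is one possible encoding of the five levels. The two helper functions are assumptions made for illustration: the first treats only Level 1 as publishable to the public without further treatment, and the second flags Level 5 for the verification step mentioned above.

```python
from enum import IntEnum


class DataClassification(IntEnum):
    """The five classification levels listed above."""
    PUBLIC = 1
    INTERNAL_USE = 2
    SENSITIVE = 3
    PROTECTED = 4
    RESTRICTED = 5


def may_publish_publicly(level):
    """Assumption for this sketch: only Level 1 data is shared publicly as-is."""
    return level == DataClassification.PUBLIC


def needs_leadership_verification(level):
    """Level 5 classifications must be verified by leadership or the concerned authorities."""
    return level == DataClassification.RESTRICTED
```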

b) PII – “personally identifiable information refers to information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual. The definition of PII is not anchored to any single category of information or technology. Rather, it requires a case-by-case assessment of the specific risk that an individual can be identified. In performing this assessment, it is important for an organization administrator to recognize that non-PII can become PII whenever additional information is made publicly available (in any medium and from any source) that, when combined with other available information, could be used to identify an individual.”[2]

c) PII data cannot be shared on the ODP.

d) Data must be anonymized and aggregated to prevent release of sensitive, protected, and highly restricted data.

e) Organizations may strategically publish private or sensitive data by applying the following methods[3]:

Method	What it is	Best for
Column Removal	Remove the privacy-implicating columns. The simplest way to avoid privacy issues is to not publish the columns that contain private data. For example, if a dataset is a list of users and includes their names, addresses or other personal information, you can remove those columns from the dataset.	Datasets with private or personal information that is not necessary for consuming and understanding the data.
Obfuscation	Mask or transcribe the data. Obfuscation can happen in a number of ways, but a common case is with address data. Sometimes we want to retain a proxy of the address without aggregating the data.	Datasets with private or personal information that is not necessary for consuming and understanding the data.
Banding	Group the data. Banding is a way to obscure individual values. For example, instead of publishing age, you can publish age group. Other examples of banding include time (date to month to quarter).	Datasets where individual record data is important to publish but where too much detail can make it easy to identify individuals with uncommon mixes of characteristics.
Aggregation	Summarize the data based on a data property. Sometimes de-identifying the data is not sufficient. Your data might need to be aggregated either by geography or some other factor such as a category in the dataset.	Datasets where the individual records pose a privacy risk even if the identifying columns are removed. A common example of this is health related data. If the individual records (rows) are important to publish, use one of the other methods.
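The four methods in the table can be illustrated with a short sketch using only the Python standard library. The sample rows are made up for illustration, the function names are hypothetical, and each function shows just one possible way to apply its method (for instance, obfuscation here drops the house number and keeps the street, and banding uses ten-year age groups).

```python
import re
from collections import Counter

# Made-up sample rows, for illustration only.
records = [
    {"name": "Person A", "address": "12 Mall Road, Lahore", "age": 34, "district": "Lahore"},
    {"name": "Person B", "address": "7 Canal Road, Lahore", "age": 61, "district": "Lahore"},
    {"name": "Person C", "address": "45 Shahrah-e-Faisal, Karachi", "age": 29, "district": "Karachi"},
]


def remove_columns(rows, columns):
    """Column removal: drop the named columns from every row."""
    drop = set(columns)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


def obfuscate_address(address):
    """Obfuscation: strip a leading house number, keeping the street as a proxy."""
    return re.sub(r"^\s*\d+\s*", "", address)


def band_age(age, width=10):
    """Banding: replace an exact age with an age group such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def aggregate_count(rows, key):
    """Aggregation: count rows per value of one property, dropping row-level detail."""
    return dict(Counter(row[key] for row in rows))
```

A publisher could, for example, call `remove_columns(records, ["name"])` before release, or publish only `aggregate_count(records, "district")` when the individual rows are too risky to share.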

8. All data shared on this portal must be legally and ethically acquired, and properly sourced. Open licensing should be applied where possible.

9. Duplication of data should be avoided by checking open data already available on the portal.

10. Organizations can share data within their organization’s network of members privately or can publish data publicly. Private data will never be shared with the public.

11. Users can share all types of data: raw, aggregated, structured, curated or uncurated. However, data in an open and machine-readable format is preferable.

12. An open format is a file format with no restrictions, monetary or otherwise, placed upon its use, and whose data can be fully processed with at least one free open-source software tool.

13. Machine-readable data is data that can be automatically read and processed by a computer, in formats such as CSV, JSON and XML. Machine-readable data must be structured data.[4]

14. Open Data Pakistan supports the following open file formats:

Type	Media	Description
CSV	Text	Comma-separated values
JSON	Text	JavaScript Object Notation
PDF	Binary	Portable Document Format
RDF	Text	Resource Description Framework
RSS	Text	RDF Site Summary/Really Simple Syndication
XLS	Binary	Microsoft Excel
XLSX	Binary	Microsoft Excel Open XML
XML	Text	Extensible Markup Language
ZIP	Binary	Typically contains a shapefile set (SHP, SHX, DBF)
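Before uploading, a publisher might check a file against the table above. The sketch below assumes the format can be identified from the file extension alone, which is a simplification, and the function name is hypothetical. It does not inspect the file's contents.

```python
from pathlib import Path

# File types and media kinds copied from the table above.
SUPPORTED_FORMATS = {
    "csv": "text",
    "json": "text",
    "pdf": "binary",
    "rdf": "text",
    "rss": "text",
    "xls": "binary",
    "xlsx": "binary",
    "xml": "text",
    "zip": "binary",
}


def format_of(filename):
    """Return the supported type for a filename, or None if it is not in the table.

    Assumption: the extension reliably reflects the file type.
    """
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension if extension in SUPPORTED_FORMATS else None
```

With this sketch, `format_of("Census.CSV")` returns `"csv"`, while `format_of("notes.docx")` returns `None`.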

15. Data will be held for an indefinite period unless the data administrator deletes it or there is an exceptional request to delete the data.

16. Key to additional information or metadata for adding a dataset[5]:

Term	Definition	Comments
Title	A name given to the resource.
Description	An account of the resource.	Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource.
Category	Topic, sector or theme of the resource.	Agriculture, Food & Forests; Cities & Regions; Connectivity; Culture; Demography; Economy & Finance; Education; Environment & Energy; Government & Public Sector; Health; Housing & Public Services; Manufacturing; Public Safety; Science & Technology
Tags	The topics of the resource.	Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary.
Temporal coverage	Time period of resource. A point or period of time associated with an event in the lifecycle of the resource.	Date may be used to express temporal information at any level of granularity. For example, a range from one month and year to another, pertaining to the data variables in the dataset.
Spatial coverage	Location of resource. The spatial applicability of the resource, or the jurisdiction under which the resource is relevant.	Spatial coverage may be used to express spatial information at any level of granularity. Spatial topic and spatial applicability may be a named place or a location specified by its geographic coordinates.
Organization name	Name of organization publishing the resource.
Organization type		Federal government, Provincial government, Local government, Education, Private, NGO, Other
Dataset type		Geospatial, Non-geospatial
Date created	Date resource was created.
Last updated	Date resource was last modified.
Source	Original source or link where resource was originally published or produced
Author	An entity (person, department or organization) primarily responsible for creating or producing the dataset. This could be the same as the publishing organization.
Publisher	An entity (person, department or organization) responsible for making the resource available. This could be the same or different from the author of the dataset.
Maintainer	Second point of contact responsible for the data.
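The metadata terms in the table could be collected in a simple record before a dataset is submitted. This is a minimal sketch, not the portal's schema: the field names are derived from the table's terms, every field is kept as plain text for simplicity, and the guidelines do not state which fields are mandatory, so the helper only reports which ones are still empty.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DatasetMetadata:
    """One field per term in the metadata table above (hypothetical field names)."""
    title: str = ""
    description: str = ""
    category: str = ""
    tags: List[str] = field(default_factory=list)
    temporal_coverage: str = ""
    spatial_coverage: str = ""
    organization_name: str = ""
    organization_type: str = ""
    dataset_type: str = ""
    date_created: str = ""
    last_updated: str = ""
    source: str = ""
    author: str = ""
    publisher: str = ""
    maintainer: str = ""


def missing_fields(meta):
    """List the names of fields that are still empty."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name)]
```

A publisher could fill in the record step by step and call `missing_fields` to see which terms remain blank before submitting.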

17. Connect with us through the connect page if:

You want to share a data story
You want to register your organization
You become aware of sensitive or private data that should not be shared publicly
You become aware of duplication of data
Data has been shared by a third-party source, and the original source objects or has reservations and requests that it be deleted

18. The employees associated with ODP do not endorse or agree with the opinions expressed in the data shared on the portal.

19. ODP can modify these terms or apply additional terms to reflect the changes.

20. These user guidelines are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.

[1] Inspired by San Francisco’s data classification standard: https://sfcoit.org/sites/default/files/2019-09/DataClassificationStandard_FINAL_DRAFT.pdf

[2] Open Data Policy M-13-13 https://obamawhitehouse.archives.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf

[3] https://datasf.org/publishing/guidelines/

[4] http://opendatahandbook.org/glossary/en/

[5] http://www.dublincore.org/documents/dces/?1401215562628